Compare commits

...

3531 Commits

Author SHA1 Message Date
Brian Behlendorf bc06d8164b Linux: Enable Direct IO by default
Aligns the 2.3 release branch with the well tested default behavior
in the master branch.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2025-01-13 13:53:41 -08:00
Brian Behlendorf 76745cf5b8 Tag 2.3.0-1
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2025-01-13 10:03:37 -08:00
Brian Behlendorf 0c88ae6187 Tag 2.3.0-rc5
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2025-01-05 17:31:26 -08:00
n0-1 307fd0da1f Support for cross-compiling kernel modules
In order to correctly cross-compile, one has to pass ARCH and
CROSS_COMPILE make flags to kernel module build calls. Facilitate this
in the same way as for custom CC flag by recognizing KERNEL_-prefixed
configure environment variables of same name.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Phil Sutter <phil@nwl.cc>
Closes #16924
2025-01-05 17:31:26 -08:00
Robert Evans 9f1c5e0b10 Remove duplicate dedup_legacy_create in common.run
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Robert Evans <evansr@google.com>
Closes #16926
2025-01-05 17:31:26 -08:00
Richard Kojedzinszky 5ba50c8135 fix: make zfs_strerror really thread-safe and portable
#15793 wanted to make zfs_strerror threadsafe, unfortunately, it
turned out that strerror_l() usage was wrong, and also, some libc 
implementations dont have strerror_l().

zfs_strerror() now simply calls original strerror() and copies the 
result to a thread-local buffer, then returns that.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Richard Kojedzinszky <richard@kojedz.in>
Closes #15793
Closes #16640
Closes #16923
2025-01-04 11:58:15 -08:00
Don Brady 25565403aa Too many vdev probe errors should suspend pool
Similar to what we saw in #16569, we need to consider that a
replacing vdev should not be considered as fully contributing
to the redundancy of a raidz vdev even though current IO has
enough redundancy.

When a failed vdev_probe() is faulting a disk, it now checks
if that disk is required, and if so it suspends the pool until
the admin can return the missing disks.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Don Brady <don.brady@klarasystems.com>
Closes #16864
2025-01-04 11:58:15 -08:00
Robert Evans 47b7dc976b Add Makefile dependencies for scripts/zfs-tests.sh -c
This updates the Makefile to be more correct for parallel make.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Robert Evans <evansr@google.com>
Closes #16030
Closes #16922
2025-01-04 11:58:15 -08:00
Toomas Soome 125731436d ZTS: checkpoint_discard_busy should use save_tunable/restore_tunable
Instead of using hardwired value for SPA_DISCARD_MEMORY_LIMIT,
use save_tunable and restore_tunable to restore the pre-test state.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Toomas Soome <tsoome@me.com>
Closes #16919
2025-01-03 15:23:49 -08:00
Rob Norris 4425a7bb85 vdev_open: clear async remove flag after reopen
It's possible for a vdev to be flagged for async remove after the pool
has suspended. If the removed device has been returned when the pool is
resumed, the ASYNC_REMOVE task will still run at the end of txg, and
remove the device from the pool again.

To fix, we clear the async remove flag at reopen, just as we did for the
async fault flag in 5de3ac223.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16921
2025-01-03 15:23:49 -08:00
Toomas Soome e47b033eae ZTS: remove unused TESTDIRS from pam/cleanup.ksh
Remove TESTDIRS as it is not set for pam tests.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Toomas Soome <tsoome@me.com>
Closes #16920
2025-01-03 15:23:49 -08:00
pstef cfec8f13a2 zfs_vnops_os.c: fallocate is valid but not supported on FreeBSD
This works around
/usr/lib/go-1.18/pkg/tool/linux_amd64/link:
mapping output file failed: invalid argument

It's happened to me under a Linux jail, but it's also happened to other
people, see https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=270247#c4

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: pstef <pstef@users.noreply.github.com>
Closes #16918
2025-01-03 15:23:49 -08:00
Toomas Soome 997db7a7fc ZTS: checkpoint_discard_busy does not set 16M on cleanup
Originally hex value is used as decimal.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Toomas Soome <tsoome@me.com>
Closes #16917
2025-01-02 17:04:10 -08:00
Toomas Soome e411081aa0 ZTS: functional/mount scripts are not removing /var/tmp/testdir.X dirs
cleanup.ksh is assuming we have TESTDIRS set.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Toomas Soome <tsoome@me.com>
Closes #16915
2025-01-02 17:04:10 -08:00
Toomas Soome a55b6fe94a ZTS: zfs_mount_all_fail leaves /var/tmp/testrootPIDNUM directory around
Before we can remove test files, we need to unmount datasets
used by test first.

See also: zfs_mount_all_mountpoints.ksh

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Toomas Soome <tsoome@me.com>
Closes #16914
2025-01-02 17:04:10 -08:00
James Reilly 939e9f0b6a ZTS: add centos stream10 (#16904)
Added centos as optional runners via workflow_dispatch

removed centos-stream9 from the FULL_OS runner list as CentOS is not
officially support by ZFS. This commit will add preliminary support for
EL10 and allow testing ZFS ahead of EL10 codebase solidifying in ~6
months

Signed-off-by: James Reilly <jreilly1821@gmail.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
2025-01-02 17:04:10 -08:00
Andrew Walker 679b164cd3 Add missing zfs_exit() when snapdir is disabled (#16912)
zfs_vget doesn't zfs_exit when erroring out due to snapdir
being disabled.

Signed-off-by: Andrew Walker <awalker@ixsystems.com>
Reviewed-by: @bmeagherix
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2025-01-02 17:04:10 -08:00
shodanshok c2d9494f99 set zfs_arc_shrinker_limit to 0 by default
zfs_arc_shrinker_limit was introduced to avoid ARC collapse due to
aggressive kernel reclaim. While useful, the current default (10000) is
too prone to OOM especially when MGLRU-enabled kernels with default
min_ttl_ms are used. Even when no OOM happens, it often causes too much
swap usage.

This patch sets zfs_arc_shrinker_limit=0 to not ignore kernel reclaim
requests. ARC now plays better with both kernel shrinker and pagecache
but, should ARC collapse happen again, MGLRU behavior can be tuned or
even disabled.

Anyway, zfs should not cause OOM when ARC can be released.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Gionatan Danti <g.danti@assyoma.it>
Closes #16909
2024-12-29 11:53:45 -08:00
Ameer Hamza b952e061df zvol: implement platform-independent part of block cloning
In Linux, block devices currently lack support for `copy_file_range`
API because the kernel does not provide the necessary functionality.
However, there is an ongoing upstream effort to address this
limitation: https://patchwork.kernel.org/project/dm-devel/cover/20240520102033.9361-1-nj.shetty@samsung.com/.
We have adopted this upstream kernel patch into the TrueNAS kernel and
made some additional modifications to enable block cloning specifically
for the zvol block device. This patch implements the platform-
independent portions of these changes for inclusion in OpenZFS.
This patch does not introduce any new functionality directly into
OpenZFS. The `TX_CLONE_RANGE` replay capability is only relevant when
zvols are migrated to non-TrueNAS systems that support Clone Range
replay in the ZIL.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #16901
2024-12-29 11:53:45 -08:00
Alexander Motin 0fea7fc109 ZTS: Reduce file size in redacted_panic to 1GB
This test takes 3 minutes on RELEASE FreeBSD bots, but on CURRENT,
probably due to debugging it has in kernel, it does not complete
within 10 minutes, ending up killed.  As I see all the redacting
here happens within the first ~128MB of the file, so I hope it
won't matter if there is 1GB of data instead of 2GB.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by:Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #11141
2024-12-29 11:53:45 -08:00
Alexander Motin 0f6d955a35 ZTS: Remove procfs use from zpool_import_status
procfs might be not mounted on FreeBSD.  Plus checking for specific
PID might be not exactly reliable.  Check for empty list of jobs
instead.

Premature loop exit can result in failed test and failed cleanup,
failing also some following tests.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by:Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #11141
2024-12-29 11:53:45 -08:00
Alexander Motin c3d2412b05 ZTS: Remove non-standard awk hex numbers usage
FreeBSD recently removed non-standard hex numbers support from awk.
Neither it supports -n argument, enabling it in gawk.  Instead of
depending on those rewrite list_file_blocks() fuction to handle the
hex math in shell instead of awk.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by:Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #11141
2024-12-29 11:53:45 -08:00
Rob Norris 74064cb175 zpool_get_vdev_prop_value: show missing vdev userprops
If a vdev userprop is not found, present it as value '-', default
source, so it matches the output from pool userprops.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16887
2024-12-29 11:53:45 -08:00
Alexander Motin 30b97ce218 ZTS: Increase write sizes for RAIDZ/dRAID tests
Many RAIDZ/dRAID tests filled files doing millions of 100 or even
10 byte writes.  It makes very little sense since we are not
micro-benchmarking syscalls or VFS layer here, while before the
blocks reach the vdev layer absolute majority of the small writes
will be aggregated.  In some cases I see we spend almost as much
time creating the test files as actually running the tests.  And
sometimes the tests even time out after that.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16905
2024-12-29 11:53:45 -08:00
Rob Norris 9519e7ebcc microzap: set hard upper limit of 1M
The count of chunks in a microzap block is stored as an uint16_t
(mze_chunkid). Each chunk is 64 bytes, and the first is used to store a
header, so there are 32767 usable chunks, which is just under 2M. 1M is
the largest power-2-rounded block size under 2M, so we must set the
limit there.

If it goes higher, the loop in mzap_addent can overflow and fall into
the PANIC case.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16888
2024-12-29 11:53:45 -08:00
Alexander Motin f9b02fe7e3 Fix readonly check for vdev user properties
VDEV_PROP_USERPROP is equal do VDEV_PROP_INVAL and so is not a real
property.  That's why vdev_prop_readonly() does not work right for
it.  In particular it may declare all vdev user properties readonly
on FreeBSD.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16890
2024-12-29 11:53:45 -08:00
Umer Saleem cb8da70329 Skip iterating over snapshots for share properties
Setting sharenfs and sharesmb properties on a dataset can become costly
if there are large number of snapshots, since setting the share
properties iterates over all snapshots present for a dataset. If it is
the root dataset for which we are trying to set the share property,
snapshots for all child datasets and their children will also be
iterated.

There is no need to iterate over snapshots for share properties
because we do not allow share properties or any other property,
to be set on a snapshot itself execpt for user properties.

This commit skips iterating over snapshots for share properties,
instead iterate over all child dataset and their children for share
properties.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Closes #16877
2024-12-29 11:53:45 -08:00
Rob Norris c944c46a98 zfs_main: fix alignment on props usage output
I guess we've got some long property names since this was first set up!

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16883
2024-12-29 11:53:45 -08:00
Tino Reichardt 166a7bc602 CI: Fix FreeBSD 13.4 STABLE build
In #16869 we added FreeBSD 13.4 STABLE, but forget the special
thing, that the virtio nic within FreeBSD 13.x is buggy.

This fix adds the needed rtl8139 nic to the VM.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by:  Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #16885
2024-12-29 11:53:45 -08:00
Rob Norris e90124a7c8 zprop: fix value help for ZPOOL_PROP_CAPACITY
It's a percentage and documented as such, but we were showing it as
<size>.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16881
2024-12-29 11:53:45 -08:00
Brian Behlendorf 18b3bea861 CI: Add FreeBSD 14.2 RELEASE+STABLE builds
Update the CI to include FreeBSD 14.2 as a regularly tested platform.

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #16869
2024-12-29 11:53:45 -08:00
Brian Atkinson d67eb17e27 Use pin_user_pages API for Direct I/O requests
As of kernel v5.8, pin_user_pages* interfaced were introduced. These
interfaces use the FOLL_PIN flag. This is preferred interface now for
Direct I/O requests in the kernel. The reasoning for using this new
interface for Direct I/O requests is explained in the kernel
documenetation:
Documentation/core-api/pin_user_pages.rst

If pin_user_pages_unlocked is available, the all Direct I/O requests
will use this new API to stay uptodate with the kernel API requirements.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Brian Atkinson <batkinson@lanl.gov>
Closes #16856
2024-12-16 10:26:52 -08:00
Brian Atkinson 1862c1c0a8 Removing old code outside of 4.18 kernsls
There were checks still in place to verify we could completely use
iov_iter's on the Linux side. All interfaces are available as of kernel
4.18, so there is no reason to check whether we should use that
interface at this point. This PR completely removes the UIO_USERSPACE
type. It also removes the check for the direct_IO interface checks.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Brian Atkinson <batkinson@lanl.gov>
Closes #16856
2024-12-16 10:26:49 -08:00
Shengqi Chen b57f53036d simd_stat: fix undefined CONFIG_KERNEL_MODE_NEON error on armel
CONFIG_KERNEL_MODE_NEON depends on CONFIG_NEON. Neither is defined
on armel. Add a guard to avoid compilation errors.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Shengqi Chen <harry-chen@outlook.com>
Closes #16871
2024-12-16 10:26:45 -08:00
Brian Behlendorf 4b8bf3c48a Fix stray "no" in configure output
This is purely a cosmetic fix which removes a stray "no" from
the configure output.

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by:  Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #16867
2024-12-16 10:26:42 -08:00
Alexander Motin 696943533c Fix use-afer-free regression in RAIDZ expansion
We should not dereference rra after the last zio_nowait() is called.
It seems very unlikely, but ASAN in ztest managed to catch it.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16868
2024-12-16 10:26:39 -08:00
kotauskas 2284a61129 Remount datasets on soft-reboot
The one-shot zfs-mount.service is incorrectly deemed active by 
Systemd after a systemctl soft-reboot. As such, soft-rebooting
prevents zfs mount -a from being ran automatically.

This commit makes it so that zfs-mount.service is marked as being 
undone by the time umount.target is reached, so that zfs.target then 
pulls it in again and gets it restarted after a soft reboot.

Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: kotauskas <v.toncharov@gmail.com>
Closes #16845
2024-12-16 10:26:35 -08:00
Rob Norris e1833a72f9 flush: only detect lack of flush support in one place
It seems there's no good reason for vdev_disk & vdev_geom to explicitly
detect no support for flush and set vdev_nowritecache.  Instead, just
signal it by setting the error to ENOTSUP, and let zio_vdev_io_assess()
take care of it in one place.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16855
2024-12-16 10:26:30 -08:00
Rob Norris 5bb034f533 flush: don't report flush error when disabling flush support
The first time a device returns ENOTSUP in repsonse to a flush request,
we set vdev_nowritecache so we don't issue flushes in the future and
instead just pretend the succeeded. However, we still return an error
for the initial flush, even though we just decided such errors are
meaningless!

So, when setting vdev_nowritecache in response to a flush error, also
reset the error code to assume success.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16855
2024-12-16 10:26:27 -08:00
Poscat 1e08e49a28 build: use correct bashcompletiondir on arch
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: poscat <poscat@poscat.moe>
Closes #16861
2024-12-16 10:26:23 -08:00
Rob Norris 6604fe9a06 backtrace: fix off-by-one on string output
sizeof("foo") includes the trailing null byte, so all the output had
nulls through it. Most terminals quietly ignore it, but it makes some
tools misdetect file types and other annoyances.

Easy fix: subtract 1.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16862
2024-12-16 10:26:20 -08:00
Brian Behlendorf 7cbe7bbbd4 Tag 2.3.0-rc4
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2024-12-12 16:20:30 -08:00
Chunwei Chen 2dcc8fe035 Fix DR_OVERRIDDEN use-after-free race in dbuf_sync_leaf
In dbuf_sync_leaf, we clone the arc_buf in dr if we share it with db
except for overridden case. However, this exception causes a race where
dbuf_new_size could free the arc_buf after the last dereference of
*datap and causes use-after-free. We fix this by cloning the buf
regardless if it's overridden.

The race:
--
P0                                     P1

                                       dbuf_hold_impl()
                                         // dbuf_hold_copy passed
                                         // because db_data_pending NULL

dbuf_sync_leaf()
  // doesn't clone *datap
  // *datap derefed to db_buf
  dbuf_write(*datap)

                                       dbuf_new_size()
                                         dmu_buf_will_dirty()
                                           dbuf_fix_old_data()
                                             // alloc new buf for P0 dr
                                             // but can't change *datap

                                         arc_alloc_buf()
                                         arc_buf_destroy()
                                           // alloc new buf for db_buf
                                           // and destroy old buf

  dbuf_write() // continue
    abd_get_from_buf(data->b_data,
    arc_buf_size(data))
      // use-after-free
--

Here's an example when it happens:

BUG: kernel NULL pointer dereference, address: 000000000000002e
RIP: 0010:arc_buf_size+0x1c/0x30 [zfs]
Call Trace:
 dbuf_write+0x3ff/0x580 [zfs]
 dbuf_sync_leaf+0x13c/0x530 [zfs]
 dbuf_sync_list+0xbf/0x120 [zfs]
 dnode_sync+0x3ea/0x7a0 [zfs]
 sync_dnodes_task+0x71/0xa0 [zfs]
 taskq_thread+0x2b8/0x4e0 [spl]
 kthread+0x112/0x130
 ret_from_fork+0x1f/0x30

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Chunwei Chen <david.chen@nutanix.com>
Co-authored-by: Chunwei Chen <david.chen@nutanix.com>
Closes #16854
2024-12-12 16:20:30 -08:00
Alexander Motin 38875918d8 BRT: Check bv_mos_entries in brt_entry_lookup()
When vdev first sees some block cloning, there is a window when
brt_maybe_exists() might already return true since something was
cloned, but bv_mos_entries is still 0 since BRT ZAP was not yet
created.  In such case we should not try to look into the ZAP
and dereference NULL bv_mos_entries_dnode.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16851
2024-12-12 16:20:30 -08:00
Rob Norris 0d51852ec7 Remove unnecessary CSTYLED escapes on top-level macro invocations
cstyle can handle these cases now, so we don't need to disable it.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16840
2024-12-06 09:05:02 -08:00
Rob Norris 73a73cba71 cstyle: ignore old non-POSIX types in macro invocations
In code generation macros, we often use names like `uint` when
constructing handler functions. These are not being used as types, so
exclude them from the admonishment to use POSIX type names.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16840
2024-12-06 09:05:02 -08:00
Rob Norris 87947f2440 cstyle: understand macro params can be empty
It's not uncommon to have empty parameters in code generator macros,
usually when multiple parameters are concatenated or stringified into a
single token or literal. So, exclude the space-before-comma check, which
will allow construction like `MACRO_CALL(foo, , baz)`.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16840
2024-12-06 09:05:02 -08:00
Rob Norris 0e87150b6c cstyle: understand basic top-level macro invocations
We quite often invoke macros outside of functions, usually to generate
functions or data. cstyle inteprets these as function headers, which at
least have opinions for indenting.

This introduces a separate state for top-level macro invocations, and
excludes it from matching functions. For the moment, most of the
existing rules will continue to apply, but this gives us a way to add or
removes rules targeting macros specifically.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16840
2024-12-06 09:05:02 -08:00
Alexander Motin 7742e29387 Optimize RAIDZ expansion
- Instead of copying one ashift-sized block per ZIO, copy as much
as we have contiguous data up to 16MB per old vdev.  To avoid data
moves use gang ABDs, so that read ZIOs can directly fill buffers
for write ZIOs.  ABDs have much smaller overhead than ZIOs in both
memory usage and processing time, plus big I/Os do not depend on
I/O aggregation and scheduling to reach decent performance on HDDs.
 - Reduce raidz_expand_max_copy_bytes to 16MB on 32bit platforms.
 - Use 32bit range tree when possible (practically always now) to
slightly reduce memory usage.
 - Use ZIO_PRIORITY_REMOVAL for early stages of expansion, same as
for main ones.
 - Fix rate overflows in `zpool status` reporting.

With these changes expanding RAIDZ1 from 4 to 5 children I am able
to reach 6-12GB/s rate on SSDs and ~500MB/s on HDDs, both are
limited by devices instead of CPU.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15680
Closes #16819
2024-12-06 09:05:02 -08:00
Alexander Motin f54052a122 Fix false assertion in dmu_tx_dirty_buf() on cloning
Same as writes block cloning can increase block size and number of
indirection levels.  That means it can dirty block 0 at level 0 or
at new top indirection level without explicitly holding them.

A block cloning test case for large offsets has been added.

Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Ameer Hamza <ahamza@ixsystems.com>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16825
2024-12-05 11:49:06 -08:00
Rob Norris d874f27776 zdb_il: use flex array member to access ZIL records
In 6f50f8e16 we added flex arrays to lr_XX_t structs to silence kernel
bounds check warnings. Userspace code was mostly not updated to use them
though.

It seems that in the right circumstances, compilers can get confused
about sizes in the same way, and throw warnings. This commits switch
those uses over to use the flex array fields also.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #16832
2024-12-05 09:33:21 -08:00
Alexander Motin 84d7d53e91 Improve speculative prefetcher for block cloning
- Issue prescient prefetches for demand indirect blocks after the
first one.  It should be quite rare for reads/writes, but much more
useful for cloning due to much bigger (up to 1022 blocks) accesses.
It covers the gap during the first couple accesses when we can not
speculate yet, but we know what is needed right now.  It reduces
dbuf_hold() sync read delays in dmu_buf_hold_array_by_dnode().
 - Increase maximum prefetch distance for indirect blocks from 64
to 128MB.  It should cover the maximum 1022 blocks of block cloning
access size in case of default 128KB recordsize used.  In case of
bigger recordsize the above prescient prefetch should also help.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16814
2024-12-05 09:33:21 -08:00
Alexander Motin d90042dedb Allow dsl_deadlist_open() return errors
In some cases like dsl_dataset_hold_obj() it is possible to handle
those errors, so failure to hold dataset should be better than
kernel panic.  Some other places where these errors are still not
handled but asserted should be less dangerous just as unreachable.

We have a user report about pool corruption leading to assertions
on these errors.  Hopefully this will make behavior a bit nicer.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16836
2024-12-05 09:33:21 -08:00
Mark Johnston 0e46085ee6 FreeBSD: Remove an incorrect assertion in zfs_getpages()
The pages in the array may become valid after this initial unbusying,
so the assertion only holds during the first iteration of the outer
loop.

Later in zfs_getpages(), the dmu_read_pages() loop handles already-valid
pages.  Just drop the assertion, it's not terribly useful.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reported-by: Peter Holm <pho@FreeBSD.org>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Sponsored-by: Klara, Inc.
Closes #16810
Closes #16834
2024-12-05 09:33:21 -08:00
Mariusz Zaborski 3b0c1131ef Add ability to scrub from last scrubbed txg
Some users might want to scrub only new data because they would like
to know if the new write wasn't corrupted.  This PR adds possibility
scrub only newly written data.

This introduces new `last_scrubbed_txg` property, indicating the
transaction group (TXG) up to which the most recent scrub operation
has checked and repaired the dataset, so users can run scrub only
from the last saved point. We use a scn_max_txg and scn_min_txg
which are already built into scrub, to accomplish that.

Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mariusz Zaborski <mariusz.zaborski@klarasystems.com>
Sponsored-By: Wasabi Technology, Inc.
Sponsored-By: Klara Inc.
Closes #16301
2024-12-05 09:33:21 -08:00
shodanshok 5988de77b0 Fix race in libzfs_run_process_impl
When replacing a disk, a child process is forked to run a script called
zfs_prepare_disk (which can be useful for disk firmware update or health
check). The parent than calls waitpid and checks the child error/status
code.

However, the _reap_children thread (created from zed_exec_process to
manage zedlets) also waits for all children with the same PGID and can
stole the signal, causing the replace operation to be aborted.

As waitpid returns -1, the parent incorrectly assume that the child
process had an error or was killed. This, in turn, leaves the newly
added disk in REMOVED or UNAVAIL status rather than completing the
replace process.

This patch changes the PGID of the child process execuing the
prepare script, shielding it from the _reap_children thread.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by:  Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Gionatan Danti <g.danti@assyoma.it>
Closes #16801
2024-12-05 09:33:21 -08:00
Alexander Motin 00debc1361 FreeBSD: Remove some illumos compat from vnode.h
Should make no difference, just some dead code cleanup.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Martin Matuska <mm@FreeBSD.org>
Signed-off-by:Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #16808
2024-12-05 09:33:21 -08:00
Alexander Motin af10714e42 FreeBSD: Return ifndef IN_BASE back to fix the build
FreeBSD's libprocstat seems to build kernel code in user space,
which does not work here due to undefined vnode_t.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Martin Matuska <mm@FreeBSD.org>
Signed-off-by:Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #16808
2024-12-05 09:33:21 -08:00
Rob Norris 747781a345 zinject(8): rename "ioctl" to "flush"
Doc bug missed in d7605ae77.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16827
2024-12-02 18:14:26 -08:00
Alexander Motin b17ea73f9d Fix regression in dmu_buf_will_fill()
Direct I/O implementation added condition to call dbuf_undirty()
only in case of block cloning.  But the condition is not right if
the block is no longer dirty in this TXG, but still in DB_NOFILL
state.  It resulted in block not reverting to DB_UNCACHED and
following NULL de-reference on attempt to access absent db_data.

While there, add assertions for db_data to make debugging easier.

Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16829
2024-12-02 18:14:26 -08:00
Alexander Motin 17cdb7a2b1 Add missing parenthesis in VERIFYF()
Without them the order of operations might get unexpected.

Reviewed-by: Adam Moss <c@yotes.com>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16826
2024-12-02 18:14:26 -08:00
Pavel Snajdr b673bcba4d Linux: fix zfs_uio_dio_check_for_zero_page
The intent here is to replace the zero page pointer in the array of
pointers to pages in the struct.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Pavel Snajdr <snajpa@snajpa.net>
Closes #16812 
Closes #16689
Closes #16642
2024-12-02 18:14:26 -08:00
Ivan Volosyuk d8886275df Linux: Fix detection of register_sysctl_sz
Adjust the m4 function to mimic sentinel we use in spl-proc.c
This fixes the detection on kernels compiled with CONFIG_RANDSTRUCT=y

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Pavel Snajdr <snajpa@snajpa.net>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ivan Volosyuk <Ivan.Volosyuk@gmail.com>
Closes: #16620
Closes: #16805
2024-12-02 18:14:26 -08:00
Rob Norris 1aee375947 zdb: show dedup table and log attributes
There's interesting info in there that is going to help with
understanding dedup behavior at any given moment.

Since this is a format change, tests that rely on that output have been
modified to match.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Closes #16755
2024-12-02 18:14:26 -08:00
Pavel Snajdr a1907b038a Assert if we're logging after final txg was set
This allowed to debug #16714, fixed in #16782.  Without assertions
added here it is difficult to figure out what logs cause the problem,
since the assertion happens in sync thread context.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Pavel Snajdr <snajpa@snajpa.net>
Co-authored-by: Alexander Motin <mav@FreeBSD.org>
Closes #16795
2024-12-02 18:14:26 -08:00
Alexander Motin d359f7f547 FreeBSD: Reduce copy_file_range() source lock to shared
Linux locks copy_file_range() source as shared.  FreeBSD was doing
it also, but then was changed to exclusive, partially because KPI
of that time was doing so, and partially seems out of caution.
Considering zfs_clone_range() uses range locks on both source and
destination, neither should require exclusive vnode locks. But one
step at a time, just sync it with Linux for now.

Reviewed-by: Alan Somers <asomers@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16789
Closes #16797
2024-12-02 18:14:26 -08:00
Alexander Motin 90603601b4 FreeBSD: Lock vnode in zfs_ioctl()
Previously vnode was not locked there, unlike Linux.  It required
locking it in vn_flush_cached_data(), which recursed on the lock
if called from zfs_clone_range(), having the vnode locked.

Reviewed-by: Alan Somers <asomers@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #16789
Closes #16796
2024-12-02 18:14:26 -08:00
Pavel Snajdr ecd0b1528e Linux: Fix zfs_prune panics
by protecting against sb->s_shrink eviction on umount with newer kernels

deactivate_locked_super calls shrinker_free and only then
sops->kill_sb cb, resulting in UAF on umount when trying
to reach for the shrinker functions in zpl_prune_sb of
in-umount dataset

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Adam Moss <c@yotes.com>
Signed-off-by: Pavel Snajdr <snajpa@snajpa.net>
Closes #16770
2024-12-02 18:14:26 -08:00
Brian Behlendorf 3ed1d608a8 Linux 6.12 compat: META
Update the META file to reflect compatibility with the 6.12 kernel.

Reviewed-by: Umer Saleem <usaleem@ixsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #16793
2024-11-21 08:24:37 -08:00
Alexander Motin c165daa0b1 BRT: Clear bv_entcount_dirty on destroy
This fixes assertion in brt_sync_table() on debug builds when last
cloned block on the vdev is freed and bv_meta_dirty is cleared,
while bv_entcount_dirty is not.  Should not matter in production.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16791
2024-11-21 08:24:37 -08:00
Alexander Motin 1a5414ba2f BRT: More optimizations after per-vdev splitting
- With both pending and current AVL-trees being per-vdev and having
effectively identical comparison functions (pending tree compared
also birth time, but I don't believe it is possible for them to be
different for the same offset within one transaction group), it
makes no sense to move entries from one to another.  Instead inline
dramatically simplified brt_entry_addref() into brt_pending_apply().
It no longer requires bv_lock, since there is nothing concurrent
to it at the time.  And it does not need to search the tree for the
previous entries, since it is the same tree, we already have the
entry and we know it is unique.
 - Put brt_vdev_lookup() and brt_vdev_addref() into different tree
traversals to avoid false positives in the first due to the second
entcount modifications.  It saves dramatic amount of time when a
file cloned first time by not looking for non-existent ZAP entries.
 - Remove avl_is_empty(bv_tree) check from brt_maybe_exists().  I
don't think it is needed, since by the time all added entries are
already accounted in bv_entcount. The extra check must be producing
too many false positives for no reason.  Also we don't need bv_lock
there, since bv_entcount pointer must be table at this point, and
we don't care about false positive races here, while false negative
should be impossible, since all brt_vdev_addref() have already
completed by this point.  This dramatically reduces lock contention
on massive deletes of cloned blocks.  The only remaining one is
between multiple parallel free threads calling brt_entry_decref().
 - Do not update ZAP if net change for a block over the TXG was 0.
In combination with above it makes file move between datasets as
cheap operation as originally intended if it fits into one TXG.
 - Do not allocate vdevs on pool creation or import if it did not
have active block cloning. This allows to save a bit in few cases.
 - While here, add proper error handling in brt_load() on pool
import instead of assertions.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16773
2024-11-21 08:24:37 -08:00
Alexander Motin 409aad3f33 BRT: Rework structures and locks to be per-vdev
While block cloning operation from the beginning was made per-vdev,
before this change most of its data were protected by two pool-
wide locks.  It created lots of lock contention in many workload.

This change makes most of block cloning data structures per-vdev,
which allows to lock them separately.  The only pool-wide lock now
it spa_brt_lock, protecting array of per-vdev pointers and in most
cases taken as reader.  Also this splits per-vdev locks into three
different ones: bv_pending_lock protects the AVL-tree of pending
operations in open context, bv_mos_entries_lock protects BRT ZAP
object from while being prefetched, and bv_lock protects the rest
of per-vdev context during TXG commit process.  There should be
no functional difference aside of some optimizations.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Pawel Jakub Dawidek <pjd@FreeBSD.org>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #16740
2024-11-21 08:24:37 -08:00
Alexander Motin 1917c26944 ZAP: Add by_dnode variants to lookup/prefetch_uint64
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Pawel Jakub Dawidek <pjd@FreeBSD.org>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #16740
2024-11-21 08:24:37 -08:00
Alexander Motin 2b64d41be8 BRT: Don't call brt_pending_remove() on holes/embedded
We are doing exactly the same checks around all brt_pending_add().

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Pawel Jakub Dawidek <pjd@FreeBSD.org>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #16740
2024-11-21 08:24:37 -08:00
Alexander Motin 9753feaa63 ZTS: Avoid embedded blocks in bclone/bclone_prop_sync
If we write less than 113 bytes with enabled compression we get
embeded block, which then fails check for number of cloned blocks
in bclone_test.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Pawel Jakub Dawidek <pjd@FreeBSD.org>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #16740
2024-11-21 08:24:37 -08:00
tleydxdy 3c0b8da206 fix: block incompatible kernel from being installed
The current "Requires" lines only ensure the old kernel is
available on the system but it does not prevent fedora from
updating to an incompatible and breaking user's system.

Set Conflicts to block incompatible kernels from being installed.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: tleydxdy <shironeko.github@tesaguri.club>
Closes #16139
2024-11-21 08:24:37 -08:00
Mark Johnston d7abeef621 zio: Avoid sleeping in the I/O path
zio_delay_interrupt(), apparently used for fault injection, is executed
in the I/O pipeline.  It can cause the calling thread to go to sleep,
which is not allowed on FreeBSD.  This happens only for small delays,
though, and there's no apparent reason to avoid deferring to a taskqueue
in that case, as it already does otherwise.

Simply go to sleep unconditionally.  This fixes an occasional panic I
see when running the ZTS on FreeBSD.  Also remove an unhelpful comment
referencing the non-existent timeout_generic().

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by:  Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes #16785
2024-11-21 08:24:37 -08:00
Brian Behlendorf 7fb7eb9a63 ZTS: Fix zpool_status_008_pos false positive
Increase the injected delay to 1000ms and the ZIO_SLOW_IO_MS threshold
to 750ms to avoid false positives due to unrelated slow IOs which may
occur in the CI environment.  Additionally, clear the fault injection as
soon as it is no longer required for the test case.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #16769
2024-11-21 08:24:37 -08:00
Alexander Motin 3f9af023f6 L2ARC: Stop rebuild before setting spa_final_txg
Without doing that there is a race window on export when history
log write by completed rebuild dirties transaction beyond final,
triggering assertion.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Amanakis <gamanakis@gmail.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #16714
Closes #16782
2024-11-21 08:24:37 -08:00
Alexander Motin f7675ae30f Remove hash_elements_max accounting from DBUF and ARC
Those values require global atomics to get current hash_elements
values in few of the hottest code paths, while in all the years I
never cared about it.  If somebody wants, it should be easy to
get it by periodic sampling, since neither ARC header nor DBUF
counts change so fast that it would be difficult to catch.

For now I've left hash_elements_max kstat for ARC, since it was
used/reported by arc_summary and it would break older versions,
but now it just reports the current value.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16759
2024-11-21 08:24:37 -08:00
Rob Norris 920603990a Move "no name changes" from compression to checksum table
Compression names actually aren't used in dedup table names, but
checksum names are.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16776
2024-11-21 08:24:37 -08:00
Steve Mokris 9a4b2f08d3 Expand zpool-remove.8 manpage with example results
Also fix comment cross-referencing to zpool.8.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by:  Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Steve Mokris <smokris@softpixel.com>
Closes #16777
2024-11-21 08:24:37 -08:00
Alexander Motin 8023d9d4b5 Fix few __VA_ARGS typos in assertions
It should be __VA_ARGS__, not __VA_ARGS.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #16780
2024-11-21 08:24:37 -08:00
Ameer Hamza 23f063d2e6 zed: prevent automatic replacement of offline vdevs
When an OFFLINE device is physically removed, a spare is automatically
activated. However, this behavior differs in FreeBSD, where we do not
transition from OFFLINE state to REMOVED.
Our support team has encountered cases where customers experienced
unexpected behavior during drive replacements, with multiple spares
activating for the same VDEV due to a single disk replacement. This
patch ensures that a drive in an OFFLINE state remains in that state,
preventing it from transitioning to REMOVED and being automatically
replaced by a spare.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #16751
2024-11-21 08:24:37 -08:00
Rob Norris 18474efeec AUTHORS: refresh with recent new contributors
Welcome to the party 🎉

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16762
2024-11-15 15:08:59 -08:00
José Luis Salvador Rufo 260065099e tests: fix uClibc for getversion.c
This patch fixes compilation with uClibc by applying the same fallback
as commit e12d76176d to the `getversion.c`
file, which was previously overlooked.
 
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: José Luis Salvador Rufo <salvador.joseluis@gmail.com>
Closes #16735
Closes #16741
2024-11-14 16:56:34 -08:00
Ameer Hamza 4c9f2cec46 zvol_os.c: Increase optimal IO size
Since zvol read and write can process up to (DMU_MAX_ACCESS / 2) bytes
in a single operation, the current optimal I/O size is too low. SCST
directly reports this value as the optimal transfer length for the
target SCSI device. Increasing it from the previous volblocksize results
in performance improvement for large block parallel I/O workloads.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #16750
2024-11-14 16:52:10 -08:00
Mark Johnston ee3677d321 Fix some nits in zfs_getpages()
- If we don't want dmu_read_pages() to perform extra readahead/behind,
  pass a pointer to 0 instead of a null pointer, as dum_read_pages()
  expects rahead and rbehind to be non-null.
- Avoid unneeded iterations in a loop.

Sponsored-by: Klara, Inc.
Reported-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes #16758
2024-11-14 16:52:06 -08:00
Rob Norris 0274a9a57d dsl_dataset: put IO-inducing frees on the pool deadlist
dsl_free() calls zio_free() to free the block. For most blocks, this
simply calls metaslab_free() without doing any IO or putting anything on
the IO pipeline.

Some blocks however require additional IO to free. This at least
includes gang, dedup and cloned blocks. For those, zio_free() will issue
a ZIO_TYPE_FREE IO and return.

If a huge number of blocks are being freed all at once, it's possible
for dsl_dataset_block_kill() to be called millions of time on a single
transaction (eg a 2T object of 128K blocks is 16M blocks). If those are
all IO-inducing frees, that then becomes 16M FREE IOs placed on the
pipeline. At time of writing, a zio_t is 1280 bytes, so for just one 2T
object that requires a 20G allocation of resident memory from the
zio_cache. If that can't be satisfied by the kernel, an out-of-memory
condition is raised.

This would be better handled by improving the cases that the
dmu_tx_assign() throttle will handle, or by reducing the overheads
required by the IO pipeline, or with a better central facility for
freeing blocks.

For now, we simply check for the cases that would cause zio_free() to
create a FREE IO, and instead put the block on the pool's freelist. This
is the same place that blocks from destroyed datasets go, and the async
destroy machinery will automatically see them and trickle them out as
normal.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #6783
Closes #16708
Closes #16722 
Closes #16697
2024-11-14 16:52:02 -08:00
Alexander Motin 025f8b2e74 L2ARC: Move different stats updates earlier
..., before we make the header or the log block visible to others.
It should fix assertion on allocated space going negative if the
header is freed once the lock is dropped, while the write is still
going.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16040
Closes #16743
2024-11-14 16:51:58 -08:00
Mark Johnston 37e8f3ae17 Grab the rangelock unconditionally in zfs_getpages()
As a deadlock avoidance measure, zfs_getpages() would only try to
acquire a rangelock, falling back to a single-page read if this was not
possible.  However, this is incompatible with direct I/O.

Instead, release the busy lock before trying to acquire the rangelock in
blocking mode.  This means that it's possible for the page to be
replaced, so we have to re-lookup.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes #16643
2024-11-14 16:51:20 -08:00
Mark Johnston 7313c6e382 Fix a potential page leak in mappedread_sf()
mappedread_sf() may allocate pages; if it fails to populate a page
can't free it, it needs to ensure that it's placed into a page queue,
otherwise it can't be reclaimed until the vnode is destroyed.

I think this is quite unlikely to happen in practice, it was noticed by
code inspection.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes #16643
2024-11-14 16:51:17 -08:00
Umer Saleem 1c6b0302ef Fix user properties output for zpool list
In zpool_get_user_prop, when called from zpool_expand_proplist and
collect_pool, we often have zpool_props present in zpool_handle_t equal
to NULL. This mostly happens when only one user property is requested
using zpool list -o <user_property>. Checking for this case and
correctly initializing the zpool_props field in zpool_handle_t fixes
this issue.

Interestingly, this issue does not occur if we query any other property
like name or guid along with a user property with -o flag because while
accessing properties like guid, zpool_prop_get_int is called which
checks for this case specifically and calls zpool_get_all_props.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Closes #16734
2024-11-14 16:51:12 -08:00
Umer Saleem 7e3af4658b JSON: fix user properties output for zpool list
This commit fixes JSON output for zpool list when user properties are
requested with -o flag. This case needed to be handled specifically
since zpool_prop_to_name does not return property name for user
properties, instead it is stored in pl->pl_user_prop.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Closes #16734
2024-11-14 16:51:07 -08:00
Brian Behlendorf 1a54b13aaf Tag 2.3.0-rc3
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2024-11-07 11:33:51 -08:00
Umer Saleem 9061a4da0b JSON: fix user properties output for zfs list
This commit fixes JSON output for zfs list when user properties are
requested with -o flag. This case needed to be handled specifically
since zfs_prop_to_name does not return property name for user
properties, instead it is stored in pl->pl_user_prop.

Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Closes #16732
2024-11-07 11:33:51 -08:00
Sam James e12d76176d Use <fcntl.h> instead of <sys/fcntl.h>
When building on musl, we get:

```
In file included from tests/zfs-tests/cmd/getversion.c:22:
/usr/include/sys/fcntl.h:1:2: error: #warning redirecting incorrect
 #include <sys/fcntl.h> to <fcntl.h> [-Werror=cpp]
 1 | #warning redirecting incorrect #include <sys/fcntl.h> to <fcntl.h>

In file included from module/os/linux/zfs/vdev_file.c:36:
/usr/include/sys/fcntl.h:1:2: error: #warning redirecting incorrect
 #include <sys/fcntl.h> to <fcntl.h> [-Werror=cpp]
 1 | #warning redirecting incorrect #include <sys/fcntl.h> to <fcntl.h>
```

Bug: https://bugs.gentoo.org/925235
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Sam James <sam@gentoo.org>
Closes #15925
2024-11-07 11:33:51 -08:00
Brian Atkinson 8131793d6f Update ABD stats for linear page Linux
a10e552 updated abd_free_linear_page() to no longer call
abd_update_scatter_stat(). This meant that linear pages that were not
attached to Direct I/O requests were not doing waste accounting for the
ARC. This led to performance issues due to incorrect ARC accounting that
resulted in 100% of CPU time being spent in arc_evict() during prolonged
I/O workloads with the ARC.

The call to abd_update_scatter_stats() is now conditionally called in
abd_free_linear_page() when the ABD is not from a Direct I/O request.

Reviewed-by: Mark Maybee <mmaybee@delphix.com>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Brian Atkinson <batkinson@lanl.gov>
Closes #16729
2024-11-07 11:33:51 -08:00
Chunwei Chen c82eb27b22 ZFS send should use spill block prefetched from send_reader_thread
Currently, even though send_reader_thread prefetches spill block,
do_dump() will not use it and issues its own blocking arc_read. This
causes significant performance degradation when sending datasets with
lots of spill blocks.

For unmodified spill blocks, we also create send_range struct for them
in send_reader_thread and issue prefetches for them. We piggyback them
on the dnode send_range instead of enqueueing them so we don't break
send_range_after check.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Chunwei Chen <david.chen@nutanix.com>
Co-authored-by: david.chen <david.chen@nutanix.com>
Closes #16701
2024-11-06 11:54:32 -08:00
tstabrawa 661bb434e6 Use simple folio migration function
Avoids using fallback_migrate_folio, which starts unnecessary writeback
(leading to BUG in migrate_folio_extra).

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: tstabrawa <59430211+tstabrawa@users.noreply.github.com>
Closes #16568
Closes #16723
2024-11-06 11:54:32 -08:00
tstabrawa ae48c2f6a9 Revert "Avoid BUG in migrate_folio_extra"
This reverts commit b052035990.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: tstabrawa <59430211+tstabrawa@users.noreply.github.com>
Closes #16568
Closes #16723
2024-11-06 11:54:32 -08:00
Uglymotha b96845b632 Verify parent_dev before calling udev_device_get_sysattr_value
Not all udev devices have parent devices.
Calling udev_device_get_ functions yield an assertion error
if called with a NULL pointer.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Sietse <sietse@wizdom.nu>
Co-authored-by: Sietse <sietse@wizdom.nu>
Closes #16705 
Closes #16717
2024-11-04 16:46:39 -08:00
Alexander Motin 55cbd1f9bd Reduce dirty records memory usage
Small block workloads may use a very large number of dirty records.
During simple block cloning test due to BRT still using 4KB blocks
I can easily see up to 2.5M of those used.  Before this change
dbuf_dirty_record_t structures representing them were allocated via
kmem_zalloc(), that rounded their size up to 512 bytes.

Introduction of specialized kmem cache allows to reduce the size
from 512 to 408 bytes.  Additionally, since override and raw params
in dirty records are mutually exclusive, puting them into a union
allows to reduce structure size down to 368 bytes, increasing the
saving to 28%, that can be a 0.5GB or more of RAM.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16694
2024-11-04 16:46:39 -08:00
Rob Norris 880b73956b zfs(4): remove "experimental" from zfs_bclone_enabled
I think we've done enough experiments.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16189 
Closes #16712
2024-11-04 16:46:39 -08:00
Tony Hutter d367ef2995 ZTS: Add Fedora 41, remove Fedora 39
Fedora 41 was released 10/29/24, and Fedora 39 will be EOL on 11/12/24.
Update Fedora runners in the test suite.  Some minor tweaks also needed
to support ksh 1.0.10.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #16700
2024-11-01 10:03:05 -07:00
Rob Norris 7546fbd6e9 zdb: add extra -T flag to show histograms of BRT refcounts
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16692
2024-11-01 09:49:54 -07:00
Rich Ercolani 903d3f9187 Added output to zpool online and offline
I was surprised to discover today that `zpool online` and
`zpool offline` don't print any information about why they failed in
many cases, they just return 1 with no information about why.

Let's improve that where we can without changing the library function.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #16244
2024-11-01 09:49:49 -07:00
Rob Norris 86b5853cfb vdev_disk: move abd return and free off the interrupt handler
Freeing an ABD can take sleeping locks to update various stats. We
aren't allowed to sleep on an interrupt handler. So, move the free off
to the io_done callback.

We should never have been freeing things in the interrupt handler, but
we got away with it because we were usually freeing a linear ABD, which
at most is returning two objects to a cache and never sleeping. Scatter
ABDs can be used now, and those have more complex locking.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16687
2024-11-01 09:49:23 -07:00
Rob Norris 19a8dd48e1 vdev_disk: try harder to ensure IO alignment rules
It seems out our notion of "properly" aligned IO was incomplete. In
particular, dm-crypt does its own splitting, and assumes that a logical
block will never cross an order-0 page boundary (ie, the physical page
size, not compound size). This effectively means that it needs to be
possible to split a BIO at any page or block size boundary and have it
work correctly.

This updates the alignment check function to enforce these rules (to the
extent possible).

Our response to misaligned data is to make some new allocation that is
properly aligned, and copy the data into it. It turns out that
linearising (via abd_borrow_buf()) is not enough, because we allocate eg
4K blocks from a general purpose slab, and so may receive (or already
have) a 4K block that crosses pages.

So instead, we allocate a new ABD, which is guaranteed to be aligned
properly to block sizes, and then copy everything into it, and back out
on the way back.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16687 #16631 #15646 #15533 #14533
2024-11-01 09:49:14 -07:00
Serapheim Dimitropoulos 8ac70aade7 Add warning for external consumers of dmu_tx_callback_register
While reading some code @grwilson came across the above function that
seemingly had no consumers besides a ztest callback that ensures that
the tx_callback infrastructure works correctly. It turns out that Lustre
is the main (and potentially the only) consumer of this. Refer to
`osd_trans_commit_cb` of `lustre/osd-zfs/osd_handler.c` in the Lustre
repo for more info. Let's add a comment highlighting this before someone
removes it by mistake.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Serapheim Dimitropoulos <serapheimd@gmail.com>
Closes #16698
2024-11-01 09:49:05 -07:00
Alexander Motin bbc0d34bfd On the first vdev open ignore impossible ashift hints
If on the first open device's logical ashift is bigger than set
by pool's ashift property, ignore the last as unusable instead of
creating vdev that will fail most of I/Os due to misalignment.

Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Signed-off-by:  Alexander Motin <mav@FreeBSD.org>
Sponsored by:   iXsystems, Inc.
Closes #16690
2024-11-01 09:49:00 -07:00
Dimitry Andric f3823a9ab2 Fix gcc uninitialized warning in FreeBSD zio_crypt.c
In FreeBSD's `zio_do_crypt_data()`, ensure that two `struct uio`
variables are cleared before copying data out of them. This avoids
accessing garbage data, and fixes gcc `-Wuninitialized` warnings.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Toomas Soome <tsoome@me.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Dimitry Andric <dimitry@andric.com>
Closes #16688
2024-11-01 09:48:55 -07:00
Dimitry Andric fd2cae969f Fix gcc unused value warning in FreeBSD simd.h
The macros `simd_stat_init()` and `simd_stat_fini()` in FreeBSD's
`simd.h` are defined as zero, but they are actually only used as
statements. Replace the definitions with `do {} while (0)` instead, to
avoid gcc `-Wunused-value` warnings.

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Toomas Soome <tsoome@me.com>
Signed-off-by: Dimitry Andric <dimitry@andric.com>
Closes #16693
2024-11-01 09:48:51 -07:00
Tony Hutter f7b4bca66a ZTS: Add LUKS sanity test
Add a LUKS sanity test to trigger: #16631

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #16681
2024-11-01 09:48:47 -07:00
Alexander Motin 7e3ce4efaa Pack dmu_buf_impl_t by 16 bytes
On 64bit FreeBSD this reduces one from 296 to 280 bytes.  On small
block workloads dbufs may consume gigabytes of ARC, and this saves
5% of it.

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16684
2024-11-01 09:48:42 -07:00
Tino Reichardt 77d81974b6 Fix dependency install on Debian 11 (#16683)
Adding cryptsetup breaks some dialog things on Debian 11.
Apply some workaround for it.

Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2024-11-01 09:48:37 -07:00
Brian Behlendorf 5237760b17 ZTS: Add additional exceptions
The following tests have been observed to occasionally fail when
running under the CI.  Updated our exceptions list to track them.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #16670
2024-10-21 13:02:07 -07:00
Rob Norris ede715d1e4 spl/thread: explicitly define thread_func_t as noreturn
All of our thread entry functions have this signature:

    void (*)(void*) __attribute__((noreturn))

The low-level `__thread_create()` function accepts a `thread_func_t` as
the entry point, which is defined more simply as:

    void (*)(void *)

And then the `thread_create()` and `thread_create_named()` macros cast
the passed-in function point down to `thread_func_t`, that is, casting
away the `noreturn` attribute.

Clang considers casting between these two types to be invalid because
both the caller and the callee may have elided parts of the stack frame
save and restore, knowing that they won't be needed.

Recent Linux appears to be setting `-Wcast-function-type-strict`, which
causes this invalid cast to emit a warning, which with `-Werror` is
converted to an error, breaking the build.

This commit fixes this in the simplest possible way: adding `noreturn`
to the `thread_func_t` attribute. Since all our thread entry functions
already have this attribute, it's arguably a just a consistency fix
anyway.

I considered removing the casts in the macros, which silences the
warnings, but it turns out that Clang has a bug that won't emit this
error for implicit conversions, only explicit casts. So leaving them
there seems like a reasonable belt-and-suspenders approach. Also,
frankly, this whole mechanism seems a little undercooked inside LLVM, so
I'm content go with my intuition about the smallest, least invaisve
change.

**NOTE**: `__thread_create` is exported by `spl.ko` and has a
`thread_func_t` arg, so this is an ABI break. Whether that matters in
practice, I have no idea.

Further reading:
- https://github.com/llvm/llvm-project/commit/1aad641c793090b4d036c03e737df2ebe2c32c57
- https://github.com/llvm/llvm-project/issues/7325
- https://github.com/llvm/llvm-project/issues/41465

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16672 
Closes #16673
2024-10-21 13:02:07 -07:00
Rob Norris e30c69365d config: fix dequeue_signal check for kernels <4.20
Before 4.20, kernel_siginfo_t was just called siginfo_t. This was
causing the kthread_dequeue_signal_3arg_task check, which uses
kernel_siginfo_t, to fail on older kernels.

In d6b8c17f1, we started checking for the "new" three-arg
dequeue_signal() by testing for the "old" version. Because that test is
explicitly using kernel_siginfo_t, it would fail, leading to the build
trying to use the new three-arg version, which would then not compile.

This commit fixes that by avoiding checking for the old 3-arg
dequeue_signal entirely. Instead, we check for the new one, as well as
the 4-arg form, and we use the old form as a fallback. This way, we
never have to test for it explicitly, and once we're building
HAVE_SIGINFO will make sure we get the right kernel_siginfo_t for it, so
everything works out nice.

Original-patch-by: Finix <yancw@info2soft.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16666
2024-10-21 13:02:07 -07:00
Rob Norris 78d39d91fa zdb: show bp in uberblock dump
Just another useful nugget of info in times of strife.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16667
2024-10-21 13:02:07 -07:00
Tomohiro Kusumi 2b359c7824 Fix compile-time warnings caused by duplicate struct typedefs
Some compiler/versions warn these typedefs according to #16660.

The platform specific header sys/abd_os.h shouldn't define or use abd_t,
as it's defined in its non-platform specific consumer sys/abd.h.
Do the same as what FreeBSD header does.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@gmail.com>
Closes #16660 
Closes #16665
2024-10-21 13:02:07 -07:00
Alexander Motin ace2e17a9b zfs_debug: Restore log size limit for userspace
For some reason it was dropped when split from kernel, that makes
raidz_test to accumulate in RAM up to 100GB of logs we don't need.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by:  Rob Norris <robn@despairlabs.com>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16492
Closes #16566
Closes #16664
2024-10-21 13:02:07 -07:00
Rob Norris b4cd10ce5b libspl/backtrace: comment and harden libunwind backtracer
This is the sort of code that we get right once and never look at again.
Anyone reading this code is already likely in the middle of a debugging
nightmare, and then they have a wall of manual string construction and
an unfamiliar and idiosyncratic library to deal with. So, comment the
whole thing to try to make it clear what's going on.

In pursuit of the above, I've added return checks to some of the
libunwind calls, fixed the frame loop to not skip the "top" frame
(however unseful it may be), and fix a couple of calls to
spl_bt_u64_to_hex_str() which requested 18 digits instead of 16.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16653
2024-10-21 13:02:07 -07:00
Rob Norris f52d7aaaac libspl/backtrace: rename and document hex conversion function
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16653
2024-10-21 13:02:07 -07:00
Rob Norris d5db840260 libspl/backtrace: helper macros for output
My eyes are going blurry looking at all those write calls. This is much
nicer.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Close #16653
2024-10-21 13:02:07 -07:00
Rob Norris bcd61d9579 libspl/backtrace: dump registers in libunwind backtraces
More useful stuff, especially when trying to follow a disassembly.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16653
2024-10-21 13:02:07 -07:00
Umer Saleem 36a67b50a2 Fix inconsistent mount options for ZFS root
While mounting ZFS root during boot on Linux distributions from initrd,
mount from busybox is effectively used which executes mount system call
directly. This skips the ZFS helper mount.zfs, which checks and enables
the mount options as specified in dataset properties. As a result,
datasets mounted during boot from initrd do not have correct mount
options as specified in ZFS dataset properties.

There has been an attempt to use mount.zfs in zfs initrd script,
responsible for mounting the ZFS root filesystem (PR#13305). This was
later reverted (PR#14908) after discovering that using mount.zfs breaks
mounting of snapshots on root (/) and other child datasets of root have
the same issue (Issue#9461).

This happens because switching from busybox mount to mount.zfs correctly
parses the mount options but also adds 'mntpoint=/root' to the mount
options, which is then prepended to the snapshot mountpoint in
'.zfs/snapshot'. '/root' is the directory on Debian with initramfs-tools
where root filesystem is mounted before pivot_root. When Linux runtime
is reached, trying to access the snapshots on root results in
automounting the snapshot on '/root/.zfs/*', which fails.

This commit attempts to fix the automounting of snapshots on root, while
using mount.zfs in initrd script. Since the mountpoint of dataset is
stored in vfs_mntpoint field, we can check if current mountpoint of
dataset and vfs_mntpoint are same or not. If they are not same, reset
the vfs_mntpoint field with current mountpoint. This fixes the
mountpoints of root dataset and children in respective vfs_mntpoint
fields when we try to access the snapshots of root dataset or its
children. With correct mountpoint for root dataset and children stored
in vfs_mntpoint, all snapshots of root dataset are mounted correctly
and become accessible.

This fix will come into play only if current process, that is trying to
access the snapshots is not in chroot context. The Linux kernel API
that is used to convert struct path into char format (d_path), returns
the complete path for given struct path. It works in chroot environment
as well and returns the correct path from original filesystem root.

However d_path fails to return the complete path if any directory from
original root filesystem is mounted using --bind flag or --rbind flag
in chroot environment. In this case, if we try to access the snapshot
from outside the chroot environment, d_path returns the path correctly,
i.e. it returns the correct path to the directory that is mounted with
--bind flag. However inside the chroot environment, it only returns the
path inside chroot.

For now, there is not a better way in my understanding that gives the
complete path in char format and handles the case where directories from
root filesystem are mounted with --bind or --rbind on another path which
user will later chroot into. So this fix gets enabled if current
process trying to access the snapshot is not in chroot context.

With the snapshots issue fixed for root filesystem, using mount.zfs in
ZFS initrd script, mounts the datasets with correct mount options.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Closes #16646
2024-10-21 13:02:07 -07:00
Warner Losh 3d9129a7b6 freebsd: Use compiler.h from FreeBSD's base's linuxkpi
The FreeBSD linux/compiler.h in OpenZFS was copied from a very old
version of FreeBSD's linuxkpi's linux/compiler.h. There's no need for
this duplication. Use FreeBSD's linuxkpi version instead, and provide
zfs_fallthrough to augment it (it's all that's needed). Use #pragma once
to avoid naming issues for guard variables. Since this is a complete
rewrite, use my copyright here (the original code in FreeBSD still
credits everybody). This works back at least to FreeBSD 12.4, which
is not out of support, and all newer releases.

Remove extra copies of macros that were defined elsewhere, but are now
properly defined in LinuxKPI so are redundant.

Sponsored-by: Netflix
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Warner Losh <imp@bsdimp.com>
Closes #16650
2024-10-21 13:02:07 -07:00
Brian Behlendorf 0409c47fe0 Tag 2.3.0-rc2
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2024-10-13 19:25:58 -07:00
Tino Reichardt b5a3825244 ZTS: Make use of optimal CPU pinning
With CPU pinning, we should get some speedup because of better
cpu cache re-use.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #16641
2024-10-13 19:25:58 -07:00
Tino Reichardt 77df762a1b ZTS: Optimize Kernel Same-page Merging (KSM)
Kernel same-page Merging (KSM) allows KVM guests to share identical
memory pages. These shared pages are usually common libraries or other
identical, high-use data.

The current configuration was a bit to lazy - so KSM didn't work very
well. With the new configuration I could run 3 Linux VMs in parralel.

FreeBSD can't benefit from it. But FreeBSD is not so memory hungry in
general, so there is no need for it ;)

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #16641
2024-10-13 19:25:58 -07:00
Brian Behlendorf 56871e465a Fallback to strerror() when strerror_l() isn't available
Some C libraries, such as uClibc, do not provide strerror_l() in
which case we fallback to strerror().

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #16636 
Closes #16640
2024-10-13 19:25:58 -07:00
Brian Behlendorf c645b07eaa ZTS: Increase zpool_import_parallel_pos import margin
Increase the pool import time allowed by assuming a minimum reduction
to 1/2 instead of 1/3 when comparing sequential to parallel import
times.  This is sufficient to verify parallel imports are working as
intended and should address the occasional false positive failure
when the time is slightly exceeded.

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #16638
2024-10-11 16:21:25 -07:00
Brian Behlendorf 5bc27acf51 ZTS: Slightly increase dedup_quota limit
As described in the comment above this check the space used by
logged entries is not accounted for and some margin needs to be
added in.  While uncommon we have slightly exceeded the 600,000
threshold on some CI run so we increase the limit a bit more.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #16637
2024-10-11 16:21:25 -07:00
Brian Behlendorf 7f830d783b CI: Stick with ubuntu-22.04 for CodeQL analysis
The ubuntu-latest alias now refers to ubuntu-24.04 instead of
ubuntu-22.04 which causes CodeQL's autobuild to fail with:

    cpp/autobuilder: deptrace not supported in ubuntu 24.04

Until deptrace is supported by ubuntu-24.04 hosted runners request
ubuntu-22.04 which is supported.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Closes #16639
2024-10-11 16:21:25 -07:00
Martin Matuška 58162960a1 zdb: fix printf format in dump_zap()
When compiling zdb.c on 32-bit platforms, a format conversion error 
is reported for a printf() in dump_zap().  Change %l to macro 
%" PRIu64 " to match the platform size of a 64-bit unsigned integer.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Martin Matuska <mm@FreeBSD.org>
Closes #16635
2024-10-11 16:21:25 -07:00
Rob Norris 666903610d zpool/zfs: allow --json wherever -j is allowed
Mostly so that with the JSON formatting options are also used, they all
look the same. To my eye, `-j --json-flat-vdevs` suggests that they are
different or unrelated, while `--json --json-flat-vdevs` invites no
further questions.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Umer Saleem <usaleem@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16632
2024-10-11 16:21:25 -07:00
Brian Atkinson 26ecd8b993 Always validate checksums for Direct I/O reads
This fixes an oversight in the Direct I/O PR. There is nothing that
stops a process from manipulating the contents of a buffer for a
Direct I/O read while the I/O is in flight. This can lead checksum
verify failures. However, the disk contents are still correct, and this
would lead to false reporting of checksum validation failures.

To remedy this, all Direct I/O reads that have a checksum verification
failure are treated as suspicious. In the event a checksum validation
failure occurs for a Direct I/O read, then the I/O request will be
reissued though the ARC. This allows for actual validation to happen and
removes any possibility of the buffer being manipulated after the I/O
has been issued.

Just as with Direct I/O write checksum validation failures, Direct I/O
read checksum validation failures are reported though zpool status -d in
the DIO column. Also the zevent has been updated to have both:
1. dio_verify_wr -> Checksum verification failure for writes
2. dio_verify_rd -> Checksum verification failure for reads.
This allows for determining what I/O operation was the culprit for the
checksum verification failure. All DIO errors are reported only on the
top-level VDEV.

Even though FreeBSD can write protect pages (stable pages) it still has
the same issue as Linux with Direct I/O reads.

This commit updates the following:
1. Propogates checksum failures for reads all the way up to the
   top-level VDEV.
2. Reports errors through zpool status -d as DIO.
3. Has two zevents for checksum verify errors with Direct I/O. One for
   read and one for write.
4. Updates FreeBSD ABD code to also check for ABD_FLAG_FROM_PAGES and
   handle ABD buffer contents validation the same as Linux.
5. Updated manipulate_user_buffer.c to also manipulate a buffer while a
   Direct I/O read is taking place.
6. Adds a new ZTS test case dio_read_verify that stress tests the new
   code.
7. Updated man pages.
8. Added an IMPLY statement to zio_checksum_verify() to make sure that
   Direct I/O reads are not issued as speculative.
9. Removed self healing through mirror, raidz, and dRAID VDEVs for
   Direct I/O reads.

This issue was first observed when installing a Windows 11 VM on a ZFS
dataset with the dataset property direct set to always. The zpool
devices would report checksum failures, but running a subsequent zpool
scrub would not repair any data and report no errors.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Brian Atkinson <batkinson@lanl.gov>
Closes #16598
2024-10-09 13:45:06 -07:00
Martin Matuška 774dcba86d FreeBSD: ignore some includes when not building kernel
The function abd_alloc_from_pages() is used only in kernel.
Excluding sys/vm.h, and vm/vm_page.h includes avoids dependency
problems.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Martin Matuska <mm@FreeBSD.org>
Closes #16616
2024-10-09 13:45:02 -07:00
Brian Behlendorf 09f6b2ebe3 ztest: Fix scrub check in ztest_raidz_expand_check()
The scrub code may return EBUSY under several possible scenarios
causing ztest to incorrectly ASSERT when verifying the result of
a raidz expansion.  Update the test case to allow EBUSY since it
does not indicate pool damage.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #16627
2024-10-09 13:44:58 -07:00
Matthew Heller 2609d93b65 vdev_id: multi-lun disks & slot num zero pad
Add ability to generate disk names that contain both a slot number
and a lun number in order to support multi-actuator SAS hard drives
with multiple luns. Also add the ability to zero pad slot numbers to
a desired digit length for easier sorting.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Heller <matthew.f.heller@accre.vanderbilt.edu>
Closes #16603
2024-10-09 13:44:55 -07:00
Brian Behlendorf 10f46d2aba ZTS: resilver_restart_001.ksh restore defaults
Update resilver_restart_001.ksh to restore the default
resilver_defer_percent when the test completes.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Pavel Snajdr <snajpa@snajpa.net>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #16618
2024-10-09 13:44:50 -07:00
Umer Saleem 0df10dc911 Only serialize native-deb* targets
.NOTPARALLEL target is being forced on userspace as well. This commit
removes .NOTPARALEL target and only serializes the execution of
native-deb* targets.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Closes #16622
2024-10-09 13:44:46 -07:00
Rob Norris 0fbe9d352c zpool/zfs: restore -V & --version options
The -j option added a round of getopt, which didn't know the magic
version flags. So just bypass the whole thing and go straight to the
human output function for the special case.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Umer Saleem <usaleem@ixsystems.com>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16615 
Closes #16617
2024-10-09 13:44:42 -07:00
Martin Matuška 84f44ec07f Return boolean_t in inline functions of lib/libspl/include/sys/uio.h
The inline functions zfs_dio_offset_aligned(), zfs_dio_size_aligned()
and zfs_dio_aligned() are declared as boolean_t but return the bool
type.

This fixes the build of FreeBSD.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Martin Matuska <mm@FreeBSD.org>
Closes #16613
2024-10-09 13:44:36 -07:00
Shengqi Chen fc9608e2e6 Bump SONAME of libzfs and libzpool
The ABI of libzfs and libzpool have breaking changes since last
SONAME bump in commit fe6babc:

* libzfs: `zpool_print_unsup_feat` removed (used by zpool cmd).
* libzpool: multiple `ddt_*` symbols removed (used by zdb cmd).

Bump them to avoid ABI breakage.

See: https://github.com/openzfs/zfs/pull/11817
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Shengqi Chen <harry-chen@outlook.com>
Closes #16609
2024-10-09 13:44:32 -07:00
Shengqi Chen d32c05949a contrib/debian: add new manpages to installation list
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Shengqi Chen <harry-chen@outlook.com>
Closes #16609
2024-10-09 13:44:26 -07:00
JKDingwall 1ebb6b866f Fix generation of kernel uevents for snapshot rename on linux
`zvol_rename_minors()` needs to be given the full path not just the
snapshot name.  Use code removed in a0bd735ad as a guide
to providing the necessary values.

Add ZTS check for /dev changes after snapshot rename.  After
renaming a snapshot with 'snapdev=visible' ensure that the /dev
entries are updated to reflect the rename.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: James Dingwall <james@dingwall.me.uk>
Closes #14223 
Closes #16600
2024-10-09 13:44:22 -07:00
Tino Reichardt f019b445f3 ZTS: Fix summary page creation again - second try
In PR #16599 I used 'return' like in C - which is wrong :/
This fix generates the summary as needed.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #16611
2024-10-09 13:44:18 -07:00
Tino Reichardt 03822a61be ZTS: Remove FreeBSD 13.4-STABLE
Current CI is failing on FreeBSD 13.4-STABLE, because samba4 can't be
installed there. Lets remove it for now.

Update also the FreeBSD version definitions a bit.

The naming is like this now:

FreeBSD variants:
- freebsd13-3r, freebsd13-4r, freebsd14-0r, freebsd14-1r (RELEASE)
- freebsd13-4s, freebsd14-1s (STABLE)
- freebsd15-0c (CURRENT)

RHL based distros:
- almalinux8, almalinux9, centos-stream9, fedora39, fedora40

Debian based:
- debian11, debian12, ubuntu20, ubuntu22, ubuntu24

Misc Linux distros:
- archlinux, tumbleweed

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #16610
2024-10-09 13:43:46 -07:00
Brian Behlendorf 3a9fca901b Tag 2.3.0-rc1
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2024-10-04 11:18:19 -07:00
Umer Saleem 45addf7605 Update path for zed in zfs-zed.service for native debian packages
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Closes#15638
2024-10-04 11:18:15 -07:00
Umer Saleem cc9e36a42e Disable parallel build for native-deb* targets
Running native-deb* targets in parallel via make is not supported.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Closes#14736
2024-10-04 11:18:12 -07:00
Umer Saleem c204c3f340 Fix missing packaging files from release tarballs
Properly distribute files for native Debian packages. This fixes the
issue with broken release tarballs.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Closes#15404
Closes#15586
2024-10-04 11:18:08 -07:00
Alexander Motin 42ce4b11e7 ZAP: Align za_name in zap_attribute_t to 8 bytes
Our code reading/writing there may not handle misaligned accesses
on a platforms that may care about it.  I don't see a point to
complicate it to satisfy UBSan in CI. This alignment costs nothing.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15921
Closes #16606
2024-10-04 11:06:26 -07:00
Alexander Motin 4ebe674d91 ARC: Cache arc_c value during arc_evict()
Since arc_evict() run can take some time, arc_c change during it
may result in undesired shift in ARC states balance. Primarily in
case of arc_c reduction it may cause eviction from MFU data state
despite its being below the target already.  Instead we should
evict as originally planned and if needed do another round after.

Reviewed-by: Theera K. <tkittich@hotmail.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16576
Closes #16605
2024-10-04 10:56:43 -07:00
Pavel Snajdr 0d77e738e6 Defer resilver only when progress is above a threshold
Restart a resilver from scratch, if the current one in progress is
below a new tunable, zfs_resilver_defer_percent (defaulting to 10%).

The original rationale for deferring additional resilvers, when there is
already one in progress, was to help achieving data redundancy sooner
for the data that gets scanned at the end of the resilver.

But in case the admin wants to attach multiple disks to a single vdev,
it wasn't immediately obvious the admin is supposed to run
`zpool resilver` afterwards to reset the deferred resilvers and start
a new one from scratch.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Pavel Snajdr <snajpa@snajpa.net>
Closes #15810
2024-10-04 10:41:17 -07:00
Tino Reichardt 3d0175d10e ZTS: Fix summary page creation
There are cases, where some needed files for the summary page aren't
created. Currently the whole Summary Page creation will fail then.
Sample run: https://github.com/openzfs/zfs/actions/runs/11148248072/job/30999748588

Fix this, by properly checking for existence of the needed files.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #16599
2024-10-03 11:42:25 -07:00
Brian Behlendorf 17a2b35be5 Update compatibility.d files
Add an openzfs-2.3 compatibility file for the next release.

While there are no compatibility difference between Linux and
FreeBSD for 2.3 symlinks for the -linux and -freebsd names are
created for any scripts expecting that convention.

Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #16588
2024-10-02 20:59:35 -07:00
Rob Norris 224393a321 feature: large_microzap
In a4b21eadec we added the zap_micro_max_size tuneable to raise the size
at which "micro" (single-block) ZAPs are upgraded to "fat" (multi-block)
ZAPs. Before this, a microZAP was limited to 128KiB, which was the old
largest block size. The side effect of raising the max size past 128KiB
is that it be stored in a large block, requiring the large_blocks
feature.

Unfortunately, this means that a backup stream created without the
--large-block (-L) flag to zfs send would split the microZAP block into
smaller blocks and send those, as is normal behaviour for large blocks.
This would be received correctly, but since microZAPs are limited to the
first block in the object by definition, the entries in the later blocks
would be inaccessible. For directory ZAPs, this gives the appearance of
files being lost.

This commit adds a feature flag, large_microzap, that must be enabled
for microZAPs to grow beyond 128KiB, and which will be activated the
first time that occurs. This feature is later checked when generating
the stream and if active, the send operation will abort unless
--large-block has also been requested.

Changing the limit still requires zap_micro_max_size to be changed. The
state of this flag effectively sets the upper value for this tuneable,
that is, if the feature is disabled, the tuneable will be clamped to
128KiB.

A stream flag is also added to ensure that the receiver also activates
its own feature flag upon receiving the stream. This is not strictly
necessary to _use_ the received microZAP, since it doesn't care how
large its block is, but it is required to send the microZAP object on,
otherwise the original problem occurs again.

Because it's difficult to reliably distinguish a microZAP from a fatZAP
from outside the ZAP code, and because it seems unlikely that most
users are affected (a fairly niche tuneable combined with what should be
an uncommon use of send), and for the sake of expediency, this change
activates the feature the first time a microZAP grows to use a large
block, and is never deactivated after that. This can be improved in the
future.

This commit changes nothing for existing pools that already have large
microZAPs. The feature will not be retroactively applied, but will be
activated the next time a microZAP grows past the limit.

Don't use large_blocks feature for enable/disable tests.  The
large_microzap depends on large_blocks, so it gets enabled as a
dependency, breaking the test. Instead use feature "longname", which has
the exact same feature characteristics.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16593
2024-10-02 20:47:11 -07:00
Brian Behlendorf 412105977c Temporarily disable Direct IO by default
While some remaining issues are resolved with the recently merged
Direct IO functionality disable it by default.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #16597
2024-10-02 18:24:29 -07:00
Brian Behlendorf d34d4f97a8 snapdir: add 'disabled' value to make .zfs inaccessible
In some environments, just making the .zfs control dir hidden from sight
might not be enough. In particular, the following scenarios might
warrant not allowing access at all:
- old snapshots with wrong permissions/ownership
- old snapshots with exploitable setuid/setgid binaries
- old snapshots with sensitive contents

Introducing a new 'disabled' value that not only hides the control dir,
but prevents access to its contents by returning ENOENT solves all of
the above.

The new property value takes advantage of 'iuv' semantics ("ignore
unknown value") to automatically fall back to the old default value when
a pool is accessed by an older version of ZFS that doesn't yet know
about 'disabled' semantics.

I think that technically the zfs_dirlook change is enough to prevent
access, but preventing lookups and dir entries in an already opened .zfs
handle might also be a good idea to prevent races when modifying the
property at runtime.

Add zfs_snapshot_no_setuid parameter to control whether automatically
mounted snapshots have the setuid mount option set or not.

this could be considered a partial fix for one of the scenarios
mentioned in desired.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Co-authored-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Closes #3963
Closes #16587
2024-10-02 09:12:02 -07:00
rilysh 86737c5927 Avoid computing strlen() inside loops
Compiling with -O0 (no proper optimizations), strlen() call
in loops for comparing the size, isn't being called/initialized
before the actual loop gets started, which causes n-numbers of
strlen() calls (as long as the string is). Keeping the length
before entering in the loop is a good idea.

On some places, even with -O2, both GCC and Clang can't
recognize this pattern, which seem to happen in an array
of char pointer.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: rilysh <nightquick@proton.me>
Closes #16584
2024-10-02 09:10:06 -07:00
Brian Behlendorf e8cbb5952d Update all ABI files
Refresh all ABI files using the CI generated files as of
commit 0cf14bf4b5.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #16592
2024-10-01 17:10:23 -07:00
Rob Norris 0cf14bf4b5 Linux 6.12: PG_error flag was removed
torvalds/linux@09022bc196 removes the flag, and the corresponding
SetPageError() and ClearPageError() macros, with no replacement offered.

Going back through the upstream history, use of this flag has been
gradually removed over the last year as part of the long tail of
converting everything to folios. Interesting tidbit comments from
torvalds/linux@29e9412b25 and torvalds/linux@420e05d0de suggest that
this flag has not been used meaningfully since page writeback failures
started being recorded in errseq_t instead (the whole "fsyncgate" thing,
~2017, around torvalds/linux@8ed1e46aaf).

Given that, it's possible that since perhaps Linux 4.13 we haven't been
getting anything by setting the flag. I don't know if that's true and/or
if there's something we should be doing instead, but my gut feel is that
its probably fine we only use the page cache as a proxy to allow mmap()
to work, rather than backing IO with it.

As such, I'm expecting that removing this will do no harm, but I'm
leaving it in for older kernels to maintain status quo, and if there is
an overall better way, that is left for a future change.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16582
2024-10-01 13:54:05 -07:00
Rob Norris 1c7f2f6a50 Linux 6.12: f_version removed from struct file
linux/torvalds@11068e0b64 removes it, suggesting this was a always
there as a helper to handle concurrent seeks, which all filesystems now
handle themselves if necessary.

Without looking into the mechanism, I can imagine how it might have been
used, but we have always set it to zero and never read from it,
presumably because we've always tracked per-caller position through the
znode anyway. So I don't see how there can be any functional change for
us by removing it. I've stayed conservative though and left it in for
older kernels, since its clearly not hurting anything there.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16582
2024-10-01 13:54:00 -07:00
Rob Norris df3b9d881b Linux 6.12: FMODE_UNSIGNED_OFFSET is now FOP_UNSIGNED_OFFSET
torvalds/linux@641bb4394f asserts that this is a static flag, not
intended to be variable per-file, so it moves it to
file_operations instead. We just change our check to follow.

No configure check is necessary because FOP_UNSIGNED_OFFSET didn't exist
before this commit, and FMODE_UNSIGNED_OFFSET flag is removed in the
same commit, so there's no chance of a conflict.

It's not clear to me that we need this check at all, as we never set
this flag on our own files, and I can't see any way that our llseek
handler could recieve a file from another filesystem. But, the whole
zpl_llseek() has a number of opportunities for pleasing cleanup that are
nothing to do with this change, so I'll leave that for a future change.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16582
2024-10-01 13:53:55 -07:00
Rob Norris d6b8c17f1d Linux 6.12: support 3arg dequeue_signal() without task param
See torvalds/linux@a2b80ce87a. It claims the task arg is always
`current`, and so it is with us, so this is a safe change to make. The
only spanner is that we also support the older pre-5.17 3-arg
dequeue_signal() which had different meaning, so we have to check the
types to get the right one.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16582
2024-10-01 13:53:50 -07:00
Rob Norris bc96b80550 Linux 6.12: avoid kmem_cache_create redefinition
torvalds/linux@b2e7456b5c makes kmem_cache_create() a macro, which
gets in the way of our our own redefinition, so we undef the macro first
for our own clients. This follows what we did for kmem_cache_alloc(),
see e951dba48.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16582
2024-10-01 13:53:33 -07:00
Sanjeev Bagewadi 20232ecfaa Support for longnames for files/directories (Linux part)
This patch adds the ability for zfs to support file/dir name up to 1023
bytes. This number is chosen so we can support up to 255 4-byte
characters. This new feature is represented by the new feature flag
feature@longname.

A new dataset property "longname" is also introduced to toggle longname
support for each dataset individually. This property can be disabled,
even if it contains longname files. In such case, new file cannot be
created with longname but existing longname files can still be looked
up.

Note that, to my knowledge native Linux filesystems don't support name
longer than 255 bytes. So there might be programs not able to work with
longname.

Note that NFS server may needs to use exportfs_get_name to reconnect
dentries, and the buffer being passed is limit to NAME_MAX+1 (256). So
NFS may not work when longname is enabled.

Note, FreeBSD vfs layer imposes a limit of 255 name lengh, so even
though we add code to support it here, it won't actually work.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Chunwei Chen <david.chen@nutanix.com>
Closes #15921
2024-10-01 13:40:27 -07:00
Sanjeev Bagewadi 3cf2bfa570 Allocate zap_attribute_t from kmem instead of stack
This patch is preparatory work for long name feature. It changes all
users of zap_attribute_t to allocate it from kmem instead of stack. It
also make zap_attribute_t and zap_name_t structure variable length.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Chunwei Chen <david.chen@nutanix.com>
Closes #15921
2024-10-01 13:39:08 -07:00
Don Brady 141368a4b6 Restrict raidz faulted vdev count
Specifically, a child in a replacing vdev won't count when assessing
the dtl during a vdev_fault()

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Don Brady <don.brady@klarasystems.com>
Closes #16569
2024-10-01 09:12:11 -07:00
Rob Norris 7e957fde73 send/recv: open up additional stream feature flags
The docs for drr_versioninfo have marked the top 32 bits as "reserved"
since its introduction (illumos/illumos-gate@9e69d7d). There's no
indication of why they're reserved, so it seems uncontroversial to make
a lot more flags available.

I'm keeping the top eight reserved, and explicitly calling them out as
such, so we can extend the header further in the future if we run out of
flags or want to do some kind of change that isn't about feature flags.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #15454
2024-10-01 09:08:53 -07:00
Brian Behlendorf d6cb544669 Linux 6.11 compat: META
Update the META file to reflect compatibility with the 6.11 kernel.

Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #16586
2024-09-30 19:59:33 -07:00
Rob Norris c84a37ae93 lua: add flex array field to TString type
Linux 6.10+ with CONFIG_FORTIFY_SOURCE notices memcpy() accessing past
the end of TString, because it has no indication that there there may be
an additional allocation there.

There's no appropriate upstream change for this (ancient) version of
Lua, so this is the narrowest change I could come up with to add a flex
array field to the end of TString to satisfy the check. It's loosely
based on changes from lua/lua@ca41b43f and lua/lua@9514abc2.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16541
Closes #16583
2024-09-30 10:30:03 -07:00
George Melikov 5591505299 man: update recordsize max size info
Reflect https://github.com/openzfs/zfs/commit/f2330bd1568489ae1fb16d975a5a9bcfe12ed219
change in our man pages and add some context.

Wording is primarily copy-pasted from code comments.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by:  Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #16581
2024-09-29 13:19:52 -07:00
Tino Reichardt b1958b531b ZTS: Replace MD5 and SHA256 wit XXH128
For data integrity checks as done in ZTS, the verification for
unintended data corruption with xxhash128 should be a lot faster
and perfectly usable.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #16577
2024-09-28 09:24:05 -07:00
Brian Behlendorf 834b90fb81 ZTS: Fix zpool_import_hostid_changed_unclean_export
Update the test case to freeze the pool then export it to better
simulate a hard failure.  This is preferable to copying the vdev
while the pool's imported since with a copy we're not guaranteed
the on-disk state will be consistent.  That can in turn result
in a pool import failure and a spurious test failure.

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #16578
2024-09-28 09:18:21 -07:00
Rob Norris 6f50f8e16b zfs_log: add flex array fields to log record structs
ZIL log record structs (lr_XX_t) are frequently allocated with extra
space after the struct to carry variable-sized "payload" items.

Linux 6.10+ compiled with CONFIG_FORTIFY_SOURCE has been doing runtime
bounds checking on memcpy() calls. Because these types had no indicator
that they might use more space than their simple definition,
__fortify_memcpy_chk will frequently complain about overruns eg:

    memcpy: detected field-spanning write (size 7) of single field
        "lr + 1" at zfs_log.c:425 (size 0)
    memcpy: detected field-spanning write (size 9) of single field
        "(char *)(lr + 1)" at zfs_log.c:593 (size 0)
    memcpy: detected field-spanning write (size 4) of single field
        "(char *)(lr + 1) + snamesize" at zfs_log.c:594 (size 0)
    memcpy: detected field-spanning write (size 7) of single field
        "lr + 1" at zfs_log.c:425 (size 0)
    memcpy: detected field-spanning write (size 9) of single field
        "(char *)(lr + 1)" at zfs_log.c:593 (size 0)
    memcpy: detected field-spanning write (size 4) of single field
        "(char *)(lr + 1) + snamesize" at zfs_log.c:594 (size 0)
    memcpy: detected field-spanning write (size 7) of single field
        "lr + 1" at zfs_log.c:425 (size 0)
    memcpy: detected field-spanning write (size 9) of single field
        "(char *)(lr + 1)" at zfs_log.c:593 (size 0)
    memcpy: detected field-spanning write (size 4) of single field
        "(char *)(lr + 1) + snamesize" at zfs_log.c:594 (size 0)

To fix this, this commit adds flex array fields to all lr_XX_t structs
that require them, and then uses those fields to access that
end-of-struct area rather than more complicated casts and pointer
addition.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16501
Closes #16539
2024-09-27 09:18:11 -07:00
Brian Behlendorf ac8837d4d6 ZTS: Update deadman_sync threshold
Lower the minimum number of expected deadman events from 4 to 3.  All
that is strictly required is a single event to consider the test a
pass.  However, since I've never seen a count of less than 3 reported
by the CI that should be sufficient.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #16575
2024-09-27 09:14:13 -07:00
Brian Behlendorf d7ab2816ee CI: Add logs to zloop workflow
On failure attempt to include the most relevant portions of the
ztest logs in the CI output.  This full logs are still available
for download but often a backtrace and the last output is enough.

Install libunwind to improve the odds of a useful backtrace.

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #16573
2024-09-27 09:05:49 -07:00
Brian Behlendorf ab1b87e747 ZTS: Fix zpool_import_hostid_changed_cachefile_unclean_export
Update the test case to freeze the pool then export it to better
simulate a hard failure.  This is preferable to copying the vdev
while the pool's imported since with a copy we're not guaranteed
the on-disk state will be consistent.  That can in turn result
in a pool import failure and a spurious test failure.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #16570
2024-09-26 09:58:11 -07:00
tstabrawa b052035990 Avoid BUG in migrate_folio_extra
Linux page migration code won't wait for writeback to complete unless
it needs to call release_folio.  Call SetPagePrivate wherever
PageUptodate is set and define .release_folio, to cause
fallback_migrate_folio to wait for us.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: tstabrawa <59430211+tstabrawa@users.noreply.github.com>
Closes #15140
Closes #16568
2024-09-26 08:57:09 -07:00
Brian Behlendorf b2ca5105b7 ZTS: Fix zpool_reguid Makefile.am and tests
The zpool_reguid tests were not being included the dist tarball
resulting in them not running.  This is reported as a "failed
verification" warning by the CI.  Add the tests to the correct
Makefile.am.

Additionally, remove the usage of 'bc -e <expr>' from the tests.
This option is only supported by the FreeBSD version of bc.

Update the test case to reflect the 0 is not a valid GUID.

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #16559
2024-09-25 07:50:04 -07:00
Shengqi Chen 05a7a9594e CI: run only sanity check on limited OSes for nonbehavioral changes
The commit uses heuristics to determine whether a PR is behavioral:

It runs "quick" CI (i.e., only use sanity.run on fewer OSes)
if (explicitly requested by user):
- the *last* commit message contains a line 'ZFS-CI-Type: quick',
or if (by heuristics):
- the files changed are not in the list of specified directory, and
- all commit messages does not contain 'ZFS-CI-Type: full'.

It runs "full" CI otherwise.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Shengqi Chen <harry-chen@outlook.com>
Closes #16564
2024-09-25 07:46:52 -07:00
Alexander Motin 48d1be254f Properly release key in spa_keystore_dsl_key_hold_dd()
Since dsl_crypto_key_open() references the key, 0d23f5e2e4 should
have called dsl_crypto_key_rele() to drop it first instead of
calling dsl_crypto_key_free() directly.  The final result should
actually be the same, but without triggering dck_holds assertion.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16567
2024-09-25 07:40:17 -07:00
Alexander Motin 832f66b218 FreeBSD: Sync taskq_cancel_id() returns with Linux
Couple places in the code depend on 0 returned only if the task was
actually cancelled.  Doing otherwise could lead to extra references
being dropped.  The race could be small, but I believe CI hit it
from time to time.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16565
2024-09-24 16:29:18 -07:00
w0xel ccc420acd5 Add missing guard defines for simd_stat
This adds the HAVE_KERNEL_NEON and HAVE_KERNEL_FPU_INTERNAL
guards to simd_stat.c defaulted to 0 to make it build again.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Shengqi Chen <harry-chen@outlook.com>
Signed-off-by: Sebastian Wuerl <s.wuerl@mailbox.org>
Closes #16558
2024-09-24 09:07:26 -07:00
Rob Norris ed7d224ee1 AUTHORS: refresh with recent new contributors
"I don't know half of you half as well as I should like; and I like less
than half of you half as well as you deserve."

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16563
2024-09-24 09:03:05 -07:00
Brian Behlendorf 4b5fbf70f2 CI: cancel workflows when PRs are updated (#16562)
For checkstyle, zloop, zfs-qemu, and codeql workflows cancel
in-progress jobs when the PR is updated.

Relevant GitHub Actions documentation:

  The following concurrency group cancels in-progress jobs or run
  on pull_request events only; if github.head_ref is undefined, the
  concurrency group will fallback to the run ID, which is guaranteed
  to be both unique and defined for the run.

https://docs.github.com/en/actions/writing-workflows/workflow-syntax-for-github-actions#example-using-a-fallback-value

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #16562
2024-09-24 09:02:01 -07:00
Theera K. d40d40913d Evicting too many bytes from MFU metadata
Without updating 'm' we evict from MFU metadata all that we wanted
to evict from all metadata, including already evicted MRU metadata
('m' is the total amount of metadata we had at the beginning,
and 'w' is the total amount of metadata we want to have). 

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Theera K. <tkittich@hotmail.com>
Closes #16521
Closes #16546
2024-09-23 22:12:56 -07:00
Brian Behlendorf 9c03b22f8f ZTS: CI Documentation Updates
Update the CONTRIBUTING.md documentation to refer to the GitHub Actions
workflows which have replaced the buildbot infrastructure.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #16561
2024-09-23 22:04:27 -07:00
Brian Behlendorf 56ab541521 ZTS: CodeQL Action v3 update
Switch from v2 to v3 CodeQL Actions.  The v2 actions will no longer
be supported as of Dec '24 so we need to move to v3.  According to
the release notes they should be functionally equivalent.

    Note that the only difference between v2 and v3 of the CodeQL
    Action is the node version they support, ... For example 3.22.11
    was the first v3 release and is functionally identical to 2.22.11.

https://github.com/github/codeql-action/blob/main/CHANGELOG.md

Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #16560
2024-09-23 17:07:29 -07:00
Rob Norris 78e9e987e1 linux: log a scary warning when used with an experimental kernel
Since the person using the kernel may not be the person who built it,
show a warning at module load too, in case they aren't aware that it
might be weird.

Reviewed-by: Robert Evans <evansr@google.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #15986
2024-09-23 10:44:54 -07:00
Rob Norris 410287f7f8 config/kernel: enforce maximum kernel version, with escape hatch
META lists the maximum kernel version we consider to be fully supported.
However, we don't enforce this.

Sometimes we ship experimental patches for a newer kernel than we're
ready to support or, less often, we compile just fine against a newer
kernel. Invariably, something doesn't quite work properly, and it's
difficult for users to understand that they're actually running against
a kernel that we're not yet ready to support.

This commit tries to improve this situation. First, it simply enforces
Linux-Maximum, by having configure bail out if you try to compile
against a newer version that.

Then, it adds the --enable-linux-experimental switch to configure. When
supplied, this disables enforcing the maximum version, allowing the user
to attempt to build against a kernel with version higher than
Linux-Maximum.

Finally, if the switch is supplied _and_ configure is run against a
higher kernel version, it shows a big warning message when configure
finishes, and defines HAVE_LINUX_EXPERIMENTAL for the build. This allows
us to add code to modify runtime behaviour as well.

Reviewed-by: Robert Evans <evansr@google.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #15986
2024-09-23 10:44:49 -07:00
George Melikov e419a63bf4 xattr dataset prop: change defaults to sa
It's the main recommendation to set xattr=sa
even in man pages, so let's set it by default.

xattr=sa don't use feature flag, so in the worst
case we'll have non-readable xattrs by other
non-openzfs platforms.

Non-overridden default `xattr` prop of existing pools
will automatically use `sa` after this commit too.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #15147
2024-09-23 09:50:48 -07:00
Rich Ercolani 1d84c9eb66 Fix /proc/spl/kstat/simd on x86
Evidently while reworking it on aarch64, I broke it on x86 and
didn't notice.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #16556
2024-09-22 13:11:19 -07:00
Brian Behlendorf d565835c47 ZTS: Add additional exceptions
The following tests have been observed to occasionally fail when
running under the CI.  Updated our exceptions list to track them.

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #16553
2024-09-22 13:04:17 -07:00
Brian Behlendorf 5520fdde29 ZTS: Retire "tmpfile_reason" exception
All supported Linux kernels, 4.18 and newer, provide O_TMPFILE.

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #16553
2024-09-22 13:04:06 -07:00
Brian Behlendorf 9122772b2a ZTS: Retire "ci_reason" exceptions
There is no longer be a need for the ci_reason exception with
the update CI GitHub Actions infrastruture.  Retire it.

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #16553
2024-09-22 13:03:47 -07:00
Tino Reichardt 53b77c32af ZTS: Fix Summary Page
The qemu-9-summary-page.sh script reads the file env.txt in the
first lines. When the module didn't build, this file was not copied
into the tarfile - causing the scipt to abort.

Fix: copy needed files into the tarfile in case of module build
failures. The fix ignores also empty tarfiles in future.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #16555
2024-09-22 09:21:42 -07:00
Alexander Motin 3014dcb762 Reduce and handle EAGAIN errors on AIO label reads
At least FreeBSD has a limit of 256 simultaneous AIO requests per
process. Attempt to issue more results in EAGAIN errors. Since we
issue 4 requests per disk/partition from 2xCPUs threads, it is
quite easy to reach that limit on large systems, that results in
random pool import failures.  It annoyed me for quite a while on
a system with 64 CPUs and 70+ partitioned disks.

This patch from one side limits the number of threads to avoid the
error, while from another should softly fall back to sync reads in
case of error.  It takes into account _SC_AIO_MAX as a system-wide
AIO limit and _SC_AIO_LISTIO_MAX as a closest value to per-process
limit.  The last not exactly right, but it is the best I found.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #16551
2024-09-21 10:36:25 -07:00
Umer Saleem 2aafc2ea1f Add compatibility file for GRUB versions up to v2.06
GRUB is not able to detect ZFS pool if snaphsot of top level boot
pool is created. This issue is observed with GRUB versions up to
v2.06 if extensible_dataset feature is enabled on ZFS boot pool.

compatibility=grub2-2.06 would enable all read-only compatible
zpool features except extensible_dataset and other features that
depend on it.

The existing grub2 compatibility file is now renamed to grub2-2.12 to
reflect the appropriate grub2 version. grub2-2.12 lists all read-only
features that can be enabled on boot pool for grub2 with version 2.12
onwards.

A new symlink grub2 is created that currently points to the grub2-2.12
compatibility file.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Closes #13873
Closes #15261
Closes #15909
2024-09-21 10:29:27 -07:00
Umer Saleem aeb79f2b29 ZTS: Fix skipping over comment lines in zpool_create.shlib
In zpool_create.shlib, check_feature_set iterates over all features
mentioned in provided compatibility file to check if only those
features are enabled on the pool.

This commit fixes skipping over comment lines correctly. Otherwise,
the test case fails as comment lines are also treated as feature names
by check_feature_set function.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Closes #15909
2024-09-21 10:29:20 -07:00
Rob Norris 80645d6582 FreeBSD: restore zfs_znode_update_vfs()
I accidentally removed this in c22d56e3e, and didn't notice because it
doesn't fail the build, but does fail to load into the kernel because it
can't link it.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16554
2024-09-21 10:03:54 -07:00
Brian Behlendorf f9d4f1b480 Add SIMD metadata in /proc on Linux follow up
This change accidentally broke the FreeBSD build due to
a conflict between the simd_stat_init()/simd_stat_fini()
macros on FreeBSD and the extern function prototype.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #16552
2024-09-20 15:48:12 -07:00
Rich Ercolani 5d01243964 Add SIMD metadata in /proc on Linux
Too many times, people's performance problems have amounted to
"somehow your SIMD support isn't working", and determining that
at runtime is difficult to describe to people.

This adds a /proc/spl/kstat/zfs/simd node, which exposes
metadata about which instructions ZFS thinks it can use,
on AArch64 and x86_64 Linux, to make investigating things
like this much easier.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #16530
2024-09-20 08:16:44 -07:00
Tino Reichardt 45a4a94b64 ZTS: Remove functional tests via matrix
This commit changes the workflow of the github actions.

- Ubuntu 20.04, 22.04, 24.04 will be tested via QEMU now
- remove unused scripts of this commit: b7bc334d1
- re-add the zloop standalone testings via zloop.yml

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #16549
2024-09-20 08:01:05 -07:00
Tino Reichardt 7a965b9bc8 ZTS: Fix Test Summary page generation
Fix that error: "cat /tmp/failed.txt: No such file or directory"

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #16549
2024-09-20 08:00:40 -07:00
George Melikov 01852ffbf8 arc_hdr_authenticate: make explicit error
On compression we could be more explicit here for cases
where we can not recompress the data.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Co-authored-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #9416
2024-09-19 17:25:02 -07:00
George Melikov b32d48a625 ZLE compression: don't use BPE_PAYLOAD_SIZE
ZLE compressor needs additional bytes to process
d_len argument efficiently.
Don't use BPE_PAYLOAD_SIZE as d_len with it
before we rework zle compressor somehow.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #9416
2024-09-19 17:24:51 -07:00
George Melikov 522f2629c8 zio_compress: introduce max size threshold
Now default compression is lz4, which can stop
compression process by itself on incompressible data.
If there are additional size checks -
we will only make our compressratio worse.

New usable compression thresholds are:
- less than BPE_PAYLOAD_SIZE (embedded_data feature);
- at least one saved sector.

Old 12.5% threshold is left to minimize affect
on existing user expectations of CPU utilization.

If data wasn't compressed - it will be saved as
ZIO_COMPRESS_OFF, so if we really need to recompress
data without ashift info and check anything -
we can just compress it with zero threshold.
So, we don't need a new feature flag here!

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #9416
2024-09-19 17:23:58 -07:00
Tino Reichardt 4bf6a2ab87 ZTS: use openssl for md5digest and sha256digest
On larger files this should improve the speed.

Sample values of my system:

[mcmilk@xz]$ time dd if=/dev/zero bs=128k count=1k | sha256sum
254bcc3fc4f27172636df4bf32de9f107f620d559b20d760197e452b97453917  -
real    0m1,050s
user    0m0,985s
sys     0m0,153s

[mcmilk@xz]$ time dd if=/dev/zero bs=128k count=1k | openssl sha256 -r
254bcc3fc4f27172636df4bf32de9f107f620d559b20d760197e452b97453917 *stdin
real    0m0,254s
user    0m0,206s
sys     0m0,160s

I think cli_root/zdb/zdb_backup.ksh runs also an FreeBSD and I needed to
include the sysutils/coreutils package for the FreeBSD tests within the
QEMU patchset.

This could be reverted, when this pull request gets upstream

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #16543
2024-09-19 15:53:57 -07:00
Rob Norris e8ede2ba78 zfs_debug: specific variant for userspace
Just nice and simple, with room to grow.

Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #16492
2024-09-19 15:49:50 -07:00
Rob Norris c22d56e3ed zfs_znode: lift common code to a single shared file
For now, userspace has no znode implementation. Some of the property and
path handling code is used there though and is the same on all
platforms, so we only need a single copy of it.

Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #16492
2024-09-19 15:49:45 -07:00
Rob Norris 4c9b59e541 zfs_racct: copy Linux implementation for userspace
The no-op is fine for both.

Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #16492
2024-09-19 15:49:39 -07:00
Rob Norris 305d0a5fba libzpool: don't include trace.c
It does nothing in userspace anyway.

Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #16492
2024-09-19 15:49:34 -07:00
Rob Norris d70b2c0687 vdev_label_os: copy Linux implementation for userspace
The no-op is fine for both.

Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #16492
2024-09-19 15:49:29 -07:00
Rob Norris 8fc0beb66b arc_os: split userspace and Linux kernel code
The Linux arc_os.c carries userspace and kernel code, with very little
overlap between the two. This lifts the userspace parts out into a
separate arc_os.c for libzpool and removes it from the Linux side.

Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #16492
2024-09-19 15:48:54 -07:00
Rob Norris b7e43d6e7f linux/abd_os: remove kernel version check for compound page support
All kernels we support have compound pages that work the way we would
like. However, this code is new and this knowledge was hard won, so I'd
like to leave the description and option there for a little while, even
if it can only be disabled with a recompile.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16545
2024-09-19 15:45:05 -07:00
Rob Norris a83762b3f4 linux: remove kernel version checks for unsupported kernels
Following 2b069768a (#16479), anything gated on a kernel version before
4.18 can be always included/excluded.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16545
2024-09-19 15:43:44 -07:00
Shengqi Chen a877b39624 cityhash: replace invocations with specialized versions when possible
So that we can get actual benefit from last commit.

Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Shengqi Chen <harry-chen@outlook.com>
Closes #16131
Closes #16483
2024-09-19 15:19:17 -07:00
Shengqi Chen 0ae4460c61 zcommon: add specialized versions of cityhash4
Specializing cityhash4 on 32-bit architectures can reduce the size
of stack frames as well as instruction count. This is a tiny but
useful optimization, since some callers invoke it frequently.

When specializing into 1/2/3/4-arg versions, the stack usage
(in bytes) on some 32-bit arches are listed as follows:

- x86: 32, 32, 32, 40
- arm-v7a: 20, 20, 28, 36
- riscv: 0, 0, 0, 16
- power: 16, 16, 16, 32
- mipsel: 8, 8, 8, 24

And each actual argument (even if passing 0) contributes evenly
to the number of multiplication instructions generated:

- x86: 9, 12, 15 ,18
- arm-v7a: 6, 8, 10, 12
- riscv / power: 12, 18, 20, 24
- mipsel: 9, 12, 15, 19

On 64-bit architectures, the tendencies are similar. But both stack
sizes and instruction counts are significantly smaller thus negligible.

Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Shengqi Chen <harry-chen@outlook.com>
Closes #16131
Closes #16483
2024-09-19 15:18:59 -07:00
Shengqi Chen 1c35206124 dmu_objset: replace dnode_hash impl with cityhash4
As mentioned in PR #16131, replacing CRC-based hash with cityhash4
could slightly improve the performance by eliminating memory access.
Replacing algorightm is safe since the hash result is not persisted.

Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Shengqi Chen <harry-chen@outlook.com>
Closes #16131
Closes #16483
2024-09-19 15:18:12 -07:00
Theera K. 4d469acd17 arcstat: add structural, types, states breakdown
Add ARC structural breakdown, ARC types breakdown, ARC states
breakdown similar to arc_summary.  Additional cleanups included.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Theera K. <tkittich@hotmail.com>
Closes #16509
2024-09-18 11:44:18 -07:00
Don Brady ddf5f34f06 Avoid fault diagnosis if multiple vdevs have errors
When multiple drives are throwing errors, it is likely not
a drive failing but rather a failure above the drives, like
a controller.  The active cases context of the drive's peers
is now considered when making a diagnosis.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Don Brady <don.brady@klarasystems.com>
Closes #16531
2024-09-18 11:36:48 -07:00
Rob Norris f245541e24 zfs_file: implement zfs_file_deallocate for FreeBSD 14
FreeBSD 14 gained a `VOP_DEALLOCATE` VFS operation and a `fspacectl`
syscall to use it. At minimum, these zero the given region, and if the
underlying filesystem supports it, can make the region sparse. We can
use this to get TRIM-like behaviour for file vdevs.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16496
2024-09-18 11:35:48 -07:00
Rob Norris fa330646b9 zfs_file: rename zfs_file_fallocate to zfs_file_deallocate
We only use it on a specific way: to punch a hole in (make sparse) a
region of a file, in order to implement TRIM-like behaviour.

So, call the op "deallocate", and move the Linux-style mode flags down
into the Linux implementation, since they're an implementation detail.

FreeBSD gets a no-op stub (for the moment).

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16496
2024-09-18 11:35:04 -07:00
Rob Norris 7cdfda3934 config: fix page_mapping test
It always failed from "unused variable" warnings-errors. The resulting
`#define page_mapping(...)` happend to work because it always overrode
the kernel's function prototype, but that's brittle.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16479
2024-09-18 11:23:51 -07:00
Rob Norris 0807423369 config: fix various bits of missing output
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16479
2024-09-18 11:23:51 -07:00
Rob Norris 5df65ca9c1 config: remove HAVE_GET_USER_PAGES_*
get_user_pages_unlocked() had stabilised by 4.9.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16479
2024-09-18 11:23:51 -07:00
Rob Norris 696d7a71a0 config: remove test for unused s_d_op
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16479
2024-09-18 11:23:51 -07:00
Rob Norris 1cb46d9d1a config: remove HAVE_MODE_LOOKUP_BDEV
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16479
2024-09-18 11:23:51 -07:00
Rob Norris c57d268a78 config: remove HAVE_HAS_CAPABILITY
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16479
2024-09-18 11:23:51 -07:00
Rob Norris a298801426 config: remove HAVE_BIO_SET_DEV
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16479
2024-09-18 11:23:51 -07:00
Rob Norris 0a61e51736 config: rework ZFS_GENHD_FL_*
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16479
2024-09-18 11:23:51 -07:00
Rob Norris d4e5538014 config: remove HAVE_GENERIC_IO_ACCT_3ARG
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16479
2024-09-18 11:23:51 -07:00
Rob Norris 7a02229293 config: remove HAVE_VFSMOUNT_IOPS_GETATTR
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16479
2024-09-18 11:23:51 -07:00
Rob Norris df9795f2d7 config: remove HAVE_GENERIC_READLINK
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16479
2024-09-18 11:23:51 -07:00
Rob Norris f6661d1153 linux/zvol_os: convert END_IO macro to inline function
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16479
2024-09-18 11:23:51 -07:00
Rob Norris dcb8e5ec7c config: remove HAVE_BLK_MQ
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16479
2024-09-18 11:23:50 -07:00
Rob Norris 1bf93713d8 config: remove HAVE_BLK_QUEUE_FLAG_*
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16479
2024-09-18 11:23:50 -07:00
Rob Norris ec48dd0976 config: remove ZFS_GLOBAL_ZONE_PAGE_STATE and ZFS_ENUM_* generation
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16479
2024-09-18 11:23:50 -07:00
Rob Norris 8d9cb04ea8 config: remove ZFS_GLOBAL_ZONE_PAGE_STATE
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16479
2024-09-18 11:23:50 -07:00
Rob Norris 0a00804585 config: remove HAVE_WAIT_QUEUE_*
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16479
2024-09-18 11:23:50 -07:00
Rob Norris de10132c34 config: remove HAVE_TMPFILE
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16479
2024-09-18 11:23:50 -07:00
Rob Norris 2de203163d config: remove HAVE_SUPER_SETUP_BDI_NAME
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16479
2024-09-18 11:23:50 -07:00
Rob Norris b32b6ac6e5 config: remove HAVE_SIGNAL_STOP
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16479
2024-09-18 11:23:50 -07:00
Rob Norris 8396c84346 config: remove HAVE_SET_SPECIAL_STATE
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16479
2024-09-18 11:23:50 -07:00
Rob Norris 4059455cda config: remove HAVE_SCHED_SIGNAL_HEADER
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16479
2024-09-18 11:23:50 -07:00
Rob Norris 7736b68376 config: remove HAVE_PERCPU_COUNTER_ADD_BATCH
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16479
2024-09-18 11:23:50 -07:00
Rob Norris 6e625bd7bd config: remove HAVE_KVMALLOC
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16479
2024-09-18 11:23:50 -07:00
Rob Norris f95d76cc28 config: remove HAVE_KTIME_GET_RAW_TS64
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16479
2024-09-18 11:23:50 -07:00
Rob Norris b3458270b5 config: remove HAVE_KTIME_GET_COARSE_REAL_TS64
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16479
2024-09-18 11:23:50 -07:00
Rob Norris 2c84b59e73 config: remove HAVE_KMEM_CACHE_CREATE_USERCOPY
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16479
2024-09-18 11:23:50 -07:00
Rob Norris 536a0a8a84 config: remove HAVE_KERNEL_TIMER_SETUP
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16479
2024-09-18 11:23:50 -07:00
Rob Norris 940f5d5658 config: remove HAVE_KERNEL_TIMER_FUNCTION_TIMER_LIST
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16479
2024-09-18 11:23:50 -07:00
Rob Norris a5b3a87030 config: remove HAVE_KERNEL_(READ|WRITE)_PPOS
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16479
2024-09-18 11:23:50 -07:00
Rob Norris 80d7f0f98e config: remove HAVE_INODE_TIMESPEC64_TIMES
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16479
2024-09-18 11:23:50 -07:00
Rob Norris 06c34465b7 config: remove HAVE_INODE_SET_IVERSION
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16479
2024-09-18 11:23:50 -07:00
Rob Norris a817992559 config: remove HAVE_FILEMAP_RANGE_HAS_PAGE
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16479
2024-09-18 11:23:50 -07:00
Rob Norris 54af0088fb config: remove HAVE_FILE_FADVISE
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16479
2024-09-18 11:23:50 -07:00
Rob Norris f4c4df1638 config: remove HAVE_BIO_BI_STATUS and bio error compat
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16479
2024-09-18 11:23:50 -07:00
Rob Norris 3008e691a2 config: remove HAVE_ACL_REFCOUNT
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16479
2024-09-18 11:23:50 -07:00
Rob Norris 6e3b863df3 config: remove HAVE_[24]ARGS_VFS_GETATTR
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16479
2024-09-18 11:23:50 -07:00
Rob Norris d69bb93c9f config: remove HAVE_BLK_QUEUE_SECDISCARD
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16479
2024-09-18 11:23:50 -07:00
Rob Norris 30a2907ce9 config: remove HAVE_RENAME2
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16479
2024-09-18 11:23:50 -07:00
Rob Norris 6a00b01385 config: remove HAVE_GENERIC_SETXATTR
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16479
2024-09-18 11:23:50 -07:00
Rob Norris 0f15852981 config: remove HAVE_FILE_AIO_FSYNC
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16479
2024-09-18 11:23:50 -07:00
Rob Norris cde79fe11a config: remove ZFS_GLOBAL_NODE_PAGE_STATE
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16479
2024-09-18 11:23:50 -07:00
Rob Norris 1af510b550 config: remove HAVE_XATTR_GET_DENTRY_INODE
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16479
2024-09-18 11:23:50 -07:00
Rob Norris e055f0e053 config: remove HAVE_XATTR_LIST_SIMPLE
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16479
2024-09-18 11:23:50 -07:00
Rob Norris e6713cfd54 config: remove HAVE_XATTR_(GET|SET|LIST)_HANDLER
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16479
2024-09-18 11:23:50 -07:00
Rob Norris 79c307def9 config: remove HAVE_XATTR_HANDLER_NAME
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16479
2024-09-18 11:23:50 -07:00
Rob Norris 72d3fa215f config: remove HAVE_VFS_ITERATE/HAVE_VFS_ITERATE_SHARED
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16479
2024-09-18 11:23:50 -07:00
Rob Norris df35eab0bf config: remove HAVE_VFS_COPY_FILE_RANGE
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16479
2024-09-18 11:23:50 -07:00
Rob Norris 432f6e9eec config: remove HAVE_SUPER_USER_NS
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16479
2024-09-18 11:23:50 -07:00
Rob Norris 846b598519 config: remove HAVE_REQ_OP_* and HAVE_REQ_*
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16479
2024-09-18 11:23:50 -07:00
Rob Norris 95d85f032f config: remove HAVE_(GET|PUT)_LINK_DELAYED
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16479
2024-09-18 11:23:50 -07:00
Rob Norris 7cc89f83ff config: remove HAVE_POSIX_ACL_VALID_WITH_NS
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16479
2024-09-18 11:23:50 -07:00
Rob Norris a1832d1ecb config: remove HAVE_KERNEL_GET_ACL_HANDLE_CACHE
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16479
2024-09-18 11:23:50 -07:00
Rob Norris d1a07741f8 config: remove HAVE_INODE_LOCK_SHARED
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16479
2024-09-18 11:23:50 -07:00
Rob Norris 6374171bb4 config: remove HAVE_IN_COMPAT_SYSCALL
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16479
2024-09-18 11:23:50 -07:00
Rob Norris 9b6f93a72f config: remove HAVE_GROUP_INFO_GID
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16479
2024-09-18 11:23:50 -07:00
Rob Norris 6a17061213 config: remove HAVE_CURRENT_TIME
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16479
2024-09-18 11:23:50 -07:00
Rob Norris bbc52ed501 config: remove HAVE_CPU_HOTPLUG
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16479
2024-09-18 11:23:50 -07:00
Rob Norris ddf5122a29 config: remove HAVE_BLK_QUEUE_WRITE_CACHE/HAVE_BLK_QUEUE_FLUSH
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16479
2024-09-18 11:23:50 -07:00
Rob Norris 43bf9a712b config: remove HAVE_BIO_BI_OPF
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16479
2024-09-18 11:23:50 -07:00
Rob Norris afcc0fb0fa config: remove HAVE_1ARG_SUBMIT_BIO
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16479
2024-09-18 11:23:50 -07:00
Rob Norris 96b0785f52 config: remove HAVE_GET_LINK_COOKIE
As far as I can tell, this never made it to a real release. It was
introduced in 6b2553918d8b and removed a couple of weeks later in
fceef393a538. This was all part of the development of what would become
4.5. So I assume this was OpenZFS chasing upstream development at the
time.

    fceef393a538 viro 2015-12-30 switch ->get_link() to delayed_call, kill ->put_link()
    cd3417c8fc95 viro 2015-12-29 kill free_page_put_link()
    0d0def49d05a viro 2015-12-08 teach nfs_get_link() to work in RCU mode
    1a384eaac265 viro 2015-12-08 teach proc_self_get_link()/proc_thread_self_get_link() to work in RCU mode
    6a6c99049635 viro 2015-12-08 teach shmem_get_link() to work in RCU mode
    d3883d4f9344 viro 2015-12-08 teach page_get_link() to work in RCU mode
    6b2553918d8b viro 2015-12-08 replace ->follow_link() with new method that could stay in RCU mode

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16479
2024-09-18 11:23:50 -07:00
Rob Norris 9a1c7240ba config: remove HAVE_RENAME2_OPERATIONS_WRAPPER
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16479
2024-09-18 11:23:50 -07:00
Rob Norris 230bc538cb config: remove HAVE_VFS_FILE_OPERATIONS_EXTEND
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16479
2024-09-18 11:23:50 -07:00
Rob Norris 9914684d36 config: remove HAVE_NEW_SYNC_READ
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16479
2024-09-18 11:23:50 -07:00
Rob Norris 733317966f config: remove HAVE_XATTR_(GET|SET|LIST)_DENTRY
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16479
2024-09-18 11:23:50 -07:00
Rob Norris eb73000dbb config: remove HAVE_WAIT_ON_BIT_ACTION
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16479
2024-09-18 11:23:50 -07:00
Rob Norris a987057c67 config: remove HAVE_VFS_DIRECT_IO_ITER_RW_OFFSET
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16479
2024-09-18 11:23:50 -07:00
Rob Norris c9e8d0e0b5 config: remove HAVE_PUT_LINK_NAMEIDATA
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16479
2024-09-18 11:23:50 -07:00
Rob Norris 2bba420245 config: remove HAVE_LSEEK_EXECUTE
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16479
2024-09-18 11:23:50 -07:00
Rob Norris 99c143a5a1 config: remove HAVE_FOLLOW_LINK_NAMEIDATA
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16479
2024-09-18 11:23:50 -07:00
Rob Norris ed048fdc5b config: remove HAVE_D_REVALIDATE_NAMEIDATA
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16479
2024-09-18 11:23:50 -07:00
Rob Norris ec6ba977b7 config: remove HAVE_3ARGS_VFS_GETATTR
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16479
2024-09-18 11:23:50 -07:00
Rob Norris 6a28491f8e config: remove HAVE_3ARGS_BDI_SETUP_AND_REGISTER
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16479
2024-09-18 11:23:50 -07:00
Rob Norris 9f5c9af77c config: remove HAVE_VFS_DIRECT_IO_IOVEC
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16479
2024-09-18 11:23:50 -07:00
Rob Norris 1a64c06ec0 config: remove SHRINK_CONTROL_HAS_NID
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16479
2024-09-18 11:23:50 -07:00
Rob Norris 72be1f4062 config: remove HAVE_VFS_RW_ITERATE
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16479
2024-09-18 11:23:50 -07:00
Rob Norris f3d30f1ce0 config: remove HAVE_USER_NS_COMMON_INUM
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16479
2024-09-18 11:23:50 -07:00
Rob Norris b545b07b2f config: remove HAVE_SPLIT_SHRINKER_CALLBACK and HAVE_SINGLE_SHRINKER_CALLBACK
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16479
2024-09-18 11:23:50 -07:00
Rob Norris d60d4ad809 config: remove HAVE_SET_CACHED_ACL_USABLE
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16479
2024-09-18 11:23:50 -07:00
Rob Norris 6840e3b18b config: remove HAVE_SET_ACL
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16479
2024-09-18 11:23:50 -07:00
Rob Norris 3d37b1d6d4 config: remove HAVE_POSIX_ACL_RELEASE and HAVE_POSIX_ACL_RELEASE_GPL_ONLY
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16479
2024-09-18 11:23:50 -07:00
Rob Norris 67b0c883df config: remove HAVE___POSIX_ACL_CHMOD
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16479
2024-09-18 11:23:49 -07:00
Rob Norris 583e2e25b9 config: remove HAVE_PERCPU_COUNTER_INIT_WITH_GFP
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16479
2024-09-18 11:23:49 -07:00
Rob Norris f07485c46e config: remove HAVE_LINUX_BLK_CGROUP_HEADER
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16479
2024-09-18 11:23:49 -07:00
Rob Norris 1b522c4583 config: remove HAVE_KERNEL_TIMER_LIST_FLAGS
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16479
2024-09-18 11:23:49 -07:00
Rob Norris eb230c789a config: remove HAVE_KERNEL_STRSCPY
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16479
2024-09-18 11:23:49 -07:00
Rob Norris d4bbe2ff38 config: remove HAVE_IO_SCHEDULE_TIMEOUT
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16479
2024-09-18 11:23:49 -07:00
Rob Norris 714d7666e5 config: remove HAVE_INODE_SET_FLAGS
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16479
2024-09-18 11:23:49 -07:00
Rob Norris cf006e3496 config: remove HAVE_GENERIC_WRITE_CHECKS_KIOCB
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16479
2024-09-18 11:23:49 -07:00
Rob Norris 7af642af4d config: remove HAVE_FSYNC_RANGE
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16479
2024-09-18 11:23:49 -07:00
Rob Norris 6a1d8a9cf0 config: remove HAVE_FILE_INODE
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16479
2024-09-18 11:23:49 -07:00
Rob Norris 257e40f9d9 config: remove HAVE_FILE_DENTRY
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16479
2024-09-18 11:23:49 -07:00
Rob Norris 525f06b5f6 config: remove HAVE_FALLOC_FL_ZERO_RANGE
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16479
2024-09-18 11:23:49 -07:00
Rob Norris f70ffacdfc config: remove HAVE_ENCODE_FH_WITH_INODE
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16479
2024-09-18 11:23:49 -07:00
Rob Norris 8e002ee26e config: remove HAVE_D_PRUNE_ALIASES
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16479
2024-09-18 11:23:49 -07:00
Rob Norris dc6af4a4b5 config: remove HAVE_D_MAKE_ROOT
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16479
2024-09-18 11:23:49 -07:00
Rob Norris 92f7ec6075 config: remove HAVE_DIRTY_INODE_WITH_FLAGS
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16479
2024-09-18 11:23:49 -07:00
Rob Norris efc293e371 config: remove HAVE_DENTRY_D_U_ALIASES
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16479
2024-09-18 11:23:48 -07:00
Rob Norris 147c82bd5e config: remove HAVE_CLEAR_INODE and HAVE_EVICT_INODE
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16479
2024-09-18 11:23:48 -07:00
Rob Norris 609559e5b9 config: remove HAVE_BIO_BVEC_ITER
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16479
2024-09-18 11:23:45 -07:00
Rob Norris 233bed67a8 config: remove HAVE_1ARG_BIO_END_IO_T
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16479
2024-09-18 11:23:40 -07:00
Rob Norris 02f4b63db9 config: remove checks with unused defines
All of these set a #define that doesn't appear anywhere in the tree.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16479
2024-09-18 11:23:35 -07:00
Rob Norris 2b069768ab META: set Linux minimum version to 4.18
This sets RHEL8's base kernel[1] as the floor, and includes the oldest
kernel.org LTS (4.19).

1. https://access.redhat.com/articles/3078#RHEL8

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16479
2024-09-18 11:23:29 -07:00
rmacklem 29c9e6c324 Fix handling of DNS names with '-' in them for sharenfs
An old FreeBSD bugzilla report PR#168158 notes that DNS
names with '-'s in them cannot be used for the sharenfs
property.  This patch fixes the parsing of these DNS names.
The only negative affect this patch might have is that,
if a user has incorrectly separated options with a '-'
the sharenfs setting will no longer work once this patch
is applied.

Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rick Macklem <rmacklem@uoguelph.ca>
Closes #16529
2024-09-17 13:56:26 -07:00
Rob Norris ec0209418f sa_impl: fix SA header bitfield docs
Off by one, confused me a while!

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16500
2024-09-17 13:53:39 -07:00
Pavel Snajdr 90af1e83e8 Linux 6.10 compat: Fix tracepoints definitions
__string field definition includes the source variable for a value
of the string when the TP hits; in 6.10+ kernels, __assign_str()
uses that to copy a value from src to the string, with older
kernels, __assign_str still accepted src as a second parameter.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Pavel Snajdr <snajpa@snajpa.net>
Co-authored-by: Tony Hutter <hutter2@llnl.gov>
Closes #16475 
Closes #16515
2024-09-17 13:38:02 -07:00
Alexander Motin ac04407ffe Remove extra newline from spa_set_allocator().
zfs_dbgmsg() does not need newline at the end of the message.

While there, slightly update/sync FreeBSD __dprintf().

Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16536
2024-09-17 13:15:42 -07:00
Tino Reichardt bca9b64e7b ZTS: Use QEMU for tests on Linux and FreeBSD
This commit adds functional tests for these systems:
- AlmaLinux 8, AlmaLinux 9, ArchLinux
- CentOS Stream 9, Fedora 39, Fedora 40
- Debian 11, Debian 12
- FreeBSD 13, FreeBSD 14, FreeBSD 15
- Ubuntu 20.04, Ubuntu 22.04, Ubuntu 24.04

- enabled by default:
 - AlmaLinux 8, AlmaLinux 9
 - Debian 11, Debian 12
 - Fedora 39, Fedora 40
 - FreeBSD 13, FreeBSD 14

Workflow for each operating system:
- install qemu on the github runner
- download current cloud image of operating system
- start and init that image via cloud-init
- install dependencies and poweroff system
- start system and build openzfs and then poweroff again
- clone build system and start 2 instances of it
- run functional testings and complete in around 3h
- when tests are done, do some logfile preparing
- show detailed results for each system
- in the end, generate the job summary

Real-world benefits from this PR:

1. The github runner scripts are in the zfs repo itself. That means
   you can just open a PR against zfs, like "Add Fedora 41 tester", and
   see the results directly in the PR. ZFS admins no longer need
   manually to login to the buildbot server to update the buildbot config
   with new version of Fedora/Almalinux.

2. Github runners allow you to run the entire test suite against your
   private branch before submitting a formal PR to openzfs. Just open a
   PR against your private zfs repo, and the exact same
   Fedora/Alma/FreeBSD runners will fire up and run ZTS. This can be
   useful if you want to iterate on a ZTS change before submitting a
   formal PR.

3. buildbot is incredibly cumbersome. Our buildbot config files alone
   are ~1500 lines (not including any build/setup scripts)!
   It's a huge pain to setup.

4. We're running the super ancient buildbot 0.8.12. It's so ancient
   it requires python2. We actually have to build python2 from source
   for almalinux9 just to get it to run. Ugrading to a more modern
   buildbot is a huge undertaking, and the UI on the newer versions is
   worse.

5. Buildbot uses EC2 instances. EC2 is a pain because:
   * It costs money
   * They throttle IOPS and CPU usage, leading to mysterious,
   * hard-to-diagnose, failures and timeouts in ZTS.
   * EC2 is high maintenance. We have to setup security groups, SSH
   * keys, networking, users, etc, in AWS and it's a pain. We also
   * have to periodically go in an kill zombie EC2 instances that
   * buildbot is unable to kill off.

6. Buildbot doesn't always handle failures well. One of the things we
   saw in the past was the FreeBSD builders would often die, and each
   builder death would take up a "slot" in buildbot. So we would
   periodically have to restart buildbot via a cron job to get the slots
   back.

7. This PR divides up the ZTS test list into two parts, launches two
   VMs, and on each VM runs half the test suite. The test results are
   then merged and shown in the sumary page. So we're basically
   parallelizing ZTS on the same github runner. This leads to lower
   overall ZTS runtimes (2.5-3 hours vs 4+ hours on buildbot), and one
   unified set of results per runner, which is nice.

8. Since the tests are running on a VM, we have much more control over
   what happens. We can capture the serial console output even if the
   test completely brings down the VM. In the future, we could also
   restart the test on the VM where it left off, so that if a single test
   panics the VM, we can just restart it and run the remaining ZTS tests
   (this functionaly is not yet implemented though, just an idea).

9. Using the runners, users can manually kill or restart a test run
   via the github IU. That really isn't possible with buildbot unless
   you're an admin.

10. Anecdotally, the tests seem to be more stable and constant under
    the QEMU runners.

Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #16537
2024-09-17 12:03:27 -07:00
Tino Reichardt c4d1a19b33 ZTS: increase timeout of mmap_sync_001_pos
On load the test needs sometimes a bit more time then just one second.
Doubling the time will help on the QEMU based testings.

Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #16537
2024-09-17 12:03:08 -07:00
Tino Reichardt 5cb3e2861e ZTS: fix raidz_expand_001_pos and raidz_expand_002_pos
Sometimes the pool may start an auto scrub.

Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #16537
2024-09-17 12:02:58 -07:00
Tino Reichardt 4999f49513 ZTS: fix zpool_status_008_pos test on qemu vm's
The test needs some adjusting within the timings.

Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Co-authored-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #16537
2024-09-17 12:01:13 -07:00
Brian Atkinson a10e552b99 Adding Direct IO Support
Adding O_DIRECT support to ZFS to bypass the ARC for writes/reads.

O_DIRECT support in ZFS will always ensure there is coherency between
buffered and O_DIRECT IO requests. This ensures that all IO requests,
whether buffered or direct, will see the same file contents at all
times. Just as in other FS's , O_DIRECT does not imply O_SYNC. While
data is written directly to VDEV disks, metadata will not be synced
until the associated  TXG is synced.
For both O_DIRECT read and write request the offset and request sizes,
at a minimum, must be PAGE_SIZE aligned. In the event they are not,
then EINVAL is returned unless the direct property is set to always (see
below).

For O_DIRECT writes:
The request also must be block aligned (recordsize) or the write
request will take the normal (buffered) write path. In the event that
request is block aligned and a cached copy of the buffer in the ARC,
then it will be discarded from the ARC forcing all further reads to
retrieve the data from disk.

For O_DIRECT reads:
The only alignment restrictions are PAGE_SIZE alignment. In the event
that the requested data is in buffered (in the ARC) it will just be
copied from the ARC into the user buffer.

For both O_DIRECT writes and reads the O_DIRECT flag will be ignored in
the event that file contents are mmap'ed. In this case, all requests
that are at least PAGE_SIZE aligned will just fall back to the buffered
paths. If the request however is not PAGE_SIZE aligned, EINVAL will
be returned as always regardless if the file's contents are mmap'ed.

Since O_DIRECT writes go through the normal ZIO pipeline, the
following operations are supported just as with normal buffered writes:
Checksum
Compression
Encryption
Erasure Coding
There is one caveat for the data integrity of O_DIRECT writes that is
distinct for each of the OS's supported by ZFS.
FreeBSD - FreeBSD is able to place user pages under write protection so
          any data in the user buffers and written directly down to the
	  VDEV disks is guaranteed to not change. There is no concern
	  with data integrity and O_DIRECT writes.
Linux - Linux is not able to place anonymous user pages under write
        protection. Because of this, if the user decides to manipulate
	the page contents while the write operation is occurring, data
	integrity can not be guaranteed. However, there is a module
	parameter `zfs_vdev_direct_write_verify` that controls the
	if a O_DIRECT writes that can occur to a top-level VDEV before
	a checksum verify is run before the contents of the I/O buffer
        are committed to disk. In the event of a checksum verification
	failure the write will return EIO. The number of O_DIRECT write
	checksum verification errors can be observed by doing
	`zpool status -d`, which will list all verification errors that
	have occurred on a top-level VDEV. Along with `zpool status`, a
	ZED event will be issues as `dio_verify` when a checksum
	verification error occurs.

ZVOLs and dedup is not currently supported with Direct I/O.

A new dataset property `direct` has been added with the following 3
allowable values:
disabled - Accepts O_DIRECT flag, but silently ignores it and treats
	   the request as a buffered IO request.
standard - Follows the alignment restrictions  outlined above for
	   write/read IO requests when the O_DIRECT flag is used.
always   - Treats every write/read IO request as though it passed
           O_DIRECT and will do O_DIRECT if the alignment restrictions
	   are met otherwise will redirect through the ARC. This
	   property will not allow a request to fail.

There is also a module parameter zfs_dio_enabled that can be used to
force all reads and writes through the ARC. By setting this module
parameter to 0, it mimics as if the  direct dataset property is set to
disabled.

Reviewed-by: Brian Behlendorf <behlendorf@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Atkinson <batkinson@lanl.gov>
Co-authored-by: Mark Maybee <mark.maybee@delphix.com>
Co-authored-by: Matt Macy <mmacy@FreeBSD.org>
Co-authored-by: Brian Behlendorf <behlendorf@llnl.gov>
Closes #10018
2024-09-14 13:47:59 -07:00
Tino Reichardt 1713aa7b4d Remove set but not used variable in ddt.c (#16522)
module/zfs/ddt.c:2612:6: error: variable 'total' set but not used

Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2024-09-10 12:46:50 -07:00
Alan Somers 308f7c2f14 Fix an uninitialized data access (#16511)
zfs_acl_node_alloc allocates an uninitialized data buffer, but upstack
zfs_acl_chmod only partially initializes it.  KMSAN reported that this
memory remained uninitialized at the point when it was read by
lzjb_compress, which suggests a possible kernel memory disclosure bug.

The full KMSAN warning may be found in the PR.
https://github.com/openzfs/zfs/pull/16511

Signed-off-by:	Alan Somers <asomers@gmail.com>
Sponsored by:	Axcient
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2024-09-10 09:08:45 -07:00
Rob Norris 63253dbf4f zts-report: don't crash on non-UTF-8 chars in the log (#16497)
The report generator expects the log to be clean and tidy UTF-8. That
can be a problem if you use some of the verbose/debug test runner
options, which sends all sorts of weird output from arbitrary programs
to the log.

This just makes Python a little more relaxed about such things. It
shouldn't matter in practice, as those lines didn't match the test
result regex anyway, and are discarded immediately.


Sponsored-by: https://despairlabs.com/sponsor/

Signed-off-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2024-09-09 17:49:14 -07:00
Jessica Clarke 88433e640d sys/types32.h: Remove struct timeval32 from libspl's header (#16491)
macOS Sequoia's sys/sockio.h, as included by various bootstrap tools
whilst building FreeBSD, has started to include net/if.h, which then
includes sys/_types/_timeval32.h and provide a conflicting definition
for struct timeval32. Since this type is entirely unused within OpenZFS,
simply delete the type rather than adding in some kind of OS detection.

This fixes building FreeBSD on macOS Sequoia (Beta).

Signed-off-by: Jessica Clarke <jrtc27@jrtc27.com>
Reviewed-by: Rob Norris <rob.norris@klarasystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2024-09-09 17:37:12 -07:00
Rob Norris 8be2f4c3d2 zio_resume: log when unsuspending the pool (#16485)
When reviewing logs after a failure, its useful to see where
unsuspend/resume was requested.


Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.

Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2024-09-09 17:21:20 -07:00
Rob Norris 5c67820265 libzstd: also build with LIBZPOOL_CPPFLAGS
libzstd now also allocates its own abd_t, and so has the same issue as
zstream did, so this applies the same workaround: compile it with
ZFS_DEBUG. See 92fca1c2d.

This looks weird, because libzstd doesn't appear to look related to the
ZFS kernel, but there is already a cross-dependency there: zstd needs
zfs_lz4_compress, and zfs needs zfs_zstd_compress (and others), so the
two can never really be separated without more work. Another job for
another time.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Mark Maybee <mmaybee@delphix.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16489
2024-09-09 14:13:27 -07:00
Rob Norris b109925820 spa_prop_get: require caller to supply output nvlist
All callers to spa_prop_get() and spa_prop_get_nvlist() supplied their
own preallocated nvlist (except ztest), so we can remove the option to
have them allocate one if none is supplied.

This sidesteps a bug in spa_prop_get(), where the error var wasn't
initialised, which could lead to the provided nvlist being freed at the
end.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16505
2024-09-06 08:45:58 -07:00
Rob Norris 17dd66deda zpool events: expand value strings for ZIO error values
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
2024-09-05 13:40:05 -07:00
Rob Norris 82ff9aafd6 value strings: pretty printers for flags and enums
This adds zfs_valstr, a collection of pretty printers for bitfields and
enums. These are useful in debugging, logging and other display contexts
where raw values are difficult for the untrained (or even trained!) eye
to decipher.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
2024-09-05 13:40:05 -07:00
Don Brady d4d79451cb Add DDT prune command
Requires the new 'flat' physical data which has the start
time for a class entry.

The amount to prune can be based on a target percentage of
the unique entries or based on the age (i.e., every entry
older than N days).

Sponsored-by: Klara, Inc.
Sponsored-by: iXsystems, Inc.
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Don Brady <don.brady@klarasystems.com>
Closes #16277
2024-09-04 14:17:02 -07:00
Rob Norris 4a4f7b019f zdb: rework dedup accounting for log, quota and prune
The simplest thing first: add the FDT and log objects to the list of
objects to be considered when checking for leaks.

The rest is based on a conceptual change in all of this patch stack: a
block on disk with a 'D' bit is not necessarily in the DDT at all
(pruned), or in the DDT ZAPs (still on the log).

As such, walking the DDT up front is difficult (for all the reasons that
walking an unflushed log is difficult) and not really useful, since it's
not a reflection of what's on disk anyway.

Instead, we rework things here to be more like the BRT checks. When we
see a dedup'd block, we look it up in the DDT, consume a refcount, and
for the second-or-later instances, count them as duplicates.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Allan Jude <allan@klarasystems.com>
Co-authored-by: Don Brady <don.brady@klarasystems.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: iXsystems, Inc.
Closes #16277
2024-09-04 14:16:42 -07:00
Seth Hoffert bf8c61f489 Remove unused sysctl node
PR #14953 removed vdev-level read cache but accidentally left this
sysctl node behind.

Reviewed-by: Rich Ercolani <rincebrain@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Seth Hoffert <seth.hoffert@gmail.com>
Closes #16493
2024-09-03 17:52:33 -07:00
Rob Norris b3b7491615 build: rename FORCEDEBUG_CPPFLAGS to LIBZPOOL_CPPFLAGS
This is just a very small attempt to make it more obvious that these
flags aren't optional for libzpool-using programs, by not making it seem
like there's an option to say "well, I don't _want_ to force debugging".

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Rich Ercolani <rincebrain@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Issue #16476
Closes #16477
2024-08-27 12:53:27 -07:00
Rob Norris 92fca1c2d0 zstream: build with debug to fix stack overruns
abd_t differs in size depending on whether or not ZFS_DEBUG is set. It
turns out that libzpool is built with FORCEDEBUG_CPPFLAGS, which sets
-DZFS_DEBUG, and so it always has a larger abd_t with extra debug
fields, regardless of whether or not --enable-debug is set.

zdb, ztest and zhack are also all built with FORCEDEBUG_CPPFLAGS, so had
the same idea of the size of abd_t, but zstream was not, and used the
"smaller" abd_t. In practice this didn't matter because it never used
abd_t directly.

This changed in b4d81b1a6, zstream was switched to use stack ABDs for
compression. When built with --enable-debug, zstream implicitly gets
ZFS_DEBUG, and everything was fine. Productions builds without that flag
ends up with the smaller abd_t, which is now mismatched with libzpool,
and causes stack overruns in zstream recompress.

The simplest fix for now is to compile zstream with FORCEDEBUG_CPPFLAGS
like the other binaries. This commit does that.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Rich Ercolani <rincebrain@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Issue #16476
Closes #16477
2024-08-27 12:52:23 -07:00
Rob Norris 50b32cb925 fm: pass io_flags through events & zed as uint64_t
In 4938d01db (#14086) zio_flag_t was converted from an enum (generally
signed 32-bit) to a uint64_t. The corresponding change wasn't made to
the error reporting subsystem, limiting the error flags being delivered
to zed to 32 bits. This bumps the whole pipeline to use uint64s.

A tiny bit of compatibility is added for newer zed working agsinst an
older kernel module, because its easy to do and misdetecting
scrub/resilver errors and taking action is potentially dangerous. Making
it work for new kernel modules against older zed seems to be far more
invasive for far less benefit, so I have not.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16469
2024-08-26 17:39:13 -07:00
Jitendra Patidar 73866cf346 Fix issig() to check signal_pending after dequeue SIGSTOP/SIGTSTP
When process got SIGSTOP/SIGTSTP, issig() dequeue them and return 0.
But process could still have another signal pending after dequeue. So,
after dequeue, check and return 1, if signal_pending.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Jitendra Patidar <jitendra.patidar@nutanix.com>
Closes #16464
2024-08-26 17:36:49 -07:00
Mateusz Piotrowski 6be8bf5552 zpool: Provide GUID to zpool-reguid(8) with -g (#16239)
This commit extends the zpool-reguid(8) command with a -g flag, which
allows the user to specify the GUID to set.

This change also adds some general tests for zpool-reguid(8).

Sponsored-by: Wasabi Technology, Inc.
Sponsored-by: Klara, Inc.

Signed-off-by: Mateusz Piotrowski <0mp@FreeBSD.org>
Reviewed-by: Rob Norris <rob.norris@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2024-08-26 09:27:24 -07:00
Rob Norris 2420ee6e12 spl-taskq: fix task counts for delayed and cancelled tasks
Dispatched delayed tasks were not added to tasks_total, and cancelled
tasks were not removed. This notably could make tasks_total go to
UNIT64_MAX, but just generally meant the count could be wrong. So lets
not!

Sponsored-by: Klara, Inc.
Sponsored-by: Syneto
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16473
2024-08-23 10:40:45 -07:00
Low-power 34118eac06 Make mount.zfs(8) calling zfs_mount_at for legacy mounts as well
Commit 329e2ffa4bca456e65c3db7f5c5c04931c551b61 has made mount.zfs(8) to
call libzfs function 'zfs_mount_at', in order to propagate dataset
properties into mount options. This fix however, is limited to a special
use case where mount.zfs(8) is used in initrd with option '-o zfsutil'.
If either initrd or the user need to use mount.zfs(8) to mount a file
system with 'mountpoint' set to 'legacy', '-o zfsutil' can't be used and
the original issue #7947 will still happen.

Since the existing code already excluded the possibility of calling
'zfs_mount_at' when it was invoked as a helper program from zfs(8), by
checking 'ZFS_MOUNT_HELPER' environment variable, it makes no sense to
avoid calling 'zfs_mount_at' without '-o zfsutil'.

An exception however, is when mount.zfs(8) was invoked with '-o remount'
to update the mount options for an existing mount point. In this case
call mount(2) directly without modifying the mount options passed from
command line.

Furthermore, don't run mount.zfs(8) helper for automounting snapshot.
The above change to make mount.zfs(8) to call 'zfs_mount_at'
apparently caused it to trigger an automount for the snapshot
directory. When the helper was invoked as a result of a snapshot
automount, an infinite recursion will occur.

Since the need of invoking user mode mount(8) for automounting was to
overcome that the 'vfs_kern_mount' being GPL-only, just run mount(8)
without the mount.zfs(8) helper by adding option '-i'.

Reviewed-by: Umer Saleem <usaleem@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: WHR <whr@rivoreo.one>
Closes #16393
2024-08-23 10:39:09 -07:00
Rob Norris cb36f4f352 zstream recompress: fix zero recompressed buffer and output
If compression happend, any garbage past the compress size was not
zeroed out.

If compression didn't happen, then the payload size was still set to
the rounded-up return from zio_compress_data(), which is dependent on
the input, which is not necessarily the logical size.

So that's all fixed too, mostly from stealing the math from zio.c.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
2024-08-22 16:22:24 -07:00
Rob Norris a537d90734 zstream decompress: fix decompress size and output
This was incorrectly using the compressed length for the size of the
decompress buffer, and quietly doing nothing if the decompressor refused
to decompress the block because there wasn't enough space.

After that, it wasn't correctly rewriting the record to indicate
"not compressed".

So that's fixed now. Sigh.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
2024-08-22 16:22:24 -07:00
Rob Norris a9c94bea9f zio_compress_data: limit dest length to ABD size
Some callers (eg `do_corrective_recv()`) pass in a dest buffer much
smaller than the wanted 87.5% of the source buffer, because the
incoming abd is larger than the source data and they "know" what the
decompressed size with be.

However, `abd_borrow_buf()` rightly asserts if we try to borrow more
than is available, so these callers fail.

Previously when all we had was a dest buffer, we didn't know how big it
was, so we couldn't do anything. Now we have a dest abd, with a size, so
we can clamp dest size to the abd size.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
2024-08-22 16:22:24 -07:00
Rob Norris f62e6e1f98 compress: change zio_compress API to use ABDs
This commit changes the frontend zio_compress_data and
zio_decompress_data APIs to take ABD points instead of buffer pointers.

All callers are updated to match. Any that already have an appropriate
ABD nearby now use it directly, while at the rest we create an one.

Internally, the ABDs are passed through to the provider directly.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
2024-08-22 16:22:24 -07:00
Rob Norris d3c12383c9 compress: change compression providers API to use ABDs
This commit changes the provider compress and decompress API to take ABD
pointers instead of buffer pointers for both data source and
destination. It then updates all providers to match.

This doesn't actually change the providers to do chunked compression,
just changes the API to allow such an update in the future. Helper
macros are added to easily adapt the ABD functions to their buffer-based
implementations.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
2024-08-22 16:22:24 -07:00
Rob Norris 522816498c compress: standardise names of compression functions
This is mostly to make searching easier.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
2024-08-22 16:22:24 -07:00
Rob Norris dd0c08f9c6 compress: remove unused abd compress prototype
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
2024-08-22 16:22:24 -07:00
Rob Norris e119483a95 compress: remove zio_decompress_data_buf
Nothing uses it anymore!

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
2024-08-22 16:22:24 -07:00
Rob Norris b4d81b1a6a zstream: use zio_compress calls for compression
This is updating zstream to use the zio_compress calls rather than using
its own dispatch. Since that was fairly entangled, some refactoring
included.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
2024-08-22 16:22:24 -07:00
Rob Norris 5eede0d5fd compress: rework callers to always use the zio_compress calls
This will make future refactoring easier.

There are two we can't change for the moment, because zio_compress_data
does hole detection & collapsing which zio_decompress_data does not
actually know how to handle.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
2024-08-22 16:22:24 -07:00
Rob Norris ba2209ec9e abd_get_from_buf_struct: wrap existing buf with ABD stored on stack
This allows a simple "wrapping" ABD for an existing linear buffer to be
allocated on the stack, avoiding an allocation.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
2024-08-22 16:22:24 -07:00
Tony Hutter 9e15877dfb Linux 6.10 compat: META
Update the META file to reflect compatibility with the 6.10 kernel.

Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #16466
2024-08-21 17:38:06 -07:00
Rob Norris b69bebb535 libzpool/abd_os: iovec-based scatter abd
This is intended to be a simple userspace scatter abd based on struct
iovec. It's not very sophisticated as-is, but sets a base for something
much more interesting.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16253
2024-08-21 13:37:25 -07:00
Rob Norris 5b9e695392 abd_os: break out platform-specific header parts
Removing the platform #ifdefs from shared headers in favour of
per-platform headers. Makes abd_t much leaner, among other things.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16253
2024-08-21 13:37:18 -07:00
Rob Norris 7a5b4355e2 abd_os: split userspace and Linux kernel code
The Linux abd_os.c serves double-duty as the userspace scatter abd
implementation, by carrying an emulation of kernel scatterlists. This
commit lifts common and userspace-specific parts out into a separate
abd_os.c for libzpool.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16253
2024-08-21 13:37:13 -07:00
Rob Norris 2b7d9a7863 zio: no alloc canary in userspace
Makes it harder to use memory debuggers like valgrind directly, because
they can't see canary overruns.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16253
2024-08-21 13:37:07 -07:00
Rob Norris b3f4e4e1ec abd: remove ABD_FLAG_ZEROS
Nothing ever checks it.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16253
2024-08-21 13:36:24 -07:00
shodanshok bbe8512a93 Ignore zfs_arc_shrinker_limit in direct reclaim mode
zfs_arc_shrinker_limit (default: 10000) avoids ARC collapse
due to excessive memory reclaim. However, when the kernel is
in direct reclaim mode (ie: low on memory), limiting ARC reclaim
increases OOM risk. This is especially true on system without
(or with inadequate) swap.

This patch ignores zfs_arc_shrinker_limit when the kernel is in
direct reclaim mode, avoiding most OOM. It also restores
"echo 3 > /proc/sys/vm/drop_caches" ability to correctly drop
(almost) all ARC.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Adam Moss <c@yotes.com>
Signed-off-by: Gionatan Danti <g.danti@assyoma.it>
Closes #16313
2024-08-21 10:00:33 -07:00
Ameer Hamza a2c4e95cfd linux/zvol_os.c: cleanup limits for non-blk mq case
Rob Noris suggested that we could clean up redundant limits for the case
of non-blk mq scenario.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #16462
2024-08-20 17:16:08 -07:00
Ameer Hamza 8e6a9aabb1 linux/zvol_os.c: Fix max_discard_sectors limit for 6.8+ kernel
In kernels 6.8 and later, the zvol block device is allocated with
qlimits passed during initialization. However, the zvol driver does not
set `max_hw_discard_sectors`, which is necessary to properly
initialize `max_discard_sectors`. This causes the `zvol_misc_trim` test
to fail on 6.8+ kernels when invoking the `blkdiscard` command. Setting
`max_hw_discard_sectors` in the `HAVE_BLK_ALLOC_DISK_2ARG` case resolve
the issue.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #16462
2024-08-20 17:14:44 -07:00
Rob Norris 816d2b2bfc spl-proc: remove old taskq stats
These had minimal useful information for the admin, didn't work properly
in some places, and knew far too much about taskq internals.

With the new stats available, these should never be needed anymore.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: Syneto
Closes #16171
2024-08-19 09:50:45 -07:00
Rob Norris 3f8fd3cae0 spl-taskq: summary stats for all taskqs
This adds /proc/spl/kstats/taskq/summary, which attempts to show a
useful subset of stats for all taskqs in the system.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: Syneto
Closes #16171
2024-08-19 09:50:41 -07:00
Rob Norris db40fe4cf6 spl-taskq: per-taskq kstats
This exposes a variety of per-taskq stats under /proc/spl/kstat/taskq,
one file per taskq, named for the taskq name.instance.

These include a small amount of info about the taskq config, the current
state of the threads and queues, and various counters for thread and
queue activity since the taskq was created.

To assist with decrementing queue size counters, the list an entry is on
is encoded in spare bits in the entry flags.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: Syneto
Closes #16171
2024-08-19 09:50:35 -07:00
Rob Norris f0ad031cd9 spl-generic: bring up kstats subsystem before taskq
For spl-taskq to use the kstats infrastructure, it has to be available
first.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: Syneto
Closes #16171
2024-08-19 09:49:28 -07:00
Chunwei Chen 06a7b123ac Skip ro check for snaps when multi-mount
Skip ro check for snapshots since they are always ro regardless if ro
flag is passed by mount or not. This allows multi-mounting snapshots
without requiring to specify ro flag.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Chunwei Chen <david.chen@nutanix.com>
Closes #16299
2024-08-19 09:42:17 -07:00
shodanshok 77a797a382 Enable L2 cache of all (MRU+MFU) metadata but MFU data only
`l2arc_mfuonly` was added to avoid wasting L2 ARC on read-once MRU
data and metadata. However it can be useful to cache as much
metadata as possible while, at the same time, restricting data
cache to MFU buffers only.

This patch allow for such behavior by setting `l2arc_mfuonly` to 2
(or higher). The list of possible values is the following:
0: cache both MRU and MFU for both data and metadata;
1: cache only MFU for both data and metadata;
2: cache both MRU and MFU for metadata, but only MFU for data.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Gionatan Danti <g.danti@assyoma.it>
Closes #16343 
Closes #16402
2024-08-16 13:34:07 -07:00
Allan Jude a60e15d6b9 Man page updates for dmu_ddt_copies
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Closes #15895
2024-08-16 12:04:04 -07:00
Rob Norris 0d2707815d ddt: lookup and log stats
Adds per-DDT stats counting lookups and where they were serviced from
(either log or backing zap), number of log entries in memory, and flow
rates.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: iXsystems, Inc.
Closes #15895
2024-08-16 12:03:51 -07:00
Rob Norris a1902f4950 ddt: block scan until log is flushed, and flush aggressively
The dedup log does not have a stable cursor, so its not possible to
persist our current scan location within it across pool reloads.
Beccause of this, when walking (scanning), we can't treat it like just
another source of dedup entries.

Instead, when a scan is wanted, we switch to an aggressive flushing
mode, pushing out entries older than the scan start txg as fast as we
can, before starting the scan proper.

Entries after the scan start txg will be handled via other methods; the
DDT ZAPs and logs will be written as normal, and blocks not seen yet
will be offered to the scan machinery as normal.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: iXsystems, Inc.
Closes #15895
2024-08-16 12:03:43 -07:00
Rob Norris cd69ba3d49 ddt: dedup log
Adds a log/journal to dedup. At the end of txg, instead of writing the
entry directly to the ZAP, instead its adding to an in-memory tree and
appended to an on-disk object. The on-disk object is only read at
import, to reload the in-memory tree.

Lookups first go the the log tree before going to the ZAP, so
recently-used entries will remain close by in memory. This vastly
reduces overhead from dedup IO, as it will not have to do so many
read/update/write cycles on ZAP leaf nodes.

A flushing facility is added at end of txg, to push logged entries out
to the ZAP. There's actually two separate "logs" (in-memory tree and
on-disk object), one active (recieving updated entries) and one flushing
(writing out to disk). These are swapped (ie flushing begins) based on
memory used by the in-memory log trees and time since we last flushed
something.

The flushing facility monitors the amount of entries coming in and being
flushed out, and calibrates itself to try to flush enough each txg to
keep up with the ingest rate without competing too much with other IO.
Multiple tuneables are provided to control the flushing facility.

All the histograms and stats are update to accomodate the log as a
separate entry store. zdb gains knowledge of how to count them and dump
them. Documentation included!

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: iXsystems, Inc.
Closes #15895
2024-08-16 12:03:35 -07:00
Rob Norris cbb9ef0a4c ddt: tuneable to override copies= on dedup metadata objects
All objects stored in the MOS get copies=3. For a large dedup table,
this requires significant extra IO and disk space, when its not really
necessary - the dedup table itself isn't needed to read or write data,
only to keep data usage down. Losing the dedup table does not render the
pool unusable, it just messes up the accounting somewhat.

This adds a dmu_ddt_copies tuneable. When set to 0, the existing
behaviour is used. When set higher, dedup table blocks (ZAP and log)
will have this many copies rather than the usual 3, while indirect
blocks will have one more again.

This is a tuneable for now mostly for testing. Losing a dedup table can
cause blocks to be leaked, and we currently have no facilities to repair
that.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: iXsystems, Inc.
Closes #15895
2024-08-16 12:03:27 -07:00
Rob Norris 592f38900d ddt: compare keys 64-bits at a time, trying to match ZAP order
This yields substantial performance improvements when we only write out
some small % of entries at a time, as it will cause entries that will go
into "nearby" ZAP leaf nodes to be grouped closer together in the AVL, and
so touch fewer blocks. Without this, the distribution is an even spread,
so we touch a lot more ZAP leaf nodes for any given number of entries.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: iXsystems, Inc.
Closes #15895
2024-08-16 12:03:19 -07:00
Rob Norris 27e9cb5f80 ddt: cleanup the stats & histogram code
Both the API and the code were kinda mangled and I was really struggling
to follow it. The worst offender was the old ddt_stat_add(); after
fixing it up the rest of the changes are mostly knock-on effects and
targets of opportunity.

Note that the old ddt_stat_add() was safe against overflows - it could
produce crazy numbers, but the compiler wouldn't do anything stupid. The
assertions in ddt_stat_sub() go a lot of the way to protecting against
this; getting in a position where overflows are a problem is definitely
a programming error.

Also expanding ddt_stat_add() and ddt_histogram_empty() produces less
efficient assembly. I'm not bothered about this right now though; these
should not be hot functions, and if they are we'll optimise them later.
If we have to go back to the old form, we'll comment it like crazy.

Finally, I've removed the assertion that the bucket will never be
negative, as it will soon be possible to have entries with zero
refcounts: an entry for a block that is no longer on the pool, but is on
the log waiting to be synced out. It might be better to have a separate
bucket for these, since they're still using real space on disk, but
ultimately these stats are driving UI, and for now I've chosen to keep
them matching how they've looked in the past, as well as match the
operators mental model - pool usage is managed elsewhere.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: iXsystems, Inc.
Closes #15895
2024-08-16 12:02:56 -07:00
Rob Norris f4aeb23f52 ddt: add "flat phys" feature
Traditional dedup keeps a separate ddt_phys_t "type" for each possible
count of DVAs (that is, copies=) parameter. Each of these are tracked
independently of each other, and have their own set of DVAs. This leads
to an (admittedly rare) situation where you can create as many as six
copies of the data, by changing the copies= parameter between copying.
This is both a waste of storage on disk, but also a waste of space in
the stored DDT entries, since there never needs to be more than three
DVAs to handle all possible values of copies=.

This commit adds a new FDT feature, DDT_FLAG_FLAT. When active, only the
first ddt_phys_t is used. Each time a block is written with the dedup
bit set, this single phys is checked to see if it has enough DVAs to
fulfill the request. If it does, the block is filled with the saved DVAs
as normal. If not, an adjusted write is issued to create as many extra
copies as are needed to fulfill the request, which are then saved into
the entry too.

Because a single phys is no longer an all-or-nothing, but can be
transitioning from fewer to more DVAs, the write path now has to keep a
copy of the previous "known good" DVA set so we can revert to it in case
an error occurs. zio_ddt_write() has been restructured and heavily
commented to make it much easier to see what's happening.

Backwards compatibility is maintained simply by allocating four
ddt_phys_t when the DDT_FLAG_FLAT flag is not set, and updating the phys
selection macros to check the flag. In the old arrangement, each number
of copies gets a whole phys, so it will always have either zero or all
necessary DVAs filled, with no in-between, so the old behaviour
naturally falls out of the new code.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Co-authored-by: Don Brady <don.brady@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: iXsystems, Inc.
Closes #15893
2024-08-16 12:02:39 -07:00
Rob Norris 0ba5f503c5 ddt: slim down ddt_entry_t
This slims down the in-memory entry to as small as it can be. The
IO-related parts are made into a separate entry, since they're
relatively rarely needed.

The variable allocation for dde_phys is to support the upcoming flat
format.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: iXsystems, Inc.
Closes #15893
2024-08-16 12:02:31 -07:00
Rob Norris 4d686c3da5 ddt: introduce lightweight entry
The idea here is that sometimes you need the contents of an entry with
no intent to modify it, and/or from a place where its difficult to get
hold of its originating ddt_t to know how to interpret it.

A lightweight entry contains everything you might need to "read" an
entry - its key, type and phys contents - but none of the extras for
modifying it or using it in a larger context. It also has the full
complement of phys slots, so it can represent any kind of dedup entry
without having to know the specific configuration of the table it came
from.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: iXsystems, Inc.
Closes #15893
2024-08-16 12:02:22 -07:00
Rob Norris d17ab631a9 ddt: rework access to phys array slots
The "flat phys" feature will use only a single phys slot for all
entries, which means the old "single", "double" etc naming now makes no
sense, and more importantly, means that choosing the right slot for a
given block pointer will depend on how many slots are in use for a given
DDT.

This removes the old names, and adds accessor macros to decouple
specific phys array indexes from any particular meaning.

(These macros look strange in isolation, mainly in the way they take the
ddt_t* as an arg but don't use it. This is mostly a separate commit to
introduce the concept to the reader before the "flat phys" commit
extends it).

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: iXsystems, Inc.
Closes #15893
2024-08-16 12:02:02 -07:00
Rob Norris d63f5d7e50 zdb: rework DDT block count and leak check to just count the blocks
The upcoming dedup features break the long held assumption that all
blocks on disk with a 'D' dedup bit will always be present in the DDT,
or will have the same set of DVA allocations on disk as in the DDT.

If the DDT is no longer a complete picture of all the dedup blocks that
will be and should be on disk, then it does us no good to walk and prime
it up front, since it won't necessarily match up with every block we'll
see anyway.

Instead, we rework things here to be more like the BRT checks. When we
see a dedup'd block, we look it up in the DDT, consume a refcount, and
for the second-or-later instances, count them as duplicates.

The DDT and BRT are moved ahead of the space accounting. This will
become important for the "flat" feature, which may need to count a
modified version of the block.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Allan Jude <allan@klarasystems.com>
Co-authored-by: Don Brady <don.brady@klarasystems.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: iXsystems, Inc.
Closes #15892
2024-08-16 12:01:41 -07:00
Rob Norris 2b131d7345 ZTS: tests for dedup legacy/FDT tables
Very basic coverage to make sure things appear to work, have the right
format on disk, and pool upgrades and mixed table types work as
expected.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: iXsystems, Inc.
Closes #15892
2024-08-16 12:00:58 -07:00
Rob Norris db2b1fdb79 ddt: add FDT feature and support for legacy and new on-disk formats
This is the supporting infrastructure for the upcoming dedup features.

Traditionally, dedup objects live directly in the MOS root. While their
details vary (checksum, type and class), they are all the same "kind" of
thing - a store of dedup entries.

The new features are more varied than that, and are better thought of as
a set of related stores for the overall state of a dedup table.

This adds a new feature flag, SPA_FEATURE_FAST_DEDUP. Enabling this will
cause new DDTs to be created as a ZAP in the MOS root, named
DDT-<checksum>. The is used as the root object for the normal type/class
store objects, but will also be a place for any storage required by new
features.

This commit adds two new fields to ddt_t, for version and flags. These
are intended to describe the structure and features of the overall dedup
table, and are stored as-is in the DDT root. In this commit, flags are
always zero, but the intent is that they can be used to hang optional
logic or state onto for new dedup features. Version is always 1.

For a "legacy" dedup table, where no DDT root directory exists, the
version will be 0.

ddt_configure() is expected to determine the version and flags features
currently in operation based on whether or not the fast_dedup feature is
enabled, and from what's available on disk. In this way, its possible to
support both old and new tables.

This also provides a migration path. A legacy setup can be upgraded to
FDT by creating the DDT root ZAP, moving the existing objects into it,
and setting version and flags appropriately. There's no support for that
here, but it would be straightforward to add later and allows the
possibility that newer features could be applied to existing dedup
tables.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: iXsystems, Inc.
Closes #15892
2024-08-16 11:58:59 -07:00
Ameer Hamza bdf4d6be1d linux/zvol_os: fix zvol queue limits initialization
zvol queue limits initialization depends on `zv_volblocksize`, but it is
initialized later, leading to several limits being initialized with
incorrect values, including `max_discard_*` limits. This also causes
`blkdiscard` command to consistently fail, as `blk_ioctl_discard` reads
`bdev_max_discard_sectors()` limits as 0, leading to failure. The fix is
straightforward: initialize `zv->zv_volblocksize` early, before setting
the queue limits. This PR should fix `zvol/zvol_misc/zvol_misc_trim`
failure on recent PRs, as the test case issues `blkdiscard` for a zvol.
Additionally, `zvol_misc_trim` was recently enabled in `6c7d41a`,
which is why the issue wasn't identified earlier.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #16454
2024-08-15 14:29:50 -07:00
Justin Gottula 5807de90a1 Fix null ptr deref when renaming a zvol with snaps and snapdev=visible (#16316)
If a zvol is renamed, and it has one or more snapshots, and
snapdev=visible is true for the zvol, then the rename causes a kernel
null pointer dereference error. This has the effect (on Linux, anyway)
of killing the z_zvol taskq kthread, with locks still held; which in
turn causes a variety of zvol-related operations afterward to hang
indefinitely (such as udev workers, among other things).

The problem occurs because of an oversight in #15486
(e36ff84c33). As documented in
dataset_kstats_create, some datasets may not actually have kstats
allocated for them; and at least at the present time, this is true for
snapshots. In practical terms, this means that for snapshots,
dk->dk_kstats will be NULL. The dataset_kstats_rename function
introduced in the patch above does not first check whether dk->dk_kstats
is NULL before proceeding, unlike e.g. the nearby
dataset_kstats_update_* functions.

In the very particular circumstance in which a zvol is renamed, AND that
zvol has one or more snapshots, AND that zvol also has snapdev=visible,
zvol_rename_minors_impl will loop over not just the zvol dataset itself,
but each of the zvol's snapshots as well, so that their device nodes
will be renamed as well. This results in dataset_kstats_create being
called for snapshots, where, as we've established, dk->dk_kstats is
NULL.

Fix this by simply adding a NULL check before doing anything in
dataset_kstats_rename.

This still allows the dataset_name kstat value for the zvol to be
updated (as was the intent of the original patch), and merely blocks
attempts by the code to act upon the zvol's non-kstat-having snapshots.
If at some future time, kstats are added for snapshots, then things
should work as intended in that case as well.

Signed-off-by: Justin Gottula <justin@jgottula.com>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Alan Somers <asomers@gmail.com>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2024-08-15 14:13:18 -07:00
Tony Hutter fb432660c3 Linux 6.10 compat: Fix zvol NULL pointer deference
zvol_alloc_non_blk_mq()->blk_queue_set_write_cache() needs the disk
queue setup to prevent a NULL pointer deference.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #16453
2024-08-15 14:05:58 -07:00
Tony Hutter f2f4ada240 Linux 6.10 compat: fix rpm-kmod and builtin
The 6.10 kernel broke our rpm-kmod builds.  The 6.10 kernel really
wants the source files in the same directory as the object files.
This workaround makes rpm-kmod work again.  It also updates
the builtin kernel codepath to work correctly with 6.10.

See kernel commits:

b1992c3772e6 kbuild: use $(src) instead of $(srctree)/$(src) for source
                     directory
9a0ebe5011f4 kbuild: use $(obj)/ instead of $(src)/ for common pattern
                     rules

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #16439
Closes #16450
2024-08-15 14:00:18 -07:00
Ameer Hamza 963e6c9f3f Fix incorrect error report on vdev attach/replace
Report the correct error message in libzfs when attaching/replacing a
vdev with a higher ashift.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #16449
2024-08-15 12:39:44 -07:00
Gleb Smirnoff 83f359245a FreeBSD: fix build without kernel option MAC
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Mark Johnston <markj@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Gleb Smirnoff <glebius@FreeBSD.org>
Closes #16446
2024-08-15 09:08:43 -07:00
Jitendra Patidar d2ccc21552 Fix projid accounting for xattr objects
zpool upgraded with 'feature@project_quota' needs re-layout of SA's
to fix the SA_ZPL_PROJID at SA_PROJID_OFFSET (128). Its necessary for
the correct accounting of object usage against its projid.
Old object (created before upgrade) when gets a projid assigned, its
SA gets re-layout via sa_add_projid(). If object has xattr dir, SA
of xattr dir also gets re-layout. But SA re-layout of xattr objects
inside a xattr dir is not done.

Fix zfs_setattr_dir() to re-layout SA's on xattr objects, when setting
projid on old xattr object (created before upgrade).

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Jitendra Patidar <jitendra.patidar@nutanix.com>
Closes #16355
Closes #16356
2024-08-14 17:59:19 -07:00
Paul Dagnelie 244ea5c488 Add missing kstats to dataset kstats
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #16431
2024-08-14 14:18:46 -07:00
Tony Hutter d06de4f007 ZTS: Use /dev/urandom instead of /dev/random
Use /dev/urandom so we never have to wait on entropy.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #16442
2024-08-14 12:27:07 -07:00
Rob Norris 2633075e09 Linux 6.11: avoid passing "end" sentinel to register_sysctl()
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #16400
2024-08-13 17:47:22 -07:00
Rob Norris 3abffc8781 Linux 6.11: add compat macro for page_mapping()
Since the change to folios it has just been a wrapper anyway. Linux has
removed their wrapper, so we add one.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #16400
2024-08-13 17:47:18 -07:00
Rob Norris f5236fe47a Linux 6.11: add more queue_limit fields with removed setters
These fields are very old, so no detection necessary; we just move them
into the limit setup functions.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #16400
2024-08-13 17:47:13 -07:00
Rob Norris 0b741a0351 Linux 6.11: IO stats is now a queue feature flag
Apply them with with the rest of the settings.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #16400
2024-08-13 17:47:08 -07:00
Rob Norris 22619523f6 Linux 6.11: first arg to proc_handler is now const
Detect it, and use a macro to make sure we always match the prototype.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #16400
2024-08-13 17:47:01 -07:00
Rob Norris 7e98d30f46 Linux 6.11: get backing_dev_info through queue gendisk
It's no longer available directly on the request queue, but its easy to
get from the attached disk.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #16400
2024-08-13 17:46:49 -07:00
Rob Norris e95b732e49 Linux 6.11: enable queue flush through queue limits
In 6.11 struct queue_limits gains a 'features' field, where, among other
things, flush and write-cache are enabled. Detect it and use it.

Along the way, the blk_queue_set_write_cache() compat wrapper gets a
little cleanup. Since both flags are alway set together, its now a
single bool. Also the very very ancient version that sets q->flush_flags
directly couldn't actually turn it off, so I've fixed that. Not that we
use it, but still.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #16400
2024-08-13 17:46:41 -07:00
Rob Norris 767b37019f linux/zvol_os: tidy and document queue limit/config setup
It gets hairier again in Linux 6.11, so I want some actual theory of
operation laid out for next time.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #16400
2024-08-13 17:45:12 -07:00
Ameer Hamza 9c56b8ec78 Github workflow: fix typo in zloop artifact
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #16432
2024-08-09 16:49:19 -07:00
Rob Norris 6caff8447d config: don't force shared linkage on FreeBSD
-shared was hardcoded, so when building with --disable-shared it amounts
to trying to do shared linkage against static libs, which naturally
fails.

The fix is straightforward; just don't hardcode it. libtool will work
out what to do.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16427
2024-08-09 14:34:04 -07:00
Alan Somers ed0db1cc8b Make txg_wait_synced conditional in zfsvfs_teardown, for FreeBSD
This applies the same change in #9115 to FreeBSD.  This was actually the
old behavior in FreeBSD 12; it only regressed when FreeBSD support was
added to OpenZFS.  As far as I can tell, the timeline went like this:

* Illumos's zfsvfs_teardown used an unconditional txg_wait_synced
* Illumos added the dirty data check [^4]
* FreeBSD merged in Illumos's conditional check [^3]
* OpenZFS forked from Illumos
* OpenZFS removed the dirty data check in #7795 [^5]
* @mattmacy forked the OpenZFS repo and began to add FreeBSD support
* OpenZFS PR #9115[^1] recreated the same dirty data check that Illumos
  used, in slightly different form.  At this point the OpenZFS repo did
  not yet have multi-OS support.
* Matt Macy merged in FreeBSD support in #8987[^2] , but it was based on
  slightly outdated OpenZFS code.

In my local testing, this vastly improves the reboot speed of a server
with a large pool that has 1000 datasets and is resilvering an HDD.

[^1]: https://github.com/openzfs/zfs/pull/9115
[^2]: https://github.com/openzfs/zfs/pull/8987
[^3]: https://github.com/freebsd/freebsd-src/commit/10b9d77bf1ccf2f3affafa6261692cb92cf7e992
[^4]: https://github.com/illumos/illumos-gate/commit/5aaeed5c617553c4cec6328c1f4c19079a5a495a
[^5]: https://github.com/openzfs/zfs/pull/7795

Sponsored by: Axcient
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Alan Somers <asomers@gmail.com>
Closes #16268
2024-08-09 14:32:59 -07:00
Rob Norris cf6e8b218d zstream: remove duplicate highbit64 definition
When building a static build (--disable-shared), zstream fails to link
because of the duplicate highbit64() in libzpool/kernel.c. Since they're
identical, and the libzpool one is visible to zstream, we remove
zstream's copy and just use the common one.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16426
2024-08-09 14:31:41 -07:00
Rob Norris b0bf14cdb5 abd: lift ABD zero scan from zio_compress_data() to abd_cmp_zero()
It's now the caller's responsibility do special handling for holes if
that's something it wants.

This also makes zio_compress_data() and zio_decompress_data() properly
the inverse of each other.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Jason Lee <jasonlee@lanl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16326
2024-08-09 14:30:26 -07:00
Brian Atkinson f87fe67b44 Updating bash completion build file
Commit 46ebd0a updated the build system to make symbolic link for zpool.
However, this commit did not update the automake file to also add the
symbolic link to the CLEANFILES variable. This is necessary so the link
is removed when running make clean/distclean.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Brian Atkinson <batkinson@lanl.gov>
Closes #16422
2024-08-08 15:39:25 -07:00
Rob Norris 8041b2f019 contrib: bash_completion.d: force zpool symlink recreation
ln will fail if the target already exists, which causes make to bail
out. Adding -f makes it more "compiler-like", overwriting the target
instead.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Mateusz Piotrowski <0mp@FreeBSD.org>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16423
2024-08-08 15:36:09 -07:00
Alexander Motin 3ae05e34e5 Linux: Make zfs_prune() fair on NUMA systems
Previous code evicted nr_to_scan items from each NUMA node.  This
not only multiplied the eviction by the number of nodes, but could
exhaust the smaller ones, evicting inodes used by acive workload
and requiring their immediate recreation.  This patch spreads the
requested eviction between all NUMA nodes proportionally to their
evictable counts, which should be closer to expected LRU logic.
See kernel's super_cache_scan() as a similar logic example.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16397
2024-08-08 15:33:36 -07:00
Alexander Motin 5b9f3b7664 Soften pruning threshold on not evictable metadata
Previous code pruned 10% of dnodes once 3/4 of metadata appeared
unevictable.  On workloads with many millions of dnodes and little
other metadata it creates significant load spikes for many seconds
straight.  This change instead gradually increases pruning as
unevictable metadata grow above the 3/4, which may allow it to
stabilize at some level.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16401
2024-08-08 15:26:35 -07:00
Alexander Motin aef452f108 Improve zfs_blkptr_verify()
- Skip config lock enter/exit for embedded blocks.  They have no
DVAs, so there is nothing to check under the lock.
 - Skip CHECKSUM check and properly check PSIZE for embedded blocks.
 - Add static branch predictions for unlikely conditions.
 - Do not verify DVAs for blocks already in ARC.  ARC hit already
"verified" the first (often the only) DVA, and it does not worth to
enter/exit config lock for nothing.

Some profiles show me up to 3% of CPU saving from this change.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16387
2024-08-08 15:25:10 -07:00
Mateusz Piotrowski 24e6585e76 libzfs.h: Set ZFS_MAXPROPLEN and ZPOOL_MAXPROPLEN to ZAP_MAXVALUELEN
So far, the values of ZFS_MAXPROPLEN and ZPOOL_MAXPROPLEN were equal to
MAXPATHLEN, which is 1024 on FreeBSD and 4096 on Linux. This wasn't
ideal. Some of the surprising outcomes of this implementation are:

1. When creating a pool user property with zpool-set(8), libzfs makes
   sure that the length of the property's value is less than
   ZFS_MAXPROPLEN. However, the ZFS kernel module does not do that.
   Instead, it checks the length against ZAP_MAXVALUELEN. As a result,
   it is possible to create a property the length of which is going to
   be larger than zpool(8) is ready to read.
2. A pool user property created on Linux is too big to be read on
   FreeBSD.

This change sets both ZFS_MAXPROPLEN and ZPOOL_MAXPROPLEN to
ZAP_MAXVALUELEN, which is 8192 at the moment.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Mateusz Piotrowski <0mp@FreeBSD.org>
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Closes #16248
2024-08-08 15:23:40 -07:00
Mateusz Piotrowski 2558518c5d zpoolprops.7: Fix max length of name of user property
The documentation mentioned that the property name can be 256 characters
long. This was incorrect. The last byte is reserved for NUL, so the
name provided by the operator can be only 255 characters long.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Mateusz Piotrowski <0mp@FreeBSD.org>
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Closes #16248
2024-08-08 15:23:33 -07:00
Mateusz Piotrowski 7ceb9ad630 tests: user_property_001_pos: Remove unnecessary evals
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Mateusz Piotrowski <0mp@FreeBSD.org>
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Closes #16248
2024-08-08 15:23:15 -07:00
Mateusz Piotrowski b38fccc646 tests: user_property: Clarify comments
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Mateusz Piotrowski <0mp@FreeBSD.org>
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Closes #16248
2024-08-08 15:21:52 -07:00
Ameer Hamza 5536c0dee2 Sync AUX label during pool import
Spare and l2cache vdev labels are not updated during import. Therefore,
if disk paths are updated between pool export and import, the AUX label
still shows the old paths. This patch syncs the AUX label
during import to show the correct path information.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Umer Saleem <usaleem@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #15817
2024-08-08 15:16:46 -07:00
Mark Johnston 0ccd4b9d01 ZTS: Add a test to verify that copy_file_range obeys RLIMIT_FSIZE
Signed-off-by: Mark Johnston <markj@FreeBSD.org>

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2024-08-08 10:44:13 -07:00
Mark Johnston dea8fabf73 FreeBSD: Fix RLIMIT_FSIZE handling for block cloning
ZFS implements copy_file_range(2) using block cloning when possible.
This implementation must respect the RLIMIT_FSIZE limit.

zfs_clone_range() already checks the limit, so it is safe to remove this
check in zfs_freebsd_copy_file_range().  Moreover, the removed check
produces false positives: the length passed to copy_file_range(2) may be
larger than the input file size; as the man page notes, "for best
performance, call copy_file_range() with the largest len value
possible."  In particular, some existing code passes SSIZE_MAX there.

The check in zfs_clone_range() clamps the length to the input file's
size before checking, but the removed check uses the caller supplied
length, so something like

$ echo a > /tmp/foo
$ limits -f 1024 cat /tmp/foo > /tmp/bar

fails because FreeBSD's cat(1) uses copy_file_range(2) in the manner
described above.

Reported-by: Philip Paeps <philip@FreeBSD.org>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2024-08-08 10:43:02 -07:00
Alan Somers 1f5bf91a85 Fix memory corruption during parallel zpool import with -o cachefile (#16419)
When importing multiple pools, the nvlist of properties given with "-o"
is shared amongst the several threads.  So no thread should modify it.
Previously, in the course of validating the cachefile property, the
zpool_valid_proplist function would temporarily modify the value, and
then change it back.  Now it will operate on a clone of the value.

Sponsored by:   Axcient
Fixes #16405
Signed-off-by: Alan Somers <asomers@gmail.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: George Wilson <gwilson@delphix.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
2024-08-07 13:44:55 -07:00
Tino Reichardt bd949b10be ZTS: small fix for SEEK_DATA/SEEK_HOLE tests (#16413)
Some libc's like uClibc lag the proper definition of SEEK_DATA
and SEEK_HOLE. Since we have only two files in ZTS which use
these definitons, let's define them by hand:

```
#ifndef SEEK_DATA
#define SEEK_DATA 3
#endif
#ifndef SEEK_HOLE
#define SEEK_HOLE 4
#endif
```

There should be no failures, because:
- FreeBSD has support for SEEK_DATA/SEEK_HOLE since FreeBSD 8
- Linux has it since Linux 3.1
- the libc will submit the parameters unchanged to the kernel

Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
2024-08-07 09:52:37 -07:00
Allan Jude cbcb522439 Fix the names of some FreeBSD sysctls in include/tunables.cfg (#16395)
Sponsored-by: Klara, Inc.

Signed-off-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
2024-08-06 16:36:55 -07:00
Tino Reichardt 90a02d7063 ZTS: fix io_uring test on RHEL 9 variants (#16411)
Simplify the test, by using the variable "$PLATFORM_ID" in favor
of "$REDHAT_SUPPORT_PRODUCT_VERSION".

Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
2024-08-06 16:30:11 -07:00
Tony Hutter 02a9f7fed7 JSON: Fix class values for mirrored special vdevs
This fixes things so mirrored special vdevs report themselves as
"class=special" rather than "class=normal".

This happens due to the way the vdev nvlists are constructed:

mirrored special devices - The 'mirror' vdev has allocation bias as
"special" and it's leaf vdevs are "normal"

single or RAID0 special devices - Leaf vdevs have allocation bias as
"special".

This commit adds in code to check if a leaf's parent is a "special"
vdev to see if it should also report "special".

Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Reviewed-by: Umer Saleem <usaleem@ixsystems.com>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #16217
2024-08-06 12:47:58 -07:00
Tony Hutter dab810014e ZTS: Add zfs/zpool JSON sanity tests
Run basic JSON validation tests on the new `zfs|zpool -j` output.

Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Reviewed-by: Umer Saleem <usaleem@ixsystems.com>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #16217
2024-08-06 12:47:15 -07:00
Umer Saleem 959e963c81 JSON output support for zpool status
This commit adds support for zpool status command to displpay status
of ZFS pools in JSON format using '-j' option. Status information is
collected in nvlist which is later dumped on stdout in JSON format.
Existing options for zpool status work with '-j' flag. man page for
zpool status is updated accordingly.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Closes #16217
2024-08-06 12:47:10 -07:00
Umer Saleem 4e6b3f7e1d JSON output support for zpool list
This commit adds support for zpool list command to output the list of
ZFS pools in JSON format using '-j' option.. Information about available
pools is collected in nvlist which is later printed to stdout in JSON
format.

Existing options for zfs list command work with '-j' flag. man page for
zpool list is updated accordingly.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Closes #16217
2024-08-06 12:47:04 -07:00
Umer Saleem eb2b824bde JSON output support for zpool get
This commit adds support for zpool get command to output the list of
properties for ZFS Pools and VDEVS in JSON format using '-j' option.
Man page for zpool get is updated to include '-j' option.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Closes #16217
2024-08-06 12:47:00 -07:00
Umer Saleem 5cbdd5ea4f JSON output support for zpool version
This commit adds support for zpool version to output in JSON format
using '-j' option. Userland kernel module version is collected in nvlist
which  is later displayed in JSON format. man page for zpool is updated.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Closes #16217
2024-08-06 12:46:51 -07:00
Umer Saleem cad4c0ef1a JSON output support zfs mount
This commit adds support for zfs mount to display mounted file systems
in JSON format using '-j' option. Data is collected in nvlist which is
printed in JSON format. man page for zfs mount is updated accordingly.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Closes #16217
2024-08-06 12:46:40 -07:00
Umer Saleem 443abfc71d JSON output support for zfs list
This commit adds support for JSON output for zfs list using '-j' option.
Information is collected in JSON format which is later printed in jSON
format. Existing options for zfs list also work with '-j'. man pages are
updated with relevant information.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Closes #16217
2024-08-06 12:46:34 -07:00
Umer Saleem aa15b60e58 JSON output support for zfs version and zfs get
This commit adds support for JSON output for zfs version and zfs get
commands. '-j' flag can be used to get output in JSON format.

Information is collected in nvlist objects which is later printed in
JSON format. Existing options that work for zfs get and zfs version
also work with '-j' flag.

man pages for zfs get and zfs version are updated accordingly.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Closes #16217
2024-08-06 12:45:45 -07:00
Rob Norris 6c7d41a643 ZTS: remove skips for zvol_misc tests
Last commit should fix the underlying problem, so these should be
passing reliably again.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16364
2024-08-06 12:08:24 -07:00
Rob Norris 670147be53 zvol: ensure device minors are properly cleaned up
Currently, if a minor is in use when we try to remove it, we'll skip it
and never come back to it again. Since the zvol state is hung off the
minor in the kernel, this can get us into weird situations if something
tries to use it after the removal fails. It's even worse at pool export,
as there's now a vestigial zvol state with no pool under it. It's
weirder again if the pool is subsequently reimported, as the zvol code
(reasonably) assumes the zvol state has been properly setup, when it's
actually left over from the previous import of the pool.

This commit attempts to tackle that by setting a flag on the zvol if its
minor can't be removed, and then checking that flag when a request is
made and rejecting it, thus stopping new work coming in.

The flag also causes a condvar to be signaled when the last client
finishes. For the case where a single minor is being removed (eg
changing volmode), it will wait for this signal before proceeding.
Meanwhile, when removing all minors, a background task is created for
each minor that couldn't be removed on the spot, and those tasks then
wake and clean up.

Since any new tasks are queued on to the pool's spa_zvol_taskq,
spa_export_common() will continue to wait at export until all minors are
removed.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #14872
Closes #16364
2024-08-06 12:08:14 -07:00
Rob Norris 88aab1d2d0 linux/zvol_os: fix SET_ERROR with negative return codes
SET_ERROR is our facility for tracking errors internally. The negation
is to match the what the kernel expects from us. Thus, the negation
should happen outside of the SET_ERROR.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16364
2024-08-06 12:07:31 -07:00
Rob Norris 9b9a3934ad zvol_impl: document and tidy flags
ZVOL_DUMPIFIED is a vestigial Solaris leftover, and not used anywhere.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16364
2024-08-06 12:06:46 -07:00
Rob Norris 6c82951d11 FreeBSD: remove support for FreeBSD < 13.0-RELEASE (#16372)
This includes the last 12.x release (now EOL) and 13.0 development
versions (<1300139).

Sponsored-by: https://despairlabs.com/sponsor/

Signed-off-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2024-08-05 16:56:45 -07:00
Tino Reichardt e9f51ebd94 ZTS: fix zfs_copies_006_pos test on Ubuntu 20.04 (#16409)
This test was failing before:
- FAIL cli_root/zfs_copies/zfs_copies_006_pos (expected PASS)

Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
2024-08-05 16:18:07 -07:00
Tino Reichardt 8d4ad5adc7 ZTS: fix history_007_pos test on Ubuntu 24.04 (#16410)
The timezone "US/Mountain" isn't supported on newer linux versions.
Using the correct timezone "America/Denver" like it's done in FreeBSD
will fix this. Older Linux distros should behave also okay with this.

Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
2024-08-05 16:17:23 -07:00
Shengqi Chen 46ebd0af8a contrib: link zpool to zfs in bash-completion (#16376)
Currently user won't have completion of `zpool` command until they
trigger completion of `zfs` first. This patch adds a link to `zfs`,
thus user can use both to initialize the completion.

Fixes: #16320

Signed-off-by: Shengqi Chen <harry-chen@outlook.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Rob Norris <rob.norris@klarasystems.com>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
2024-08-05 09:44:10 -07:00
Alexander Motin 1fdcb653bc Once more refactor arc_summary output
Before this arc_summary was not reporting any information about
evictable ARC memory.  As result I've found difficult to analyze
behavior of dnode-heavy workload with lots of unevictable buffers.

This change adds evictable sizes into states breakdown section.
While there, add/refactor sections for global memory statistics,
for ARC breakdown between different structures, for data/metadata.
Add information about memory reclamation requests.

While there, refactor and polish graph mode, neglected for a while.

Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Umer Saleem <usaleem@ixsystems.com>
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2024-08-05 09:22:51 -07:00
Alexander Motin cdd53fea1e FreeBSD: Add missing memory reclamation accounting
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Umer Saleem <usaleem@ixsystems.com>
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2024-08-05 09:21:29 -07:00
Brian Atkinson c8184d714b Block cloning conditionally destroy ARC buffer
dmu_buf_will_clone() calls arc_buf_destroy() if there is an associated
ARC buffer with the dbuf. However, this can only be done conditionally.
If the previous dirty record's dr_data is pointed at db_dbf then
destroying it can lead to NULL pointer deference when syncing out the
previous dirty record.

This updates dmu_buf_fill_clone() to only call arc_buf_destroy() if the
previous dirty records dr_data is not pointing to db_buf. The block
clone wil still set the dbuf's db_buf and db_data to NULL, but this will
not cause any issues as any previous dirty record dr_data will still be
pointing at the ARC buffer.

Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Brian Atkinson <batkinson@lanl.gov>
Closes #16337
2024-08-01 18:22:43 -07:00
Tino Reichardt c092bddfe7 Fix sa.c to build on FreeBSD again. (#16403)
Fix multiple build errors on FreeBSD.

The main reason is, that the variable 'dxattr_obj' is used
uninitialized within the start of the 'out label'.

Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
2024-08-01 13:04:08 -07:00
Jitendra Patidar d60debbf59 Fix sa_add_projid to lookup and update SA_ZPL_DXATTR (avoid DXATTR loss) (#16288)
sa_add_projid() gets called via zfs_setattr() for setting project id
on old file/dir, which were created before upgrading to project quota
feature. This function does lookup for all possible SA and update them
all together along with project ID at needed fixed offset. But its
missing lookup and update of SA_ZPL_DXATTR, effectively it losses
SA_ZPL_DXATTR.

Closes #16287
Signed-off-by: Jitendra Patidar <jitendra.patidar@nutanix.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Rob Norris <rob.norris@klarasystems.com>
2024-07-31 18:41:49 -07:00
Chunwei Chen c21dc56ea3 Fix zdb_dump_block for little endian (#16310)
The endian macros were changed but zdb_dump_block wasn't updated
accordingly.

Signed-off-by: Chunwei Chen <david.chen@nutanix.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Allan Jude <allan@klarasystems.com>
2024-07-31 18:33:39 -07:00
c1ick ec580bc520 zfs: add bounds checking to zil_parse (#16308)
Make sure log record don't stray beyond valid memory region.

There is a lack of verification of the space occupied by fixed members
of lr_t in the zil_parse.

We can create a crafted image to trigger an out of bounds read by
following these steps:
    1) Do some file operations and reboot to simulate abnormal exit
       without umount
    2) zil_chain.zc_nused: 0x1000
    3) First lr_t
       lr_t.lrc_txtype: 0x0
       lr_t.lrc_reclen: 0x1000-0xb8-0x1
       lr_t.lrc_txg: 0x0
       lr_t.lrc_seq: 0x1
    4) Update checksum in zil_chain.zc_eck

Fix:
Add some checks to make sure the remaining bytes are large enough to
hold an log record.

Signed-off-by: XDTG <click1799@163.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
2024-07-31 17:17:04 -07:00
Alexander Motin d4b5517ef9 Linux: Report reclaimable memory to kernel as such (#16385)
Linux provides SLAB_RECLAIM_ACCOUNT and __GFP_RECLAIMABLE flags to
mark memory allocations that can be freed via shinker calls.  It
should allow kernel to tune and group such allocations for lower
memory fragmentation and better reclamation under pressure.

This patch marks as reclaimable most of ARC memory, directly
evictable via ZFS shrinker, plus also dnode/znode/sa memory,
indirectly evictable via kernel's superblock shrinker.

Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Allan Jude <allan@klarasystems.com>
2024-07-30 11:40:47 -07:00
Rob Norris d54d0fff39 dnode: allow storage class to be overridden by object type
spa_preferred_class() selects a storage class based on (among other
things) the DMU object type. This only works for old-style object types
that match only one specific kind of thing. For DMU_OTN_ types we need
another way to signal the storage class.

This commit allows the object type to be overridden in the IO policy for
the purposes of choosing a storage class. It then adds the ability to
set the storage type on a dnode hold, such that all writes generated
under that hold will get it.

This method has two shortcomings:

- it would be better if we could "name" a set of storage class
  preferences rather than it being implied by the object type.
- it would be better if this info were stored in the dnode on disk.

In the absence of those things, this seems like the smallest possible
change.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: iXsystems, Inc.
Closes #15894
2024-07-29 17:05:41 -07:00
Rob Norris e26b3771ee spa_preferred_class: pass the entire zio
Rather than picking out specific values out of the properties, just pass
the entire zio in, to make it easier in the future to use more of that
info to decide on the storage class.

I would have rathered just pass io_prop in, but having spa.h include
zio.h gets a bit tricky.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: iXsystems, Inc.
Closes #15894
2024-07-29 17:05:08 -07:00
Alexander Motin ed87d456e4 Skip dnode handles use when not needed
Neither FreeBSD nor Linux currently implement kmem_cache_set_move(),
which means dnode_move() is never called.  In such situation use of
dnode handles with respective locking to access dnode from dbuf is
a waste of time for no benefit.

This patch implements optional simplified code for such platforms,
saving at least 3 dnode lock/dereference/unlock per dbuf life cycle.
Originally I hoped to drop the handles completely to save memory,
but they are still used in dnodes allocation code, so left for now.

Before this change in CPU profiles of some workloads I saw 4-20% of
CPU time spent in zrl_add_impl()/zrl_remove(), which are gone now.

Reviewed-by: Rob Wing <rob.wing@klarasystems.com
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:  Alexander Motin <mav@FreeBSD.org>
Sponsored by:   iXsystems, Inc.
Closes #16374
2024-07-29 14:48:12 -07:00
Alexander Motin 1a3e32e6a2 Cleanup DB_DNODE() macros usage
- Use the macros in few places it was missed.
 - Reduce scope of DB_DNODE_ENTER/EXIT() and inline some DB_DNODE()
uses to make it more obvious what exactly is protected there and
make unprotected accesses by mistake more difficult.
 - Make use of zrl_owner().

Reviewed-by: Rob Wing <rob.wing@klarasystems.com
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16374
2024-07-29 14:47:01 -07:00
Allan Jude 62e7d3c89e ddt: add support for prefetching tables into the ARC
This change adds a new `zpool prefetch -t ddt $pool` command which
causes a pool's DDT to be loaded into the ARC. The primary goal is to
remove the need to "warm" a pool's cache before deduplication stops
slowing write performance. It may also provide a way to reload portions
of a DDT if they have been flushed due to inactivity.

Sponsored-by: iXsystems, Inc.
Sponsored-by: Catalogics, Inc.
Sponsored-by: Klara, Inc.
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Will Andrews <will.andrews@klarasystems.com>
Signed-off-by: Fred Weigel <fred.weigel@klarasystems.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Signed-off-by: Don Brady <don.brady@klarasystems.com>
Co-authored-by: Will Andrews <will.andrews@klarasystems.com>
Co-authored-by: Don Brady <don.brady@klarasystems.com>
Closes #15890
2024-07-26 09:16:18 -07:00
Jitendra Patidar 2ed1aebaf6 Fix ZDB to dump projid for projectquota enabled (#16291)
ZDB is supposed to dump "projid" via dump_znode(), when projectquota
is enabled.
-----------
static void
dump_znode(objset_t *os, uint64_t object, void *data, size_t size)
{
...
    if (dmu_objset_projectquota_enabled(os) && (pflags & ZFS_PROJID)) {
	uint64_t projid;

	if (sa_lookup(hdl, sa_attr_table[ZPL_PROJID], &projid,
	    sizeof (uint64_t)) == 0)
		(void) printf("\tprojid %llu\n", (u_longlong_t)projid);
    }
...
}
----------
But its not dumping "projid", even for project quota enabled.

dmu_objset_projectquota_enabled() does following 3 checks,
----------
boolean_t
dmu_objset_projectquota_enabled(objset_t *os)
{
        return (file_cbs[os->os_phys->os_type] != NULL &&
            DMU_PROJECTUSED_DNODE(os) != NULL &&
            spa_feature_is_enabled(os->os_spa,
		SPA_FEATURE_PROJECT_QUOTA));
}
----------
It fails on file_cbs[] check. file_cbs[] gets initialised via
dmu_objset_register_type(); which is not done for the ZDB, its done for
the kernel via zfs_init().

Register a dummy callback handle for the DMU_OST_ZFS type in
ZDB main() function to dump the projid for projectquota enabled.

Signed-off-by: Jitendra Patidar <jitendra.patidar@nutanix.com>
Closes #16290
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
2024-07-25 17:18:11 -07:00
Rob Norris 7ddc1f737f zil: add stats for commit failure/fallback (#16315)
There's no good way to tell when a ZIL commit fails and falls back to a
transaction sync, other than perhaps a throughput drop. This adds
counters so we can see when it happens and why.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.

Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2024-07-25 16:53:59 -07:00
Alexander Motin 2fc646160f Replace goo.gl style link (#16373)
That URL shortening scheme should stop working soon [1], while we
don't really need it here.

1. https://developers.googleblog.com/en/google-url-shortener-links-will-no-longer-be-available/

Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Rob Norris <rob.norris@klarasystems.com>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2024-07-25 11:00:32 -07:00
Alexander Motin 55427add3c Several improvements to ARC shrinking (#16197)
- When receiving memory pressure signal from OS be more strict
trying to free some memory.  Otherwise kernel may come again and
request much more.  Return as result how much arc_c was actually
reduced due to this request, that may be less than requested.
 - On Linux when receiving direct reclaim from some file system
(that may be ZFS) instead of ignoring request completely, just
shrink the ARC, but do not wait for eviction.  Waiting there may
cause deadlock.  Ignoring it as before may put extra pressure on
other caches and/or swap, and cause OOM if nothing help.  While
not waiting may result in more ARC evicted later, and may be too
late if OOM killer activate right now, but I hope it to be better
than doing nothing at all.
 - On Linux set arc_no_grow before waiting for reclaim, not after,
or it may grow back while we are waiting.
 - On Linux add new parameter zfs_arc_shrinker_seeks to balance
ARC eviction cost, relative to page cache and other subsystems.
 - Slightly update Linux arc_set_sys_free() math for new kernels.

Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Reviewed-by: Rob Norris <rob.norris@klarasystems.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2024-07-25 10:31:14 -07:00
Allan Jude c7ada64bb6 ddt: dedup table quota enforcement
This adds two new pool properties:
- dedup_table_size, the total size of all DDTs on the pool; and
- dedup_table_quota, the maximum possible size of all DDTs in the pool

When set, quota will be enforced by checking when a new entry is about
to be created. If the pool is over its dedup quota, the entry won't be
created, and the corresponding write will be converted to a regular
non-dedup write. Note that existing entries can be updated (ie their
refcounts changed), as that reuses the space rather than requiring more.

dedup_table_quota can be set to 'auto', which will set it based on the
size of the devices backing the "dedup" allocation device. This makes it
possible to limit the DDTs to the size of a dedup vdev only, such that
when the device fills, no new blocks are deduplicated.

Sponsored-by: iXsystems, Inc.
Sponsored-By: Klara Inc.
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Signed-off-by: Don Brady <don.brady@klarasystems.com>
Co-authored-by: Don Brady <don.brady@klarasystems.com>
Co-authored-by: Rob Wing <rob.wing@klarasystems.com>
Co-authored-by: Sean Eric Fagan <sean.fagan@klarasystems.com>
Closes #15889
2024-07-25 09:47:36 -07:00
Alexander Motin 82f281ad99 ZTS: Make do_vol_test() more deterministic (#16379)
- Explicitly disable compression since mkfile uses a zero buffer.
 - Explicitly sync file systems instead of waiting for timeout.

Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2024-07-24 09:33:30 -07:00
Tony Hutter a1be921673 Linux 6.9: Fix UBSAN errors in sa.c (#16380)
This is a follow-on to 156a64161b
that ignores UBSAN errors in sa.c.

Thank you @thwalker3 for the fix.

Original-patch-by: @thwalker3
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #16278
Closes #16330
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
2024-07-23 17:13:04 -07:00
rmacklem dbe07928ba Add support for multiple lines to the sharenfs property for FreeBSD (#16338)
There has been a bugzilla PR#147881 requesting this
for a long time (14 years!). It extends the syntax of
the ZFS shanenfs property (for FreeBSD only) to allow
multiple sets of options for different hosts/nets,
separated by ';'s.

Signed-off-by:	Rick Macklem <rmacklem@FreeBSD.org>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2024-07-23 16:38:19 -07:00
Don Brady fb6d8cf229 Add some missing vdev properties (#16346)
Sponsored-by: Klara, Inc.
Sponsored-By: Wasabi Technology, Inc.

Signed-off-by: Don Brady <don.brady@klarasystems.com>
Co-authored-by: Don Brady <don.brady@klarasystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2024-07-23 16:34:09 -07:00
Rob Norris 6657f89eca AUTHORS: refresh with recent new contributors (#16362)
Sponsored-by: https://despairlabs.com/sponsor/

Signed-off-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: George Melikov <mail@gmelikov.ru>
2024-07-23 11:47:04 -07:00
Chunwei Chen 9dfc5c4a0c Fix long_free_dirty accounting for small files (#16264)
For files smaller than recordsize, it's most likely that they don't have
L1 blocks. However, current calculation will always return at least 1 L1
block.

In this change, we check dnode level to figure out if it has L1 blocks
or not, and return 0 if it doesn't. This will reduce the chance of
unnecessary throttling when deleting a large number of small files.

Signed-off-by: Chunwei Chen <david.chen@nutanix.com>
Co-authored-by: Chunwei Chen <david.chen@nutanix.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
2024-07-23 11:34:19 -07:00
Tino Reichardt 37275fd109 ZTS: Change cp_stress to fit timings (#16369)
cp_stress is getting killed on the new QEMU-based github runners
we're developing. The problem is that the Linux based runners
should do 10 RUNS, where the FreeBSD based runners only have 3
RUNS to succeed.

This patch removes this different handling of Linux and FreeBSD.
The cp_stress test is running fine in around 2 minutes now.

Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2024-07-22 14:03:22 -07:00
Rob Norris aea42e1379 zdb: fix BRT dump (#16335)
BRT refcounts are stored as eight uint8_ts rather than a single
uint64_t. This means that za_first_integer is only the first byte, so
max 256. This fixes it by doing a lookup for the whole value.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.

Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
2024-07-18 10:51:27 -07:00
glibg10b 1147a27978 Fix printf typo for zfs receive -cv (#16295)
Current output:
> receiving  correctivefull stream of a into b
New output:
> receiving corrective full stream of a into b

Signed-off-by: glibg10b <56197853+glibg10b@users.noreply.github.com>
Reviewed-by: Rob Norris <rob.norris@klarasystems.com>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2024-07-17 17:18:12 -07:00
youzhongyang ab6d9bd89a Make sure avl_tree.avl_pad is not in kernel module (#16280)
The commit b192a2c (Remove avl_size field from struct avl_tree) uses a
def _KERNEL to decide to include avl_pad or not, but this _KERNEL is
defined in sys/sysmacros.h. If avl.h and sysmacros.h are not included
in the right order, it can cause a headache when working on a zfs
related kernel module.

Add sysmacros.h in avl_impl.h to fix. sysmacros.h is also removed
from spa.h as it's reduntant.

Signed-off-by: Youzhong Yang <yyang@mathworks.com>
Co-authored-by: Youzhong Yang <yyang@mathworks.com>
Reviewed-by: Rob Norris <rob.norris@klarasystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2024-07-17 13:54:11 -07:00
Rob Norris dc91e74524 zdb: dump ZAP_FLAG_UINT64_KEY ZAPs properly (#16334)
These are used for DDT and BRT stores. There's limited information
available to produce meaningful output, but at least we can put
something on screen rather than crashing.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.

Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2024-07-17 12:02:28 -07:00
Rob Norris 5de3ac2236 vdev_open: clear async fault flag after reopen
After c3f2f1aa2, vdev_fault_wanted is set on a vdev after a probe fails.
An end-of-txg async task is charged with actually faulting the vdev.

In a single-disk pool, the probe failure will degrade the last disk, and
then suspend the pool. However, vdev_fault_wanted is not cleared. After
the pool returns, the transaction finishes and the async task runs and
faults the vdev, which suspends the pool again.

The fix is simple: when reopening a vdev, clear the async fault flag. If
the vdev is still failed, the startup probe will quickly notice and
degrade/suspend it again. If not, all is well!

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Co-authored-by: Don Brady <don.brady@klarasystems.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Don Brady <don.brady@klarasystems.com>
2024-07-17 10:03:41 -07:00
Rob Norris 393b7ad695 zts: test single-disk pool resumes properly after disk pull
A single disk pool should suspend when its disk fails and hold the IO.
When the disk is returned, the pool should return and the IO be
reissued, leaving everything in good shape.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Don Brady <don.brady@klarasystems.com>
2024-07-17 10:03:32 -07:00
Jason Lee 41902c8e6d Use kmap_local_page instead of kmap_atomic (#16329)
Changed zfs_k(un)map_atomic to zfs_k(un)map_local

Signed-off-by: Jason Lee <jasonlee@lanl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
2024-07-16 17:27:29 -07:00
Tony Hutter f2ebbe46f6 Linux 6.9 compat: META (#16358)
Update the META file to reflect compatibility with the 6.9
kernel.

Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
2024-07-16 16:27:54 -07:00
Rob Norris 7ca7bb7fd7 Linux 5.16: use bdev_nr_bytes() to get device capacity
This helper was introduced long ago, in 5.16. Since 6.10, bd_inode no
longer exists, but the helper has been updated, so detect it and use it
in all versions where it is available.

Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
2024-07-15 17:10:06 -07:00
Rob Norris e951dba48a Linux 6.10: work harder to avoid kmem_cache_alloc reuse
Linux 6.10 change kmem_cache_alloc to be a macro, rather than a
function, such that the old #undef for it in spl-kmem-cache.c would
remove its definition completely, breaking the build.

This inverts the model used before. Rather than always defining the
kmem_cache_* macro, then undefining then inside spl-kmem-cache.c,
instead we make a special tag to indicate we're currently inside
spl-kmem-cache.c, and not defining those in macros in the first place,
so we can use the kernel-supplied kmem_cache_* functions to implement
spl_kmem_cache_*, as we expect.

For all other callers, we create the macros as normal and remove access
to the kernel's own conflicting names.

Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
2024-07-15 17:10:02 -07:00
Rob Norris b409892ae5 Linux 6.10: rework queue limits setup
Linux has started moving to a model where instead of applying block
queue limits through individual modification functions, a complete
limits structure is built up and applied atomically, either when the
block device or open, or some time afterwards. As of 6.10 this
transition appears only partly completed.

This commit matches that model within OpenZFS in a way that should work
for past and future kernels. We set up a queue limits structure with any
limits that have had their modification functions removed. For newer
kernels that can have limits applied at block device open
(HAVE_BLK_ALLOC_DISK_2ARG), we have a conversion function to turn the
OpenZFS queue limits structure into Linux's queue_limits structure,
which can then be passed in. For older kernels, we provide an
application function that just calls the old functions for each limit in
the structure.

Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
2024-07-15 17:09:55 -07:00
Zhao Yongming 4ee66cdf4e Add building support for Artix Linux (#16265)
Artix Linux is systemd free distribution based on Arch Linux, with
openrc dinit runit s6 as init alternatives. This patch will make
init scripts installation work the way Gentoo Linux with openrc.

The scripts tweaking for other init will be left to packager.

Signed-off-by: Yongming Zhao <ming.zym@gmail.com>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2024-07-15 16:58:00 -07:00
Mateusz Guzik a7fc4c85e3 zstd: don't call zstd_mempool_reap if there are no buffers (#16302)
zfs_zstd_cache_reap_now is issued every second.

zstd_mempool_reap checks for both pool existence and buffer count, but
that's still 2 func calls which are trivially avoidable.

With clang it even avoids pushing the stack pointer (but still suffers
the mispredict due to a forward jump, not modified in case someone is
using zstd):

<+0>:     cmpq   $0x0,0x0(%rip)        # <zfs_zstd_cache_reap_now+8>
<+8>:     je     0x217de4 <zfs_zstd_cache_reap_now+36>
<+10>:    push   %rbp
<+11>:    mov    %rsp,%rbp
<+14>:    mov    0x0(%rip),%rdi        # <zfs_zstd_cache_reap_now+21>
<+21>:    call   0x217df0 <zstd_mempool_reap>
<+26>:    mov    0x0(%rip),%rdi        # <zfs_zstd_cache_reap_now+33>
<+33>:    pop    %rbp
<+34>:    jmp    0x217df0 <zstd_mempool_reap>
<+36>:    ret

Preferably the call would not be made to begin with if zstd is not used,
but this retains all the logic confined to zstd code.

Sponsored by:	Rubicon Communications, LLC ("Netgate")
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2024-07-15 14:51:37 -07:00
George Amanakis c87cb22ba9 head_errlog: fix use-after-free
In the commit of the head_errlog feature we introduced a bug in
dsl_dataset_promote_sync(): we may dereference origin_head and hds, both
dereferencing ddpa after calling promote_sync() on ddpa.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Chunwei Chen <david.chen@nutanix.com>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #16272
Closes #16273
2024-07-15 09:05:42 -07:00
Daniel Berlin f7d8b13336 Fix missing semicolon in trace_dbuf.h (#16281)
On fedora 40, on the 6.9.4 kernel (in updates-testing), assign_str
expands to a "do {<stuff> } while(0)" loop.  Without this semicolon,
the while(0) is unterminated, causing a cascade of useless errors.
With this semicolon, it compiles fine.  It also compiles fine on 6.8.11
(the previous kernel).  I have not tested earlier kernels than that, but
at worst it should add a pointless semicolon.

All other instances in the source tree are already terminated with
semicolons.

Signed-off-by: Daniel Berlin <dberlin@dberlin.org>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2024-07-12 17:44:10 -07:00
a1ea321 398e675f58 one-word manpage correction: snapshot->rollback (#16294)
This commit fixes what is probably a copy-paste mistake. The
`dracut.zfs` manpage claims that the `bootfs.rollback` option executes
`zfs snapshot -Rf`. `zfs snapshot` does not have a `-R` option. `zfs
rollback` does.

Signed-off-by: Alphan Yılmaz <alphanyilmaz@gmail.com>
Reviewed-by: Rob Norris <rob.norris@klarasystems.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2024-07-12 16:27:12 -07:00
Rob Norris cbd95a950a ZTS: handle FreeBSD version numbers correctly (#16340)
FreeBSD patchlevel versions are optional and, if present, in a different
location in the version string.

Sponsored-by: https://despairlabs.com/sponsor/

Signed-off-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2024-07-12 10:58:03 -07:00
Mark Johnston a10faf5ce6 FreeBSD: Use the new freeuio() helper to free dynamically allocated UIOs (#16300)
This freeuio() interface was introduced to FreeBSD recently.  For now
it simply calls free(), so this change has no effect.  However, this
may not always be true, and in CheriBSD this change is required.

Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brooks Davis <brooks.davis@sri.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2024-07-11 16:52:51 -07:00
Tony Hutter 156a64161b Linux 6.9: Fix UBSAN errors in zap_micro.c
You can use the UBSAN_SANITIZE_* Kbuild options to exclude certain
kernel objects from the UBSAN checks.  We previously excluded
zap_micro.o with:

UBSAN_SANITIZE_zap_micro.o := n

For some reason that didn't work for the 6.9 kernel, which wants us
to use:

UBSAN_SANITIZE_zfs/zap_micro.o := n

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #16278
Closes #16330
2024-07-11 16:41:26 -07:00
Mark Johnston 4367312760 zvol: Fix suspend lock leaks (#16270)
In several functions, we use a flag variable to track whether
zv_suspend_lock is held.  This flag was not getting reset in a
particular case where we need to retry the underlying operation,
resulting in a lock leak.  Make sure to update the flag where necessary.

Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2024-07-10 14:27:44 -07:00
Peter Doherty 326040b285 Fix the name of the zfs_prefetch_disable parameter (#16319)
The ZFS module parameter name is zfs_prefetch_disable, not
zfs_disable_prefetch.

Signed-off-by: Peter Doherty <peterd@acranox.org>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2024-07-09 09:59:55 -07:00
Tino Reichardt 9ffe441361 Fix zdb "Memory fault" found on FreeBSD ZTS (#16332)
Reason: nvlist_free() tries to free sth. which isn't allocted
Solution: init this variable with NULL

Closes #16311
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2024-07-09 09:36:17 -07:00
Mark Johnston f72e081fbf FreeBSD: Use a statement expression to implement SET_ERROR() (#16284)
This way we can avoid making assumptions about the SDT probe
implementation.  No functional change intended.

Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Rob Norris <rob.norris@klarasystems.com>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2024-07-08 17:59:08 -07:00
Mateusz Piotrowski fd51786f86 zfs.4: Document the actual default for zfs_txg_history (#16305)
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.

Signed-off-by: Mateusz Piotrowski <0mp@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2024-06-28 11:21:08 -07:00
Allan Jude 5f220c62e1 Fix a mis-merge in the zdb man page (#16304)
Sponsored-by: Klara, Inc.
Sponsored-By: Wasabi Technology, Inc.

Signed-off-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2024-06-28 10:38:22 -07:00
Tony Hutter 49f3ce3385 Linux 6.9: Call add_disk() from workqueue to fix zfs_allow_010_pos (#16282)
The 6.9 kernel behaves differently in how it releases block devices.  In
the common case it will async release the device only after the return
to userspace.  This is different from the 6.8 and older kernels which
release the block devices synchronously.  To get around this, call
add_disk() from a workqueue so that the kernel uses a different
codepath to release our zvols in the way we expect.  This stops
zfs_allow_010_pos from hanging.

Fixes: #16089

Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Rob Norris <rob.norris@klarasystems.com>
2024-06-28 09:52:03 -07:00
Martin Wagner c98295eed2 disable automatic dependency tracking for dkms builds
Previously the dkms build left some unwanted files
in `/usr/lib/modules` which could cause package
managers to not properly clean up old kernels.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Martin Wagner <martin.wagner.dev@gmail.com>
Closes #16221 
Closes #16241
2024-06-13 18:08:49 -07:00
Mateusz Guzik 121a2d3354 FreeBSD: unregister mountroot eventhandler on unload
Otherwise if zfs is unloaded and reroot is being used it trips over a
stale pointer.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #16242
2024-06-13 17:49:50 -07:00
bnovkov 20c8bdd85e FreeBSD: Update use of UMA-related symbols in arc_available_memory
Recent UMA changes repurposed the use of UMA_MD_SMALL_ALLOC in a way
that breaks arc_available_memory on -CURRENT. This change
ensures that arc_available_memory uses the new symbol
while maintaining compatibility with older FreeBSD releases.
    
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Bojan Novković <bnovkov@FreeBSD.org>
Closes #16230
2024-06-06 18:11:00 -07:00
Derek Schrock 4de260efe3 contrib/bash_completion.d: squelch FreeBSD seq when first < last
With seq x -1 z and x is less than z FreeBSD seq will print the error:

	$ seq 1 -1 2
	seq: needs positive increment

Hide this error.  Alternatively $COMP_CWORD could be checked for < 2.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Derek Schrock <dereks@lifeofadishwasher.com>
Closes #16234
2024-06-06 17:37:26 -07:00
Ameer Hamza b558f0a9d6 zdb: fix FreeBSD build failure
This fixes FreeBSD build failure with clang-18 after 23a489a got merged.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Rob Norris <rob.norris@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #16252
2024-06-06 17:01:26 -07:00
Ameer Hamza 23a489a411 zdb: detect cachefile automatically otherwise force import
If a pool is created with the cache file located in a non-default 
path /etc/default/zpool.cache, removed, or the cachefile property 
is set to none, zdb fails to show the pool unless we specify the 
cache file or use the -e option. This PR automates this process.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Akash B <akash-b@hpe.com>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #16071
2024-06-03 16:28:43 -07:00
Rob Norris a72751a342 icp: remove redundant FreeBSD check
We don't build illumos-crypto for FreeBSD.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16209
2024-05-31 15:13:59 -07:00
Rob Norris 4e714c0be1 icp: remove unused headers
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16209
2024-05-31 15:13:51 -07:00
Rob Norris ae512620d0 icp: remove skein module
Nothing calls it through the KCF interface, so this is all unused.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16209
2024-05-31 15:13:39 -07:00
Rob Norris f39241aeb3 icp: remove unused SHA2 HMAC mechanisms
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16209
2024-05-31 15:13:30 -07:00
Rob Norris 10de12e9ed icp: reorganise SHA2 digest mechanisms
sha2_mech_type_t serves double-duty, as the list of MAC providers and
also the algo type for direct callers to SHA2Init. Until we disentangle
that, reorganise it to make the separation more clear. While we're
there, remove the digest mechs we don't use.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16209
2024-05-31 15:13:23 -07:00
Rob Norris 1291c46ea4 icp: remove digest entry points
For whatever reason, we call digest mechanisms directly, not through the
KCF digest provider. So we can remove those entry points entirely.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16209
2024-05-31 15:13:16 -07:00
Rob Norris 94f1e56e41 icp: remove unused KCF_ macros
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16209
2024-05-31 15:13:06 -07:00
Rob Norris 4ed91dc26e icp: remove unusued incremental cipher methods
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16209
2024-05-31 15:12:59 -07:00
Rob Norris 57249bcddc icp: brutally remove unused AES modes
Still retaining the struture, for now.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16209
2024-05-31 15:12:51 -07:00
Rob Norris 4185179190 icp: remove unused blowfish_ctx and des_ctx
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16209
2024-05-31 15:12:31 -07:00
Tony Hutter a301dc364c ZTS: Fix redacted_send failures on FreeBSD
We're seeing failures for redacted_deleted and redacted_mount
on FreeBSD 13-15:

    09:58:34.74 diff: /dev/fd/3: No such file or directory
    09:58:34.74 ERROR: diff /dev/fd/3 /dev/fd/4 exited 2

The test was trying to diff the file listings between two directories to
see if they are the same.  The workaround is to do a string comparison
of the directory listings instead of using `diff`.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #16224
2024-05-31 15:11:00 -07:00
Zhenlei Huang e2357561b9 FreeBSD: Add const qualifier to members of struct opensolaris_utsname
These members have directly references to the global variables
exposed by the kernel. They are not going to be changed by this
kernel module.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Zhenlei Huang <zlei@FreeBSD.org>
Closes #16210
2024-05-30 09:58:20 -07:00
Pawel Jakub Dawidek 5137c132a5 zpool import output is not formated properly.
The 'zpool status' output assumes that the longest prefix is six
character long plus colon plus space, eg. 'status: ', 'action: '
or 'config: ' (so eight in total). This works well even when we have
messages that requires more than one line, as '\t' is exactly eight
characters, just like the longest prefix.

The 'zpool import' output is a bit different, as it may display the
comment pool property, then the longest prefix is 'comment: ', which is
nine characters long, not eight.
All the prefixes were given an extra space in front, but:
- 'status: ' did not get an extra space.
- Messages that require more than one line should use nine spaces of
  indentation, not eight.
- The extra space in front looks redundant if there is no comment
  property set on the given pool.

Fix it by adding an extra space to all prefixes, but only if the comment
property is defined. Also, when we need to continue the message in a new
line, use '\t ' for indentation.

While here, apply small corrections to a couple messages.

Before:

   pool: tank
     id: 7412636063178848859
  state: ONLINE
status: Some supported features are not enabled on the pool.
	(Note that they may be intentionally disabled if the
	'compatibility' property is set.)
 action: The pool can be imported using its name or numeric identif[...]
	some features will not be available without an explicit 'zp[...]
comment: Example comment.
 config:

	bclone      ONLINE
	  ada0      ONLINE

After:

  pool: tank
    id: 10180960571062436759
 state: ONLINE
status: Some supported features are not enabled on the pool.
	(Note that they may be intentionally disabled if the
	'compatibility' property is set.)
action: The pool can be imported using its name or numeric identifi[...]
	some features will not be available without an explicit 'zp[...]
config:

	tank        ONLINE
	  ada3      ONLINE

   pool: dozer
     id: 11028319538368222579
  state: ONLINE
 status: Some supported features are not enabled on the pool.
	 (Note that they may be intentionally disabled if the
	 'compatibility' property is set.)
 action: The pool can be imported using its name or numeric identif[...]
	 some features will not be available without an explicit 'z[...]
comment: Example comment.
 config:

	dozer       ONLINE
	  ada1      ONLINE

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Pawel Dawidek <pawel@dawidek.net>
Closes #16128
2024-05-29 13:34:59 -07:00
Martin Matuška ae22044da9 spl: fix compilation without HAVE_BACKTRACE
The __maybe_unused macro is defined in spl/sys/debug.h

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Martin Matuska <mm@FreeBSD.org>
Closes #16229
2024-05-29 10:51:01 -07:00
Pawel Jakub Dawidek 01c8efdd59 Simplify issig().
We always call it twice with JUSTLOOKING and then FORREAL.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Pawel Jakub Dawidek <pawel@dawidek.net>
Closes #16225
2024-05-29 10:49:11 -07:00
Brian Behlendorf 6b95031f56 zed: Add deadman-slot_off.sh zedlet
Optionally turn off disk's enclosure slot if an I/O is hung
triggering the deadman.

It's possible for outstanding I/O to a misbehaving SCSI disk to
neither promptly complete or return an error.  This can occur due
to retry and recovery actions taken by the SCSI layer, driver, or
disk.  When it occurs the pool will be unresponsive even though
there may be sufficient redundancy configured to proceeded without
this single disk.

When a hung I/O is detected by the kmods it will be posted as a
deadman event.  By default an I/O is considered to be hung after
5 minutes.  This value can be changed with the zfs_deadman_ziotime_ms
module parameter.  If ZED_POWER_OFF_ENCLOSURE_SLOT_ON_DEADMAN is set
the disk's enclosure slot will be powered off causing the outstanding
I/O to fail.  The ZED will then handle this like a normal disk failure.
By default ZED_POWER_OFF_ENCLOSURE_SLOT_ON_DEADMAN is not set.

As part of this change `zfs_deadman_events_per_second` is added
to control the ratelimitting of deadman events independantly of
delay events.  In practice, a single deadman event is sufficient
and more aren't particularly useful.

Alphabetize the zfs_deadman_* entries in zfs.4.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #16226
2024-05-29 10:46:41 -07:00
Alexander Motin 800d59d577 Some improvements to metaslabs eviction
- Add old eviction for special and dedup metaslab classes. Those
vdevs may be potentially big and fragmented with large metaslabs,
while their asynchronous write pattern is not really different
from normal class. It seems an omission to not evict old metaslabs
from them.
 - If we have metaslab preload enabled, which means we are not too
low on memory, do not evict active metaslabs even if they are not
used for some time.  Eviction of active metaslabs means we won't
be able to write anything until we load them, that may take some
time, that is straight opposite to metaslab preload goals.  For
small systems the memory saving should be less important after
recent reduction in number of allocators and so open metaslabs.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16214
2024-05-29 08:53:31 -07:00
Alexander Motin 02c5aa9b09 Destroy ARC buffer in case of fill error
In case of error dmu_buf_fill_done() returns the buffer back into
DB_UNCACHED state.  Since during transition from DB_UNCACHED into
DB_FILL state dbuf_noread() allocates an ARC buffer, we must free
it here, otherwise it will be leaked.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #15665
Closes #15802
Closes #16216
2024-05-24 19:11:18 -07:00
George Amanakis 8865dfbcaa Fix assertion in Persistent L2ARC
At the end of l2arc_evict() fix an assertion in the case that l2ad_hand
+ distance == l2ad_end.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #16202
Closes #16207
2024-05-24 19:02:58 -07:00
Rob N d0aa9dbccf Use memset to zero stack allocations containing unions
C99 6.7.8.17 says that when an undesignated initialiser is used, only
the first element of a union is initialised. If the first element is not
the largest within the union, how the remaining space is initialised is
up to the compiler.

GCC extends the initialiser to the entire union, while Clang treats the
remainder as padding, and so initialises according to whatever
automatic/implicit initialisation rules are currently active.

When Linux is compiled with CONFIG_INIT_STACK_ALL_PATTERN,
-ftrivial-auto-var-init=pattern is added to the kernel CFLAGS. This flag
sets the policy for automatic/implicit initialisation of variables on
the stack.

Taken together, this means that when compiling under
CONFIG_INIT_STACK_ALL_PATTERN on Clang, the "zero" initialiser will only
zero the first element in a union, and the rest will be filled with a
pattern. This is significant for aes_ctx_t, which in
aes_encrypt_atomic() and aes_decrypt_atomic() is initialised to zero,
but then used as a gcm_ctx_t, which is the fifth element in the union,
and thus gets pattern initialisation. Later, it's assumed to be zero,
resulting in a hang.

As confusing and undiscoverable as it is, by the spec, we are at fault
when we initialise a structure containing a union with the zero
initializer. As such, this commit replaces these uses with an explicit
memset(0).

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16135
Closes #16206
2024-05-24 19:00:29 -07:00
Rob N 34906f8bbe zap: reuse zap_leaf_t on dbuf reuse after shrink
If a shrink or truncate had recently freed a portion of the ZAP, the
dbuf could still be sitting on the dbuf cache waiting for eviction. If
it is then allocated for a new leaf before it can be evicted, the
zap_leaf_t is still attached as userdata, tripping the VERIFY.

Instead, just check for the userdata, and if we find it, reuse it.

Sponsored-by: Klara, Inc.
Sponsored-by: iXsystems, Inc.
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16157.
Closes #16204
2024-05-24 18:55:47 -07:00
Rob N 708be0f415 Linux 6.7 compat: detect if kernel defines intptr_t
Since Linux 6.7 the kernel has defined intptr_t. Clang has
-Wtypedef-redefinition by default, which causes the build to fail
because we also have a typedef for intptr_t.

Since its better to use the kernel's if it exists, detect it and skip
our own.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16201
2024-05-24 18:54:24 -07:00
Brooks Davis 7572e8ca04 Avoid a gcc -Wint-to-pointer-cast warning
On 32-bit platforms long long is generally 64-bits.  Sufficiently modern
versions of gcc (13 in my testing) complains when casting a pointer to
an integer of a different width so cast to uintptr_t first to avoid the
warning.

Fixes: c183d164aa Parallel pool import

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Don Brady <don.brady@klarasystems.com>
Signed-off-by: Brooks Davis <brooks.davis@sri.com>
Closes #16203
2024-05-24 18:45:58 -07:00
Pawel Jakub Dawidek 08648cf0da Allow block cloning to be interrupted by a signal.
Even though block cloning is much faster than regular copying,
it is not instantaneous - the file might be large and the recordsize
small. It would be nice to be able to interrupt it with a signal
(e.g., SIGINFO on FreeBSD to see the progress).

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Pawel Jakub Dawidek <pawel@dawidek.net>
Closes #16208
2024-05-24 18:45:09 -07:00
Alexander Motin efbef9e6cc FreeBSD: Add zfs_link_create() error handling
Originally Solaris didn't expect errors there, but they may happen
if we fail to add entry into ZAP.  Linux fixed it in #7421, but it
was never fully ported to FreeBSD.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #13215
Closes #16138
2024-05-16 17:56:55 -07:00
omni fec16b93c4 config/zfs-build.m4: add Alpine Linux bash-completion path
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: omni <omni+vagant@hack.org>
Closes #16164
2024-05-16 17:55:09 -07:00
omni d0d7c0d8f9 config/zfs-build.m4: sort vendors
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: omni <omni+vagant@hack.org>
Closes #16164
2024-05-16 17:54:55 -07:00
Rich Ercolani a043b60f1e Correct level handling in zstream recompress.
sscanf returns number of items parsed on success and EOF on failure.

Reviewed-by: Adam Moss <c@yotes.com>
Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #16198
2024-05-16 15:37:50 -07:00
Rob N e675852bc1 dbuf: separate refcount calls for dbuf and dbuf_user
In 92dc4ad83 I updated the dbuf_cache accounting to track the size of
userdata associated with dbufs. This adds the size of the dbuf+userdata
together in a single call to zfs_refcount_add_many(), but sometime
removes them in separate calls to zfs_refcount_remove_many(), if dbuf
and userdata are evicted separately.

What I didn't realise is that when refcount tracking is on,
zfs_refcount_add_many() and zfs_refcount_remove_many() are expected to
be paired, with their second & third args (count & holder) the same on
both sides. Splitting the remove part into two calls means the counts
don't match up, tripping a panic.

This commit fixes that, by always adding and removing the dbuf and
userdata counts separately.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reported-by: Mark Johnston <markj@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16191
2024-05-15 13:03:41 -07:00
Rob Norris 3c941d1818 zdb/ztest: send dbgmsg output to stderr
And, make the output fd an arg to zfs_dbgmsg_print(). This is a change
in behaviour, but keeps it consistent with where crash traces go, and
it's easy to argue this is what we want anyway; this is information
about the task, not the actual output of the task.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16181
2024-05-14 09:49:00 -07:00
Rob Norris fa99d9cd9c zfs_dbgmsg_print: make FreeBSD and Linux consistent
FreeBSD was using fprintf(), which might not be signal-safe. Meanwhile,
Linux's locking did not cover the header output. This two quirks are
unrelated, but both have the same response: be like the other one. So
with this commit, both functions are the same except for the names of
their lock and list variables.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16181
2024-05-14 09:48:56 -07:00
Rob Norris 1ea8c59441 backtrace: rework for signal safety
Mostly, try a lot harder to not allocate anything.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16181
2024-05-14 09:48:51 -07:00
Rob Norris 3974ef045e libspl: lift backtrace into a separate file
If it's going to be used directly by zdb/ztest, then it sort of doesn't
make sense to carry it with the assert code.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16181
2024-05-14 09:48:45 -07:00
Rob Norris e7b451941b zdb/ztest: use libspl backtrace for crashes
We can show much nicer backtraces these days, lets use them.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16181
2024-05-14 09:48:39 -07:00
Rob Norris 91c46d4399 zdb: bring crash handling over from ztest
ztest has a very nice ability to show a backtrace when there's an
unexpected crash. zdb is used often enough on corrupted data and can
blow up too, so nice output is useful there too.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16181
2024-05-14 09:48:23 -07:00
Rob Norris 0a543db371 spa_taskq_dispatch_ent: simplify arguments
This renames it to spa_taskq_dispatch(), and reduces and simplifies its
arguments based on these observations from its two call sites:

- arg is always the zio, so it can be typed that way, and we don't need
  to provide it twice;
- ent is always &zio->io_tqent, and zio is always provided, so we can
  use it directly;
- the only flag used is TQ_FRONT, which can just be a bool;
- zio != NULL was part of the "use allocator" test, but it never would
  have got that far, because that arg was only set to NULL in the
  reexecute path, which is forced to type CLAIM, so the condition would
  fail at t == WRITE anyway.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16151
2024-05-14 09:40:16 -07:00
Rob Norris 515c4dd213 spa: flatten spa_taskq_dispatch_ent()
It is the only user of spa_taskq_dispatch_select(), so might as well
just carry it directly.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16151
2024-05-14 09:40:09 -07:00
Rob Norris adda768e3e spa: remove spa_taskq_dispatch_sync()
It has no callers anymore.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16151
2024-05-14 09:40:02 -07:00
Rob Norris cc38691534 zfs_ioc_send: use a dedicated taskq thread for send
When stack space is tight, the stream is written to its target on a
separate taskq thread to make sure there's enough stack space to
complete it.

This has always used an IO taskq, but that doesn't really make sense for
it, and moving it onto a regular taskq lets us get rid of
spa_taskq_dispatch_sync(), which is not used anywhere else.

Stream writes may block for a long time depending on what the target is,
and we have no way of discovering this, so we can't risk using the
system taskq, as there may be many tens of sends in progress. Instead,
we create a dedicated taskq thread for each send writer to run on, and
clean it up when it's done.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16151
2024-05-14 09:39:26 -07:00
Alan Somers b64afa41d5 Better control the thread pool size when mounting datasets
Ever since a10d50f999, ZFS has mounted file systems in parallel when
importing a pool.  It uses a fixed size of 512 for the thread pool.  But
since c183d164aa, it has also imported pools in parallel.  So the total
number of threads at one time is 513 * npools + 1.  That can easily
exceed the system's limit on the number of threads per process, which
will cause one or more pools to be unable to allocate any worker
threads, forcing them to fallback to slow serial mounting .  To
forestall that, manage the threadpool size in /sbin/zpool, not libzfs.
Use the same size (512), but divided by the number of pools.

This is a backwards-incompatible change to the libzfs abi.

Sponsored by: Axcient
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Wilson <gwilson@delphix.com>
Signed-off-by: Alan Somers <asomers@FreeBSD.org>
Closes #16178
2024-05-14 09:36:21 -07:00
Alan Somers eced2e2f1e libzfs: Fix mounting datasets under thread limit pressure
During parallel zpool import, /sbin/zpool will create a separate thread
pool for each pool, used to mount that pool's datasets.  If the total
thread count exceed's the system's limit on threads per process, then
tpool_dispatch may fail.  If it does, directly execute the mount
operation instead.

Sponsored by: Axcient
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Wilson <gwilson@delphix.com>
Signed-off-by: Alan Somers <asomers@FreeBSD.org>
Closes #16178
Fixes #16172
2024-05-14 09:36:14 -07:00
Alan Somers f625d038d2 tpool_dispatch: fail if it cannot start at least 1 worker.
Sponsored by: Axcient
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Wilson <gwilson@delphix.com>
Signed-off-by: Alan Somers <asomers@FreeBSD.org>
Closes #16178
2024-05-14 09:35:48 -07:00
Don Brady 89acef992b Simplified the scope of the namespace lock
If we wait until after we check for no spa references to drop the
namespace lock, then we know that spa consumers will need to call
spa_lookup() and end up waiting on the spa_namespace_cv until we
finish.  This narrows the external checks to spa_lookup and we no
longer need to worry about the spa_vdev_enter case.

Sponsored-By: Klara Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Wilson <gwilson@delphix.com>
Signed-off-by: Don Brady <don.brady@klarasystems.com>
Closes #16153
2024-05-14 08:58:15 -07:00
Don Brady 975a13259b Add support for parallel pool exports
Changed spa_export_common() such that it no longer holds the
spa_namespace_lock for the entire duration and instead sets
spa_export_thread to indicate an import is in progress on the
spa.  This allows for an export to a diffent pool to proceed
in parallel while an export is still processing potentially
long operations like spa_unload_log_sm_flush_all().

Calls like spa_lookup() and spa_vdev_enter() that rely on
the spa_namespace_lock to serialize them against a concurrent
export, now wait for any in-progress export thread to complete
before proceeding.

The 'zpool import -a' sub-command also provides multi-threaded
support, using a thread pool to submit the exports in parallel.

Sponsored-By: Klara Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Wilson <gwilson@delphix.com>
Signed-off-by: Don Brady <don.brady@klarasystems.com>
Closes #16153
2024-05-14 08:57:41 -07:00
Brian Behlendorf abec7dcd30 Linux: disable lockdep for a couple of locks
When running a debug kernel with lockdep enabled there
are several locks which report false positives.  Set
MUTEX_NOLOCKDEP/RW_NOLOCKDEP to disable these warnings.

Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #16188
2024-05-13 15:12:07 -07:00
Alexander Motin 136c053211 ZAP: Fix leaf references on zap_expand_leaf() errors
Depending on kind of error zap_expand_leaf() may return with or
without valid leaf reference held.  Make sure it returns NULL if
due to error it has no leaf to return.  Make its callers to check
the returned leaf pointer, and release the leaf if it is not NULL.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #12366 
Closes #16159
2024-05-10 12:35:20 -07:00
chenqiuhao1997 41ae864b69 Replace P2ALIGN with P2ALIGN_TYPED and delete P2ALIGN.
In P2ALIGN, the result would be incorrect when align is unsigned
integer and x is larger than max value of the type of align.
In that case, -(align) would be a positive integer, which means
high bits would be zero and finally stay zero after '&' when
align is converted to a larger integer type.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Youzhong Yang <yyang@mathworks.com>
Signed-off-by: Qiuhao Chen <chenqiuhao1997@gmail.com>
Closes #15940
2024-05-10 08:47:21 -07:00
Rob N 1ede0c716b libspl_assert: always link -lpthread on FreeBSD
The pthread_* functions are in -lpthread on FreeBSD. Some of them are
implicitly linked through libc, but on FreeBSD 13 at least
pthread_getname_np() is not. Just be explicit, since -lpthread is the
documented interface anyway.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16168
2024-05-09 07:43:48 -07:00
Martin Matuška 414acbd37e Unbreak FreeBSD cross-build on MacOS broken in 051460b8b
MacOS used FreeBSD-compatible getprogname() and pthread_getname_np().
But pthread_getthreadid_np() does not exist on MacOS. This implements
libspl_gettid() using pthread_threadid_np() to get the thread id
of the current thread.

Tested with FreeBSD GitHub actions
freebsd-src/.github/workflows/cross-bootstrap-tools.yml

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <rob.norris@klarasystems.com>
Signed-off-by: Martin Matuska <mm@FreeBSD.org>
Closes #16167
2024-05-09 07:42:51 -07:00
Alexander Motin 3400127a75 Fix ZIL clone records for legacy holes
Previous code overengineered cloned range calculation by using
BP_GET_LSIZE(). The problem is that legacy holes don't have the
logical size, so result will be wrong.  But we also don't need
to look on every block size, since they all must be identical.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16165
2024-05-09 07:39:57 -07:00
Alexander Motin af5dbed319 Fix scn_queue races on very old pools
Code for pools before version 11 uses dmu_objset_find_dp() to scan
for children datasets/clones.  It calls enqueue_clones_cb() and
enqueue_cb() callbacks in parallel from multiple taskq threads.
It ends up bad for scan_ds_queue_insert(), corrupting scn_queue
AVL-tree.  Fix it by introducing a mutex to protect those two
scan_ds_queue_insert() calls.  All other calls are done from the
sync thread and so serialized.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16162
2024-05-09 07:32:59 -07:00
Ameer Hamza a0f3c8aaf1 zdb: add missing cleanup for early return
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Don Brady <don.brady@klarasystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #16152
2024-05-09 07:31:57 -07:00
Daniel Perry 2dff7527d4 Replace usage of schedule_timeout with schedule_timeout_interruptible (#16150)
This commit replaces current usages of schedule_timeout() with
schedule_timeout_interruptible() in code paths that expect the running
task to sleep for a short period of time. When schedule_timeout() is
called without previously calling set_current_state(), the running
task never sleeps because the task state remains in TASK_RUNNING.

By calling schedule_timeout_interruptible() to set the task state to
TASK_INTERRUPTIBLE before calling schedule_timeout() we achieve the
intended/desired behavior of putting the task to sleep for the
specified timeout.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Daniel Perry <dtperry@amazon.com>
Closes #16150
2024-05-09 07:30:28 -07:00
Alexander Motin 04bae5ec95 Disable high priority ZIO threads on FreeBSD and Linux
High priority threads are handling ZIL writes.  While there is no
ZIL compression, there is encryption, checksuming and RAIDZ math.
We've found that on large systems 1 taskq with 5 threads can be
a bottleneck for throughput, IOPS or both. Instead of just bumping
number of threads with a risk of overloading CPUs and increasing
latency, switch to using TQ_FRONT mechanism to increase sync write
requests priority within standard write threads.  Do not do it on
Illumos, since its TQ_FRONT implementation is inherently unfair.
FreeBSD and Linux don't have this problem, so we can do it there.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #16146
2024-05-03 09:53:34 -07:00
Rob N 8f1b7a6fa6 vdev_disk: disable flushes if device does not support it
If the underlying device doesn't have a write-back cache, the kernel
will just return a successful response. This doesn't hurt anything, but
it's extra work on the IO taskqs that are unnecessary. So, detect this
when we open the device for the first time.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16148
2024-05-02 15:18:35 -07:00
Alexander Motin 645b833079 Improve write issue taskqs utilization
- Reduce number of allocators on small system down to one per 4
CPU cores, keeping maximum at 4 on 16+ core systems. Small systems
should not have the lock contention multiple allocators supposed
to solve, while having several metaslabs open and modified each
TXG is not free.
 - Reduce number of write issue taskqs down to one per 16 CPU
cores and an integer fraction of number of allocators.  On mid-
sized systems, where multiple allocators already make sense, too
many write issue taskqs may reduce write speed on single-file
workloads, since single file is handled by only one taskq to
reduce fragmentation. On large systems, that can actually benefit
from many taskq's better IOPS, the bottleneck is less important,
since in worst case there will be at least 16 cores to handle it.
 - Distribute dnodes between allocators (and taskqs) in a round-
robin fashion instead of relying on sync taskqs to be balanced.
The last is not guarantied and may depend on scheduling.
 - Remove io_wr_iss_tq from struct zio.  io_allocator is enough.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16130
2024-05-01 11:07:20 -07:00
Alexander Motin 8fd3a5d02f Slightly improve dnode hash
As I understand just for being less predictable dnode hash includes
8 bits of objset pointer, starting at 6.  But since objset_t is
more than 1KB in size, its allocations are likely aligned to 2KB,
that means 11 lower bits provide no entropy. Just take the 8 bits
starting from 11.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16131
2024-05-01 10:59:32 -07:00
Rob Norris 051460b8b2 libspl/assert: use libunwind for backtrace when available
libunwind seems to do a better job of resolving a symbols than
backtrace(), and is also useful on platforms that don't have backtrace()
(eg musl). If it's available, use it.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #16140
2024-05-01 10:52:05 -07:00
Rob Norris 2152c405ba libspl/assert: dump backtrace in assert
Adds a check for the backtrace() function. If available, uses it to show
a stack backtrace in the assertion output.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #16140
2024-05-01 10:52:00 -07:00
Rob Norris dec697ad68 libspl/assert: add lock around assertion output
If multiple threads trip an assertion at the same moment (quite common),
they can be printing at the same time, and their output gets messy.

This adds a simple lock around the whole thing, to prevent a second task
printing assert output before the first has finished.

Additionally, if libspl_assert_ok is not set, abort() is called without
dropping the lock, so that any other asserting tasks will be killed
before starting any output, rather than only getting part-way through.
This is a tradeoff; it's assumed that multiple threads asserting at the
same moment are likely the same fault in different instances of a
thread, and so there won't be any more useful information from the other
tasks anyway.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #16140
2024-05-01 10:51:54 -07:00
Rob Norris 394800200e libspl/assert: show process/task details in assert output
Makes it much easier to see what thing complained.

Getting thread id, program name and thread name vary wildly between
Linux and FreeBSD, so those are set up in macros. pthread_getname_np()
did not appear in musl until very recently, but the same info has always
been available via prctl(PR_GET_NAME), so we use that instead.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #16140
2024-05-01 10:51:49 -07:00
Rob Norris 4429ad9276 libzpool: set thread names
Arrange for the thread/task name to be set when new threads are created.
This makes them visible in the process table etc.

pthread_setname_np() is generally available in glibc, musl and FreeBSD,
so no test is required.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #16140
2024-05-01 10:51:44 -07:00
Rob Norris 7ac00d3c26 find_system_library: fix var cleanup when library not found
The "not found" path is attempting to clear SOMELIB_CFLAGS and
SOMELIB_LIBS by resetting them in AC_SUBST(). However, the second arg to
AC_SUBST is expanded in autoconf with `m4_ifvaln([$2], [[$1]=$2])`,
which is defined as "if the first arg is non-empty". The m4 "empty"
construction is [], therefore, the existing AC_SUBST calls never modify
the variables at all.

The effect of this is that leftovers from the library test can leak out.
At least, if a library header is found in the first stage, but the
library itself is not, -lsomelib is added to SOMELIB_LIBS and further
tests done. If that library is not found, SOMELIB_LIBS will not be
cleared.

For most of our library tests this hasn't been a problem, as they're
either always found properly via pkg-config or set directly, or the
calling test immediately aborts configure. For an optional dependency
however, an apparent "partial" result where the header is found but no
corresponding library causes link errors later.

I think a complete fix should probably not be setting SOMELIB_xxx until
the final result is known, but for now, adjusting the AC_SUBST calls to
explictly set the empty shell string (which is not "empty" to m4) at
least restores the intent.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #16140
2024-05-01 10:51:14 -07:00
Rob N a6edc0adb2 zio: try to execute TYPE_NULL ZIOs on the current task
Many TYPE_NULL ZIOs are used to provide a sync point for child ZIOs, and
do not do any actual work themselves. However, they are still dispatched
to a dedicated, single-thread taskq, which leads to their execution
being entirely task switch and dequeue overhead for no actual reason.

This commit changes it so that when selecting a parent ZIO to execute,
if the parent is TYPE_NULL and has no done function (that is, no
additional work), it is executed on the same thread. This reduces task
switches and frees up CPU cores for other work.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16134
2024-04-29 15:57:32 -07:00
Don Brady c3f2f1aa2d vdev probe to slow disk can stall mmp write checker
Simplify vdev probes in the zio_vdev_io_done context to
avoid holding the spa config lock for a long duration.

Also allow zpool clear if no evidence of another host
is using the pool.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Olaf Faaland <faaland1@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Don Brady <don.brady@klarasystems.com>
Closes #15839
2024-04-29 14:35:53 -07:00
Ameer Hamza b28461b7c6 Fix arcstats for FreeBSD after zfetch support
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #16141
2024-04-29 13:28:50 -07:00
Rich Ercolani db499e68f9 Overflowing refreservation is bad
Someone came to me and pointed out that you could pretty
readily cause the refreservation calculation to exceed
2**64, given the 2**17 multiplier in it, and produce
refreservations wildly less than the actual volsize in cases where
it should have failed.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #15996
2024-04-29 11:32:49 -07:00
Tony Hutter 4840f023af GCC: Fixes for gcc 14 on Fedora 40
- Workaround dangling pointer in uu_list.c (#16124)
- Fix calloc() transposed arguments in zpool_vdev_os.c
- Make some temp variables unsigned to prevent triggering a
  '-Werror=alloc-size-larger-than' error.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #16124
Closes #16125
2024-04-29 11:31:50 -07:00
Alan Somers 21bc066ece Fix updating the zvol_htable when renaming a zvol
When renaming a zvol, insert it into zvol_htable using the new name, not
the old name.  Otherwise some operations won't work.  For example,
"zfs set volsize" while the zvol is open.

Sponsored by:	Axcient
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alek Pinchuk <apinchuk@axcient.com>
Signed-off-by:	Alan Somers <asomers@FreeBSD.org>
Closes #16127
Closes #16128
2024-04-25 14:24:52 -07:00
Brian Behlendorf 317b31eedb Python 3.12 deprecated python3-distutils
As for python-3.12 the distutils package has been deprecated.
The latest ax_python_devel.m4 macro from the autoconf archive
has been updated accordingly so let's pull in the new version.

We can also drop the changes made to our customized version
to continue if the development version is not installed since
this functionality has been included upstream.

Reviewed-by: Rich Ercolani <rincebrain@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #16126
Closes #16129
2024-04-25 13:40:09 -07:00
Allan Jude 5044c4e3ff Fast Dedup: ZAP Shrinking
This allows ZAPs to shrink. When there are two empty sibling leafs,
one of them is collapsed and its storage space is reused.
This improved performance on directories that at one time contained
a large number of files, but many or all of those files have since
been deleted.

This also applies to all other types of ZAPs as well.

Sponsored-by: iXsystems, Inc.
Sponsored-by: Klara, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Alexander Stetsenko <alex.stetsenko@klarasystems.com>
Closes #15888
2024-04-24 14:51:21 -07:00
Alexander Motin 67d13998b3 Make more taskq parameters writable
There is no reason for these module parameters to be read-only.
Being modified they just apply on next pool import/creation, that
is useful for testing different values.

Reviewed-by: Rich Ercolani <rincebrain@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16118
2024-04-24 14:38:48 -07:00
Alexander Motin 1f940de072 L2ARC: Cleanup buffer re-compression
When compressed ARC is disabled, we may have to re-compress when
writing into L2ARC.  If doing so we can't fit it into the original
physical size, we should just fail immediately, since even if it
may still fit into allocation size, its checksum will never match.

While there, refactor the code similar to other compression places
without using abd_return_buf_copy().

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16038
2024-04-23 09:06:00 -07:00
Todd 87d81d1d13 zfs-kmod: fix empty rpm requires/conflicts
Fix an error in zfs-kmod.spec that causes kmod-zfs packages not to
include the correct RPM requires/conflicts relationships.  With this
change applied, RPM correctly no longer allows kmod-zfs & zfs-dkms
packages to be installed together.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Todd Seidelmann <18294602+seidelma@users.noreply.github.com>
Closes #16121
2024-04-22 17:55:41 -07:00
Alexander Motin 4036b8d027 Refactor dbuf_read() for safer decryption
In dbuf_read_verify_dnode_crypt():
 - We don't need original dbuf locked there. Instead take a lock
on a dnode dbuf, that is actually manipulated.
 - Block decryption for a dnode dbuf if it is currently being
written.  ARC hash lock does not protect anonymous buffers, so
arc_untransform() is unsafe when used on buffers being written,
that may happen in case of encrypted dnode buffers, since they
are not copied by dbuf_dirty()/dbuf_hold_copy().

In dbuf_read():
 - If the buffer is in flight, recheck its compression/encryption
status after it is cached, since it may need arc_untransform().

Tested-by: Rich Ercolani <rincebrain@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16104
2024-04-22 11:41:03 -07:00
Ryan c346068e5e zfs get: add '-t fs' and '-t vol' options
Make `zfs get` accept `fs` for `filesystem` and `vol` for `volume`.

Reviewed-by: Rob Norris <rob.norris@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan <errornointernet@envs.net>
Closes #16117
2024-04-22 10:59:31 -07:00
Brooks Davis 7e52795aad ztest: use ASSERT3P to compare pointers
With a sufficiently modern gcc (I saw this with gcc13), gcc complains
when casting pointers to an integer of a different type (even a larger
one).  On 32-bt ASSERT3U does this on 32-bit systems by casting a 32-bit
pointer to uint64_t so use ASSERT3P which uses uintptr_t.

Fixes: 5caeef02fa RAID-Z expansion feature

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Brooks Davis <brooks.davis@sri.com>
Closes #16115
2024-04-22 10:48:58 -07:00
Seth Troisi cdae59e153 ZTS: user_namespace_004.ksh avoid error in cleanup if unsupported
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Seth Troisi <sethtroisi@google.com>
Closes #16114
2024-04-22 10:47:44 -07:00
Seth Troisi 9b43d7ba85 Add newline to two zpool messages
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Seth Troisi <sethtroisi@google.com>
Closes #16113
2024-04-22 10:45:39 -07:00
George Wilson c183d164aa Parallel pool import
This commit allow spa_load() to drop the spa_namespace_lock so
that imports can happen concurrently. Prior to dropping the
spa_namespace_lock, the import logic will set the spa_load_thread
value to track the thread which is doing the import.

Consumers of spa_lookup() retain the same behavior by blocking
when either a thread is holding the spa_namespace_lock or the
spa_load_thread value is set. This will ensure that critical
concurrent operations cannot take place while a pool is being
imported.

The zpool command is also enhanced to provide multi-threaded support
when invoking zpool import -a.

Lastly, zinject provides a mechanism to insert artificial delays
when importing a pool and new zfs tests are added to verify parallel
import functionality.

Contributions-by: Don Brady <don.brady@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Wilson <gwilson@delphix.com>
Closes #16093
2024-04-22 09:42:38 -07:00
Rob N f4f156157d abd_iter_page: rework to handle multipage scatterlists
Previously, abd_iter_page() would assume that every scatterlist would
contain a single page (compound or no), because that's all we ever
create in abd_alloc_chunks(). However, scatterlists can contain multiple
pages of arbitrary provenance, and if we get one of those, we'd get all
the math wrong.

This reworks things to handle multiple pages in a scatterlist, by
properly finding the right page within it for the given offset, and
understanding better where the end of the page is and not crossing it.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reported-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16108
2024-04-19 16:41:31 -07:00
Alexander Motin 9f83eec039 Handle FLUSH errors as "expected"
Before #16061 zio_vdev_io_done() was not used for FLUSH requests.
Addition of it triggers reprobe each TXG for vdevs not supporting
them.  Since those errors are often expected, they are normally
handled by individual vdev drivers and should be ignored here.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <rob.norris@klarasystems.com>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16110
2024-04-19 16:18:54 -07:00
Rob Norris 26d49fec5f tests/quota: consistently clear quota property between tests
When run in isolation, quota_005_pos would fail in cleanup because it
would attempt restore the previous quota, which was 0, and so get an
error (because you can't set quota to '0', you have to use 'none').

It worked as part of the quota tag set because the previous tests did
not clean up their quota, so there was always a non-zero quota to return
to.

This adds a simple quota reset function, and has all quota tests run it
at cleanup. For the ones that weren't cleaning up, they now do, and for
quota_005_pos, which was trying to do the right thing, it now just
resets it.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16097
2024-04-19 16:06:56 -07:00
Rob Norris f75574cbaa tests/quota_005_pos: use a long int for doubling the quota size
When run in isolation, quota_005_pos would see an empty ~300G dataset.
Doubling it's space overflows a int32, which meant it was trying to then
set the quota to a negative value, and would fail.

When run as part of the quota tests, the filesystem appears to have
stuff in it, and so a lower available space, which doesn't overflow, and
so succeeds.

The bare minimum fix seems to be to use a int64 for the available space,
so it can be comfortably doubled. Here it is.

(Also a typo fix and a tiny bit of cleanup).

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16097
2024-04-19 16:06:37 -07:00
Ameer Hamza cd3e6b4f4c Add zfetch stats in arcstats
arc_summary also reports zfetch stats but it's inconvenient to monitor
contiguously incrementing numbers. Adding them in arcstats allows us to
observe streams more conveniently.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #16094
2024-04-19 10:19:12 -07:00
Tino Reichardt 35bf258485 Fix: FreeBSD Arm64 does not build currently
The define LD_VERSION isn't defined on FreeBSD Arm64 when OpenZFS is
build with the default compiler: clang.
I used only gcc for testing - my fault.

Fast fix as suggested by @mmatuska

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Martin Matuska <mm@FreeBSD.org>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #16103
2024-04-19 10:15:38 -07:00
Tony Hutter 454c0b0e46 Linux 6.8 compat: META (#16099)
Update the META file to reflect compatibility with the 6.8 kernel.

Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <rob.norris@klarasystems.com>
2024-04-17 09:29:21 -07:00
Rob N cf60db6ebe zts: add a debug option to get full test output
The test runner accumulates output from individual tests, then writes it
to the log at the end. If a test hangs or crashes the system half way
through, we get no insight into how it got to where it did.

This adds a -D option for "debug". When set, all test output is written
to stdout.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Akash B <akash-b@hpe.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16096
2024-04-16 09:13:01 -07:00
Tino Reichardt 90ba19eb7b Do no use .cfi_negate_ra_state within the assembly on Arm64
Compiling openzfs on aarch64 with gcc-8 and gcc-9 is failing currently.
See issue #14965 for deeper context.

On platforms without pointer authentication, .cfi_negate_ra_state can be
defined to a no-op:
https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=gdb/aarch64-tdep.c#l1413

I have tested this on Arm64 FreeBSD 13.2 and AlmaLinux-8.

Reviewed-by: Andrew Turner <andrew.turner4@arm.com>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #14965
Closes #15784
2024-04-15 13:56:10 -07:00
Andrew Turner c6da985e28 Add the BTI elf note to the AArch64 SHA2 assembly
On ELF platforms there is a note to specify when an application or
library supports BTI. When linking one of these the linker needs
all input object files to have the note. If not it will not include
it in the output file.

Normally the compiler would generate it, but for assembly files we
need to do it our selves.

Add the note to the aarch64 sha256 and sha512 assembly files.

Tested by building with BTI enabled and using the -zbti-report=error
flag to lld that makes it an error if the note is missing.

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Andrew Turner <andrew.turner4@arm.com>
Closes #16086
2024-04-15 13:53:39 -07:00
Rob N 4725e543be zinject: "no-op" error injection
When injected, this causes the matching IO to appear to succeed, but the
actual work is never submitted to the physical device. This can be used
to simulate a write-back cache servicing a write, but the backing device
has failed and the cache cannot complete the operation in the
background.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16085
2024-04-15 13:52:20 -07:00
Rob N f22b110f60 zts: allow running a single test by name only
Specifying a single test is kind of a hassle, because the full relative
path under the test suite dir has to be included, but it's not always
clear what that path even is.

This change allows `-t` to take the name of a single test instead of a
full path. If the value has no `/` characters, we search for a file of
that name under the test root, and if found, use that as the full test
path instead.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Akash B <akash-b@hpe.com>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16088
2024-04-15 13:44:12 -07:00
Rob N b181b2e604 bdev_discard_supported: understand discard_granularity=0
Kernel documentation for the discard_granularity property says:

    A discard_granularity of 0 means that the device does not support
    discard functionality.

Some older kernels had drivers (notably loop, but also some USB-SATA
adapters) that would set the QUEUE_FLAG_DISCARD capability flag, but
have discard_granularity=0. Since 5.10 (torvalds/linux@b35fd7422c) the
discard entry point blkdev_issue_discard() has had a check for this,
which would immediately reject the call with EOPNOTSUPP, and throw a
scary diagnostic message into the log. See #16068.

Since 6.8, the block layer sets a non-zero default for
discard_granularity (torvalds/linux@3c407dc723), and a future kernel
will remove the check entirely[1].

As such, there's no good reason for us to enable discard when
discard_granularity=0. The kernel will never let the request go in
anyway; better that we just disable it so we can report it properly to
the user.

1. https://patchwork.kernel.org/project/linux-block/patch/20240312144826.1045212-2-hch@lst.de/

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16068
Closes #16082
2024-04-12 09:00:20 -07:00
Rob Norris d7605ae77b zio: rename ZIO_TYPE_IOCTL to ZIO_TYPE_FLUSH
The only possible ioctl is a flush, and any other kind of meta-operation
introduced in the future is likely to have different semantics (much
like trim did). So, lets just call it what it is.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16064
2024-04-11 17:17:23 -07:00
Rob Norris b613709c46 dkio: remove kernel dkio.h compatibility header
Without DKIOCFLUSHWRITECACHE, we no longer need the compat header. Note
that we're keeping the userspace SPL compat header, which is used by
libefi.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16064
2024-04-11 17:17:18 -07:00
Rob Norris c9c838aa1f zio: remove io_cmd and DKIOCFLUSHWRITECACHE
There's no other options, so we can just always assume its a flush.

Includes some light refactoring where a switch statement was doing
control flow that no longer works.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16064
2024-04-11 17:17:11 -07:00
Rob Norris cac416f106 zio: remove zio_ioctl()
It only had one user, zio_flush(), and there are no other vdev ioctls
anyway.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16064
2024-04-11 17:16:46 -07:00
Umer Saleem a100a195fa Add support for zfs mount -R <filesystem>
This commit adds support for mounting a dataset along with all of
it's children with '-R' flag for zfs mount. There can be scenarios
where we want to mount all datasets under one hierarchy instead of
mounting all datasets present on system with '-a' flag.

'-R' flag should work on all root and non-root datasets. Usage
information and man page has been updated for zfs mount. A test
for verifying the behavior for '-R' flag is also added.

Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Closes #16015
2024-04-11 15:10:24 -07:00
Rob N e2035cdbf7 AUTHORS: refresh with recent new contributors
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16079
2024-04-11 14:49:57 -07:00
Rob Norris 1bf649cb0a vdev_disk: fix alignment check when buffer has non-zero starting offset
If a linear buffer spans multiple pages, and the first page has a
non-zero starting offset, the checker would not include the offset, and
so would think there was an alignment gap at the end of the first page,
rather than at the start.

That is, for a 16K buffer spread across five pages with an initial 512B
offset:

    [.XXXXXXX][XXXXXXXX][XXXXXXXX][XXXXXXXX][XXXXXXX.]

It would be interpreted as:

    [XXXXXXX.][XXXXXXXX]...

And be rejected as misaligned.

Since it's already a linear ABD, the "linearising" copy would just reuse
the buffer as-is, and the second check would failing, tripping the
VERIFY in vdev_disk_io_rw().

This commit fixes all this by including the offset in the check for
end-of-page alignment.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16076
2024-04-11 14:43:27 -07:00
Rob Norris bc27c49404 tests: add test for vdev_disk page alignment check
This provides a test driver and a set of test vectors for the page
alignment check callback function vdev_disk_check_pages_cb().

Because there's no good facility for exposing this function to a
userspace test right now, for now I'm just duplicating the function and
adding commentary to remind people to keep them in sync.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16076
2024-04-11 14:42:46 -07:00
Andy Fiddaman 44f337be30 Illumos#16463 zfs_ioc_recv leaks nvlist
In https://www.illumos.org/issues/16463 it was observed that
an nvlist was being leaked in zfs_ioc_recv() due a missing
call to nvlist_free for "hidden_args".
For OpenZFS the same issue exists in zfs_ioc_recv_new() and
is addressed by this PR.

This change also properly frees nvlists in the unlikely
event that a call to get_nvlist() fails.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Signed-off-by: Andy Fiddaman <illumos@fiddaman.net>
Closes #16077
2024-04-11 14:38:22 -07:00
Jason Lee e5ddecd1a7 return NULL at end of send_progress_thread
Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Jason Lee <jasonlee@lanl.gov>
Closes #16074
2024-04-10 15:01:39 -07:00
Rich Ercolani e5e2a5a3b8 Add custom debug printing for your asserts
Being able to print custom debug information on assert trip
seems useful.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #15792
2024-04-10 13:30:25 -07:00
Benda Xu d98973dbdd config/Substfiles.am: restrict to the dedicated list.
We recover the scope of $(SUBSTFILES) to explicitly control what files
are being generated from the corresponding .in.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Benda Xu <orv@debian.org>
Closes #15980
2024-04-09 16:34:58 -07:00
Alexander Motin 997f85b4d3 L2ARC: Relax locking during write
Previous code held ARC state sublist lock throughout all L2ARC
write process, which included number of allocations and even ZIO
issues.  Being blocked in any of those places the code could also
block ARC eviction, that could cause OOM activation or even dead-
lock if system is low on memory or one is too fragmented.

Fix it by dropping the lock as soon as we see a block eligible
for L2ARC writing and pick it up later using earlier inserted
marker.  While there, also reduce scope of hash lock, moving
ZIO allocation and other operations not requiring header access
out of it.  All operations requiring header access move under
hash lock, since L2_WRITING flag does not prevent header eviction
only transition to arc_l2c_only state with L1 header.

To be able to manipulate sublist lock and marker as needed add few
more multilist functions and modify one.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16040
2024-04-09 16:23:19 -07:00
Alexander Motin 9e63631dea Small fix to prefetch ranges aggregation
When after #16022 adding new range we aggregate more than two
existing ranges, that should be very rare, only if several streams
overlap, we may need to zero not the last range, but some earlier.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16072
2024-04-09 16:14:04 -07:00
Benda Xu 162cc80b81 etc/init.d: decide which variant to use at build time.
Let Debian use the sysv-rc variant of the script, even when OpenRC is
installed. Unlike on Gentoo, OpenRC on Debian consumes both the
sysv-rc scripts and OpenRC ones. ZFS initscripts on Debian should be
the sysv-rc version to provide most compatibility and to integrate
with the rest of initscripts for dependency tracking.

Restrict the substitution in the Makefile to the dedicated list.

This construct is inspired by Mo Zhou's detection of the execution
shell and follows the strategy of Peter in 6ef28c526b.

As of 2024, the initscripts are mostly relevant on Debian, Gentoo and
their derivatives.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Benda Xu <orv@debian.org>
Issue #8063
Issue #8204
Issue #8359
Closes #15977
2024-04-08 16:52:24 -07:00
Maxim Filimonov f07389d3ad Fix locale-specific time
In `zpool status -t`, scrub date/time is reported using the C locale,
while trim time is reported using the current one. This is inconsistent.
This patch fixes that.

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Maxim Filimonov <che@bein.link>
Closes #15878
Closes #15879
2024-04-08 15:37:41 -07:00
Alexander Motin aa5445c28b Remove db_state DB_NOFILL checks from syncing context
Syncing context should not depend on current state of dbuf, which
could already change several times in later transaction groups,
but rely solely on dirty record for the transaction group being
synced. Some of the checks seem already impossible, while instead
of others I think we should better check for absence of data in
the specific dirty record rather than DB_NOFILL.

Reviewed-by: Robert Evans <evansr@google.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16057
2024-04-08 15:23:43 -07:00
Alexander Motin 5e5fd0a178 Speculative prefetch for reordered requests
Before this change speculative prefetcher was able to detect a stream
only if all of its accesses are perfectly sequential.  It was easy to
implement and is perfectly fine for single-threaded applications.
Unfortunately multi-threaded network servers, such as iSCSI, SMB or
NFS usually have plenty of threads and may often reorder requests,
preventing successful speculation and prefetch.

This change allows speculative prefetcher to detect streams even if
requests are reordered by introducing a list of 9 non-contiguous
ranges up to 16MB ahead of current stream position and filling the
gaps as more requests arrive.  It also allows stream to proceed
even with holes up to a certain configurable threshold (25%).

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16022
2024-04-08 15:13:27 -07:00
Alexander Motin eeca9a91d6 Fix read errors race after block cloning
Investigating read errors triggering panic fixed in #16042 I've
found that we have a race in a sync process between the moment
dirty record for cloned block is removed and the moment dbuf is
destroyed.  If dmu_buf_hold_array_by_dnode() take a hold on a
cloned dbuf before it is synced/destroyed, then dbuf_read_impl()
may see it still in DB_NOFILL state, but without the dirty record.
Such case is not an error, but equivalent to DB_UNCACHED, since
the dbuf block pointer is already updated by dbuf_write_ready().
Unfortunately it is impossible to safely change the dbuf state
to DB_UNCACHED there, since there may already be another cloning
in progress, that dropped dbuf lock before creating a new dirty
record, protected only by the range lock.

Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Robert Evans <evansr@google.com>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16052
2024-04-08 12:03:18 -07:00
Rob N 76d1dde94c zinject: inject device errors into ioctls
Adds 'ioctl' as a valid IO type for device error injection, so we can
simulate a flush error (which OpenZFS currently ignores, but that's by
the by).

To support this, adding ZIO_STAGE_VDEV_IO_DONE to ZIO_IOCTL_PIPELINE,
since that's where device error injection happens. This needs a small
exclusion to avoid the vdev_queue, since flushes are not queued, and I'm
assuming that the various failure responses are still reasonable for
flush failures (probes, media change, etc). This seems reasonable to me,
as a flush failure is not unlike a write failure in this regard, however
this may be too aggressive or subtle to assume in just this change.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16061
2024-04-08 11:59:04 -07:00
Rob N ba9f587a77 vdev_disk: ensure trim errors are returned immediately
After 06e25f9c4, the discard issuing code was organised such that if
requesting an async discard or secure erase failed before the IO was
issued (that is, calling __blkdev_issue_discard() returned an error),
the failed zio would never be executed, resulting in txg_sync hanging
forever waiting for IO to finish.

This commit fixes that by immediately executing a failed zio on error.
To handle the successful synchronous op case, we fake an async op by,
when not using an asynchronous submission method, queuing the successful
result zio as part of the discard handler.

Since it was hard to understand the differences between discard and
secure erase, and sync and async, across different kernel versions, I've
commented and reorganised the code a bit to try and make everything more
contained and linear.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16070
2024-04-08 11:50:24 -07:00
Rob N 03987f71e3 zvol_os: fix compile with blk-mq on Linux 4.x
99741bde5 accesses a cached blk-mq hardware context through the mq_hctx
field of struct request. However, this field did not exist until 5.0.
Before that, the private function blk_mq_map_queue() was used to dig it
out of broader queue context. This commit detects this situation, and
handles it with a poor-man's simulation of that function.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16069
2024-04-08 11:38:49 -07:00
Rob N c13400c9a2 zvol_os: fix build on Linux <3.13
99741bde5 introduced zvol_num_taskqs, but put it behind the HAVE_BLK_MQ
define, preventing builds on versions of Linux that don't have it
(<3.13, incl EL7).

Nothing about it seems dependent on blk-mq, so this just moves it out
from behind that define and so fixes the build.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16062
2024-04-08 10:13:27 -07:00
Ameer Hamza 99741bde59 zvol: use multiple taskq
Currently, zvol uses a single taskq, resulting in throughput bottleneck
under heavy load due to lock contention on the single taskq. This patch
addresses the performance bottleneck under heavy load conditions by
utilizing multiple taskqs, thus mitigating lock contention. The number
of taskqs scale dynamically based on the available CPUs in the system,
as illustrated below:

                taskq   total
cpus    taskqs  threads threads
------- ------- ------- -------
1       1       32       32
2       1       32       32
4       1       32       32
8       2       16       32
16      3       11       33
32      5       7        35
64      8       8        64
128     11      12       132
256     16      16       256

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #15992
2024-04-03 18:21:25 -07:00
Pavel Snajdr 30c4eba4ea Fix panics when truncating/deleting files
There's an union in dbuf_dirty_record_t; dr_brtwrite could evaluate
to B_TRUE if the dirty record is of another type than dl. Adding
more explicit dr type check before trying to access dr_brtwrite.

Fixes two similar panics:

[ 1373.806119] VERIFY0(db->db_level) failed (0 == 1)
[ 1373.807232] PANIC at dbuf.c:2549:dbuf_undirty()
[ 1373.814979]  dump_stack_lvl+0x71/0x90
[ 1373.815799]  spl_panic+0xd3/0x100 [spl]
[ 1373.827709]  dbuf_undirty+0x62a/0x970 [zfs]
[ 1373.829204]  dmu_buf_will_dirty_impl+0x1e9/0x5b0 [zfs]
[ 1373.831010]  dnode_free_range+0x532/0x1220 [zfs]
[ 1373.833922]  dmu_free_long_range+0x4e0/0x930 [zfs]
[ 1373.835277]  zfs_trunc+0x75/0x1e0 [zfs]
[ 1373.837958]  zfs_freesp+0x9b/0x470 [zfs]
[ 1373.847236]  zfs_setattr+0x161a/0x3500 [zfs]
[ 1373.855267]  zpl_setattr+0x125/0x320 [zfs]
[ 1373.856725]  notify_change+0x1ee/0x4a0
[ 1373.859207]  do_truncate+0x7f/0xd0
[ 1373.859968]  do_sys_ftruncate+0x28e/0x2e0
[ 1373.860962]  do_syscall_64+0x38/0x90
[ 1373.861751]  entry_SYSCALL_64_after_hwframe+0x6e/0xd8

[ 1822.381337] VERIFY0(db->db_level) failed (0 == 1)
[ 1822.382376] PANIC at dbuf.c:2549:dbuf_undirty()
[ 1822.389232]  dump_stack_lvl+0x71/0x90
[ 1822.389920]  spl_panic+0xd3/0x100 [spl]
[ 1822.399567]  dbuf_undirty+0x62a/0x970 [zfs]
[ 1822.400583]  dmu_buf_will_dirty_impl+0x1e9/0x5b0 [zfs]
[ 1822.401752]  dnode_free_range+0x532/0x1220 [zfs]
[ 1822.402841]  dmu_object_free+0x74/0x120 [zfs]
[ 1822.403869]  zfs_znode_delete+0x75/0x120 [zfs]
[ 1822.404906]  zfs_rmnode+0x3f6/0x7f0 [zfs]
[ 1822.405870]  zfs_inactive+0xa3/0x610 [zfs]
[ 1822.407803]  zpl_evict_inode+0x3e/0x90 [zfs]
[ 1822.408831]  evict+0xc1/0x1c0
[ 1822.409387]  do_unlinkat+0x147/0x300
[ 1822.410060]  __x64_sys_unlinkat+0x33/0x60
[ 1822.410802]  do_syscall_64+0x38/0x90
[ 1822.411458]  entry_SYSCALL_64_after_hwframe+0x6e/0xd8

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Pavel Snajdr <snajpa@snajpa.net>
Closes #15983
2024-04-03 18:09:19 -07:00
Shengqi Chen 66929f6829 man: move zfs_prepare_disk.8 to nodist_man_MANS
The commit b53077a added zfs_prepare_disk.8 to the wrong list
dist_man_MANS, in which @zfsexecdir@ will not be properly substituted.
This leads to wrong path in the manpage in generated release tarballs.

Reported-by: Benda Xu <orv@debian.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Shengqi Chen <harry-chen@outlook.com>
Closes #15979
2024-04-03 18:04:15 -07:00
Alek P ea2862cdda vdev props comment and manpage should include zfsd and FreeBSD mentions
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Alek Pinchuk <apinchuk@axcient.com>
Closes #15968
2024-04-03 17:56:34 -07:00
Rob N b21b967bd5 zap_leaf: make l_hash[] variable length to silence UBSAN
When UBSAN is active and OpenZFS is a debug build, the l_hash assert at
the bottom of zap_open_leaf() causes UBSAN to complain.

This follows the example in 786641dcf to shut it up.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #15964
2024-04-03 16:38:18 -07:00
Paul Dagnelie b6bbaa8372 Give a better message from 'zpool get' with invalid pool name
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Don Brady <don.brady@klarasystems.com>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #15942
2024-04-03 16:34:46 -07:00
Rob Norris 756e10b0a1 tests: simple zinject disk fault arg check
Just making sure the valid values for disk faults are accepted.
Obviously we can do a lot more, but this will do to get us started.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #15953
2024-04-03 16:06:43 -07:00
Rob Norris fa480fe5ba zinject: show more device fault fields
Once there's a few different kinds injected, its pretty hard to see them
otherwise.

So, lets show IO type, error type and frequency fields in the table too.

Since we now have to convert from error code to pretty string, refactor
the error names into a table and add lookup functions.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #15953
2024-04-03 16:06:19 -07:00
Rob N ca678bc0bc Makefile.bsd: sort and cleanup source file list
All files now in their correct sections, and all sections match on-disk
dir layout, and all sorted.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #15943
2024-04-03 15:49:22 -07:00
Rob Norris 6097a7ba8b Linux 6.9 compat: blk_alloc_disk() now takes two args
There's an extra nullable arg for queue limits. Detect it, and set it to
NULL. Similar change for blk_mq_alloc_disk(), now three args, same
treatment.

Error return now has error encoded in the return, so detect with
IS_ERR() and explicitly NULL our own return.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #16027
Closes #16033
2024-04-03 15:29:39 -07:00
Rob Norris e3120f73d0 Linux 6.9 compat: bdev handles are now struct file
bdev_open_by_path() is replaced by bdev_file_open_by_path(), which
returns a plain old struct file*. Release function is gone entirely; the
regular file release function fput() will take care of the bdev
specifics.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #16027
Closes #16033
2024-04-03 15:29:33 -07:00
Rob N 917ff75e95 vdev_disk: don't touch vbio after its handed off to the kernel
After IO is unplugged, it may complete immediately and vbio_completion
be called on interrupt context. That may interrupt or deschedule our
task. If its the last bio, the vbio will be freed. Then, we get
rescheduled, and try to write to freed memory through vbio->.

This patch just removes the the cleanup, and the corresponding assert.
These were leftovers from a previous iteration of vbio_submit() and were
always "belt and suspenders" ops anyway, never strictly required.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc
Reported-by: Rich Ercolani <rincebrain@gmail.com>
Reviewed-by: Laurențiu Nicola <lnicola@dend.ro>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16045
Closes #16050
Closes #16049
2024-04-03 15:17:07 -07:00
Rob N a9a4290173 xdr: header cleanup
#16047 notes that include/os/freebsd/spl/rpc/xdr.h carried an
(apparently) incompatible license. While looking into it, it seems that
this file is actually unnecessary these days - FreeBSD's kernel XDR has
XDR_CONTROL, xdrmem_control and XDR_GET_BYTES_AVAIL, while userspace has
XDR_CONTROL and xdrmem_control, and our implementation of
XDR_GET_BYTES_AVAIL for libspl works nicely with it. So this removes
that file outright.

To keep the includes in nvpair.c tidy, I've made a few small adjustments
to the Linux headers. By definition, rpc/types.h provides bool_t and is
included before rpc/xdr.h, so I've created rpc/types.h for Linux. This
isn't necessary for userspace; both FreeBSD native and tirpc on Linux
already have these headers set up correctly.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #16047 
Closes #16051
2024-04-03 15:13:27 -07:00
Alexander Motin b12738182c Improve dbuf_read() error reporting
Previous code reported non-ZIO errors only via return value, but
not via parent ZIO.  It could cause NULL-dereference panics due
to dmu_buf_hold_array_by_dnode() ignoring the return value,
relying solely on parent ZIO status.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Reported by:	Ameer Hamza <ahamza@ixsystems.com>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16042
2024-04-03 15:04:26 -07:00
Robert Evans 39be46f43f Linux 5.18+ compat: Detect filemap_range_has_page
In v5.18 `filemap_range_has_page` moved to `pagemap.h`

`pagemap.h` has been around since 3.10 so just include both

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Robert Evans <evansr@google.com>
Closes #16034
2024-03-29 17:11:52 -07:00
Robert Evans 2553f94c42 Fix buffer underflow if sysfs file is empty
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Jason Lee <jasonlee@lanl.gov>
Signed-off-by: Robert Evans <evansr@google.com>
Closes #16028
Closes #16035
2024-03-29 14:59:23 -07:00
Rob N cfb96c772b vdev_disk: clean up spa/bdev mode conversion
43e8f6e37 introduced a subtle API misuse, in that it passed the output
from vdev_bdev_mode() back into itself. Fortunately, the
SPA_MODE_(READ|WRITE) bit values exactly map to the FMODE_(READ|WRITE) &
BLK_OPEN_(READ|WRITE) bit values, so it didn't result in a bug, but it
was hard to read and understand, so I cleaned it up.

In doing so, I noticed that the only call to vdev_bdev_mode() without
the "exclusive" flag set was in that misuse, and actually, we never do a
non-exclusive blkdev_get_by_path(). So I've just made exclusive be
always-on.


Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #15995
2024-03-29 14:51:33 -07:00
Fabian-Gruenbichler c0aab8b8f9 zvols: prevent overflow of minor device numbers
currently, the linux kernel allows 2^20 minor devices per major device
number.  ZFS reserves blocks of 2^4 minors per zvol: 1 for the zvol
itself, the other 15 for the first partitions of that zvol. as a result,
only 2^16 such blocks are available for use.

there are no checks in place to avoid overflowing into the major device
number when more than 2^16 zvols are allocated (with volmode=dev or
default). instead of ignoring this limit, which comes with all sorts of
weird knock-on effects, detect this situation and simply fail allocating
the zvol block device early on.

without this safeguard, the kernel will reject the attempt to create an
already existing block device, but ZFS doesn't handle this error and
gets confused about which zvol occupies which minor slot, potentially
resulting in kernel NULL derefs and other issues later on.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Closes #16006
2024-03-29 14:37:40 -07:00
George Wilson b1e46f869e Add ashift validation when adding devices to a pool
Currently, zpool add allows users to add top-level vdevs that have
different ashifts but doing so prevents users from being able to
perform a top-level vdev removal. Often times consumers may not realize
that they have mismatched ashifts until the top-level removal fails.

This feature adds ashift validation to the zpool add command and will
fail the operation if the sector size of the specified vdev does not
match the existing pool. This behavior can be disabled by using the -f
flag. In addition, new flags have been added to provide fine-grained
control to disable specific checks. These flags
are:

--allow-in-use
--allow-ashift-mismatch
--allow-replicaton-mismatch

The force flag will disable all of these checks.

Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Mark Maybee <mmaybee@delphix.com>
Signed-off-by: George Wilson <gwilson@delphix.com>
Closes #15509
2024-03-29 13:15:56 -06:00
Robert Evans e39e20b6dc ZTS: fix flakiness in cp_files_002_pos
Fix RANDOM to not return zero.

Overwriting with `dd ... count=0` does not test anything.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Robert Evans <evansr@google.com>
Closes #16029
2024-03-27 14:59:16 -07:00
Alexander Motin 0c8eb974ff BRT: Check pool clone stats in more tests
This should allow to catch some leaks, if those happen.

While there fix some cosmetic issues.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #16007
2024-03-27 14:47:06 -07:00
Alexander Motin b403427624 BRT: Fix tests to work on non-empty pools
It should not normally happen, but if it does, better to not fail
everything for no good reason, or it may be hard to debug.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #16007
2024-03-27 14:46:48 -07:00
Alexander Motin a89d209bb6 BRT: Fix holes cloning.
- When reading L0 block pointers handle buffers without ones and
without dirty records as a holes.  Those appear when dnode size
was increased, but the end was never written, so there are no new
indirection levels to store the pointers.  It makes no sense to
return EAGAIN here, since sync won't create new indirection levels
until there will be actual writes.
 - When cloning blocks set destination hole logical birth time
to the current TXG.  Otherwise if we are cloning over existing
data, newly created holes may not be properly replicated later.
Use BP_SET_BIRTH() when possible to not replicate its logic.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #15994
Closes #16007
2024-03-27 14:45:27 -07:00
Alexander Motin 8cd8ccca53 BRT: Skip getting length in brt_entry_lookup()
Unlike DDT, where ZAP values may have different lengths due to
compression, all BRT entries are identical 8-byte counters.  It
does not make sense to first fetch the length only to assert it.
zap_lookup_uint64() is specifically designed to work with counters
of different size and should return error if something odd found.
Calling it straight allows to save some measurable CPU time.

Reviewed-by: Pawel Jakub Dawidek <pawel@dawidek.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15950
2024-03-25 17:13:45 -07:00
Rob Norris c6be6ce175 abd_iter_page: don't use compound heads on Linux <4.5
Before 4.5 (specifically, torvalds/linux@ddc58f2), head and tail pages
in a compound page were refcounted separately. This means that using the
head page without taking a reference to it could see it cleaned up later
before we're finished with it. Specifically, bio_add_page() would take a
reference, and drop its reference after the bio completion callback
returns.

If the zio is executed immediately from the completion callback, this is
usually ok, as any data is referenced through the tail page referenced
by the ABD, and so becomes "live" that way. If there's a delay in zio
execution (high load, error injection), then the head page can be freed,
along with any dirty flags or other indicators that the underlying
memory is used. Later, when the zio completes and that memory is
accessed, its either unmapped and an unhandled fault takes down the
entire system, or it is mapped and we end up messing around in someone
else's memory. Both of these are very bad.

The solution on these older kernels is to take a reference to the head
page when we use it, and release it when we're done. There's not really
a sensible way under our current structure to do this; the "best" would
be to keep a list of head page references in the ABD, and release them
when the ABD is freed.

Since this additional overhead is totally unnecessary on 4.5+, where
head and tail pages share refcounts, I've opted to simply not use the
compound head in ABD page iteration there. This is theoretically less
efficient (though cleaning up head page references would add overhead),
but its safe, and we still get the other benefits of not mapping pages
before adding them to a bio and not mis-splitting pages.

There doesn't appear to be an obvious symbol name or config option we
can match on to discover this behaviour in configure (and the mm/page
APIs have changed a lot since then anyway), so I've gone with a simple
version check.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Closes #15533
Closes #15588
2024-03-25 16:51:54 -07:00
Rob Norris 72fd834c47 vdev_disk: use bio_chain() to submit multiple BIOs
Simplifies our code a lot, so we don't have to wait for each and
reassemble them.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Closes #15533
Closes #15588
2024-03-25 16:51:47 -07:00
Rob Norris df2169d141 vdev_disk: add module parameter to select BIO submission method
This makes the submission method selectable at module load time via the
`zfs_vdev_disk_classic` parameter, allowing this change to be backported
to 2.2 safely, and disabled in favour of the "classic" submission method
if new problems come up.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Closes #15533
Closes #15588
2024-03-25 16:51:30 -07:00
Rob Norris 06a196020e vdev_disk: rewrite BIO filling machinery to avoid split pages
This commit tackles a number of issues in the way BIOs (`struct bio`)
are constructed for submission to the Linux block layer.

The kernel has a hard upper limit on the number of pages/segments that
can be added to a BIO, as well as a separate limit for each device
(related to its queue depth and other scheduling characteristics).

ZFS counts the number of memory pages in the request ABD
(`abd_nr_pages_off()`, and then uses that as the number of segments to
put into the BIO, up to the hard upper limit. If it requires more than
the limit, it will create multiple BIOs.

Leaving aside the fact that page count method is wrong (see below), not
limiting to the device segment max means that the device driver will
need to split the BIO in half. This is alone is not necessarily a
problem, but it interacts with another issue to cause a much larger
problem.

The kernel function to add a segment to a BIO (`bio_add_page()`) takes a
`struct page` pointer, and offset+len within it. `struct page` can
represent a run of contiguous memory pages (known as a "compound page").
In can be of arbitrary length.

The ZFS functions that count ABD pages and load them into the BIO
(`abd_nr_pages_off()`, `bio_map()` and `abd_bio_map_off()`) will never
consider a page to be more than `PAGE_SIZE` (4K), even if the `struct
page` is for multiple pages. In this case, it will load the same `struct
page` into the BIO multiple times, with the offset adjusted each time.

With a sufficiently large ABD, this can easily lead to the BIO being
entirely filled much earlier than it could have been. This is also
further contributes to the problem caused by the incorrect segment limit
calculation, as its much easier to go past the device limit, and so
require a split.

Again, this is not a problem on its own.

The logic for "never submit more than `PAGE_SIZE`" is actually a little
more subtle. It will actually never submit a buffer that crosses a 4K
page boundary.

In practice, this is fine, as most ABDs are scattered, that is a list of
complete 4K pages, and so are loaded in as such.

Linear ABDs are typically allocated from slabs, and for small sizes they
are frequently not aligned to page boundaries. For example, a 12K
allocation can span four pages, eg:

     -- 4K -- -- 4K -- -- 4K -- -- 4K --
    |        |        |        |        |
          :## ######## ######## ######:    [1K, 4K, 4K, 3K]

Such an allocation would be loaded into a BIO as you see:

    [1K, 4K, 4K, 3K]

This tends not to be a problem in practice, because even if the BIO were
filled and needed to be split, each half would still have either a start
or end aligned to the logical block size of the device (assuming 4K at
least).

---

In ideal circumstances, these shortcomings don't cause any particular
problems. Its when they start to interact with other ZFS features that
things get interesting.

Aggregation will create a "gang" ABD, which is simply a list of other
ABDs. Iterating over a gang ABD is just iterating over each ABD within
it in turn.

Because the segments are simply loaded in order, we can end up with
uneven segments either side of the "gap" between the two ABDs. For
example, two 12K ABDs might be aggregated and then loaded as:

    [1K, 4K, 4K, 3K, 2K, 4K, 4K, 2K]

Should a split occur, each individual BIO can end up either having an
start or end offset that is not aligned to the logical block size, which
some drivers (eg SCSI) will reject. However, this tends not to happen
because the default aggregation limit usually keeps the BIO small enough
to not require more than one split, and most pages are actually full 4K
pages, so hitting an uneven gap is very rare anyway.

If the pool is under particular memory pressure, then an IO can be
broken down into a "gang block", a 512-byte block composed of a header
and up to three block pointers. Each points to a fragment of the
original write, or in turn, another gang block, breaking the original
data up over and over until space can be found in the pool for each of
them.

Each gang header is a separate 512-byte memory allocation from a slab,
that needs to be written down to disk. When the gang header is added to
the BIO, its a single 512-byte segment.

Pulling all this together, consider a large aggregated write of gang
blocks. This results a BIO containing lots of 512-byte segments. Given
our tendency to overfill the BIO, a split is likely, and most possible
split points will yield a pair of BIOs that are misaligned. Drivers that
care, like the SCSI driver, will reject them.

---

This commit is a substantial refactor and rewrite of much of `vdev_disk`
to sort all this out.

`vdev_bio_max_segs()` now returns the ideal maximum size for the device,
if available. There's also a tuneable `zfs_vdev_disk_max_segs` to
override this, to assist with testing.

We scan the ABD up front to count the number of pages within it, and to
confirm that if we submitted all those pages to one or more BIOs, it
could be split at any point with creating a misaligned BIO.  If the
pages in the BIO are not usable (as in any of the above situations), the
ABD is linearised, and then checked again. This is the same technique
used in `vdev_geom` on FreeBSD, adjusted for Linux's variable page size
and allocator quirks.

`vbio_t` is a cleanup and enhancement of the old `dio_request_t`. The
idea is simply that it can hold all the state needed to create, submit
and return multiple BIOs, including all the refcounts, the ABD copy if
it was needed, and so on. Apart from what I hope is a clearer interface,
the major difference is that because we know how many BIOs we'll need up
front, we don't need the old overflow logic that would grow the BIO
array, throw away all the old work and restart. We can get it right from
the start.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Closes #15533
Closes #15588
2024-03-25 16:51:14 -07:00
Rob Norris c4a13ba483 vdev_disk: make read/write IO function configurable
This is just setting up for the next couple of commits, which will add a
new IO function and a parameter to select it.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Closes #15533
Closes #15588
2024-03-25 16:51:04 -07:00
Rob Norris 867178ae1d vdev_disk: reorganise vdev_disk_io_start
Light reshuffle to make it a bit more linear to read and get rid of a
bunch of args that aren't needed in all cases.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Closes #15533
Closes #15588
2024-03-25 16:50:56 -07:00
Rob Norris f3b85d706b vdev_disk: rename existing functions to vdev_classic_*
This is just renaming the existing functions we're about to replace and
grouping them together to make the next commits easier to follow.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Closes #15533
Closes #15588
2024-03-25 16:50:47 -07:00
Rob Norris 390b448726 abd: add page iterator
The regular ABD iterators yield data buffers, so they have to map and
unmap pages into kernel memory. If the caller only wants to count
chunks, or can use page pointers directly, then the map/unmap is just
unnecessary overhead.

This adds adb_iterate_page_func, which yields unmapped struct page
instead.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Closes #15533
Closes #15588
2024-03-25 16:50:35 -07:00
Rob Norris df04efe321 linux 5.4 compat: page_size()
Before 5.4 we have to do a little math.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Closes #15533
Closes #15588
2024-03-25 16:48:15 -07:00
Alexander Motin f68bde7236 BRT: Make BRT block sizes configurable
Similar to DDT make BRT data and indirect block sizes configurable
via module parameters.  I am not sure what would be the best yet,
but similar to DDT 4KB blocks kill all chances of compression on
vdev with ashift=12 or more, that on my tests reaches 3x.

While here, fix documentation for respective DDT parameters.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15967
2024-03-25 15:02:38 -07:00
George Wilson 493fcce9be Provide macros for setting and getting blkptr birth times
There exist a couple of macros that are used to update the blkptr birth
times but they can often be confusing. For example, the
BP_PHYSICAL_BIRTH() macro will provide either the physical birth time
if it is set or else return back the logical birth time. The
complement to this macro is BP_SET_BIRTH() which will set the logical
birth time and set the physical birth time if they are not the same.
Consumers may get confused when they are trying to get the physical
birth time and use the BP_PHYSICAL_BIRTH() macro only to find out that
the logical birth time is what is actually returned.

This change cleans up these macros and makes them symmetrical. The same
functionally is preserved but the name is changed. Instead of calling
BP_PHYSICAL_BIRTH(), consumer can now call BP_GET_BIRTH(). In
additional to cleaning up this naming conventions, two new sets of
macros are introduced -- BP_[SET|GET]_LOGICAL_BIRTH() and
BP_[SET|GET]_PHYSICAL_BIRTH.  These new macros allow the consumer to
get and set the specific birth time.

As part of the cleanup, the unused GRID macros have been removed and
that portion of the blkptr are currently unused.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Signed-off-by: George Wilson <gwilson@delphix.com>
Closes #15962
2024-03-25 15:01:54 -07:00
Alexander Motin 4616b96a64 BRT: Relax brt_pending_apply() locking
Since brt_pending_apply() is running in syncing context, no other
brt_pending_tree accesses are possible for the TXG.  We don't need
to acquire brt_pending_lock here.

Reviewed-by: Pawel Jakub Dawidek <pawel@dawidek.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15955
2024-03-25 14:59:55 -07:00
Alexander Motin 80cc516295 ZAP: Massively switch to _by_dnode() interfaces
Before this change ZAP called dnode_hold() for almost every block
access, that was clearly visible in profiler under heavy load, such
as BRT.  This patch makes it always hold the dnode reference between
zap_lockdir() and zap_unlockdir().  It allows to avoid most of dnode
operations between those.  It also adds several new _by_dnode() APIs
to ZAP and uses them in BRT code.  Also adds dmu_prefetch_by_dnode()
variant and uses it in the ZAP code.

After this there remains only one call to dmu_buf_dnode_enter(),
which seems to be unneeded.  So remove the call and the functions.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15951
2024-03-25 14:58:50 -07:00
Alexander Motin bf8f72359d BRT: Skip duplicate BRT prefetches
If there is a pending entry for this block, then we've already
issued BRT prefetch for it within this TXG, so don't do it again.
BRT vdev lookup and following zap_prefetch_uint64() call can be
pretty expensive and should be avoided when not necessary.

Reviewed-by: Pawel Jakub Dawidek <pawel@dawidek.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15941
2024-03-25 14:58:04 -07:00
Robert Evans 102b468b5e Fix corruption caused by mmap flushing problems
1) Make mmap flushes synchronous. Linux may skip flushing dirty pages
   already in writeback unless data-integrity sync is requested.

2) Change zfs_putpage to use TXG_WAIT. Otherwise dirty pages may be
   skipped due to DMU pushing back on TX assign.

3) Add missing mmap flush when doing block cloning.

4) While here, pass errors from putpage to writepage/writepages.

This change fixes corruption edge cases, but unfortunately adds
synchronous ZIL flushes for dirty mmap pages to llseek and bclone
operations. It may be possible to avoid these sync writes later
but would need more tricky refactoring of the writeback code.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Robert Evans <evansr@google.com>
Closes #15933 
Closes #16019
2024-03-25 14:56:49 -07:00
Alexander Motin c28f94f32e ZAP: Some cleanups/micro-optimizations
- Remove custom zap_memset(), use regular memset().
- Use PANIC() instead of opaque cmn_err(CE_PANIC).
- Provide entry parameter to zap_leaf_rehash_entry().
- Reduce branching in zap_leaf_array_create() inner loop.
- Remove signedness where it should not be.

Should be no function changes.

Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15976
2024-03-21 16:43:53 -07:00
Fabian-Gruenbichler f1b368359b udev: correctly handle partition #16 and later
If a zvol has more than 15 partitions, the minor device number exhausts
the slot count reserved for partitions next to the zvol itself. As a
result, the minor number cannot be used to determine the partition
number for the higher partition, and doing so results in wrong named
symlinks being generated by udev.

Since the partition number is encoded in the block device name anyway,
let's just extract it from there instead.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Closes #15904
Closes #15970
2024-03-21 16:38:24 -07:00
Alexander Motin 2c01cae8b9 BRT: Change brt_pending_tree sorting order
It does not look important how exactly brt_pending_tree is sorted.
When cloning large file, it is quite likely that all of its blocks
have identical physical birth times, so comparing them first does
not provide useful entropy, while accesses additional cache line.
In most cases combination of vdev and offset provides unique result
and physical birth time comparison is not even needed.  Meanwhile,
when traversing the tree inside brt_pending_apply(), it can be
beneficial for dbuf cache and CPU cache hits to group processing
by vdev and so by the per-VDEV BRT ZAPs.

Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15954
2024-03-21 15:42:21 -07:00
Rob N 5c4a4f82c8 zio: update ZIO type x stage documentation
- add column for TRIM ZIOs
- remove R from ZIO_STAGE_ISSUE_ASYNC, never happened
- remove I from ZIO_STAGE_VDEV_IO_DONE, never happened

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #15959
2024-03-21 12:10:04 -07:00
Cameron Harr c9d8f6c59a Fix option string, adding -e and fixing order
The recently added '-e' option (PR #15769) missed adding the
new option in the online `zpool status` help command. This
adds the options and reorders a couple of the other options
that were not listed alphabetically.

Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Cameron Harr <harr1@llnl.gov>
Closes #16008
2024-03-21 09:00:29 -07:00
Alexander Motin 45e23abed5 Update resume token at object receive.
Before this change resume token was updated only on data receive.
Usually it is enough to resume replication without much overlap.
But we've got a report of a curios case, where replication source
was traversed with recursive grep, which through enabled atime
modified every object without modifying any data.  It produced
several gigabytes of replication traffic without a single data
write and so without a single resume point.

While the resume token was not designed to resume from an object,
I've found that the send implementation always sends object before
any data. So by requesting resume from offset 0 we are effectively
resuming from the object, followed (or not) by the data at offset
0, just as we need it.

Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15927
2024-03-20 17:22:36 -07:00
Rob N ef08a4d406 Linux 6.8 compat: use splice_copy_file_range() for fallback
Linux 6.8 removes generic_copy_file_range(), which had been reduced to a
simple wrapper around splice_copy_file_range(). Detect that function
directly and use it if generic_ is not available.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #15930 
Closes #15931
2024-03-20 16:46:15 -07:00
Rob N 90ff732358 freebsd: fix missing headers in distribution tarball
arc_os.h and freebsd_event.h aren't included in release tarballs, so the
build fails on FreeBSD. This fixes it.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #15963
2024-03-20 10:08:50 -07:00
Rob Norris 8f2f6cd2ac ddt: reduce DDT_NAMELEN
This is the buffer size passed to ddt_object_name(), to expand the
DMU_POOL_DDT format. That format inserts the table checksum, class and
type names, which as I write this are max 6, 9 and 3, respectively.

Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #15908
2024-02-26 12:24:22 -08:00
Rob Norris c00c085bfb config: use -Wno-format-truncation globally
-Wformat-truncation looks for places where the return code of snprintf()
is unchecked and the provided buffer might be too short. This is based
on a heuristic that can change between compiler versions.

It has been seen to get this wrong in ddt_object_name(), leading to
DDT_NAMELEN being increased somewhat arbitrarily.

There's no good reason to have this warning enabled, so here we disable
it everywhere. Truncation may be undesirable, but snprintf() is
guaranteed to emit a trailing null, so at worst we get a short string,
not a buffer overrun.

Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #15908
2024-02-26 12:23:55 -08:00
Quartz 5600dff0ef Fixed parameter passing error when calling zfs_acl_chmod
Follow up to 99495ba6ab which
accidentally introduce this regression.

Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Quartz <yyhran@163.com>
Closes #15907
2024-02-26 11:41:44 -08:00
Brian Behlendorf af4da5ccf2 Check for minimum partition size
On Linux block devices used for vdevs will by partitioned.  The block
device must be large enough for an 64M partition starting at offset
of 2048 sectors (part1), and a second 64M reserved partition at the
end of the device (part9).

This commit adds a capacity check when creating the GPT label to
immediately detect a device which is too small.  With the existing
code this would be caught slightly latter when attempting to use
the partition.  Catching it sooner let's us print a more useful error.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #15898
2024-02-16 09:07:32 -08:00
Tony Hutter d6a3d3f12a ZTS: Skip cross-fs bclone tests if FreeBSD < 14.0
Skip cross filesystem block cloning tests on FreeBSD if running
less than version 14.0.  Cross filesystem copy_file_range() was
added in FreeBSD 14.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #15901
2024-02-16 08:59:56 -08:00
Rob Norris 5720b00632 ddt: document the theory and the key data structures
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: iXsystems, Inc.
Closes #15887
2024-02-15 11:46:00 -08:00
Rob Norris d961954688 ddt: only create tables for dedup-capable checksums
Most values in zio_checksum can never be used for dedup, partly because
the dedup= property only offers a limited list, but also some values (eg
ZIO_CHECKSUM_OFF) aren't real and will never be seen.

A true flag would be better than a hardcoded list, but thats more
cleanup elsewhere than I want to do right now.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: iXsystems, Inc.
Closes #15887
2024-02-15 11:45:55 -08:00
Rob Norris 406562c563 ddt: simplify entry load and flags
Only a single bit is needed to track entry state, and definitely not two
whole bytes. Some light refactoring in ddt_lookup() is needed to support
this, but it reads a lot better now.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: iXsystems, Inc.
Closes #15887
2024-02-15 11:45:50 -08:00
Rob Norris 2cffddd405 ddt: remove ddt_node
Nothing uses it.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: iXsystems, Inc.
Closes #15887
2024-02-15 11:45:44 -08:00
Rob Norris 9029278dde ddt: rework ops interface in terms of keys and values
Store objects store keys and values, so have them take those types and
nothing more. This way, they don't need to be concerned about the "kind"
of entry being operated on; the dispatch layer can take care of the
appropriate conversions.

This adds a "contains" op to see if a particular entry exists without
loading it, which makes a couple of things easier to do; in particular,
it allows us to avoid an allocation in ddt_class_contains().

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: iXsystems, Inc.
Closes #15887
2024-02-15 11:45:38 -08:00
Rob Norris 5ee0f9c649 ddt: ensure ddt objects exist before trying to get stats from them
ddt_get_dedup_histogram() was actually checking it, just in an extremely
cursed way. ddt_get_dedup_object_stats() wasn't, but wasn't being called
from a dangerous place so no one noticed.

These checks are necessary, because spa_ddt[] is not populated until
spa_load(), but the spa can exist before that, while being created, and
as vdevs and metaslabs are initialised the space accounting functions
will be called to update pool space counts.

Probably the whole create path doesn't need to go asking for space
accounting from metadata subsystems until after the pool is created.
This will at least catch misuse.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: iXsystems, Inc.
Closes #15887
2024-02-15 11:45:33 -08:00
Rob Norris 3bad70040a ddt: remove struct names and forward declarations
Things get confused when there's more than one name for a thing.

Note that we don't do this for ddt_object_t, ddt_histogram_t and
ddt_stat_t because they're part of the public ZFS interface.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: iXsystems, Inc.
Closes #15887
2024-02-15 11:45:24 -08:00
Rob Norris c8f694fe39 ddt: typedef ddt_type and ddt_class
Mostly for consistency, so the reader is less likely to wonder why these
things look different.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: iXsystems, Inc.
Closes #15887
2024-02-15 11:45:19 -08:00
Rob Norris 8e414fcdf4 ddt: split internal DDT API into separate header
Just to make it easier to know which bits to pay attention to.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: iXsystems, Inc.
Closes #15887
2024-02-15 11:45:15 -08:00
Rob Norris 909006049f ddt: remove DDE_GET_NDVAS macro
It was a weird and confusing name, because it wasn't actually returning
the number of DVAs in the entry (as in, in the value/phys part) but the
maximum number of possible DVAs in a BP generated from the entry, based
on the encrypt bit in the key. This is unlike the similarly named
BP_GET_NDVAS, which really does return the number of DVAs.

Since its only used in this one place, and for a specific purpose, it
seemed more sensible to just write it in-place and remove the name.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: iXsystems, Inc.
Closes #15887
2024-02-15 11:45:10 -08:00
Rob Norris 5973854153 ddt: lift dedup stats out to separate file
We want to add other kinds of dedup-related objects and keep stats for
them. This makes those functions easier to use from outside ddt.c.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: iXsystems, Inc.
Closes #15887
2024-02-15 11:45:05 -08:00
Rob Norris 0cb1ef60ae ddt: compare keys, not entries
We're about to have different kinds of things that we'll compare on key,
so generalise this function to support that.

(It actually worked fine because of the way the casts work out, but it
requires the key to be at the start of the object so the cast through
ddt_entry_t works, and even then it reads strangely for anything that's
not a ddt_entry_t).

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: iXsystems, Inc.
Closes #15887
2024-02-15 11:45:00 -08:00
Rob Norris 5c4cc21fd4 ddt_zap: standardise temp buffer allocations
Always do them on the heap, and when we know how much we need, only that
much.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: iXsystems, Inc.
Closes #15887
2024-02-15 11:44:55 -08:00
Rob Norris 86e91c030c ddt: move entry compression into ddt_zap
I think I can say with some confidence that anyone making a new storage
type in 2023 is doing their own thing with compression, not this.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: iXsystems, Inc.
Closes #15887
2024-02-15 11:44:47 -08:00
Rob Norris d3bafe4554 ddt: modernise assertions
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: iXsystems, Inc.
Closes #15887
2024-02-15 11:44:21 -08:00
Alexander Motin e0bd8118d0 Linux: Cleanup taskq threads spawn/exit
This changes taskq_thread_should_stop() to limit maximum exit rate
for idle threads to one per 5 seconds.  I believe the previous one
was broken, not allowing any thread exits for tasks arriving more
than one at a time and so completing while others are running.

Also while there:
 - Remove taskq_thread_spawn() calls on task allocation errors.
 - Remove extra taskq_thread_should_stop() call.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rich Ercolani <rincebrain@gmail.com>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15873
2024-02-13 11:15:16 -08:00
Bi11 a0635ae731 zdb: Fix false leak report for BRT objects
Fix a misreport in 'zdb -d' where it falsely marked
BRT objects as leaked.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Yuxin Wang <yuxinwang9999@gmail.com>
Closes #15882
2024-02-12 16:58:47 -08:00
Bi11 6cc93ccde7 BRT: Fix slop space calculation with block cloning
Similar to deduplication, the size of data duplicated by block cloning
should not be included in the slop space calculation.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Yuxin Wang <yuxinwang9999@gmail.com>
Closes #15874
2024-02-12 13:53:33 -08:00
Kevin Greene 79c6dffa6b Allowing PERFPOOL to be defined by zfs-test users
Reviewed-by: John Wren Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Kevin Greene <kevin.greene@delphix.com>
Closes #15868
2024-02-09 10:02:46 -08:00
Shawn Bayern d0d2733204 Update zfs-snapshot.8
Fixes a small inaccuracy in the description of snapshot
atomicity

zfs-snapshot(8) appears to contain a small error.  The existing
version reads "Snapshots are taken atomically, so that all
snapshots correspond to the same moment in time."  Per
zfs_main.c, which in do_snapshot() simply loops over argv, this
does not appear to be correct when multiple snapshots are
specified explicitly on the command line.  I believe the intent
of the man page was to say that *recursive* snapshots are all
created atomically.

This proposed change fixes that error.  Because the existing
statement may confuse some readers anyway, the commit also also
adds a small amount of general explanatory information that may
be helpful.

The change also adds an introductory sentence that summarizes
what 'zfs snapshot' does in the first place.  In that sentence,
the text "different datasets" is intended to indicate that
(again per the code) the same dataset cannot be specified
multiple times on the command line.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Shawn Bayern <sbayern@law.fsu.edu>
Closes #15857
2024-02-08 13:06:12 -08:00
Rob N a5a725440b zfs list: add '-t fs' and '-t vol' options
Because "filesystem" and "volume" are just too long!

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #15864
2024-02-08 10:22:58 -08:00
Don Brady cbe882298e Add slow disk diagnosis to ZED
Slow disk response times can be indicative of a failing drive. ZFS
currently tracks slow I/Os (slower than zio_slow_io_ms) and generates
events (ereport.fs.zfs.delay).  However, no action is taken by ZED,
like is done for checksum or I/O errors.  This change adds slow disk
diagnosis to ZED which is opt-in using new VDEV properties:
  VDEV_PROP_SLOW_IO_N
  VDEV_PROP_SLOW_IO_T

If multiple VDEVs in a pool are undergoing slow I/Os, then it skips
the zpool_vdev_degrade().

Sponsored-By: OpenDrives Inc.
Sponsored-By: Klara Inc.
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Rob Wing <rob.wing@klarasystems.com>
Signed-off-by: Don Brady <don.brady@klarasystems.com>
Closes #15469
2024-02-08 09:19:52 -08:00
the-Chain-Warden-thresh 229b9f4ed0 LUA: Backport CVE-2020-24370's patch
CVE-2020-24370 is a security vulnerability in lua. Although the CVE
description in CVE-2020-24370 said that this CVE only affected lua
5.4.0, according to lua this CVE actually existed since lua 5.2. The
root cause of this CVE is the negation overflow that occurs when you
try to take the negative of 0x80000000. Thus, this CVE also exists in
openzfs. Try to backport the fix to the lua in openzfs since the
original fix is for 5.4 and several functions have been changed.

https://github.com/advisories/GHSA-gfr4-c37g-mm3v
https://nvd.nist.gov/vuln/detail/CVE-2020-24370
https://www.lua.org/bugs.html#5.4.0-11
https://github.com/lua/lua/commit/a585eae6e7ada1ca9271607a4f48dfb1786

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: ChenHao Lu <18302010006@fudan.edu.cn>
Closes #15847
2024-02-07 11:53:05 -08:00
Cameron Harr 0823388752 Add 'zpool status -e' flag to see unhealthy vdevs
When very large pools are present, it can be laborious to find
reasons for why a pool is degraded and/or where an unhealthy vdev
is. This option filters out vdevs that are ONLINE and with no errors
to make it easier to see where the issues are. Root and parents of
unhealthy vdevs will always be printed.

Testing:
ZFS errors and drive failures for multiple vdevs were simulated with
zinject.

Sample vdev listings with '-e' option
- All vdevs healthy
    NAME        STATE     READ WRITE CKSUM
    iron5       ONLINE       0     0     0

- ZFS errors
    NAME        STATE     READ WRITE CKSUM
    iron5       ONLINE       0     0     0
      raidz2-5  ONLINE       1     0     0
        L23     ONLINE       1     0     0
        L24     ONLINE       1     0     0
        L37     ONLINE       1     0     0

- Vdev faulted
    NAME        STATE     READ WRITE CKSUM
    iron5       DEGRADED     0     0     0
      raidz2-6  DEGRADED     0     0     0
        L67     FAULTED      0     0     0  too many errors

- Vdev faults and data errors
    NAME        STATE     READ WRITE CKSUM
    iron5       DEGRADED     0     0     0
      raidz2-1  DEGRADED     0     0     0
        L2      FAULTED      0     0     0  too many errors
      raidz2-5  ONLINE       1     0     0
        L23     ONLINE       1     0     0
        L24     ONLINE       1     0     0
        L37     ONLINE       1     0     0
      raidz2-6  DEGRADED     0     0     0
        L67     FAULTED      0     0     0  too many errors

- Vdev missing
    NAME        STATE     READ WRITE CKSUM
    iron5       DEGRADED     0     0     0
      raidz2-6  DEGRADED     0     0     0
        L67     UNAVAIL      3     1     0

- Slow devices when -s provided with -e
    NAME        STATE     READ WRITE CKSUM  SLOW
    iron5       DEGRADED     0     0     0     -
      raidz2-5  DEGRADED     0     0     0     -
        L10     FAULTED      0     0     0     0  external device fault
        L51     ONLINE       0     0     0    14

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Cameron Harr <harr1@llnl.gov>
Closes #15769
2024-02-07 09:12:12 -08:00
Brian Behlendorf 6dccdf501e BRT: Fix FICLONE/FICLONERANGE shortened copy
On Linux the ioctl_ficlonerange() and ioctl_ficlone() system calls
are expected to either fully clone the specified range or return an
error.  The range may be for an entire file.  While internally ZFS
supports cloning partial ranges there's no way to return the length
cloned to the caller so we need to make this all or nothing.

As part of this change support for the REMAP_FILE_CAN_SHORTEN flag
has been added.  When REMAP_FILE_CAN_SHORTEN is set zfs_clone_range()
will return a shortened range when encountering pending dirty records.
When it's clear zfs_clone_range() will block and wait for the records
to be written out allowing the blocks to be cloned.

Furthermore, the file range lock is held over the region being cloned
to prevent it from being modified while cloning.  This doesn't quite
provide an atomic semantics since if an error is encountered only a
portion of the range may be cloned.  This will be converted to an
error if REMAP_FILE_CAN_SHORTEN was not provided and returned to the
caller.  However, the destination file range is left in an undefined
state.

A test case has been added which exercises this functionality by
verifying that `cp --reflink=never|auto|always` works correctly.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #15728
Closes #15842
2024-02-05 16:44:45 -08:00
Rich Ercolani a0d3fe72bf libzdb: Initial breakout of libzdb
Step 1 in trying to slowly rip the zdb functions out of zdb.c
to allow people to play with more flexible things to leverage
zdb's functionality.

No promises on any functions or structs being stable, now or probably
in general unless someone builds a more polished abstraction, the
goal at the moment is to slowly untangle the global state usage
in zdb...

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #15804
2024-02-05 10:00:41 -08:00
Umer Saleem 06e25f9c4b Improve performance for zpool trim on linux
On Linux, ZFS uses blkdev_issue_discard in vdev_disk_io_trim to issue
trim command which is synchronous.

This commit updates vdev_disk_io_trim to use __blkdev_issue_discard,
which is asynchronous. Unfortunately there isn't any asynchronous
version for blkdev_issue_secure_erase, so performance of secure trim
will still suffer.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Closes #15843
2024-02-02 11:51:51 -08:00
Rob Norris 2e6b3c4d94 Linux 6.8 compat: handle mnt_idmap user_namespace change
struct mnt_idmap no longer has a struct user_namespace within it. Work
around this by creating a temporary with the copy of the map we need
taken from the idmap.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Youzhong Yang <yyang@mathworks.com>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #15805
2024-01-29 11:36:07 -08:00
Rob Norris 7a2e54b7d3 Linux 6.8 compat: fix inode permission tests
The name inode_permission is now defined in the kernel. Rename ours to
test_permission, in line with most of our other tests.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #15805
2024-01-29 11:36:07 -08:00
Rob Norris 7692d86de4 Linux 6.8 compat: replace MAX_ORDER define
MAX_ORDER has been renamed to MAX_PAGE_ORDER. Rather than just
redefining it, instead define our own name and set it consistently from
the start.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #15805
2024-01-29 11:36:07 -08:00
Rob Norris 84980ee0e6 Linux 6.8 compat: implement strlcpy fallback
Linux has removed strlcpy in favour of strscpy. This implements a
fallback implementation of strlcpy for this case.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #15805
2024-01-29 11:36:07 -08:00
Rob Norris 386d6a7533 Linux 6.8 compat: update for new bdev access functions
blkdev_get_by_path() and blkdev_put() have been replaced by
bdev_open_by_path() and bdev_release(), which return a "handle" object
with the bdev object itself inside.

This adds detection for the new functions, and macros to handle the old
and new forms consistently.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #15805
2024-01-29 11:36:07 -08:00
Rob Norris a41d0b29a9 Linux 6.8 compat: make test functions static
The kernel is now being compiled with -Wmissing-prototypes. Most of our
test stub functions had no prototype, and failed to compile. Since they
don't need to be visible anywhere else, just make them all static.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #15805
2024-01-29 11:36:07 -08:00
Brian Behlendorf 0c11f7c96f Linux 6.7 compat: META
Update the META file to reflect compatibility with the 6.7 kernel.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #15833
2024-01-29 11:35:43 -08:00
Paul Dagnelie 8161b73272 Don't assert mg_initialized due to device addition race
During device removal stress tests, we noticed that we were tripping 
the assertion that mg_initialized was true. After investigation, it was 
determined that the mg in question was the embedded log metaslab 
group for a newly added vdev; the normal mg had been initialized (by 
metaslab_sync_reassess, via vdev_sync_done). However, because the spa 
config alloc lock is not held as writer across both calls to 
metaslab_sync_reassess, it is possible for an allocation to happen 
between the two metaslab_groups being initialized. Because the metaslab 
code doesn't check the group in question, just the vdev's main mg, it 
is possible to get past the initial check in vdev_allocatable and 
later fail due to the assertion.

We simply remove the assertions. We could also consider locking the 
ALLOC lock around the reassess calls in vdev_sync_done, but that risks 
deadlocks. We could check the actual target mg in vdev_allocatable, 
but that risks racing with a passivation that comes in after that 
check but before the assertion. We still won't be able to actually 
allocate from the metaslab group if no metaslabs are ready, so this 
change shouldn't break anything.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #15818
2024-01-29 10:36:42 -08:00
Richard Kojedzinszky 401c3563d4 libzfs: use zfs_strerror() in place of strerror()
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Richard Kojedzinszky <richard@kojedz.in>
Closes #15793
2024-01-29 09:54:57 -08:00
Richard Kojedzinszky 692f0daba3 libzfs: make userquota_propname_decode threadsafe
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Richard Kojedzinszky <richard@kojedz.in>
Closes #15793
2024-01-29 09:54:43 -08:00
rilysh 0cbf135293 libnvpair.c: replace strstr() with strchr() for a single character
Since we're looking for a single new-line character in the haystack,
it's better (and slightly more efficient) to use strchr() instead of
strstr().

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: rilysh <nightquick@proton.me>
Closes #15798
2024-01-29 09:46:13 -08:00
Chris Davidson c3fd7a5217 Update man pages to time(1) from time(2)
zpool-iostat.8: Updated time(2) -> time(1) to align to manual page
zpool-list.8: Updated time(2) -> time(1) to align to manual page
zpool-status.8: Updated time(2) -> time(1) to align to manual page
zpool-wait.8: Update time(2) -> time(1) to align to manual page

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Christopher Davidson <christopher.davidson@gmail.com>
Closes #15823
2024-01-29 09:44:08 -08:00
Brian Behlendorf 9ad362c1de ZTS: Allow longer run time for zdb_args_pos
The zdb_args_pos test may take slightly longer than 600 seconds to run
on some of the CI builders.  To prevent this from causing failures allow
up to 1200 seconds for tests in this group.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #15826
2024-01-29 09:41:26 -08:00
Andrew Innes 3965c9ba38 Move nodes into correct subgraphs
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Andrew Innes <andrew.c12@gmail.com>
Closes #15828
2024-01-29 09:16:02 -08:00
MigeljanImeri 78e8c1f844 Remove list_size struct member from list implementation
Removed the list_size struct member as it was only used in a single
assertion, as mentioned in PR #15478.

Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: MigeljanImeri <imerimigel@gmail.com>
Closes #15812
2024-01-26 14:46:42 -08:00
Rob N 884a48d991 zpool wait: print timestamp before the header
list, status and iostat all display the -T timestamp before the header,
but wait showed it after. Make it be like the others.

Reported-by: Kyle Evans <kevans@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #15825
2024-01-26 14:41:31 -08:00
Ameer Hamza aeb33776f5 Update vdev devid and physpath if changed between imports
If devid or physpath for a vdev changes between imports, ensure it is
updated to the new value.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #15816
2024-01-26 14:24:35 -08:00
Tino Reichardt fb27698cfc ZTS: Update deprecated Github Action version numbers
GitHub Actions is transitioning from Node 16 to Node 20.

So we need to update these:
- actions/checkout@v3 -> v4
- actions/download-artifact@v3 -> v4
- actions/upload-artifact@v3 -> v4 and some minor changes

Update also the documentation of the testings workflow.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Andrew Innes <andrew.c12@gmail.com>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #15820
2024-01-26 14:22:26 -08:00
Richard Yao e7af89d972 Switch to CodeQL to detect prohibited function use
The LLVM/Clang developers pointed out that using the CPP to detect use
of functions that our QA policies prohibit risks invoking undefined
behavior. To resolve this, we configure CodeQL to detect forbidden
function usage.

Note that cpp in the context of CodeQL refers to C/C++, rather than the
C PreProcessor, which C++ also uses. It really should have been written
cxx, but that ship sailed a long time ago. This misuse of the term cpp
is retained in the CodeQL configuration for consistency with upstream
CodeQL.

As a side benefit, verbose make no longer is a wall of text showing a
bunch of CPP macros, which can make debugging slightly easier.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #15819 
Closes #14134
2024-01-26 14:11:33 -08:00
Tino Reichardt dac0bae561 ZTS: Apply small changes for speeding up the tests
The Github Action Runner got some new hardware metrics.  We should use
the provided and empty disk which is pre-mounted at /mnt now.

Disk1: 89GiB -> rootfs + bootfs with ~80MB/s -> don't care
Disk2: 64GiB -> /mnt with 420MB/s -> new testing ssd

This commit will mount the new disk to /var/tmp and provide hopefully
some speedups within our testings.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Andrew Innes <andrew.c12@gmail.com>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #15811
2024-01-26 13:36:59 -08:00
Pawel Jakub Dawidek a4bf6baaeb Fix file descriptor leak on pool import.
Descriptor leak can be easily reproduced by doing:

	# zpool import tank
	# sysctl kern.openfiles
	# zpool export tank; zpool import tank
	# sysctl kern.openfiles

We were leaking four file descriptors on every import.

Similar leak most likely existed when using file-based VDEVs.

External-issue: https://reviews.freebsd.org/D43529
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Pawel Jakub Dawidek <pawel@dawidek.net>
Closes #15630
2024-01-23 15:03:48 -08:00
Brian Behlendorf 435b173fd9 ZTS: Apply zfs_bclone_enabled to bclone tests
If block cloning is disabled by default then enable it when running
the bclone tests.  Follow up to #15529.

Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #15796
2024-01-22 16:14:08 -08:00
Val Packett d9cb42da99 FreeBSD: Fix bootstrapping tools under Linux/musl
musl libc has deprecated LFS64 aliases, so bootstrapping FreeBSD tools
under musl distros has been failing with stat64 errors.

Apply the aliases under non-glibc Linux to fix this problem.

Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Val Packett <val@packett.cool>
Closes #15780
2024-01-19 13:01:26 -08:00
Tino Reichardt a0b2a93c41 fix: variable type with zfs-tests/cmd/clonefile.c
Compiling on arm64 freebsd-13.2 and arm64 almalinux-8 brings currently
this error:

```
  CC       tests/zfs-tests/cmd/clonefile.o
tests/zfs-tests/cmd/clonefile.c:166:43: error: result of comparison of \
constant -1 with expression of type 'char' is always true \
[-Werror,-Wtautological-constant-out-of-range-compare]
        while ((c = getopt(argc, argv, "crfdq")) != -1) {
               ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ^  ~~
1 error generated.
gmake[2]: *** [Makefile:8675: tests/zfs-tests/cmd/clonefile.o] Error 1
```

Fix: use correct variable type `int`.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #15783
2024-01-17 09:06:14 -08:00
Tino Reichardt e3d3d772de linux spl: fix typo in top comment of spl-condvar.c
Credential Implementation -> Condition Variables Implementation

Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #15782
2024-01-17 09:05:12 -08:00
Kevin Jin 1494e8fbaa Autotrim High Load Average Fix
Switch from cv_wait() to cv_wait_idle() in vdev_autotrim_wait_kick(),
which should mitigate the high load average while waiting.

Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: jxdking <lostking2008@hotmail.com>
Closes #15781
2024-01-17 09:03:58 -08:00
Pawel Jakub Dawidek f45dd90f34 Fix cloning into mmaped and cached file.
If the destination file is mmaped and the mmaped region was already
read, so it is cached, we need to update mmaped pages after successful
clone using update_pages().

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Pointed out by: Ka Ho Ng <khng@freebsd.org>
Signed-off-by: Pawel Jakub Dawidek <pawel@dawidek.net>
Closes #15772
2024-01-17 08:51:07 -08:00
Rob N f0bf7a247d Linux 6.7 compat: zfs_setattr fix atime update
In db4fc559c I messed up and changed this bit of code to set the inode
atime to an uninitialised value, when actually it was just supposed to
loading the atime from the inode to be stored in the SA. This changes it
to what it should have been.

Ensure times change by the right amount Previously, we only checked 
if the times changed at all, which missed a bug where the atime was 
being set to an undefined value.

Now ensure the times change by two seconds (or thereabouts), ensuring
we catch cases where we set the time to something bonkers

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #15762
Closes #15773
2024-01-16 14:01:17 -08:00
Lalufu ef00da803d Make sure all necessary RPM path macros are defined
When building (s)rpm files through the Makefile, a directory structure
is created in /tmp to hold the various files.

In case the user running the command has overridden some of the RPM path
settings through their user profile (for example in `~/.rpmmacros`),
these paths do not line up with the configuration, and the build fails.

Make sure all paths used are properly defined.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ralf Ertzinger <ralf@skytale.net>
Closes #15756
2024-01-16 13:32:59 -08:00
youzhongyang 29ea6faf8f Make spl_kmem_cache size check consistent
On Linux x86_64, kmem cache can have size up to 4M,
however increasing spl_kmem_cache_slab_limit can lead
to crash due to the size check inconsistency.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Youzhong Yang <yyang@mathworks.com>
Closes #15757
2024-01-16 13:30:58 -08:00
Ameer Hamza b64be1624c Add path handling for aux vdevs in label_path
If the AUX vdev is added using UUID, importing the pool falls back AUX
vdev to open it with disk name instead of UUID due to the absence of
path information for AUX vdevs. Since AUX label now have path
information, this PR adds path handling for it in `label_path`.

Reviewed-by: Umer Saleem <usaleem@ixsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #15737
2024-01-16 13:18:07 -08:00
Ameer Hamza 2df2a58dc1 Extend aux label to add path information
Pool import logic uses vdev paths, so it makes sense to add path
information on AUX vdev as well.

Reviewed-by: Umer Saleem <usaleem@ixsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #15737
2024-01-16 13:17:59 -08:00
Ameer Hamza d9885b3776 fix: Uber block label not always found for aux vdevs
When spare or l2cache (aux) vdev is added during pool creation,
spa->spa_uberblock is not dumped until that point. Subsequently,
the aux label is never synchronized after its initial creation,
resulting in the uberblock label remaining undumped. The uberblock
is crucial for lib_blkid in identifying the ZFS partition type. To
address this issue, we now ensure sync of the uberblock label once
if it's not dumped initially.

Reviewed-by: Umer Saleem <usaleem@ixsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #15737
2024-01-16 13:17:14 -08:00
Rich Ercolani 1f5bf96001 Make zdb -R a little more sane.
zdb -R has a minor flaw in which it will not always print the full
output of a decompressed block. Oops.

While I was in there, I also reworked the logic so it won't try
ZLE unless everything else fails, which will hopefully avoid the
problem ZDB_NO_ZLE was intended to mitigate of reporting a lot of
false positives of ZLE compressed blocks...

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #15723
2024-01-16 13:16:08 -08:00
Umer Saleem 995734ed12 ZTS: Test for clone, mmap and write for block cloning
For block cloning, if we mmap the cloned file and write from the
map into the file, it triggers a panic in dbuf_redirty() on Linux.

The same scenario causes data corruption on FreeBSD. Both these
issues are fixed under PR#15656 and PR#15665.

It would be good to add a test for this scenario in ZTS. The test
program and issue was produced by @robn.

Reviewed-by: Pawel Jakub Dawidek <pawel@dawidek.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Closes #15717
2024-01-16 13:15:10 -08:00
Brian Behlendorf a1771d243a Fix "out of memory" error
Drop the no_memory() call from zpool_in_use() when reading the
label fails and instead return the error to the caller.  This
prevents a misleading "internal error: out of memory" error
when the label can't be read.  This will result in is_spare()
returning B_FALSE instead of aborting, which is already safely
handled.

Furthermore, on Linux it's possible for EREMOTEIO to returned
by an NVMe device if the device has been low-level formatted
and not rescanned.  In this case we want to fallback to the
legacy scanning method and read any of the labels we can.

Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #13538
Closes #15747
2024-01-12 12:35:29 -08:00
Benjamin Sherman 363368c670 fix: preserve linux kmod signature in zfs-kmod rpm spec
This change provides rpm spec macros to sign the zfs and spl kmods as
the final step after the %install scriptlet. This is needed since the
find-debuginfo.sh script strips out debug symbols plus signatures.

Kernel module signing only occurs when the required files are present
as typically required in the Linux source tree:
- certs/signing_key.pem
- certs/signing_key.x509

The method for overriding the default __spec_install_post macro is
inspired by (and largely copied from) the Fedora kernel.spec.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Benjamin Sherman <benjamin@holyarmy.org>
Closes #15744
2024-01-12 12:33:41 -08:00
Mark Johnston 5a703d1368 spa: Let spa_taskq_param_get()'s addition of a newline be optional
For FreeBSD sysctls, we don't want the extra newline, since the
sysctl(8) utility will format strings appropriately.

Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reported-by: Peter Holm <pho@FreeBSD.org>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes #15719
2024-01-12 12:24:56 -08:00
Mark Johnston 3bddc4daec spa: Fix FreeBSD sysctl handlers
sbuf_cpy() resets the sbuf state, which is wrong for sbufs allocated by
sbuf_new_for_sysctl().  In particular, this code triggers an assertion
failure in sbuf_clear().

Simplify by just using sysctl_handle_string() for both reading and
setting the tunable.

Fixes: 6930ecbb7 ("spa: make read/write queues configurable")
Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reported-by: Peter Holm <pho@FreeBSD.org>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes #15719
2024-01-12 12:24:21 -08:00
Rich Ercolani 6138af86b3 Stop wasting time on malloc in snprintf_zstd_header
Profiling zdb -vvvvv on datasets with a lot of zstd blocks, we find
ourselves spending quite a lot of time on malloc/free, because we
allocate a 16M abd each call, and never free it, so we're leaking
16M per call as well.

This seems sub-optimal. So let's just keep the buffer around and
reuse it.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #15721
2024-01-12 12:17:26 -08:00
Stefan Lendl 66670ba9f0 fix(mount): do not truncate shares not zfs mount
When running zfs share -a resetting the exports.d/zfs.exports makes
sense the get a clean state.
Truncating was also called with zfs mount which would not populate the
file again.
Add test to verify shares persist after mount -a.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Stefan Lendl <s.lendl@proxmox.com>
Closes #15607 
Closes #15660
2024-01-12 12:05:11 -08:00
Brian Behlendorf c4fa674367 Enable block_cloning tests on FreeBSD
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Pawel Jakub Dawidek <pawel@dawidek.net>
Closes #15749
2024-01-12 11:57:13 -08:00
Rich Ercolani 20dd16d9f7 Make zdb -R scale less poorly
zdb -R with :d tries to use gzip decompression 9 times per size.
There's absolutely no reason for that, they're all the same
decompressor.

Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #15726
2024-01-12 11:55:17 -08:00
Mark Johnston 1a11ad9d20 Fix a potential use-after-free in zfs_setsecattr()
In general, VOPs must not load the "z_log" field until having called
zfs_enter_verify_zp().

Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes #15752
2024-01-12 11:52:18 -08:00
Mark Johnston d8b2686603 Linux: Defer loading the object set in zfs_setattr()
We need to wait until after having done a zfs_enter() to load some
fields from the zfsvfs structure.  Otherwise a use-after-free is
possible in the face of a concurrent rollback.

Other functions in this file are careful to avoid this bug, I believe
this is the only instance.

Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes #15752
2024-01-12 11:51:53 -08:00
gofaster a382e21194 Add Gotify notification support to ZED
This commit adds the zed_notify_gotify() function and hooks it
into zed_notify(). This will allow ZED to send notifications
to a self-hosted Gotify service, which can be received
on a desktop or mobile device. It is configured with ZED_GOTIFY_URL,
ZED_GOTIFY_APPTOKEN and ZED_GOTIFY_PRIORITY variables in zed.rc.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: gofaster <felix.gofaster@gmail.com>
Closes #15693
2024-01-09 09:49:30 -08:00
Alexander Motin e78aca3b33 Fix livelist assertions for dedup and cloning
Two block pointers in livelist pointing to the same location may
be caused not only by dedup, but also by block cloning. We should
not assert D bit set in them.

Two block pointers in livelist pointing to the same location may
have different logical birth time in case of dedup or cloning. We
should assert identical physical birth time instead.

Assert identical physical block size between pointers in addition
to checksum, since that is what checksums are calculated on.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15732
2024-01-09 09:48:40 -08:00
Alexander Motin 255741fc97 Improve block sizes checks during cloning
- Fail if source block is smaller than destination.  We can only
grow blocks, not shrink them.
 - Fail if we do not have full znode range lock.  In that case grow
is not even called.  We should improve zfs_rangelock_cb() somehow
to know when cloning needs to grow the block size unlike write.
 - Fail of we tried to resize, but failed.  There are many reasons
for it to fail that we can not predict at this level, so be ready
for them.  Unlike write, that may proceed after growth failure,
block cloning can't and must return error.

This fixes assertion inside dmu_brt_clone() when it sees different
number of blocks held in destination than it got block pointers.
Builds without ZFS_DEBUG returned EXDEV, so are not affected much.

Reviewed-by: Pawel Jakub Dawidek <pawel@dawidek.net>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15724 
Closes #15735
2024-01-09 09:46:43 -08:00
Kent Ross 7ecaa07580 make zdb_decompress_block check decompression reliably
This function decompresses to two buffers and then compares them to
check whether the (opaque) decompression process filled the whole
buffer. Previously it began with lbuf uninitialized and lbuf2 filled
with pseudorandom data. This neither guarantees that any bytes not
written by the compressor would be different, nor seems incredibly
sound otherwise!

After these changes, instead of filling one buffer with generated
pseudorandom data we overwrite each buffer with completely different
data. This should remove the possibility of low-probability failures,
as well as make the process simpler and cheaper.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rich Ercolani <rincebrain@gmail.com>
Signed-off-by: Kent Ross <k@mad.cash>
Closes #15733
2024-01-09 09:13:52 -08:00
Jose Luis Duran bd3f90c0c1 zpoolprops.7: Remove unnecessary .Ns
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Jose Luis Duran <jlduran@gmail.com>
Closes #15727
2024-01-08 17:03:15 -08:00
Alexander Motin 9e0d12e310 ZIL: Update Linux tracing after #15635
While picking parts from #14909 I've missed Linux tracing specific
ones, that went unnoticed in default configurations, but breaks the
build in some.

Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15730
2024-01-08 16:49:39 -08:00
Shengqi Chen 7b89149c9f Linux 6.2 compat: add check for kernel_neon_* availability
This patch adds check for `kernel_neon_*` symbols on arm and arm64
platforms to address the following issues:

1. Linux 6.2+ on arm64 has exported them with `EXPORT_SYMBOL_GPL`, so
   license compatibility must be checked before use.
2. On both arm and arm64, the definitions of these symbols are guarded
   by `CONFIG_KERNEL_MODE_NEON`, but their declarations are still
   present. Checking in configuration phase only leads to MODPOST
   errors (undefined references).

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Shengqi Chen <harry-chen@outlook.com>
Closes #15711 
Closes #14555 
Closes: #15401
2024-01-08 16:05:24 -08:00
Mark Johnston 07e95b4670 Fix the FreeBSD userspace build (#15716)
- Mark some parameters to zpool_power*() as unused.
- Add a stub zpool_disk_wait().

Fixes: a9520e6e5 ("zpool: Add slot power control, print power status")

Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2023-12-27 12:17:53 -08:00
Pawel Jakub Dawidek 4cf4bc7334 Block cloning tests.
The test mostly focus on testing various corner cases.
The tests take a long time to run, so for the common.run runfile
we randomly select a hundred tests.
To run all the bclone tests, bclone.run runfile should be used.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Pawel Jakub Dawidek <pawel@dawidek.net>
Closes #15631
2023-12-26 12:01:53 -08:00
Brian Behlendorf 233d34e47e Linux 6.5 compat: check BLK_OPEN_EXCL is defined
On some systems we already have blkdev_get_by_path() with 4 args
but still the old FMODE_EXCL and not BLK_OPEN_EXCL defined.
The vdev_bdev_mode() function was added to handle this case
but there was no generic way to specify exclusive access.

Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #15692
2023-12-21 11:22:56 -08:00
chrisperedun 5a4915660c Don't panic on unencrypted block in encrypted dataset
While 763ca47 closes the situation of block cloning creating
unencrypted records in encrypted datasets, existing data still causes
panic on read. Setting zfs_recover bypasses this but at the cost of
potentially ignoring more serious issues.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chris Peredun <chris.peredun@ixsystems.com>
Closes #15677
2023-12-21 11:12:30 -08:00
Alexander Motin eff77a802d ZIL: Improve next log block size prediction
Track history in context of bursts, not individual log blocks. It
allows to not blow away all the history by single large burst of
many block, and same time allows optimizations covering multiple
blocks in a burst and even predicted following burst.  For each
burst account its optimal block size and minimal first block size.
Use that statistics from the last 8 bursts to predict first block
size of the next burst.

Remove predefined set of block sizes. Allocate any size we see fit,
multiple of 4KB, as required by ZIL now.  With compression enabled
by default, ZFS already writes pretty random block sizes, so this
should not surprise space allocator any more.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15635
2023-12-21 10:54:44 -08:00
Tony Hutter a9520e6e59 zpool: Add slot power control, print power status
Add `zpool` flags to control the slot power to drives.  This assumes
your SAS or NVMe enclosure supports slot power control via sysfs.

The new `--power` flag is added to `zpool offline|online|clear`:

    zpool offline --power <pool> <device>    Turn off device slot power
    zpool online --power <pool> <device>     Turn on device slot power
    zpool clear --power <pool> [device]      Turn on device slot power

If the ZPOOL_AUTO_POWER_ON_SLOT env var is set, then the '--power'
option is automatically implied for `zpool online` and `zpool clear`
and does not need to be passed.

zpool status also gets a --power option to print the slot power status.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Mart Frauenlob <AllKind@fastest.cc>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #15662
2023-12-21 10:53:16 -08:00
Rob N 6930ecbb75 spa: make read/write queues configurable
We are finding that as customers get larger and faster machines
(hundreds of cores, large NVMe-backed pools) they keep hitting
relatively low performance ceilings. Our profiling work almost always
finds that they're running into bottlenecks on the SPA IO taskqs.
Unfortunately there's often little we can advise at that point, because
there's very few ways to change behaviour without patching.

This commit adds two load-time parameters `zio_taskq_read` and
`zio_taskq_write` that can configure the READ and WRITE IO taskqs
directly.

This achieves two goals: it gives operators (and those that support
them) a way to tune things without requiring a custom build of OpenZFS,
which is often not possible, and it lets us easily try different config
variations in a variety of environments to inform the development of
better defaults for these kind of systems.

Because tuning the IO taskqs really requires a fairly deep understanding
of how IO in ZFS works, and generally isn't needed without a pretty
serious workload and an ability to identify bottlenecks, only minimal
documentation is provided. Its expected that anyone using this is going
to have the source code there as well.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #15675
2023-12-20 14:17:14 -08:00
Rob Norris 957dc1037a Linux 6.7 compat: rework shrinker setup for heap allocations
6.7 changes the shrinker API such that shrinkers must be allocated
dynamically by the kernel. To accomodate this, this commit reworks
spl_register_shrinker() to do something similar against earlier kernels.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://github.com/sponsors/robn
Closes #15681
2023-12-20 11:47:55 -08:00
Rob Norris 1d324aceef Linux 6.7 compat: handle superblock shrinker member change
In 6.7 the superblock shrinker member s_shrink has changed from being an
embedded struct to a pointer. Detect this, and don't take a reference if
it already is one.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://github.com/sponsors/robn
Closes #15681
2023-12-20 11:47:50 -08:00
Rob Norris db4fc559cc Linux 6.7 compat: use inode atime/mtime accessors
6.6 made i_ctime inaccessible; 6.7 has done the same for i_atime and
i_mtime. This extends the method used for ctime in b37f29341 to atime
and mtime as well.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://github.com/sponsors/robn
Closes #15681
2023-12-20 11:47:40 -08:00
Rob Norris 00f40961e0 Linux 6.7 compat: simplify current_time() check
6.7 changed the names of the time members in struct inode, so we can't
assign back to it because we don't know its name. In practice this
doesn't matter though - if we're missing current_time(), then we must be
on <4.9, and we know our fallback will need to return timespec.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://github.com/sponsors/robn
Closes #15681
2023-12-20 11:47:18 -08:00
Umer Saleem dbda45160f Test LWB buffer overflow for block cloning
PR#15634 removes 128K into 2x68K LWB split optimization, since it
was found to cause LWB buffer overflow while trying to write 128KB
TX_CLONE_RANGE record with 1022 block pointers into 68KB buffer,
with multiple VDEVs ZIL.

This commit adds a test for this particular scenario by writing
maximum sizes TX_CLONE_RANE record with 1022 block pointers into
68KB buffer, with two SLOG devices.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by:  Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Closes #15672
2023-12-15 14:18:27 -08:00
Alexander Motin 9b1677fb5a dmu: Allow buffer fills to fail
When ZFS overwrites a whole block, it does not bother to read the
old content from disk. It is a good optimization, but if the buffer
fill fails due to page fault or something else, the buffer ends up
corrupted, neither keeping old content, nor getting the new one.

On FreeBSD this is additionally complicated by page faults being
blocked by VFS layer, always returning EFAULT on attempt to write
from mmap()'ed but not yet cached address range.  Normally it is
not a big problem, since after original failure VFS will retry the
write after reading the required data.  The problem becomes worse
in specific case when somebody tries to write into a file its own
mmap()'ed content from the same location.  In that situation the
only copy of the data is getting corrupted on the page fault and
the following retries only fixate the status quo.  Block cloning
makes this issue easier to reproduce, since it does not read the
old data, unlike traditional file copy, that may work by chance.

This patch provides the fill status to dmu_buf_fill_done(), that
in case of error can destroy the corrupted buffer as if no write
happened.  One more complication in case of block cloning is that
if error is possible during fill, dmu_buf_will_fill() must read
the data via fall-back to dmu_buf_will_dirty().  It is required
to allow in case of error restoring the buffer to a state after
the cloning, not not before it, that would happen if we just call
dbuf_undirty().

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #15665
2023-12-15 09:51:41 -08:00
Alexander Motin 86e115e21e dbuf: Set dr_data when unoverriding after clone
Block cloning normally creates dirty record without dr_data.  But if
the block is read after cloning, it is moved into DB_CACHED state and
receives the data buffer.  If after that we call dbuf_unoverride()
to convert the dirty record into normal write, we should give it the
data buffer from dbuf and release one.

Reviewed-by: Kay Pedersen <mail@mkwg.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #15654
Closes #15656
2023-12-12 12:59:24 -08:00
Alexander Motin 86063d9031 dbuf: Handle arcbuf assignment after block cloning
In some cases dbuf_assign_arcbuf() may be called on a block that
was recently cloned.  If it happened in current TXG we must undo
the block cloning first, since the only one dirty record per TXG
can't and shouldn't mean both cloning and overwrite same time.

Reviewed-by: Kay Pedersen <mail@mkwg.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15653
2023-12-12 12:53:59 -08:00
Brian Behlendorf f4b97c1e00 ZTS: Update raidz_expand_005_pos.ksh
Align the raidz_expand_005_pos test with the raidz_expand_004_pos test
and only verify no errors were reported.  Allow scrub repair IO.

Reviewed-by: Don Brady <don.brady@klarasystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #15663
2023-12-12 09:56:19 -08:00
Chunwei Chen a9b937e066 For db_marker inherit the db pointer for AVL comparision.
While evicting dbufs of a dnode, a marker node is added to the AVL.
The marker node should be inserted in AVL tree ahead of the dbuf its
trying to delete. The blkid and level is used to ensure this. However,
this could go wrong there's another dbufs with the same blkid and level
in DB_EVICTING state but not yet removed from AVL tree. dbuf_compare()
could fail to give the right location or could cause confusion and
trigger ASSERTs.

To ensure that the marker is inserted before the deleting dbuf, use
the pointer value of the original dbuf for comparision.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Sanjeev Bagewadi <sanjeev.bagewadi@nutanix.com>
Signed-off-by: Chunwei Chen <david.chen@nutanix.com>
Closes #12482 
Closes #15643
2023-12-11 14:42:06 -08:00
Tony Hutter b1748eaee0 ZTS: Add dirty dnode stress test
Add a test for the dirty dnode SEEK_HOLE/SEEK_DATA bug described in
https://github.com/openzfs/zfs/issues/15526

The bug was fixed in https://github.com/openzfs/zfs/pull/15571 and
was backported to 2.2.2 and 2.1.14.  This test case is just to
make sure it does not come back.

seekflood.c originally written by Rob Norris.

Reviewed-by: Graham Perrin <grahamperrin@freebsd.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #15608
2023-12-11 09:59:59 -08:00
Brian Behlendorf 9f6ad4dcb6 ZTS: Add zpool_import_status.ksh to Makefile.am
The zpool_import_status.ksh test case was not being run because
it was not included in the Makefile.am.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #15655
2023-12-11 09:40:32 -08:00
Brian Behlendorf 8ad73bf449 ZTS: Disable io_uring test on CentOS 9
The io_uring test fails on CentOS 9 with the following fio error.
Disable the test for the benefit of the CI until this can be fully
investigated.  This basic test passes as expected on newer kernels.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #15636
2023-12-08 17:31:31 -08:00
Alexander Motin e53e60c0bd DMU: Fix lock leak on dbuf_hold() error
dmu_assign_arcbuf_by_dnode() should drop dn_struct_rwlock lock in
case dbuf_hold() failed.  I don't have reproduction for this, but
it looks inconsistent with dmu_buf_hold_noread_by_dnode() and co.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #15644
2023-12-08 16:43:39 -08:00
Ameer Hamza 2ebb9a4811 ZTS: Add test cases for block cloning replay
Reviewed-by: Kay Pedersen <mail@mkwg.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by:  Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #15614
2023-12-08 16:39:14 -08:00
Ameer Hamza 5ff43969e7 ZTS: block_cloning: Use numeric sort for get_same_blocks
Reviewed-by: Kay Pedersen <mail@mkwg.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by:  Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #15614
2023-12-08 16:38:45 -08:00
Mauricio Faria de Oliveira 3c7650491b zed: fix typo in variable ZED_POWER_OFF_ENCLO*US*RE_SLOT_ON_FAULT
Replace ENCLO_US_RE with ENCLO_SU_RE in the name of the variable.

Note this changes the user-visible string in zed.rc, thus might
break current users with the wrong string, but it's ~2 months
since zfs-2.2.0 tag is out, thus should not be widespread yet.

Mechanical change:

    $ grep -rl ZED_POWER_OFF_ENCLOUSRE_SLOT_ON_FAULT
    cmd/zed/zed.d/zed.rc
    cmd/zed/zed.d/statechange-slot_off.sh

    $ sed -i 's/ZED_POWER_OFF_ENCLOUSRE_SLOT_ON_FAULT/<linebreak>
                ZED_POWER_OFF_ENCLOSURE_SLOT_ON_FAULT/g' \
      cmd/zed/zed.d/zed.rc \
      cmd/zed/zed.d/statechange-slot_off.sh

    $ grep -rl ZED_POWER_OFF_ENCLOUSRE_SLOT_ON_FAULT
    $

Fixes 11fbcacf37
("zed: Add zedlet to power off slot when drive is faulted")

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mauricio Faria de Oliveira <mfo@canonical.com>
Closes #15651
2023-12-08 16:32:35 -08:00
Rob N 450f2d0b08 import: ignore return on hostid lookups
Just silencing a warning. Its totally fine for a hostid to not be there.

Reported-by: Coverity (CID-1573336)

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #15650
2023-12-07 08:41:54 -08:00
Rob N f0cb6482e1 setproctitle: fix ununitialised variable
Reported-by: Coverity (CID-1573333)

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #15649
2023-12-07 08:23:16 -08:00
Rob N 4836d293c0 zfs_refcount_remove: explictly ignore returns
Coverity noticed that sometimes we ignore the return, and sometimes we
don't. Its not wrong, and I like consistent style, so here we are.

Reported-by: Coverity (CID-1564584)
Reported-by: Coverity (CID-1564585)
Reported-by: Coverity (CID-1564586)
Reported-by: Coverity (CID-1564587)
Reported-by: Coverity (CID-1564588)

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #15647
2023-12-07 08:21:38 -08:00
Mark Johnston 11656234b5 FreeBSD: Ensure that zfs_getattr() initializes the va_rdev field
Otherwise the field is left uninitialized, leading to a possible kernel
memory disclosure to userspace or to the network.  Use the same
initialization value we use in zfsctl_common_getattr().

Reported-by: KMSAN
Sponsored-by: The FreeBSD Foundation
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ed Maste <emaste@FreeBSD.org>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes #15639
2023-12-07 08:20:11 -08:00
Alexander Motin 9743d09635 BRT: Limit brt_vdev_dump() to only one vdev
Without this patch on pool of 60 vdevs with ZFS_DEBUG enabled clone
takes much more time than copy, while heavily trashing dbgmsg for
no good reason, repeatedly dumping all vdevs BRTs again and again,
even unmodified ones.

I am generally not sure this dumping is not excessive, but decided
to keep it for now, just restricting its scope to more reasonable.

Reviewed-by: Kay Pedersen <mail@mkwg.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15625
2023-12-06 15:37:27 -08:00
Alexander Motin 2aa3a482ab ZIL: Remove 128K into 2x68K LWB split optimization
To improve 128KB block write performance in case of multiple VDEVs
ZIL used to spit those writes into two 64KB ones.  Unfortunately it
was found to cause LWB buffer overflow, trying to write maximum-
sizes 128KB TX_CLONE_RANGE record with 1022 block pointers into
68KB buffer, since unlike TX_WRITE ZIL code can't split it.

This is a minimally-invasive temporary block cloning fix until the
following more invasive prediction code refactoring.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15634
2023-12-06 15:02:05 -08:00
Alexander Motin f9765b182e zdb: Dump encrypted write and clone ZIL records
Block pointers are not encrypted in TX_WRITE and TX_CLONE_RANGE
records, so we can dump them, that may be useful for debugging.

Related to #15543.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15629
2023-12-06 12:39:12 -08:00
Shengqi Chen 86239a5b9c compact: workaround for GPL-only symbols on riscv from Linux 6.2
Since Linux 6.2, the implementation of flush_dcache_page on riscv
references GPL-only symbol `PageHuge`, breaking the build of zfs.

This patch uses existing mechanism to override flush_dcache_page,
removing the call to `PageHuge`. According to comments in kernel,
it is only used to do some check against HugeTLB pages, which only
exist in userspace. ZFS uses flush_dcache_page only on kernel pages,
thus this patch will not introduce any behaviour change.

See also: torvalds/linux@d33deda, openzfs/zfs@589f59b

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Shengqi Chen <harry-chen@outlook.com>
Closes #14974 
Closes #15627
2023-12-06 12:37:50 -08:00
Don Brady 687e4d7f9c Extend import_progress kstat with a notes field
Detail the import progress of log spacemaps as they can take a very
long time.  Also grab the spa_note() messages to, as they provide
insight into what is happening

Sponsored-By: OpenDrives Inc.
Sponsored-By: Klara Inc.
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Don Brady <don.brady@klarasystems.com>
Co-authored-by: Allan Jude <allan@klarasystems.com>
Closes #15539
2023-12-05 14:27:56 -08:00
Shengqi Chen 727497ccdf module/icp/asm-arm/sha2: enable non-SIMD asm kernels on armv5/6
My merged pull request #15557 fixes compilation of sha2 kernels on arm
v5/6. However, the compiler guards only allows sha256/512_armv7_impl to
be used when __ARM_ARCH > 6. This patch enables these ASM kernels on all
arm architectures. Some compiler guards are adjusted accordingly to
avoid the unnecessary compilation of SIMD (e.g., neon, armv8ce) kernels
on old architectures.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Shengqi Chen <harry-chen@outlook.com>
Closes #15623
2023-12-05 12:01:09 -08:00
Rob N 5f2700eee5 zpool: flush output before sleeping
Several zpool commands (status, list, iostat) have modes that present
some information, sleep a while, present the current state, sleep, etc.
Some of those had ways to invoke them that when piped would appear to do
nothing for a while, because non-terminals are block-buffered, not
line-buffered, by default.  Fix this by forcing a flush before sleeping.

In particular, all of these buffered:
- zpool status <pool> <interval>
- zpool iostat -y<m> <pool> <interval>
- zpool list <pool> <interval>

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #15593
2023-12-05 11:53:14 -08:00
oromenahar c7b6119268 Allow block cloning across encrypted datasets
When two datasets share the same master encryption key, it is safe
to clone encrypted blocks. Currently only snapshots and clones
of a dataset share with it the same encryption key.

Added a test for:
- Clone from encrypted sibling to encrypted sibling with
  non encrypted parent
- Clone from encrypted parent to inherited encrypted child
- Clone from child to sibling with encrypted parent
- Clone from snapshot to the original datasets
- Clone from foreign snapshot to a foreign dataset
- Cloning from non-encrypted to encrypted datasets
- Cloning from encrypted to non-encrypted datasets

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Original-patch-by: Pawel Jakub Dawidek <pawel@dawidek.net>
Signed-off-by: Kay Pedersen <mail@mkwg.de>
Closes #15544
2023-12-05 11:03:48 -08:00
Alexander Motin 55b764e062 ZIL: Do not clone blocks from the future
ZIL claim can not handle block pointers cloned from the future,
since they are not yet allocated at that point.  It may happen
either if the block was just written when it was cloned, or if
the pool was frozen or somehow else rewound on import.

Handle it from two sides: prevent cloning of blocks with physical
birth time from not yet synced or frozen TXG, and abort ZIL claim
if we still detect such blocks due to rewind or something else.

While there, assert that any cloned blocks we claim are really
allocated by calling metaslab_check_free().

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15617
2023-12-05 10:58:11 -08:00
Dex Wood 014265f4e6 Add Ntfy notification support to ZED
This commit adds the zed_notify_ntfy() function and hooks it
into zed_notify(). This will allow ZED to send notifications
to ntfy.sh or a self-hosted Ntfy service, which can be received
on a desktop or mobile device. It is configured with ZED_NTFY_TOPIC,
ZED_NTFY_URL, and ZED_NTFY_ACCESS_TOKEN variables in zed.rc.

Reviewed-by: @classabbyamp 
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Dex Wood <slash2314@gmail.com>
Closes #15584
2023-12-01 15:25:17 -08:00
Alexander Motin bcd83ccd25 ZIL: Remove TX_CLONE_RANGE replay for ZVOLs.
zil_claim_clone_range() takes references on cloned blocks before ZIL
replay.  Later zil_free_clone_range() drops them after replay or on
dataset destroy.  The total balance is neutral.  It means we do not
need to do anything (drop the references) for not implemented yet
TX_CLONE_RANGE replay for ZVOLs.

This is a logical follow up to #15603.

Reviewed-by: Kay Pedersen <mail@mkwg.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15612
2023-12-01 15:23:20 -08:00
Alexander Motin adcea23cb0 ZIO: Add overflow checks for linear buffers
Since we use a limited set of kmem caches, quite often we have unused
memory after the end of the buffer.  Put there up to a 512-byte canary
when built with debug to detect buffer overflows at the free time.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15553
2023-12-01 11:50:10 -08:00
Brooks Davis 3e4bef52b0 Only provide execvpe(3) when needed
Check for the existence of execvpe(3) and only provide the FreeBSD
compat version if required.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Brooks Davis <brooks.davis@sri.com>
Closes #15609
2023-11-30 16:04:18 -08:00
Yuri Pankov 735ba3a7b7 Use uint64_t instead of u_int64_t
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Yuri Pankov <ypankov@tintri.com>
Closes #15610
2023-11-30 10:36:33 -08:00
Alexander Motin a03ebd9bee ZIL: Call brt_pending_add() replaying TX_CLONE_RANGE
zil_claim_clone_range() takes references on cloned blocks before ZIL
replay.  Later zil_free_clone_range() drops them after replay or on
dataset destroy.  The total balance is neutral.  It means on actual
replay we must take additional references, which would stay in BRT.

Without this blocks could be freed prematurely when either original
file or its clone are destroyed.  I've observed BRT being emptied
and the feature being deactivated after ZIL replay completion, which
should not have happened.  With the patch I see expected stats.

Reviewed-by: Kay Pedersen <mail@mkwg.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15603
2023-11-29 10:51:34 -08:00
Wraithh 8adf2e3066 Fix zoneid when USER_NS is disabled
getzoneid() should return GLOBAL_ZONEID instead of 0 when USER_NS is disabled.

Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ilkka Sovanto <github@ilkka.kapsi.fi>
Closes #15560
2023-11-29 09:55:17 -08:00
VaibhavB 7d68900af3 ZTS: get_persistent_disk_name can return truncated names
Instead of using only the 3rd element return the entire string after 
the split to handle device names with dashes.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Vaibhav Bhanawat <vaibhav.bhanawat@delphix.com>
Closes #15567
2023-11-29 09:34:29 -08:00
Martin Matuška 1c38cdfe98 zdb: fix printf() length for uint64_t devid
Bug introduced in 213d682967.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Warner Losh <imp@FreeBSD.org>
Signed-off-by: Martin Matuska <mm@FreeBSD.org>
Closes #15606
2023-11-29 09:18:30 -08:00
Alexander Motin 2a27fd4111 ZIL: Assert record sizes in different places
This should make sure we have log written without overflows.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15517
2023-11-28 13:35:14 -08:00
Shengqi Chen b94ce4e17d module/icp/asm-arm/sha2: fix compiling on armv5/6
The `adr` insn in neon kernel generates an compiling
error on armv5/6 target. Fix that by using `ldr`.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Shengqi Chen <harry-chen@outlook.com>
Closes #15557
2023-11-28 13:26:12 -08:00
Shengqi Chen 4340f69be1 module/icp/asm-arm/sha2: auto detect __ARM_ARCH
This patch uses __ARM_ARCH set by compiler (both
GCC and Clang have this) whenever possible instead
of hardcoding it to 7. This change allows code to
compile on earlier ARM architectures such as armv5te.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Shengqi Chen <harry-chen@outlook.com>
Closes #15557
2023-11-28 13:25:44 -08:00
Jaron Kent-Dobias b3a985fa42 Linux 6.6 compat: fix configure error with clang (#15558)
With Linux v6.6.x and clang 16, a configure step fails on a warning that
later results in an error while building, due to 'ts' being
uninitialized. Add a trivial initialization to silence the warning.

Signed-off-by: Jaron Kent-Dobias <jaron@kent-dobias.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2023-11-28 11:34:40 -08:00
Rob N 688514e470 dmu_buf_will_clone: fix race in transition back to NOFILL
Previously, dmu_buf_will_clone() would roll back any dirty record, but
would not clean out the modified data nor reset the state before
releasing the lock. That leaves the last-written data in db_data, but
the dbuf in the wrong state.

This is eventually corrected when the dbuf state is made NOFILL, and
dbuf_noread() called (which clears out the old data), but at this point
its too late, because the lock was already dropped with that invalid
state.

Any caller acquiring the lock before the call into
dmu_buf_will_not_fill() can find what appears to be a clean, readable
buffer, and would take the wrong state from it: it should be getting the
data from the cloned block, not from earlier (unwritten) dirty data.

Even after the state was switched to NOFILL, the old data was still not
cleaned out until dbuf_noread(), which is another gap for a caller to
take the lock and read the wrong data.

This commit fixes all this by properly cleaning up the previous state
and then setting the new state before dropping the lock. The
DBUF_VERIFY() calls confirm that the dbuf is in a valid state when the
lock is down.

Sponsored-by: Klara, Inc.
Sponsored-By: OpenDrives Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Pawel Jakub Dawidek <pawel@dawidek.net>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #15566
Closes #15526
2023-11-28 09:53:04 -08:00
Matthew Ahrens 67894a597f unnecessary alloc/free in dsl_scan_visitbp()
Clean up code in dsl_scan_visitbp() by removing an unnecessary
alloc/free and `goto`.  This has the side benefit of reducing CPU usage,
which is only really noticeable if we are not doing i/o for the leaf
blocks, like when `zfs_no_scrub_io` is set.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #15549
2023-11-28 09:20:48 -08:00
Rob N 30d581121b dnode_is_dirty: check dnode and its data for dirtiness
Over its history this the dirty dnode test has been changed between
checking for a dnodes being on `os_dirty_dnodes` (`dn_dirty_link`) and
`dn_dirty_record`.

  de198f2d9 Fix lseek(SEEK_DATA/SEEK_HOLE) mmap consistency
  2531ce372 Revert "Report holes when there are only metadata changes"
  ec4f9b8f3 Report holes when there are only metadata changes
  454365bba Fix dirty check in dmu_offset_next()
  66aca2473 SEEK_HOLE should not block on txg_wait_synced()

Also illumos/illumos-gate@c543ec060d illumos/illumos-gate@2bcf0248e9

It turns out both are actually required.

In the case of appending data to a newly created file, the dnode proper
is dirtied (at least to change the blocksize) and dirty records are
added.  Thus, a single logical operation is represented by separate
dirty indicators, and must not be separated.

The incorrect dirty check becomes a problem when the first block of a
file is being appended to while another process is calling lseek to skip
holes. There is a small window where the dnode part is undirtied while
there are still dirty records. In this case, `lseek(fd, 0, SEEK_DATA)`
would not know that the file is dirty, and would go to
`dnode_next_offset()`. Since the object has no data blocks yet, it
returns `ESRCH`, indicating no data found, which results in `ENXIO`
being returned to `lseek()`'s caller.

Since coreutils 9.2, `cp` performs sparse copies by default, that is, it
uses `SEEK_DATA` and `SEEK_HOLE` against the source file and attempts to
replicate the holes in the target. When it hits the bug, its initial
search for data fails, and it goes on to call `fallocate()` to create a
hole over the entire destination file.

This has come up more recently as users upgrade their systems, getting
OpenZFS 2.2 as well as a newer coreutils. However, this problem has been
reproduced against 2.1, as well as on FreeBSD 13 and 14.

This change simply updates the dirty check to check both types of dirty.
If there's anything dirty at all, we immediately go to the "wait for
sync" stage, It doesn't really matter after that; both changes are on
disk, so the dirty fields should be correct.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Rich Ercolani <rincebrain@gmail.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #15571 
Closes #15526
2023-11-28 09:07:57 -08:00
rmacklem acb33ee1c1 FreeBSD: Fix ZFS so that snapshots under .zfs/snapshot are NFS visible
Call vfs_exjail_clone() for mounts created under .zfs/snapshot
to fill in the mnt_exjail field for the mount.  If this is not
done, the snapshots under .zfs/snapshot with not be accessible
over NFS.

This version has the argument name in vfs.h fixed to match that
of the name in spl_vfs.c, although it really does not matter.

External-issue: https://reviews.freebsd.org/D42672
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rick Macklem <rmacklem@uoguelph.ca>
Closes #15563
2023-11-27 16:31:03 -08:00
Akash B c1a47de86f zdb: Fix zdb '-O|-r' options with -e/exported zpool
zdb with '-e' or exported zpool doesn't work along with
'-O' and '-r' options as we process them before '-e' has
been processed.

Below errors are seen:

~> zdb -e pool-mds65/mdt65 -O oi.9/0x200000009:0x0:0x0
failed to hold dataset 'pool-mds65/mdt65': No such file or directory

~> zdb -e pool-oss0/ost0 -r file1 /tmp/filecopy1 -p.
failed to hold dataset 'pool-oss0/ost0': No such file or directory
zdb: internal error: No such file or directory

We need to make sure to process '-O|-r' options after the
'-e' option has been processed, which imports the pool to
the namespace if it's not in the cachefile.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Akash B <akash-b@hpe.com>
Closes #15532
2023-11-27 13:41:58 -08:00
Rob Norris 213d682967 zdb: show BRT statistics and dump its contents
Same idea as the dedup stats, but for block cloning.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Kay Pedersen <mail@mkwg.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #15541
2023-11-27 13:35:07 -08:00
Rob Norris 803a9c12c9 brt: lift internal definitions into _impl header
So that zdb (and others!) can get at the BRT on-disk structures.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Kay Pedersen <mail@mkwg.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #15541
2023-11-27 13:34:43 -08:00
Tony Hutter 3551a32e5e ZTS: Fix zfs_load-key failures on F39
The zfs_load-key tests were failing on F39 due to their use of the
deprecated ssl.wrap_socket function.  This commit updates the test to
instead use ssl.SSLContext() as described in
https://stackoverflow.com/a/65194957.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #15534
Closes #15550
2023-11-27 13:24:37 -08:00
AllKind 95b68eb693 zfs-dkms: fix shell-init error message
If all zfs dkms modules have been removed, a shell-init error message
may appear, because /var/lib/dkms/zfs does no longer exist.
Resolve this by leaving the directory earlier on.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mart Frauenlob <AllKind@fastest.cc>
Closes #15576
2023-11-27 13:17:48 -08:00
Alexander Motin cf33166336 ZVOL: Minor code cleanup
- Remove zsda_tx field, it is used only once.
 - Remove unneeded string lengths checks, all names are terminated.
 - Replace few explicit MAXNAMELEN usages with sizeof().
 - Change dsname from MAXNAMELEN to ZFS_MAX_DATASET_NAME_LEN, as
expected by dsl_dataset_name().  Both are 256 bytes now, but it is
better to be safe.

This should have no functional difference.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15535
2023-11-27 13:16:59 -08:00
Alan Somers 126efb5889 FreeBSD: Fix the build on FreeBSD 12
It was broken for several reasons:
* VOP_UNLOCK lost an argument in 13.0.  So OpenZFS should be using
  VOP_UNLOCK1, but a few direct calls to VOP_UNLOCK snuck in.
* The location of the zlib header moved in 13.0 and 12.1.  We can drop
  support for building on 12.0, which is EoL.
* knlist_init lost an argument in 13.0.  OpenZFS change 9d0887402b
  assumed 13.0 or later.
* FreeBSD 13.0 added copy_file_range, and OpenZFS change 67a1b03791
  assumed 13.0 or later.

Sponsored-by: Axcient
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Alan Somers <asomers@gmail.com>
Closes #15551
2023-11-27 12:58:03 -08:00
Alexander Motin a490875103 ZIL: Refactor TX_WRITE encryption similar to TX_CLONE_RANGE
It should be purely textual change to make the code more readable.
Should cause no functional difference.

Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Tom Caputi <caputit1@tcnj.edu>
Reviewed-by: Sean Eric Fagan <sef@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Edmund Nadolski <edmund.nadolski@ixsystems.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #15543
Closes #15513
2023-11-27 09:56:30 -08:00
Alexander Motin 27d8c23c58 ZIL: Do not encrypt block pointers in lr_clone_range_t
In case of crash cloned blocks need to be claimed on pool import.
It is only possible if they (lr_bps) and their count (lr_nbps) are
not encrypted but only authenticated, similar to block pointer in
lr_write_t.  Few other fields can be and are still encrypted.

This should fix panic on ZIL claim after crash when block cloning
is actively used.

Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Tom Caputi <caputit1@tcnj.edu>
Reviewed-by: Sean Eric Fagan <sef@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Edmund Nadolski <edmund.nadolski@ixsystems.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #15543
Closes #15513
2023-11-27 09:53:32 -08:00
Don Brady 7bbd42ef49 Don't allow attach to a raidz child vdev
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Don Brady <don.brady@klarasystems.com>
Closes #15536
Closes #15564
2023-11-27 09:46:38 -08:00
Tony Hutter a94860a6de ZTS: Fix 'could not unmount datasets' on Alma 9 (#15542)
Many tests are failing on AlmaLinux 9 because ZTS could not destroy the
pool in cleanup.  This was due to $PWD being set to '.' instead of the
expected full path.  This patch sets $PWD to the full path.

Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Don Brady <don.brady@delphix.com>
2023-11-20 16:07:32 -08:00
Brooks Davis cd67bc0ae4 freebsd: remove __FBSDID macro use
With FreeBSD's switch to git the $FreeBSD$ string is no longer expanded
and they have mostly been removed upstream.  Stop using __FBSDID and
remove the no-longer needed sys/cdefs.h includes.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Brooks Davis <brooks.davis@sri.com>
Closes #15527
2023-11-17 14:02:09 -08:00
Alexander Motin 5a3bffab10 ZIO: Optimize zio_flush()
- Generalize vdev_nowritecache handling by traversing through the
VDEV tree and skipping children ZIOs where not supported.
 - Remove intermediate zio_null() in case of several VDEV children.
 - Remove children handling from zio_ioctl().  There are no other
use cases for this code beside DKIOCFLUSHWRITECACHED, and would there
be, I doubt they would so straightforward apply to all VDEV children.

Comparing to removed previous optimization this should improve cases
of redundant ZILs/SLOGs.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15515
2023-11-17 14:00:59 -08:00
Alexander Motin 22c8c33a58 Use abd_zero_off() where applicable
In several places abd_zero() cleaned ABD filled at the next line.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15514
2023-11-17 13:28:32 -08:00
Rob N 92dc4ad83d Consider dnode_t allocations in dbuf cache size accounting
Entries in the dbuf cache contribute only the size of the dbuf data to
the cache size. Attached "user" data is not counted. This can lead to
the data currently "owned" by the cache consuming more memory accounting
appears to show. In some cases (eg a metadnode data block with all child
dnode_t slots allocated), the actual size can be as much as 3x as what
the cache believes it to be.

This is arguably correct behaviour, as the cache is only tracking the
size of the dbuf data, not even the overhead of the dbuf_t. On the other
hand, in the above case of dnodes, evicting cached metadnode dbufs is
the only current way to reclaim the dnode objects, and can lead to the
situation where the dbuf cache appears to be comfortably within its
target memory window and yet is holding enormous amounts of slab memory
that cannot be reclaimed.

This commit adds a facility for a dbuf user to artificially inflate the
apparent size of the dbuf for caching purposes. This at least allows for
cache tuning to be adjusted to match something closer to the real memory
overhead.

metadnode dbufs carry a >1KiB allocation per dnode in their user data.
This informs the dbuf cache machinery of that fact, allowing it to make
better decisions when evicting dbufs.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #15511
2023-11-17 13:25:53 -08:00
Paul Dagnelie 6c6fae6fae Fix memory leak in zfs_setprocinit code
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rich Ercolani <rincebrain@gmail.com>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #15508
2023-11-17 13:21:04 -08:00
Rich Ercolani 03e9caaec0 Add a tunable to disable BRT support.
Copy the disable parameter that FreeBSD implemented, and extend it to
work on Linux as well, until we're sure this is stable.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #15529
2023-11-16 11:35:22 -08:00
Umer Saleem 5796e3a742 Packaging: Auto-generate changelog during configure (#15528)
Auto-generate changelog based off on @VERSION@ during configure,
so that it is not needed to be update with new releases / version
updates.

Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2023-11-16 08:58:47 -08:00
Alexander Motin 35da345160 L2ARC: Restrict write size to 1/4 of the device
PR #15457 exposed weird logic in L2ARC write sizing. If it appeared
bigger than device size, instead of liming write it reset all the
system-wide tunables to their default.  Aside of being excessive,
it did not actually help with the problem, still allowing infinite
loop to happen.

This patch removes the tunables reverting logic, but instead limits
L2ARC writes (or at least eviction/trim) to 1/4 of the capacity.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Amanakis <gamanakis@gmail.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #15519
2023-11-14 13:47:57 -08:00
Chunwei Chen da51bd17e5 Fix snap_obj_array memory leak in check_filesystem()
Use goto out instead of return for early exit to make sure
snap_obj_array is freed.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chunwei Chen <david.chen@nutanix.com>
Closes #15516
2023-11-14 12:59:02 -08:00
Tony Hutter 2319656802 Linux 6.6 compat: META
Update the META file to reflect compatibility with the 6.6 kernel.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Umer Saleem <usaleem@ixsystems.com>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #15520
2023-11-14 09:55:28 -08:00
Tony Hutter 786641dcf9 Workaround UBSAN errors for variable arrays
This gets around UBSAN errors when using arrays at the end of
structs.  It converts some zero-length arrays to variable length
arrays and disables UBSAN checking on certain modules.

It is based off of the patch from #15460.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Tested-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Co-authored-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Issue #15145
Closes #15510
2023-11-12 16:26:07 -08:00
Alexander Motin 3a8d9b8487 Linux: Reclaim unused spl_kmem_cache_reclaim
It is unused for 3 years since #10576.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15507
2023-11-10 10:34:46 -08:00
Umer Saleem 40fccc423a ZTS: Test for all known zpool feature sets
zpool_create_features_007_pos only tested for compat-2020 feature
set. It would be useful to test for all known features sets. If
any additional feature is found enabled that is not present in
compatibility list or feature set, it should be caught and
reported earlier.

This commit also removes encryption from openzfsonosx-1.8.1
compatibility list. Encryption enables bookmark_v2, since it is
a dependency of encryption, but not listed in openzfsonoxx-1.8.1
compatibility list.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Closes #15505
2023-11-09 10:58:23 -08:00
Umer Saleem 15a8fa76b2 Update zpool-features.7 for grub2 compatibility list updates
This commit updates zpool-features.7 man page to add newly added
zpool features to grub2 compatibility list.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Closes #15505
2023-11-09 10:58:09 -08:00
shodanshok 887a3c533b Increase L2ARC write rate and headroom
Current L2ARC write rate and headroom parameters are very conservative:
l2arc_write_max=8M and l2arc_headroom=2 (ie: a full L2ARC writes at
8 MB/s, scanning 16/32 MB of ARC tail each time; a warming L2ARC runs
at 2x these rates).

These values were selected 15+ years ago based on then-current SSDs
size, performance and endurance. Today we have multi-TB, fast and
cheap SSDs which can sustain much higher read/write rates.

For this reason, this patch increases l2arc_write_max to 32M and
l2arc_headroom to 8 (4x increase for both).

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Gionatan Danti <g.danti@assyoma.it>
Closes #15457
2023-11-08 16:30:47 -08:00
Martin Matuška 1c1be60fa2 Unbreak FreeBSD world build after 3bd4df384
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Martin Matuska <mm@FreeBSD.org>
Closes #15504
2023-11-08 16:29:34 -08:00
Low-power a160c153e2 Linux: reject read/write mapping to immutable file only on VM_SHARED
Private read/write mapping can't be used to modify the mapped files, so
they will remain be immutable. Private read/write mappings are usually
used to load the data segment of executable files, rejecting them will
rendering immutable executable files to stop working.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: WHR <msl0000023508@gmail.com>
Closes #15344
2023-11-08 12:19:38 -08:00
AllKind 3a81bf4ad2 Workaround to allow openzfs-zfs-dkms install on Ubuntu
As shown in #15404#issuecomment-1765002181, Ubuntu kernel has
'Provides: zfs-dkms', which will cause uninstall of the kernel, when
attempting to install openzfs-zfs-dkms.
As a workaround remove the 'Conflicts: zfs-dkms' definition from
the debian control file.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mart Frauenlob <AllKind@fastest.cc>
Closes #15503
2023-11-08 10:30:46 -08:00
Don Brady 5caeef02fa RAID-Z expansion feature
This feature allows disks to be added one at a time to a RAID-Z group,
expanding its capacity incrementally.  This feature is especially useful
for small pools (typically with only one RAID-Z group), where there
isn't sufficient hardware to add capacity by adding a whole new RAID-Z
group (typically doubling the number of disks).

== Initiating expansion ==

A new device (disk) can be attached to an existing RAIDZ vdev, by
running `zpool attach POOL raidzP-N NEW_DEVICE`, e.g. `zpool attach tank
raidz2-0 sda`.  The new device will become part of the RAIDZ group.  A
"raidz expansion" will be initiated, and the new device will contribute
additional space to the RAIDZ group once the expansion completes.

The `feature@raidz_expansion` on-disk feature flag must be `enabled` to
initiate an expansion, and it remains `active` for the life of the pool.
In other words, pools with expanded RAIDZ vdevs can not be imported by
older releases of the ZFS software.

== During expansion ==

The expansion entails reading all allocated space from existing disks in
the RAIDZ group, and rewriting it to the new disks in the RAIDZ group
(including the newly added device).

The expansion progress can be monitored with `zpool status`.

Data redundancy is maintained during (and after) the expansion.  If a
disk fails while the expansion is in progress, the expansion pauses
until the health of the RAIDZ vdev is restored (e.g. by replacing the
failed disk and waiting for reconstruction to complete).

The pool remains accessible during expansion.  Following a reboot or
export/import, the expansion resumes where it left off.

== After expansion ==

When the expansion completes, the additional space is available for use,
and is reflected in the `available` zfs property (as seen in `zfs list`,
`df`, etc).

Expansion does not change the number of failures that can be tolerated
without data loss (e.g. a RAIDZ2 is still a RAIDZ2 even after
expansion).

A RAIDZ vdev can be expanded multiple times.

After the expansion completes, old blocks remain with their old
data-to-parity ratio (e.g. 5-wide RAIDZ2, has 3 data to 2 parity), but
distributed among the larger set of disks.  New blocks will be written
with the new data-to-parity ratio (e.g. a 5-wide RAIDZ2 which has been
expanded once to 6-wide, has 4 data to 2 parity).  However, the RAIDZ
vdev's "assumed parity ratio" does not change, so slightly less space
than is expected may be reported for newly-written blocks, according to
`zfs list`, `df`, `ls -s`, and similar tools.

Sponsored-by: The FreeBSD Foundation
Sponsored-by: iXsystems, Inc.
Sponsored-by: vStack
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Authored-by: Matthew Ahrens <mahrens@delphix.com>
Contributions-by: Fedor Uporov <fuporov.vstack@gmail.com>
Contributions-by: Stuart Maybee <stuart.maybee@comcast.net>
Contributions-by: Thorsten Behrens <tbehrens@outlook.com>
Contributions-by: Fmstrat <nospam@nowsci.com>
Contributions-by: Don Brady <dev.fs.zfs@gmail.com>
Signed-off-by: Don Brady <dev.fs.zfs@gmail.com>
Closes #15022
2023-11-08 10:19:41 -08:00
Umer Saleem 9198de8f10 Linux 6.6 compat: fix implicit conversion error with debug build
With Linux v6.6.0 and GCC 12, when debug build is configured,
implicit conversion error is raised while converting
'enum <anonymous>' to 'boolean_t'. Use 'B_TRUE' instead of
'true' to fix the issue.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Pavel Snajdr <snajpa@snajpa.net>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Closes #15489
2023-11-07 13:24:16 -08:00
Gordon Tetlow dc45a00eac Add kern.features.zfs
Add a ZFS feature flag to indicate OpenZFS availability.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Gordon Tetlow <gordon@freebsd.org>
Closes #15484
2023-11-07 13:21:56 -08:00
Jason King 3d86999c75 sa_lookup() ignores buffer size.
When retrieving a system attribute, the size of the supplied
buffer is ignored. If the buffer is too small to hold the attribute,
sa_attr_op() will write past the end of the buffer.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Jason King <jking@racktopsystems.com>
Closes #15476
2023-11-07 12:11:48 -08:00
Umer Saleem 78ac868824 Remove obsolete_counts from grub2 compatibility list
PR#15459 add all read-only compatible zpool features to grub2
compatibility list. 'obsolete_counts' is a read-only features that
depends on 'device_removal' feature which is not read-only and
is marked as ZFEATURE_FLAG_MOS. Creating a pool with grub2
compatibility enables 'device_removal' feature as well, which is
not desired.

This commit removes the 'obsolete_counts' feature from
grub2 compatibility list, as GRUB only supports read-only
compatible features.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Closes #15499
2023-11-07 12:04:56 -08:00
Alexander Motin 020f6fd093 FreeBSD: Implement taskq_init_ent()
Previously taskq_init_ent() was an empty macro, while actual init
was done by taskq_dispatch_ent().  It could be slightly faster in
case taskq never enqueued. But without it taskq_empty_ent() relied
on the structure being zeroed by somebody else, that is not good.

As a side effect this allows the same task to be queued several
times, that is normal on FreeBSD, that may or may not get useful
here also one day.

Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15455
2023-11-07 11:37:18 -08:00
Alexander Motin 58398cbd03 FreeBSD: Optimize large kstat outputs
- Use sbuf_new_for_sysctl() to reduce double-buffering on sysctl
output.
- Use much faster sbuf_cat() instead of sbuf_printf("%s").

Together it reduces `sysctl kstat.zfs.misc.dbufs` time from minutes
to seconds, making dbufstat almost usable.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #15495
2023-11-07 11:35:40 -08:00
Alan Somers e36ff84c33 Update the kstat dataset_name when renaming a zvol
Add a dataset_kstats_rename function, and call it when renaming
a zvol on FreeBSD and Linux.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alan Somers <asomers@gmail.com>
Sponsored-by: Axcient
Closes #15482
Closes #15486
2023-11-07 11:34:50 -08:00
AllKind 9ce567c6ff Fix dkms installation of deb packages created with Alien.
Alien does not honour the %posttrans hook.
So move the dkms uninstall/install scripts to the
 %pre/%post hooks in case of package install/upgrade.
In case of package removal, handle that in %preun.
Add removal of all old dkms modules.
Add checking for broken 'dkms status'. Handle that as
good as possible and warn the user about it.
Also add more verbose messages about what we are doing.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mart Frauenlob <AllKind@fastest.cc>
Closes #15415
2023-11-07 11:27:29 -08:00
Mark Johnston f4cd1bac72 Make abd_raidz_gen_iterate() pass an initialized pointer to the callback
Otherwise callbacks may trigger KMSAN violations in the dlen == 0 case.
For example, raidz_syn_pq_abd() will compare an uninitialized pointer
with itself before returning.  This seems harmless, but let's maintain
good hygiene and avoid passing uninitialized variables, if only to
placate KMSAN.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes #15491
2023-11-07 10:24:15 -08:00
Tony Hutter 358ce2cf28 zed: misc vdev_enc_sysfs_path fixes
There have been rare cases where the VDEV_ENC_SYSFS_PATH value that zed
gets passed is stale.  To mitigate this, dynamically check the sysfs
path at the time of zed event processing, and use the dynamic value if
possible.  Note that there will be other times when we can not
dynamically detect the sysfs path (like if a disk disappears) and have
to rely on the old value for things like turning on the fault LED.  That
is to say, we can't just blindly use the dynamic path in every case.

Also:
	- Add enclosure sysfs entry when running 'zpool add'
	- Fix 'slot' and 'enc' zpool.d scripts for nvme

Reviewed-by: Don Brady <dev.fs.zfs@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #15462
2023-11-07 09:09:24 -08:00
MigeljanImeri 2a154b8484 Fix accounting error for pending sync IO ops in zpool iostat
Currently vdev_queue_class_length is responsible for checking how long
the queue length is, however, it doesn't check the length when a list
is used, rather it just returns whether it is empty or not. To fix this
I added a counter variable to vdev_queue_class to keep track of the sync
IO ops, and changed vdev_queue_class_length to reference this variable
instead.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: MigeljanImeri <ImeriMigel@gmail.com>
Closes #15478
2023-11-07 09:06:14 -08:00
ednadolski-ix 3bd4df3841 Improve ZFS objset sync parallelism
As part of transaction group commit, dsl_pool_sync() sequentially calls
dsl_dataset_sync() for each dirty dataset, which subsequently calls
dmu_objset_sync().  dmu_objset_sync() in turn uses up to 75% of CPU
cores to run sync_dnodes_task() in taskq threads to sync the dirty
dnodes (files).

There are two problems:

1. Each ZVOL in a pool is a separate dataset/objset having a single
   dnode.  This means the objsets are synchronized serially, which
   leads to a bottleneck of ~330K blocks written per second per pool.

2. In the case of multiple dirty dnodes/files on a dataset/objset on a
   big system they will be sync'd in parallel taskq threads. However,
   it is inefficient to to use 75% of CPU cores of a big system to do
   that, because of (a) bottlenecks on a single write issue taskq, and
   (b) allocation throttling.  In addition, if not for the allocation
   throttling sorting write requests by bookmarks (logical address),
   writes for different files may reach space allocators interleaved,
   leading to unwanted fragmentation.

The solution to both problems is to always sync no more and (if
possible) no fewer dnodes at the same time than there are allocators
the pool.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Edmund Nadolski <edmund.nadolski@ixsystems.com>
Closes #15197
2023-11-06 10:38:42 -08:00
Andrew Innes 0527774066 Use env var for sed
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Andrew Innes <andrew.c12@gmail.com>
Closes #15470
2023-11-01 15:19:44 -07:00
siv0 41e55b476b Fix nfs_truncate_shares without /etc/exports.d
Calling nfs_reset_shares on Linux prints a warning:
`failed to lock /etc/exports.d/zfs.exports.lock: No such file or
directory`
when /etc/exports.d does not exist. The directory gets created, when a
filesystem is actually exported through nfs_toggle_share and
nfs_init_share. The truncation of /etc/exports.d/zfs.exports happens
unconditionally when calling `zfs mount -a` (via zfs_do_mount and
share_mount in `cmd/zfs/zfs_main.c`).

Fixing the issue only in the Linux part, since the exports file on
freebsd is in `/etc/zfs/`, which seems present on 2 FreeBSD systems I
have access to (through `/etc/zfs/compatibility.d/`), while a Debian
box does not have the directory even if `/usr/sbin/exportfs` is
present through the `nfs-kernel-server` package.

The code for exports_available is copied from nfs_available above.

Fixes: ede037cda7
("Make zfs-share service resilient to stale exports")

Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com>
Closes #15369 
Closes #15468
2023-10-31 13:57:54 -07:00
Martin Matuška 763ca47fa8 Fix block cloning between unencrypted and encrypted datasets
Block cloning from an encrypted dataset into an unencrypted dataset
and vice versa is not possible. The current code did allow cloning
unencrypted files into an encrypted dataset causing a panic when
these were accessed. Block cloning between encrypted and encrypted
is currently supported on the same filesystem only.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Kay Pedersen <mail@mkwg.de>
Reviewed-by: Rob N <robn@despairlabs.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Martin Matuska <mm@FreeBSD.org>
Closes #15464
Closes #15465
2023-10-31 13:49:41 -07:00
Umer Saleem cba99a046e Add all read-only compatible zpool features to grub2 compatibility
GRUB opens the boot pool in read-only mode. All read-only
compatible features for zpool can be enabled and added to
grub2 compatibility, as GRUB does not open the boot-pool
for write.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Closes #15459
2023-10-31 09:51:54 -07:00
Ameer Hamza 9ccdb8becd zvol: fix delayed update to block device ro entry
The change in the zvol readonly property does not update the block
device readonly entry until the first IO to the ZVOL. This patch
addresses the issue by updating the block device readonly property
from the set property IOCTL call.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #15409
2023-10-31 09:50:38 -07:00
Ameer Hamza 60387facd2 zvol: Implement zvol threading as a Property
Currently, zvol threading can be switched through the zvol_request_sync
module parameter system-wide. By making it a zvol property, zvol
threading can be switched per zvol.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #15409
2023-10-31 09:50:32 -07:00
Ameer Hamza dbe839a9ca zvol: Cleanup set property
zvol_set_volmode() and zvol_set_snapdev() share a common code path.
Merge this shared code path into zvol_set_common().

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #15409
2023-10-31 09:49:32 -07:00
Alexander Motin 799e09f75a Unify arc_prune_async() code
There is no sense to have separate implementations for FreeBSD and
Linux.  Make Linux code shared as more functional and just register
FreeBSD-specific prune callback with arc_add_prune_callback() API.

Aside of code cleanup this should fix excessive pruning on FreeBSD:
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=274698

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Mark Johnston <markj@FreeBSD.org>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15456
2023-10-30 16:56:04 -07:00
Alexander Motin 514d661ca1 Tune zio buffer caches and their alignments
We should not always use PAGESIZE alignment for caches bigger than
it and SPA_MINBLOCKSIZE otherwise.  Doing that caches for 5, 6, 7,
10 and 14KB rounded up to 8, 12 and 16KB respectively make no sense.
Instead specify as alignment the biggest power-of-2 divisor.  This
way 2KB and 6KB caches are both aligned to 2KB, while 4KB and 8KB
are aligned to 4KB.

Reduce number of caches to half-power of 2 instead of quarter-power
of 2.  This removes caches difficult for underlying allocators to
fit into page-granular slabs, such as: 2.5, 3.5, 5, 7, 10KB, etc.
Since these caches are mostly used for transient allocations like
ZIOs and small DBUF cache it does not worth being too aggressive.
Due to the above alignment issue some of those caches were not
working properly any way.  6KB cache now finally has a chance to
work right, placing 2 buffers into 3 pages, that makes sense.

Remove explicit alignment in Linux user-space case.  I don't think
it should be needed any more with the above fixes.

As result on FreeBSD instead of such numbers of pages per slab:

vm.uma.zio_buf_comb_16384.keg.ppera: 4
vm.uma.zio_buf_comb_14336.keg.ppera: 4
vm.uma.zio_buf_comb_12288.keg.ppera: 3
vm.uma.zio_buf_comb_10240.keg.ppera: 3
vm.uma.zio_buf_comb_8192.keg.ppera: 2
vm.uma.zio_buf_comb_7168.keg.ppera: 2
vm.uma.zio_buf_comb_6144.keg.ppera: 2   <= Broken
vm.uma.zio_buf_comb_5120.keg.ppera: 2
vm.uma.zio_buf_comb_4096.keg.ppera: 1
vm.uma.zio_buf_comb_3584.keg.ppera: 7   <= Hard to free
vm.uma.zio_buf_comb_3072.keg.ppera: 3
vm.uma.zio_buf_comb_2560.keg.ppera: 2
vm.uma.zio_buf_comb_2048.keg.ppera: 1
vm.uma.zio_buf_comb_1536.keg.ppera: 2
vm.uma.zio_buf_comb_1024.keg.ppera: 1
vm.uma.zio_buf_comb_512.keg.ppera: 1

I am now getting such:

vm.uma.zio_buf_comb_16384.keg.ppera: 4
vm.uma.zio_buf_comb_12288.keg.ppera: 3
vm.uma.zio_buf_comb_8192.keg.ppera: 2
vm.uma.zio_buf_comb_6144.keg.ppera: 3   <= Fixed, 2 in 3 pages
vm.uma.zio_buf_comb_4096.keg.ppera: 1
vm.uma.zio_buf_comb_3072.keg.ppera: 3
vm.uma.zio_buf_comb_2048.keg.ppera: 1
vm.uma.zio_buf_comb_1536.keg.ppera: 2
vm.uma.zio_buf_comb_1024.keg.ppera: 1
vm.uma.zio_buf_comb_512.keg.ppera: 1

Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15452
2023-10-30 14:55:32 -07:00
Alexander Motin 05a7348a7e RAIDZ: Use cache blocking during parity math
RAIDZ parity is calculated by adding data one column at a time.  It
works OK for small blocks, but for large blocks results of previous
addition may already be evicted from CPU caches to main memory, and
in addition to extra memory write require extra read to get it back.

This patch splits large parity operations into 64KB chunks, that
should in most cases fit into CPU L2 caches from the last decade.
I haven't touched more complicated cases of data reconstruction to
not over complicate the code.  Those should be relatively rare.

My tests on Xeon Gold 6242R CPU with 1MB of L2 cache per core show
up to 10/20% memory traffic reduction when writing to 4-wide RAIDZ/
RAIDZ2 blocks of ~4MB and up.  Older CPUs with 256KB of L2 cache
should see the effect even on smaller blocks.  Wider vdevs may need
bigger blocks to be affected.

Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15448
2023-10-30 14:54:27 -07:00
Alexander Motin c3773de168 ZIL: Cleanup sync and commit handling
ZVOL:
 - Mark all ZVOL ZIL transactions as sync.  Since ZVOLs have only
one object, it makes no sense to maintain async queue and on each
commit merge it into sync. Single sync queue is just cheaper, while
it changes nothing until actual commit request arrives.
 - Remove zsd_sync_cnt and the zil_async_to_sync() calls since we
are no longer switching between sync and async queues.

ZFS:
 - Mark write transactions as sync based only on number of sync
opens (z_sync_cnt).  We can not randomly jump between sync and
async unless we want data corruptions due to writes reordering.
 - When file first opened with O_SYNC (z_sync_cnt incremented to 1)
call zil_async_to_sync() for it to preserve correct ordering between
past and future writes.
 - Drop zfs_fsyncer_key logic.  Looks like it was an optimization
for workloads heavily intermixing async writes with tons of fsyncs.
But first it was broken 8 years ago due to Linux tsd implementation
not allowing data storage between syscalls, and second, I doubt it
is safe to switch from async to sync so often and without calling
zil_async_to_sync().

 - Rename sync argument of *_log_write() into commit, now only
signalling caller's intent to call zil_commit() soon after.  It
allows WR_COPIED optimizations without extra other meanings.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15366
2023-10-30 14:51:56 -07:00
shodanshok 043c6ee3b6 Read prefetched buffers from L2ARC
Prefetched buffers are currently read from L2ARC if, and only if,
l2arc_noprefetch is set to non-default value of 0. This means that
a streaming read which can be served from L2ARC will instead engage
the main pool.

For example, consider what happens when a file is sequentially read:
- application requests contiguous data, engaging the prefetcher;
- ARC buffers are initially marked as prefetched but, as the calling
application consumes data, the prefetch tag is cleared;
- these "normal" buffers become eligible for L2ARC and are copied to it;
- re-reading the same file will *not* engage L2ARC even if it contains
the required buffers;
- main pool has to suffer another sequential read load, which (due to
most NCQ-enabled HDDs preferring sequential loads) can dramatically
increase latency for uncached random reads.

In other words, current behavior is to write data to L2ARC (wearing it)
without using the very same cache when reading back the same data. This
was probably useful many years ago to preserve L2ARC read bandwidth but,
with current SSD speed/size/price, it is vastly sub-optimal.

Setting l2arc_noprefetch=1, while enabling L2ARC to serve these reads,
means that even prefetched but unused buffers will be copied into L2ARC,
further increasing wear and load for potentially not-useful data.

This patch enable prefetched buffer to be read from L2ARC even when
l2arc_noprefetch=1 (default), increasing sequential read speed and
reducing load on the main pool without polluting L2ARC with not-useful
(ie: unused) prefetched data. Moreover, it clear users confusion about
L2ARC size increasing but not serving any IO when doing sequential
reads.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Gionatan Danti <g.danti@assyoma.it>
Closes #15451
2023-10-26 09:40:21 -07:00
Thomas Bertschinger 97a0b5be50 Add mutex_enter_interruptible() for interruptible sleeping IOCTLs
Many long-running ZFS ioctls lock the spa_namespace_lock, forcing
concurrent ioctls to sleep for the mutex. Previously, the only
option is to call mutex_enter() which sleeps uninterruptibly. This
is a usability issue for sysadmins, for example, if the admin runs
`zpool status` while a slow `zpool import` is ongoing, the admin's
shell will be locked in uninterruptible sleep for a long time.

This patch resolves this admin usability issue by introducing
mutex_enter_interruptible() which sleeps interruptibly while waiting
to acquire a lock. It is implemented for both Linux and FreeBSD.

The ZFS_IOC_POOL_CONFIGS ioctl, used by `zpool status`, is changed to
use this new macro so that the command can be interrupted if it is
issued during a concurrent `zpool import` (or other long-running
operation).

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Thomas Bertschinger <bertschinger@lanl.gov>
Closes #15360
2023-10-26 09:17:40 -07:00
ednadolski-ix 6a629f3234 arc_default_max on Linux should match FreeBSD
Commits 518b487 and 23bdb07 changed the default ARC size limit on
Linux systems to 1/2 of physical memory, which has become too
strict for modern systems with large amounts of RAM. This patch
changes the default limit to match that of FreeBSD, so ZFS may
have a unified value on both platforms.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Edmund Nadolski <edmund.nadolski@ixsystems.com>
Closes #15437
2023-10-26 09:13:01 -07:00
Alexander Motin 3afdc97d91 ZIO: Remove READY pipeline stage from root ZIOs
zio_root() has no arguments for ready callback or parent ZIO. Except
one recent case in ZIL code if root ZIOs ever have a parent it is
also a root ZIO.  It means we do not need READY pipeline stage for
them, which takes some time to process, but even more time to wait
for the children and be woken by them, and both for no good reason.

The most visible effect of this change is that it avoids one taskq
wakeup per ZIL block written, previously used to run zio_ready()
for lwb_root_zio and skipped now.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15398
2023-10-25 15:22:25 -07:00
Tony Hutter 05c4710e89 Revert "zvol: Temporally disable blk-mq"
This reverts commit aefb6a2bd6.

aefb6a2bd temporally disabled blk-mq until we could fix a fix for

Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #15439
2023-10-24 14:41:25 -07:00
Tony Hutter 7c9b6fed16 zvol: Remove broken blk-mq optimization
This fix removes a dubious optimization in zfs_uiomove_bvec_rq()
that saved the iterator contents of a rq_for_each_segment().  This
optimization allowed restoring the "saved state" from a previous
rq_for_each_segment() call on the same uio so that you wouldn't
need to iterate though each bvec on every zfs_uiomove_bvec_rq() call.
However, if the kernel is manipulating the requests/bios/bvecs under
the covers between zfs_uiomove_bvec_rq() calls, then it could result
in corruption from using the "saved state".  This optimization
results in an unbootable system after installing an OS on a zvol
with blk-mq enabled.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #15351
2023-10-24 14:37:52 -07:00
Alexander Motin 252f46be7d ZIL: Detect single-threaded workloads
... by checking that previous block is fully written and flushed.
It allows to skip commit delays since we can give up on aggregation
in that case.  This removes zil_min_commit_timeout parameter, since
for single-threaded workloads it is not needed at all, while on very
fast devices even some multi-threaded workloads may get detected as
single-threaded and still bypass the wait.  To give multi-threaded
workloads more aggregation chances increase zfs_commit_timeout_pct
from 5 to 10%, as they should suffer less from additional latency.

Also single-threaded workloads detection allows in perspective better
prediction of the next block size.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15381
2023-10-24 14:35:25 -07:00
Alexander Motin e007908a16 ABD: Be more assertive in iterators
Once we verified the ABDs and asserted the sizes we should never
see premature ABDs ends.  Assert that and remove extra branches
from production builds.

Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15428
2023-10-24 14:33:58 -07:00
Brian Behlendorf 07345ac252 Add prefetch property
ZFS prefetch is currently governed by the zfs_prefetch_disable
tunable. However, this is a module-wide settings - if a specific
dataset benefits from prefetch, while others have issue with it,
an optimal solution does not exists.

This commit introduce the "prefetch" tri-state property, which enable
granular control (at dataset/volume level) for prefetching.

This patch does not remove the zfs_prefetch_disable, which remains
a system-wide switch for enable/disable prefetch. However, to avoid
duplication, it would be preferable to deprecate and then remove
the module tunable.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Signed-off-by: Gionatan Danti <g.danti@assyoma.it>
Co-authored-by: Gionatan Danti <g.danti@assyoma.it>
Closes #15237 
Closes #15436
2023-10-24 11:00:07 -07:00
ofthesun9 e57909265b "ARC prefetch metadata accesses:" appears twice in the output.
The first occurrence should be "ARC prefetch data accesses:"

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: ofthesun9 <olivier@ofthesun.net>
Closes #15427
2023-10-23 13:41:29 -07:00
Brian Behlendorf e9725abd83 Revert "Do not persist user/group/project quota zap objects when unneeded"
This reverts commit 797f55ef12 which
was causing a VERIFY failure when running the project quota tests.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #15438
2023-10-23 09:55:36 -07:00
Rob N b5e6091885 spa: document spa_thread() and SDC feature gates
spa_thread() and the "System Duty Cycle" scheduling class are from
Illumos and have not yet been adapted to Linux or FreeBSD.

HAVE_SPA_THREAD has long been explicitly undefined and used to mark
spa_thread(), but there's some related taskq code that can never be
invoked without it, which makes some already-tricky code harder to read.

HAVE_SYSDC is introduced in this commit to mark the SDC parts. SDC
requires spa_thread(), but the inverse is not true, so they are
separate.

I don't want to make the call to just remove it because I still harbour
hopes that OpenZFS could become a first-class citizen on Illumos
someday. But hopefully this will at least make the reason it exists a
bit clearer for people without long memories and/or an interest in
history.

For those that are interested in the history, the original FreeBSD port
of ZFS (before ZFS-on-Linux was adopted there) did have a spa_thread(),
but not SDC. The last version of that before it was removed can be read
here:

  https://github.com/freebsd/freebsd-src/blob/22df1ffd812f0395cdb7c0b1edae1f67b991562a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa.c

Meanwhile, more information on the SDC scheduling class is here:

  https://github.com/illumos/illumos-gate/blob/master/usr/src/uts/common/disp/sysdc.c

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by:  Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #15406
2023-10-23 08:50:55 -07:00
Sam Atkinson 797f55ef12 Do not persist user/group/project quota zap objects when unneeded
In the zfs_id_over*quota functions, there is a short-circuit to skip
the zap_lookup when the quota zap does not exist. If quotas are never
used in a zpool, then the quota zap will never exist. But if
user/group/project quotas are ever used, the zap objects will be
created and will persist even if the quotas are deleted.

The quota zap_lookup in the write path can become a bottleneck for
write-heavy small I/O workloads. Before this commit, it was not
possible to remove this lookup without creating a new zpool.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Sam Atkinson <samatk@amazon.com>
Closes #14721
2023-10-20 14:22:04 -07:00
Alexander Motin 57b4098562 Trust ARC_BUF_SHARED() more
In my understanding ARC_BUF_SHARED() and arc_buf_is_shared() should
return identical results, except the second also asserts it deeper.
The first is much cheaper though, saving few pointer dereferences.
Replace production arc_buf_is_shared() calls with ARC_BUF_SHARED(),
and call arc_buf_is_shared() in random assertions, while making it
even more strict.

On my tests this in half reduces arc_buf_destroy_impl() time, that
noticeably reduces hash_lock congestion under heavy dbuf eviction.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15397
2023-10-20 12:38:37 -07:00
Alexander Motin 4fbc524955 Remove lock from dsl_pool_need_dirty_delay()
Torn reads/writes of dp_dirty_total are unlikely: on 64-bit systems
due to register size, while on 32-bit due to memory constraints.
And even if we hit some race, the code implementing the delay takes
the lock any way.

Removal of the poll-wide lock acquisition saves ~1% of CPU time on
8-thread 8KB write workload.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15390
2023-10-20 12:37:16 -07:00
VaibhavB de7b1ae30a run-zts test procfs/pool_state failed with uncorrectable I/O failure
Once we trigger the zpool scrub, all zpool/zfs command gets stuck for 
180 seconds. Post 180 seconds zpool/zfs commands gets start executing 
however few more seconds(10s) it take to update the status. hence 
sleeping for 200 seconds so that we get the correct status.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: vaibhav.bhanawat <vaibhav.bhanawat@delphix.com>
Closes #15364
2023-10-20 11:57:39 -07:00
Alexander Motin b29e98fa8d Properly pad struct tx_cpu to cache line
We already use ____cacheline_aligned in many places, so add one more
instead of seems arbitrary char tc_pad[8].

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15402
2023-10-20 11:54:05 -07:00
dennisfriedrichsen 0d6cec418e Fix typo in tests/zfs-tests/tests/functional/cli_user/misc/misc.cfg
Reviewed-by: Rob N <robn@despairlabs.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Dennis R. Friedrichsen <dennis.r.friedrichsen@gmail.com>
Closes #15417
2023-10-20 11:52:13 -07:00
Olivier Certner b9384b9498 FreeBSD: taskq: Remove unused declaration
Variable 'uma_align_cache' has not been used since commit "FreeBSD: Use
a hash table for taskqid lookups" (3933305ea).  Moreover, it is soon
going to become private to FreeBSD's UMA in 15.0-CURRENT (main),
14.0-STABLE (stable/14) and 13.2-STABLE (stable/13).  Should accessing
this information become necessary again, one will have to use the new
accessors for recent versions.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Olivier Certner <olce.freebsd@certner.fr>
Closes #15416
2023-10-20 11:49:56 -07:00
Colin Percival ea30b5a9e0 Set spa_ccw_fail_time=0 when expanding a vdev.
When a vdev is to be expanded -- either via `zpool online -e` or via
the autoexpand option -- a SPA_ASYNC_CONFIG_UPDATE request is queued
to be handled via an asynchronous worker thread (spa_async_thread).
This normally happens almost immediately; but will be delayed up to
zfs_ccw_retry_interval seconds (default 5 minutes) if an attempt to
write the zpool configuration cache failed.

When FreeBSD boots ZFS-root VM images generated using `makefs -t zfs`,
the zpoolupgrade rc.d script runs `zpool upgrade`, which modifies the
pool configuration and triggers an attempt to write to the cache file.
This attempted write fails because the filesystem is still mounted
read-only at this point in the boot process, triggering a 5-minute
cooldown before SPA_ASYNC_CONFIG_UPDATE requests will be handled by
the asynchronous worker thread.

When expanding a vdev, reset the "when did a configuration cache
write last fail" value so that the SPA_ASYNC_CONFIG_UPDATE request
will be handled promptly.  A cleaner but more intrusive option would
be to use separate SPA_ASYNC_ flags for "configuration changed" and
"try writing the configuration cache again", but with FreeBSD 14.0
coming very soon I'd prefer to leave such refactoring for a later
date.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Colin Percival <cperciva@FreeBSD.org>
Closes #15405
2023-10-20 10:30:32 -07:00
Don Brady f0f330e121 Fix ZED auto-replace for VDEVs using by-id paths
The change is simple -- restore the original code so that the VDEV 
path is updated when using by-id paths.  The more challenging part 
was to devise a second ZTS test, that would test auto-replace for 
'by-id' and help prevent a future regression.

With that new test, we can now do an A|B test with , and without, 
the fix to confirm that auto-replace for by-id paths works. The 
existing auto-replace test, functional/fault/auto_replace_001_pos, 
will confirm that we didn't break auto-replace for 'by-vdev' paths.

In the original functional/fault/auto_replace_001_pos test, the disk 
wipe (using dd) was not effective in removing the partitioning since 
the kernel was never informed of the wipe.

Added a call to wipefs(8) so that the kernel is informed and ZED will 
re-partition the device.
    
Added a validation step that the re-partitioning occurred by
confirming  that the GPT partition UUID changes.

Sponsored-By: OpenDrives Inc.
Sponsored-By: Klara Inc.
Reviewed-by: Rob Norris <rob.norris@klarasystems.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Don Brady <don.brady@klarasystems.com>
Closes #15363
2023-10-20 09:29:02 -07:00
John Wren Kennedy c0e58995e3 Large sync writes perform worse with slog
For synchronous write workloads with large IO sizes, a pool configured
with a slog performs worse than one with an embedded zil:

sequential_writes 1m sync ios, 16 threads
  Write IOPS:              1292          438   -66.10%
  Write Bandwidth:      1323570       448910   -66.08%
  Write Latency:       12128400     36330970      3.0x

sequential_writes 1m sync ios, 32 threads
  Write IOPS:              1293          430   -66.74%
  Write Bandwidth:      1324184       441188   -66.68%
  Write Latency:       24486278     74028536      3.0x

The reason is the `zil_slog_bulk` variable. In `zil_lwb_write_open`,
if a zil block is greater than 768K, the priority of the write is
downgraded from sync to async. Increasing the value allows greater
throughput. To select a value for this PR, I ran an fio workload with
the following values for `zil_slog_bulk`:

    zil_slog_bulk    KiB/s
    1048576         422132
    2097152         478935
    4194304         533645
    8388608         623031
    12582912        827158
    16777216       1038359
    25165824       1142210
    33554432       1211472
    50331648       1292847
    67108864       1308506
    100663296      1306821
    134217728      1304998

At 64M, the results with a slog are now improved to parity with an
embedded zil:

sequential_writes 1m sync ios, 16 threads
  Write IOPS:               438         1288      2.9x
  Write Bandwidth:       448910      1319062      2.9x
  Write Latency:       36330970     12163408   -66.52%

sequential_writes 1m sync ios, 32 threads
  Write IOPS:               430         1290      3.0x
  Write Bandwidth:       441188      1321693      3.0x
  Write Latency:       74028536     24519698   -66.88%

None of the other tests in the performance suite (run with a zil or
slog) had a significant change, including the random_write_zil tests,
which use multiple datasets.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: John Wren Kennedy <john.kennedy@delphix.com>
Closes #14378
2023-10-13 11:15:09 -07:00
Alexander Motin 380c25f640 FreeBSD: Improve taskq wrapper
- Group tqent_task and tqent_timeout_task into a union.  They are
never used same time. This shrinks taskq_ent_t from 192 to 160 bytes.
 - Remove tqent_registered.  Use tqent_id != 0 instead.
 - Remove tqent_cancelled.  Use taskqueue pending counter instead.
 - Change tqent_type into uint_t.  We don't need to pack it any more.
 - Change tqent_rc into uint_t, matching refcount(9).
 - Take shared locks in taskq_lookup().
 - Call proper taskqueue_drain_timeout() for TIMEOUT_TASK in
taskq_cancel_id() and taskq_wait_id().
 - Switch from CK_LIST to regular LIST.

Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Mateusz Guzik <mjguzik@gmail.com>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15356
2023-10-13 10:41:11 -07:00
Jason King 8a74070128 Zpool can start allocating from metaslab before TRIMs have completed
When doing a manual TRIM on a zpool, the metaslab being TRIMmed is
potentially re-enabled before all queued TRIM zios for that metaslab
have completed. Since TRIM zios have the lowest priority, it is 
possible to get into a situation where allocations occur from the 
just re-enabled metaslab and cut ahead of queued TRIMs to the same 
metaslab.  If the ranges overlap, this will cause corruption.

We were able to trigger this pretty consistently with a small single 
top-level vdev zpool (i.e. small number of metaslabs) with heavy 
parallel write activity while performing a manual TRIM against a 
somewhat 'slow' device (so TRIMs took a bit of time to complete). 
With the patch, we've not been able to recreate it since. It was on 
illumos, but inspection of the OpenZFS trim code looks like the 
relevant pieces are largely unchanged and so it appears it would be 
vulnerable to the same issue.

Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Jason King <jking@racktopsystems.com>
Illumos-issue: https://www.illumos.org/issues/15939
Closes #15395
2023-10-12 11:01:54 -07:00
Brian Behlendorf fd51286227 spec: define _bashcompletiondir if undefined
Always define _bashcompletiondir in the spec file to a reasonable value
when it is undefined.  Required for `rpmbuild --rebuild <srpm>`.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #15396
2023-10-11 16:56:32 -07:00
Alexander Motin 1b310dfb1d DMU: Do not pre-read holes during write
dmu_tx_check_ioerr() pre-reads blocks that are going to be dirtied
as part of transaction to both prefetch them and check for errors.
But it makes no sense to do it for holes, since there are no disk
reads to prefetch and there can be no errors.  On the other side
those blocks are anonymous, and they are freed immediately by the
dbuf_rele() without even being put into dbuf cache, so we just
burn CPU time on decompression and overheads and get absolutely
no result at the end.

Use of dbuf_hold_impl() with fail_sparse parameter allows to skip
the extra work, and on my tests with sequential 8KB writes to empty
ZVOL with 32KB blocks shows throughput increase from 1.7 to 2GB/s.

Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15371
2023-10-11 16:37:21 -07:00
Brian Behlendorf 9facf2d1ad ZTS: Debug zfs_share_concurrent_shares failure
Update zfs_share_concurrent_shares test case to wait a few seconds
and recheck that the filesystem isn't shared.  The intent here is
determine the nature of the error and if it may be a race.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by:  Umer Saleem <usaleem@ixsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #15379
2023-10-10 13:32:33 -07:00
Brian Behlendorf 822b32ee51 CI: Move perl script to dist_noinst_DATA
Everything listed in dist_noinst_SCRIPTS is assumed to be a shell
script, this generates a shellcheck SC1071 error since perl is not
supported.  Move update_authors.pl to dist_noinst_DATA with the
other perl scripts.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Rob N <robn@despairlabs.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #15392
2023-10-10 13:31:15 -07:00
Daniel Berlin bc29124b1b Ensure we call fput when cloning fails due to different devices.
Right now, zpl_ioctl_ficlone and zpl_ioctl_ficlonerange do not call
put on the src fd if the source and destination are on two different
devices.  This leaves the source file held open in this case.

Reviewed-by: Kay Pedersen <mail@mkwg.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Daniel Berlin <dberlin@dberlin.org>
Closes #15386
2023-10-10 11:04:32 -07:00
Brian Behlendorf 483ccf0c63 ZTS: Remove zfs_allow_010_pos expection for FreeBSD
This issue should now be address by PR #15376 and the exception
for this test case be removed.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by:  Umer Saleem <usaleem@ixsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #15382
2023-10-10 08:59:10 -07:00
Tony Hutter aefb6a2bd6 zvol: Temporally disable blk-mq
There was a report of zvol data loss (#15351) after enabling blk-mq on a
zvol backed with 16k physical block sized disks.  Out of an abundance of
caution, do not allow the user to enable blk-mq until we can look into
the issue.

Note that blk-mq was not enabled by default on zvols.  It was always
opt-in via the zvol_use_blk_mq module parameter.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Addresses: #15351
Closes #15378
2023-10-10 08:57:48 -07:00
Rob Norris 28096c7c13 AUTHORS: update with missing names
This is generated by scripts/update_authors.pl. I've looked over the
results fairly closely and while I don't think they're bad, they could
be improved somewhat, but also, I don't know if its good form to just
update this without explicit consent from those named.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #15374
2023-10-10 08:55:42 -07:00
Rob Norris 000cfca068 update_authors: add missing names from commits to AUTHORS
Full description of what's happening in comments.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #15374
2023-10-10 08:54:57 -07:00
Rob Norris dc1d3303d2 mailmap: initial, trying to tidy up a lot of the commit history
This comes from the observation that a huge number of commit author
fields look quite strange (to my eyes), but quite often the
Signed-off-by: trailer has the correct name. For these I have updated
the name where it was obvious how to do so, however, I have not created
a mapping for the commit email to the Signed-off-by email, as whatever I
choose for email will become the prime candidate for inclusion in the
AUTHORS file, and care needs to be taken when acting without explicit
consent.

There's a small handful of commits that look like they were done on
local machines, or CI hosts, or similar, where the git authorship config
wasn't set up properly. Its obvious what this should look like, so I've
just done them.

The remainder is mapping Github noreply emails to either an
obviously-correct Signed-off-by trailer, or to a an author from another
commit. This was mostly done by hand, so there may be errors, but I
think its close. I do not understand where these come from - I know that
they're what commits made via Github web look like when there's no real
address set on the account, but I find it hard to believe that so many
of these came through the web, especially given the complexity of most
of the changes. I suspect there's some kind of merge helper tool in play
here. Regardless, the history is set now, and this tries to get it back
on track.

Obviously, all of this helps the history look tidy, but this also feeds
into the AUTHORS update script. See next commit.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #15374
2023-10-10 08:54:30 -07:00
Umer Saleem 250349ffff ZTS: Fix verify_fs_mount in delegate_common.kshlib
verify_fs_mount expects the dataset to remain unmounted after
updating the mountpoint property in delegate_common.kshlib.

This commit updates verify_fs_mount and uses nomount parameter
for zfs set to update the mountpoint property without mounting
the dataset.

This fixes the zfs_allow_010_pos test case, which was failing on
FreeBSD after the behavior update in setting the mountpoint
property.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Closes #15376
2023-10-09 17:24:24 -07:00
Brian Behlendorf 954a380e19 ZTS: Move zpool_import_hostid_changed* tests to Linux runfile
Relocate the zpool_import_hostid_changed* test cases to the Linux
runfile until these tests are modified to run cleanly on FreeBSD.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #15377
2023-10-09 17:22:44 -07:00
Alexander Motin 008baa091f FreeBSD: Reduce divergence from in-tree sources
This includes random small tweaks, primarily a build fixes, required
when ZFS is built as part of FreeBSD base.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15368
2023-10-09 13:27:18 -07:00
Sam James c57ff818f6 config/zfs-build.m4: add Gentoo's bash-completion path
Followup e69ade32e1 by adding Gentoo's
bash completion path.

We should probably consider using/honouring the standard --with-bashcompletiondir
autoconf option as well, but that's something to do later.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Umer Saleem <usaleem@ixsystems.com>
Signed-off-by: Sam James <sam@gentoo.org>
Closes #15372
2023-10-09 12:50:06 -07:00
Alexander Motin 66b81b3497 ZIL: Reduce maximum size of WR_COPIED to 7.5K
Benchmarks show that at certain write sizes range lock/unlock take
not so much time as extra memory copy.  The exact threshold is not
obvious due to other overheads, but it is definitely lower than
~63KB used before.  Make it configurable, defaulting at 7.5KB,
that is 8KB of nearest malloc() size minus itx and lr structs.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15353
2023-10-06 10:09:27 -07:00
siv0 74ed1ae08a rpm: Fix make rpm on Debian/Ubuntu
The recent patch to change the bash completion install location based
on the Distribution, ignored that it should still be possible to
create RPMs on Debian derived systems. Additionally `make deb` itself
creates RPMs and converts them via `alien`.

This patch adds the bashcompletiondir variable to the rpm defines and
uses this for the location, where to get the bash completion file.

It still changes the location on Debian/Ubuntu systems in the final
packages from /etc/bash_completion.d to
/usr/share/bash-completion/completions

Fixes: e69ade32e1

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com>
Closes #15355
Closes #15365
2023-10-06 09:53:23 -07:00
Rob Norris 54b1b1d893 import: require force when cachefile hostid doesn't match on-disk
Previously, if a cachefile is passed to zpool import, the cached config
is mostly offered as-is to ZFS_IOC_POOL_TRYIMPORT->spa_tryimport(), and
the results are taken as the canonical pool config and handed back to
ZFS_IOC_POOL_IMPORT.

In the course of its operation, spa_load() will inspect the pool and
build a new config from what it finds on disk. However, it then
regenerates a new config ready to import, and so rightly sets the hostid
and hostname for the local host in the config it returns.

Because of this, the "require force" checks always decide the pool is
exported and last touched by the local host, even if this is not true,
which is possible in a HA environment when MMP is not enabled. The pool
may be imported on another head, but the import checks still pass here,
so the pool ends up imported on both.

(This doesn't happen when a cachefile isn't used, because the pool
config is discovered in userspace in zpool_find_import(), and that does
find the on-disk hostid and hostname correctly).

Since the systemd zfs-import-cache.service unit uses cachefile imports,
this can lead to a system returning after a crash with a "valid"
cachefile on disk and automatically, quietly, importing a pool that has
already been taken up by a secondary head.

This commit causes the on-disk hostid and hostname to be included in the
ZPOOL_CONFIG_LOAD_INFO item in the returned config, and then changes the
"force" checks for zpool import to use them if present.

This method should give no change in behaviour for old userspace on new
kernels (they won't know to look for the new config items) and for new
userspace on old kernels (the won't find the new config items).

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Closes #15290
2023-10-06 09:24:44 -07:00
Rob Norris 8f5aa8cb00 tests: add tests for zpool import behaviour when hostid changes
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Closes #15290
2023-10-06 09:23:39 -07:00
Rob N 5b8688e620 zfsconcepts: add description of block cloning
Here I'm trying to succinctly introduce the concept, the basics of its
construction, how its different to dedup, how to use it, and where its
limitations lie, in four paragraphs and with enough searchable terms to
help the reader find more information both within OpenZFS and elsewhere.

Phew.

Sponsored-By: Klara, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #15362
2023-10-06 09:06:29 -07:00
Alexander Motin 342357cd9e Reduce number of metaslab preload taskq threads.
Before this change ZFS created threads for 50% of CPUs for each top-
level vdev.  Plus it created the same number of threads for embedded
log groups (that have only one metaslab and don't need any preload).
As result, on system with 80 CPUs and pool of 60 vdevs this resulted
in 4800 metaslab preload threads, that is absolutely insane.

This patch changes the preload threads to 50% of CPUs in one taskq
per pool, so on the mentioned system it will be only 40 threads.

Among other things this fixes zdb on the mentioned system and pool
on FreeBSD, that failed to create so many threads in one process.

Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15319
2023-10-06 09:04:00 -07:00
Alexander Motin 75a2eb7fac ARC: Drop different size headers for crypto
To reduce memory usage ZFS crypto allocated bigger by 56 bytes ARC
headers only when specific block was encrypted on disk.  It was a
nice optimization, except in some cases the code reallocated them
on fly, that invalidated header pointers from the buffers.  Since
the buffers use different locking, it created number of races, that
were originally covered (at least partially) by b_evict_lock, used
also to protection evictions.  But it has gone as part of #14340.
As result, as was found in #15293, arc_hdr_realloc_crypt() ended
up unprotected and causing use-after-free.

Instead of introducing some even more elaborate locking, this patch
just drops the difference between normal and protected headers. It
cost us additional 56 bytes per header, but with couple patches
saving 24 bytes, the net growth is only 32 bytes with total header
size of 232 bytes on FreeBSD, that IMHO is acceptable price for
simplicity.  Additional locking would also end up consuming space,
time or both.

Reviewe-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #15293
Closes #15347
2023-10-06 09:01:00 -07:00
Alexander Motin 96b9cf42e0 ARC: Remove b_bufcnt/b_ebufcnt from ARC headers
In most cases we do not care about exact number of buffers linked
to the header, we just need to know if it is zero, non-zero or one.
That can easily be checked just looking on b_buf pointer or in some
cases derefencing it.

b_ebufcnt is read only once, and in that case we already traverse
the list as part of arc_buf_remove(), so second traverse should not
be expensive.

This reduces L1 ARC header size by 8 bytes and full crypto header by
16 bytes, down to 176 and 232 bytes on FreeBSD respectively.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15350
2023-10-06 08:56:17 -07:00
Martin Matuška 82d2291398 CI: add FreeBSD build with Cirrus CI
As a first step for automatic FreeBSD testing add a build and install
for FreeBSD versions 12.4, 13.2 and 14-snapshot using Cirrus CI.

Reviewed-by: Jose Luis Duran 
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Martin Matuska <mm@FreeBSD.org>
Closes #15332
2023-10-06 08:50:26 -07:00
Rob N 4848a0898e tests/block_cloning: sync before write in fallback test
We're still seeing this test fail intermittently (that is, the clone
happens), which must mean the write and the clone can still be happening
on different txgs.

It might be that there's still activity after the pool is created. So
here we force a sync before starting the write.

Sponsored-By: Klara Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #15359
2023-10-06 08:39:20 -07:00
Alexander Motin 2a6c62109c ARC: Remove b_cv from struct l1arc_buf_hdr
Earlier as part of #14123 I've removed one use of b_cv.  This patch
reuses the same approach to remove the other one from much more
rare code path.

This saves 16 bytes of L1 ARC header on FreeBSD (reducing it from
200 to 184 bytes) and seems even more on Linux.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15340
2023-10-04 14:45:00 -07:00
Andrew Turner f795e90a11 Add BTI landing pads to the AArch64 SHA2 assembly
The Arm Branch Target Identification (BTI) extension guards against
branching to an unintended instruction.

To support BTI add the landing pad instructions to the SHA2 functions.
These are from the hint space so are a nop on hardware that lacks BTI
support or if BTI isn't enabled.

Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Andrew Turner <andrew.turner4@arm.com>
Closes #14862
Closes #15339
2023-10-03 15:12:36 -07:00
Stoiko Ivanov eb955f6e93 contrib: debian: drop bashcompletion mangling after install
tested by running:
```
./configure --with-config=user; cp -a contrib/debian .
dpkg-buildpackage -b -uc -us
```
on a Debian 12 based system.

and checking where the completion file got installed.

Reviewed-by: Umer Saleem <usaleem@ixsystems.com>
Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com>
Closes #15304
2023-10-02 17:06:41 -07:00
Stoiko Ivanov e14293a4e5 contrib: debian: switch to dh-sequence-dkms
Follows b191f9a13d3005621ead9a727b811892264505ef from Debian's
packaging team at:
https://salsa.debian.org/zfsonlinux-team/zfs/

The previous build-dependency is kept as option, to still be able to
build on older Debian based distros (e.g. Ubuntu 20.04).

Without this building on Debian 12/bookworm does not work, as `dkms`
is a virtual package.

Reviewed-by: Umer Saleem <usaleem@ixsystems.com>
Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com>
Closes #15304
2023-10-02 17:06:33 -07:00
Stoiko Ivanov e69ade32e1 contrib: bash_completion.d: make install destination vendor dependent
Certain Linux distributions (Debian/Ubuntu at least) expect
bash-completion snippets to be installed in
/usr/share/bash-completion/completions instead of
/etc/bash_completion.d.

This patch sets the bashcompletiondir variable based on the vendor,
inspired by similar settings for initdir and initconfdir.

It seems that commit 612b8dff5b
caused the file to be installed in the first-place (thus the error
when building debian packages only became apparent when testing a
2.2.0-rc4 build)

The change only sets the variable in Makefile context - the
rpm/zfs.spec.in file has the path hardcoded as
%{_sysconfdir}/bash_completion.d/zfs, but since running
```
./configure --sysconfdir=/myetc  ; make rpm
```
also results in all relevant files to be installed in /etc instead of
/myetc I assume this can remain as is.

Reviewed-by: Umer Saleem <usaleem@ixsystems.com>
Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com>
Closes #15304
2023-10-02 17:05:59 -07:00
Umer Saleem 4e16964e1c Add '-u' - nomount flag for zfs set
This commit adds '-u' flag for zfs set operation. With this flag,
mountpoint, sharenfs and sharesmb properties can be updated
without actually mounting or sharing the dataset.

Previously, if dataset was unmounted, and mountpoint property was
updated, dataset was not mounted after the update. This behavior
is changed in #15240. We mount the dataset whenever mountpoint
property is updated, regardless if it's mounted or not.

To provide the user with option to keep the dataset unmounted and
still update the mountpoint without mounting the dataset, '-u'
flag can be used.

If any of mountpoint, sharenfs or sharesmb properties are updated
with '-u' flag, the property is set to desired value but the
operation to (re/un)mount and/or (re/un)share the dataset is not
performed and dataset remains as it was before.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Closes #15322
2023-10-02 16:58:54 -07:00
Chunwei Chen 249d759caf Fix invalid pointer access in trace_dbuf.h
In dnode_destroy, dn_objset is invalidated. However, it will later call
into dbuf_destroy, in which DTRACE_SET_STATE will try to access spa_name
via dn_objset causing illegal pointer access.

Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chunwei Chen <david.chen@nutanix.com>
Closes #15333
2023-10-02 16:58:01 -07:00
George Amanakis fe4d055b36 Report ashift of L2ARC devices in zdb
Commit 8af1104f does not actually store the ashift of cache devices in
their label. However, in order to facilitate reporting the ashift
through zdb, we enable this in the present commit. We also document
how the retrieval of the ashift is done.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #15331
2023-10-02 16:57:09 -07:00
Alexander Motin e135388564 Restrict short block cloning requests
If we are copying only one block and it is smaller than recordsize
property, do not allow destination to grow beyond one block if it
is not there yet.  Otherwise the destination will get stuck with
that block size forever, that can be as small as 512 bytes, no
matter how big the destination grow later.

Reviewed-by: Kay Pedersen <mail@mkwg.de>
Reviewed-by: Rob Norris <rob.norris@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15321
2023-09-29 08:22:46 -07:00
Brian Behlendorf f9c39dc862 Tweak rebuild in-flight hard limit
Vendor testing shows we should be able to get a little more
performance if we further relax the hard limit which we're hitting.

Authored-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #15324
2023-09-29 08:21:25 -07:00
Akash B ba769ea351 Fix ENOSPC for extended quota
When unlinking multiple files from a pool at 100% capacity, it
was possible for ENOSPC to be returned after the first few unlinks.
This issue was fixed previously by PR #13172 but then this was
again introduced by PR #13839.

This is resolved using the existing mechanism of returning ERESTART
when over quota as long as we know enough space will shortly be
available after processing the pending deferred frees.

Also, updated the existing testcase which reliably reproduced the
issue without this patch.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Dipak Ghosh <dipak.ghosh@hpe.com>
Signed-off-by: Akash B <akash-b@hpe.com>
Closes #15312
2023-09-28 14:10:07 -07:00
Paul Dagnelie 5551dcd762 Don't allocate from new metaslabs
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #15307
Closes #15308
2023-09-28 14:08:52 -07:00
Paul Dagnelie ec994486b1 Reduce trim min size even lower for tests to reduce flakiness
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #15315
2023-09-27 12:06:24 -07:00
Paul Dagnelie a59e294e21 ZTS: Fix introduced test bug in block_cloning_copyfilerange
Reviewed-by: John Wren Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #15316
2023-09-26 14:37:28 -07:00
Brian Behlendorf 87725534db ZTS: Add additional exceptions
"zfs_share_concurrent_shares" may fail on FreeBSD and some Linux
distributions (fedora).  Move it to the common list.

"zfs_allow_010_pos" has been observed to fail on FreeBSD 13.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Kay Pedersen <mail@mkwg.de>
Reviewed-by: Umer Saleem <usaleem@ixsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #15306
2023-09-25 11:15:32 -07:00
Paul Dagnelie 8a9ba769ee Set timeout before creating pool in test
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <rob.norris@klarasystems.com>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #15309
2023-09-25 11:14:00 -07:00
Paul Dagnelie 2e2a46e0a5 Invoke zdb by guid to avoid import errors
The problem that was occurring is basically that a device was removed 
by ztest and replaced with another device. It was then reguided. The 
import then failed because there were two possible imports with the 
same name; one with the new guid, and one with the old. This can 
happen because the label writes from the device removal/replacement 
can be subject to ztest's error injection. 

The other ways to fix this would be to change the error injection to 
not trigger on removals (which may not be technically feasible), or 
to change the import code to not report configurations that are so 
short on devices (which would potentially have unpleasant end-user 
effects when trying to recover from data losses/device configuration 
issues).

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #15298
2023-09-22 16:08:51 -07:00
Alexander Motin e5d70f4677 ZIL: Avoid dbuf_read() in ztest_get_data()
While working on similar patches for zfs and zvol in #15153 I've
forgot about ztest.  Update it also so that we test the same code
paths as use in production.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15301
2023-09-21 18:40:13 -07:00
Coleman Kane 7ac56b86cd Linux 6.6 compat: fsync_bdev() has been removed in favor of sync_blockdev()
In Linux commit 560e20e4bf6484a0c12f9f3c7a1aa55056948e1e, the
fsync_bdev() function was removed in favor of sync_blockdev() to do
(roughly) the same thing, given the same input. This change
conditionally attempts to call sync_blockdev() if fsync_bdev() isn't
discovered during configure.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #15263
2023-09-21 18:38:40 -07:00
Coleman Kane 01d00dfa9e Linux 6.6 compat: generic_fillattr has a new u32 request_mask added at arg2
In commit 0d72b92883c651a11059d93335f33d65c6eb653b, a new u32 argument
for the request_mask was added to generic_fillattr. This is the same
request_mask for statx that's present in the most recent API implemented
by zpl_getattr_impl. This commit conditionally adds it to the
zpl_generic_fillattr(...) macro, as well as the zfs_getattr_fast(...)
implementation, when configure determines it's present in the kernel's
generic_fillattr(...).

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #15263
2023-09-21 18:38:40 -07:00
Coleman Kane b37f29341b Linux 6.6 compat: use inode_get/set_ctime*(...)
In Linux commit 13bc24457850583a2e7203ded05b7209ab4bc5ef, direct access
to the i_ctime member of struct inode was removed. The new approach is
to use accessor methods that exclusively handle passing the timestamp
around by value. This change adds new tests for each of these functions
and introduces zpl_* equivalents in include/os/linux/zfs/sys/zpl.h. In
where the inode_get/set_ctime*() functions exist, these zpl_* calls will
be mapped to the new functions. On older kernels, these macros just wrap
direct-access calls. The code that operated on an address of ip->i_ctime
to call ZFS_TIME_DECODE() now will take a local copy using
zpl_inode_get_ctime(), and then pass the address of the local copy when
performing the ZFS_TIME_DECODE() call, in all cases, rather than
directly accessing the member.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #15263
Closes #15257
2023-09-21 18:38:31 -07:00
Rob N 2dc89b922b tests/block_cloning: try harder to stay on same txg in fallback test
We've observed this test failing intermittently. When it does, the
"same block" check shows that both files have the same content, that is,
the file was cloned.

The only way this could have happened is if the open txg moved between
the dd and clonefile calls. That's possible because although we set
zfs_txg_timeout to be large, that only affects the wait time in the sync
thread at the start of a new txg; it doesn't change anything if its
currently waiting or working.

So here we just force the txgs to move immediately before, which should
get both operations onto the same txg as intented.

Sponsored-By: OpenDrives Inc.
Sponsored-By: Klara Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris Rob Norris <rob.norris@klarasystems.com>
Closes #15303
2023-09-21 17:54:15 -07:00
Tony Hutter b53077a9e7 Add zfs_prepare_disk script for disk firmware install
Have libzfs call a special `zfs_prepare_disk` script before a disk is
included into the pool.  The user can edit this script to add things
like a disk firmware update or a disk health check.  Use of the script
is totally optional. See the zfs_prepare_disk manpage for full details.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #15243
2023-09-21 08:36:26 -07:00
Rob N 4647353c8b status: report pool suspension state under failmode=continue
When failmode=continue is set and the pool suspends, both 'zpool status'
and the 'zfs/pool/state' kstat ignore it and report the normal vdev tree
state. There's no clear indicator that the pool is suspended. This is
unlike suspend in failmode=wait, or suspend due to MMP check failure,
which both report "SUSPENDED" explicitly.

This commit changes it so SUSPENDED is reported for failmode=continue
the same as for other modes.

Rationale:

The historical behaviour of failmode=continue is roughly, "press on as
though all is well". To this end, the fact that the pool had suspended
was not shown, to maintain the façade that all is well.

Its unclear why hiding this information was considered appropriate. One
possibility is that it was expected that a true pool fault would always
be reported as DEGRADED or FAULTED, and that the pool could not suspend
without these happening.

That is not necessarily true, as vdev health and suspend state are only
loosely connected, such that a pool in (apparent) good health can be
suspended for good reasons, and of course a degraded pool does not lead
to suspension. Even if that expectation were true, there's still a
difference in urgency - a degraded pool may not need to be attended to
for hours, while a suspended pool is most often unusable until an
operator intervenes.

An operator that has set failmode=continue has presumably done so
because their workload is one that can continue to operate in a useful
way when the pool suspends. In this case the operator still needs a
clear indicator that there is a problem that needs attending to.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #15297
2023-09-20 16:56:45 -07:00
Paul Dagnelie 6d9bc3ec9f Fix occasional rsend test crashes
We have occasional crashes in the rsend tests. Debugging revealed 
that this is because the send_worker thread is getting EINTR from 
splice(). This happens when a non-fatal signal is received during 
the syscall. We should retry the syscall, rather than exiting failure.
Tweak the loop to only break if the splice is finished or we receive 
a non-EINTR error.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #15273
2023-09-20 16:39:38 -07:00
Alexander Motin 90149552b1 ZIL: Fix potential race on flush deferring.
zil_lwb_set_zio_dependency() can not set write ZIO dependency on
previous LWB's write ZIO if one is already in done handler and set
state to LWB_STATE_WRITE_DONE.  So theoretically done handler of
next LWB's write ZIO may run before done handler of previous LWB
write ZIO completes.  In such case we can not defer flushes, since
the flush issue process is not locked.

This may fix some reported assertions of lwb_vdev_tree not being
empty inside zil_free_lwb().

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15278
2023-09-20 11:17:11 -07:00
Dag-Erling Smørgrav 5f1479d92f Use ASSERT0P() to check that a pointer is NULL.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Kay Pedersen <mail@mkwg.de>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Dag-Erling Smørgrav <des@FreeBSD.org>
Closes #15225
2023-09-19 17:22:01 -07:00
Dag-Erling Smørgrav 506fe78c48 Add VERIFY0P() and ASSERT0P() macros.
These macros are similar to VERIFY0() and ASSERT0() but are intended
for pointers, and therefore use uintptr_t instead of int64_t.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Kay Pedersen <mail@mkwg.de>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Dag-Erling Smørgrav <des@FreeBSD.org>
Closes #15225
2023-09-19 17:21:55 -07:00
Dag-Erling Smørgrav 01d9283af3 Clean up existing VERIFY*() macros.
Chiefly:

- Remove unnecessary parentheses around variable names.
- Remove spaces between the type and variable in casts.
- Make the panic message for VERIFY0() reflect how the macro is used.
- Use %p to format pointers, except in Linux kernel code.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Kay Pedersen <mail@mkwg.de>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Dag-Erling Smørgrav <des@FreeBSD.org>
Closes #15225
2023-09-19 17:21:01 -07:00
Umer Saleem c63aabaf1c Improve the handling of sharesmb,sharenfs properties
For sharesmb and sharenfs properties, the status of setting the
property is tied with whether we succeed to share the dataset or
not. In case sharing the dataset is not successful, this is
treated as overall failure of setting the property. In this case,
if we check the property after the failure, it is set to on.

This commit updates this behavior and the status of setting the
share properties is not returned as failure, when we fail to
share the dataset.

For sharenfs property, if access list is provided, the syntax
errors in access list/host adresses are not validated until after
setting the property during postfix phase while trying to
share the dataset. This is not correct, since the property has
already been set when we reach there.

Syntax errors in access list/host addresses are validated while
validating the property list, before setting the property and
failure is returned to user in this case when there are errors
in access list.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Closes #15240
2023-09-19 17:16:14 -07:00
Umer Saleem bbac1d2977 Update the behavior of mountpoint property
There are some inconsistencies in the handling of mountpoint
property. This commit updates the behavior and makes it
consistent.

If mountpoint property is set when dataset is unmounted, this
would update the mountpoint property. The mountpoint could be
valid or invalid in this case. Setting the mountpoint property
would result in success in this case. Dataset would still be
unmounted here.

On the other hand, if dataset is mounted and mountpoint
property is updated to something invalid where mount cannot be
successful, for example, setting the mountpoint inside a readonly
directory. This would unmount the dataset, set the mountpoint
property to requested value and tries to mount the dataset. The
mount operation returns error and this error is treated as
overall failure of setting the property while the property is
actually set.

To make the behavior consistent in case dataset is mounted or
unmounted, we should try to mount the dataset whenever mountpoint
property is updated. This would result in mounting the datasets
if canmount property is set to on, regardless if the dataset was
previously unmounted.

The failure in mount operation while setting the mountpoint
property should not be treated as failure, since the property is
actually set now to user requested value.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Closes #15240
2023-09-19 17:15:24 -07:00
Rob N 7228ba1114 cmd: add 'help' subcommand to zpool and zfs
'program help subcommand' is a reasonably common pattern for
multifunction command-line programs. This commit adds support for that
style to the zpool and zfs commands.

When run as 'zpool help [<topic>]' or 'zfs help [<topic>]', executes the
'man' program on the PATH with the most likely manpage name for the
requested topic: "zpool-<topic>" or "zfs-<topic>" for subcommands, or
"zpool<topic>" or "zfs<topic>" for the "concepts" and "props" topics.
If no topic is supplied, uses the top "zpool" or "zfs" pages.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Kay Pedersen <mail@mkwg.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #15288
2023-09-19 09:06:47 -07:00
Paul Dagnelie 2076011e0c Fix incorrect expected error in ztest
There is an occasional ztest failure that looks like ztest: attach 
(/var/tmp/zloop-run/ztest.13a 570425344, draid1-1-0 532152320, 1) 
returned 22, expected 95. This is because the value that we return 
is EINVAL, but expected_error is set incorrectly.

Change the expected_error value to match both the comment and the 
actual error value.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #15295
2023-09-19 09:02:23 -07:00
Paul Dagnelie 741c215bab Fix l2arc_apply_transforms ztest crash
In #13375 we modified the allocation size of the buffer that we use 
to apply l2arc transforms to be the size of the arc hdr we're using, 
rather than the allocation size that will be in place on the disk, 
because sometimes the hdr size is larger. Unfortunately, sometimes 
the allocation size is larger, which means that we overflow the buffer 
in that case. This change modifies the allocation to be the max of 
the two values

Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #15177
Closes #15248
2023-09-19 08:58:14 -07:00
Rob N 6cb933c56e tests: install missing PAM tests
'pam_change_unmounted' and 'pam_recursive' both exist and are referenced
by the test run config, but weren't being installed and so are excluded.
This gets them installed so they will run as expected.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Kay Pedersen <mail@mkwg.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #15291
2023-09-19 08:48:02 -07:00
George Amanakis e923bcd16c Update the MOS directory on spa_upgrade_errlog()
spa_upgrade_errlog() does not update the MOS directory when the
head_errlog feature is enabled. In this case if spa_errlog_sync() is not
called, the MOS dir references the old errlog_last and errlog_sync
objects. Thus when doing a scrub a panic will occur:

Call Trace:
 dump_stack+0x6d/0x8b
 panic+0x101/0x2e3
 spl_panic+0xcf/0x102 [spl]
 delete_errlog+0x124/0x130 [zfs]
 spa_errlog_sync+0x256/0x260 [zfs]
 spa_sync_iterate_to_convergence+0xe5/0x250 [zfs]
 spa_sync+0x2f7/0x670 [zfs]
 txg_sync_thread+0x22d/0x2d0 [zfs]
 thread_generic_wrapper+0x83/0xa0 [spl]
 kthread+0x104/0x140
 ret_from_fork+0x1f/0x40

Fix this by updating the related MOS directory objects in
spa_upgrade_errlog().

Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #15279 
Closes #15277
2023-09-18 17:06:35 -07:00
Mateusz Guzik ee720ad7bc Retire z_nr_znodes
Added in ab26409db7 ("Linux 3.1 compat, super_block->s_shrink"), with
the only consumer which needed the count getting retired in 066e825221
("Linux compat: Minimum kernel version 3.10").

The counter gets in the way of not maintaining the list to begin with.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #15274
2023-09-18 16:53:33 -07:00
Tony Hutter 529bec7d7b zed: Allow autoreplace and fault LEDs for removed vdevs
Allow zed to autoreplace vdevs marked as REMOVED.  Also update
statechange-led zedlet to toggle fault LEDs for REMOVED vdevs.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #15281
2023-09-18 16:25:58 -07:00
наб 9192ab7777 check-zstd-symbols: also ignore __pfx_ symbols
Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=b341b20d648bb7e9a3307c33163e7399f0913e66

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #15282 
Closes #15284
2023-09-18 09:08:41 -07:00
Tony Hutter 8af8d2abb1 Linux 6.5 compat: META (#15265)
Update the META file to reflect compatibility with the 6.5
kernel.

Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
2023-09-12 12:51:11 -07:00
Laura Hild 4d1b70175c Remove implication that child disks aren't vdevs in zpoolconcepts(7)
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Laura Hild <lsh@jlab.org>
Closes #15247
2023-09-11 14:58:19 -07:00
ednadolski-ix 0ee9b02390 update max_variance limit in zdb_block_size_histogram test for CI
Commit 2d7843401a had previously
updated this hardcoded limit to allow for CI testing. As there
is no deterministic pass/fail value, the need has arisen for
one more small increase.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Edmund Nadolski <edmund.nadolski@ixsystems.com>
Closes #15252
2023-09-09 10:23:29 -07:00
Alexander Motin 5cc1876f14 Add more constraints for block cloning.
- We cannot clone into files with smaller block size if there is
more than one block, since we can not grow the block size.
 - Block size must be power-of-2 if destination offset != 0, since
there can be no multiple blocks of non-power-of-2 size.

The first should handle the case when destination file has several
blocks but still is not bigger than one block of the source file.
The second fixes panic in dmu_buf_hold_array_by_dnode() on attempt
to concatenate files with equal but non-power-of-2 block sizes.

While there, assert that error is reported if we made no progress.

Reviewed-by: Pawel Jakub Dawidek <pawel@dawidek.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <rob.norris@klarasystems.com>
Reviewed-by: Kay Pedersen <mail@mkwg.de>
Reviewed-by: Umer Saleem <usaleem@ixsystems.com>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15251
2023-09-09 10:22:36 -07:00
Volker Mauel 12ce45f260 Intel QAT 1.7 compatibility
Based on the intel QAT samples which are bundled in the 1.x drivers, 
this is the preferred approach since api version 1.6.  See:

https://www.intel.de/content/www/de/de/download/19734/intel-quickassist-technology-driver-for-linux-hw-version-1-x.html?

Reviewed-by: Weigang Li <weigang.li@intel.com>
Signed-off-by: Volker Mauel <volkermauel@gmail.com>
Closes #15190
2023-09-07 14:38:17 -07:00
Andrea Righi 3602775330 Linux 6.5 compat: spl: properly unregister sysctl entries
When register_sysctl_table() is unavailable we fail to properly
unregister sysctl entries under "kernel/spl".

This leads to errors like the following when spl is unloaded/reloaded,
making impossible to properly reload the spl module:

[  746.995704] sysctl duplicate entry: /kernel/spl/kmem/slab_kvmem_total

Fix by cleaning up all the sub-entries inside "kernel/spl" when the
spl module is unloaded.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Closes #15239
2023-09-07 14:36:32 -07:00
ednadolski-ix 95f71c019d Selectable block allocators
ZFS historically has had several space allocators that were
dynamically selectable.  While these have been retained in 
OpenZFS, only a single allocator has been statically compiled 
in. This patch compiles all allocators for OpenZFS and provides 
a module parameter to allow for manual selection between them.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Edmund Nadolski <edmund.nadolski@ixsystems.com>
Closes #15218
2023-09-01 18:00:30 -07:00
Umer Saleem 71472bf375 Relax error reporting in zpool import and zpool split
For zpool import and zpool split, zpool_enable_datasets is called
to mount and share all datasets in a pool. If there is an error
while mounting or sharing any dataset in the pool, the status of
import or split is reported as failure. However, the changes do
show up in zpool list.

This commit updates the error reporting in zpool import and zpool
split path. More descriptive messages are shown to user in case
there is an error during mount or share. Errors in mount or share
do not effect the overall status of zpool import and zpool split.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Closes #15216
2023-09-01 17:25:11 -07:00
Andrea Righi bcb1159c09 Linux 6.5 compat: safe cleanup in spl_proc_fini()
If we fail to create a proc entry in spl_proc_init() we may end up
calling unregister_sysctl_table() twice: one in the failure path of
spl_proc_init() and another time during spl_proc_fini().

Avoid the double call to unregister_sysctl_table() and while at it
refactor the code a bit to reduce code duplication.

This was accidentally introduced when the spl code was
updated for Linux 6.5 compatibility.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Closes #15234 
Closes #15235
2023-09-01 17:21:40 -07:00
Alexander Motin 9da6b60417 ZIL: Change ZIOs issue order.
In zil_lwb_write_issue(), after issuing lwb_root_zio/lwb_write_zio,
we have no right to access lwb->lwb_child_zio. If it was not there,
the first two ZIOs may have already completed and freed the lwb.
ZIOs issue in opposite order from children to parent should keep
the lwb valid till the end, since the lwb can be freed only after
lwb_root_zio completion callback.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15233
2023-09-01 17:14:50 -07:00
Alexander Motin b1b99e10a6 ZIL: Revert zl_lock scope reduction.
While I have no reports of it, I suspect possible use-after-free
scenario when zil_commit_waiter() tries to dereference zcw_lwb
for lwb already freed by zil_sync(), while zcw_done is not set.
Extension of zl_lock scope as it was originally should block
zil_sync() from freeing the lwb, closing this race.

This reverts #14959 and couple chunks of #14841.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15228
2023-09-01 17:13:52 -07:00
Alexander Motin bbcf18c293 ZIL: Tune some assertions.
In zil_free_lwb() we should first assert lwb_state or the rest of
assertions can be misleading if it is false.

Add lwb_state assertions in zil_lwb_add_block() to make sure we are
not trying to add elements to lwb_vdev_tree after it was processed.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15227
2023-09-01 17:13:22 -07:00
Dimitry Andric 010c003e5f dmu_buf_will_clone: change assertion to fix 32-bit compiler warning
Building module/zfs/dbuf.c for 32-bit targets can result in a warning:

In file included from
/usr/src/sys/contrib/openzfs/include/sys/zfs_context.h:97,
                 from /usr/src/sys/contrib/openzfs/module/zfs/dbuf.c:32:
/usr/src/sys/contrib/openzfs/module/zfs/dbuf.c: In function
'dmu_buf_will_clone':
/usr/src/sys/contrib/openzfs/lib/libspl/include/assert.h:116:33: error:
cast from pointer to integer of different size
[-Werror=pointer-to-int-cast]
  116 |         const uint64_t __left = (uint64_t)(LEFT);
  \
      |                                 ^
/usr/src/sys/contrib/openzfs/lib/libspl/include/assert.h:148:25: note:
in expansion of macro 'VERIFY0'
  148 | #define ASSERT0         VERIFY0
      |                         ^~~~~~~
/usr/src/sys/contrib/openzfs/module/zfs/dbuf.c:2704:9: note: in
expansion of macro 'ASSERT0'
 2704 |         ASSERT0(dbuf_find_dirty_eq(db, tx->tx_txg));
      |         ^~~~~~~

This is because dbuf_find_dirty_eq() returns a pointer, which if
pointers are 32-bit results in a warning about the cast to uint64_t.

Instead, use the ASSERT3P() macro, with == and NULL as second and third
arguments, which should work regardless of the target's bitness.

Reviewed-by: Kay Pedersen <mail@mkwg.de>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Dimitry Andric <dimitry@andric.com>
Closes #15224
2023-08-31 18:17:12 -07:00
Serapheim Dimitropoulos cad00d5180 checkstyle: fix action failures
Reviewed-by: Don Brady <dev.fs.zfs@gmail.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Closes #15220
2023-08-29 09:12:40 -07:00
Paul Dagnelie bee9cfb813 Increase limit of redaction list by using spill block
Currently redaction bookmarks and their associated redaction lists
have a relatively low limit of 36 redaction snapshots. This is imposed
by the number of snapshot GUIDs that fit in the bonus buffer of the
redaction list object. While this is more than enough for most use
cases, there are some limited cases where larger numbers would be
useful to support.

We tweak the redaction list creation code to use a spill block if
the number of redaction snapshots is above the amount that would fit
in the bonus buffer. We also make a small change to allow spill blocks
to be use for types of data besides SA. In order to fully leverage
this logic, we also change the redaction code to use vmem_alloc, to
handle extremely large allocations if needed. Finally, small tweaks
were made to the zfs commands and the test suite.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #15018
2023-08-26 11:34:43 -07:00
Paul Dagnelie 11326f8eb1 Try to clarify wording to reduce zpool add incidents
Try to clarify wording to reduce zpool add incidents.
Add an attach example.

Reviewed-by: Rich Ercolani <Rincebrain@gmail.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #15179
2023-08-26 11:30:19 -07:00
Rich Ercolani 277f2e587b Avoid save/restoring AMX registers to avoid a SPR erratum
Intel SPR erratum SPR4 says that if you trip into a vmexit while
doing FPU save/restore, your AMX register state might misbehave...
and by misbehave, I mean save all zeroes incorrectly, leading to
explosions if you restore it.

Since we're not using AMX for anything, the simple way to avoid
this is to just not save/restore those when we do anything, since
we're killing preemption of any sort across our save/restores.

If we ever decide to use AMX, it's not clear that we have any
way to mitigate this, on Linux...but I am not an expert.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #14989
Closes #15168
2023-08-26 11:25:46 -07:00
Brian Behlendorf f0e34c8879 zed: update zed.d/statechange-slot_off.sh
The statechange-slot_off.sh zedlet which was added in #15200
needed to be installed so it's included by the packages.

Additional testing has also shown that multiple retries are
often needed for the script to operate reliably.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #15210
2023-08-26 11:22:28 -07:00
наб d54358ff59 Make zoned/jailed zfsprops(7) make more sense.
- Distribute zfs-[un]jail.8 on FreeBSD and zfs-[un]zone.8 on Linux
- zfsprops.7: mirror zoned/jailed, only available on respective platforms

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #15161
2023-08-25 16:13:43 -07:00
Rob N 804414aad2 tests/block_cloning: rename and document get_same_blocks helper
`get_same_blocks` is a helper to compare two files and return a list of
the blocks that are clones of each other. Its very necessary for block
cloning tests.

Previously it was incorrectly called `unique_blocks`, which is the
_inverse_ of what it does (an early version did list unique blocks; it
was changed but the name was not). So if nothing else, it should be
called `duplicate_blocks`.

But, keeping the details of a clone operation in your head is actually
quite difficult, without the additional overhead of wondering how the
tools work. So I've renamed it to better describe what it does, added a
usage note, and changed it to return block indexes from 0 instead of 1,
to match how L0 blocks are normally counted.

Reviewed-by: Umer Saleem <usaleem@ixsystems.com>
Reviewed-by:  Kay Pedersen <mail@mkwg.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #15181
2023-08-25 10:31:29 -07:00
Serapheim Dimitropoulos ed39d668ea Update outdated assertion from zio_write_compress
As part of some internal gang block testing within Delphix
we hit the assertion removed by this patch. The assertion
was triggered by a ZIO that had two copies and was a gang
block making the following expression equal to 3:
```
MIN(zp->zp_copies + BP_IS_GANG(bp), spa_max_replication(spa))
```
and failing when we expected the above to be equal to
`BP_GET_NDVAS(bp)`.

The assertion is no longer valid since the following commit:
```
commit 14872aaa4f
Author: Matthew Ahrens <matthew.ahrens@delphix.com>
Date:   Mon Feb 6 09:37:06 2023 -0800

  EIO caused by encryption + recursive gang
```

The above commit changed gang block headers so they can't
have more than 2 copies but the assertion in question from
this PR was never updated.

Reviewed-by: George Wilson <george.wilson@delphix.com>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Closes #15180
2023-08-25 10:28:36 -07:00
Alexander Motin eda3fcd56f ZIL: Second attempt to reduce scope of zl_issuer_lock.
The previous patch #14841 appeared to have significant flaw, causing
deadlocks if zl_get_data callback got blocked waiting for TXG sync.  I
already handled some of such cases in the original patch, but issue
 #14982 shown cases that were impossible to solve in that design.

This patch fixes the problem by postponing log blocks allocation till
the very end, just before the zios issue, leaving nothing blocking after
that point to cause deadlocks.  Before that point though any sleeps are
now allowed, not causing sync thread blockage.  This require slightly
more complicated lwb state machine to allocate blocks and issue zios
in proper order.  But with removal of special early issue workarounds
the new code is much cleaner now, and should even be more efficient.

Since this patch uses null zios between write, I've found that null
zios do not wait for logical children ready status in zio_ready(),
that makes parent write to proceed prematurely, producing incorrect
log blocks.  Added ZIO_CHILD_LOGICAL_BIT to zio_wait_for_children()
fixes it.

Reviewed-by: Rob Norris <rob.norris@klarasystems.com>
Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15122
2023-08-24 17:08:49 -07:00
Tony Hutter 11fbcacf37 zed: Add zedlet to power off slot when drive is faulted
If ZED_POWER_OFF_ENCLOUSRE_SLOT_ON_FAULT is enabled in zed.rc, then
power off the drive's slot in the enclosure if it becomes FAULTED.
This can help silence misbehaving drives.  This assumes your drive
enclosure fully supports slot power control via sysfs.

Reviewed-by: @AllKind
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #15200
2023-08-24 11:59:03 -07:00
Rob N cae502c175 copy_file_range: fix fallback when source create on same txg
In 019dea0a5 we removed the conversion from EAGAIN->EXDEV inside
zfs_clone_range(), but forgot to add a test for EAGAIN to the
copy_file_range() entry points to trigger fallback to a content copy.

This commit fixes that.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Kay Pedersen <mail@mkwg.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #15170
Closes #15172
2023-08-14 17:34:14 -07:00
Alexander Motin 8e20e0ff39 ZIL: Replay blocks without next block pointer.
If we get next block allocation error during log write, we trigger
transaction commit.  But the block we have just completed is still
written and transactions it covers will be acknowledged normally.
If after that we ignore the block during replay just because it is
the last in the chain, we may not replay some transactions that we
have acknowledged as synced, that is not right.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15132
2023-08-11 09:04:44 -07:00
Alexander Motin bdb7df4245 ZIL: Avoid dbuf_read() before dmu_sync().
In most cases dmu_sync() works with dirty records directly and does
not need actual data. The only exception is dmu_sync_late_arrival().
To save some CPU time use dmu_buf_hold_noread*() in z*_get_data()
and explicitly call dbuf_read() in dmu_sync_late_arrival(). There
is also a chance that by that time TXG will already be synced and
we won't have to do it at all.

Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15153
2023-08-11 09:04:08 -07:00
Coleman Kane 8ce2eba9e6 Linux 6.5 compat: Use copy_splice_read instead of filemap_splice_read
Using the filemap_splice_read function for the splice_read handler was
leading to occasional data corruption under certain circumstances. Favor
using copy_splice_read instead, which does not demonstrate the same
erroneous behavior under the tested failure cases.

Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #15164
2023-08-08 15:42:32 -07:00
Umer Saleem 8150090257 Move zinject from openzfs-zfs-test to openzfs-zfsutils
For Native Debian packaging, zinject binary and man page is
packaged in ZFS test package. zinject is not not directly related
to ZTS and should be packaged with other utilities, like it is
present in zfs_<ver>.rpm/deb packages.

This commit moves zinject binary and man page from openzfs-zfs-test
to openzfs-zfsutils package.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Closes #15160
2023-08-08 09:40:36 -07:00
Rafael Kitover b8c9070d09 dracut: support mountpoint=legacy for root dataset
Support mountpoint=legacy for the root dataset in the dracut zfs support
scripts.

mountpoint=/ or mountpoint=/sysroot also works.

Change zfs-env-bootfs.service to add zfsutil to BOOTFSFLAGS only for
root datasets with mountpoint != legacy.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Signed-off-by: Rafael Kitover <rkitover@gmail.com>
Closes #15149
2023-08-08 09:38:34 -07:00
oromenahar 019dea0a55 zfs_clone_range should return a descriptive error codes
Return the more descriptive error codes instead of `EXDEV` when
the parameters don't match the requirements of the clone function.
Updated the comments in `brt.c` accordingly.
The first three errors are just invalid parameters, which zfs can
not handle.
The fourth error indicates that the block which should be cloned
is created and cloned or modified in the same transaction
group (`txg`).

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <rob.norris@klarasystems.com>
Signed-off-by: Kay Pedersen <mail@mkwg.de>
Closes #15148
2023-08-08 09:37:06 -07:00
наб 683edb32b7 libzfs: sendrecv: send_progress_thread: handle SIGINFO/SIGUSR1
POSIX timers target the process, not the thread (as does SIGINFO),
so we need to block it in the main thread which will die if interrupted.

Ref: https://101010.pl/@ed1conf@bsd.network/110731819189629373
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #15113
2023-08-08 09:35:35 -07:00
Coleman Kane 36261c8238 Linux 6.5 compat: replace generic_file_splice_read with filemap_splice_read
The generic_file_splice_read function was removed in Linux 6.5 in favor
of filemap_splice_read. Add an autoconf test for filemap_splice_read and
use it if it is found as the handler for .splice_read in the
file_operations struct. Additionally, ITER_PIPE was removed in 6.5. This
change removes the ITER_* macros that OpenZFS doesn't use from being
tested in config/kernel-vfs-iov_iter.m4. The removal of ITER_PIPE was
causing the test to fail, which also affected the code responsible for
setting the .splice_read handler, above. That behavior caused run-time
panics on Linux 6.5.

Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #15155
2023-08-07 15:47:46 -07:00
Ryan Lahfa fdb8fff916 linux/spl/kmem_cache: undefine kmem_cache_alloc before defining it
When compiling a kernel with bcachefs and zfs,
the two macros will collide, making it impossible
to have both filesystems.

It is sufficient to just undefine the macro before calling it.

On why this should be in ZFS rather than bcachefs, currently,
bcachefs is not a in-tree filesystem, but,
it has a reasonably high chance of getting included soon.

This avoids the breakage in ZFS early,
this patch may be distributed downstream in NixOS
and is already used there.

Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Lahfa <ryan@lahfa.xyz>
Closes #15144
2023-08-07 13:55:59 -07:00
Alexander Motin 6c94e64963 Refactor dmu_prefetch().
- Split dmu_prefetch_dnode() from dmu_prefetch() into a separate
function.  It is quite inconvenient to read the code where len = 0
means dnode prefetch instead indirect/data prefetch.  One function
doing both has no benefits, since the code paths are independent.
 - Improve dmu_prefetch() handling of long block ranges.  Instead
of limiting L0 data length to prefetch for to dmu_prefetch_max,
make dmu_prefetch_max limit the actual amount of prefetch at the
specified level, and, if there is more, prefetch all the rest at
higher indirection level.  It should improve random access times
within the prefetched range of any length, reducing importance of
specific dmu_prefetch_max value.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15076
2023-08-07 13:54:41 -07:00
Mateusz Piotrowski a97b8fc2dd Fix some typos
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mateusz Piotrowski <0mp@FreeBSD.org>
Closes #15141
2023-08-07 13:53:59 -07:00
Coleman Kane e47e9bbe86 Linux 6.5 compat: register_sysctl_table removed
Additionally, the .child element of ctl_table has been removed in 6.5.
This change adds a new test for the pre-6.5 register_sysctl_table()
function, and uses the old code in that case. If it isn't found, then
the parentage entries in the tables are removed, and the register_sysctl
call is provided the paths of "kernel/spl", "kernel/spl/kmem", and
"kernel/spl/kstat" directly, to populate each subdirectory over three
calls, as is the new API.

Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #15138
2023-08-02 14:05:46 -07:00
Brian Atkinson a5fdba1185 Revert "Linux 6.5 compat: register_sysctl_table removed"
This reverts commit b35374fd64 as there
are error messages when loading the SPL module. Errors seemed to be tied
to duplicate a duplicate entry.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Brian Atkinson <batkinson@lanl.gov>
Closes #15134
2023-08-01 14:48:19 -07:00
Serapheim Dimitropoulos 12373b0cc7 zpool_vdev_remove() should handle EALREADY error return
When the vdev properties features was merged an extra check
was added in `spa_vdev_remove_top_check()` which checked
whether the vdev that we want to remove is already being
removed and if so return an EALREADY error.

```
static int
spa_vdev_remove_top_check(vdev_t *vd)
{
	... <snip> ...
	/*
	 * This device is already being removed
	 */
	if (vd->vdev_removing)
		return (SET_ERROR(EALREADY));
```

Before that change we'd still fail with an error but it
was a more generic one - here is the check that failed
later in the same function:
```
	/*
	 * There can not be a removal in progress.
	 */
	if (spa->spa_removing_phys.sr_state == DSS_SCANNING)
		return (SET_ERROR(EBUSY));
```

Changing the error code returned from that function changed
the behavior of the removal's library interface exposed to
the userland - `spa_vdev_remove()` now returns `EZFS_UNKNOWN`
instead of `EZFS_EBUSY` that was returning before.

This patch adds logic to make `spa_vdev_remove()` mindful
of the new EALREADY code and propagating `EZFS_EBUSY`
reverting to the previously established semantics of that
function.

Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Closes #15013
Closes #15129
2023-08-01 14:47:00 -07:00
Rob N ead3eea3e0 linux/copy_file_range: properly request a fallback copy on Linux <5.3
Before Linux 5.3, the filesystem's copy_file_range handler had to signal
back to the kernel that we can't fulfill the request and it should
fallback to a content copy. This is done by returning -EOPNOTSUPP.

This commit converts the EXDEV return from zfs_clone_range to
EOPNOTSUPP, to force the kernel to fallback for all the valid reasons it
might be unable to clone. Without it the copy_file_range() syscall will
return EXDEV to userspace, breaking its semantics.

Add test for copy_file_range fallbacks.  copy_file_range should always
fallback to a content copy whenever ZFS can't service the request with
cloning.

Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Kay Pedersen <mail@mkwg.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #15131
2023-08-01 11:31:11 -07:00
Zach Dykstra fcd61d937f readmmap.c: fix building with MUSL libc
glibc includes sys/types.h from stdlib.h. This is not the case for MUSL,
so explicitly include it. Fixes usage of uint_t.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Zach Dykstra <dykstra.zachary@gmail.com>
Closes #15130
2023-08-01 09:01:32 -07:00
Rob N 114a39964f zdb: include cloned blocks in block statistics
This gives `zdb -b` support for clone blocks.

Previously, it didn't know what clones were, so would count their space
allocation multiple times and then report leaked space (or, in debug,
would assert trying to claim blocks a second time).

This commit fixes those bugs, and reports the number of clones and the
space "used" (saved) by them.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Kay Pedersen <mail@mkwg.de>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-By: OpenDrives Inc.
Sponsored-By: Klara Inc.
Closes #15123
2023-08-01 08:56:30 -07:00
наб a21ca18d4d linux: zfs: ctldir: set [amc]time to snapshot's creation property
If looking up a snapdir inode failed, hold pool config – hold the 
snapshot – get its creation property – release it – release it, 
then use that as the [amc]time in the allocated inode. If that 
fails then fall back to current time. No performance impact since 
this is only done when allocating a new snapdir inode.
                                                       
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #15110
Closes #15117
2023-08-01 08:50:17 -07:00
Coleman Kane 6751634d77 Linux 4.20 compat: wrapper function for iov_iter type access
An iov_iter_type() function to access the "type" member of the struct
iov_iter was added at one point. Move the conditional logic to decide
which method to use for accessing it into a macro and simplify the
zpl_uio_init code.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #15100
2023-08-01 08:42:33 -07:00
Coleman Kane 325505e5c4 Linux 6.4 compat: iter_iov() function now used to get old iov member
The iov_iter->iov member is now iov_iter->__iov and must be accessed via
the accessor function iter_iov(). Create a wrapper that is conditionally
compiled to use the access method appropriate for the target kernel
version.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #15100
2023-08-01 08:42:26 -07:00
Coleman Kane 43e8f6e37f Linux 6.5 compat: blkdev changes
Multiple changes to the blkdev API were introduced in Linux 6.5. This
includes passing (void* holder) to blkdev_put, adding a new
blk_holder_ops* arg to blkdev_get_by_path, adding a new blk_mode_t type
that replaces uses of fmode_t, and removing an argument from the release
handler on block_device_operations that we weren't using. The open
function definition has also changed to take gendisk* and blk_mode_t, so
update it accordingly, too.

Implement local wrappers for blkdev_get_by_path() and
vdev_blkdev_put() so that the in-line calls are cleaner, and place the
conditionally-compiled implementation details inside of both of these
local wrappers. Both calls are exclusively used within vdev_disk.c, at
this time.

Add blk_mode_is_open_write() to test FMODE_WRITE / BLK_OPEN_WRITE
The wrapper function is now used for testing using the appropriate
method for the kernel, whether the open mode is writable or not.

Emphasize fmode_t arg in zvol_release is not used

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #15099
2023-08-01 08:37:20 -07:00
Coleman Kane 3b8e318b77 Linux 6.5 compat: use disk_check_media_change when it exists
When disk_check_media_change() exists, then define
zfs_check_media_change() to simply call disk_check_media_change() on
the bd_disk member of its argument. Since disk_check_media_change()
is newer than when revalidate_disk was present in bops, we should
be able to safely do this via a macro, instead of recreating a new
implementation of the inline function that forces revalidation.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #15101
2023-08-01 08:32:38 -07:00
Coleman Kane b35374fd64 Linux 6.5 compat: register_sysctl_table removed
Additionally, the .child element of ctl_table has been removed in 6.5.
This change adds a new test for the pre-6.5 register_sysctl_table()
function, and uses the old code in that case. If it isn't found, then
the parentage entries in the tables are removed, and the register_sysctl
call is provided the paths of "kernel/spl", "kernel/spl/kmem", and
"kernel/spl/kstat" directly, to populate each subdirectory over three
calls, as is the new API.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #15098
2023-08-01 08:27:58 -07:00
oromenahar e9c59310f7 Check the return value in clonefile test
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <rob.norris@klarasystems.com>
Signed-off-by: Kay Pedersen <mail@mkwg.de>
Closes #15128
2023-08-01 08:26:12 -07:00
Alexander Motin b22bab2547 Remove fastwrite mechanism.
Fastwrite was introduced many years ago to improve ZIL writes spread
between multiple top-level vdevs by tracking number of allocated but
not written blocks and choosing vdev with smaller count.  It suposed
to reduce ZIL knowledge about allocation, but actually made ZIL to
even more actively report allocation code about the allocations,
complicating both ZIL and metaslabs code.

On top of that, it seems ZIO_FLAG_FASTWRITE setting in dmu_sync()
was lost many years ago, that was one of the declared benefits. Plus
introduction of embedded log metaslab class solved another problem
with allocation rotor accounting both normal and log allocations,
since in most cases those are now in different metaslab classes.

After all that, I'd prefer to simplify already too complicated ZIL,
ZIO and metaslab code if the benefit of complexity is not obvious.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15107
2023-07-28 13:30:33 -07:00
oromenahar 5bdfff5cfc BRT should return EOPNOTSUPP
Return the more descriptive EOPNOTSUPP instead of EXDEV when the
storage pool doesn't support block cloning.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <rob.norris@klarasystems.com>
Signed-off-by: Kay Pedersen <mail@mkwg.de>
Closes #15097
2023-07-27 11:32:34 -07:00
Alexander Motin 704c80f048 Avoid waiting in dmu_sync_late_arrival().
The transaction there does not produce any dirty data or log blocks,
so it should not be throttled. All other cases wait for TXG sync, by
which time the log block we are writing will be obsolete, so we can
skip waiting and just return error here instead.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15096
2023-07-27 09:07:09 -07:00
Brian Behlendorf 782312c612 zed: Reduce log noise for large JBODs
For large JBODs the log message "zfs_iter_vdev: no match" can
account for the bulk of the log messages (over 70%).  Since this
message is purely informational and not that useful we remove it.

Reviewed-by: Olaf Faaland <faaland1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #15086
Closes #15094
2023-07-25 13:55:29 -07:00
Alexander Motin 2848de11e5 Remove zl_issuer_lock from zil_suspend().
This locking was recently added as part of #14979. But appears it
is illegal to take zl_issuer_lock while holding dp_config_rwlock,
taken by dsl_pool_hold().  It causes deadlock with sync thread in
spa_sync_upgrades().  On a second thought, we should not
need this locking, since zil_commit_impl() we call below takes
zl_issuer_lock, that should sufficiently protect zl_suspend reads,
combined with other logic from #14979.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15103
2023-07-25 09:08:36 -07:00
Rob Norris 48d0e9465d zts: block cloning tests
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Kay Pedersen <mail@mkwg.de>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-By: OpenDrives Inc.
Sponsored-By: Klara Inc.
Closes #15050
Closes #405
Closes #13349
2023-07-24 16:37:58 -07:00
Rob Norris 6b0a4be5fe linux: implement filesystem-side copy/clone functions for EL7
Redhat have backported copy_file_range and clone_file_range to the EL7
kernel using an "extended file operations" wrapper structure. This
connects all that up to let cloning work there too.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Kay Pedersen <mail@mkwg.de>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-By: OpenDrives Inc.
Sponsored-By: Klara Inc.
Closes #15050
2023-07-24 16:37:04 -07:00
Rob Norris 9927f219f1 linux: implement filesystem-side clone ioctls
Prior to Linux 4.5, the FICLONE etc ioctls were specific to BTRFS, and
were implemented as regular filesystem-specific ioctls. This implements
those ioctls directly in OpenZFS, allowing cloning to work on older
kernels.

There's no need to gate these behind version checks; on later kernels
Linux will simply never deliver these ioctls, instead calling the
approprate VFS op.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Kay Pedersen <mail@mkwg.de>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-By: OpenDrives Inc.
Sponsored-By: Klara Inc.
Closes #15050
2023-07-24 16:36:54 -07:00
Rob Norris 5a35c68b67 linux: implement filesystem-side copy/clone functions
This implements the Linux VFS ops required to service the file
copy/clone APIs:

  .copy_file_range    (4.5+)
  .clone_file_range   (4.5-4.19)
  .dedupe_file_range  (4.5-4.19)
  .remap_file_range   (4.20+)

Note that dedupe_file_range() and remap_file_range(REMAP_FILE_DEDUP) are
hooked up here, but are not implemented yet.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Kay Pedersen <mail@mkwg.de>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-By: OpenDrives Inc.
Sponsored-By: Klara Inc.
Closes #15050
2023-07-24 16:36:38 -07:00
Rob Norris f6facd2429 dbuf_sync_leaf: check DB_READ in state assertions
Block cloning introduced a new state transition from DB_NOFILL to
DB_READ. This occurs when a block is cloned and then read on the
current txg.

In this case, the clone will move the dbuf to DB_NOFILL, and then the
read will be issued for the overidden block pointer. If that read is
still outstanding when it comes time to write, the dbuf will be in
DB_READ, which is not handled by the checks in dbuf_sync_leaf, thus
tripping the assertions.

This updates those checks to allow DB_READ as a valid state iff the
dirty record is for a BRT write and there is a override block pointer.
This is a safe situation because the block already exists, so there's
nothing that could change from underneath the read.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Kay Pedersen <mail@mkwg.de>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Original-patch-by: Kay Pedersen <mail@mkwg.de>
Sponsored-By: OpenDrives Inc.
Sponsored-By: Klara Inc.
Closes #15050
2023-07-24 16:36:17 -07:00
Rob Norris d4edecd1a2 dmu_buf_will_clone: only check that current txg is clean
dbuf_undirty() will (correctly) only removed dirty records for the given
(open) txg. If there is a dirty record for an earlier closed txg that
has not been synced out yet, then db_dirty_records will still have
entries on it, tripping the assertion.

Instead, change the assertion to only consider the current txg. To some
extent this is redundant, as its really just saying "did dbuf_undirty()
work?", but it it doesn't hurt and accurately expresses our
expectations.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Kay Pedersen <mail@mkwg.de>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Original-patch-by: Kay Pedersen <mail@mkwg.de>
Sponsored-By: OpenDrives Inc.
Sponsored-By: Klara Inc.
Closes #15050
2023-07-24 16:35:56 -07:00
Rob Norris 87a6e135c5 brt_vdev_realloc: use vmem_alloc for large allocation
bv_entcount can be a relatively large allocation (see comment for
BRT_RANGESIZE), so get it from the big allocator.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Kay Pedersen <mail@mkwg.de>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-By: OpenDrives Inc.
Sponsored-By: Klara Inc.
Closes #15050
2023-07-24 16:35:49 -07:00
Rob Norris 8d21c002c6 zfs_clone_range: use vmem_malloc for large allocation
Just silencing the warning about large allocations.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Kay Pedersen <mail@mkwg.de>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-By: OpenDrives Inc.
Sponsored-By: Klara Inc.
Closes #15050
2023-07-24 16:34:11 -07:00
Alexander Motin 2cb992a99c ZIL: Fix config lock deadlock.
When we have some LWBs closed and their ZIOs ready to be issued, we
can not afford sleeping on config lock if somebody else try to lock
it as writer, or it will cause a deadlock.

To solve it, move spa_config_enter() from zil_lwb_write_issue() to
zil_lwb_write_close() under zl_issuer_lock to enforce lock ordering
with other threads.  Now if we can't immediately lock config, issue
all previously closed LWBs so that they could drop their config
locks after completion, and only then allow sleeping on our lock.

Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #15078
Closes #15080
2023-07-24 13:41:11 -07:00
Brian Behlendorf fb344f5aeb Linux 6.4 compat: META
Update the META file to reflect compatibility with the 6.4 kernel.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Rob Norris <rob.norris@klarasystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #15095
2023-07-24 11:20:42 -07:00
Rob N 13ec73a028 shellcheck: disable "unreachable command" check [SC2317]
This new check in 0.9.0 appears to have some issues with various forms
of "early return", like trap, exit and return. This is tripping up (at
least):

  cmd/zed/zed.d/history_event-zfs-list-cacher.sh
  /etc/zfs/zfs-functions

Its not obvious what its complaining about or what the remedy is, so it
seems sensible to disable this check for now.

See also:

  https://www.shellcheck.net/wiki/SC2317
  https://github.com/koalaman/shellcheck/issues/2542
  https://github.com/koalaman/shellcheck/issues/2613

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #15089
2023-07-21 11:53:06 -07:00
Rob N 46adb2820a metaslab: tuneable to better control force ganging
metaslab_force_ganging isn't enough to actually force ganging, because
it still only forces 3% of the time. This adds
metaslab_force_ganging_pct so we can configure how often to force
ganging.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Closes #15088
2023-07-21 11:52:32 -07:00
Alexander Motin 34b3d498a9 Adjust prefetch parameters.
- Reduce maximum prefetch distance for 32bit platforms to 8MB as it
was previously.  Those systems didn't grow much probably, so better
stay conservative there.
 - Retire array_rd_sz tunable, blocking prefetch for large requests.
We should not penalize applications trying to be more efficient. The
speculative prefetcher by itself has reasonable distance limits, and
1MB is not much at all these days.

Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15072
2023-07-21 11:51:47 -07:00
Alexander Motin 28430b51e3 Add explicit prefetches to bpobj_iterate().
To simplify error handling bpobj_iterate_blkptrs() iterates through
the list of block pointers backwards.  Unfortunately speculative
prefetcher is currently unable to detect such patterns, that makes
each block read there synchronous and very slow on HDD pools.

According to my tests, added explicit prefetch reduces time needed
to asynchronously delete 8 snapshots of 4 million blocks each from
20 seconds to less than one, that should free sync thread for other
useful work, such as async writes, scrub, etc.

While there, plug one memory leak in case of bpobj_open() error and
harmonize some variable names.

Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15071
2023-07-21 11:50:48 -07:00
Alan Somers 6fd87e1d8d Don't emit cksum_{actual_expected} in ereport.fs.zfs.checksum events
With anything but fletcher-4, even a tiny change in the input will cause
the checksum value to change completely.  So knowing the actual and
expected checksums doesn't provide much more information than "they
don't match".  The harm in sending them is simply that they bloat the
event.  In particular, on FreeBSD the event must fit into a 1016 byte
buffer.

Fixes #14717 for mirrored pools.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rich Ercolani <rincebrain@gmail.com>
Signed-off-by: Alan Somers <asomers@gmail.com>
Sponsored-by: Axcient
Closes #14717
Closes #15052
2023-07-21 11:49:26 -07:00
Alan Somers cf2a225b24 Don't emit checksum histograms in ereport.fs.zfs.checksum events
The checksum histograms were intended to be used with ATA and parallel
SCSI, which are obsolete.  With modern storage hardware, they will
almost always look like white noise; all bits will be wrong.  They only
serve to bloat the event.  That's a particular problem on FreeBSD, where
events must fit into a 1016 byte buffer.

This fixes issue #14717 for RAIDZ pools, but not for mirror pools.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rich Ercolani <rincebrain@gmail.com>
Signed-off-by: Alan Somers <asomers@gmail.com>
Sponsored-by: Axcient
Closes #15052
2023-07-21 11:48:17 -07:00
Tony Hutter ab0b0393cb zed: Fix zed ASSERT on slot power cycle
We would see zed assert on one of our systems if we powered off a
slot.  Further examination showed zfs_retire_recv() was reporting
a GUID of 0, which in turn would return a NULL nvlist.  Add
in a check for a zero GUID.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #15084
2023-07-21 11:46:58 -07:00
Chunwei Chen 2d8a2b51dc Fix zpl_test_super race with zfs_umount
We cannot call zpl_enter in zpl_test_super, because zpl_test_super is
under spinlock so we can't sleep, and also because zpl_test_super is
called without sb->s_umount taken, so it's possible we would race with
zfs_umount and call zpl_enter on freed zfsvfs.

Here's an stack trace when this happens:
[ 2379.114837] VERIFY(cvp->cv_magic == CV_MAGIC) failed
[ 2379.114845] PANIC at spl-condvar.c:497:__cv_broadcast()
[ 2379.114854] Kernel panic - not syncing: VERIFY(cvp->cv_magic == CV_MAGIC) failed
[ 2379.115012] Call Trace:
[ 2379.115019]  dump_stack+0x74/0x96
[ 2379.115024]  panic+0x114/0x2f6
[ 2379.115035]  spl_panic+0xcf/0xfc [spl]
[ 2379.115477]  __cv_broadcast+0x68/0xa0 [spl]
[ 2379.115585]  rrw_exit+0xb8/0x310 [zfs]
[ 2379.115696]  rrm_exit+0x4a/0x80 [zfs]
[ 2379.115808]  zpl_test_super+0xa9/0xd0 [zfs]
[ 2379.115920]  sget+0xd1/0x230
[ 2379.116033]  zpl_mount+0xdc/0x230 [zfs]
[ 2379.116037]  legacy_get_tree+0x28/0x50
[ 2379.116039]  vfs_get_tree+0x27/0xc0
[ 2379.116045]  path_mount+0x2fe/0xa70
[ 2379.116048]  do_mount+0x80/0xa0
[ 2379.116050]  __x64_sys_mount+0x8b/0xe0
[ 2379.116052]  do_syscall_64+0x35/0x50
[ 2379.116054]  entry_SYSCALL_64_after_hwframe+0x61/0xc6
[ 2379.116057] RIP: 0033:0x7f9912e8b26a

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chunwei Chen <david.chen@nutanix.com>
Closes #15077
2023-07-20 10:30:21 -07:00
Ameer Hamza d9bb583c25 spa_min_alloc should be GCD, not min
Since spa_min_alloc may not be a power of 2, unlike ashifts, in the
case of DRAID, we should not select the minimal value among several
vdevs. Rounding to a multiple of it is unlikely to work for other
vdevs. Instead, using the greatest common divisor produces smaller
yet more reasonable results.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #15067
2023-07-20 10:23:52 -07:00
Yuri Pankov 929173ab42 Don't panic if setting vdev properties is unsupported for this vdev type
Check that vdev has valid zap and bail out early.

While here, move objid selection out of the loop, it's not going to
change.

Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Yuri Pankov <yuripv@FreeBSD.org>
Closes #15063
2023-07-20 10:21:47 -07:00
Ameer Hamza 4d2dad04aa Ignore pool ashift property during vdev attachment
Ashift can be set for a vdev only during its creation, and the
top-level vdev does not change when a vdev is attached or replaced.
The ashift property should not be used during attachment, as it
does not allow attaching/replacing a vdev if the pool's ashift
property is increased after the existing vdev was created. Instead,
we should be able to attach the vdev if the attached vdev can
satisfy the ashift requirement with its parent.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #15061
2023-07-20 09:57:16 -07:00
Wojciech Małota-Wójcik e6ea31de9f Rollback before zfs root is mounted
On my machines I observe random failures caused by rollback happening 
after zfs root is mounted. I've observed two types of failures:

- zfs-rollback-bootfs.service fails saying that rollback must be
  done just before mounting the dataset
- boot process fails and rescue console is entered.

After making this modification and testing it for couple of days 
none of those problems have been observed anymore.

I don't know if `dracut-mount.service` is still needed in the 
`After` directive. Maybe someone else is able to address this?

Reviewed-by: Gregory Bartholomew <gregory.lee.bartholomew@gmail.com>
Signed-off-by: Wojciech Małota-Wójcik <59281144+outofforest@users.noreply.github.com>
Closes #15025
2023-07-20 09:55:22 -07:00
Alexander Motin 7d0df5422c Do not request data L1 buffers on scan prefetch.
Set ARC_FLAG_NO_BUF when prefetching data L1 buffers for scan.  We
do not prefetch data L0 buffers, so we do not need the L1 buffers,
only want them to be ready in ARC. This saves some CPU time on the
buffers decompression.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15029
2023-07-20 09:10:04 -07:00
Coleman Kane 74f8ce4ca5 Linux 6.5 compat: disk_check_media_change() was added
The disk_check_media_change() function was added which replaces
bdev_check_media_change.  This change was introduced in 6.5rc1
444aa2c58cb3b6cfe3b7cc7db6c294d73393a894 and the new function takes a
gendisk* as its argument, no longer a block_device*. Thus, bdev->bd_disk
is now used to pass the expected data.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #15060
2023-07-20 09:09:25 -07:00
Yuri Pankov 8beabfd3bf set autotrim default to 'off' everywhere
As it turns out having autotrim default to 'on' on FreeBSD never really
worked due to mess with defines where userland and kernel module were
getting different default values (userland was defaulting to 'off',
module was thinking it's 'on').

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Yuri Pankov <yuripv@FreeBSD.org>
Closes #15079
2023-07-20 09:06:55 -07:00
Coleman Kane d3d63cac4d Linux 6.5 compat: BLK_STS_NEXUS renamed to BLK_STS_RESV_CONFLICT
This change was introduced in Linux commit
7ba150834b840f6f5cdd07ca69a4ccf39df59a66

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #15059
2023-07-14 16:33:51 -07:00
Coleman Kane 3a3e0d6fbc intptr_t definition is canonically signed
Make the version here match that elsewhere in the kernel and system
headers.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #15058
2023-07-14 16:32:49 -07:00
Alexander Motin c4e8742149 Fix raw receive with different indirect block size.
Unlike regular receive, raw receive require destination to have the
same block structure as the source.  In case of dnode reclaim this
triggers two special cases, requiring special handling:
 - If dn_nlevels == 1, we can change the ibs, but dnode_set_blksz()
should not dirty the data buffer if block size does not change, or
durign receive dbuf_dirty_lightweight() will trigger assertion.
 - If dn_nlevels > 1, we just can't change the ibs, dnode_set_blksz()
would fail and receive_object would trigger assertion, so we should
destroy and recreate the dnode from scratch.

Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15039
2023-07-14 16:16:40 -07:00
Alan Somers 67c5e1ba4f Fix the ZFS checksum error histograms with larger record sizes
My analysis in PR #14716 was incorrect.  Each histogram bucket contains
the number of incorrect bits, by position in a 64-bit word, over the
entire record.  8-bit buckets can overflow for record sizes above 2k.
To forestall that, saturate each bucket at 255.  That should still get
the point across: either all bits are equally wrong, or just a couple
are.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alan Somers <asomers@gmail.com>
Sponsored-by: Axcient
Closes #15049
2023-07-14 16:13:15 -07:00
Alexander Motin fdba8cbb79 Avoid extra snprintf() in dsl_deadlist_merge().
Since we are already iterating the ZAP, we have exact string key to
remove, we do not need to call zap_remove_int() with the int key we
just converted, we can call zap_remove() for the original string.

This should make no functional change, only a micro-optimization.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15056
2023-07-14 16:11:46 -07:00
Alexander Motin 6db781d52c Add missed DMU_PROJECTUSED_OBJECT prefetch.
It seems 9c5167d19f "Project Quota on ZFS" missed to add prefetch
for DMU_PROJECTUSED_OBJECT during scan (scrub/resilver).  It should
not cause visible problems, but may affect scub/resilver performance.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15024
2023-07-13 09:12:55 -07:00
Mateusz Guzik 6c9aa1d2a6 FreeBSD: catch up to __FreeBSD_version 1400093
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #15036
2023-07-13 09:06:57 -07:00
Umer Saleem 58f4a094b4 Update changelog for 2.2
Add a new changelog entry for native packages to reflect version
2.2.99.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Closes #15054
2023-07-13 08:55:12 -07:00
Alexander Motin 736d5962b4 FreeBSD: Fix build on stable/13 after 1302506.
Starting approximately from version 1302506 vn_lock_pair() grown two
additional arguments following head.  There is a one week hole, but
that is closet reference point we have.

Reviewed-by: Mateusz Guzik <mjguzik@gmail.com>
Signed-off-by:  Alexander Motin <mav@FreeBSD.org>
Sponsored by:   iXsystems, Inc.
Closes #15047
2023-07-13 08:50:34 -07:00
Brian Behlendorf ca960ce56c Update META
Increase the version to 2.2.99 to indicate the master branch is
newer than the 2.2.x release.  This ensures packages built from
master branch are considered to be newer than the last release.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2023-06-30 13:32:18 -07:00
Brian Behlendorf 009d3288de Tag 2.2.0-rc1
New features:
- Fully adaptive ARC eviction (#14359)
- Block cloning (#13392)
- Scrub error log (#12812, #12355)
- Linux container support (#14070, #14097, #12263)
- BLAKE3 Checksums (#12918)
- Corrective "zfs receive" (#9372)

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2023-06-30 12:07:09 -07:00
Prakash Surya 945e39fc3a Enable tuning of ZVOL open timeout value
The default timeout for ZVOL opens may not be sufficient for all cases,
so we should enable the value to be more easily tuned to account for
systems where the default value is insufficient.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Prakash Surya <prakash.surya@delphix.com>
Closes #15023
2023-06-30 11:34:05 -07:00
Brian Behlendorf ac8ae18d22 Revert "spa.h: use IN_BASE instead of IN_FREEBSD_BASE"
This reverts commit 77a3bb1f47.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2023-06-30 10:03:46 -07:00
Rich Ercolani 2b10e32561 Pack our DDT ZAPs a bit denser.
The DDT is really inefficient on 4k and up vdevs, because it always
allocates 4k blocks, and while compression could save us somewhat
at ashift 9, that stops being true.

So let's change the default to 32 KiB, which seems like a reasonable
compromise between improved space savings and inflated write sizes
for DDT updates.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #14654
2023-06-30 09:42:02 -07:00
Rob N 61ab05cac7 ddt_addref: remove unnecessary phys fill when refcount is 0
The previous comment wondered if this case could happen; it turns out
that it really can't.

This block can only be entered if dde_type and dde_class are "real";
that only happens when a ddt entry has been previously synced to a ddt
store, that is, it was created on a previous txg. Since its gone through
that sync, its dde_refcount must be >0.

ddt_addref() is called from brt_pending_apply(), which is called at the
beginning of spa_sync(), before pending DMU writes/frees are issued.
Freeing a dedup block is the only thing that can decrement dde_refcount,
so there's no way for it to drop to zero before applying the clone bumps
it.

Further, even if it _could_ go to zero, it wouldn't be necessary to fill
the entry from the block. The phys content is not cleared until the free
is issued, which happens when the refcount goes to zero, when the last
real free comes through. The cloned block should be identical to what's
in the phys already, so the fill should be a no-op anyway.

I've replaced this with an assertion because this is all very dependent
on the ordering in which BRT and DDT changes are applied, and that might
change in the future.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-By: Klara, Inc.
Closes #15004
2023-06-30 09:01:58 -07:00
Alexander Motin 233425a153 Again fix race between zil_commit() and zil_suspend().
With zl_suspend read in zil_commit() not protected by any locks it
is possible for new ZIL writes to be in progress while zil_destroy()
called by zil_suspend() freeing them.  This patch closes the race
by taking zl_issuer_lock in zil_suspend() and adding the second
zl_suspend check to zil_get_commit_list(), protected by the lock.
It allows all already queued transactions to be logged normally,
while blocks any new ones, calling txg_wait_synced() for the TXGs.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #14979
2023-06-30 08:59:39 -07:00
Alexander Motin b4a0873092 Some ZIO micro-optimizations.
- Pack struct zio_prop by 4 bytes from 84 to 80.
 - Skip new child ZIO locking while linking to parent.  The newly
allocated ZIO is not externally visible yet, so nobody should care.
 - Skip io_bp_copy writes when not used (write && non-debug).

Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #14985
2023-06-30 08:54:00 -07:00
Alexander Motin fa7b2390d4 Do not report bytes skipped by scan as issued.
Scan process may skip blocks based on their birth time, DVA, etc.
Traditionally those blocks were accounted as issued, that caused
reporting of hugely over-inflated numbers, having nothing to do
with actual disk I/O.  This change utilizes never used field in
struct dsl_scan_phys to account such skipped bytes, allowing to
report how much data were actually scrubbed/resilvered and what
is the actual I/O speed.  While formally it is an on-disk format
change, it should be compatible both ways, so should not need a
feature flag.

This should partially address the same issue as c85ac731a0, but
from a different perspective, complementing it.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Akash B <akash-b@hpe.com>
Signed-off-by:  Alexander Motin <mav@FreeBSD.org>
Sponsored by:   iXsystems, Inc.
Closes #15007
2023-06-30 08:47:13 -07:00
Arshad Hussain 6052060c13 Don't use hard-coded 'size' value in snprintf()
This patch changes the passing of "size" to snprintf
from hard-coded (openended) to sizeof(errbuf). This
is bringing to standard with rest of the code where-
ever 'errbuf' is used.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Closes #15003
2023-06-30 08:37:26 -07:00
Alexander Motin eda32dca92 Fix remount when setting multiple properties.
The previous code was checking zfs_is_namespace_prop() only for the
last property on the list.  If one was not "namespace", then remount
wasn't called.  To fix that move zfs_is_namespace_prop() inside the
loop and remount if at least one of properties was "namespace".

Reviewed-by: Umer Saleem <usaleem@ixsystems.com>
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15000
2023-06-30 08:36:43 -07:00
vimproved 24554082bd contrib: dracut: Conditionalize copying of libgcc_s.so.1 to glibc only
The issue that this is designed to work around is only applicable to
glibc, since it's caused by glibc's pthread_cancel() implementation
using dlopen on libgcc_s.so.1 (and therefor not triggering dracut to
include it in the initramfs). This commit adds an extra condition to the
workaround that tests for glibc via "ldconfig -p | grep -qF 'libc.so.6'"
(which should only be present on glibc systems).

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Violet Purcell <vimproved@inventati.org>
Closes #14992
2023-06-29 12:54:37 -07:00
Yuri Pankov 77a3bb1f47 spa.h: use IN_BASE instead of IN_FREEBSD_BASE
Consistently get the proper default value for autotrim.

Currently, only the kernel module is built with IN_FREEBSD_BASE,
and libzfs get the wrong default value, leading to confusion and
incorrect output when autotrim value was not set explicitly.

Reviewed-by: Warner Losh <imp@bsdimp.com>
Signed-off-by: Yuri Pankov <yuripv@FreeBSD.org>
Closes #15016
2023-06-29 11:50:52 -07:00
Mateusz Piotrowski 62ace21a14 zdb: Add missing poolname to -C synopsis
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Mateusz Piotrowski <0mp@FreeBSD.org>
Sponsored-by: Klara Inc.
Closes #15014
2023-06-29 10:54:43 -07:00
Alexander Motin a9d6b0690b ZIL: Fix another use-after-free.
lwb->lwb_issued_txg can not be accessed after lwb_state is set to
LWB_STATE_FLUSH_DONE and zl_lock is dropped, since the lwb may be
freed by zil_sync().  We must save the txg number before that.

This is similar to the 55b1842f92, but as I see the bug is not new.
It existed for quite a while, just was not triggered due to smaller
race window.

Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #14988
Closes #14999
2023-06-27 17:03:37 -07:00
Alexander Motin b0cbc1aa9a Use big transactions for small recordsize writes.
When ZFS appends files in chunks bigger than recordsize, it borrows
buffer from ARC and fills it before opening transaction.  This
supposed to help in case of page faults to not hold transaction open
indefinitely.  The problem appears when recordsize is set lower than
default 128KB. Since each block is committed in separate transaction,
per-transaction overhead becomes significant, and what is even worse,
active use of of per-dataset and per-pool locks to protect space use
accounting for each transaction badly hurts the code SMP scalability.
The same transaction size limitation applies in case of file rewrite,
but without even excuse of buffer borrowing.

To address the issue, disable the borrowing mechanism if recordsize
is smaller than default and the write request is 4x bigger than it.
In such case writes up to 32MB are executed in single transaction,
that dramatically reduces overhead and lock contention.  Since the
borrowing mechanism is not used for file rewrites, and it was never
used by zvols, which seem to work fine, I don't think this change
should create significant problems, partially because in addition to
the borrowing mechanism there are also used pre-faults.

My tests with 4/8 threads writing several files same time on datasets
with 32KB recordsize in 1MB requests show reduction of CPU usage by
the user threads by 25-35%.  I would measure it in GB/s, but at that
block size we are now limited by the lock contention of single write
issue taskqueue, which is a separate problem we are going to work on.

Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #14964
2023-06-27 17:00:30 -07:00
Laevos bc9d0084ea Remove unnecessary commas in zpool-create.8
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Laevos <5572812+Laevos@users.noreply.github.com>
Closes #15011
2023-06-27 16:58:32 -07:00
Alexander Motin 8469b5aac0 Another set of vdev queue optimizations.
Switch FIFO queues (SYNC/TRIM) and active queue of vdev queue from
time-sorted AVL-trees to simple lists.  AVL-trees are too expensive
for such a simple task.  To change I/O priority without searching
through the trees, add io_queue_state field to struct zio.

To not check number of queued I/Os for each priority add vq_cqueued
bitmap to struct vdev_queue.  Update it when adding/removing I/Os.
Make vq_cactive a separate array instead of struct vdev_queue_class
member.  Together those allow to avoid lots of cache misses when
looking for work in vdev_queue_class_to_issue().

Introduce deadline of ~0.5s for LBA-sorted queues.  Before this I
saw some I/Os waiting in a queue for up to 8 seconds and possibly
more due to starvation.  With this change I no longer see it.  I
had to slightly more complicate the comparison function, but since
it uses all the same cache lines the difference is minimal.  For a
sequential I/Os the new code in vdev_queue_io_to_issue() actually
often uses more simple avl_first(), falling back to avl_find() and
avl_nearest() only when needed.

Arrange members in struct zio to access only one cache line when
searching through vdev queues.  While there, remove io_alloc_node,
reusing the io_queue_node instead.  Those two are never used same
time.

Remove zfs_vdev_aggregate_trim parameter.  It was disabled for 4
years since implemented, while still wasted time maintaining the
offset-sorted tree of TRIM requests.  Just remove the tree.

Remove locking from txg_all_lists_empty().  It is racy by design,
while 2 pair of locks/unlocks take noticeable time under the vdev
queue lock.

With these changes in my tests with volblocksize=4KB I measure vdev
queue lock spin time reduction by 50% on read and 75% on write.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #14925
2023-06-27 09:09:48 -07:00
Rich Ercolani 35a6247c5f Add a delay to tearing down threads.
It's been observed that in certain workloads (zvol-related being a
big one), ZFS will end up spending a large amount of time spinning
up taskqs only to tear them down again almost immediately, then
spin them up again...

I noticed this when I looked at what my mostly-idle system was doing
and wondered how on earth taskq creation/destroy was a bunch of time...

So I added a configurable delay to avoid it tearing down tasks the
first time it notices them idle, and the total number of threads at
steady state went up, but the amount of time being burned just
tearing down/turning up new ones almost vanished.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #14938
2023-06-26 13:57:12 -07:00
Alexander Motin 8e8acabdca Fix memory leak in zil_parse().
482da24e2 missed arc_buf_destroy() calls on log parse errors, possibly
leaking up to 128KB of memory per dataset during ZIL replay.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #14987
2023-06-17 19:51:37 -07:00
George Amanakis 10e36e1761 Shorten arcstat_quiescence sleep time
With the latest L2ARC fixes, 2 seconds is too long to wait for
quiescence of arcstats like l2_size. Shorten this interval to avoid
having the persistent L2ARC tests in ZTS prematurely terminated.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #14981
2023-06-15 12:45:36 -07:00
Alexander Motin ccec7fbe1c Remove ARC/ZIO physdone callbacks.
Those callbacks were introduced many years ago as part of a bigger
patch to smoothen the write throttling within a txg. They allow to
account completion of individual physical writes within a logical
one, improving cases when some of physical writes complete much
sooner than others, gradually opening the write throttle.

Few years after that ZFS got allocation throttling, working on a
level of logical writes and limiting number of writes queued to
vdevs at any point, and so limiting latency distribution between
the physical writes and especially writes of multiple copies.
The addition of scheduling deadline I proposed in #14925 should
further reduce the latency distribution.  Grown memory sizes over
the past 10 years should also reduce importance of the smoothing.

While the use of physdone callback may still in theory provide
some smoother throttling, there are cases where we simply can not
afford it.  Since dirty data accounting is protected by pool-wide
lock, in case of 6-wide RAIDZ, for example, it requires us to take
it 8 times per logical block write, creating huge lock contention.

My tests of this patch show radical reduction of the lock spinning
time on workloads when smaller blocks are written to RAIDZ pools,
when each of the disks receives 8-16KB chunks, but the total rate
reaching 100K+ blocks per second.  Same time attempts to measure
any write time fluctuations didn't show anything noticeable.

While there, remove also io_child_count/io_parent_count counters.
They are used only for couple assertions that can be avoided.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #14948
2023-06-15 10:49:03 -07:00
Brian Behlendorf e32e326c5b ZTS: Skip send_raw_ashift on FreeBSD
On FreeBSD 14 this test runs slowly in the CI environment
and is killed by the 10 minute timeout.  Skip the test on
FreeBSD until the slow down is resolved.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #14961
2023-06-14 08:04:05 -07:00
Alexander Motin d057807ede Switch refcount tracking from lists to AVL-trees.
With large number of tracked references list searches under the lock
become too expensive, creating enormous lock contention.

On my tests with ZFS_DEBUG enabled this increases write throughput
with 32KB blocks from ~1.2GB/s to ~7.5GB/s.

Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #14970
2023-06-14 08:02:27 -07:00
George Amanakis 8af1104f83 Store the L2ARC device ashift in the vdev label
If this is not done, and the pool has an ashift other than the default
(at the moment 9) then the following happens:

1) vdev_alloc() assigns the ashift of the pool to L2ARC device, but
   upon export it is not stored anywhere
2) at the first import, vdev_open() sees an vdev_ashift() of 0 and
   assigns the logical_ashift, which is 9
3) reading the contents of L2ARC, including the header fails
4) L2ARC buffers are not restored in ARC.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #14313 
Closes #14963
2023-06-14 08:01:17 -07:00
George Amanakis feff9dfed3 Fix the L2ARC write size calculating logic (2)
While commit bcd5321 adjusts the write size based on the size of the log
block, this happens after comparing the unadjusted write size to the
evicted (target) size.

In this case l2ad_hand will exceed l2ad_evict and violate an assertion
at the end of l2arc_write_buffers().

Fix this by adding the max log block size to the allocated size of the
buffer to be committed before comparing the result to the target
size.

Also reset the l2arc_trim_ahead ZFS module variable when the adjusted
write size exceeds the size of the L2ARC device.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #14936
Closes #14954
2023-06-09 17:05:47 -07:00
Alexander Motin 70ea484e3e Finally drop long disabled vdev cache.
It was a vdev level read cache, designed to aggregate many small
reads by speculatively issuing bigger reads instead and caching
the result.  But since it has almost no idea about what is going
on with exception of ZIO_FLAG_DONT_CACHE flag set by higher layers,
it was found to make more harm than good, for which reason it was
disabled for the past 12 years.  These days we have much better
instruments to enlarge the I/Os, such as speculative and prescient
prefetches, I/O scheduler, I/O aggregation etc.

Besides just the dead code removal this removes one extra mutex
lock/unlock per write inside vdev_cache_write(), not otherwise
disabled and trying to do some work.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #14953
2023-06-09 12:40:55 -07:00
Brian Behlendorf 6db4ed51d6 ZTS: Skip checkpoint_discard_busy
Until the ASSERT which is occasionally hit while running
checkpoint_discard_busy is resolved skip this test case.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #12053
Closes #14952
2023-06-09 11:10:01 -07:00
Alexander Motin 90ccfd426d Improve l2arc reporting in arc_summary.
- Do not report L2ARC as FAULTED in presence of in-flight writes.
- Report read and write I/Os, bytes and errors.
- Remove few numbers not important to average user.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #12304 
Closes #14946
2023-06-09 10:14:05 -07:00
Alexander Motin b3ad3f48d9 Use list_remove_head() where possible.
... instead of list_head() + list_remove().  On FreeBSD the list
functions are not inlined, so in addition to more compact code
this also saves another function call.

Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #14955
2023-06-09 10:12:52 -07:00
Alexander Motin 55b1842f92 ZIL: Fix race introduced by f63811f072.
We are not allowed to access lwb after setting LWB_STATE_FLUSH_DONE
state and dropping zl_lock, since it may be freed by zil_sync().
To free itxs and waiters after dropping the lock we need to move
lwb_itxs and lwb_waiters lists elements to local storage.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #14957
Closes #14959
2023-06-09 10:08:05 -07:00
Rich Ercolani 6c96269024 Revert "systemd: Use non-absolute paths in Exec* lines"
This reverts commit 79b20949b2 since it
doesn't work with the systemd version shipped with RHEL7-based systems.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #14943
Closes #14945
2023-06-07 11:14:05 -07:00
Brian Behlendorf 93f8abeff0 Linux: Never sleep in kmem_cache_alloc(..., KM_NOSLEEP) (#14926)
When a kmem cache is exhausted and needs to be expanded a new
slab is allocated.  KM_SLEEP callers can block and wait for the
allocation, but KM_NOSLEEP callers were incorrectly allowed to
block as well.

Resolve this by attempting an emergency allocation as a best
effort.  This may fail but that's fine since any KM_NOSLEEP
consumer is required to handle an allocation failure.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Adam Moss <c@yotes.com>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2023-06-07 10:43:43 -07:00
George Amanakis bcd5321039 Fix the L2ARC write size calculating logic
l2arc_write_size() should return the write size after adjusting for trim
and overhead of the L2ARC log blocks. Also take into account the
allocated size of log blocks when deciding when to stop writing buffers
to L2ARC.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #14939
2023-06-06 12:32:37 -07:00
Rob Norris 8653f1de48 zdb: add -B option to generate backup stream
This is more-or-less like `zfs send`, but specifying the snapshot by its
objset id for situations where it can't be referenced any other way.

Sponsored-By: Klara, Inc.
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: WHR <msl0000023508@gmail.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #14642
2023-06-05 11:54:42 -07:00
Rob Norris 2b9f8ba673 znode: expose zfs_get_zplprop to libzpool
There's no particular reason this function should be kernel-only, and I
want to use it (indirectly) from zdb. I've moved it to zfs_znode.c
because libzpool does not compile in zfs_vfsops.c, and this at least
matches the header its imported from.

Sponsored-By: Klara, Inc.
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: WHR <msl0000023508@gmail.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #14642
2023-06-05 11:53:44 -07:00
Alexander Motin 5ba4025a8d Introduce zfs_refcount_(add|remove)_few().
There are two places where we need to add/remove several references
with semantics of zfs_refcount_(add|remove). But when debug/tracing
is disabled, it is a crime to run multiple atomic_inc() in a loop,
especially under congested pool-wide allocator lock.

Introduced new functions implement the same semantics as the loop,
but without overhead in production builds.

Reviewed-by: Rich Ercolani <rincebrain@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #14934
2023-06-05 11:51:44 -07:00
Brian Behlendorf dae3c549f5 Linux 6.3 compat: META (#14930)
Update the META file to reflect compatibility with the 6.3 kernel.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2023-06-05 11:08:24 -07:00
Graham Perrin 6c29422e90 zfs-create(8): ZFS for swap: caution, clarity
Make the section heading more generic (the section relates to ZFS files
as well as ZFS volumes).

Swapping to a ZFS volume is prone to deadlock. Remove the related
instruction, direct readers to OpenZFS FAQ. Related, but not linked
from within the manual page:

<https://openzfs.github.io/openzfs-docs/Project%20and%20Community/FAQ.html#using-a-zvol-for-a-swap-device-on-linux>
(Using a zvol for a swap device on Linux).

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Graham Perrin <grahamperrin@freebsd.org>
Issue #7734 
Closes #14756
2023-06-02 11:25:13 -07:00
Alexander Motin 482da24e20 ZIL: Allow to replay blocks of any size.
There seems to be no reason for ZIL blocks to be limited by 128KB
other than replay code is written in such a way.  This change does
not increase the limit yet, just removes the artificial limitation.

Avoided extra memcpy() may save us a second during replay.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #14910
2023-06-02 11:01:58 -07:00
Val Packett 4f583a827e PAM: enable testing on FreeBSD
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Felix Dörre <felix@dogcraft.de>
Signed-off-by: Val Packett <val@packett.cool>
Closes #14834
2023-05-31 17:01:16 -07:00
Val Packett db994458bb PAM: support password changes even when not mounted
There's usually no requirement that a user be logged in for changing
their password, so let's not be surprising here.

We need to use the fetch_lazy mechanism for the old password to avoid
a double prompt for it, so that mechanism is now generalized a bit.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Felix Dörre <felix@dogcraft.de>
Signed-off-by: Val Packett <val@packett.cool>
Closes #14834
2023-05-31 17:01:11 -07:00
Val Packett e3ba6b93de PAM: add 'uid_min' and 'uid_max' options for changing the uid range
Instead of a fixed >=1000 check, allow the configuration to override
the minimum UID and add a maximum one as well. While here, add the
uid range check to the authenticate method as well, and fix the return
in the chauthtok method (seems very wrong to report success when we've
done absolutely nothing).

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Felix Dörre <felix@dogcraft.de>
Signed-off-by: Val Packett <val@packett.cool>
Closes #14834
2023-05-31 17:01:07 -07:00
Val Packett f2f3ec17ed PAM: add 'forceunmount' flag
Probably not always a good idea, but it's nice to have the option.
It is a workaround for FreeBSD calling the PAM session end earier than
the last process is actually done touching the mount, for example.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Felix Dörre <felix@dogcraft.de>
Signed-off-by: Val Packett <val@packett.cool>
Closes #14834
2023-05-31 17:01:02 -07:00
Val Packett 850bccd3bc PAM: add 'recursive_homes' flag to use with 'prop_mountpoint'
It's not always desirable to have a fixed flat homes directory.
With the 'recursive_homes' flag, 'prop_mountpoint' search would
traverse the whole tree starting at 'homes' (which can now be '*'
to mean all pools) to find a dataset with a mountpoint matching
the home directory.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Felix Dörre <felix@dogcraft.de>
Signed-off-by: Val Packett <val@packett.cool>
Closes #14834
2023-05-31 17:00:58 -07:00
Val Packett bd4962b5ac PAM: use boolean_t for config flags
Since we already use boolean_t in the file, we can use it here.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Felix Dörre <felix@dogcraft.de>
Signed-off-by: Val Packett <val@packett.cool>
Closes #14834
2023-05-31 17:00:53 -07:00
Val Packett c47b708647 PAM: do not fail to mount if the key's already loaded
If we're expecting a working home directory on login, it would be
rather frustrating to not have it mounted just because it e.g. failed to
unmount once on logout.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Felix Dörre <felix@dogcraft.de>
Signed-off-by: Val Packett <val@packett.cool>
Closes #14834
2023-05-31 17:00:15 -07:00
Rich Ercolani 2810dda80b Revert "initramfs: use mount.zfs instead of mount"
This broke mounting of snapshots on / for users.

See https://github.com/openzfs/zfs/issues/9461#issuecomment-1376162949 for more context.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #14908
2023-05-31 16:58:41 -07:00
Luís Henriques 928c81f4df Fix NULL pointer dereference when doing concurrent 'send' operations
A NULL pointer will occur when doing a 'zfs send -S' on a dataset that
is still being received.  The problem is that the new 'send' will
rightfully fail to own the datasets (i.e. dsl_dataset_own_force() will
fail), but then dmu_send() will still do the dsl_dataset_disown().

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Luís Henriques <henrix@camandro.org>
Closes #14903 
Closes #14890
2023-05-30 15:15:24 -07:00
Brian Behlendorf e085e98d54 ZTS: zvol_misc_trim disable blk mq
Disable the zvol_misc_fua.ksh and zvol_misc_trim.ksh test cases on impacted
kernels.  This issue is being actively worked in #14872 and as part of that
fix this commit will be reverted.

    VERIFY(zh->zh_claim_txg == 0) failed
    PANIC at zil.c:904:zil_create()

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #14872
Closes #14870
2023-05-29 12:55:35 -07:00
Richard Yao 0f03a41161 Use __attribute__((malloc)) on memory allocation functions
This informs the C compiler that pointers returned from these functions
do not alias other functions, which allows it to do better code
optimization and should make the compiled code smaller.

References:
https://stackoverflow.com/a/53654773
https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#index-malloc-function-attribute
https://clang.llvm.org/docs/AttributeReference.html#malloc

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14827
2023-05-26 15:47:52 -07:00
Brian Behlendorf 20494d47d2 ZTS: Add zpool_resilver_concurrent exception
The zpool_resilver_concurrent test case requires the ZED which is not used
on FreeBSD.  Add this test to the known list of skipped tested for FreeBSD.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #14904
2023-05-26 15:39:23 -07:00
Mike Swanson 365bae0eab Add compatibility symlinks for FreeBSD 12.{3,4} and 13.{0,1,2}
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mike Swanson <mikeonthecomputer@gmail.com>
Closes #14902
2023-05-26 15:37:15 -07:00
Colm d3e0138a3d Adding new read-only compatible zpool features to compatibility.d/grub2
GRUB2 is compatible with all "read-only compatible" features,
so it is safe to add new features of this type to the grub2
compatibility list. We generally want to include all compatible
features, to minimize the differences between grub2-compatible
pools and no-compatibility pools.

Adding new properties `livelist` and `zpool_checkpoint` accordingly.

Also adding them to the man page which references this file as an
example, for consistency.

Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Colm Buckley <colm@tuatha.org>
Closes #14893
2023-05-26 10:04:19 -07:00
Richard Yao 677c6f8457 btree: Implement faster binary search algorithm
This implements a binary search algorithm for B-Trees that reduces
branching to the absolute minimum necessary for a binary search
algorithm. It also enables the compiler to inline the comparator to
ensure that the only slowdown when doing binary search is from waiting
for memory accesses. Additionally, it instructs the compiler to unroll
the loop, which gives an additional 40% improve with Clang and 8%
improvement with GCC.

Consumers must opt into using the faster algorithm. At present, only
B-Trees used inside kernel code have been modified to use the faster
algorithm.

Micro-benchmarks suggest that this can improve binary search performance
by up to 3.5 times when compiling with Clang 16 and up to 1.9 times when
compiling with GCC 12.2.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14866
2023-05-26 10:03:12 -07:00
George Amanakis bb736d98d1 Fix inconsistent definition of zfs_scrub_error_blocks_per_txg
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #14894
2023-05-26 09:53:00 -07:00
Damiano Albani ff03dfd4d8 Add missing files to Debian DKMS package
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Umer Saleem <usaleem@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Damiano Albani <damiano.albani@gmail.com>
Closes #14887
Closes #14889
2023-05-25 16:10:54 -07:00
Brian Behlendorf 91a2325c4a Update compatibility.d files
Add an openzfs-2.2 compatibility file for the next release.

Edon-R support has been enabled for FreeBSD removing the need
for different FreeBSD and Linux files.  Symlinks for the -linux
and -freebsd names are created for any scripts expecting that
convention.

Additionally, a symlink for ubunutu-22.04 was added.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #14833
2023-05-25 13:53:08 -07:00
Alexander Motin b6fbe61fa6 zil: Add some more statistics.
In addition to a number of actual log bytes written, account also a
total written bytes including padding and total allocated bytes (bytes
<= write <= alloc).  It should allow to monitor zil traffic and space
efficiency.

Add dtrace probe for zil block size selection.

Make zilstat report more information and fit it into less width.

Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:  Alexander Motin <mav@FreeBSD.org>
Sponsored by:   iXsystems, Inc.
Closes #14863
2023-05-25 13:51:53 -07:00
Alexander Motin f63811f072 ZIL: Reduce scope of per-dataset zl_issuer_lock.
Before this change ZIL copied all log data while holding the lock.
It caused huge lock contention on workloads with many big parallel
writes.  This change splits the process into two parts: first,
zil_lwb_assign() estimates the log space needed for all transactions,
and zil_lwb_write_close() allocates blocks and zios while holding the
lock, then, after the lock in dropped, zil_lwb_commit() copies the
data, and zil_lwb_write_issue() issues the I/Os.

Also while there slightly reduce scope of zl_lock.

Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by:  Alexander Motin <mav@FreeBSD.org>
Sponsored by:   iXsystems, Inc.
Closes #14841
2023-05-25 09:48:43 -07:00
Dimitri John Ledkov 79b20949b2 systemd: Use non-absolute paths in Exec* lines
Since systemd v239, Exec* binaries are resolved from PATH when they
are not-absolute. Switch to this by default for ease of downstream
maintenance. Many downstream distributions move individual binaries
to locations that existing compile-time configurations cannot
accommodate.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Dimitri John Ledkov <dimitri.ledkov@canonical.com>
Closes #14880
2023-05-24 12:31:28 -07:00
Akash B 9d618615d1 Fix concurrent resilvers initiated at same time
For draid vdevs it was possible to initiate both the
sequential and healing resilver at same time.

This fixes the following two scenarios.
     1) There's a window where a sequential rebuild can
be started via ZED even if a healing resilver has been
scheduled.
	- This is fixed by adding additional check in
spa_vdev_attach() for any scheduled resilver and return
appropriate error code when a resilver is already in
progress.

     2) It was possible for zpool clear to start a healing
resilver when it wasn't needed at all. This occurs because
during a vdev_open() the device is presumed to be healthy not
until the device is validated by vdev_validate() and it's set
unavailable. However, by this point an async resilver will
have already been requested if the DTL isn't empty.
	- This is fixed by cancelling the SPA_ASYNC_RESILVER
request immediately at the end of vdev_reopen() when a resilver
is unneeded.

Finally, added a testcase in ZTS for verification.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Dipak Ghosh <dipak.ghosh@hpe.com>
Signed-off-by: Akash B <akash-b@hpe.com>
Closes #14881
Closes #14892
2023-05-24 12:28:09 -07:00
youzhongyang f8447cf22e Linux 6.4 compat: reclaimed_slab renamed to reclaimed
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Youzhong Yang <yyang@mathworks.com>
Closes #14891
2023-05-24 12:23:42 -07:00
Brian Atkinson ad0a554614 Hold db_mtx when updating db_state
Commit 555ef90 did some general code refactoring for
dmu_buf_will_not_fill() and dmu_buf_will_fill(). However, the db_mtx was
not held when update db->db_state in those code block. The rest of the
dbuf code always holds the db_mtx when updating db_state. This is
important because cv_wait() db_changed is used to check for db_state
changes.

Updating dmu_buf_will_not_fill() and dmu_buf_will_fill() to hold the
db_mtx when updating db_state.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Brian Atkinson <batkinson@lanl.gov>
Closes #14875
2023-05-19 13:05:53 -07:00
Brian Behlendorf 577e835f30 Probe vdevs before marking removed
Before allowing the ZED to mark a vdev as REMOVED due to a
hotplug event confirm that it is non-responsive with probe.
Any device which can be successfully probed should be left
ONLINE to prevent a healthy pool from being incorrectly
SUSPENDED.  This may occur for at least the following two
scenarios.

1) Drive expansion (zpool online -e) in VMware environments.
   If, during the partition resize operation, a partition is
   removed and re-created then udev will send a removed event.

2) Re-scanning the namespaces of an NVMe device (nvme ns-rescan)
   may result in a udev remove and add event being delivered.

Finally, update the ZED to only kick in a spare when the
removal was successful.

Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #14859
Closes #14861
2023-05-19 13:05:09 -07:00
George Amanakis 482eeef804 Teach zpool scrub to scrub only blocks in error log
Added a flag '-e' in zpool scrub to scrub only blocks in error log. A
user can pause, resume and cancel the error scrub by passing additional
command line arguments -p -s just like a regular scrub. This involves
adding a new flag, creating new libzfs interfaces, a new ioctl, and the
actual iteration and read-issuing logic. Error scrubbing is executed in
multiple txg to make sure pool performance is not affected.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Co-authored-by: TulsiJain tulsi.jain@delphix.com
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #8995
Closes #12355
2023-05-18 11:59:42 -07:00
Brian Behlendorf e34e15ed6d Add the ability to uninitialize
zpool initialize functions well for touching every free byte...once.
But if we want to do it again, we're currently out of luck.

So let's add zpool initialize -u to clear it.

Co-authored-by: Rich Ercolani <rincebrain@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12451 
Closes #14873
2023-05-18 10:02:20 -07:00
Antonio Russo e0d5007bcf test-runner: pass kmemleak and kmsg to Cmd.run
test-runner.py orchestrates all of the ZTS executions. The `Cmd` object
manages these process, and its `run` method specifically invokes these
possibly long-running processes, possibly retrying in the event of a
timeout. Since its inception, memory leak detection using the kmemleak
infrastructure [1], and kernel logging [2] have been added to this run
mechanism.

However, the callback to cull a process beyond its timeout threshold,
`kill_cmd`, has evaded modernization by both of these changes. As a
result, this function fails to properly invoke `run`, leading to an
untrapped exception and unreported test failure.

This patch extends `kill_cmd` to receive these kernel devices through
the `options` parameter, and regularizes all the `.run` calls from
`Cmd`, and its subclasses, to accept that parameter.

[1] Commit a69765ea5b
[2] Commit fc2c0256c5

Reviewed-by: John Wren Kennedy <john.kennedy@delphix.com>
Signed-off-by: Antonio Russo <aerusso@aerusso.net>
Closes #14849
2023-05-15 16:11:33 -07:00
Richard Yao ee7b71dbc9 Fix undefined behavior in spa_sync_props()
8eae2d214c caused Coverity to begin
complaining about "Improper use of negative value" in two places in
spa_sync_props() because Coverity correctly inferred from `prop ==
ZPOOL_PROP_INVAL` that prop could be -1 while both zpool_prop_to_name()
and zpool_prop_get_type() use it an array index, which is undefined
behavior.

Assuming that the system does not panic from an attempt to read invalid
memory, the case statement for ZPOOL_PROP_INVAL will ensure that only
user properties will reach this code when prop is ZPOOL_PROP_INVAL, such
that execution will continue safely. However, if we are unlucky enough
to read invalid memory, then the system will panic.

This issue predates the patch that caused coverity to begin complaining.
Thankfully, our userland tools do not pass nonsense to us, so this bug
should not be triggered unless a future userland tool attempts to set a
property that we do not understand.

Reported-by: Coverity (CID-1561129)
Reported-by: Coverity (CID-1561130)
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Amanakis <gamanakis@gmail.com>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14860
2023-05-15 10:29:05 -07:00
Richard Yao c87798d8ff Fix use after free regression in spa_remove_healed_errors()
6839ec6f10 placed code in
spa_remove_healed_errors() that uses a pointer after the kmem_free()
call that frees it.

Reported-by: Coverity (CID-1562375)
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Amanakis <gamanakis@gmail.com>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14860
2023-05-15 10:29:01 -07:00
Alexander Motin 7381ddf1ab zil: Free lwb_buf after write completion.
There is no sense to keep that memory allocated during the flush.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Signed-off-by:  Alexander Motin <mav@FreeBSD.org>
Sponsored by:   iXsystems, Inc.
Closes #14855
2023-05-12 09:49:26 -07:00
Alexander Motin 895e03135e zil: Some micro-optimizations.
Should not cause functional changes.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:  Alexander Motin <mav@FreeBSD.org>
Sponsored by:   iXsystems, Inc.
Closes #14854
2023-05-12 09:14:29 -07:00
Don Brady da211a4a33 Refine special_small_blocks property validation
When the special_small_blocks property is being set during a pool 
create it enforces a limit of 128KiB even if the pool's record size 
is larger.

If the recordsize property is being set during a pool create, then 
use that value instead of the default SPA_OLD_MAXBLOCKSIZE value.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Don Brady <dev.fs.zfs@gmail.com>
Closes #13815
Closes #14811
2023-05-12 09:12:28 -07:00
Brian Behlendorf 5b3b6e95c0 ZTS: Add auto_replace_001_pos to exceptions
The auto_replace_001_pos test case does not reliably pass on
Fedora 37 and newer.  Until the test case can be updated to make
it reliable add it to the list of "maybe" exceptions on Linux.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #14851
Closes #14852
2023-05-12 09:07:58 -07:00
Pawel Jakub Dawidek e610766838 Make sure we are not trying to clone a spill block.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Pawel Jakub Dawidek <pawel@dawidek.net>
Closes #14825
2023-05-11 16:07:15 -07:00
Pawel Jakub Dawidek fbbe5e96ef Correct comment.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Pawel Jakub Dawidek <pawel@dawidek.net>
Closes #14825
2023-05-11 16:07:11 -07:00
Pawel Jakub Dawidek 9879930f7a Remove badly placed comment.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Pawel Jakub Dawidek <pawel@dawidek.net>
Closes #14825
2023-05-11 16:07:07 -07:00
Pawel Jakub Dawidek b6d7370b9d Don't call zfs_exit_two() before zfs_enter_two().
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Pawel Jakub Dawidek <pawel@dawidek.net>
Closes #14825
2023-05-11 16:07:02 -07:00
Pawel Jakub Dawidek d0d91f185e Don't use dmu_buf_is_dirty() for unassigned transaction.
The dmu_buf_is_dirty() call doesn't make sense here for two reasons:
1. txg is 0 for unassigned tx, so it was a no-op.
2. It is equivalent of checking if we have dirty records and we are doing
   this few lines earlier.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Pawel Jakub Dawidek <pawel@dawidek.net>
Closes #14825
2023-05-11 16:06:57 -07:00
Pawel Jakub Dawidek bd8c6bd66f Deny block cloning is dbuf size doesn't match BP size.
I don't know an easy way to shrink down dbuf size, so just deny block cloning
into dbufs that don't match our BP's size.

This fixes the following situation:
1. Create a small file, eg. 1kB of random bytes. Its dbuf will be 1kB.
2. Create a larger file, eg. 2kB of random bytes. Its dbuf will be 2kB.
3. Truncate the large file to 0. Its dbuf will remain 2kB.
4. Clone the small file into the large file. Small file's BP lsize is
   1kB, but the large file's dbuf is 2kB.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Pawel Jakub Dawidek <pawel@dawidek.net>
Closes #14825
2023-05-11 16:06:52 -07:00
Pawel Jakub Dawidek 555ef90c5c Additional block cloning fixes.
Reimplement some of the block cloning vs dbuf logic, mostly to fix
situation where we clone a block and in the same transaction group
we want to partially overwrite the clone.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Pawel Jakub Dawidek <pawel@dawidek.net>
Closes #14825
2023-05-11 16:06:36 -07:00
Alexander Motin 469019fb0b zil: Don't expect zio_shrink() to succeed.
At least for RAIDZ zio_shrink() does not reduce zio size, but reduced
wsz in that case likely results in writing uninitialized memory.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:  Alexander Motin <mav@FreeBSD.org>
Sponsored by:   iXsystems, Inc.
Closes #14853
2023-05-11 14:27:12 -07:00
Ameer Hamza 14ba8ab97d Prevent panic during concurrent snapshot rollback and zvol read
Protect zvol_cdev_read with zv_suspend_lock to prevent concurrent
release of the dnode, avoiding panic when a snapshot is rolled back
in parallel during ongoing zvol read operation.

Reviewed-by: Chunwei Chen <tuxoko@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #14839
2023-05-09 17:56:35 -07:00
Tony Hutter d3db900a4e pam: Fix "buffer overflow" in pam ZTS tests on F38
The pam ZTS tests were reporting a buffer overflow on F38, possibly
due to F38 now setting _FORTIFY_SOURCE=3 by default.  gdb and
valgrind narrowed this down to a snprintf() buffer overflow in
zfs_key_config_modify_session_counter().  I'm not clear why this
particular snprintf() was being flagged as an overflow, but when
I replaced it with an asprintf(), the test passed reliably.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #14802 
Closes #14842
2023-05-09 17:55:19 -07:00
Brian Behlendorf 903c3613d4 Add dmu_tx_hold_append() interface
Provides an interface which callers can use to declare a write when
the exact starting offset in not yet known.  Since the full range
being updated is not available only the first L0 block at the
provided offset will be prefetched.

Reviewed-by: Olaf Faaland <faaland1@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #14819
2023-05-09 09:03:10 -07:00
Brian Behlendorf c8b3dda186 Debug auto_replace_001_pos failures
Reduced the timeout to 60 seconds which should be more than
sufficient and allow the test to be marked as FAILED rather
than KILLED.  Also dump the pool status on cleanup.

Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #14829
2023-05-09 08:57:02 -07:00
George Amanakis d38c815fe2 Remove duplicate code in l2arc_evict()
l2arc_evict() performs the adjustment of the size of buffers to be
written on L2ARC unnecessarily. l2arc_write_size() is called right
before l2arc_evict() and performs those adjustments.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #14828
2023-05-09 08:54:41 -07:00
Alexander Motin b035f2b2cb Remove single parent assertion from zio_nowait().
We only need to know if ZIO has any parent there.  We do not care if
it has more than one, but use of zio_unique_parent() == NULL asserts
that.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #14823
2023-05-09 08:54:01 -07:00
George Amanakis 6839ec6f10 Enable the head_errlog feature to remove errors
In case check_filesystem() does not error out and does not report
an error, remove that error block from error lists and logs
without requiring a scrub. This can happen when the original file and
all snapshots/clones referencing it have been removed.

Otherwise zpool status will still report that "Permanent errors have
been detected..." without actually reporting any of them.

To implement this change the functions introduced in corrective
receive were modified to take into account the head_errlog feature.

Before this change:
=============================
pool: test
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
config:

        NAME                   STATE     READ WRITE CKSUM
        test                   ONLINE       0     0     0
          /home/user/vdev_a    ONLINE       0     0     2

errors: Permanent errors have been detected in the following files:

=============================

After this change:
=============================
  pool: test
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are
unaffected.
action: Determine if the device needs to be replaced, and clear the
errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-9P
config:

        NAME                   STATE     READ WRITE CKSUM
        test                   ONLINE       0     0     0
          /home/user/vdev_a    ONLINE       0     0     2

errors: No known data errors
=============================

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #14813
2023-05-09 08:53:27 -07:00
George Amanakis 4eca03faaf Fixes in head_errlog feature with encryption
For the head_errlog feature use dsl_dataset_hold_obj_flags() instead of
dsl_dataset_hold_obj() in order to enable access to the encryption keys
(if loaded). This enables reporting of errors in encrypted filesystems
which are not mounted but have their keys loaded.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #14837
2023-05-08 13:35:03 -07:00
Matthew Ahrens 3095ca91c2 Verify block pointers before writing them out
If a block pointer is corrupted (but the block containing it checksums
correctly, e.g. due to a bug that overwrites random memory), we can
often detect it before the block is read, with the `zfs_blkptr_verify()`
function, which is used in `arc_read()`, `zio_free()`, etc.

However, such corruption is not typically recoverable.  To recover from
it we would need to detect the memory error before the block pointer is
written to disk.

This PR verifies BP's that are contained in indirect blocks and dnodes
before they are written to disk, in `dbuf_write_ready()`. This way,
we'll get a panic before the on-disk data is corrupted. This will help
us to diagnose what's causing the corruption, as well as being much
easier to recover from.

To minimize performance impact, only checks that can be done without
holding the spa_config_lock are performed.

Additionally, when corruption is detected, the raw words of the block
pointer are logged.  (Note that `dprintf_bp()` is a no-op by default,
but if enabled it is not safe to use with invalid block pointers.)

Reviewed-by: Rich Ercolani <rincebrain@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Paul Zuchowski <pzuchowski@datto.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #14817
2023-05-08 11:20:23 -07:00
Brian Behlendorf dd19821149 zdb: consistent xattr output
When using zdb to output the value of an xattr only interpret it
as printable characters if the entire byte array is printable.
Additionally, if the --parseable option is set always output the
buffer contents as octal for easy parsing.

Reviewed-by: Olaf Faaland <faaland1@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #14830
2023-05-08 11:17:41 -07:00
Brian Behlendorf 245f4a3467 ZTS: add snapshot/snapshot_002_pos exception
Add snapshot_002_pos to the known list of occasional failures
for FreeBSD until it can be made entirely reliable.

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #14831
Closes #14832
2023-05-08 10:09:30 -07:00
Alexander Motin 190290a9ac Fix two abd_gang_add_gang() issues.
- There is no reason to assert that added gang is not empty.  It
may be weird to add an empty gang, but it is legal.
 - When moving chain list from the added gang clear its size, or it
will trigger assertion in abd_verify() when that gang is freed.

Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #14816
2023-05-05 09:17:55 -07:00
Pawel Jakub Dawidek 6fa6bb051c Simplify and optimize random_int_between().
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Pawel Jakub Dawidek <pawel@dawidek.net>
Closes #14805
2023-05-05 09:09:12 -07:00
Pawel Jakub Dawidek 599df82049 Plug memory leak in zfsdev_state.
On kernel module unload, free all zfsdev state structures, except for
zfsdev_state_listhead, which is statically allocated.

Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Pawel Jakub Dawidek <pawel@dawidek.net>
Closes #14824
2023-05-05 08:51:41 -07:00
Ameer Hamza 82ac409acc zpool import -m also removing spare and cache when log device is missing
spa_import() relies on a pool config fetched by spa_try_import() for
spare/cache devices. Import flags are not passed to spa_tryimport(),
which makes it return early due to a missing log device and missing
retrieving the cache device and spare eventually. Passing
ZFS_IMPORT_MISSING_LOG to spa_tryimport() makes it fetch the correct
configuration regardless of the missing log device.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #14794
2023-05-03 15:10:32 -07:00
buzzingwires a46001adb9 Allow zhack label repair to restore detached devices.
This commit expands on the zhack label repair command in d04b5c9 by
adding the -u option to undetach a device by regenerating uberblocks,
in addition to the existing functionality of fixing checksums, now
represented by -c. Previous behavior is retained in the case of no
options.

The changes are heavily inspired by Jeff Bonwick's labelfix
utility, as archived at:

https://gist.github.com/jjwhitney/baaa63144da89726e482

Additionally, it is now capable of properly determining the size of
block devices and other media, as well as handling sizes which are
not divisible by 2^18. This should make it viable for use on physical
devices and partitions, in addition to files.

These changes should make it possible to import zpools that have had
their uberblocks erased, such as in the case of pools rendered
inaccessible by erroneous detach commands.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: buzzingwires <buzzingwires@outlook.com>
Closes #14773
2023-05-03 09:03:57 -07:00
George Amanakis 9de5300c7f Optimize check_filesystem() and process_error_log()
Integrate check_clones() into check_filesystem() and implement a list
instead of iterating recursively over the clones, thus eliminating the
risk of a stack overflow.

Also use kmem_zalloc() to allocate large structures in
process_error_log() reducing its stack size from ~700 to ~128 bytes.

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #14744
2023-05-03 09:00:14 -07:00
Pawel Jakub Dawidek d96e29576c Use correct block pointer in block cloning case.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Pawel Jakub Dawidek <pawel@dawidek.net>
Closes #14806
2023-05-02 09:24:26 -07:00
Brian Behlendorf 012829df0c Wrap clang specific pragma
Clang specific pragmas need to be wrapped to prevent a build
warning when compiling with gcc.

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #14814
2023-05-02 09:21:47 -07:00
Mateusz Guzik e2a92d726e blake3: fix up bogus checksums in face of cpu migration
This is a temporary measure until a better fix is sorted out.

Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Rich Ercolani <rincebrain@gmail.com>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Closes #14785
Closes #14808
2023-05-01 17:21:27 -07:00
Serapheim Dimitropoulos 0c93d86f01 Correct ABD size for split block ZIOs
Currently when layering the ABD buffer of each split block on top of
an indirect vdev's ZIO ABD we don't specify the split block's ABD.
This results in those ABDs being incorrectly sized by inheriting
the size of their parent ABD which is larger than what each split
block needs.

The above behavior isn't causing any bugs currently but can lead
to unexpected ABD sizes for people analyzing and/or working on
the ZIO codepath. This patch fixes this behavior by properly setting
the ABD size for split block ZIOs.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Closes #14804
2023-05-01 17:18:42 -07:00
Justin Hibbits 5a83f761c7 powerpc64: Support ELFv2 asm on Big Endian
FreeBSD/powerpc64 is all ELFv2 since FreeBSD 13, even big endian.  The
existing sha256 and sha512 asm code assumes that BE is all ELFv1, and LE
is ELFv2.  Minor changes to add ELFv2 in the BE side gets this working
correctly on FreeBSD with latest OpenZFS import.

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Justin Hibbits <chmeeedalf@gmail.com>
Closes #14779
2023-04-27 12:49:21 -07:00
Alexander Motin 2fd1c30423 Mark TX_COMMIT transaction with TXG_NOTHROTTLE.
TX_COMMIT has no on-disk representation and does not produce any more
dirty data.  It should not wait for anything, and even just skipping
the checks if not waiting gives improvement noticeable in profiler.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #14798
2023-04-27 12:32:58 -07:00
Val Packett ae0d0f0e04 PAM: support the authentication facility
Implement the pam_sm_authenticate method, using the noop argument of
lzc_load_key to do a passphrase check without actually loading the key.

This allows using ZFS as the source of truth for user passwords,
without storing any password hashes in /etc or using other PAM modules.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Felix Dörre <felix@dogcraft.de>
Signed-off-by: Val Packett <val@packett.cool>
Closes #14789
2023-04-27 09:49:03 -07:00
Tino Reichardt ee728008a4 Fix BLAKE3 aarch64 assembly for FreeBSD and macOS
The x18 register isn't useable within FreeBSD kernel space, so we
have to fix the BLAKE3 aarch64 assembly for not using it.

The source files are here: https://github.com/mcmilk/BLAKE3-tests

Reviewed-by: Kyle Evans <kevans@FreeBSD.org>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #14728
2023-04-26 12:40:26 -07:00
Brian Behlendorf b5411618f7 Fix checkstyle warning
Resolve a missed checkstyle warning.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Mateusz Guzik <mjguzik@gmail.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #14799
2023-04-26 11:49:16 -07:00
Alexander Motin bba7cbf0a4 Fix positive ABD size assertion in abd_verify().
Gang ABDs without childred are legal, and they do have zero size.
For other ABD types zero size doesn't have much sense and likely
not working correctly now.

Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #14795
2023-04-26 09:20:43 -07:00
Mateusz Guzik e37a89d5d0 FreeBSD: fix up EINVAL from getdirentries on .zfs
Without the change:
/.zfs
/.zfs/snapshot
find: /.zfs: Invalid argument

Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #14774
2023-04-26 09:16:37 -07:00
Mateusz Guzik 88b8777159 FreeBSD: add missing vn state transition for .zfs
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #14774
2023-04-26 09:16:09 -07:00
Rob N b69cb06664 tests/zdb_encrypted: parse numbers a little more robustly
On FreeBSD, `wc` prints some leading spaces, while on Linux it does not.
So we tell ksh to expect an integer, and it does the rest.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #14791
Closes #14797
2023-04-26 08:50:44 -07:00
Brian Behlendorf d960beca61 zdb: Fix minor memory leak
Commit 6b6aaf6dc2 introduced a small
memory leak in zdb.  This was detected by the LeakSanitizer and was
causing all ztest runs to fail.

Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Rich Ercolani <rincebrain@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #14796
2023-04-26 08:43:39 -07:00
Brian Behlendorf 0e8a42bbee Revert "Fix data race between zil_commit() and zil_suspend()"
This reverts commit 4c856fb333 to
resolve a newly introduced deadlock which in practice in more
disruptive that the issue this commit intended to address.

Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #14775
Closes #14790
2023-04-25 16:40:55 -07:00
Han Gao 6d59d5df98 Add loongarch64 support
Add loongarch64 definitions & lua module setjmp asm

LoongArch is a new RISC ISA, which is a bit like MIPS or RISC-V.

Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Han Gao <gaohan@uniontech.com>
Signed-off-by: WANG Xuerui <xen0n@gentoo.org>
Closes #13422
2023-04-25 16:05:45 -07:00
Rich Ercolani 6b6aaf6dc2 Taught zdb -bb to print metadata totals
People often want estimates of how much of their pool is occupied
by metadata, but they end up using lots of text processing on zdb's
output to get it.

So let's just...provide it for them.

Now, zdb -bbbs will output something like:

Blocks  LSIZE   PSIZE   ASIZE     avg    comp   %Total  Type
[...]
    68  1.06M    272K    544K      8K    4.00     0.00      L6 Total
 1.71K   212M   6.85M   13.7M      8K   30.91     0.00      L5 Total
 1.71K   212M   6.85M   13.7M      8K   30.91     0.00      L4 Total
 1.73K   214M   6.92M   13.8M      8K   30.89     0.00      L3 Total
 18.7K  2.29G    111M    221M   11.8K   21.19     0.00      L2 Total
 3.56M   454G   28.4G   56.9G   16.0K   15.97     0.19      L1 Total
  308M  36.8T   28.2T   28.6T   95.1K    1.30    99.80      L0 Total
  311M  37.3T   28.3T   28.6T   94.2K    1.32   100.00  Total
 50.4M   774G    113G    291G   5.77K    6.85     0.99  Metadata Total

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #14746
2023-04-24 16:55:07 -07:00
Mateusz Guzik 81a2b2e6a6 FreeBSD: add missing vop_fplookup assignments
It became illegal to not have them as of
5f6df177758b9dff88e4b6069aeb2359e8b0c493 ("vfs: validate that vop
vectors provide all or none fplookup vops") upstream.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #14788
2023-04-24 16:15:42 -07:00
Mateusz Guzik ff0e135e25 FreeBSD: try to fallback early if can't do optimized copy
Not complete, but already shaves on some locking.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Closes #14723
2023-04-24 16:13:52 -07:00
Mateusz Guzik a7982d5d30 FreeBSD: fix up EXDEV handling for clone_range
API contract requires VOPs to handle EXDEV internally, worst case by
falling back to the generic copy routine. This broke with the recent
changes.

While here whack custom loop to lock 2 vnodes with vn_lock_pair, which
provides the same functionality internally. write start/finish around
it plays no role so got eliminated.

One difference is that vn_lock_pair always takes an exclusive lock on
both vnodes. I did not patch around it because current code takes an
exclusive lock on the target vnode. zfs supports shared-locking for
writes, so this serializes different calls to the routine as is, despite
range locking inside. At the same time you may notice the source vnode
can get some traffic if only shared-locked, thus once more this goes
the safer route of exclusive-locking. Note this should be patched to
use shared-locking for both once the feature is considered stable.

Technically the switch to vn_lock_pair should be a separate change, but
it would only introduce churn immediately whacked by the rest of the
patch.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Closes #14723
2023-04-24 16:13:09 -07:00
Dimitry Andric 62cc9d4f6b FreeBSD: make zfs_vfs_held() definition consistent with declaration
Noticed while attempting to change FreeBSD's boolean_t into an actual
bool: in include/sys/zfs_ioctl_impl.h, zfs_vfs_held() is declared to
return a boolean_t, but in module/os/freebsd/zfs/zfs_ioctl_os.c it is
defined to return an int. Make the definition match the declaration.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Dimitry Andric <dimitry@andric.com>
Closes #14776
2023-04-21 10:22:52 -07:00
Allan Jude 8eae2d214c Add support for zpool user properties
Usage:

    zpool set org.freebsd:comment="this is my pool" poolname

Tests are based on zfs_set's user property tests.

Also stop truncating property values at MAXNAMELEN, use ZFS_MAXPROPLEN.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Mateusz Piotrowski <mateusz.piotrowski@klarasystems.com>
Sponsored-by: Beckhoff Automation GmbH & Co. KG.
Sponsored-by: Klara Inc.
Closes #11680
2023-04-21 10:20:36 -07:00
Richard Yao 135d9a9048 Linux: Suppress -Wordered-compare-function-pointers in tracepoint code
Clang points out that there is a comparison against -1, but we cannot
fix it because that is from the kernel headers, which we must support.
We can workaround this by using a pragma.

Sponsored-By: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Youzhong Yang <yyang@mathworks.com>
Signed-off-by: Richard Yao <richard.yao@klarasystems.com>
Closes #14738
2023-04-20 10:31:01 -07:00
Richard Yao ab71b24d20 Linux: zfs_zaccess_trivial() should always call generic_permission()
Building with Clang on Linux generates a warning that err could be
uninitialized if mnt_ns is a NULL pointer. However, mnt_ns should never
be NULL, so there is no need to put this behind an if statement.  Taking
it outside of the if statement means that the possibility of err being
uninitialized goes from being always zero in a way that the compiler
could not realize to a way that is always zero in a way that the
compiler can realize.

Sponsored-By: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Youzhong Yang <yyang@mathworks.com>
Signed-off-by: Richard Yao <richard.yao@klarasystems.com>
Closes #14738
2023-04-20 10:29:44 -07:00
Brian Behlendorf a10c645616 ZTS: zvol_misc_trim retry busy export
Retry the export if the pool is busy due to an open zvol.
Observed in the CI on Fedora 37.

  cannot export 'testpool': pool is busy
  ERROR: zpool export testpool exited 1

Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #14769
2023-04-20 10:25:16 -07:00
rob-wing 3e4ed4213d Create zap for root vdev
And add it to the AVZ, this is not backwards compatible with older pools
due to an assertion in spa_sync() that verifies the number of ZAPs of
all vdevs matches the number of ZAPs in the AVZ.

Granted, the assertion only applies to #DEBUG builds - still, a feature
flag is introduced to avoid the assertion, com.klarasystems:vdev_zaps_v2

Notably, this allows to get/set properties on the root vdev:

    % zpool set user:prop=value <pool> root-0

Before this commit, it was already possible to get/set properties on
top-level vdevs with the syntax <type>-<vdev_id> (e.g. mirror-0):

    % zpool set user:prop=value <pool> mirror-0

This syntax also applies to the root vdev as it is is of type 'root'
with a vdev_id of 0, root-0. The keyword 'root' as an alias for
'root-0'.

The following tests have been added:

    - zpool get all properties from root vdev
    - zpool set a property on root vdev
    - verify root vdev ZAP is created

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Wing <rob.wing@klarasystems.com>
Sponsored-by: Seagate Technology
Submitted-by: Klara, Inc.
Closes #14405
2023-04-20 10:07:56 -07:00
Herb Wartens 71d191ef25 Allow MMP to bypass waiting for other threads
At our site we have seen cases when multi-modifier protection is enabled
(multihost=on) on our pool and the pool gets suspended due to a single
disk that is failing and responding very slowly. Our pools have 90 disks
in them and we expect disks to fail. The current version of MMP requires
that we wait for other writers before moving on. When a disk is
responding very slowly, we observed that waiting here was bad enough to
cause the pool to suspend. This change allows the MMP thread to bypass
waiting for other threads and reduces the chances the pool gets
suspended.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Herb Wartens <hawartens@gmail.com>
Closes #14659
2023-04-19 13:22:59 -07:00
Paul Dagnelie d4657835c8 ZTS: send-c_volume is flaky
We use block_device_wait to wait for the zvol block device to 
actually appear, and we log the result of the dd calls by using 
an intermediate file.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: John Wren Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #14767
2023-04-19 13:20:02 -07:00
Ameer Hamza 719534ca8e Fix "Detach spare vdev in case if resilvering does not happen"
Spare vdev should detach from the pool when a disk is reinserted.
However, spare detachment depends on the completion of resilvering,
and if resilver does not schedule, the spare vdev keeps attached to
the pool until the next resilvering. When a zfs pool contains
several disks (25+ mirror), resilvering does not always happen when
a disk is reinserted. In this patch, spare vdev is manually detached
from the pool when resilvering does not occur and it has been tested
on both Linux and FreeBSD.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #14722
2023-04-19 09:04:32 -07:00
наб 3d37e7e5f5 zfsprops.7: update mandlock
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=f7e33bdbd6d1bdf9c3df8bba5abcf3399f957ac3
https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/commit/?id=7e59106e9c34458540f7d382d5b49071d1b7104f

Fixes: commit fb9baa9b20 ("zfsprops.8:
 remove nbmand-not-used-on-Linux and pointer to mount(8)")

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #14765
2023-04-19 09:03:42 -07:00
youzhongyang 23f84d161e Silence clang warning of flexible array not at end
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Signed-off-by: Youzhong Yang <yyang@mathworks.com>
Closes #14764
2023-04-18 18:10:40 -07:00
Low-power f9e1c63f8c Values printed by zpool-iostat(8) should be right-aligned
This inappropriate left-alignment was introduced in 7bb7b1f.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: WHR <msl0000023508@gmail.com>
Closes #14751
2023-04-18 11:34:41 -07:00
Tony Hutter accfdeb948 Revert "ZFS_IOC_COUNT_FILLED does unnecessary txg_wait_synced()"
This reverts commit 4b3133e671.

Users identified this commit as a possible source of data
corruption:
https://github.com/openzfs/zfs/issues/14753

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Issue #14753 
Closes #14761
2023-04-18 08:41:52 -07:00
Rich Ercolani 8ed62440ef Work around Raspberry Pi kernel packaging oddities
On Debian and Ubuntu and friends, you get something like
"linux-image-$(uname -r)" and "linux-headers-$(uname -r)" you
can put a Depends on.

On Raspberry Pi OS, you get "raspberrypi-kernel" and
"raspberrypi-kernel-headers", with version numbers like 20230411.

There is not, as far as I can tell, a reasonable way to map that
to a kernel version short of reaching out and digging around in
the changelogs or Makefile, so just special-case it so the packages
don't fail to install at install time. They still might not build
if the versions don't match, but I don't see a way to do anything
about that...

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #14745
Closes #14747
2023-04-17 17:38:09 -07:00
Pawel Jakub Dawidek 3b5af20139 Fix VERIFY(!zil_replaying(zilog, tx)) panic
The zfs_log_clone_range() function is never called from the
zfs_clone_range_replay() function, so I assumed it is safe to assert
that zil_replaying() is never TRUE here. It turns out zil_replaying()
also returns TRUE when the sync property is set to disabled.

Fix the problem by just returning if zil_replaying() returns TRUE.

Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reported by: Florian Smeets
Signed-off-by: Pawel Jakub Dawidek <pawel@dawidek.net>
Closes #14758
2023-04-17 16:42:09 -07:00
dodexahedron ac18dc77f3 Minor improvements to zpoolconcepts.7
* Fixed one typo (effects -> affects)
 * Re-worded raidz description to make it clearer that it is not
   quite the same as RAID5, though similar
 * Clarified that data is not necessarily written in a static
   stripe width
 * Minor grammar consistency improvement
 * Noted that "volumes" means zvols
 * Fixed a couple of split infinitives
 * Clarified that hot spares come from the same pool they were
   assigned to
* "we" -> ZFS
* Fixed warnings thrown by mandoc, and removed unnecessary
  wordiness in one fixed line.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Brandon Thetford <brandon@dodecatec.com>
Closes #14726
2023-04-13 09:15:34 -07:00
youzhongyang 27a82cbb3e Linux 6.3 compat: Fix memcpy "detected field-spanning write" error
Add a new union member of flexible array to dnode_phys_t and use
it in the macro so we can silence the memcpy() fortify error.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Youzhong Yang <yyang@mathworks.com>
Closes #14737
2023-04-13 09:12:03 -07:00
Pawel Jakub Dawidek c71fe71640 Fix data corruption when cloning embedded blocks
Don't overwrite blk_phys_birth, as for embedded blocks it is part of
the payload.

Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Pawel Jakub Dawidek <pawel@dawidek.net>
Issue #13392 
Closes #14739
2023-04-12 16:15:05 -07:00
наб 6e015933f8 initramfs: source user scripts from /e/z/initramfs-tools-load-key{,.d/*}
By dropping in a file in a directory (for packages) or by making a file
(for local administrators), custom key loading methods may be provided
for the rootfs and necessities.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Nicholas Morris <security@niwamo.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Co-authored-by: Nicholas Morris <security@niwamo.com>
Supersedes: #14704
Closes: #13757
Closes #14733
2023-04-12 10:08:49 -07:00
George Amanakis 574e09d8c6 Fix in check_filesystem()
Fix the code in case of missing snapshots. Previously the check was in
a conditional that would be executed if the filesystem had snapshots.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #14735
2023-04-12 08:53:53 -07:00
Alan Somers 678a3b8f99 Trim needless zeroes from checksum events
The ereport.fs.zfs.checksum event contains histograms of the bits that
were wrongly set or cleared according to their bit position in a 64-bit
word.  So the maximum value that any histogram bucket could have would
be 64.  But ZFS currently uses a uint32_t to hold each bucket.  As a
result, the event report is full of needless zeroes.

Change the bucket size to uint8_t, stripping 768 needless zeros from
each event.

Original event format:
```
 class=ereport.fs.zfs.checksum ena=639460469834258433 pool=testpool.1933 pool_guid=4979719877084416563 pool_state=0 pool_context=0 pool_failmode=wait vdev_guid=4136721804819128578 vdev_type=file vdev_path=/tmp/kyua.1TxP3A/2/work/file1.1933 vdev_ashift=9 vdev_complete_ts=609837019678 vdev_delta_ts=33450 vdev_read_errors=0 vdev_write_errors=0 vdev_cksum_errors=20 vdev_delays=0 parent_guid=2751977006639883417 parent_type=raidz vdev_spare_guids= zio_err=0 zio_flags=1048752 zio_stage=4194304 zio_pipeline=65011712 zio_delay=0 zio_timestamp=0 zio_delta=0 zio_priority=4 zio_offset=702976 zio_size=1024 zio_objset=24 zio_object=0 zio_level=3 zio_blkid=0 bad_ranges=0000000000000400 bad_ranges_min_gap=8 bad_range_sets=0000079e bad_range_clears=00000854 bad_set_histogram=000000210000001a000000150000001d000000240000001b000000220000001b000000210000002100000018000000260000002300000025000000210000001e000000250000001b0000001d0000001e0000001600000025000000180000001b000000240000001b000000240000001b0000001c000000210000001b0000001e000000210000001a0000001e000000220000001d0000001b000000200000001f0000001a000000250000001f0000001d0000001b0000001d000000240000001d0000001b0000001b0000001f00000024000000190000001a0000001f0000001e000000240000001e0000002400000021000000200000001d0000001d00000021 bad_cleared_histogram=000000220000002700000021000000210000001b0000001a000000250000001f0000001c0000001e0000002400000022000000220000002400000022000000240000002200000021000000220000001b0000002100000021000000190000001b000000240000002400000020000000290000002a00000028000000250000002400000020000000270000002500000016000000270000001c000000210000001f000000240000001c0000002100000022000000240000002100000023000000210000002700000022000000240000001b00000022000000210000001c00000023000000150000002600000020000000270000001e0000001d0000002400000026 time=00000016806457270000000323406839 eid=458
```

New format:
```
 class=ereport.fs.zfs.checksum ena=96599319807790081 pool=testpool.1933 pool_guid=1236902063710799041 pool_state=0 pool_context=0 pool_failmode=wait vdev_guid=2774253874431514999 vdev_type=file vdev_path=/tmp/kyua.6Temlq/2/work/file1.1933 vdev_ashift=9 vdev_complete_ts=92124283803 vdev_delta_ts=46670 vdev_read_errors=0 vdev_write_errors=0 vdev_cksum_errors=20 vdev_delays=0 parent_guid=8090931855087882905 parent_type=raidz vdev_spare_guids= zio_err=0 zio_flags=1048752 zio_stage=4194304 zio_pipeline=65011712 zio_delay=0 zio_timestamp=0 zio_delta=0 zio_priority=4 zio_offset=1028608 zio_size=512 zio_objset=0 zio_object=0 zio_level=0 zio_blkid=4 bad_ranges=0000000000000200 bad_ranges_min_gap=8 bad_range_sets=0000061f bad_range_clears=000001f4 bad_set_histogram=1719161c1c1c101618171a151a1a19161e1c171d1816161c191f1a18192117191c131d171b1613151a171419161a1b1319101b14171b18151e191a1b141a1c17 bad_cleared_histogram=06090a0808070a0b020609060506090a01090a050a0a0509070609080d050d0607080d060507080c04070807070a0608020c080c080908040808090a05090a07 time=00000016806477050000000604157480 eid=62
```

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Alan Somers <asomers@FreeBSD.org>
Sponsored-by: Axcient
Closes #14716
2023-04-10 14:24:27 -07:00
youzhongyang d4dc53dad2 Linux 6.3 compat: idmapped mount API changes
Linux kernel 6.3 changed a bunch of APIs to use the dedicated idmap 
type for mounts (struct mnt_idmap), we need to detect these changes 
and make zfs work with the new APIs.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Youzhong Yang <yyang@mathworks.com>
Closes #14682
2023-04-10 14:15:36 -07:00
Kyle Evans d0cbd9feaf module: freebsd: fix aarch64 fpu handling
Just like x86, aarch64 needs to use the fpu_kern(9) API around FPU
usage, otherwise we panic promptly at boot as soon as ZFS attempts to
do checksum benchmarking.

Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Kyle Evans <kevans@FreeBSD.org>
Closes #14715
2023-04-10 12:40:18 -07:00
Kyle Evans dee77f45d0 module: resync part of Makefile.bsd
sha256-armv8.S and sha512-armv8.S need the same treatment as the sse
bits; removal of -mgeneral-regs-only from flags.

This fixes errors about requiring NEON, which is a difference in clang
vs. gcc treatment of -mgeneral-regs-only being specified on asm files.

Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Kyle Evans <kevans@FreeBSD.org>
Closes #14715
2023-04-10 12:39:43 -07:00
Rob N baca06c258 libzfs: add v2 iterator interfaces
f6a0dac84 modified the zfs_iter_* functions to take a new "flags"
parameter, and introduced a variety of flags to ask the kernel to limit
the results in various ways, reducing the amount of work the caller
needed to do to filter out things they didn't need.

Unfortunately this change broke the ABI for existing clients (read:
older versions of the `zfs` program), and was reverted 399b98198.

dc95911d2 reintroduced the original patch, with the understanding that a
backwards-compatible fix would be made before the 2.2 release branch was
tagged. This commit is that fix.

This introduces zfs_iter_*_v2 functions that have the new flags
argument, and reverts the existing functions to not have the flags
parameter, as they were before. The old functions are now reimplemented
in terms of the new, with flags set to 0.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Original-patch-by: George Wilson <george.wilson@delphix.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-by: Klara, Inc.
Closes #14597
2023-04-10 11:53:02 -07:00
Rob N ff73574cd8 vdev: expose zfs_vdev_max_ms_shift as a module parameter
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: Seagate Technology LLC
Closes #14719
2023-04-06 10:52:50 -07:00
George Amanakis a8a127e2c9 Fix typo in check_clones()
Run kmem_free() after zap_cursor_fini().

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Adam Moss <c@yotes.com>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #14702
2023-04-06 10:46:18 -07:00
Damian Szuberski 8ab674ab9c ZTS: add existing tests to runfiles
Some test cases were committed to the repository but never added to
runfiles.
Move `zfs_unshare_008_pos` to the Linux-only runfile.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: szubersk <szuberskidamian@gmail.com>
Closes #14701
2023-04-06 10:43:24 -07:00
Andrew Innes 0b8fdb8ade ZTS: Use inbuilt monotonic time
Make the test runner try to use the included python monotonic time
function instead of calling librt.

This makes the test runner work on macos where librt wasn't available.

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Andrew Innes <andrew.c12@gmail.com>
Closes #14700
2023-04-06 10:40:23 -07:00
Martin Matuška a3f82aec93 Miscellaneous FreBSD compilation bugfixes
Add missing machine/md_var.h to spl/sys/simd_aarch64.h and
spl/sys/simd_arm.h

In spl/sys/simd_x86.h, PCB_FPUNOSAVE exists only on amd64, use PCB_NPXNOSAVE
on i386

In FreeBSD sys/elf_common.h redefines AT_UID and AT_GID on FreeBSD, we need
a hack in vnode.h similar to Linux. sys/simd.h needs to be included early.

In zfs_freebsd_copy_file_range() we pass a (size_t *)lenp to
zfs_clone_range() that expects a (uint64_t *)

Allow compiling armv6 world by limiting ARM macros in sha256_impl.c and
sha512_impl.c to __ARM_ARCH > 6

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Pawel Jakub Dawidek <pawel@dawidek.net>
Reviewed-by: Signed-off-by: WHR <msl0000023508@gmail.com>
Signed-off-by: Martin Matuska <mm@FreeBSD.org>
Closes #14674
2023-04-06 10:35:02 -07:00
Rob N ece7ab7e7d vdev: expose zfs_vdev_def_queue_depth as a module parameter
It was previously available only to FreeBSD.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: Seagate Technology LLC
Closes #14718
2023-04-06 10:31:19 -07:00
Paul Dagnelie b66c2a0899 Storage device expansion "silently" fails on degraded vdev
When a vdev is degraded or faulted, we refuse to expand it when doing
online -e. However, we also don't actually cause the online command
to fail, even though the disk didn't expand. This is confusing and
misleading, and can result in violated expectations.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes 14145
2023-04-06 10:29:27 -07:00
Alexander Motin 1038f87c4e Fix some signedness issues in arc_evict()
It may happen that "wanted total ARC size" (wt) is negative, that was
expected.  But multiplication product of it and unsigned fractions
result in unsigned value, incorrectly shifted right with a sing loss.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Signed-off-by:  Alexander Motin <mav@FreeBSD.org>
Sponsored by:   iXsystems, Inc.
Closes #14692
2023-04-05 10:42:22 -07:00
youzhongyang 8eb2f26057 Linux 6.3 compat: writepage_t first arg struct folio*
The type def of writepage_t in kernel 6.3 is changed to take
struct folio* as the first argument. We need to detect this
change and pass correct function to write_cache_pages().

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Youzhong Yang <yyang@mathworks.com>
Closes #14699
2023-04-05 10:01:38 -07:00
Tino Reichardt 6ecdd35bdb Fix "Add colored output to zfs list"
Running `zfs list -o avail rpool` resulted in a core dump.
This commit will fix this.

Run the needed overhead only, when `use_color()` is true.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Wilson <gwilson@delphix.com>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #14712
2023-04-05 09:57:01 -07:00
наб 3399a30ee0 contrib: dracut: fix race with root=zfs:dset when necessities required
This had always worked in my testing, but a user on hardware reported
this to happen 100%, and I reproduced it once with cold VM host caches.

dracut-zfs-generator runs as a systemd generator, i.e. at Some
Relatively Early Time; if root= is a fixed dataset, it tries to
"solve [necessities] statically at generation time".

If by that point zfs-import.target hasn't popped (because the import is
taking a non-negligible amount of time for whatever reason), it'll see
no children for the root datase, and as such generate no mounts.

This has never had any right to work. No-one caught this earlier because
it's just that much more convenient to have root=zfs:AUTO, which orders
itself properly.

To fix this, always run zfs-nonroot-necessities.service;
this additionally simplifies the implementation by:
  * making BOOTFS from zfs-env-bootfs.service be the real, canonical,
    root dataset name, not just "whatever the first bootfs is",
    and only set it if we're ZFS-booting
  * zfs-{rollback,snapshot}-bootfs.service can use this instead of
    re-implementing it
  * having zfs-env-bootfs.service also set BOOTFSFLAGS
  * this means the sysroot.mount drop-in can be fixed text
  * zfs-nonroot-necessities.service can also be constant and always
    enabled, because it's conditioned on BOOTFS being set

There is no longer any code generated at run-time
(the sysroot.mount drop-in is an unavoidable gratuitous cp).

The flow of BOOTFS{,FLAGS} from zfs-env-bootfs.service to sysroot.mount
is not noted explicitly in dracut.zfs(7), because (a) at some point it's
just visual noise and (b) it's already ordered via d-p-m.s from z-i.t.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #14690
2023-03-31 09:47:48 -07:00
youzhongyang c5431f1465 linux 6.3 compat: needs REQ_PREFLUSH | REQ_OP_WRITE
Modify bio_set_flush() so if kernel version is >= 4.10, flags 
REQ_PREFLUSH and REQ_OP_WRITE are set together.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Youzhong Yang <yyang@mathworks.com>
Closes #14695
2023-03-31 09:46:22 -07:00
Brian Behlendorf 1142362ff6 Use vmem_zalloc to silence allocation warning
The kmem allocation in zfs_prune_aliases() will trigger a large
allocation warning on systems with 64K pages.  Resolve this by
switching to vmem_alloc() which internally uses kvmalloc() so the
right allocator will be used based on the allocation size.

Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #8491
Closes #14694
2023-03-31 09:43:54 -07:00
Tony Hutter 21c4b2a944 Linux 6.2 compat: META
Update the META file to reflect compatibility with the 6.2 kernel.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #14689
2023-03-29 15:39:36 -07:00
George Amanakis 431083f75b Fixes in persistent error log
Address the following bugs in persistent error log:

1) Check nested clones, eg "fs->snap->clone->snap2->clone2".

2) When deleting files containing error blocks in those clones (from
   "clone" the example above), do not break the check chain.

3) When deleting files in the originating fs before syncing the errlog
   to disk, do not break the check chain. This happens because at the
   time of introducing the error block in the error list, we do not have
   its birth txg and the head filesystem. If the original file is
   deleted before the error list is synced to the error log (which is
   when we actually lookup the birth txg and the head filesystem), then
   we do not have access to this info anymore and break the check chain.

The most prominent change is related to achieving (3). We expand the
spa_error_entry_t structure to accommodate the newly introduced
zbookmark_err_phys_t structure (containing the birth txg of the error
block).Due to compatibility reasons we cannot remove the
zbookmark_phys_t structure and we also need to place the new structure
after se_avl, so it is not accounted for in avl_find(). Then we modify
spa_log_error() to also provide the birth txg of the error block. With
these changes in place we simplify the previously introduced function
get_head_and_birth_txg() (now named get_head_ds()).

We chose not to follow the same approach for the head filesystem (thus
completely removing get_head_ds()) to avoid introducing new lock
contentions.

The stack sizes of nested functions (as measured by checkstack.pl in the
linux kernel) are:
check_filesystem [zfs]: 272 (was 912)
check_clones [zfs]: 64

We also introduced two new tests covering the above changes.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #14633
2023-03-28 16:51:58 -07:00
Kevin Jin 65d10bd87c Fix short-lived txg caused by autotrim
Current autotrim causes short-lived txg through:

1. calling txg_wait_synced() in metaslab_enable()
2. calling txg_wait_open() with should_quiesce = true

This patch addresses all the issues mentioned above.

A new cv, vdev_autotrim_kick_cv is added to kick autotrim activity.
It will be signaled once a txg is synced so that it does not change 
the original autotrim pace. Also because it is a cv, the wait is 
interruptible which speeds up the vdev_autotrim_stop_wait() call.

Finally, combining big zfs_txg_timeout, txg_wait_open() also causes
delay when exporting a pool.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: jxdking <lostking2008@hotmail.com>
Issue #8993
Closes #12194
2023-03-28 08:43:41 -07:00
Brian Behlendorf 64bfa6bae3 Additional limits on hole reporting
Holding the zp->z_rangelock as a RL_READER over the range
0-UINT64_MAX is sufficient to prevent the dnode from being
re-dirtied by concurrent writers.  To avoid potentially
looping multiple times for external caller which do not
take the rangelock holes are not reported after the first
sync.  While not optimal this is always functionally correct.

This change adds the missing rangelock calls on FreeBSD to
zvol_cdev_ioctl().

Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #14512
Closes #14641
2023-03-28 08:19:03 -07:00
George Wilson a604d3243b Revert "Do not hold spa_config in ZIL while blocked on IO"
This reverts commit 7d638df09b.

Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Wilson <gwilson@delphix.com>
Closes #14678
2023-03-28 08:13:32 -07:00
Rob N aebd94cc85 config: don't link libudev on FreeBSD
FreeBSD has a libudev shim in libudev-devd. If present, configure would
detect it and produce binaries linked against it, even though nothing
used it. That is surprising and unnecessary, so lets remove it.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #14669
2023-03-27 11:55:54 -07:00
Rich Ercolani ae0b1f66c7 linux 6.3 compat: add another bdev_io_acct case
Linux 6.3+, and backports from it (6.2.8+), changed the
signatures on bdev_io_{start,end}_acct.  Add a case for it.  

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #14658
Closes #14668
2023-03-27 11:29:19 -07:00
Ameer Hamza a05263b7aa Update vdev state for spare vdev
zfsd fetches new pool configuration through ZFS_IOC_POOL_STATS but
it does not get updated nvlist configuration for spare vdev since
the configuration is read by spa_spares->sav_config. In this commit,
updating the vdev state for spare vdev that is consumed by zfsd on
spare disk hotplug.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #14653
2023-03-24 10:30:38 -07:00
Rich Ercolani 0ad5f43442 Drop lying to the compiler in the fletcher4 code
This is probably the uncontroversial part of #13631, which fixes
a real problem people are having.

There's still things to improve in our code after this is merged,
but it should stop the breakage that people have reported, where
we lie about a type always being aligned and then pass in stack
objects with no alignment requirement and hope for the best.

Of course, our SIMD code was written with unaligned accesses, so it
doesn't care if we drop this...but some auto-vectorized code that
gcc emits sure does, since we told it it can assume they're aligned.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #14649
2023-03-24 10:29:19 -07:00
George Wilson 460d887c43 panic loop when removing slog device
There is a window in the slog removal code where a panic loop could
ensue if the system crashes during that operation. The original design
of slog removal did not persisted any state because the removal happened
synchronously. This was changed by a later commit which persisted the
vdev_removing flag and exposed this bug. If a slog removal is in
progress and happens to crash after persisting the vdev_removing flag to
the label but before the vdev is removed from the spa config, then the
pool will continue to panic on import. Here's a sample of the panic:

[  134.387411] VERIFY0(0 == dmu_buf_hold_array(os, object, offset, size,
FALSE, FTAG, &numbufs, &dbp)) failed (0 == 22)
[  134.393865] PANIC at dmu.c:1135:dmu_write()
[  134.396035] Kernel panic - not syncing: VERIFY0(0 ==
dmu_buf_hold_array(os, object, offset, size, FALSE, FTAG, &numbufs,
&dbp)) failed (0 == 22)
[  134.397857] CPU: 2 PID: 5914 Comm: txg_sync Kdump: loaded Tainted:
P           OE     5.4.0-1100-dx2023020205-b3751f8c2-azure #106
[  134.407938] Hardware name: Microsoft Corporation Virtual
Machine/Virtual Machine, BIOS 090008  12/07/2018
[  134.407938] Call Trace:
[  134.407938]  dump_stack+0x57/0x6d
[  134.407938]  panic+0xfb/0x2d7
[  134.407938]  spl_panic+0xcf/0x102 [spl]
[  134.407938]  ? traverse_impl+0x1ca/0x420 [zfs]
[  134.407938]  ? dmu_object_alloc_impl+0x3b4/0x3c0 [zfs]
[  134.407938]  ? dnode_hold+0x1b/0x20 [zfs]
[  134.407938]  dmu_write+0xc3/0xd0 [zfs]
[  134.407938]  ? space_map_alloc+0x55/0x80 [zfs]
[  134.407938]  metaslab_sync+0x61a/0x830 [zfs]
[  134.407938]  ? queued_spin_unlock+0x9/0x10 [zfs]
[  134.407938]  vdev_sync+0x72/0x190 [zfs]
[  134.407938]  spa_sync_iterate_to_convergence+0x160/0x250 [zfs]
[  134.407938]  spa_sync+0x2f7/0x670 [zfs]
[  134.407938]  txg_sync_thread+0x22d/0x2d0 [zfs]
[  134.407938]  ? txg_dispatch_callbacks+0xf0/0xf0 [zfs]
[  134.407938]  thread_generic_wrapper+0x83/0xa0 [spl]
[  134.407938]  kthread+0x104/0x140
[  134.407938]  ? kasan_check_write.constprop.0+0x10/0x10 [spl]
[  134.407938]  ? kthread_park+0x90/0x90
[  134.457802]  ret_from_fork+0x1f/0x40

This change no longer persists the vdev_removing flag when removing slog
devices and also cleans up some code that was added which is not used.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Signed-off-by: George Wilson <gwilson@delphix.com>
Closes #14652
2023-03-24 10:27:07 -07:00
Tino Reichardt 2bd0490faf Add colored output to zfs list
Use a bold header row and colorize the AVAIL column based on
the used space percentage of volume.

We define these colors:
- when > 80%, use yellow
- when > 90%, use red

Reviewed-by: WHR <msl0000023508@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ethan Coe-Renner <coerenner1@llnl.gov>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #14621
Closes #14350
2023-03-24 10:24:11 -07:00
Tino Reichardt 7bde396aa2 Colorize zpool iostat output
Use a bold header and colorize the space suffixes in iostat
by order of magnitude like this:
- K is green
- M is yellow
- G is red
- T is lightblue
- P is magenta
- E is cyan
- 0 space is colored gray

Reviewed-by: WHR <msl0000023508@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ethan Coe-Renner <coerenner1@llnl.gov>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #14621
Closes #14459
2023-03-24 10:23:52 -07:00
Tino Reichardt 80f2cdcd67 Add more ANSI colors to libzfs
Reviewed-by: WHR <msl0000023508@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ethan Coe-Renner <coerenner1@llnl.gov>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #14621
2023-03-24 10:21:19 -07:00
Matthew Ahrens d2d4f8554f Fix prefetching of indirect blocks while destroying
When traversing a tree of block pointers (e.g. for `zfs destroy <fs>` or
`zfs send`), we prefetch the indirect blocks that will be needed, in
`traverse_prefetch_metadata()`.  In the case of `zfs destroy <fs>`, we
do a little traversing each txg, and resume the traversal the next txg.
So the indirect blocks that will be needed, and thus are candidates for
prefetching, does not include blocks that are before the resume point.

The problem is that the logic for determining if the indirect blocks are
before the resume point is incorrect, causing the (up to 1024) L1
indirect blocks that are inside the first L2 to not be prefetched.  In
practice, if we are able to read many more than 1024 blocks per txg,
then this will be inconsequential.  But if i/o latency is more than a
few milliseconds, almost no L1's will be prefetched, so they will be
read serially, and thus the destroying will be very slow.  This can be
observed as `zpool get freeing` decreasing very slowly.

Specifically: When we first examine the L2 that contains the block we'll
be resuming from, we have not yet resumed, so `td_resume` is nonzero.
At this point, all calls to `traverse_prefetch_metadata()` will fail,
even if the L1 in question is after the resume point.  It isn't until
the callback is issued for the resume point that we zero out
`td_resume`, but by this point we've already attempted and failed to
prefetch everything under this L2 indirect block.

This commit addresses the issue by reusing the existing
`resume_skip_check()` to determine if the L1's bookmark is before or
after the resume point.  To do so, this function is made non-mutating
(the caller now zeros `td_resume`).

Note, this bug likely predates (was not introduced by) #11803.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #14603
2023-03-24 10:20:07 -07:00
Pawel Jakub Dawidek ce0e1cc402 Fix cloning into already dirty dbufs.
Undirty the dbuf and destroy its buffer when cloning into it.

Coverity ID: CID-1535375
Reported-by: Richard Yao
Reported-by: Benjamin Coddington
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Pawel Jakub Dawidek <pawel@dawidek.net>
Closes #14655
2023-03-24 10:18:35 -07:00
Rob N 5b5f518687 man: add ZIO_STAGE_BRT_FREE to zpool-events
And bump all the values after it, matching the header update in
67a1b037.

Reviewed-by:  Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #14665
2023-03-24 10:14:39 -07:00
Pawel Jakub Dawidek 9fa007d35d Fix build on FreeBSD
Constify some variables after d1807f168e.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by:  Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: Pawel Jakub Dawidek <pawel@dawidek.net>
Closes #14656
2023-03-22 09:24:41 -07:00
Timothy Day 1eca40f3ad Fix kmodtool for packaging mainline Linux
kmodtool currently incorrectly identifies official
RHEL kernels, as opposed to custom kernels. This
can cause the openZFS kmod RPM build to break.

The issue can be reproduced by building a set of
mainline Linux RPMs, installing them, and then
attempting to build the openZFS kmod package
against them.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Timothy Day <timday@amazon.com>
Closes #14617
2023-03-22 09:22:52 -07:00
Tino Reichardt 0f9e735414 Remove unused constant EdonR256_BLOCK_BITSIZE
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #14650
2023-03-22 08:39:48 -07:00
Alexander Motin d520f64342 FreeBSD: Remove extra arc_reduce_target_size() call
Remove arc_reduce_target_size() call from arc_prune_task().  The idea
of arc_prune_task() is to remove external references on ARC metadata,
such as vnodes. Since arc_prune_async() is called only from ARC itself,
it makes no sense to create a parasitic loop between ARC eviction and
the pruning, treatening to drop ARC to its minimum.  I can't guess why
it was added as part of FreeBSD to OpenZFS integration.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by:  Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #14639
2023-03-17 17:31:08 -07:00
Richard Yao fa46802585 Fix possible bad bit shift in dnode_next_offset_level()
031d7c2fe6 did not handle reverse
iteration, such that the original issue theoretically could still occur.

Note that contrary to the claim in the ZFS disk format specification
that a maximum of 6 levels are possible, 9 levels are possible with
recordsize=512 and and indirect block size of 16KB. In this unusual
configuration, span will be 65. The maximum size of span at 70 can be
reached at recordsize=16K and an indirect blocksize of 16KB.

When we are at this indirection level and are traversing backward, the
minimum value is start, but we cannot calculate that with 64-bit
arithmetic, so we avoid the calculation and instead rely on the earlier
statement that did `*offset = start;`.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reported-by: Coverity (CID-1466214)
Closes #14618
2023-03-16 14:27:49 -07:00
naivekun 60cfd3bbc2 QAT: Fix uninitialized seed in QAT compression
CpaDcRqResults have to be initialized with checksum=1 for adler32.
Otherwise when error CPA_DC_OVERFLOW occurred, the next compress
operation will continue on previously part-compressed data, and write
invalid checksum data. When zfs decompress the compressed data, a
invalid checksum will occurred and lead to #14463

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Weigang Li <weigang.li@intel.com>
Reviewed-by: Chengfei Zhu <chengfeix.zhu@intel.com>
Signed-off-by: naivekun <naivekun0817@gmail.com>
Closes #14632
Closes #14463
2023-03-16 11:54:10 -07:00
Tino Reichardt 480d809703 Refine some details for the github actions update
Set the retention-days variable to 14 days for these artifacts:
- the zloop error logs
- the zloop vdev files
- the compiled modules

Add the abality to re-run some part of the functional testings.
Fix some comments and remove the deleting of the modules artifact.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #14637
2023-03-16 10:00:14 -07:00
Tino Reichardt c9c463a508 Add git repo checkout to testing workflow
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #14634
2023-03-15 15:37:56 -07:00
Attila Fülöp 5f3611121d spl: cmn_err_once() should be usable in brace-less if else statements
Commit 11913870 (#14567) added cmn_err_once() by #define'ing a
compound statement but failed to consider usage in a single
statement brace-less if else.

Fix the problem by using the common "do {} while (0)" construct.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Attila Fülöp <attila@fueloep.org>
Closes #14629
2023-03-15 11:13:25 -07:00
Low-power c31bb934cd Fix building for powerpc* targets with some compilers
Under some configurations, GCC didn't predefined macro 'powerpc' for
such a target. Use the guaranteed macro '__powerpc__' instead.

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: WHR <msl0000023508@gmail.com>
Closes #14631
2023-03-15 10:44:28 -07:00
Tino Reichardt b7bc334d13 Split functional testings via github action matrix
This commit changes the workflow of the github actions.

We split the workflow into different parts:

1) build zfs modules for Ubuntu 20.04 and 22.04 (~25m)
2) 2x zloop test (~10m) + 2x sanity test (~25m)
3) functional testings in parts 1..5 (each ~1h)
   - these could be triggered, when sanity tests are ok
   - currently I just start them all in the same time
4) cleanup and create summary

When everything is fine, the full run with all testings
should be done in around 2 hours.

The codeql.yml and checkstyle.yml are not part in this circle.

The testings are also modified a bit:
- report info about CPU and checksum benchmarks
- reset the debugging logs for each test
  - when some error occurred, we call dmesg with -c to get
    only the log output for the last failed test
  - we empty also the dbgsys

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #14078
2023-03-15 10:41:05 -07:00
Alek P f55d6ee818 Improve tests and update man page for healing recv
Fix the manpage. The "SYNOPSIS" section is incorrectly formatted for 
receive -c.  I also took this opportunity to reword some parts and 
fix a run-on sentence in the manpage.

Add large block testing for corrective recv. This adds a new test 
that makes sure blocks generated using zfs send -L/--large-block 
large-block send flag are able to be used for healing.

Since with unloaded key and errlog feature enabled corruption is not 
shown in zpool status #13675 is fixed the zfs_receive_corrective.ksh 
test no longer sets -o feature@head_errlog=disabled on pool creation 
so that it can also test for regressions related to head_errlog feature.

Note that the zfs_receive_compressed_corrective.ksh and
zfs_receive_large_block_corrective.ksh tests are still creating pools 
with -o feature@head_errlog=disabled.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alek Pinchuk <apinchuk@axcient.com>
Closes #14615
2023-03-15 10:34:10 -07:00
WHR d74f123045 Refactor CONFIG_SPE check on Linux/powerpc
Commit 5401472 adds a check to call enable_kernel_spe and
disable_kernel_spe only if CONFIG_SPE is defined. Refactor this check
in a way similar to what CONFIG_ALTIVEC and CONFIG_VSX are checked, in
order to remove redundant kfpu_begin() and kfpu_end() implementations.

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: WHR <msl0000023508@gmail.com>
Closes #14623
2023-03-15 10:30:42 -07:00
WHR f31b0d4e88 Fix missing semicolons in commit 1f196e3
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: WHR <msl0000023508@gmail.com>
Closes #14623
2023-03-15 10:30:02 -07:00
ofthesun9 8b7342d290 Fix for mountpoint=legacy
We need to clear mountpoint only after checking it.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: ofthesun9 <olivier@ofthesun.net>
Closes #14599
Closes #14604
2023-03-14 16:40:55 -07:00
Tino Reichardt fe6a7b787f Remove unused Edon-R variants
This commit removes the edonr_byteorder.h file and all unused
variants of Edon-R.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #13618
2023-03-14 15:59:58 -07:00
Richard Yao dbfc622345 nvpair: Use flexible array member for nvpair name strings
Coverity reported possible out-of-bounds reads from doing `((char
*)(nvp) + sizeof (nvpair_t))` to get the nvpair name string. These were
initially marked as false positives, but since we are now using C99
flexible array members elsewhere, we could use them here too as cleanup
to make the code easier to understand.

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reported-by: Coverity (CID-977165)
Reported-by: Coverity (CID-1524109)
Reported-by: Coverity (CID-1524642)
Closes #14612
2023-03-14 15:25:55 -07:00
Richard Yao d1807f168e nvpair: Constify string functions
After addressing coverity complaints involving `nvpair_name()`, the
compiler started complaining about dropping const. This lead to a rabbit
hole where not only `nvpair_name()` needed to be constified, but also
`nvpair_value_string()`, `fnvpair_value_string()` and a few other static
functions, plus variable pointers throughout the code. The result became
a fairly big change, so it has been split out into its own patch.

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14612
2023-03-14 15:25:50 -07:00
Richard Yao 50f6934b9c discover_cached_paths() should not corrupt nvlist string value
discover_cached_paths() will write a NULL into a string from a nvlist to
use it as a substring, but does not restore it before return. This
corrupts the nvlist. It should be harmless unless the string is needed
again later, but we should not do this, so let us fix it.

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14612
2023-03-14 15:25:45 -07:00
Richard Yao 47a7062772 zpool_valid_proplist() should not corrupt nvpair name string on error
The strings returned from parsing nvlists should be immutable, but to
simplify the code when we want a substring from it, we sometimes will
write a NULL into it and then restore the value afterward. Provided
there is no concurrent access, this is okay, unless we forget to restore
the value afterward. This was caught when constifying string functions
related to nvlists.

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14612
2023-03-14 15:25:40 -07:00
Richard Yao 27ff18cd43 Fix possible NULL pointer dereference in nvlist_lookup_nvpair_ei_sep()
Clang's static analyzer complains about a possible NULL pointer
dereference in nvlist_lookup_nvpair_ei_sep() because it unconditionally
dereferences a pointer initialized by `nvpair_value_nvlist_array()`
under the assumption that `nvpair_value_nvlist_array()` will always
initialize the pointer without checking to see if an error was returned
to indicate otherwise. This itself is improper error handling, so we fix
it. However, fixing it to properly respond to errors is not enough to
avoid a NULL pointer dereference, since we can receive NULL when the
array is empty, so we also add a NULL check.

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14612
2023-03-14 15:25:35 -07:00
Richard Yao 47b994049f Silence clang static analyzer warnings about stored stack addresses
Clang's static analyzer complains that nvs_xdr() and nvs_native()
functions return pointers to stack memory. That is technically true, but
the pointers are stored in stack memory from the caller's stack frame,
are not read by the caller and are deallocated when the caller returns,
so this is harmless. We set the pointers to NULL to silence the
warnings.

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14612
2023-03-14 15:25:01 -07:00
Richard Yao 3cb293a6f8 Fix possible NULL pointer dereference in dbuf_verify()
Coverity reported a dereference after a NULL check in dbuf_verify(). If
`dn` is `NULL`, we can just assume that !dn->dn_free_txg, so we change
`!dn->dn_free_txg` to `(dn == NULL || !dn->dn_free_txg)`.

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reported-by: Coverity (CID-992298)
Closes #14619
2023-03-14 15:00:54 -07:00
Tino Reichardt 3a03c96381 Replace dead opensolaris.org license links
The commit replaces all findings of the link:
http://www.opensolaris.org/os/licensing with this one:
https://opensource.org/licenses/CDDL-1.0

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: WHR <msl0000023508@gmail.com>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #14625
2023-03-14 14:44:01 -07:00
Richard Yao 24e61911f0 Zero zio_prop_t in flush_write_batch_impl()
After 67a1b03791 was merged, coverity
started complaining about an uninitialized scalar variable in
flush_write_batch_impl() due to the new field zp.zp_brtwrite. Upon
inspection, it appears that uninitialized memory was being copied for
non-raw streams, so this is a pre-existing issue. The addition of
zp_brtwrite by the block cloning commit caused Coverity to begin to
notice it.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reported-by: Coverity (CID-1535378)
Closes #14607
2023-03-14 14:41:45 -07:00
Richard Yao 1c212d1b7c Fix uninitialized scalar value read regression in dmu_recv_begin()
da19d919a8 changed this in a way that
permits execution to reach `if (err == 0)` without initializing err.
This could randomly cause the sync task to not execute. We fix that by
initializing err to zero.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reported-by: Coverity (CID-1535377)
Closes #14607
2023-03-14 14:40:49 -07:00
Matthew Ahrens 519851122b ZFS_IOC_COUNT_FILLED does unnecessary txg_wait_synced()
`lseek(SEEK_DATA | SEEK_HOLE)` are only accurate when the on-disk blocks
reflect all writes, i.e. when there are no dirty data blocks.  To ensure
this, if the target dnode is dirty, they wait for the open txg to be
synced, so we can call them "stabilizing operations".  If they cause
txg_wait_synced often, it can be detrimental to performance.

Typically, a group of files are all modified, and then SEEK_DATA/HOLE
are performed on them.  In this case, the first SEEK does a
txg_wait_synced(), and subsequent SEEKs don't need to wait, so
performance is good.

However, if a workload involves an interleaved metadata modification,
the subsequent SEEK may do a txg_wait_synced() unnecessarily.  For
example, if we do a `read()` syscall to each file before we do its SEEK.
This applies even with `relatime=on`, when the `read()` is the first
read after the last write.  The txg_wait_synced() is unnecessary because
the SEEK operations only care that the structure of the tree of indirect
and data blocks is up to date on disk.  They don't care about metadata
like the contents of the bonus or spill blocks.  (They also don't care
if an existing data block is modified, but this would be more involved
to filter out.)

This commit changes the behavior of SEEK_DATA/HOLE operations such that
they do not call txg_wait_synced() if there is only a pending change to
the bonus or spill block.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by:  Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #13368 
Issue #14594 
Issue #14512 
Issue #14009
2023-03-14 14:30:29 -07:00
Attila Fülöp 78289b8458 zcommon: Refactor FPU state handling in fletcher4
Currently calls to kfpu_begin() and kfpu_end() are split between
the init() and fini() functions of the particular SIMD
implementation. This was done in #14247 as an optimization measure
for the ABD adapter. Unfortunately the split complicates FPU
handling on platforms that use a local FPU state buffer, like
Windows and macOS.

To ease porting, we introduce a boolean struct member in
fletcher_4_ops_t, indicating use of the FPU, and move the FPU state
handling from the SIMD implementations to the call sites.

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Signed-off-by: Attila Fülöp <attila@fueloep.org>
Closes #14600
2023-03-14 09:45:28 -07:00
George Melikov b15ab50c4d ABI files: sync with new Ubuntu release in CI
We may try to build ZFS inside container too,
but let's just sync them for now.

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #14605
2023-03-10 16:23:01 -08:00
Pawel Jakub Dawidek 67a1b03791 Implementation of block cloning for ZFS
Block Cloning allows to manually clone a file (or a subset of its
blocks) into another (or the same) file by just creating additional
references to the data blocks without copying the data itself.
Those references are kept in the Block Reference Tables (BRTs).

The whole design of block cloning is documented in module/zfs/brt.c.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Christian Schwarz <christian.schwarz@nutanix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rich Ercolani <rincebrain@gmail.com>
Signed-off-by: Pawel Jakub Dawidek <pawel@dawidek.net>
Closes #13392
2023-03-10 11:59:53 -08:00
Paul Dagnelie da19d919a8 Fix incremental receive silently failing for recursive sends
The problem occurs because dmu_recv_begin pulls in the payload and 
next header from the input stream in order to use the contents of 
the begin record's nvlist. However, the change to do that before the 
other checks in dmu_recv_begin occur caused a regression where an 
empty send stream in a recursive send could have its END record 
consumed by this, which broke the logic of recv_skip. A test is 
also included to protect against this case in the future.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #12661
Closes #14568
2023-03-10 09:52:44 -08:00
Low-power 589f59b52a Workaround for Linux PowerPC GPL-only cpu_has_feature()
Linux since 4.7 makes interface 'cpu_has_feature' to use jump labels on
powerpc if CONFIG_JUMP_LABEL_FEATURE_CHECKS is enabled, in this case
however the inline function references GPL-only symbol
'cpu_feature_keys'.

ZFS currently uses 'cpu_has_feature' either directly or indirectly from
several places; while it is unknown how this issue didn't break ZFS on
64-bit little-endian powerpc, it is known to break ZFS with many Linux
versions on both 32-bit and 64-bit big-endian powerpc.

Until this issue is fixed in Linux, we have to workaround it by
overriding affected inline functions without depending on
'cpu_feature_keys'.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: WHR <msl0000023508@gmail.com>
Closes #14590
2023-03-10 09:35:00 -08:00
Richard Yao 7316fdd1c0 txg_sync should handle write errors in ZIL
The txg_sync thread will see certain buffers in a DR_IN_DMU_SYNC state
when ZIL is writing them out. Then it waits until the state changes, but
has an assertion to check that they were not DR_NOT_OVERRIDDEN. If the
data write failed with an error, ZIL will put it into the
DR_NOT_OVERRIDDEN state. It looks like the code will handle that state
without an issue, so we can just delete the assertion.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@klarasystems.com>
Sponsored-By: Wasabi Technology, Inc.
Closes #14283
2023-03-10 09:34:00 -08:00
Richard Yao 950980b4c4 Suppress clang static analyzer warning in vdev_stat_update()
63652e1546 added unnecessary branches in
`vdev_stat_update()` to suppress an ASAN false positive the breaks
ztest. This had the downside of causing false positive reports in both
Coverity and Clang's static analyzer. vd is never NULL, so we add a
preprocessor check to only apply the workaround when compiling with ASAN
support.

Reported-by: Coverity (CID-1524583)
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14575
2023-03-08 13:52:24 -08:00
Richard Yao 37edc7ea98 Refactor loop in dump_histogram()
The current loop triggers a complaint that we are using an array offset
prior to a range check from cpp/offset-use-before-range-check when we
are actually calculating maximum and minimum values. I was about to file
a false positive report with CodeQL, but after looking at how the code
is structured, I really cannot blame CodeQL for mistaking this for a
range check.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14575
2023-03-08 13:52:20 -08:00
Richard Yao 08641d9007 Suppress static analyzer warning in dmu_objset_create_impl_dnstats()
ae7e700650 added an assertion to suppress
a complaint from Clang's static analyzer. Unfortunately, it missed
another way for Clang to complain about this function. This adds another
assertion to handle that.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14575
2023-03-08 13:52:15 -08:00
Richard Yao 703283fabd Linux: Fix octal detection in define_ddi_strtox()
Clang Tidy reported this as a misc-redundant-expression because writing
`8` instead of `'8'` meant that the condition could never be true.

The only place where we have a chance of this being a bug would be in
nvlist_lookup_nvpair_ei_sep(). I am not sure if we ever pass an octal to
that, but if we ever do, it should work properly now instead of failing.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14575
2023-03-08 13:52:09 -08:00
Richard Yao 66a38fd10a Linux: Suppress clang static analyzer warning in zfs_remove()
Clang's static analyzer points out that if we fail to find an extended
attribute directory, but somehow find it when calculating delete_now and
delete_now is true, we will have a NULL pointer dereference when we try
to unlink the extended attribute directory.

I am not sure if this is possible, but if it is, I do not see a sane way
of handling this other than rolling back the transaction and retrying.
For now, let us do an VERIFY_IMPLY(). If this trips, it will stop the
transaction from committing, which will prevent an attribute directory
leak.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14575
2023-03-08 13:52:04 -08:00
Richard Yao c2550a136e Linux: Silence static analyzer warning in crypto_create_ctx_template()
A CodeChecker report from Clang's CTU analysis indicated that we were
assigning uninitialized values in crypto_create_ctx_template() when we
call it from zio_crypt_key_init(). This occurs because the ->cm_param
and ->cm_param_len fields are uninitialized. Thankfully, the
uninitialized values are only used in the skein via
KCF_PROV_CREATE_CTX_TEMPLATE() -> skein_create_ctx_template() ->
skein_mac_ctx_build() -> skein_get_digest_bitlen(), but that should not
be called from here. We fix this to avoid a possible trap should this
code change in the future.

The FreeBSD version of zio_crypt_key_init() is unaffected.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14575
2023-03-08 13:51:59 -08:00
Richard Yao 51f55742f6 Suppress Clang Static Analyzer warning in bpobj_enqueue()
scan-build does not do cross translation unit analysis to realize that
`dmu_buf_hold()` will always set `bpo->bpo_cached_dbuf` to a non-NULL
pointer, so we add an assertion to make it realize this.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14575
2023-03-08 13:51:55 -08:00
Richard Yao 45c446308a Suppress Clang Static Analyzer warning in dsl_dir_rename_sync()
Clang's static analyzer reports that if we try to rename a root dataset
in `dsl_dir_rename_sync()`, we will have a NULL pointer passed to
strlcpy(). This is impossible because `dsl_dir_rename_check()` will
prevent us from doing this. We add an assertion to silence this warning.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14575
2023-03-08 13:51:50 -08:00
Richard Yao 17443e0b20 Cleanup: Remove constant comparisons reported by CodeQL
CodeQL's cpp/constant-comparison query from its security-and-extended
query set reported 4 instances where we have comparions that always
evaluate the same way.

In `draid_config_by_type()`, we have an early `if (nparity == 0)` check
that returns `EINVAL`, making a later `if (nparity == 0 || nparity >
VDEV_DRAID_MAXPARITY)` partially redundant. The later check prints an
error message when parity is 0, but the early check does not. This is
not useful feedback, so we move the later check to the place where the
early check runs to replace the early check.

In `perform_thread_merge()`, we return when `num_threads == 0`. After
that block, we do `if (num_threads > 0) {`, which will always be true.
We remove the `if` statement.

In `sa_modify_attrs()`, we have a loop condition that is `k != 2`, but
at the end of the loop, we have `if (k == 0 && hdl->sa_spill)` followed
by an else that does a break. The result is that k != 2 will never be
evaluated when it is false. We drop the comparison.

In `zap_leaf_array_read()`, we have a for loop condition that is `i <
ZAP_LEAF_ARRAY_BYTES && len > 0`. However, that loop itself is in a loop
that is `while (len > 0)` and while the value of len is decremented
inside the loop, when `len == 0`, it will return, such that `len > 0`
inside the loop condition will always be true. We drop that part of the
condition.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14575
2023-03-08 13:51:46 -08:00
Richard Yao 5dd0f019cd Linux cleanup: zvol_discard() should only call blk_queue_io_stat() once
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14575
2023-03-08 13:51:40 -08:00
Richard Yao f9e109223b Suppress Clang Static Analyzer warning in dbuf_dnode_findbp()
Clang's static analyzer reports that if a `blkid == DMU_SPILL_BLKID` is
passed, then we can have a NULL pointer dereference when either
->dn_have_spill or `DNODE_FLAG_SPILL_BLKPTR` is not set. This should not
happen. We add an `ASSERT()` to suppress reports about NULL pointer
dereferences.

Originally, I wanted to use one or two IMPLY statements on
pre-conditions before the call to `dbuf_findbp()`, but Clang's static
analyzer did not understand it.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14575
2023-03-08 13:51:36 -08:00
Richard Yao 399bb81607 Suppress Clang Static Analyzer warning in vdev_split()
Clang's static analyzer pointed out that we can have a NULL pointer
dereference if we ever attempt to split a vdev that has only 1 child. If
that happens, we are left with zero children, but then try to access a
non-existent child. Calling vdev_split() on a vdev with only 1 child
should be impossible due to how the code is structured. If this ever
happens, it would be best to stop execution immediately even in a
production environment to allow for the best possible chance of recovery
by an expert, so we use `VERIFY3U()` instead of `ASSERT3U()`.

Unfortunately, while that defensive assertion will prevent execution
from ever reaching the NULL pointer dereference, Clang's static analyzer
does not realize that, so we add an `ASSERT()` to inform it of this.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14575
2023-03-08 13:51:31 -08:00
Richard Yao 0b831cabc6 Suppress Clang Static Analyzer warning about SNPRINTF_BLKPTR()
Clang's static analyzer pointed out that if we can pass a -1 array index
to copyname[copies] if there are no valid DVAs. This is an absurd
situation, but it suggests that we are missing an assertion, so we add
it.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14575
2023-03-08 13:51:26 -08:00
Richard Yao a4240a8ac7 Suppress Clang Static Analyzer false positive in the AVL tree code.
This has been filed as llvm/llvm-project#60694. Switching from a copy
through a C pointer dereference to an explicit memcpy() is a workaround
that prevents a false positive.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14575
2023-03-08 13:51:21 -08:00
Richard Yao 8b72dfed11 Suppress Clang Static Analyzer defect report in abd_get_size()
Clang's static analyzer reports a possible NULL pointer dereference in
abd_get_size() when called from vdev_draid_map_alloc_write() called from
vdev_draid_map_alloc_row() and vdc->vdc_nparity == 0. This should be
impossible, so we add an assertion to silence the defect report.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14575
2023-03-08 13:51:09 -08:00
Richard Yao 9368b3877c Fix TOCTOU race in zpool_do_labelclear()
Coverity reported a TOCTOU race in `zpool_do_labelclear()`. This is not
believed to be a real security issue, but fixing it reduces the number
of syscalls we do and will prevent other static analyzers from
complaining about this.

The code is expected to be equivalent. However, under rare
circumstances, such as ELOOP, ENAMETOOLONG, ENOMEM, ENOTDIR and
EOVERFLOW, we will display the error message that we currently display
for the `open()` syscall rather than the one that we currently display
for the `stat()` syscall. This is considered to be an improvement.

Reported-by: Coverity (CID-1524188)
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14575
2023-03-08 13:50:51 -08:00
Alexander Motin a8d83e2a24 More adaptive ARC eviction
Traditionally ARC adaptation was limited to MRU/MFU distribution.  But
for years people with metadata-centric workload demanded mechanisms to
also manage data/metadata distribution, that in original ZFS was just
a FIFO.  As result ZFS effectively got separate states for data and
metadata, minimum and maximum metadata limits etc, but it all required
manual tuning, was not adaptive and in its heart remained a bad FIFO.

This change removes most of existing eviction logic, rewriting it from
scratch.  This makes MRU/MFU adaptation individual for data and meta-
data, same as the distribution between data and metadata themselves.
Since most of required states separation was already done, it only
required to make arcs_size state field specific per data/metadata.

The adaptation logic is still based on previous concept of ghost hits,
just now it balances ARC capacity between 4 states: MRU data, MRU
metadata, MFU data and MFU metadata.  To simplify arc_c changes instead
of arc_p measured in bytes, this code uses 3 variable arc_meta, arc_pd
and arc_pm, representing ARC balance between metadata and data, MRU and
MFU for data, and MRU and MFU for metadata respectively as 32-bit fixed
point fractions.  Since we care about the math result only when need to
evict, this moves all the logic from arc_adapt() to arc_evict(), that
reduces per-block overhead, since per-block operations are limited to
stats collection, now moved from arc_adapt() to arc_access() and using
cheaper wmsums.  This also allows to remove ugly ARC_HDR_DO_ADAPT flag
from many places.

This change also removes number of metadata specific tunables, part of
which were actually not functioning correctly, since not all metadata
are equal and some (like L2ARC headers) are not really evictable.
Instead it introduced single opaque knob zfs_arc_meta_balance, tuning
ARC's reaction on ghost hits, allowing administrator give more or less
preference to metadata without setting strict limits.

Some of old code parts like arc_evict_meta() are just removed, because
since introduction of ABD ARC they really make no sense: only headers
referenced by small number of buffers are not evictable, and they are
really not evictable no matter what this code do.  Instead just call
arc_prune_async() if too much metadata appear not evictable.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #14359
2023-03-08 11:17:23 -08:00
Attila Fülöp 8d9752569b ICP: AES-GCM: Unify gcm_init_ctx() and gmac_init_ctx()
gmac_init_ctx() duplicates most of the code in gcm_int_ctx() while
it just needs to set its own IV length and AAD tag length.

Introduce gcm_init_ctx_impl() which handles the GCM and GMAC
differences while reusing the duplicated code.

While here, fix a flaw where the AVX implementation would accept a
context using a byte swapped key schedule which it could not
handle. Also constify the IV and AAD pointers passed to
gcm_init{,_avx}().

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Attila Fülöp <attila@fueloep.org>
Closes #14529
2023-03-08 11:12:15 -08:00
Richard Yao 7d638df09b Do not hold spa_config in ZIL while blocked on IO
Otherwise, we can get a deadlock that looks like this:

1. fsync() grabs spa_config_enter(zilog->zl_spa, SCL_STATE, lwb,
RW_READER) as part of zil_lwb_write_issue() . It then blocks on the
txg_sync when a flush fails from a drive power cycling.

2. The txg_sync then blocks on the pool suspending due to the loss of
too many disks.

3. zpool clear then blocks on spa_config_enter(spa, SCL_STATE |
SCL_L2ARC | SCL_ZIO, spa, RW_WRITER)  because it is a writer.

The disks cannot be brought online due to fsync() holding that lock and
the user gets upset since fsync() is uninterruptibly blocked inside the
kernel.

We need to grab the lock for vdev_lookup_top(), but we do not need to
hold it while there is outstanding IO.

This fixes a regression introduced by
1ce23dcaff.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@klarasystems.com>
Sponsored-By: Wasabi Technology, Inc.
Closes #14519
2023-03-07 16:12:28 -08:00
Low-power 1f196e3107 Fix build for Linux/powerpc without CONFIG_ALTIVEC or CONFIG_VSX
This fixes building ZFS for Linux 4.7+ powerpc* architecture, where 
Linux was configured without CONFIG_ALTIVEC or CONFIG_VSX.

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: WHR <msl0000023508@gmail.com>
Closes #14591
2023-03-07 14:06:52 -08:00
Rob N b988f32c70 Better handling for future crypto parameters
The intent is that this is like ENOTSUP, but specifically for when
something can't be done because we have no support for the requested
crypto parameters; eg unlocking a dataset or receiving a stream
encrypted with a suite we don't support.

Its not intended to be recoverable without upgrading ZFS itself.
If the request could be made to work by enabling a feature or modifying
some other configuration item, then some other code should be used.

load-key: In the future we might have more crypto suites (ie new values
for the `encryption` property. Right now trying to load a key on such
a future crypto suite will look up suite parameters off the end of the
crypto table, resulting in misbehaviour and/or crashes (or, with debug
enabled, trip the assertion in `zio_crypt_key_unwrap`).

Instead, lets check the value we got from the dataset, and if we can't
handle it, abort early.

recv: When receiving a raw stream encrypted with an unknown crypto
suite, `zfs recv` would report a generic `invalid backup stream`
(EINVAL). While technically correct, its not super helpful, so lets
ship a more specific error code and message.

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #14577
2023-03-07 14:05:14 -08:00
George Amanakis 12a240ac0b Fix a typo in ac2038a
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: George Amanakis <gamanakis@gmail.com> 
Closes #14585
Closes #14592
2023-03-07 13:50:44 -08:00
Andriy Gapon a55254be7a [FreeBSD] fix false assert in cache_vop_rmdir when replaying ZIL
The assert is enabled when DEBUG_VFS_LOCKS kernel option is set.
The exact panic is:
    panic: condition seqc_in_modify(_vp->v_seqc) not met
It happens because seqc protocol is not followed for ZIL replay.

But we actually do not need to make any namecache calls at that stage,
because the namecache use is not enabled until after the replay is
completed.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Andriy Gapon <avg@FreeBSD.org>
Closes #14566
2023-03-07 13:48:43 -08:00
Attila Fülöp 1191387012 spl: Add cmn_err_once() to log a message only on the first call
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Attila Fülöp <attila@fueloep.org>
Closes #14567
2023-03-07 13:44:11 -08:00
q66 4628bb9c60 initramfs: fix zpool get argument order
When using the zfs initramfs scripts on my system, I get various
errors at initramfs stage, such as:

cannot open '-o': name must begin with a letter

My zfs binaries are compiled with musl libc, which may be why
this happens. In any case, fix the argument order to make the
zpool binary happy, and to match its --help output.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: Daniel Kolesa <daniel@octaforge.org>
Closes #14572
2023-03-06 17:07:01 -08:00
Tino Reichardt 84a1c48c86 Fix detection of IBM Power8 machines (ISA 2.07)
An IBM POWER7 system with Power ISA 2.06 tried to execute
zfs_sha256_power8() - which should only be run on ISA 2.07
machines.

The detection is implemented via the zfs_isa207_available() call,
but this check was not used.

This pull request will fix this.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Low-power <msl0000023508@gmail.com>
Closes #14576
2023-03-06 17:01:01 -08:00
Andriy Gapon 28bf26acb6 [FreeBSD] zfs_znode_alloc: lock the vnode earlier
This is needed because of a possible error path where zfs_vnode_forget()
is called.  That function calls vgone() and vput(), the former requires
the vnode to be exclusively locked and the latter expects it to be
locked.

It should be safe to lock the vnode as early as possible because it is
not yet visible, so there is no interaction with other locks.

While here, remove a tautological assignment to 'vp'.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: Andriy Gapon <avg@FreeBSD.org>
Closes #14565
2023-03-06 16:30:54 -08:00
George Amanakis ca9e32d3a7 Optimize the is_l2cacheable functions
by placing the most common use case (no special vdevs) first and avoid
allocating new variables.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #14494 
Closes #14563
2023-03-06 16:13:05 -08:00
Richard Yao bc4d210783 Fix memory leak in ztest
This is tripping LeakSanitizer, which causes zloop test failures on pull
requests.

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14583
2023-03-06 15:30:29 -08:00
Richard Yao b79e7114bb Add missing increment to dsl_deadlist_move_bpobj()
dc5c8006f6 was recently merged to prefetch
up to 128 deadlists. Unfortunately, a loop was missing an increment,
such that it will prefetch all deadlists. The performance properties of
that patch probably should be re-evaluated.

This was caught by CodeQL's cpp/constant-comparison check in an
experimental branch where I am testing the security-and-extended
queries. It complained about the `i < 128` part of the loop condition
always evaluating to the same thing. The standard CodeQL configuration
we use missed this because it does not include that check.

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14573
2023-03-06 15:28:26 -08:00
Richard Yao 8846139b45 SHA2Init() should use signed assertions when checking an enum
The recent 4c5fec01a4 commit caused
Coverity to report that ASSERT3U(algotype, >=, SHA256_MECH_INFO_TYPE);
is always true. That is because the signed algotype and signed
SHA256_MECH_INFO_TYPE values were cast to unsigned types. To fix this,
we switch the assertions to use ASSERT3S(), which retains the signedness
of the original values for the comparison.

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reported-by: Coverity (CID-1535300)
Closes #14573
2023-03-06 15:26:43 -08:00
Jorgen Lundman 47119d60ef Restore ASMABI and other Unify work
Make sure all SHA2 transform function has wrappers

For ASMABI to work, it is required the calling convention
is consistent.

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: Joergen Lundman <lundman@lundman.net>
Closes #14569
2023-03-06 15:24:05 -08:00
Tino Reichardt 620a977f22 Use SECTION_STATIC macro for sha2 x86_64 assembly
- instead of ".section .rodata" we should use SECTION_STATIC

Tested-by: Rich Ercolani <rincebrain@gmail.com>
Tested-by: Sebastian Gottschall <s.gottschall@dd-wrt.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #13741
2023-03-02 13:52:33 -08:00
Tino Reichardt f9f9bef22f Update BLAKE3 for using the new impl handling
This commit changes the BLAKE3 implementation handling and
also the calls to it from the ztest command.

Tested-by: Rich Ercolani <rincebrain@gmail.com>
Tested-by: Sebastian Gottschall <s.gottschall@dd-wrt.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #13741
2023-03-02 13:52:27 -08:00
Tino Reichardt 4c5fec01a4 Add generic implementation handling and SHA2 impl
The skeleton file module/icp/include/generic_impl.c can be used for
iterating over different implementations of algorithms.

It is used by SHA256, SHA512 and BLAKE3 currently.

The Solaris SHA2 implementation got replaced with a version which is
based on public domain code of cppcrypto v0.10.

These assembly files are taken from current openssl master:
- sha256-x86_64.S: x64, SSSE3, AVX, AVX2, SHA-NI (x86_64)
- sha512-x86_64.S: x64, AVX, AVX2 (x86_64)
- sha256-armv7.S: ARMv7, NEON, ARMv8-CE (arm)
- sha512-armv7.S: ARMv7, NEON (arm)
- sha256-armv8.S: ARMv7, NEON, ARMv8-CE (aarch64)
- sha512-armv8.S: ARMv7, ARMv8-CE (aarch64)
- sha256-ppc.S: Generic PPC64 LE/BE (ppc64)
- sha512-ppc.S: Generic PPC64 LE/BE (ppc64)
- sha256-p8.S: Power8 ISA Version 2.07 LE/BE (ppc64)
- sha512-p8.S: Power8 ISA Version 2.07 LE/BE (ppc64)

Tested-by: Rich Ercolani <rincebrain@gmail.com>
Tested-by: Sebastian Gottschall <s.gottschall@dd-wrt.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #13741
2023-03-02 13:52:21 -08:00
Tino Reichardt ac678c8eee Add SHA2 SIMD feature tests for libspl
These are added via HWCAP interface:
- zfs_neon_available() for arm and aarch64
- zfs_sha256_available() for arm and aarch64
- zfs_sha512_available() for aarch64

This one via cpuid() call:
- zfs_shani_available() for x86_64

Tested-by: Rich Ercolani <rincebrain@gmail.com>
Tested-by: Sebastian Gottschall <s.gottschall@dd-wrt.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #13741
2023-03-02 13:52:15 -08:00
Tino Reichardt cef1253135 Add SHA2 SIMD feature tests for Linux
These are added:
- zfs_neon_available() for arm and aarch64
- zfs_sha256_available() for arm and aarch64
- zfs_sha512_available() for aarch64
- zfs_shani_available() for x86_64

Tested-by: Rich Ercolani <rincebrain@gmail.com>
Tested-by: Sebastian Gottschall <s.gottschall@dd-wrt.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Co-Authored-By: Sebastian Gottschall <s.gottschall@dd-wrt.com>
Closes #13741
2023-03-02 13:52:04 -08:00
Tino Reichardt 589143c225 Add SHA2 SIMD feature tests for FreeBSD
These are added:
- zfs_neon_available() for arm and aarch64
- zfs_sha256_available() for arm and aarch64
- zfs_sha512_available() for aarch64
- zfs_shani_available() for x86_64

Changes:
- simd_powerpc.h: change license from CDDL to BSD

Tested-by: Rich Ercolani <rincebrain@gmail.com>
Tested-by: Sebastian Gottschall <s.gottschall@dd-wrt.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #13741
2023-03-02 13:51:56 -08:00
Tino Reichardt 6723d1110f Add ARM architecture to OpenZFS buildsystem
Tested-by: Rich Ercolani <rincebrain@gmail.com>
Tested-by: Sebastian Gottschall <s.gottschall@dd-wrt.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #13741
2023-03-02 13:51:50 -08:00
Tino Reichardt 3e254aaad0 Remove old or redundant SHA2 files
We had three sha2.h headers in different places.
The FreeBSD version, the Linux version and the generic solaris version.

The only assembly used for acceleration was some old x86-64 openssl
implementation for sha256 within the icp module.

For FreeBSD the whole SHA2 files of FreeBSD were copied into OpenZFS,
these files got removed also.

Tested-by: Rich Ercolani <rincebrain@gmail.com>
Tested-by: Sebastian Gottschall <s.gottschall@dd-wrt.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #13741
2023-03-02 13:50:21 -08:00
Rob N 163f3d3a1f zdb: add decryption support
The approach is straightforward: for dataset ops, if a key was offered,
find the encryption root and the various encryption parameters, derive a
wrapping key if necessary, and then unlock the encryption root. After
that all the regular dataset ops will return unencrypted data, and
that's kinda the whole thing.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #11551
Closes #12707
Closes #14503
2023-03-02 13:39:09 -08:00
Alexander Motin 5f42d1dbf2 System-wide speculative prefetch limit.
With some pathological access patterns it is possible to make ZFS
accumulate almost unlimited amount of speculative prefetch ZIOs.
Combined with linear ABD allocations in RAIDZ code, it appears to
be possible to exhaust system KVA, triggering kernel panic.

Address this by introducing a system-wide counter of active prefetch
requests and blocking prefetch distance doubling per stream hits if
the number of active requests is higher that ~6% of ARC size.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:  Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #14516
2023-03-01 15:27:40 -08:00
Brian Behlendorf cd560c4474 Accommodate debug-kernel stack frame size
The blk_queue_discard() and blk_queue_sector_erase() functions
slightly exceed the allowed 4096 maximum stack frame size when
building with the RedHat debug kernel which causes their
configure checks to fail.

Add an exception for these two tests so the interfaces are
correctly detected.

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #14540
2023-03-01 14:40:45 -08:00
Richard Yao 4c856fb333 Fix data race between zil_commit() and zil_suspend()
openzfsonwindows/openzfs#206 found that it is possible to trip
`VERIFY(list_is_empty(&lwb->lwb_itxs))` when a `zil_commit()` is delayed
by the scheduler long enough for a parallel `zil_suspend()` operation to
exit `zil_commit_impl()`. This is a data race. To prevent this, we
introduce a `zilog->zl_suspend_lock` rwlock to ensure that all
outstanding `zil_commit()` operations finish before `zil_suspend()`
begins and that subsequent operations fallback to `txg_wait_synced()`
after `zil_suspend()` has begun.

On `PREEMPT_RT` Linux kernels, the `rw_enter()` implementation suffers
from writer starvation. This means that a ZIL intensive system can delay
`zil_suspend()` indefinitely. This is a pre-existing problem that
affects everything that uses rw locks, so it needs to be addressed in
the SPL.  However, builds against `PREEMPT_RT` Linux kernels are
currently broken due to a GPL symbol issue (#11097), so we can safely
disregard that issue for now.

Reported-by: Arun KV <arun.kv@datacore.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14514
2023-03-01 13:23:09 -08:00
Richard Yao 9fd5fedc67 Remove bad kmem_free() oversight from previous zfsdev_state_list patch
I forgot to remove the corresponding kmem_free() from zfs_kmod_fini() in
9a14ce43c3. Clang's static analyzer did
not complain, but the Coverity scan that was run after the patch was
merged did.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reported-by: Coverity (CID-1535275)
Closes #14556
2023-03-01 13:20:53 -08:00
Richard Yao dd108f5d73 Linux: zfs_fillpage() should handle partial pages from end of file
After 89cd2197b9 was merged, Clang's
static analyzer began complaining about a dead assignment in
`zfs_fillpage()`. Upon inspection, I noticed that the dead assignment
was because we are not using the calculated io_len that we should use to
avoid asking the DMU to read past the end of a file. This should result
in `dmu_buf_hold_array_by_dnode()` calling `zfs_panic_recover()`.

This issue predates 89cd2197b9, but its
simplification of zfs_fillpage() eliminated the only use of the
assignment to io_len, which made Clang's static analyzer complain about
the issue.

Also, as a precaution, we add an assertion that io_offset < i_size. If
this ever fails, bad things will happen. Otherwise, we are blindly
trusting the kernel not to give us invalid offsets. We continue to
blindly trust it on non-debug kernels.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14534
2023-03-01 13:19:47 -08:00
Tino Reichardt 0f9ed58f43 Update reclaim disk space script
The script uses systemd-run, which does the job in background.
We should take the the time and wait for the job to finish.

Maybe some functional tests suffer from not really freed disk
space and fail because of this.

We also add some trimming in the end of the script.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #14554
2023-03-01 12:25:09 -08:00
Richard Yao 3a7c35119e Handle unexpected errors in zil_lwb_commit() without ASSERT()
We tripped `ASSERT(error == ENOENT || error == EEXIST || error ==
EALREADY)` in `zil_lwb_commit()` at Klara when doing robustness testing
of ZIL against drive power cycles.

That assertion presumably exists because when this code was written, the
only errors expected from here were EIO, ENOENT, EEXIST and EALREADY,
with EIO having its own handling before the assertion. However, upon
doing a manual depth first search traversal of the source tree, it turns
out that a large number of unexpected errors are possible here. In
theory, EINVAL and ENOSPC can come from dnode_hold_impl(). However, most
unexpected errors originate in the block layer and come to us from
zio_wait() in various ways. One way is ->zl_get_data() -> dmu_buf_hold()
-> dbuf_read() -> zio_wait().

From vdev_disk.c on Linux alone, zio_wait() can return the unexpected
errors ENXIO, ENOTSUP, EOPNOTSUPP, ETIMEDOUT, ENOSPC, ENOLINK,
EREMOTEIO, EBADE, ENODATA, EILSEQ and ENOMEM

This was only observed after what have been likely over 1000 test
iterations, so we do not expect to reproduce this again to find out what
the error code was. However, circumstantial evidence suggests that the
error was ENXIO.

When ENXIO or any other unexpected error occurs, the `fsync()` or
equivalent operation that called zil_commit() will return success, when
in fact, dirty data has not been committed to stable storage. This is a
violation of the Single UNIX Specification.

The code should be able to handle this and any other unknown error by
calling `txg_wait_synced()`. In addition to changing the code to call
txg_wait_synced() on unexpected errors instead of returning, we modify
it to print information about unexpected errors to dmesg.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@klarasystems.com>
Sponsored-By: Wasabi Technology, Inc.
Closes #14532
2023-03-01 09:39:41 -08:00
Rob N 25c4d1f032 mancheck: exclude dotfiles in man dir
Its not uncommon for an editor to drop a hidden swap file in the dir
while editing a file there. mancheck would find it and run mandoc on it,
which would complain about its distinctly not-manpage format.

A more correct solution might be to reconfigure the editor to not put
swap files in the same dir, but its the default a lot of the time, and
this is a very small change that gives a very nice quality-of-life
improvement.

Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #14549
2023-03-01 09:38:31 -08:00
Rob N 4fe9cc5437 man: note that zdb operates directly on pool storage
A frequent misunderstanding is that zdb accesses the pool through the
kernel or filesystem, leading to confusion particularly when it can't
access something that it seems like it should be able to.

I've seen this confusion recently when zdb couldn't access a pool because
the user didn't have permission to read directly from the block devices,
and when it couldn't show attributes of encrypted files even though the
dataset was unlocked.

The manpage already speaks to another symptom of this, namely that zdb
may "behave erratically" on an active pool; here I'm trying to make that
a little more explicit.

Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #14539
2023-02-28 17:35:33 -08:00
Allan Jude cbd76b4032 Makefile.bsd: cleanup and sync with FreeBSD
xxhash.c was not being compiled, so when FreeBSD's kernel
switched to a newer version of ZSTD a few weeks ago, out-of-tree ZFS
failed to build

Sync module/Makefile.bsd with FreeBSD's sys/modules/zfs/Makefile

And restore the alphabetical sort in a number of places

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Yao <richard.yao@klarasystems.com>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Sponsored-by: Klara, Inc.
Closes #14508
2023-02-28 17:33:59 -08:00
Richard Yao ae7e700650 Suppress static analyzer warning in dmu_objset_create_impl_dnstats()
Clang's static analyzer claims that dereferencing ds in
dmu_objset_create_impl_dnstats() could cause a NULL pointer dereference
when a previous NULL check confirms that it is NULL. It is only NULL on
the MOS, for which dmu_objset_userused_enabled(os) should always return
false, so ds will never be dereferenced when it is NULL. We add an
assertion to suppress this warning.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14470
2023-02-28 17:31:58 -08:00
Richard Yao 5cc4950901 Suppress static analyzer warning in dbuf_hold_copy()
Clang's static analyzer claims that dbuf_hold_copy() will have a NULL
pointer dereference in data->b_data when called by dbuf_hold_impl().
This is impossible because data is dr->dt.dl.dr_data, which is non-NULL
whenever db->db_level == 0, which is always the case whenever
dbuf_hold_impl() calls dbuf_hold_copy(). We add an assertion to suppress
the complaint.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14470
2023-02-28 17:31:50 -08:00
Richard Yao 9a14ce43c3 Statically allocate first node of zfsdev_state_list
This avoids a call to kmem_alloc() during module load. It also
suppresses a defect report from Clang's static analyzer that claims that
we will have a NULL pointer dereference in zfsdev_state_init() because
it does not understand that this has already been allocated in
zfs_kmod_init().

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14470
2023-02-28 17:31:41 -08:00
Richard Yao 7fc48f8378 Suppress static analyzer warning in sa_attr_iter()
Clang's static analyzer points out that when IS_SA_BONUSTYPE(type) is
true and .sa_length is 0 for an attribute, we have a NULL pointer
dereference. We suppress this with an IMPLY() statement.

This was also identified by Coverity.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reported-by: Coverity (CID-1017954)
Closes #14470
2023-02-28 17:31:30 -08:00
Richard Yao 4d9bb5514c Suppress static analyzer warnings in zio_checksum_error_impl()
Clang's static analyzer informs us of multiple NULL pointer dereferences
involving zio_checksum_error_impl().

The first is a NULL pointer dereference if bp is NULL and ci->ci_flags &
ZCHECKSUM_FLAG_EMBEDDED is false, but bp is NULL implies that
ci->ci_flags & ZCHECKSUM_FLAG_EMBEDDED is true, so we add an IMPLY()
statement to suppress the report.

The second and third are identical, and are duplicated because while the
NULL pointer dereference occurs in zio_checksum_gang_verifier(), it is
called by zio_checksum_error_impl() and there is a report for each of
the two functions. The reports state that when bp is NULL, ci->ci_flags
& ZCHECKSUM_FLAG_EMBEDDED is true and checksum is not
ZIO_CHECKSUM_LABEL, we also have a NULL pointer dereference. bp is NULL
should imply that checksum == ZIO_CHECKSUM_LABEL, so we add an IMPLY()
statement to suppress the second report. The two reports are
functionally identical.

A fourth variation of this was also reported by Coverity. It occurs when
checksum == ZIO_CHECKSUM_ZILOG2.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reported-by: Coverity (CID-1524672)
Closes #14470
2023-02-28 17:31:08 -08:00
Richard Yao d634d20d1b icp: Prevent compilers from optimizing away memset() in gcm_clear_ctx()
The recently merged f58e513f74 was
intended to zero sensitive data before exit from encryption
functions to harden the code against theoretical information
leaks. Unfortunately, the method by which it did that is
optimized away by the compiler, so some information still leaks. This
was confirmed by counting function calls in disassembly.

After studying how the OpenBSD, FreeBSD and Linux kernels handle this,
and looking at our disassembly, I decided on a two-factor approach to
protect us from compiler dead store elimination passes.

The first factor is to stop trying to inline gcm_clear_ctx(). GCC does
not actually inline it in the first place, and testing suggests that
dead store elimination passes appear to become more powerful in a bad
way when inlining is forced, so we recognize that and move
gcm_clear_ctx() to a C file.

The second factor is to implement an explicit_memset() function based on
the technique used by `secure_zero_memory()` in FreeBSD's blake2
implementation, which coincidentally is functionally identical to the
one used by Linux. The source for this appears to be a LLVM bug:

https://llvm.org/bugs/show_bug.cgi?id=15495

Unlike both FreeBSD and Linux, we explicitly avoid the inline keyword,
based on my observations that GCC's dead store elimination pass becomes
more powerful when inlining is forced, under the assumption that it will
be equally powerful when the compiler does decide to inline function
calls.

Disassembly of GCC's output confirms that all 6 memset() calls are
executed with this patch applied.

Reviewed-by: Attila Fülöp <attila@fueloep.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14544
2023-02-28 17:28:50 -08:00
Richard Yao 2f76797ad9 Linux: Assert mutex is held in mutex_exit()
A spurious mutex_exit() in a development branch caused weird issues
until I identified it. An assertion prior to mutex_exit() would have
caught it. Rather than adding assertions before invocations of
mutex_exit() in the code, let us simply add an assertion to
mutex_exit(). It is cheap and will likely improve developer
productivity.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Richard Yao <richard.yao@klarasystems.com>
Sponsored-By: Wasabi Technology, Inc.
Closes #14541
2023-02-28 17:27:20 -08:00
George Amanakis 13ff72ba0a Revert zfeature_active() to static
Commit 34ce4c4 made zfeature_active() non-static. This is not required.

Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #14546
2023-02-28 14:03:52 -08:00
Tino Reichardt 727339b118 Fix github build failures because of vdevprops.7
This small fix adds the manpage vdevprops.7 to the file
contrib/debian/openzfs-zfsutils.install and the github
actions will work again.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #14553
2023-02-28 14:02:48 -08:00
John Poduska 73c383f541 Prevent incorrect datasets being mounted
During a mount, zpl_mount_impl(), uses sget() with the callback
zpl_test_super() to find a super_block with a matching objset,
stored in z_os. It does so without taking the teardown lock on
the zfsvfs.

The problem is that operations like rollback will replace the
z_os. And, there is a window where the objset in the rollback
is freed, but z_os still points to it. Then, a mount like
operation, for instance a clone, can reallocate that exact same
pointer and zpl_test_super() will then match the super_block
associated with the rollback as opposed to the clone.

This fix tests for a match and if so, takes the teardown lock
before doing the final match test.

Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: John Poduska <jpoduska@datto.com>
Closes #14518
2023-02-27 16:49:34 -08:00
D. Ebdrup 4b9bc6345e Add vdevprops.7 to the Makefile
Adding vdevprops.7 to the Makefile ensures that it gets installed
properly on FreeBSD.

Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Daniel Ebdrup Jensen <debdrup@FreeBSD.org>
Closes #14527
2023-02-27 16:38:09 -08:00
Richard Yao bff26b0220 Skip memory allocation when compressing holes
Hole detection in the zio compression code allows us to
opportunistically skip compression on holes. We can go a step further
by not doing memory allocations on holes either.

Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Richard Yao <richard.yao@klarasystems.com>
Sponsored-by: Wasabi Technology, Inc.
Closes #14500
2023-02-27 14:41:02 -08:00
Attila Fülöp f58e513f74 ICP: AES-GCM: Refactor gcm_clear_ctx()
Currently the temporary buffer in which decryption takes place
isn't cleared on context destruction. Further in some routines we
fail to call gcm_clear_ctx() on error exit. Both flaws may result
in leaking sensitive data.

We follow best practices and zero out the plaintext buffer before
freeing the memory holding it. Also move all cleanup into
gcm_clear_ctx() and call it on any context destruction.

The performance impact should be negligible.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Attila Fülöp <attila@fueloep.org>
Closes #14528
2023-02-27 14:38:12 -08:00
Mariusz Zaborski 3b9309aabe Move zap_attribute_t to the heap in dsl_deadlist_merge
In the case of a regular compilation, the compiler
raises a warning for a dsl_deadlist_merge function, that
the stack size is to large. In debug build this can
generate an error.

Move large structures to heap.

Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mariusz Zaborski <mariusz.zaborski@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Closes #14524
2023-02-27 14:27:58 -08:00
Brian Behlendorf 000985fc15 Workaround GitHub Action failure
Ubuntu 20.04 and 22.04 workflows are failing due to an error
which is hit when running `apt-get update`.  Until the
problematic package is fixed apply the suggested workaround
described here:

  https://github.com/orgs/community/discussions/47863

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #14530
2023-02-27 09:19:25 -08:00
Dimitry Andric bf1bec394e Use .section .rodata instead of .rodata on FreeBSD
In commit 0a5b942d4 the FreeBSD SECTION_STATIC macro was set to
".rodata". This assembler directive is supported by LLVM (as a
convenience alias for ".section .rodata") by not by GNU as.

This caused the FreeBSD builds that are done with gcc to fail.
Therefore, use ".section .rodata" instead, similar to the other
asm_linkage.h headers.

Reviewed-by: Mateusz Guzik <mjguzik@gmail.com>
Reviewed-by: Attila Fülöp <attila@fueloep.org>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Signed-off-by: Dimitry Andric <dimitry@andric.com>
Closes #14526
2023-02-24 16:45:48 -08:00
George Amanakis d816bc5ec7 Move dmu_buf_rele() after dsl_dataset_sync_done()
Otherwise the dataset may be freed after the last dmu_buf_rele() leading
to a panic.

Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #14522
Closes #14523
2023-02-23 18:14:52 -07:00
Brian Behlendorf 6109d83df8 ZTS: Minor fixes
- The migration_012_pos.ksh test case was failing because of a
  missing space after `log_must`.

- None of the tests listed in the runfiles should include the .ksh
  suffix.

Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #14515
2023-02-23 17:10:46 -08:00
Brian Behlendorf 89cd2197b9 Fix buffered/direct/mmap I/O race
When a page is faulted in for memory mapped I/O the page lock
may be dropped before it has been read and marked up to date.
If a buffered read encounters such a page in mappedread() it
must wait until the page has been updated. Failure to do so
will result in a panic on debug builds and incorrect data on
production builds.

The critical part of this change is in mappedread() where pages
which are not up to date are now handled. Additionally, it
includes the following simplifications.

- zfs_getpage() and zfs_fillpage() could be passed an array of
  pages. This could be more efficient if it was used but in
  practice only a single page was ever provided. These
  interfaces were simplified to acknowledge that.

- update_pages() was modified to correctly set the PG_error bit
  on a page when it cannot be read by dmu_read().

- Setting PG_error and PG_uptodate was moved to zfs_fillpage()
  from zpl_readpage_common(). This is consistent with the
  handling in update_pages() and mappedread().

- Minor additional refactoring to comments and variable
  declarations to improve readability.

- Add a test case to exercise concurrent buffered, direct,
  and mmap IO to the same file.

- Reduce the mmap_sync test case default run time.

Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13608 
Closes #14498
2023-02-23 10:57:24 -08:00
Richard Yao 7cb67d627c Fix NULL pointer dereference in zio_ready()
Clang's static analyzer correctly identified a NULL pointer dereference
in zio_ready() when ZIO_FLAG_NODATA has been set on a zio that is
missing a block pointer. The NULL pointer dereference occurs because we
have logic intended to disable ZIO_FLAG_NODATA when it has been set on a
gang block.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14469
2023-02-23 10:19:08 -08:00
Richard Yao c9e39da9a4 Use rw_tryupgrade() in dmu_bonus_hold_by_dnode()
When dn->dn_bonus == NULL, dmu_bonus_hold_by_dnode() will unlock its
read lock on dn->dn_struct_rwlock and grab a write lock. This can be
micro-optimized by calling rw_tryupgrade().

Linux will not benefit from this since it does not support rwlock
upgrades, but FreeBSD will.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14517
2023-02-22 16:33:23 -08:00
Paul Dagnelie d9e64a4030 Improve error message of zfs redact
We improve the error message of zfs redact by checking if the target 
snapshot exists, and if all the redaction snapshots exist. As a
future improvement we could iterate over every snapshot provided and 
use that to determine which one specifically doesn't exist.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #11426 
Closes #14496
2023-02-21 17:30:05 -08:00
rob-wing 28251d81d7 FreeBSD: don't verify recycled vnode for zfs control directory
Under certain loads, the following panic is hit:

    panic: page fault
    KDB: stack backtrace:
    #0 0xffffffff805db025 at kdb_backtrace+0x65
    #1 0xffffffff8058e86f at vpanic+0x17f
    #2 0xffffffff8058e6e3 at panic+0x43
    #3 0xffffffff808adc15 at trap_fatal+0x385
    #4 0xffffffff808adc6f at trap_pfault+0x4f
    #5 0xffffffff80886da8 at calltrap+0x8
    #6 0xffffffff80669186 at vgonel+0x186
    #7 0xffffffff80669841 at vgone+0x31
    #8 0xffffffff8065806d at vfs_hash_insert+0x26d
    #9 0xffffffff81a39069 at sfs_vgetx+0x149
    #10 0xffffffff81a39c54 at zfsctl_snapdir_lookup+0x1e4
    #11 0xffffffff8065a28c at lookup+0x45c
    #12 0xffffffff806594b9 at namei+0x259
    #13 0xffffffff80676a33 at kern_statat+0xf3
    #14 0xffffffff8067712f at sys_fstatat+0x2f
    #15 0xffffffff808ae50c at amd64_syscall+0x10c
    #16 0xffffffff808876bb at fast_syscall_common+0xf8

The page fault occurs because vgonel() will call VOP_CLOSE() for active
vnodes. For this reason, define vop_close for zfsctl_ops_snapshot. While
here, define vop_open for consistency.

After adding the necessary vop, the bug progresses to the following
panic:

    panic: VERIFY3(vrecycle(vp) == 1) failed (0 == 1)
    cpuid = 17
    KDB: stack backtrace:
    #0 0xffffffff805e29c5 at kdb_backtrace+0x65
    #1 0xffffffff8059620f at vpanic+0x17f
    #2 0xffffffff81a27f4a at spl_panic+0x3a
    #3 0xffffffff81a3a4d0 at zfsctl_snapshot_inactive+0x40
    #4 0xffffffff8066fdee at vinactivef+0xde
    #5 0xffffffff80670b8a at vgonel+0x1ea
    #6 0xffffffff806711e1 at vgone+0x31
    #7 0xffffffff8065fa0d at vfs_hash_insert+0x26d
    #8 0xffffffff81a39069 at sfs_vgetx+0x149
    #9 0xffffffff81a39c54 at zfsctl_snapdir_lookup+0x1e4
    #10 0xffffffff80661c2c at lookup+0x45c
    #11 0xffffffff80660e59 at namei+0x259
    #12 0xffffffff8067e3d3 at kern_statat+0xf3
    #13 0xffffffff8067eacf at sys_fstatat+0x2f
    #14 0xffffffff808b5ecc at amd64_syscall+0x10c
    #15 0xffffffff8088f07b at fast_syscall_common+0xf8

This is caused by a race condition that can occur when allocating a new
vnode and adding that vnode to the vfs hash. If the newly created vnode
loses the race when being inserted into the vfs hash, it will not be
recycled as its usecount is greater than zero, hitting the above
assertion.

Fix this by dropping the assertion.

FreeBSD-issue: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=252700
Reviewed-by: Andriy Gapon <avg@FreeBSD.org>
Reviewed-by: Mateusz Guzik <mjguzik@gmail.com>
Reviewed-by: Alek Pinchuk <apinchuk@axcient.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Rob Wing <rob.wing@klarasystems.com>
Co-authored-by: Rob Wing <rob.wing@klarasystems.com>
Submitted-by: Klara, Inc.
Sponsored-by: rsync.net
Closes #14501
2023-02-21 17:26:33 -08:00
Allan Jude 1d56c6d017 Fix per-jail zfs.mount_snapshot setting
When jail.conf set the nopersist flag during startup, it was
incorrectly destroying the per-jail ZFS settings.

Reported-by: Martin Matuska <mm@FreeBSD.org>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Sponsored-by: Modirum MDPay
Sponsored-by: Klara, Inc.
Closes #14509
2023-02-21 17:23:01 -08:00
George Amanakis 0f32b1f728 Partially revert eee9362a7
With commit 34ce4c42f applied, there is no need for eee9362a7.
Revert that aside from the test. All tests introduced in those commits
pass.

Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #14502
2023-02-21 09:36:22 -08:00
Richard Yao 9f08b6e31f Sync thread should avoid holding the spa config write lock when possible
spa_sync() currently grabs the write lock due to an old hack that is
documented by a comment:

    We need the write lock here because, for aux vdevs,
    calling vdev_config_dirty() modifies sav_config.
    This is ugly and will become unnecessary when we
    eliminate the aux vdev wart by integrating all vdevs
    into the root vdev tree.
 
This has lead to deadlocks in rare edge cases from holding the write 
lock. We can reduce incidence of these deadlocks by not grabbing the 
write lock on pools without auxillary vdevs.

Sponsored-By: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@klarasystems.com>
Closes #14282
2023-02-16 14:10:52 -08:00
Paul Dagnelie dc72c60ec1 zfs redact fails when dnodesize=auto
Add handling to dmu_object_next for the case where *objectp == 0.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #14479
2023-02-16 09:23:39 -08:00
Brian Behlendorf 57cfae4a2f zdb: zero-pad checksum output follow up
Apply zero padding for checksums consistently.  The SNPRINTF_BLKPTR
macro was not updated in commit ac7648179c which results in the
`cli_root/zdb/zdb_checksum.ksh` test case reliably failing.

Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Akash B <akash-b@hpe.com>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #14497
2023-02-15 09:06:29 -08:00
Richard Yao f04cb31e7c Suppress Clang static analyzer complaint in zfs_replay_create()
Clang's static analyzer incorrectly complains about an undefined value
here when lr->lr_common.lrc_txtype == TX_SYMLINK and txtype ==
TX_CREATE. This is impossible, because of this line:

txtype = (lr->lr_common.lrc_txtype & ~TX_CI((uint64_t)0x1 << 63));

Changing the code to compare against txtype suppresses the report.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14472
2023-02-14 11:05:41 -08:00
Brian Behlendorf 3fc92adc40 Linux: use filemap_range_has_page()
As of the 4.13 kernel filemap_range_has_page() can be used to
check if there is a page mapped in a given file range.  When
available this interface should be used which eliminates the
need for the zp->z_is_mapped boolean.

Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #14493
2023-02-14 11:04:34 -08:00
Richard Yao ab672133a9 Give strlcat() full buffer lengths rather than smaller buffer lengths
strlcat() is supposed to be given the length of the destination buffer,
including the existing contents. Unfortunately, I had been overzealous
when I wrote a51288aabb, since I gave it
the length of the destination buffer, minus the existing contents. This
likely caused a regression on large strings.

On the topic of being overzealous, the use of strlcat() in
dmu_send_estimate_fast() was unnecessary because recv_clone_name is a
fixed length string. We continue using strlcat() mostly as defensive
programming, in case the string length is ever changed, even though it
is unnecessary.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14476
2023-02-14 11:03:42 -08:00
Rich Ercolani cfd57573ff quick fix for lingering snapdir unmount problems
Unfortunately, even after e79b6807, I still, much more rarely,
tripped asserts when playing with many ctldir mounts at once.

Since this appears to happen if we dispatched twice too fast, just
ignore it. We don't actually need to do anything if someone already
started doing it for us.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #14462
2023-02-13 16:40:13 -08:00
George Amanakis 34ce4c42ff Fix a race condition in dsl_dataset_sync() when activating features
The zio returned from arc_write() in dmu_objset_sync() uses
zio_nowait(). However we may reach the end of dsl_dataset_sync()
which checks if we need to activate features in the filesystem
without knowing if that zio has even run through the ZIO pipeline yet.
In that case we will flag features to be activated in
dsl_dataset_block_born() but dsl_dataset_sync() has already
completed its run and those features will not actually be activated.
Mitigate this by moving the feature activation code in
dsl_dataset_sync_done(). Also add new ASSERTs in
dsl_scan_visitbp() checking if a block contradicts any filesystem
flags.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #13816
2023-02-13 16:37:46 -08:00
Prakash Surya 13312e2fa1 Reduce need for contiguous memory for ioctls
We've had cases where we trigger an OOM despite having memory freely
available on the system. For example, here, we had about 21GB free:

    kernel: Node 0 Normal: 2418758*4kB (UME) 1549533*8kB (UE) 0*16kB
    0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB =
    22071296kB

The problem being, all the memory is in 4K and 8K contiguous regions,
but the allocation request was for a 16K contiguous region:

    kernel: SafeExecutors-4 invoked oom-killer:
    gfp_mask=0x42dc0(GFP_KERNEL|__GFP_NOWARN|__GFP_COMP|__GFP_ZERO),
    order=2, oom_score_adj=0

The offending allocation came from this call trace:

    kernel: Call Trace:
    kernel:  dump_stack+0x57/0x7a
    kernel:  dump_header+0x4f/0x1e1
    kernel:  oom_kill_process.cold.33+0xb/0x10
    kernel:  out_of_memory+0x1ad/0x490
    kernel:  __alloc_pages_slowpath+0xd55/0xe40
    kernel:  __alloc_pages_nodemask+0x2df/0x330
    kernel:  kmalloc_large_node+0x42/0x90
    kernel:  __kmalloc_node+0x25a/0x320
    kernel:  ? spl_kmem_free_impl+0x21/0x30 [spl]
    kernel:  spl_kmem_alloc_impl+0xa5/0x100 [spl]
    kernel:  spl_kmem_zalloc+0x19/0x20 [spl]
    kernel:  zfsdev_ioctl+0x2b/0xe0 [zfs]
    kernel:  do_vfs_ioctl+0xa9/0x640
    kernel:  ? __audit_syscall_entry+0xdd/0x130
    kernel:  ksys_ioctl+0x67/0x90
    kernel:  __x64_sys_ioctl+0x1a/0x20
    kernel:  do_syscall_64+0x5e/0x200
    kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xa9
    kernel: RIP: 0033:0x7fdca3674317

The problem is, for each ioctl that ZFS makes, it has to allocate a
zfs_cmd_t structure, which is 13744 bytes in size (on my system):

    sdb> sizeof zfs_cmd
    (size_t)13744

This size, coupled with the fact that we currently allocate it with
kmem_zalloc, means we need a 16K contiguous region of memory to satisfy
the request.

The solution taken by this change, is to use "vmem" instead of "kmem" to
do the allocation, such that we don't necessarily need a contiguous 16K
memory region to satisfy the allocation.

Arguably, a better solution would be not to require such a large
allocation to begin with (e.g. reduce the size of the zfs_cmd_t
structure), but that'd be a much larger change than this "one liner".
Thus, I've opted for this approach for now; we can always circle back
and attempt to reduce the size of the structure in the future.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Reviewed-by: Don Brady <don.brady@delphix.com>
Signed-off-by: Prakash Surya <prakash.surya@delphix.com>
Closes #14474
2023-02-13 16:35:59 -08:00
Alexander Motin 87a4dfa561 Improve arc_read() error reporting
Debugging reported NULL de-reference panic in dnode_hold_impl() I found
that for certain types of errors arc_read() may only return error code,
but not properly report it via done and pio arguments.  Lack of done
calls may result in reference and/or memory leaks in higher level code.
Lack of error reporting via pio may result in unnoticed errors there.
For example, dbuf_read(), where dbuf_read_impl() ignores arc_read()
return, relies completely on the pio mechanism and missed the errors.

This patch makes arc_read() to always call done callback and always
propagate errors to parent zio, if either is provided.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #14454
2023-02-13 13:21:53 -08:00
Andreas Vögele 7883ea2234 rpm: Use libtirpc-devel and /usr/lib on SUSE
SUSE Linux distributions require libtirpc-devel.  The dracut and udev
directories are /usr/lib/dracut and /usr/lib/udev.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: Andreas Vögele <andreas@andreasvoegele.com>
Closes #14467
Closes #14468
2023-02-09 11:57:50 -08:00
Rob N ★ ac7648179c zdb: zero-pad checksum output
The leading zeroes are part of the checksum so we should show them.

Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #14464
2023-02-07 13:48:22 -08:00
Ryan Moeller eb823cbc76 initramfs: Make mountpoint=none work
In initramfs, mount.zfs fails to mount a dataset with mountpoint=none,
but mount.zfs -o zfsutil works.  Use -o zfsutil when mountpoint=none.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #14455
2023-02-06 11:16:01 -08:00
Richard Yao 66953686c0 Add assertion and make variables unsigned in abd_alloc_chunks()
Clang's static analyzer pointed out that if alloc_pages >= nr_pages
before the loop, the value of page will be undefined and will be used
anyway. This should not be possible, but as cleanup, we add an
assertion. We also recognize that the local variables should be unsigned
in the first place, so we make them unsigned. This is not enough to
avoid the need for the assertion, since there is still the case that
alloc_pages == nr_pages and nr_pages == 0, which the assertion
implicitly checks.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14456
2023-02-06 11:10:50 -08:00
Richard Yao cfb49616cd Cleanup: spa vdev processing should check NULL pointers
The PVS Studio 2016 FreeBSD kernel report stated:

\contrib\opensolaris\uts\common\fs\zfs\spa.c (1341): error V595: The 'spa->spa_spares.sav_vdevs' pointer was utilized before it was verified against nullptr. Check lines: 1341, 1342.
\sys\cddl\contrib\opensolaris\uts\common\fs\zfs\spa.c (1355): error V595: The 'spa->spa_l2cache.sav_vdevs' pointer was utilized before it was verified against nullptr. Check lines: 1355, 1357.
\sys\cddl\contrib\opensolaris\uts\common\fs\zfs\spa.c (1398): error V595: The 'spa->spa_spares.sav_vdevs' pointer was utilized before it was verified against nullptr. Check lines: 1398, 1408.
\sys\cddl\contrib\opensolaris\uts\common\fs\zfs\spa.c (1583): error V595: The 'oldvdevs' pointer was utilized before it was verified against nullptr. Check lines: 1583, 1595.

In practice, all of these uses were safe because a NULL pointer
implied a 0 vdev count, which kept us from iterating over vdevs.
However, rearranging the code to check the pointer first is not a
terrible micro-optimization and makes it more readable, so let us
do that.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14456
2023-02-06 11:09:28 -08:00
Richard Yao 3a7d2a0ce0 zfs_get_temporary_prop() should not pass NULL to strcpy()
`dsl_dir_activity_in_progress()` can call `zfs_get_temporary_prop()` with
the forth value set to NULL, which will pass NULL to `strcpy()` when
there is a match

Clang's static analyzer caught this with the help of CodeChecker for
Cross Translation Unit analysis.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14456
2023-02-06 11:08:57 -08:00
Matthew Ahrens 14872aaa4f EIO caused by encryption + recursive gang
Encrypted blocks can not have 3 DVAs, because they use the space of the
3rd DVA for the IV+salt.  zio_write_gang_block() takes this into
account, setting `gbh_copies` to no more than 2 in this case.  Gang
members BP's do not have the X (encrypted) bit set (nor do they have the
DMU level and type fields set), because encryption is not handled at
this level.  The gang block is reassembled, and then encryption (and
compression) are handled.

To check if this gang block is encrypted, the code in
zio_write_gang_block() checks `pio->io_bp`.  This is normally fine,
because the block that's being ganged is typically the encrypted BP.

The problem is that if there is "recursive ganging", where a gang member
is itself a gang block, then when zio_write_gang_block() is called to
create a gang block for a gang member, `pio->io_bp` is the gang member's
BP, which doesn't have the X bit set, so the number of DVA's is not
restricted to 2.  It should instead be looking at the the "gang leader",
i.e. the top-level gang block, to determine how many DVA's can be used,
to avoid a "NDVA's inversion" (where a child has more DVA's than its
parent).

gang leader BP: X (encrypted) bit set, 2 DVA's, IV+salt in 3rd DVA's
space:
```
DVA[0]=<1:...:100400> DVA[1]=<0:...:100400> salt=... iv=...
[L0 ZFS plain file] fletcher4 uncompressed encrypted LE
gang unique double size=100000L/100000P birth=... fill=1 cksum=...
```

leader's GBH contains a BP with gang bit set and 3 DVA's:
```
DVA[0]=<1:...:55600> DVA[1]=<0:...:55600>
[L0 unallocated] fletcher4 uncompressed unencrypted LE
contiguous unique double size=55600L/55600P birth=... fill=0 cksum=...

DVA[0]=<1:...:55600> DVA[1]=<0:...:55600>
[L0 unallocated] fletcher4 uncompressed unencrypted LE
contiguous unique double size=55600L/55600P birth=... fill=0 cksum=...

DVA[0]=<1:...:55600> DVA[1]=<0:...:55600> DVA[2]=<1:...:200>
[L0 unallocated] fletcher4 uncompressed unencrypted LE
gang unique double size=55400L/55400P birth=... fill=0 cksum=...
```

On nondebug bits, having the 3rd DVA in the gang block works for the
most part, because it's true that all 3 DVA's are available in the gang
member BP (in the GBH).  However, for accounting purposes, gang block
DVA's ASIZE include all the space allocated below them, i.e. the
512-byte gang block header (GBH) as well as the gang members below that.
We see that above where the gang leader BP is 1MB logical (and after
compression: 0x`100000P`), but the ASIZE of each DVA is 2 sectors (1KB)
more than 1MB (0x`100400`).

Since thre are 3 copies of a block below it, we increment the ATIME of
the 3rd DVA of the gang leader by the space used by the 3rd DVA of the
child (1 sector, in this case).  But there isn't really a 3rd DVA of the
parent; the salt is stored in place of the 3rd DVA's ASIZE.

So when zio_write_gang_member_ready() increments the parent's BP's
`DVA[2]`'s ASIZE, it's actually incrementing the parent's salt.  When we
later try to read the encrypted recursively-ganged block, the salt
doesn't match what we used to write it, so MAC verification fails and we
get an EIO.

```
zio_encrypt():  encrypted 515/2/0/403 salt: 25 25 bb 9d ad d6 cd 89
zio_decrypt(): decrypting 515/2/0/403 salt: 26 25 bb 9d ad d6 cd 89
```

This commit addresses the problem by not increasing the number of copies
of the GBH beyond 2 (even for non-encrypted blocks).  This simplifies
the logic while maintaining the ability to traverse all metadata
(including gang blocks) even if one copy is lost.  (Note that 3 copies
of the GBH will still be created if requested, e.g. for `copies=3` or
MOS blocks.)  Additionally, the code that increments the parent's DVA's
ASIZE is made to check the parent DVA's NDVAS even on nondebug bits.  So
if there's a similar bug in the future, it will cause a panic when
trying to write, rather than corrupting the parent BP and causing an
error when reading.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Caused-by: #14356
Closes #14440
Closes #14413
2023-02-06 09:37:06 -08:00
Jorgen Lundman 0a5b942d4a Restore FreeBSD to use .rodata
In https://github.com/openzfs/zfs/pull/14228 the FreeBSD
SECTION_STATIC was set to ".data" instead of ".rodata". This
commit just restores it back to .rodata.

Reviewed-by: Attila Fülöp <attila@fueloep.org>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Jorgen Lundman <lundman@lundman.net>
Closes #14460
2023-02-06 09:34:59 -08:00
Jorgen Lundman dffd40b3b6 Unify assembly files with macOS
The remaining changes needed to make the assembly files work
with macOS.

Reviewed-by: Attila Fülöp <attila@fueloep.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Jorgen Lundman <lundman@lundman.net>
Closes #14451
2023-02-06 09:27:55 -08:00
Reno Reckling 6017fd9377 Fix variable shadowing in libzfs_mount
We accidentally reused variable name "i" for inner and outer loops.

Reviewed-by: Rich Ercolani <Rincebrain@gmail.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: Reno Reckling <e-github@wthack.de>
Closes #14452 
Closes #14445
2023-02-02 15:22:12 -08:00
Paul Dagnelie 90f4e01f8a Prevent error messages when running tests with no timeout
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #14450
2023-02-02 15:19:26 -08:00
George Amanakis ac2038a19c Teach zdb about DMU_OT_ERROR_LOG objects
With the persistent error log feature we need to account for
spa_errlog_{scrub, last} containing mappings to other error log objects,
which need to be marked as in-use as well.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #14442 
Closes #14434
2023-02-02 15:17:37 -08:00
rob-wing 326f1e3d88 zfs_main.c: fix unused variable error with GCC
zfs_setproctitle_init() is stubbed out on FreeBSD.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: Rob Wing <rob.fx907@gmail.com>
Closes #14441
2023-02-02 15:16:40 -08:00
Allan Jude c799866b97 Resolve WS-2021-0184 vulnerability in zstd
Pull in d40f55cd950919d7eac951b122668e55e33e5202 from upstream

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Closes #14439
2023-02-02 15:12:51 -08:00
George Wilson f18e083bf8 rootdelay on zfs should be adaptive
The 'rootdelay' boot option currently pauses the boot for a specified
amount of time. The original intent was to ensure that slower
configurations would have ample time to enumerate the devices to make
importing the root pool successful. This, however, causes unnecessary
boot delay for environments like Azure which set this parameter by
default.

This commit changes the initramfs logic to pause until it can
successfully load the 'zfs' module. The timeout specified by
'rootdelay' now becomes the maximum amount of time that initramfs will
wait before failing the boot.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Signed-off-by: George Wilson <gwilson@delphix.com>
Closes #14430
2023-02-02 15:11:35 -08:00
Ameer Hamza 05b72415d1 Fix console progress reporting for recursive send
After commit 19d3961, progress reporting (-v) with replication flag
enabled does not report the progress on the console. This commit
fixes the issue by updating the logic to check for pa->progress
instead of pa_verbosity in send_progress_thread().

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #14448
2023-02-02 15:09:57 -08:00
Brian Behlendorf 973934b965 Increase default zfs_rebuild_vdev_limit to 64MB
When testing distributed rebuild performance with more capable
hardware it was observed than increasing the zfs_rebuild_vdev_limit
to 64M reduced the rebuild time by 17%.  Beyond 64MB there was
some improvement (~2%) but it was not significant when weighed
against the increased memory usage. Memory usage is capped at 1/4
of arc_c_max.

Additionally, vr_bytes_inflight_max has been moved so it's updated
per-metaslab to allow the size to be adjust while a rebuild is
running.

Reviewed-by: Akash B <akash-b@hpe.com>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #14428
2023-01-27 10:02:24 -08:00
Brian Behlendorf c0aea7cf4e Increase default zfs_scan_vdev_limit to 16MB
For HDD based pools the default zfs_scan_vdev_limit of 4M
per-vdev can significantly limit the maximum scrub performance.
Increasing the default to 16M can double the scrub speed from
80 MB/s per disk to 160 MB/s per disk.

This does increase the memory footprint during scrub/resilver
but given the performance win this is a reasonable trade off.
Memory usage is capped at 1/4 of arc_c_max.  Note that number
of outstanding I/Os has not changed and is still limited by
zfs_vdev_scrub_max_active.

Reviewed-by: Akash B <akash-b@hpe.com>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #14428
2023-01-27 10:01:13 -08:00
Alexander Motin dc5c8006f6 Prefetch on deadlists merge
During snapshot deletion ZFS may issue several reads for each deadlist
to merge them into next snapshot's or pool's bpobj.  Number of the dead
lists increases with number of snapshots.  On HDD pools it may take
significant time during which sync thread is blocked.

This patch introduces prescient prefetch of required blocks for up to
128 deadlists ahead.  Tests show reduction of time required to delete
dataset with 720 snapshots with randomly overwritten file on wide HDD
pool from 75-85 to 22-28 seconds.

Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Issue #14276 
Closes #14402
2023-01-25 11:30:24 -08:00
Brian Behlendorf c85ac731a0 Improve resilver ETAs
When resilvering the estimated time remaining is calculated using
the average issue rate over the current pass.  Where the current
pass starts when a scan was started, or restarted, if the pool
was exported/imported.

For dRAID pools in particular this can result in wildly optimistic
estimates since the issue rate will be very high while scanning
when non-degraded regions of the pool are scanned.  Once repair
I/O starts being issued performance drops to a realistic number
but the estimated performance is still significantly skewed.

To address this we redefine a pass such that it starts after a
scanning phase completes so the issue rate is more reflective of
recent performance.  Additionally, the zfs_scan_report_txgs
module option can be set to reset the pass statistics more often.

Reviewed-by: Akash B <akash-b@hpe.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #14410
2023-01-25 11:28:54 -08:00
Coleman Kane 9cd71c8604 linux 6.2 compat: zpl_set_acl arg2 is now struct dentry
Linux 6.2 changes the second argument of the set_acl operation to be a
"struct dentry *" rather than a "struct inode *". The inode* parameter
is still available as dentry->d_inode, so adjust the call to the _impl
function call to dereference and pass that pointer to it.

Also document that the get_acl -> get_inode_acl member name change from
commit 884a693 was an API change also introduced in Linux 6.2.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #14415
2023-01-24 11:20:50 -08:00
Alexander Motin 0f740a4f1d Introduce minimal ZIL block commit delay
Despite all optimizations, tests on actual hardware show that FreeBSD
kernel can't sleep for less then ~2us.  Similar tests on Linux show
~50us delay at least from nanosleep() (haven't tested inside kernel).
It means that on very fast log device ZIL may not be able to satisfy
zfs_commit_timeout_pct block commit timeout, increasing log latency
more than desired.

Handle that by introduction of zil_min_commit_timeout parameter,
specifying minimal timeout value where additional delays to aggregate
writes may be skipped.  Also skip delays if the LWB is more than 7/8
full, that often happens if I/O sizes are constant and match one of
LWB sizes.  Both things are applied only if there were no already
outstanding log blocks, that may indicate single-threaded workload,
that by definition can not benefit from the commit delays.

While there, add short time moving average to zl_last_lwb_latency to
make it more stable.

Tests of single-threaded 4KB writes to NVDIMM SLOG on FreeBSD show IOPS
increase by 9% instead of expected 5%.  For zfs_commit_timeout_pct of
1 there IOPS increase by 5.5% instead of expected 1%.

Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #14418
2023-01-24 09:20:32 -08:00
Attila Fülöp 037e4f2536 x86 asm: Replace .align with .balign
The .align directive used to align storage locations is
ambiguous. On some platforms and assemblers it takes a byte count,
on others the argument is interpreted as a shift value. The current
usage expects the first interpretation.

Replace it with the unambiguous .balign directive which always
expects a byte count, regardless of platform and assembler.

Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: Attila Fülöp <attila@fueloep.org>
Closes #14422
2023-01-24 09:04:39 -08:00
Attila Fülöp 58ca7b1011 IPC: blake3 x86 asm: fix placement of .size directives
The .size directive used by the SET_SIZE C macro uses the special
dot symbol to calculate the size of a function. The dot symbol
refers to the current address, so for the calculation to be
meaningful the SET_SIZE macro must be placed immediately after the
end of the function the size is calculated for.

Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: Attila Fülöp <attila@fueloep.org>
Closes #14422
2023-01-24 09:03:31 -08:00
David Hedberg 37a27b4306 Wait for txg sync if the last DRR_FREEOBJECTS might result in a hole
If we receive a DRR_FREEOBJECTS as the first entry in an object range,
this might end up producing a hole if the freed objects were the
only existing objects in the block.

If the txg starts syncing before we've processed any following
DRR_OBJECT records, this leads to a possible race where the backing
arc_buf_t gets its psize set to 0 in the arc_write_ready() callback
while still being referenced from a dirty record in the open txg.

To prevent this, we insert a txg_wait_synced call if the first
record in the range was a DRR_FREEOBJECTS that actually
resulted in one or more freed objects.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: David Hedberg <david.hedberg@findity.com>
Sponsored by: Findity AB
Closes #11893
Closes #14358
2023-01-23 13:19:43 -08:00
Richard Yao 73968defdd Reject streams that set ->drr_payloadlen to unreasonably large values
In the zstream code, Coverity reported:

"The argument could be controlled by an attacker, who could invoke the
function with arbitrary values (for example, a very high or negative
buffer size)."

It did not report this in the kernel. This is likely because the
userspace code stored this in an int before passing it into the
allocator, while the kernel code stored it in a uint32_t.

However, this did reveal a potentially real problem. On 32-bit systems
and systems with only 4GB of physical memory or less in general, it is
possible to pass a large enough value that the system will hang. Even
worse, on Linux systems, the kernel memory allocator is not able to
support allocations up to the maximum 4GB allocation size that this
allows.

This had already been limited in userspace to 64MB by
`ZFS_SENDRECV_MAX_NVLIST`, but we need a hard limit in the kernel to
protect systems. After some discussion, we settle on 256MB as a hard
upper limit. Attempting to receive a stream that requires more memory
than that will result in E2BIG being returned to user space.

Reported-by: Coverity (CID-1529836)
Reported-by: Coverity (CID-1529837)
Reported-by: Coverity (CID-1529838)
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14285
2023-01-23 13:16:22 -08:00
rob-wing 69f024a56e Configure zed's diagnosis engine with vdev properties
Introduce four new vdev properties:
    checksum_n
    checksum_t
    io_n
    io_t

These properties can be used for configuring the thresholds of zed's
diagnosis engine and are interpeted as <N> events in T <seconds>.

When this property is set to a non-default value on a top-level vdev,
those thresholds will also apply to its leaf vdevs. This behavior can be
overridden by explicitly setting the property on the leaf vdev.

Note that, these properties do not persist across vdev replacement. For
this reason, it is advisable to set the property on the top-level vdev
instead of the leaf vdev.

The default values for zed's diagnosis engine (10 events, 600 seconds)
remains unchanged.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Rob Wing <rob.wing@klarasystems.com>
Sponsored-by: Seagate Technology LLC
Closes #13805
2023-01-23 13:14:25 -08:00
Richard Yao f091db9248 free_blocks(): Fix reports from 2016 PVS Studio FreeBSD report
In 2016, the authors of PVS Studio ran it on the FreeBSD kernel, which
identified a number of bugs / cleanup opportunities in the FreeBSD ZFS kernel
code. A few of them persist to the present day:

https://reviews.freebsd.org/D5245

Note that the scan was done against
freebsd/freebsd-src@46763fd4ca.

In particular, we have the following in free_blocks():

\sys\cddl\contrib\opensolaris\uts\common\fs\zfs\dnode_sync.c (174): error V547: Expression '__left >= __right' is always true. Unsigned type value is always >= 0.
\sys\cddl\contrib\opensolaris\uts\common\fs\zfs\dnode_sync.c (171): error V634: The priority of the '*' operation is higher than that of the '<<' operation. It's possible that parentheses should be used in the expression.
\sys\cddl\contrib\opensolaris\uts\common\fs\zfs\dnode_sync.c (175): error V547: Expression '__left >= __right' is always true. Unsigned type value is always >= 0.

A couple of assertions accidentally typecast the arguments they check to
unsigned in such a way that the result is always true. Also, parentheses
are missing around `1<<epbs` in `(db->db_blkid * 1<<epbs)`. This works
out to be okay due to multiplication not caring what order of operations
we use, but it is better to fix it to be `(db->db_blkid << epbs)`.

A few of the function local variables probably never should have been
32-bit in the first place, so we make them 64-bit. We also replace the
existing assertions with additional assertions to ensure that 64-bit
unsigned arithmetic is safe.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14407
2023-01-23 13:12:37 -08:00
Chunwei Chen 71974946be Fix reading uninitialized variable in receive_read
When zfs_file_read returns error, resid may be uninitialized.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: Chunwei Chen <david.chen@nutanix.com>
Closes #14404
2023-01-20 11:49:56 -08:00
Allan Jude 63267f7f77 zfs_receive_one: Check for the more likely error first
If zfs_receive_one() gets back EINVAL, check for the more likely case,
embedded block pointers + encryption and return that error, before
falling back to the less likely case, a resumable stream when the
kernel has not been upgraded to support resume.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Sponsored-by: rsync.net
Sponsored-by: Klara Inc.
Closes #14379
2023-01-20 11:11:54 -08:00
Richard Yao 856cefcd1c Cleanup ->dd_space_towrite should be unsigned
This is only ever used with unsigned data, so the type itself should be
unsigned. Also, PVS Studio's 2016 FreeBSD kernel report correctly
identified the following assertion as always being true, so we can drop
it:

ASSERT3U(dd->dd_space_towrite[i & TXG_MASK], >=, 0);

The reason it was always true is because it would do casts to give us
unsigned comparisons. This could have been fixed by switching to
`ASSERT3S()`, but upon inspection, it turned out that this variable
never should have been allowed to be signed in the first place.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14408
2023-01-20 11:10:15 -08:00
Mark Johnston ebabb93e6c Micro-optimize dsl_prop_get_dd()
Use the saved property index instead of looking it up once per DSL
directory when traversing up towards the root.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Akash B <akash-b@hpe.com>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Sponsored-by: The FreeBSD Foundation
Closes #14397
2023-01-20 11:01:41 -08:00
Mark Johnston 7c30100c00 Avoid passing an uninitialized index to dsl_prop_known_index
Reported-by: KMSAN
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Akash B <akash-b@hpe.com>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Sponsored-by: The FreeBSD Foundation
Closes #14397
2023-01-20 11:00:38 -08:00
Chunwei Chen c6dab6dd39 Fix unprotected zfs_znode_dmu_fini
In original code, zfs_znode_dmu_fini is called in zfs_rmnode without
zfs_znode_hold_enter. It seems to assume it's ok to do so when the znode
is unlinked. However this assumption is not correct, as zfs_zget can be
called by NFS through zpl_fh_to_dentry as pointed out by Christian in
https://github.com/openzfs/zfs/pull/12767, which could result in a
use-after-free bug.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Chunwei Chen <david.chen@nutanix.com>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #12767 
Closes #14364
2023-01-19 16:59:05 -08:00
George Melikov a379083d9f Man: fix defaults for zfs_dirty_data_max_max
It was changed in e99932f7de,
but without docs update.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #14400
2023-01-17 13:12:29 -08:00
Jorgen Lundman 68c0771cc9 Unify Assembler files between Linux and Windows
Add new macro ASMABI used by Windows to change
calling API to "sysv_abi".

Reviewed-by: Attila Fülöp <attila@fueloep.org>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Jorgen Lundman <lundman@lundman.net>
Closes #14228
2023-01-17 11:09:19 -08:00
Ameer Hamza 19d3961589 Use setproctitle to report progress of zfs send
This allows parsing of zfs send progress by checking the process
title.
Doing so requires some changes to the send code in libzfs_sendrecv.c;
primarily these changes move some of the accounting around, to allow
for the code to be verbose as normal, or set the process title. Unlike
BSD, setproctitle() isn't standard in Linux; thus, borrowed it from
libbsd with slight modifications.

Authored-by: Sean Eric Fagan <sef@FreeBSD.org>
Co-authored-by: Ryan Moeller <ryan@iXsystems.com>
Co-authored-by: Ameer Hamza <ahamza@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #14376
2023-01-17 10:17:35 -08:00
Richard Yao 2e7f664f04 Cleanup of dead code suggested by Clang Static Analyzer (#14380)
I recently gained the ability to run Clang's static analyzer on the
linux kernel modules via a few hacks. This extended coverage to code
that was previously missed since Clang's static analyzer only looked at
code that we built in userspace. Running it against the Linux kernel
modules built from my local branch produced a total of 72 reports
against my local branch. Of those, 50 were reports of logic errors and
22 were reports of dead code. Since we already had cleaned up all of
the previous dead code reports, I felt it would be a good next step to
clean up these dead code reports. Clang did a further breakdown of the
dead code reports into:

Dead assignment	15

Dead increment	2

Dead nested assignment	5

The benefit of cleaning these up, especially in the case of dead nested
assignment, is that they can expose places where our error handling is
incorrect. A number of them were fairly straight forward. However
several were not:

In vdev_disk_physio_completion(), not only were we not using the return
value from the static function vdev_disk_dio_put(), but nothing used it,
so I changed it to return void and removed the existing (void) cast in
the other area where we call it in addition to no longer storing it to a
stack value.

In FSE_createDTable(), the function is dead code. Its helper function
FSE_freeDTable() is also dead code, as are the CPP definitions in
`module/zstd/include/zstd_compat_wrapper.h`. We just delete it all.

In zfs_zevent_wait(), we have an optimization opportunity. cv_wait_sig()
returns 0 if there are waiting signals and 1 if there are none. The
Linux SPL version literally returns `signal_pending(current) ? 0 : 1)`
and FreeBSD implements the same semantics, we can just do
`!cv_wait_sig()` in place of `signal_pending(current)` to avoid
unnecessarily calling it again.

zfs_setattr() on FreeBSD version did not have error handling issue
because the code was removed entirely from FreeBSD version. The error is
from updating the attribute directory's files. After some thought, I
decided to propapage errors on it to userspace.

In zfs_secpolicy_tmp_snapshot(), we ignore a lack of permission from the
first check in favor of checking three other permissions. I assume this
is intentional.

In zfs_create_fs(), the return value of zap_update() was not checked
despite setting an important version number. I see no backward
compatibility reason to permit failures, so we add an assertion to catch
failures. Interestingly, Linux is still using ASSERT(error == 0) from
OpenSolaris while FreeBSD has switched to the improved ASSERT0(error)
from illumos, although illumos has yet to adopt it here. ASSERT(error ==
0) was used on Linux while ASSERT0(error) was used on FreeBSD since the
entire file needs conversion and that should be the subject of
another patch.

dnode_move()'s issue was caused by us not having implemented
POINTER_IS_VALID() on Linux. We have a stub in
`include/os/linux/spl/sys/kmem_cache.h` for it, when it really should be
in `include/os/linux/spl/sys/kmem.h` to be consistent with
Illumos/OpenSolaris. FreeBSD put both `POINTER_IS_VALID()` and
`POINTER_INVALIDATE()` in `include/os/freebsd/spl/sys/kmem.h`, so we
copy what it did.

Whenever a report was in platform-specific code, I checked the FreeBSD
version to see if it also applied to FreeBSD, but it was only relevant a
few times.

Lastly, the patch that enabled Clang's static analyzer to be run on the
Linux kernel modules needs more work before it can be put into a PR. I
plan to do that in the future as part of the on-going static analysis
work that I am doing.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14380
2023-01-17 09:57:12 -08:00
Brian Behlendorf 60f86a2bfa ZTS: Annotate additonal flaky test cases
Update several flaky test cases in zts-report.py.in until they
can be made entirely reliable.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #14392
2023-01-17 09:53:53 -08:00
Brian Behlendorf 07281f7d13 CI: Reclaim space after package operations
Rather than reclaiming space before updating the packages do
it afterwards.  This avoids issues with apt returning an
error due to missing files on the system.

This commit includes a revert for 6320b9e6.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #14387
2023-01-17 09:52:27 -08:00
Rob Wing 7a85f58db6 zpool-set: print error message when pool or vdev is not valid
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Wing <rob.wing@klarasystems.com>
Sponsored-by: Seagate Technology
Submitted-by: Klara, Inc.
Closes #14310
2023-01-17 09:47:24 -08:00
Rob Wing a0276f7048 zpool-set: update usage text
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Wing <rob.wing@klarasystems.com>
Sponsored-by: Seagate Technology
Submitted-by: Klara, Inc.
Closes #14310
2023-01-17 09:46:05 -08:00
Richard Yao d27c7ba62f Linux ppc64le ieee128 compat: Do not redefine __asm on external headers
There is an external assembly declaration extension in GNU C that glibc
uses when building with ieee128 floating point support on ppc64le.
Marking that as volatile makes no sense, so the build breaks.

It does not make sense to only mark this as volatile on Linux, since if
do not want the compiler reordering things on Linux, we do not want the
compiler reordering things on any other platform, so we stop treating
Linux specially and just manually inline the CPP macro so that we can
eliminate it. This should fix the build on ppc64le.

Tested-by: @gyakovlev 
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14308
Closes #14384
2023-01-13 10:58:58 -08:00
Richard Yao 4ef69de384 Cleanup: Use NULL when doing NULL pointer comparisons
The Linux 5.16.14 kernel's coccicheck caught this. The semantic
patch that caught it was:

./scripts/coccinelle/null/badzero.cocci

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14372
2023-01-12 16:00:37 -08:00
Richard Yao 64195fc89f Cleanup: Remove unneeded semicolons
The Linux 5.16.14 kernel's coccicheck caught this. The semantic
patch that caught it was:

./scripts/coccinelle/misc/semicolon.cocci

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14372
2023-01-12 16:00:30 -08:00
Richard Yao 3b2f9c1ec8 Cleanup: Use MIN() macro
The Linux 5.16.14 kernel's coccicheck caught this. The semantic
patch that caught it was:

./scripts/coccinelle/misc/minmax.cocci

There was a third opportunity to use `MIN()`, but that was in
`FSE_minTableLog()` in `module/zstd/lib/compress/fse_compress.c`.
Upstream zstd has yet to make this change and I did not want to change
header includes just for MIN, or do a one off, so I left it alone.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14372
2023-01-12 16:00:23 -08:00
Richard Yao e6328fda2e Cleanup: !A || A && B is equivalent to !A || B
In zfs_zaccess_dataset_check(), we have the following subexpression:

(!IS_DEVVP(ZTOV(zp)) ||
    (IS_DEVVP(ZTOV(zp)) && (v4_mode & WRITE_MASK_ATTRS)))

When !IS_DEVVP(ZTOV(zp)) is false, IS_DEVVP(ZTOV(zp)) is true under the
law of the excluded middle since we are not doing pseudoboolean alegbra.
Therefore doing:

(IS_DEVVP(ZTOV(zp)) && (v4_mode & WRITE_MASK_ATTRS))

Is unnecessary and we can just do:

(v4_mode & WRITE_MASK_ATTRS)

The Linux 5.16.14 kernel's coccicheck caught this. The semantic
patch that caught it was:

./scripts/coccinelle/misc/excluded_middle.cocci

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14372
2023-01-12 16:00:15 -08:00
Richard Yao 9c8fabffa2 Cleanup: Replace oldstyle struct hack with C99 flexible array members
The Linux 5.16.14 kernel's coccicheck caught this. The semantic
patch that caught it was:

./scripts/coccinelle/misc/flexible_array.cocci

However, unlike the cases where the GNU zero length array extension had
been used, coccicheck would not suggest patches for the older style
single member arrays. That was good because blindly changing them would
break size calculations in most cases.

Therefore, this required care to make sure that we did not break size
calculations. In the case of `indirect_split_t`, we use
`offsetof(indirect_split_t, is_child[is->is_children])` to calculate
size. This might be subtly wrong according to an old mailing list
thread:

https://inbox.sourceware.org/gcc-prs/20021226123454.27019.qmail@sources.redhat.com/T/

That is because the C99 specification should consider the flexible array
members to start at the end of a structure, but compilers prefer to put
padding at the end. A suggestion was made to allow compilers to allocate
padding after the VLA like compilers already did:

http://std.dkuug.dk/JTC1/SC22/WG14/www/docs/n983.htm

However, upon thinking about it, whether or not we allocate end of
structure padding does not matter, so using offsetof() to calculate the
size of the structure is fine, so long as we do not mix it with sizeof()
on structures with no array members.

In the case that we mix them and padding causes offsetof(struct_t,
vla_member[0]) to differ from sizeof(struct_t), we would be doing unsafe
operations if we underallocate via `offsetof()` and then overcopy via
sizeof().

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14372
2023-01-12 16:00:03 -08:00
Richard Yao d35ccc1f59 Cleanup: Fix indentation in zfs_dbgmsg_t
fdc2d30371 accidentally broke the
indentation.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14372
2023-01-12 15:59:56 -08:00
Richard Yao 8e7ebf4e2d Cleanup: Use C99 flexible array members instead of zero length arrays
The Linux 5.16.14 kernel's coccicheck caught this. The semantic
patch that caught it was:

./scripts/coccinelle/misc/flexible_array.cocci

The Linux kernel's documentation makes a good case for why we should not
use these:

https://www.kernel.org/doc/html/latest/process/deprecated.html#zero-length-and-one-element-arrays

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14372
2023-01-12 15:59:41 -08:00
Richard Yao c9c3ce7976 Cleanup: Use kmem_zalloc() instead of memset() to zero memory
The Linux 5.16.14 kernel's coccicheck caught this. The semantic patch
that caught it was:

./scripts/coccinelle/api/alloc/zalloc-simple.cocci

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14372
2023-01-12 15:59:28 -08:00
Richard Yao 7384ec65cd Cleanup: Remove unnecessary explicit casts of pointers from allocators
The Linux 5.16.14 kernel's coccicheck caught these. The semantic patch
that caught them was:

./scripts/coccinelle/api/alloc/alloc_cast.cocci

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14372
2023-01-12 15:59:12 -08:00
Gian-Carlo DeFazio 80d64bb85f change how d_alias is replaced by du.d_alias
d_alias may need to be converted to du.d_alias
depending on the kernel version.
d_alias is currently in only one place in the code which
changes
"hlist_for_each_entry(dentry, &inode->i_dentry, d_alias)"
to
"hlist_for_each_entry(dentry, &inode->i_dentry, d_u.d_alias)"
as neccesary.

This effectively results in a double macro expansion
for code that uses the zfs headers but already has its
own macro for just d_alias (lustre in this case).

Remove the conditional code for hlist_for_each_entry
and have a macro for "d_alias -> du.d_alias" instead.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Gian-Carlo DeFazio <defazio1@llnl.gov>
Closes #14377
2023-01-12 10:14:04 -08:00
George Amanakis eee9362a72 Activate filesystem features only in syncing context
When activating filesystem features after receiving a snapshot, do 
so only in syncing context.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #14304 
Closes #14252
2023-01-11 18:00:39 -08:00
Brian Behlendorf 6320b9e68e CI: remove unused packages/snaps
Removing portions of packages/snaps directly with rm can result in
unexpected errors when running `apt update`.  Free up the additional
space by removing (some) packages with the proper tools.

This change frees up slightly less space than before, but it is
expected to still be sufficient.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #14374
2023-01-11 15:18:51 -08:00
rob-wing 6f2ffd272c zpool: do guid-based comparison in is_vdev_cb()
is_vdev_cb() uses string comparison to find a matching vdev and 
will fallback to comparing the guid via a string.  These changes 
drop the string comparison and compare the guids instead.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Rob Wing <rob.wing@klarasystems.com>
Co-authored-by: Rob Wing <rob.wing@klarasystems.com>
Sponsored-by: Seagate Technology
Submitted-by: Klara, Inc.
Closes #14311
2023-01-11 15:14:35 -08:00
Mateusz Piotrowski 926715b9fc Turn default_bs and default_ibs into ZFS_MODULE_PARAMs
The default_bs and default_ibs tunables control the default block size
and indirect block size.

So far, default_bs and default_ibs were tunable only on FreeBSD, e.g.,

    sysctl vfs.zfs.default_ibs

Remove the FreeBSD-specific sysctl code and expose default_bs and
default_ibs as tunables on both Linux and FreeBSD using
ZFS_MODULE_PARAM.

One of the use cases for changing the values of those tunables is to
lower the indirect block size, which may improve performance of large
directories (as discussed during the OpenZFS Leadership Meeting
on 2022-08-16).

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: Mateusz Piotrowski <mateusz.piotrowski@klarasystems.com>
Sponsored-by: Wasabi Technology, Inc.
Closes #14293
2023-01-11 09:38:20 -08:00
Tony Hutter 4ba3eff2a6 Update META to 6.1 kernel
ZFS successfully builds against the 6.1.4 kernel.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #14371
2023-01-10 15:53:33 -08:00
Mateusz Piotrowski a4b21eadec Add tunable to allow changing micro ZAP's max size
This change turns `MZAP_MAX_BLKSZ` into a `ZFS_MODULE_PARAM()` called
`zap_micro_max_size`. As a result, we can experiment with different
micro ZAP sizes to improve directory size scaling.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Mateusz Piotrowski <mateuszpiotrowski@klarasystems.com>
Co-authored-by: Toomas Soome <toomas.soome@klarasystems.com>
Signed-off-by: Mateusz Piotrowski <mateuszpiotrowski@klarasystems.com>
Sponsored-by: Wasabi Technology, Inc.
Closes #14292
2023-01-10 13:41:54 -08:00
Alyssa Ross 1f19826c9a etc/systemd/zfs-mount-generator: avoid strndupa
The non-standard strndupa function is not implemented by musl libc,
and can be dangerous due to its potential to blow the stack.  (musl
_does_ implement strdupa, used elsewhere in this function.)

With a similar amount of code, we can use a heap allocation to
construct the pool name, which is musl-friendly and doesn't have
potential stack problems.

(Why care about musl when systemd only supports glibc?  Some distros
patch systemd with portability fixes, and it would be nice to be able
to use ZFS on those distros.)

 Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alyssa Ross <alyssa.ross@unikie.com>
Closes #14327
2023-01-10 13:40:31 -08:00
Matthew Ahrens fc45975ec8 Batch enqueue/dequeue for bqueue
The Blocking Queue (bqueue) code is used by zfs send/receive to send
messages between the various threads.  It uses a shared linked list,
which is locked whenever we enqueue or dequeue.  For workloads which
process many blocks per second, the locking on the shared list can be
quite expensive.

This commit changes the bqueue logic to have 3 linked lists:
1. An enquing list, which is used only by the (single) enquing thread,
   and thus needs no locks.
2. A shared list, with an associated lock.
3. A dequing list, which is used only by the (single) dequing thread,
   and thus needs no locks.

The entire enquing list can be moved to the shared list in constant
time, and the entire shared list can be moved to the dequing list in
constant time.  These operations only happen when the `fill_fraction` is
reached, or on an explicit flush request.  Therefore, the lock only
needs to be acquired infrequently.

The API already allows for dequing to block until an explicit flush, so
callers don't need to be changed.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #14121
2023-01-10 13:39:22 -08:00
Brian Behlendorf 0c8fbe5b6a ztest: update ztest_dmu_snapshot_create_destroy()
ECHRNG is returned when the channel program encounters a runtime
error.  For example, this can happen when a snapshot doesn't exist.
We handle this error the same way as the existing EEXIST and ENOENT
error checks.

Additionally, improve the internal debug message to include the
error describing why a pool couldn't be opened.

Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #14351
2023-01-10 13:27:48 -08:00
Brian Behlendorf 549aafb7c8 ztest: ztest_dsl_prop_set_uint64() ENOSPC consistency
It is possible for ztest_dsl_prop_set_uint64() to fail with ENOSPC
and this needs to be handled consistently.

Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #14351
2023-01-10 13:27:48 -08:00
Brian Behlendorf f7788883ab ztest: reduce zpool split frequency
There's no need to so aggressively test splitting a pool.  Reduce
the occurence of this test to once every 10 seconds.

Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #14351
2023-01-10 13:27:48 -08:00
Brian Behlendorf 4208a052c2 ztest: update expectation for sparing a special device
Commit c23738c70e modified the expected
behavior of attach to prevent hot spares from being used as special
vdev replacements.  We update ztest's expectations accordingly to
prevent it from failing when testing the updated behavior.

Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #14351
2023-01-10 13:26:44 -08:00
Matthew Ahrens 40d7e971ff ztest fails assertion in zio_write_gang_member_ready()
Encrypted blocks can have up to 2 DVA's, as the third DVA is reserved
for the salt+IV.  However, dmu_write_policy() allows non-encrypted
blocks (e.g. DMU_OT_OBJSET) inside encrypted datasets to request and
allocate 3 DVA's, since they don't need a salt+IV (they are merely
authenicated).

However, if such a block becomes a gang block, the gang code incorrectly
limits the gang block header to 2 DVA's.  This leads to a "NDVAs
inversion", where a parent block (the gang block header) has less DVA's
than its children (the gang members), causing an assertion failure in
zio_write_gang_member_ready().

This commit addresses the problem by only restricting the gang block
header to 2 DVA's if the block is actually encrypted (and thus its gang
block members can have at most 2 DVA's).

Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #14250
Closes #14356
2023-01-09 16:43:45 -08:00
Charles Suh 44a78c05b3 libzpool: fix ddi_strtoull to update nptr
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Charles Suh <charles.suh@gmail.com>
Closes #14360
2023-01-09 12:49:35 -08:00
Ameer Hamza 5091867ee6 zed: add hotplug support for spare vdevs
This commit supports for spare vdev hotplug. The
spare vdev associated with all the pools will be
marked as "Removed" when the drive is physically
detached and will become "Available" when the
drive is reattached. Currently, the spare vdev
status does not change on the drive removal and
the same is the case with reattachment.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #14295
2023-01-09 12:43:03 -08:00
Alexander Motin 289f7e6adb Remove some dead ARC code. (#14340)
Every ARC buffer holds a reference on the header. It means headers with
buffers are never evictable.  When we are evicting a header, there can
be no more buffers to free.  Just assert that.

b_evict_lock seems not protecting anything now.  Remove it.

Buffers checksum should also be freed with the last uncompressed buffer,
so it should not be there also when we are evicting the header.

Signed-off-by:  Alexander Motin <mav@FreeBSD.org>
Sponsored by:   iXsystems, Inc.
2023-01-09 10:45:17 -08:00
Coleman Kane a0105f6cd4 linux 6.2 compat: bio->bi_rw was renamed bio->bi_opf
The bi_rw member of struct bio was renamed to bi_opf in Linux 6.2.
As well, Linux's implementation of bio_set_op_attrs(...) has been
removed.

The HAVE_BIO_BI_OPF macro already appears to be defined, but the
removal of the bio_set_op_attrs(...) implementation makes the build
fall back on the locally-defined implementation, which isn't updated
for the bio->bi_opf change. This commit adds that update.

Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #14324
Closes #14331
2023-01-06 14:43:22 -08:00
Coleman Kane 884a69357f linux 6.2 compat: get_acl() got moved to get_inode_acl() in 6.2
Linux 6.2 renamed the get_acl() operation to get_inode_acl() in
the inode_operations struct. This should fix Issue #14323.

Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #14323
Closes #14331
2023-01-06 14:40:54 -08:00
Antonio Russo 556ed09537 Introduce ZFS_LINUX_REQUIRE_API autoconf macro
Currently, if API tests fail, we either ignore the failures, or
unconditionally halt the kernel build.  This leads to situations where
incompatibilities with existing APIs may develop, but not trip the
configure compatibility checks.

This introduces a new mechanism to require APIs for kernels above a
particular version.  While not perfect, this at least guarantees
mainline kernels do not break existing APIs without at least providing
some warning.

Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Antonio Russo <aerusso@aerusso.net>
Closes #14343
2023-01-06 14:33:53 -08:00
Antonio Russo d27c81847b Linux 6.1 compat: open inside tmpfile()
Linux 863f144 modified the .tmpfile interface to pass a struct file,
rather than a struct dentry, and expect the tmpfile implementation to
open inside of tmpfile().

This patch implements a configuration test that checks for this new API
and appropriately sets a HAVE_TMPFILE_DENTRY flag that tracks this old
API.  Contingent on this flag, the appropriate API is implemented.

Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Antonio Russo <aerusso@aerusso.net>
Closes #14301
Closes #14343
2023-01-06 14:33:00 -08:00
Antonio Russo a7304ab9c1 ZTS: close in mmapwrite.c
mmapwrite is used during the ZTS to identify issues with mmap-ed files.
This helper program exercises this pathway by continuously writing to a
file.  ee6bf97c7 modified the writing threads to terminate after a set
amount of total data is written.  This change allows standard program
execution to reach the end of a writer thread without closing the file
descriptor, introducing a resource "leak."

This patch appeases resource leak analyses by close()-ing the file at
the end of the thread.

Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Antonio Russo <aerusso@aerusso.net>
Closes #14353
2023-01-06 10:52:08 -08:00
Antonio Russo ee6bf97c77 ZTS: limit mmapwrite file size
mmapwrite spawns several threads, all of which perform writes on a file
for the purpose of testing the behavior of mmap(2)-ed files.  One
thread performs an mmap and a write to the beginning of that region,
while the others perform regular writes after lseek(2)-ing the end of
the file.

Because these regular writes are set in a while (1) loop, they will
write an unbounded amount of data to disk.  The mmap_write_001_pos test
script SIGKILLs them after 30 seconds, but on fast testbeds, this may
be enough time to exhaust the available space in the filesystem,
leading to spurious test failures.

Instead, limit the total file size by checking that the lseek return
value is no greater than 250 * 1024*1024 bytes, which is less than the
default minimum vdev size defined in includes/default.cfg .

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Antonio Russo <aerusso@aerusso.net>
Closes #14277 
Closes #14345
2023-01-05 12:50:30 -08:00
Clemens Lang 8352e9dfae contrib: dracut: Do not timeout waiting for pw
systemd-ask-password has a default timeout of 90 seconds, which means
that dracut will fall back to the rescue shell 4.5 minutes after boot if
no password is entered.

This is undesirable when combined with, for example, unlocking remotely
using dracut-sshd and systemd-tty-ask-password-agent. See also
https://github.com/gsauthof/dracut-sshd#timeout and
https://bugzilla.redhat.com/show_bug.cgi?id=868421.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Clemens Lang <neverpanic@gmail.com>
Closes #14341
2023-01-05 12:07:43 -08:00
Richard Yao 1f3bc5ea80 Illumos #15286: do_composition() needs sign awareness
Authored by: Dan McDonald <danmcd@mnx.io>
Reviewed by: Patrick Mooney <pmooney@pfmooney.com>
Reviewed by: Richard Lowe <richlowe@richlowe.net>
Approved by: Joshua M. Clulow <josh@sysmgr.org>
Ported-by: Richard Yao <richard.yao@alumni.stonybrook.edu>

Illumos-issue: https://www.illumos.org/issues/15286
Illumos-commit: https://github.com/illumos/illumos-gate/commit/f137b22e734e85642da3e56e8b94da3f5f027c73

Porting Notes:

The patch in illumos did not have much of a commit message, and did not
provide attribution to the reporter, while original patch proposed to
OpenZFS did, so I am listing the reporter (myself) and original patch
author (also myself) below while including the original commit message
with some minor corrections as part of the porting notes:

In do_composition(), we have:

size = u8_number_of_bytes[*p];
if (size <= 1 || (p + size) > oslast)
	break;

There, we have type promotion from int8_t to size_t, which is unsigned.
C will sign extend the value as part of the widening before treating the
value as unsigned and the negative values we can counter are error
values from U8_ILLEGAL_CHAR and U8_OUT_OF_RANGE_CHAR, which are -1 and
-2 respectively. The unsigned versions of these under two's complement
are SIZE_MAX and SIZE_MAX-1 respectively.

The bounds check is written under the assumption that `size <= 1` does a
signed comparison. This is followed by a pointer comparison to see if
the string has the correct length, which is fine.

A little further down we have:

for (i = 0; i < size; i++)
	tc[i] = *p++;

When an error condition is encountered, this will attempt to iterate at
least SIZE_MAX-1 times, which will massively overflow the buffer, which
is not fine.

The kernel will kill the loop as soon as it hits the kernel stack guard
on Linux systems built with CONFIG_VMAP_STACK=y, which should be just
about all of them. That prevents arbitrary code execution and just about
any other bad thing that a black hat attacker might attempt with
knowledge of this buffer overflow. Other systems' kernels have
mitigations for unbounded in-kernel buffer overflows that will catch
this too.

Also, the patch in illumos-gate made an effort to fix C style issues
that had been fixed in the OpenZFS/ZFSOnLinux repository. Those issues
had been mentioned in the email that I originally sent them about this
issue. One of the fixes had not been already done, so it is included.
Another to collect_a_seq()'s arguments was handled differently in
OpenZFS. For the sake of avoiding unnecessary differences, it has been
adopted. This has the interesting effect that if you correct the paths
in the illumos-gate patch to match the current OpenZFS repository, you
can reverse apply it cleanly.

Original-patch-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reported-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Co-authored-by: Dan McDonald <danmcd@mnx.io>
Closes #14318
Closes #14342
2023-01-05 11:16:21 -08:00
Matthew Ahrens b72efb7511 removal of LegacyVersion broke ax_python_dev.m4
The 22.0 release of the python `packaging` package removed the
`LegacyVersion` trait, causing ZFS to no longer compile.

This commit replaces the sections of `ax_python_dev.m4` that rely on
`LegacyVersion` with updated implementations from the upstream
`autoconf-archive`.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #14297
2023-01-05 11:04:24 -08:00
Mateusz Guzik f25f1f9091 FreeBSD: catch up to 1400077
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #14328
2023-01-05 10:56:40 -08:00
Martin Rüegg f6f215f07f Fix shebang for helper script of deb-utils
Shebang was missing the `!` between `#` and the actual path.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Martin Rüegg <martin.rueegg@metaworx.ch>
Closes #14339
2023-01-05 10:50:00 -08:00
Martin Rüegg fb000f7867 Add quotation marks around $PATH for deb-utils
Fix #14338, failing to build deb-utils if existing `$PATH` variable
would include a whitespace.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Martin Rüegg <martin.rueegg@metaworx.ch>
Closes #14339
2023-01-05 10:49:33 -08:00
Alexander Motin db832c47fe Pack zrlock_t by 8 bytes
On FreeBSD this reduces this structure size from 64 to 56 bytes.
dnode_handle_t respectively reduces from 72 to 64 bytes. It sounds
like a waste to need 72 bytes to be able to relocate 808 bytes of
dnode_t, which relocation on FreeBSD is not even supported.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:  Alexander Motin <mav@FreeBSD.org>
Sponsored by:   iXsystems, Inc.
Closes #14317
2023-01-05 09:31:55 -08:00
Alexander Motin 792a6ee462 Update arc_summary and arcstat outputs
Recent ARC commits added new statistic counters, such as iohits,
uncached state, etc.  Represent those.  Also some of previously
reported numbers were confusing or even made no sense.  Cleanup
and restructure existing reports.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:  Alexander Motin <mav@FreeBSD.org>
Sponsored by:   iXsystems, Inc.
Issue #14115 
Issue #14123
Issue #14243 
Closes #14320
2023-01-05 09:29:13 -08:00
Alexander Motin bacf366fe2 Hide b_freeze_* under ZFS_DEBUG
This saves 40 bytes per full ARC header, reducing it on FreeBSD from
240 to 200 bytes on production bits.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Closes #14315
2023-01-05 10:15:31 -07:00
Alexander Motin ed2f7ba08d Implement uncached prefetch
Previously the primarycache property was handled only in the dbuf
layer. Since the speculative prefetcher is implemented in the ARC,
it had to be disabled for uncacheable buffers.

This change gives the ARC knowledge about uncacheable buffers
via  arc_read() and arc_write(). So when remove_reference() drops
the last reference on the ARC header, it can either immediately destroy
it, or if it is marked as prefetch, put it into a new arc_uncached state. 
That state is scanned every second, evicting stale buffers that were
not demand read.

This change also tracks dbufs that were read from the beginning,
but not to the end.  It is assumed that such buffers may receive further
reads, and so they are stored in dbuf cache. If a following
reads reaches the end of the buffer, it is immediately evicted.
Otherwise it will follow regular dbuf cache eviction.  Since the dbuf
layer does not know actual file sizes, this logic is not applied to
the final buffer of a dnode.

Since uncacheable buffers should no longer stay in the ARC for long,
this patch also tries to optimize I/O by allocating ARC physical
buffers as linear to allow buffer sharing.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #14243
2023-01-04 17:29:54 -07:00
Alexander Motin c935fe2e92 arc_read()/arc_access() refactoring and cleanup
ARC code was many times significantly modified over the years, that
created significant amount of tangled and potentially broken code.
This should make arc_access()/arc_read() code some more readable.

 - Decouple prefetch status tracking from b_refcnt.  It made sense
originally, but became highly cryptic over the years.  Move all the
logic into arc_access().  While there, clean up and comment state
transitions in arc_access().  Some transitions were weird IMO.
 - Unify arc_access() calls to arc_read() instead of sometimes calling
it from arc_read_done().  To avoid extra state changes and checks add
one more b_refcnt for ARC_FLAG_IO_IN_PROGRESS.
 - Reimplement ARC_FLAG_WAIT in case of ARC_FLAG_IO_IN_PROGRESS with
the same callback mechanism to not falsely account them as hits. Count
those as "iohits", an intermediate between "hits" and "misses". While
there, call read callbacks in original request order, that should be
good for fairness and random speculations/allocations/aggregations.
 - Introduce additional statistic counters for prefetch, accounting
predictive vs prescient and hits vs iohits vs misses.
 - Remove hash_lock argument from functions not needing it.
 - Remove ARC_FLAG_PREDICTIVE_PREFETCH, since it should be opposite
to ARC_FLAG_PRESCIENT_PREFETCH if ARC_FLAG_PREFETCH is set.  We may
wish to add ARC_FLAG_PRESCIENT_PREFETCH to few more places.
 - Fix few false positive tests found in the process.

Reviewed-by: George Wilson <gwilson@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #14123
2022-12-22 12:10:24 -08:00
Ryan Moeller dc8c2f6158 FreeBSD: Fix potential boot panic with bad label
vdev_geom_read_pool_label() can leave NULL in configs.  Check for it
and skip consistently when generating rootconf.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #14291
2022-12-22 11:50:09 -08:00
Matthew Ahrens 018f26041d deadlock between spa_errlog_lock and dp_config_rwlock
There is a lock order inversion deadlock between `spa_errlog_lock` and
`dp_config_rwlock`:

A thread in `spa_delete_dataset_errlog()` is running from a sync task.
It is holding the `dp_config_rwlock` for writer (see
`dsl_sync_task_sync()`), and waiting for the `spa_errlog_lock`.

A thread in `dsl_pool_config_enter()` is holding the `spa_errlog_lock`
(see `spa_get_errlog_size()`) and waiting for the `dp_config_rwlock` (as
reader).

Note that this was introduced by #12812.

This commit address this by defining the lock ordering to be
dp_config_rwlock first, then spa_errlog_lock / spa_errlist_lock.
spa_get_errlog() and spa_get_errlog_size() can acquire the locks in this
order, and then process_error_block() and get_head_and_birth_txg() can
verify that the dp_config_rwlock is already held.

Additionally, a buffer overrun in `spa_get_errlog()` is corrected.  Many
code paths didn't check if `*count` got to zero, instead continuing to
overwrite past the beginning of the userspace buffer at `uaddr`.

Tested by having some errors in the pool (via `zinject -t data
/path/to/file`), one thread running `zpool iostat 0.001`, and another
thread runs `zfs destroy` (in a loop, although it hits the first time).
This reproduces the problem easily without the fix, and works with the
fix.

Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Reviewed-by: George Amanakis <gamanakis@gmail.com>
Reviewed-by: George Wilson <gwilson@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #14239
Closes #14289
2022-12-22 11:48:49 -08:00
Brian Behlendorf 29e1b089c1 Documentation corrections
- Update the link to the OpenZFS Code of Conduct.
- Remove extra "the" from contrib/initramfs/scripts/zfs

Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #14298
Closes #14307
2022-12-22 11:34:28 -08:00
Brian Behlendorf b4cd4fe1aa Revert "zdb: zdb_ddt_leak_init() reads uninitialized memory..."
This reverts commit d30db519af.  With
this change applied zloop.sh fails reliably with the following ASSERT.

  zio_wait(zio_claim(NULL, zcb->zcb_spa, refcnt ? 0 : spa_min_claim_txg(
    zcb->zcb_spa), bp, NULL, NULL, ZIO_FLAG_CANFAIL)) == 0 (0x2 == 0x0)
  ASSERT at cmd/zdb/zdb.c:5452:zdb_count_block()

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #14306
2022-12-21 09:17:00 -08:00
George Melikov bd9dc5a1dc systemd: set restart=always for zfs-zed.service
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Co-authored-by: Attila Fülöp <attila@fueloep.org>
Closes #14294
2022-12-19 16:08:15 -08:00
Ethan Coe-Renner fb11b1570a Add color output to zfs diff.
This adds support to color zfs diff (in the style of git diff)
conditional on the ZFS_COLOR environment variable.

Signed-off-by: Ethan Coe-Renner <coerenner1@llnl.gov>
2022-12-15 10:14:32 -08:00
Doug Rabson 24502bd3a7 FreeBSD: Remove stray debug printf
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: Doug Rabson <dfr@rabson.org>
Closes #14286 
Closes #14287
2022-12-13 17:35:07 -08:00
Umer Saleem e6e31dd540 Add native-deb* targets to build native Debian packages
In continuation of previous #13451, this commits adds native-deb*
targets for make to build native debian packages. Github workflows
are updated to build and test native Debian packages.

Native packages only build with pre-configured paths (see the
dh_auto_configure section in contrib/debian/rules.in). While
building native packages, paths should not be configured. Initial
config flags e.g. '--enable-debug' are replaced in
contrib/debian/rules.in.

Additional packages on top of existing zfs packages required to
build native packages include debhelper-compat, dh-python, dkms,
po-debconf, python3-all-dev, python3-sphinx.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Closes #14265
2022-12-13 17:33:05 -08:00
Richard Yao f3f5263f8a Zero end of embedded block buffer in dump_write_embedded()
This fixes a kernel stack leak.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Tested-by: Nicholas Sherlock <n.sherlock@gmail.com>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13778
Closes #14255
2022-12-13 17:31:47 -08:00
Ameer Hamza 9be34ec99e Allow receiver to override encryption properties in case of replication
Currently, the receiver fails to override the encryption
property for the plain replicated dataset with the error:
"cannot receive incremental stream: encryption property
'encryption' cannot be set for incremental streams.". The
problem is resolved by allowing the receiver to override
the encryption property for plain replicated send.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #14253
Closes #13533
2022-12-13 17:30:46 -08:00
Richard Yao 3236c0b891 Cache dbuf_hash() calculation
We currently compute a 64-bit hash three times, which consumes 0.8% CPU
time on ARC eviction heavy workloads. Caching the 64-bit value in the
dbuf allows us to avoid that overhead.

Sponsored-By: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Richard Yao <richard.yao@klarasystems.com>
Closes #14251
2022-12-13 17:29:21 -08:00
Allan Jude dc95911d21 zfs list: Allow more fields in ZFS_ITER_SIMPLE mode
If the fields to be listed and sorted by are constrained to those
populated by dsl_dataset_fast_stat(), then zfs list is much faster,
as it does not need to open each objset and reads its properties.

A previous optimization by Pawel Dawidek
(0cee24064a) took advantage
of this to make listing snapshot names sorted only by name much faster.

However, it was limited to `-o name -s name`, this work extends this
optimization to work with:
  - name
  - guid
  - createtxg
  - numclones
  - inconsistent
  - redacted
  - origin
and could be further extended to any other properties supported by
dsl_dataset_fast_stat() or similar, that do not require extra locking
or reading from disk.

This was committed before (9a9e2e343dfa2af28bf7910de77ae73aa006de62),
but was reverted due to a regression when used with an older kernel.

If the kernel does not populate zc->zc_objset_stats, we now fallback
to getting the properties via the slower interface, to avoid problems
with newer userland and older kernels.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Closes #14110
2022-12-13 17:27:54 -08:00
Marcel Menzel 70ac2654f5 Change ZEVENT_POOL_GUID to ZEVENT_POOL to display pool names
Outgoing mails for ZFS pool events include the pool GUID,
but not the actual pool name. Let's change this for better
readability, as it is already done in the mails for finished
pool resilvers.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: Marcel Menzel <mail@mcl.gg>
Closes #14272
2022-12-13 17:26:10 -08:00
Richard Yao d31a7cb4fa Address theoretical uninitialized variable usage in zstream
Coverity has long complained about the checksum being uninitialized if
an END record is processed before its BEGIN record. This should not
happen, but there was no code to check for it. I had left this unfixed
since it was a low priority issue, but then
9f4ede63d2 added another instance of this.

I am making an effort to "hold the line" to keep new coverity defect
reports from going unaddressed, so I find myself forced to fix this much
earlier than I had originally planned to address it.

The solution is to maintain a counter and a flag. Then use VERIFY
statements to verify the following runtime constraints:

 * Every record either has a corresponding BEGIN record, is a BEGIN
   record or is the end of stream END record for replication streams.
 * BEGIN records cannot be nested. i.e. There must be an END record
   before another BEGIN record may be seen.

Failure to meet these constraints will cause the program to exit.

This is sufficient to ensure that the checksum is never accessed when
uninitialized.

Reported-by: Coverity (CID 1524578)
Reported-by: Coverity (CID 1524633)
Reported-by: Coverity (CID 1527295)
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Damian Szuberski <szuberskidamian@gmail.com>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14176
2022-12-12 10:40:05 -08:00
Ryan Moeller 786ff6a6cb initramfs: Fix legacy mountpoint rootfs
Legacy mountpoint datasets should not pass `-o zfsutil` to `mount.zfs`.
Fix the logic in `mount_fs()` to not forget we have a legacy mountpoint
when checking for an `org.zol:mountpoint` userprop.

Reviewed-by: Richard Yao <ryao@gentoo.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #14274
2022-12-12 10:23:06 -08:00
Ameer Hamza e3785718ba Skip permission checks for extended attributes
zfs_zaccess_trivial() calls the generic_permission() to read
xattr attributes. This causes deadlock if called from
zpl_xattr_set_dir() context as xattr and the dent locks are
already held in this scenario. This commit skips the permissions
checks for extended attributes since the Linux VFS stack already
checks it before passing us the control.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Youzhong Yang <yyang@mathworks.com>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #14220
2022-12-12 10:21:37 -08:00
Allan Jude f900279e6d Restrict visibility of per-dataset kstats inside FreeBSD jails
When inside a jail, visibility on datasets not "jailed" to the
jail is restricted. However, it was possible to enumerate all
datasets in the pool by looking at the kstats sysctl MIB.

Only the kstats corresponding to datasets that the user has
visibility on are accessible now.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Closes #14254
2022-12-09 11:04:29 -08:00
Richard Yao 5401472cd0 Linux PPC: Fix build failures on kernels built without CONFIG_SPE
We do a simple ifdef to avoid calling enable_kernel_spe()/
disable_kernel_spe() on PowerPC.

Reported-by: Rich Ercolani <Rincebrain@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Tested-by: Georgy Yakovlev <gyakovlev@gentoo.org>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14233
Closes #14244
2022-12-09 10:51:23 -08:00
Serapheim Dimitropoulos 7bf4c97a36 Bypass metaslab throttle for removal allocations
Context:
We recently had a scenario where a customer with 2x10TB disks at 95+%
fragmentation and capacity, wanted to migrate their disks to a 2x20TB
setup. So they added the 2 new disks and submitted the removal of the
first 10TB disk.  The removal took a lot more than expected (order of
more than a week to 2 weeks vs a couple of days) and once it was done it
generated a huge indirect mappign table in RAM (~16GB vs expected ~1GB).

Root-Cause:
The removal code calls `metaslab_alloc_dva()` to allocate a new block
for each evacuating block in the removing device and it tries to batch
them into 16MB segments. If it can't find such a segment it tries for
8MBs, 4MBs, all the way down to 512 bytes.

In our scenario what would happen is that `metaslab_alloc_dva()` from
the removal thread pick the new devices initially but wouldn't allocate
from them because of throttling in their metaslab allocation queue's
depth (see `metaslab_group_allocatable()`) as these devices are new and
favored for most types of allocations because of their free space. So
then the removal thread would look at the old fragmented disk for
allocations and wouldn't find any contiguous space and finally retry
with a smaller allocation size until it would to the low KB range. This
caused a lot of small mappings to be generated blowing up the size of
the indirect table. It also wasted a lot of CPU while the removal was
active making everything slow.

This patch:
Make all allocations coming from the device removal thread bypass the
throttle checks. These allocations are not even counted in the metaslab
allocation queues anyway so why check them?

Side-Fix:
Allocations with METASLAB_DONT_THROTTLE in their flags would not be
accounted at the throttle queues but they'd still abide by the
throttling rules which seems wrong. This patch fixes this by checking
for that flag in `metaslab_group_allocatable()`. I did a quick check to
see where else this flag is used and it doesn't seem like this change
would cause issues.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Closes #14159
2022-12-09 10:48:33 -08:00
Richard Yao 5f73bbba43 Do not pass -1 to strerror() from zfs_send_cb_impl()
`zfs_send_cb_impl()` calls `dump_filesystems()`, which calls
`dump_filesystem()`, which will return `-1` as an error when
`zfs_open()` returns `NULL`.

This will be passed to `zfs_standard_error()`, which passes it to
`zfs_standard_error_fmt()`, which passes it to `strerror()`.

To fix this, we modify zfs_open() to set `errno` whenever it returns
NULL. Most of the cases already have `errno` set (since they pass it to
`zfs_standard_error_fmt()`, which makes this easy. Then we modify
`dump_filesystem()` to pass `errno` instead of `-1`.

Reported-by: Coverity (CID-1524598)
Reviewed-by: Damian Szuberski <szuberskidamian@gmail.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14264
2022-12-08 14:15:27 -08:00
Richard Yao 242a5b748c Fix dereference after null check in enqueue_range
If the bp is NULL, we have a hole. However, when we build with
assertions, we will dereference bp when `blkid == DMU_SPILL_BLKID`. When
this happens on a hole, we will have a NULL pointer dereference.

Reported-by: Coverity (CID-1524670)
Reviewed-by: Damian Szuberski <szuberskidamian@gmail.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14264
2022-12-08 14:15:21 -08:00
Richard Yao f954ea26a6 zdb: Handle theoretical buffer overflow when printing float
CodeQL pointed out that for extreme floating point values, `sprintf()`
will overwrite a 32 character buffer. It cited 1e304 as an example,
which causes `sprintf()` to print 308 characters.

In practice, the numbers should never exceed 100, so this should not
happen. To silence the warning and also handle unexpected situations, we
change the code to use `snprintf()`.

This was missed during my audit of our use of `sprintf()`, since I did
not think to consider extreme floating point representations. It also
really should not happen, so this change is purely defensive
programming.

This was found by CodeQL's cpp/overrunning-write-with-float check.

Reviewed-by: Damian Szuberski <szuberskidamian@gmail.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14264
2022-12-08 14:15:15 -08:00
Richard Yao d30db519af zdb: zdb_ddt_leak_init() reads uninitialized memory when birth == 0
This was written by Jeff Bonick and was committed to OpenSolaris on
November 1, 2009. It appears that Jeff meant to continue the outer loop
iteration when `ddp->ddp_phys_birth == 0`, but put his check inside the
inner loop. This causes a pointer to uninitialized memory to be passed
to ddt_lookup() inside a VERIFY() statement whenever that condition is
true.

Reported-by: Coverity (CID 1524462)
Reviewed-by: Damian Szuberski <szuberskidamian@gmail.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14264
2022-12-08 14:15:10 -08:00
Richard Yao 2709ace096 ztest: comparisons against errno should not assign to it
888914486e introduced this regression.

I used cscope to verify that there are no other instances of this in the
codebase. This is the one of the few bugs that are extremely easy to
identify using cscope.

Reviewed-by: Damian Szuberski <szuberskidamian@gmail.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14264
2022-12-08 14:15:04 -08:00
Richard Yao ba87ed1410 Fix potential buffer overflow in zpool command
The ZPOOL_SCRIPTS_PATH environment variable can be passed here. This
allows for arbitrarily long strings to be passed to sprintf(), which can
overflow the buffer.

I missed this in my earlier audit of the codebase. CodeQL's
cpp/unbounded-write check caught this.

Reviewed-by: Damian Szuberski <szuberskidamian@gmail.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14264
2022-12-08 14:14:30 -08:00
Richard Yao ecccaede68 zdb: Fix big parameter passed by value
This is not in performance critical code, but static analyzers will
complain about it, so lets switch to pass by pointer here.

Reported-by: Coverity (CID-1524384)
Reviewed-by: Damian Szuberski <szuberskidamian@gmail.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14263
2022-12-08 13:52:53 -08:00
Richard Yao f1100863f7 Linux: Cleanup unnecessary NULL check in __vdev_disk_physio()
zio is never NULL when given to the vdev. Coverity complained saying:

"Either the check against null is unnecessary, or there may be a null
pointer dereference."

Reported-by: Coverity (CID-1466174)
Reviewed-by: Damian Szuberski <szuberskidamian@gmail.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14263
2022-12-08 13:52:47 -08:00
Richard Yao 56c6f293c0 Remove duplicate statically allocated variable
dsl_dataset_snapshot_sync_impl() declares `static zil_header_t zero_zil
__maybe_unused;`, but this is also declared globally. This wastes
memory.

CodeQL's cpp/local-variable-hides-global-variable check caught this.

Reviewed-by: Damian Szuberski <szuberskidamian@gmail.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14263
2022-12-08 13:52:42 -08:00
Richard Yao aaa9a6700f Cleanup: zhack should not declare function prototypes in main()
Instead, it should include the proper header.

CodeQL caught this.

Reviewed-by: Damian Szuberski <szuberskidamian@gmail.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14263
2022-12-08 13:51:24 -08:00
Brian Behlendorf 0620051224 ZTS: Add missing tests to Makefile.am
The send-c_zstream_recompress.ksh test case was being skipped
because it was not added to the Makefile.am, and was thus left
out of the package.

As for the renameat2 tests these were being skipped because
when the patch was rebased it was not updated to use the new
Makefile layout for the tests directory.  Correct this.

Add missing pre/post sections to sanity.run so the pyzfs tests
will run.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Damian Szuberski <szuberskidamian@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #14266
2022-12-07 17:26:33 -08:00
Richard Yao 59493b63c1 Micro-optimize fletcher4 calculations
When processing abds, we execute 1 `kfpu_begin()`/`kfpu_end()` pair on
every page in the abd. This is wasteful and slows down checksum
performance versus what the benchmark claimed. We correct this by moving
those calls to the init and fini functions.

Also, we always check the buffer length against 0 before calling the
non-scalar checksum functions. This means that we do not need to execute
the loop condition for the first loop iteration. That allows us to
micro-optimize the checksum calculations by switching to do-while loops.

Note that we do not apply that micro-optimization to the scalar
implementation because there is no check in
`fletcher_4_incremental_native()`/`fletcher_4_incremental_byteswap()`
against 0 sized buffers being passed.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14247
2022-12-05 11:00:34 -08:00
Richard Yao 7b9a423076 FreeBSD: zfs_register_callbacks() must implement error check correctly
I read the following article and noticed a couple of ZFS bugs mentioned:

https://pvs-studio.com/en/blog/posts/cpp/0377/

I decided to search for them in the modern OpenZFS codebase and then
found one that matched the description of the first one:

V593 Consider reviewing the expression of the 'A = B != C' kind. The
expression is calculated as following: 'A = (B != C)'. zfs_vfsops.c 498

The consequence of this is that the error value is replaced with `1`
when there is an error. When there is no error, 0 is correctly passed.
This is a very minor issue that is unlikely to cause any real problems.

The incorrect error code would either be returned to the mount command
on a failure or any of `zfs receive`, `zfs recv`, `zfs rollback` or `zfs
upgrade`.

The second one has already been fixed.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Damian Szuberski <szuberskidamian@gmail.com>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14261
2022-12-05 10:16:50 -08:00
George Wilson ffd2e15d65 zio can deadlock during device removal
When doing a device removal on a pool with gang blocks, the zio pipeline
can deadlock when trying to free blocks from a device which is being
removed with a stack similar to this:

 0xffff8ab9a13a1740 UNINTERRUPTIBLE       4
                   __schedule+0x2e5
                   __schedule+0x2e5
                   schedule+0x33
                   schedule_preempt_disabled+0xe
                   __mutex_lock.isra.12+0x2a7
                   __mutex_lock.isra.12+0x2a7
                   __mutex_lock_slowpath+0x13
                   mutex_lock+0x2c
                   free_from_removing_vdev+0x61
                   metaslab_free_impl+0xd6
                   metaslab_free_dva+0x5e
                   metaslab_free+0x196
                   zio_free_sync+0xe4
                   zio_free_gang+0x38
                   zio_gang_tree_issue+0x42
                   zio_gang_tree_issue+0xa2
                   zio_gang_issue+0x6d
                   zio_execute+0x94
                   zio_execute+0x94
                   taskq_thread+0x23b
                   kthread+0x120
                   ret_from_fork+0x1f

Since there are gang blocks we have to read the gang members as part of
the free. This can be seen with a zio dependency tree that looks like
this:

sdb> echo 0xffff900c24f8a700 | zio -rc | zio
ADDRESS                       TYPE  STAGE            WAITER
0xffff900c24f8a700            NULL  CHECKSUM_VERIFY  0xffff900ddfd31740
0xffff900c24f8c920            FREE  GANG_ASSEMBLE    -
0xffff900d93d435a0            READ  DONE

In the illustration above we are processing frees but because of gang
block we have to read the constituents blocks. Once we finish the READ
in the zio pipeline we will execute the parent. In this case the parent
is a FREE but the zio taskq is a READ and we continue to process the
pipeline leading to the stack above. In the stack above, we are blocked
waiting for the svr_lock so as a result a READ interrupt taskq thread
is now consumed. Eventually, all of the READ taskq threads end up
blocked and we're unable to complete any read requests.

In zio_notify_parent there is an optimization to continue to use
the taskq thread to exectue the parent's pipeline. To resolve the
deadlock above, we only allow this optimization if the parent's
zio type matches the child which just completed.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: George Wilson <gwilson@delphix.com>
External-issue: DLPX-80130
Closes #14236
2022-12-02 17:46:29 -08:00
George Wilson d7cf06a25d nopwrites on dmu_sync-ed blocks can result in a panic
After a device has been removed, any nopwrites for blocks on that
indirect vdev should be ignored and a new block should be allocated. The
original code attempted to handle this but used the wrong block pointer
when checking for indirect vdevs and failed to check all DVAs.

This change corrects both of these issues and modifies the test case
to ensure that it properly tests nopwrites with device removal.

Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Wilson <gwilson@delphix.com>
Closes #14235
2022-12-02 17:45:33 -08:00
Rob Wing 2c590bdede ZTS: test reported checksum errors for ZED
Test checksum error reporting to ZED via the call paths
vdev_raidz_io_done_unrecoverable() and zio_checksum_verify().

Sponsored-by: Seagate Technology LLC
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Wing <rob.wing@klarasystems.com>
Closes #14190
2022-12-02 17:43:18 -08:00
Rob Wing 7a75f74cec Bump checksum error counter before reporting to ZED
The checksum error counter is incremented after reporting to ZED. This
leads ZED to receiving a checksum error report with 0 checksum errors.

To avoid this, bump the checksum error counter before reporting to ZED.

Sponsored-by: Seagate Technology LLC
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Wing <rob.wing@klarasystems.com>
Closes #14190
2022-12-02 17:42:22 -08:00
Xinliang Liu db6ba8d744 autoconf: add support for openEuler
Add config support for openEuler, so that it set the right sysconfig
dir for openEuler.

And DEFAULT_INIT_SCRIPT is no longer needed since commit "2a34db1bd
Base init scripts for SYSV systems".

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Xinliang Liu <xinliang.liu@linaro.org>
Closes #14241
2022-12-02 17:39:48 -08:00
szubersk fe975048da Fix Clang 15 compilation errors
- Clang 15 doesn't support `-fno-ipa-sra` anymore. Do a separate
  check for `-fno-ipa-sra` support by $KERNEL_CC.

- Don't enable `-mgeneral-regs-only` for certain module files.
  Fix #13260

- Scope `GCC diagnostic ignored` statements to GCC only. Clang
  doesn't need them to compile the code.

Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: szubersk <szuberskidamian@gmail.com>
Closes #13260
Closes #14150
2022-11-30 13:46:26 -08:00
szubersk 3c1e1933b6 Fix GCC 12 compilation errors
Squelch false positives reported by GCC 12 with UBSan.

Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: szubersk <szuberskidamian@gmail.com>
Closes #14150
2022-11-30 13:45:53 -08:00
Brian Behlendorf 3a6d89ae52 Retire Ubuntu 18.04 CI builder
The GitHub-hosted Ubuntu 18.04 has been deprecated and will be
entirely unsupported as of April 2023.  Leading up to this there
will be scheduled "brownouts" to encourage users to update their
workflows.

This commit retires our use of the GitHub-hosted Ubuntu 18.04
runners in advance of their removal.

Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Damian Szuberski <szuberskidamian@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #14238
2022-11-30 10:25:57 -08:00
Yann Collet 776441152e zstd: Refactor prefetching for the decoding loop
Following facebook/zstd#2545, I noticed that one field in `seq_t` is
optional, and only used in combination with prefetching. (This may have
contributed to static analyzer failure to detect correct
initialization).

I then wondered if it would be possible to rewrite the code so that this
optional part is handled directly by the prefetching code rather than
delegated as an option into `ZSTD_decodeSequence()`.

This resulted into this refactoring exercise where the prefetching
responsibility is better isolated into its own function and
`ZSTD_decodeSequence()` is streamlined to contain strictly Sequence
decoding operations.  Incidently, due to better code locality, it
reduces the need to send information around, leading to simplified
interface, and smaller state structures.

Port of facebook/zstd@f5434663ea

Reported-by: Coverity (CID 1462271)
Reviewed-by: Damian Szuberski <szuberskidamian@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Ported-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14212
2022-11-29 10:05:30 -08:00
Nick Terrell 466cf54ecf zstd: [superblock] Add defensive assert and bounds check
The bound check condition should always be met because we selected
`set_basic` as our encoding type. But that code is very far away, so
assert it is true so if it is ever false we can catch it, and add a
bounds check.

Port of facebook/zstd@1047097dad

Reported-by: Coverity (CID 1524446)
Reviewed-by: Damian Szuberski <szuberskidamian@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Ported-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14212
2022-11-29 10:04:43 -08:00
Richard Yao eb1ed2a66b Coverity Model Update
When reviewing Clang Static Analyzer reports against a branch that had
experimental header changes based on the Coverity model file to inform
it that KM_SLEEP allocations cannot return NULL, I found a report saying
that a KM_PUSHPAGE allocation returned NULL. The actual implementation
does not return NULL unless KM_NOSLEEP has been passed, so we backport
the correction from the experimental header changes to the Coverity
model.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14210
2022-11-29 10:00:56 -08:00
Richard Yao 97fac0fb70 Fix NULL pointer dereference in dbuf_prefetch_indirect_done()
When ZFS is built with assertions, a prefetch is done on a redacted
blkptr and `dpa->dpa_dnode` is NULL, we will have a NULL pointer
dereference in `dbuf_prefetch_indirect_done()`.

Both Coverity and Clang's Static Analyzer caught this.

Reported-by: Coverity (CID 1524671)
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14210
2022-11-29 10:00:50 -08:00
Richard Yao 887fb37843 zdb: Silence Coverity complaint about verify_livelist_allocs()
svb is declared on the stack. We then set parts of svb.svb_dva with
DVA_SET_VDEV(), DVA_SET_OFFSET() and DVA_SET_ASIZE(). However, the DVA
contains other fields for pad, GRID and G. When setting the fields we
use, we technically read uninitialized bits  from the fields we do not
use. This makes Coverity and Clang's Static Analyzer complain.
Presumably, other static analyzers might complain too.

There is no real bug here, but we are still technically reading
undefined data and unless we stop doing that, static analyzers will
complain about it in perpetuum and this could obscure real issues. We
silence the static analyzer complaints by using a 0 struct initializer.

Reported by: Coverity (CID 1524627)
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14210
2022-11-29 10:00:45 -08:00
Richard Yao 8532da5e20 Cleanup: Delete dead code from send_merge_thread()
range is always deferenced before it reaches this check, such that the
kmem_zalloc() call is never executed.

A previously version of this had erronously also pruned the
`range->eos_marker = B_TRUE` line, but it must be set whenever we
encounter an error or are cancelled early.

Coverity incorrectly complained about a potential NULL pointer
dereference because of this.

Reported-by: Coverity (CID 1524550)
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14210
2022-11-29 09:59:53 -08:00
Alexander b5459dd354 Fix the last two CFI callback prototype mismatches
There was the series from me a year ago which fixed most of the
callback vs implementation prototype mismatches. It was based on
running the CFI-enabled kernel (in permissive mode -- warning
instead of panic) and performing a full ZTS cycle, and then fixing
all of the problems caught by CFI.
Now, Clang 16-dev has new warning flag, -Wcast-function-type-strict,
which detect such mismatches at compile-time. It allows to find the
remaining issues missed by the first series.
There are only two of them left: one for the
secpolicy_vnode_setattr() callback and one for taskq_dispatch().
The fix is easy, since they are not used anywhere else.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Lobakin <alobakin@pm.me>
Closes #14207
2022-11-29 09:56:16 -08:00
Richard Yao 587a39b729 Lua: Fix bad bitshift in lua_strx2number()
The port of lua to OpenZFS modified lua to use int64_t for numbers
instead of double. As part of this, a function for calculating
exponentiation was replaced with a bit shift. Unfortunately, it did not
handle negative values. Also, it only supported exponents numbers with
7 digits before before overflow. This supports exponents up to 15 digits
before overflow.

Clang's static analyzer reported this as "Result of operation is garbage
or undefined" because the exponent was negative.

Reviewed-by: Damian Szuberski <szuberskidamian@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14204
2022-11-29 09:53:33 -08:00
Brooks Davis d6df4441c0 Don't leak packed recieved proprties
When local properties (e.g., from -o and -x) are provided, don't leak
the packed representation of the received properties due to variable
reuse.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Brooks Davis <brooks.davis@sri.com>
Closes #14197
2022-11-29 09:51:35 -08:00
Alexander Motin fd61b2eaba Remove few pointer dereferences in dbuf_read()
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Closes #14199
2022-11-29 09:49:02 -08:00
Mateusz Guzik 9aea88ba44 FreeBSD: stop using buffer cache-only routines on sync
Both vop_fsync and vfs_stdsync are effectively just costly no-ops
as they only act on ->v_bufobj.bo_dirty et al, which are unused
by zfs.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by:	Mateusz Guzik <mjguzik@gmail.com>
Closes #14157
2022-11-29 09:35:25 -08:00
Alexander Motin 4df415aa86 Switch dnode stats to wmsums
I've noticed that some of those counters are used in hot paths like
dnode_hold_impl(), and results of this change is visible in profiler.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Closes #14198
2022-11-29 09:33:45 -08:00
Xinliang Liu 6712c77140 rpm: add support for openEuler
OpenEuler uses the same package manager DNF as RHEL/Fedora. And
it is similar to RHEL/Fedora.

OpenEuler Linux is becoming the mainstream Linux distro in China.
So adding support for it makes sense for the users. For more
details about it see: https://www.openeuler.org/en/.

Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Damian Szuberski <szuberskidamian@gmail.com>
Signed-off-by: Xinliang Liu <xinliang.liu@linaro.org>
Closes #14222
2022-11-29 09:27:22 -08:00
Alexander Motin f0a76fbec1 Micro-optimize zrl_remove()
atomic_dec_32() should be a bit lighter than atomic_dec_32_nv().

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Closes #14200
2022-11-29 09:26:03 -08:00
Ameer Hamza e996c502e4 zed: unclean disk attachment faults the vdev
If the attached disk already contains a vdev GUID, it
means the disk is not clean. In such a scenario, the
physical path would be a match that makes the disk
faulted when trying to online it. So, we would only
want to proceed if either GUID matches with the last
attached disk or the disk is in a clean state.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #14181
2022-11-29 09:24:10 -08:00
Richard Yao 303678350a Convert some sprintf() calls to kmem_scnprintf()
These `sprintf()` calls are used repeatedly to write to a buffer. There
is no protection against overflow other than reviewers explicitly
checking to see if the buffers are big enough. However, such issues are
easily missed during review and when they are missed, we would rather
stop printing rather than have a buffer overflow, so we convert these
functions to use `kmem_scnprintf()`. The Linux kernel provides an entire
page for module parameters, so we are safe to write up to PAGE_SIZE.

Removing `sprintf()` from these functions removes the last instances of
`sprintf()` usage in our platform-independent kernel code. This improves
XNU kernel compatibility because the XNU kernel does not support
(removed support for?) `sprintf()`.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14209
2022-11-28 13:49:58 -08:00
Allan Jude d27a00283f Avoid a null pointer dereference in zfs_mount() on FreeBSD
When mounting the root filesystem, vfs_t->mnt_vnodecovered is null

This will cause zfsctl_is_node() to dereference a null pointer when
mounting, or updating the mount flags, on the root filesystem, both
of which happen during the boot process.

Reported-by: Martin Matuska <mm@FreeBSD.org>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Closes #14218
2022-11-28 13:40:49 -08:00
наб 3069872ef5 cmd: zfs: fix missing mention of zfs diff -h
Fixes: 344bbc82e7

Reviewed-by: Damian Szuberski <szuberskidamian@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #14224
2022-11-28 13:37:07 -08:00
Richard Yao 022bdb5b0b Set multiple make jobs on CodeQL github workflows
github supports 2 processors right now, although we use nproc instead of
hard coding -j2, so if that ever increases, we will take advantage of it
right away.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Damian Szuberski <szuberskidamian@gmail.com>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14217
2022-11-28 13:29:55 -08:00
Minsoo Choo adf3b84fc7 Add <limits.h> header
According to the UNIX standard, <pthread.h> does not include some
PTHREAD_* values which are included in <limits.h>. OpenZFS uses
some of these values in its code, and this might cause build failure on
systems that do not have these PTHREAD_* values in <pthread.h>

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Damian Szuberski <szuberskidamian@gmail.com>
Signed-off-by: Minsoo Choo <minsoochoo0122@proton.me>
Closes #14225
2022-11-28 13:24:17 -08:00
Allan Jude 077fd55e85 zfs.4: Correct the default for zfs_vdev_async_write_max_active
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Closes #14179
2022-11-28 11:41:14 -08:00
Damian Szuberski 387109364e Python3: replace distutils with sysconfig
- `distutils` module is long time deprecated and already deleted
  from the CPython mainline.

- To remain compatible with Debian/Ubuntu Python3 packaging style,
  try
  `distutils.sysconfig.get_python_path(0,0)`
  first with fallback on
  `sysconfig.get_path('purelib')`

- pyzfs_unittest suite is run unconditionally as a part of ZTS.

- Add pyzfs_unittest suite to sanity tests.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: szubersk <szuberskidamian@gmail.com>
Closes #12833 
Closes #13280 
Closes #14177
2022-11-28 11:39:41 -08:00
Alexander Motin 5f45e3f699 Remove atomics from zh_refcount
It is protected by z_hold_locks, so we do not need more serialization,
simple integer math should be fine.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by:  Alexander Motin <mav@FreeBSD.org>
Closes #14196
2022-11-28 11:36:53 -08:00
John Wren Kennedy b0657a59ab ZTS: zts-report silently ignores perf test results
The regex used to extract test result information from a test run only
matches the functional tests. Update the regex so it matches both.

Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: John Wren Kennedy <john.kennedy@delphix.com>
Closes #14185
2022-11-18 11:43:18 -08:00
Ameer Hamza 3a74f488fc zed: post a udev change event from spa_vdev_attach()
In order for zed to process the removal event correctly,
udev change event needs to be posted to sync the blkid
information. spa_create() and spa_config_update() posts
the event already through spa_write_cachefile(). Doing
the same for spa_vdev_attach() that handles the case
for vdev attachment and replacement.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #14172
2022-11-18 11:39:59 -08:00
George Amanakis 3226e0dc8e Fix setting the large_block feature after receiving a snapshot
We are not allowed to dirty a filesystem when done receiving
a snapshot. In this case the flag SPA_FEATURE_LARGE_BLOCKS will
not be set on that filesystem since the filesystem is not on
dp_dirty_datasets, and a subsequent encrypted raw send will fail.
Fix this by checking in dsl_dataset_snapshot_sync_impl() if the feature
needs to be activated and do so if appropriate.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #13699
Closes #13782
2022-11-18 11:38:37 -08:00
Laura Hild 99c0479a4e Correct multipathd.target to .service
https://github.com/openzfs/zfs/pull/9863 says it "orders
zfs-import-cache.service and zfs-import-scan.service after
multipathd.service" but the commit (79add96) actually
ordered them after .target.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Laura Hild <lsh@jlab.org>
Closes #12709
Closes #14171
2022-11-18 11:36:19 -08:00
Richard Yao 0a0166c975 FreeBSD: do_mount() passes wrong string length to helper
It should pass `MNT_LINE_MAX`, but passes `sizeof (mntpt)`. This is
harmless because the strlen is not actually used by the helper, but
FreeBSD's Coverity scans complained about it.

This was missed in my audit of various string functions since it is not
actually passed to a string function.

Upon review, it was noticed that the helper function does not need to be
a separate function, so I have inlined it as cleanup.

Reported-by: Coverity (CID 1432079)
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: szubersk <szuberskidamian@gmail.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14136
2022-11-18 11:34:25 -08:00
Richard Yao 31247c78b1 FreeBSD: get_zfs_ioctl_version() should be cast to (void)
FreeBSD's Coverity scans complain that we ignore the return value. There
is no need to check the return value so we cast it to (void) to suppress
further complaints by static analyzers.

Reported-by: Coverity (CID 1018175)
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: szubersk <szuberskidamian@gmail.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14136
2022-11-18 11:33:12 -08:00
szubersk 9e7fc5da38 Ubuntu 22.04 integration: GitHub workflows
- GitHub workflows are run on Ubuntu 22.04

- Extract the `checkstyle` workflow dependencies to a separate file.

- Refresh the `build-dependencies.txt` list.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: szubersk <szuberskidamian@gmail.com>
Closes #14148
2022-11-18 11:27:03 -08:00
szubersk 32ef14de0f Ubuntu 22.04 integration: ZTS
Add `detect_odr_violation=1` to ASAN_OPTIONS to allow both libzfs
and libzpool expose

```
zfeature_info_t spa_feature_table[SPA_FEATURES]
```

from module/zcommon/zfeature_common.c in public ABI.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: szubersk <szuberskidamian@gmail.com>
Closes #14148
2022-11-18 11:26:55 -08:00
szubersk 28ea4f9b08 Ubuntu 22.04 integration: Cppcheck
Suppress a false positive found by new Cppcheck version.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: szubersk <szuberskidamian@gmail.com>
Closes #14148
2022-11-18 11:26:50 -08:00
szubersk b46be903fb Ubuntu 22.04 integration: mancheck
Correct new mandoc errors.
```
STYLE: input text line longer than 80 bytes
STYLE: no blank before trailing delimiter
```

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: szubersk <szuberskidamian@gmail.com>
Closes #14148
2022-11-18 11:26:41 -08:00
szubersk a5087965fe Ubuntu 22.04 integration: ShellCheck
- Add new SC2312 global exclude.
  ```
  Consider invoking this command separately to avoid masking its return
  value (or use '|| true' to ignore). [SC2312]
  ```

- Correct errors detected by new ShellCheck version.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: szubersk <szuberskidamian@gmail.com>
Closes #14148
2022-11-18 11:24:48 -08:00
Damian Szuberski c3b6fd3d59 Make autodetection disable pyzfs for kernel/srpm configurations
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: szubersk <szuberskidamian@gmail.com>
Closes #13394
Closes #14178
2022-11-16 09:27:53 -08:00
Rich Ercolani 2163cde450 Handle and detect #13709's unlock regression (#14161)
In #13709, as in #11294 before it, it turns out that 63a26454 still had
the same failure mode as when it was first landed as d1d47691, and
fails to unlock certain datasets that formerly worked.

Rather than reverting it again, let's add handling to just throw out
the accounting metadata that failed to unlock when that happens, as
well as a test with a pre-broken pool image to ensure that we never get
bitten by this again.

Fixes: #13709

Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2022-11-15 14:44:12 -08:00
shodanshok b445b25b27 Fix arc_p aggressive increase
The original ARC paper called for an initial 50/50 MRU/MFU split
and this is accounted in various places where arc_p = arc_c >> 1,
with further adjustment based on ghost lists size/hit. However, in
current code both arc_adapt() and arc_get_data_impl() aggressively
grow arc_p until arc_c is reached, causing unneeded pressure on
MFU and greatly reducing its scan-resistance until ghost list
adjustments kick in.

This patch restores the original behavior of initially having arc_p
as 1/2 of total ARC, without preventing MRU to use up to 100% total
ARC when MFU is empty.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Gionatan Danti <g.danti@assyoma.it>
Closes #14137 
Closes #14120
2022-11-11 10:41:36 -08:00
Paul Dagnelie 9f4ede63d2 Add ability to recompress send streams with new compression algorithm
As new compression algorithms are added to ZFS, it could be useful for 
people to recompress data with new algorithms. There is currently no 
mechanism to do this aside from copying the data manually into a new 
filesystem with the new algorithm enabled. This tool allows the 
transformation to happen through zfs send, allowing it to be done 
efficiently to remote systems and in an incremental fashion.

A new zstream command is added that decompresses WRITE records and 
then recompresses them with a provided algorithm, and then re-emits 
the modified send stream. It may also be possible to re-compress 
embedded block pointers, but that was not attempted for the initial 
version.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #14106
2022-11-10 15:23:46 -08:00
John Wren Kennedy e9ab9e512c ZTS: random_readwrite test doesn't run correctly
This test uses fio's bssplit mechanism to choose io sizes for the test,
leaving the PERF_IOSIZES variable empty. Because that variable is
empty, the innermost loop in do_fio_run_impl is never executed, and as
a result, this test does the setup but collects no data. Setting the
variable to "bssplit" allows performance data to be gathered.

Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: John Wren Kennedy <john.kennedy@delphix.com>
Closes #14163
2022-11-10 14:00:04 -08:00
Richard Yao b1eec00904 Cleanup: Suppress Coverity dereference before/after NULL check reports
f224eddf92 began dereferencing a NULL
checked pointer in zpl_vap_init(), which made Coverity complain because
either the dereference is unsafe or the NULL check is unnecessary. Upon
inspection, this pointer is guaranteed to never be NULL because it is
from the Linux kernel VFS. The calls into ZFS simply would not make
sense if this pointer were NULL, so the NULL check is unnecessary.

Reported-by: Coverity (CID 1527260)
Reported-by: Coverity (CID 1527262)
Reviewed-by: Mariusz Zaborski <mariusz.zaborski@klarasystems.com>
Reviewed-by: Youzhong Yang <yyang@mathworks.com>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14170
2022-11-10 13:58:05 -08:00
Richard Yao 9e2be2dfbd Fix potential NULL pointer dereference regression
945b407486 neglected to `NULL` check
`tx->tx_objset`, which is already done in the function. This upset
Coverity, which complained about a "dereference after null check".

Upon inspection, it was found that whenever `dmu_tx_create_dd()` is
called followed by `dmu_tx_assign()`, such as in
`dsl_sync_task_common()`, `tx->tx_objset` will be `NULL`.

Reported-by: Coverity (CID 1527261)
Reviewed-by: Mariusz Zaborski <mariusz.zaborski@klarasystems.com>
Reviewed-by: Youzhong Yang <yyang@mathworks.com>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14170
2022-11-10 13:56:28 -08:00
Mariusz Zaborski 16f0fdaddd Allow to control failfast
Linux defaults to setting "failfast" on BIOs, so that the OS will not
retry IOs that fail, and instead report the error to ZFS.

In some cases, such as errors reported by the HBA driver, not
the device itself, we would wish to retry rather than generating
vdev errors in ZFS. This new property allows that.

This introduces a per vdev option to disable the failfast option.
This also introduces a global module parameter to define the failfast
mask value.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Mariusz Zaborski <mariusz.zaborski@klarasystems.com>
Sponsored-by: Seagate Technology LLC
Submitted-by: Klara, Inc.
Closes #14056
2022-11-10 13:37:12 -08:00
Mariusz Zaborski 945b407486 quota: disable quota check for ZVOL
The quota for ZVOLs is set to the size of the volume. When the quota
reaches the maximum, there isn't an excellent way to check if the new
writers are overwriting the data or if they are inserting a new one.
Because of that, when we reach the maximum quota, we wait till txg is
flushed. This is causing a significant fluctuation in bandwidth.

In the case of ZVOL, the quota is enforced by the volsize, so we
can omit it.

This commit adds a sysctl thats allow to control if the quota mechanism
should be enforced or not.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mariusz Zaborski <mariusz.zaborski@klarasystems.com>
Sponsored-by: Zededa Inc.
Sponsored-by: Klara Inc.
Closes #13838
2022-11-08 12:40:22 -08:00
Alan Somers e197bb24f1 Optionally skip zil_close during zvol_create_minor_impl
If there were no zil entries to replay, skip zil_close.  zil_close waits
for a transaction to sync.  That can take several seconds, for example
during pool import of a resilvering pool.  Skipping zil_close can cut
the time for "zpool import" from 2 hours to 45 seconds on a resilvering
pool with a thousand zvols.

Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Sponsored-by: Axcient
Closes #13999 
Closes #14015
2022-11-08 12:38:08 -08:00
youzhongyang f224eddf92 Support idmapped mount in user namespace
Linux 5.17 commit torvalds/linux@5dfbfe71e enables "the idmapping 
infrastructure to support idmapped mounts of filesystems mounted 
with an idmapping". Update the OpenZFS accordingly to improve the 
idmapped mount support. 

This pull request contains the following changes:

- xattr setter functions are fixed to take mnt_ns argument. Without
  this, cp -p would fail for an idmapped mount in a user namespace.
- idmap_util is enhanced/fixed for its use in a user ns context.
- One test case added to test idmapped mount in a user ns.

Reviewed-by: Christian Brauner <christian@brauner.io>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Youzhong Yang <yyang@mathworks.com>
Closes #14097
2022-11-08 10:28:56 -08:00
Damian Szuberski 109731cd73 dsl_prop_known_index(): check for invalid prop
Resolve UBSAN array-index-out-of-bounds error in zprop_desc_t.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: szubersk <szuberskidamian@gmail.com>
Closes #14142
Closes #14147
2022-11-08 10:16:01 -08:00
Mohamed Tawfik 41715771b5 Adds the -p option to zfs holds
This allows for printing a machine-readable, accurate to the second,
hold creation time in the form of a unix epoch timestamp.

Additionally, updates relevant documentation and man pages accordingly.

Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mohamed Tawfik <m_tawfik@aucegypt.edu>
Closes #13690
Closes #14152
2022-11-08 10:08:21 -08:00
Brooks Davis ecbf02791f freebsd: simplify MD isa_defs.h
Most of this file was a pile of defines, apparently from Solaris that
controlled nothing in the source tree.  A few things controlled the
definition of unused types or macros which I have removed.

Considerable further cleanup is possible including removal of
architectures FreeBSD never supported.  This file should likely converge
with the Linux version to the extent possible.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Brooks Davis <brooks.davis@sri.com>
Closes #14127
2022-11-07 15:55:37 -08:00
Brooks Davis e3ba8eb12e freebsd: trim dkio.h to the minimum
Only DKIOCFLUSHWRITECACHE is required.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Brooks Davis <brooks.davis@sri.com>
Closes #14127
2022-11-07 15:55:33 -08:00
Brooks Davis 20b867f5f7 freebsd: add ifdefs around legacy ioctl support
Require that ZFS_LEGACY_SUPPORT be defined for legacy ioctl support to
be built.  For now, define it in zfs_ioctl_compat.h so support is always
built.  This will allow systems that need never support pre-openzfs
tools a mechanism to remove support at build time.  This code should
be removed once the need for tool compatability is gone.

No functional change at this time.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Brooks Davis <brooks.davis@sri.com>
Closes #14127
2022-11-07 15:55:26 -08:00
Brooks Davis 6c89cffc2c freebsd: remove no-op vn_renamepath()
vn_renamepath() is a Solaris-ism that was defined away in the FreeBSD
port.  Now that the only use is in the FreeBSD zfs_vnops_os.c, drop it
entierly.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Brooks Davis <brooks.davis@sri.com>
Closes #14127
2022-11-07 15:55:20 -08:00
Brooks Davis 270b1b5fa7 freebsd: remove unused vn_rename()
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Brooks Davis <brooks.davis@sri.com>
Closes #14127
2022-11-07 15:54:43 -08:00
Ameer Hamza c23738c70e zed: Prevent special vdev to be replaced by hot spare
Special vdevs should not be replaced by a hot spare.
Log vdevs already support this, extending the
functionality for special vdevs.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #14129
2022-11-04 11:33:47 -07:00
Alexander Lobakin 73b8f700b6 icp: fix all !ENDBR objtool warnings in x86 Asm code
Currently, only Blake3 x86 Asm code has signs of being ENDBR-aware.
At least, under certain conditions it includes some header file and
uses some custom macro from there.
Linux has its own NOENDBR since several releases ago. It's defined
in the same <asm/linkage.h>, so currently <sys/asm_linkage.h>
already is provided with it.

Let's unify those two into one %ENDBR macro. At first, check if it's
present already. If so -- use Linux kernel version. Otherwise, try
to go that second way and use %_CET_ENDBR from <cet.h> if available.
If no, fall back to just empty definition.
This fixes a couple more 'relocations to !ENDBR' across the module.
And now that we always have the latest/actual ENDBR definition, use
it at the entrance of the few corresponding functions that objtool
still complains about. This matches the way how it's used in the
upstream x86 core Asm code.

Reviewed-by: Attila Fülöp <attila@fueloep.org>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Lobakin <alobakin@pm.me>
Closes #14035
2022-11-04 11:25:56 -07:00
Alexander Lobakin 61cca6fa05 icp: fix rodata being marked as text in x86 Asm code
objtool properly complains that it can't decode some of the
instructions from ICP x86 Asm code. As mentioned in the Makefile,
where those object files were excluded from objtool check (but they
can still be visible under IBT and LTO), those are just constants,
not code.
In that case, they must be placed in .rodata, so they won't be
marked as "allocatable, executable" (ax) in EFL headers and this
effectively prevents objtool from trying to decode this data. That
reveals a whole bunch of other issues in ICP Asm code, as previously
objtool was bailing out after that warning message.

Reviewed-by: Attila Fülöp <attila@fueloep.org>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Lobakin <alobakin@pm.me>
Closes #14035
2022-11-04 11:25:51 -07:00
Alexander Lobakin b844489ec0 icp: properly fix all RETs in x86_64 Asm code
Commit 43569ee374 ("Fix objtool: missing int3 after ret warning")
addressed replacing all `ret`s in x86 asm code to a macro in the
Linux kernel in order to enable SLS. That was done by copying the
upstream macro definitions and fixed objtool complaints.
Since then, several more mitigations were introduced, including
Rethunk. It requires to have a jump to one of the thunks in order
to work, so the RET macro was changed again. And, as ZFS code
didn't use the mainline defition, but copied it, this is currently
missing.

Objtool reminds about it time to time (Clang 16, CONFIG_RETHUNK=y):

fs/zfs/lua/zlua.o: warning: objtool: setjmp+0x25: 'naked' return
 found in RETHUNK build
fs/zfs/lua/zlua.o: warning: objtool: longjmp+0x27: 'naked' return
 found in RETHUNK build

Do it the following way:
* if we're building under Linux, unconditionally include
  <linux/linkage.h> in the related files. It is available in x86
  sources since even pre-2.6 times, so doesn't need any conftests;
* then, if RET macro is available, it will be used directly, so that
  we will always have the version actual to the kernel we build;
* if there's no such macro, we define it as a simple `ret`, as it
  was on pre-SLS times.

This ensures we always have the up-to-date definition with no need
to update it manually, and at the same time is safe for the whole
variety of kernels ZFS module supports.
Then, there's a couple more "naked" rets left in the code, they're
just defined as:

	.byte 0xf3,0xc3

In fact, this is just:

	rep ret

`rep ret` instead of just `ret` seems to mitigate performance issues
on some old AMD processors and most likely makes no sense as of
today.
Anyways, address those rets, so that they will be protected with
Rethunk and SLS. Include <sys/asm_linkage.h> here which now always
has RET definition and replace those constructs with just RET.
This wipes the last couple of places with unpatched rets objtool's
been complaining about.

Reviewed-by: Attila Fülöp <attila@fueloep.org>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Lobakin <alobakin@pm.me>
Closes #14035
2022-11-04 11:24:09 -07:00
Richard Yao 993ee7a006 FreeBSD: Fix out of bounds read in zfs_ioctl_ozfs_to_legacy()
There is an off by 1 error in the check. Fortunately, this function does
not appear to be used in kernel space, despite being compiled as part of
the kernel module. However, it is used in userspace. Callers of
lzc_ioctl_fd() likely will crash if they attempt to use the
unimplemented request number.

This was reported by FreeBSD's coverity scan.

Reported-by: Coverity (CID 1432059)
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Damian Szuberski <szuberskidamian@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14135
2022-11-04 11:06:14 -07:00
Serapheim Dimitropoulos f66ffe6878 Expose zfs_vdev_open_timeout_ms as a tunable
Some of our customers have been occasionally hitting zfs import failures
in Linux because udevd doesn't create the by-id symbolic links in time
for zpool import to use them. The main issue is that the
systemd-udev-settle.service that zfs-import-cache.service and other
services depend on is racy. There is also an openzfs issue filed (see
https://github.com/openzfs/zfs/issues/10891) outlining the problem and
potential solutions.

With the proper solutions being significant in terms of complexity and
the priority of the issue being low for the time being, this patch
exposes `zfs_vdev_open_timeout_ms` as a tunable so people that are
experiencing this issue often can increase it as a workaround.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Don Brady <don.brady@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Closes #14133
2022-11-03 15:02:46 -07:00
Allan Jude 595d3ac2ed Allow mounting snapshots in .zfs/snapshot as a regular user
Rather than doing a terrible credential swapping hack, we just
check that the thing being mounted is a snapshot, and the mountpoint
is the zfsctl directory, then we allow it.

If the mount attempt is from inside a jail, on an unjailed dataset
(mounted from the host, not by the jail), the ability to mount the
snapshot is controlled by a new per-jail parameter: zfs.mount_snapshot

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Sponsored-by: Modirum MDPay
Sponsored-by: Klara Inc.
Closes #13758
2022-11-03 11:53:24 -07:00
Richard Yao 11e3416ae7 Cleanup: Remove branches that always evaluate the same way
Coverity reported that the ASSERT in taskq_create() is always true and
the `*offp > MAXOFFSET_T` check in zfs_file_seek() is always false.

We delete them as cleanup.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14130
2022-11-03 10:47:48 -07:00
Brooks Davis 1e1ce10e55 Remove an unused variable
Clang-16 detects this set-but-unused variable which is assigned and
incremented, but never referenced otherwise.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: Brooks Davis <brooks.davis@sri.com>
Closes #14125
2022-11-03 10:17:17 -07:00
Brooks Davis abb42dc5e1 Make 1-bit bitfields unsigned
This fixes -Wsingle-bit-bitfield-constant-conversion warning from
clang-16 like:

lib/libzfs/libzfs_dataset.c:4529:19: error: implicit truncation
  from 'int' to a one-bit wide bit-field changes value from
  1 to -1 [-Werror,-Wsingle-bit-bitfield-constant-conversion]
                flags.nounmount = B_TRUE;
				^ ~~~~~~

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: Brooks Davis <brooks.davis@sri.com>
Closes #14125
2022-11-03 10:16:16 -07:00
Richard Yao f47f6a055d Address warnings about possible division by zero from clangsa
* The complaint in ztest_replay_write() is only possible if something
   went horribly wrong. An assertion will silence this and if it goes
   off, we will know that something is wrong.
 * The complaint in spa_estimate_metaslabs_to_flush() is not impossible,
   but seems very unlikely. We resolve this by passing the value from
   the `MIN()` that does not go to infinity when the variable is zero.

There was a third report from Clang's scan-build, but that was a
definite false positive and disappeared when checked again through
Clang's static analyzer with Z3 refution via CodeChecker.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14124
2022-11-03 09:58:14 -07:00
Brooks Davis 27d29946be libuutil: deobfuscate internal pointers
uu_avl and uu_list stored internal next/prev pointers and parent
pointers (unused) obfuscated (byte swapped) to hide them from a long
forgotten leak checker (No one at the 2022 OpenZFS developers meeting
could recall the history.)  This would break on CHERI systems and adds
no obvious value.  Rename the members, use proper types rather than
uintptr_t, and eliminate the related macros.

Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Brooks Davis <brooks.davis@sri.com>
Closes #14126
2022-11-03 09:57:05 -07:00
Attila Fülöp 211ec1b9fd Deny receiving into encrypted datasets if the keys are not loaded
Commit 68ddc06b61 introduced support
for receiving unencrypted datasets as children of encrypted ones but
unfortunately got the logic upside down. This resulted in failing to
deny receives of incremental sends into encrypted datasets without
their keys loaded. If receiving a filesystem, the receive was done
into a newly created unencrypted child dataset of the target. In
case of volumes the receive made the target volume undeletable since
a dataset was created below it, which we obviously can't handle.
Incremental streams with embedded blocks are affected as well.

We fix the broken logic to properly deny receives in such cases.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Attila Fülöp <attila@fueloep.org>
Closes #13598
Closes #14055
Closes #14119
2022-11-03 09:55:13 -07:00
Brooks Davis 84477e148d lua: cast through uintptr_t when return a pointer
Don't assume size_t can carry pointer provenance and use uintptr_t
(identialy on all current platforms) instead.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: Brooks Davis <brooks.davis@sri.com>
Closes #14131
2022-11-03 09:52:28 -07:00
Brooks Davis b9041e1f27 Use intptr_t when storing an integer in a pointer
Cast the integer type to (u)intptr_t before casting to "void *".  In
CHERI C/C++ we warn on bare casts from integers to pointers to catch
attempts to create pointers our of thin air.  We allow the warning to be
supressed with a suitable cast through (u)intptr_t.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: Brooks Davis <brooks.davis@sri.com>
Closes #14131
2022-11-03 09:52:23 -07:00
Brooks Davis 877790001e recvd_props_mode: use a uintptr_t to stash nvlists
Avoid assuming than a uint64_t can hold a pointer.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: Brooks Davis <brooks.davis@sri.com>
Closes #14131
2022-11-03 09:52:19 -07:00
Brooks Davis 250b2bac78 zfs_onexit_add_cb: make action_handle point to a uintptr_t
Avoid assuming than a uint64_t can hold a pointer and reduce the
number of casts in the process.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: Brooks Davis <brooks.davis@sri.com>
Closes #14131
2022-11-03 09:52:12 -07:00
Brooks Davis d96303cb07 acl: use uintptr_t for ace walker cookies
Avoid assuming that a pointer can fit in a uint64_t and use uintptr_t
instead.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: Brooks Davis <brooks.davis@sri.com>
Closes #14131
2022-11-03 09:51:34 -07:00
Brooks Davis 7309e94239 linux isa_defs.h: Don't define _ALIGNMENT_REQUIRED
Nothing consumes this definition so stop defining it.

Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Brooks Davis <brooks.davis@sri.com>
Closes #14128
2022-11-03 09:39:51 -07:00
Brooks Davis 5229071ba1 Improve RISC-V support
Check __riscv_xlen == 64 rather than _LP64 and define _LP64 if missing.

Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Brooks Davis <brooks.davis@sri.com>
Closes #14128
2022-11-03 09:39:28 -07:00
Richard Yao da3d266672 FreeBSD: Fix regression from kmem_scnprintf() in libzfs
kmem_scnprintf() is only available in libzpool. Recent buildbot issues
with showing FreeBSD results kept us from seeing this before
97143b9d31 was merged.

The code has been changed to sanitize the output from `kmem_scnprintf()`.

Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14111
2022-11-01 13:58:17 -07:00
Vince van Oosten fdc59cf563 include overrides for zfs snapshot/rollback bootfs.service
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Vince van Oosten <techhazard@codeforyouand.me>
Closes #14075
Closes #14076
2022-11-01 12:23:58 -07:00
Vince van Oosten 59ca6e2ad0 include overrides for zfs-import.target
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Vince van Oosten <techhazard@codeforyouand.me>
Closes #14075
Closes #14076
2022-11-01 12:23:51 -07:00
Vince van Oosten b10f73f78e include systemd overrides to zfs-dracut module
If a user that uses systemd and dracut wants to overide certain
settings, they typically use `systemctl edit [unit]` or place a file in
`/etc/systemd/system/[unit].d/override.conf` directly.

The zfs-dracut module did not include those overrides however, so this
did not have any effect at boot time.

For zfs-import-scan.service and zfs-import-cache.service, overrides are
now included in the dracut initramfs image.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Vince van Oosten <techhazard@codeforyouand.me>
Closes #14075
Closes #14076
2022-11-01 12:23:44 -07:00
Ryan Moeller 748b9d5bda zil: Relax assertion in zil_parse
Rather than panic debug builds when we fail to parse a whole ZIL, let's
instead improve the logging of errors and continue like in a release
build.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #14116
2022-11-01 12:19:32 -07:00
youzhongyang 95055c2ce2 ZTS: rsend_009_pos.ksh is destructive on zfs-on-root system
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Youzhong Yang <yyang@mathworks.com>
Closes #14113
2022-11-01 12:08:37 -07:00
Richard Yao dcce0dc5f0 Fix oversights from 4170ae4e
4170ae4ea6 was intended to tackle TOCTOU
race conditions reported by CodeQL, but as an oversight, a file
descriptor was not closed and some comments were not updated.
Interestingly, CodeQL did not complain about the file descriptor leak,
so there is room for improvement in how we configure it to try to detect
this issue so that we get early warning about this.

In addition, an optimization opportunity was missed by mistake in
lib/libshare/os/linux/smb.c, which prevented us from truly closing the
TOCTOU race. This was also caught by Coverity.

Reported-by: Coverity (CID 1524424)
Reported-by: Coverity (CID 1526804)
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14109
2022-10-31 10:01:04 -07:00
Allan Jude b37d495e04 Avoid null pointer dereference in dsl_fs_ss_limit_check()
Check for cr == NULL before dereferencing it in
dsl_enforce_ds_ss_limits() to lookup the zone/jail ID.

Reported-by: Coverity (CID 1210459)
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Closes #14103
2022-10-29 13:08:54 -07:00
Richard Yao 97143b9d31 Introduce kmem_scnprintf()
`snprintf()` is meant to protect against buffer overflows, but operating
on the buffer using its return value, possibly by calling it again, can
cause a buffer overflow, because it will return how many characters it
would have written if it had enough space even when it did not. In a
number of places, we repeatedly call snprintf() by successively
incrementing a buffer offset and decrementing a buffer length, by its
return value. This is a potentially unsafe usage of `snprintf()`
whenever the buffer length is reached. CodeQL complained about this.

To fix this, we introduce `kmem_scnprintf()`, which will return 0 when
the buffer is zero or the number of written characters, minus 1 to
exclude the NULL character, when the buffer was too small. In all other
cases, it behaves like snprintf(). The name is inspired by the Linux and
XNU kernels' `scnprintf()`. The implementation was written before I
thought to look at `scnprintf()` and had a good name for it, but it
turned out to have identical semantics to the Linux kernel version.
That lead to the name, `kmem_scnprintf()`.

CodeQL only catches this issue in loops, so repeated use of snprintf()
outside of a loop was not caught. As a result, a thorough audit of the
codebase was done to examine all instances of `snprintf()` usage for
potential problems and a few were caught. Fixes for them are included in
this patch.

Unfortunately, ZED is one of the places where `snprintf()` is
potentially used incorrectly. Since using `kmem_scnprintf()` in it would
require changing how it is linked, we modify its usage to make it safe,
no matter what buffer length is used. In addition, there was a bug in
the use of the return value where the NULL format character was not
being written by pwrite(). That has been fixed.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14098
2022-10-29 13:05:11 -07:00
Richard Yao 2e08df84d8 Cleanup dump_bookmarks()
Assertions are meant to check assumptions, but the way that this
assertion is written does not check an assumption, since it is provably
always true. Removing the assertion will cause a compiler warning (made
into an error by -Werror) about printing up to 512 bytes to a 256-byte
buffer, so instead, we change the assertion to verify the assumption
that we never do a snprintf() that is truncated to avoid overrunning the
256-byte buffer.

This was caught by an audit of the codebase to look for misuse of
`snprintf()` after CodeQL reported that we had misused `snprintf()`. An
explanation of how snprintf() can be misused is here:

https://www.redhat.com/en/blog/trouble-snprintf

This particular instance did not misuse `snprintf()`, but it was caught
by the audit anyway.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14098
2022-10-29 13:05:02 -07:00
Richard Yao d71d693261 Fix too few arguments to formatting function
CodeQL reported that when the VERIFY3U condition is false, we do not
pass enough arguments to `spl_panic()`. This is because the format
string from `snprintf()` was concatenated into the format string for
`spl_panic()`, which causes us to have an unexpected format specifier.

A CodeQL developer suggested fixing the macro to have a `%s` format
string that takes a stringified RIGHT argument, which would fix this.
However, upon inspection, the VERIFY3U check was never necessary in the
first place, so we remove it in favor of just calling `snprintf()`.

Lastly, it is interesting that every other static analyzer run on the
codebase did not catch this, including some that made an effort to catch
such things. Presumably, all of them relied on header annotations, which
we have not yet done on `spl_panic()`. CodeQL apparently is able to
track the flow of arguments on their way to annotated functions, which
llowed it to catch this when others did not. A future patch that I have
in development should annotate `spl_panic()`, so the others will catch
this too.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14098
2022-10-29 13:04:52 -07:00
Richard Yao 4170ae4ea6 Fix TOCTOU race conditions reported by CodeQL and Coverity
CodeQL and Coverity both complained about:

 * lib/libshare/os/linux/smb.c
 * tests/zfs-tests/cmd/mmapwrite.c
 	* twice
 * tests/zfs-tests/tests/functional/tmpfile/tmpfile_002_pos.c
 * tests/zfs-tests/tests/functional/tmpfile/tmpfile_stat_mode.c
	* coverity had a second complaint that CodeQL did not have
 * tests/zfs-tests/cmd/suid_write_to_file.c
	* Coverity had two complaints and CodeQL had one complaint, both
	  differed. The CodeQL complaint is about the main point of the
	  test, so it is not fixable without a hack involving `fork()`.

The issues reported by CodeQL are fixed, with the exception of the last
one, which is deemed to be a false positive that is too much trouble to
wrokaround. The issues reported by Coverity were only fixed if CodeQL
complained about them.

There were issues reported by Coverity in a number of other files that
were not reported by CodeQL, but fixing the CodeQL complaints is
considered a priority since we want to integrate it into a github
workflow, so the remaining Coverity complaints are left for future work.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14098
2022-10-29 13:04:10 -07:00
Brian Behlendorf 82ad2a06ac Revert "Cleanup: Delete dead code from send_merge_thread()"
This reverts commit fb823de9f due to a regression.  It is in fact possible
for the range->eos_marker to be false on error.

Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #14042
Closes #14104
2022-10-28 13:25:37 -07:00
Rob N ★ 5f0a48c7c9 debug: fix output from VERIFY0 assertion
The previous version reported all the right info, but the VERIFY3 name
made a little more confusing when looking for the matching location in
the source code.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: Rob N ★ <robn@despairlabs.com>
Closes #14099
2022-10-28 11:46:44 -07:00
Mariusz Zaborski 8af08a69cd quota: extend quota for dataset
This patch relax the quota limitation for dataset by around 3%.
What this means is that user can write more data then the quota is
set to. However thanks to that we can get more stable bandwidth, in
case when we are overwriting data in-place, and not consuming any
additional space.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Mariusz Zaborski <oshogbo@vexillium.org>
Sponsored-by: Zededa Inc.
Sponsored-by: Klara Inc.
Closes #13839
2022-10-28 11:44:18 -07:00
shodanshok dc56c673e3 Fix ARC target collapse when zfs_arc_meta_limit_percent=100
Reclaim metadata when arc_available_memory < 0 even if
meta_used is not bigger than arc_meta_limit.

As described in https://github.com/openzfs/zfs/issues/14054 if
zfs_arc_meta_limit_percent=100 then ARC target can collapse to
arc_min due to arc_purge not freeing any metadata.

This patch lets arc_prune to do its work when arc_available_memory
is negative even if meta_used is not bigger than arc_meta_limit,
avoiding ARC target collapse.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Gionatan Danti <g.danti@assyoma.it>
Closes #14054 
Closes #14093
2022-10-28 10:21:54 -07:00
vaclavskala 7822b50f54 Propagate extent_bytes change to autotrim thread
The autotrim thread only reads zfs_trim_extent_bytes_min and
zfs_trim_extent_bytes_max variable only on thread start.  We
should check for parameter changes during thread execution to
allow parameter changes take effect without needing to disable
then restart the autotrim.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Václav Skála <skala@vshosting.cz>
Closes #14077
2022-10-28 10:16:31 -07:00
Aleksa Sarai dbf6108b4d zfs_rename: support RENAME_* flags
Implement support for Linux's RENAME_* flags (for renameat2). Aside from
being quite useful for userspace (providing race-free ways to exchange
paths and implement mv --no-clobber), they are used by overlayfs and are
thus required in order to use overlayfs-on-ZFS.

In order for us to represent the new renameat2(2) flags in the ZIL, we
create two new transaction types for the two flags which need
transactional-level support (RENAME_EXCHANGE and RENAME_WHITEOUT).
RENAME_NOREPLACE does not need any ZIL support because we know that if
the operation succeeded before creating the ZIL entry, there was no file
to be clobbered and thus it can be treated as a regular TX_RENAME.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Pavel Snajdr <snajpa@snajpa.net>
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
Closes #12209
Closes #14070
2022-10-28 09:49:20 -07:00
Aleksa Sarai e015d6cc0b zfs_rename: restructure to have cleaner fallbacks
This is in preparation for RENAME_EXCHANGE and RENAME_WHITEOUT support
for ZoL, but the changes here allow for far nicer fallbacks than the
previous implementation (the source and target are re-linked in case of
the final link failing).

In addition, a small cleanup was done for the "target exists but is a
different type" codepath so that it's more understandable.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
Closes #12209
Closes #14070
2022-10-28 09:48:58 -07:00
Aleksa Sarai 7b3ba29654 debug: add VERIFY_{IMPLY,EQUIV} variants
This allows for much cleaner VERIFY-level assertions.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
Closes #14070
2022-10-28 09:48:43 -07:00
Pavel Snajdr 86db35c447 Remove zpl_revalidate: fix snapshot rollback
Open files, which aren't present in the snapshot, which is being
roll-backed to, need to disappear from the visible VFS image of
the dataset.

Kernel provides d_drop function to drop invalid entry from
the dcache, but inode can be referenced by dentry multiple dentries.

The introduced zpl_d_drop_aliases function walks and invalidates
all aliases of an inode.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Pavel Snajdr <snajpa@snajpa.net>
Closes #9600
Closes #14070
2022-10-28 09:47:19 -07:00
Andrew Innes e09fdda977 Fix multiplication converted to larger type
This fixes the instances of the "Multiplication result converted to 
larger type" alert that codeQL scanning found.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: Andrew Innes <andrew.c12@gmail.com>
Closes #14094
2022-10-28 09:30:37 -07:00
youzhongyang 5d0fd8429b Fix zio_flag_t print format
Follow up for 4938d01d which changed zio_flag from enum to uint64_t.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: Youzhong Yang <yyang@mathworks.com>
Closes #14100
2022-10-28 09:08:12 -07:00
Damian Szuberski 1428ede0ba Process script directory for all configs
Even when only building kmods process the scripts directory.  This
way the common.sh script will be generated and the zfs.sh script
can be used to load/unload the in-tree kernel modules.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: szubersk <szuberskidamian@gmail.com>
Closes #14027
Closes #14051
2022-10-27 16:45:14 -07:00
Umer Saleem 4d631a509d Add native Debian Packaging for Linux
Currently, the Debian packages are generated from ALIEN that converts
RPMs to Debian packages. This commit adds native Debian packaging for
Debian based systems.

This packaging is a fork of Debian zfs-linux 2.1.6-2 release.
(source: https://salsa.debian.org/zfsonlinux-team/zfs)

Some updates have been made to keep the footprint minimal that
include removing the tests, translation files, patches directory etc.
All credits go to Debian ZFS on Linux Packaging Team.

For copyright information, please refer to contrib/debian/copyright.

scripts/debian-packaging.sh can be used to invoke the build.

Reviewed-by: Mo Zhou <cdluminate@gmail.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Closes #13451
2022-10-27 15:38:45 -07:00
Richard Yao 4938d01db7 Convert enum zio_flag to uint64_t
We ran out of space in enum zio_flag for additional flags. Rather than
introduce enum zio_flag2 and then modify a bunch of functions to take a
second flags variable, we expand the type to 64 bits via `typedef
uint64_t zio_flag_t`.

Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@klarasystems.com>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Co-authored-by: Richard Yao <richard.yao@klarasystems.com>
Closes #14086
2022-10-27 09:54:54 -07:00
Richard Yao c8ae0ca11a Add CodeQL workflow
CodeQL is a static analyzer from github with a very low false positive
rate. We have long wanted to have static analysis runs done on every
pull request and using CodeQL, we can.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Andrew Innes <andrew.c12@gmail.com>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14087
2022-10-27 09:36:17 -07:00
Andrew Innes 07de86923b Aligned free for aligned alloc
Windows port frees memory that was alloc'd aligned in a different way
then alloc'd memory.  So changing frees to be specific.

Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Andrew Innes <andrew.c12@gmail.com>
Co-Authored-By: Jorgen Lundman <lundman@lundman.net>
Closes #14059
2022-10-26 15:08:31 -07:00
Andriy Gapon 41133c9794 FreeBSD: vn_flush_cached_data: observe vnode locking contract
vm_object_page_clean() expects that the associated vnode is locked
as VOP_PUTPAGES() may get called on the vnode.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Andriy Gapon <avg@FreeBSD.org>
Closes #14079
2022-10-26 15:00:58 -07:00
Richard Yao eeddd80572 Silence objtool warnings from 55d7afa4
The use of __noreturn__ in 55d7afa4ad on
spl_panic() caused objtool warnings on Linux when the kernel is built
with CONFIG_STACK_VALIDATION=y. This patch works around that by
restricting the application of __noreturn__ to builds for static
analyzers.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14068
2022-10-26 14:57:37 -07:00
Brian Behlendorf 871d66dbf2 Linux 6.0 compat: META
Update the META file to reflect compatibility with the 6.0 kernel.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #14091
2022-10-26 14:55:12 -07:00
Ameer Hamza 0b2428da20 zed: Avoid core dump if wholedisk property does not exist
zed aborts and dumps core in vdev_whole_disk_from_config() if
wholedisk property does not exist. make_leaf_vdev() adds the
property but there may be already pools that don't have the
wholedisk in the label.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #14062
2022-10-21 10:46:38 -07:00
youzhongyang c5a388a1ef Add delay between zpool add zvol and zpool destroy
As investigated by #14026, the zpool_add_004_pos can reliably hang if 
the timing is not right. This is caused by a race condition between 
zed doing zpool reopen (due to the zvol being added to the zpool), 
and the command zpool destroy.

This change adds a delay between zpool add zvol and zpool destroy to
avoid these issue, but does not address the underlying problem.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Youzhong Yang <yyang@mathworks.com>
Issue #14026
Closes #14052
2022-10-21 10:05:13 -07:00
Richard Yao 72a366f018 Linux: Fix big endian and partial read bugs in get_system_hostid()
Coverity made two complaints about this function. The first is that we
ignore the number of bytes read. The second is that we have a sizeof
mismatch.

On 64-bit systems, long is a 64-bit type. Paradoxically, the standard
says that hostid is 32-bit, yet is also a long type. On 64-bit big
endian systems, reading into the long would cause us to return 0 as our
hostid after the mask. This is wrong.

Also, if a partial read were to happen (it should not), we would return
a partial hostid, which is also wrong.

We introduce a uint32_t system_hostid stack variable and ensure that the
read is done into it and check the read's return value. Then we set the
value based on whether the read was successful. This should fix both of
coverity's complaints.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Neal Gompa <ngompa@datto.com>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13968
2022-10-20 14:52:35 -07:00
Richard Yao ab32a14b2e Silence new static analyzer defect reports from idmap_util.c
2a068a1394 introduced 2 new defect
reports from Coverity and 1 from Clang's static analyzer.

Coverity complained about a potential resource leak from only calling
`close(fd)` when `fd > 0` because `fd` might be `0`. This is a false
positive, but rather than dismiss it as such, we can change the
comparison to ensure that this never appears again from any static
analyzer. Upon inspection, 6 more instances of this were found in the
file, so those were changed too. Unfortunately, since the file
descriptor has been put into an unsigned variable in `attr.userns_fd`,
we cannot do a non-negative check on it to see if it has not been
allocated, so we instead restructure the error handling to avoid the
need for a check. This also means that errors had not been handled
correctly here, so the static analyzer found a bug (although practically
by accident).

Coverity also complained about a dereference before a NULL check in
`do_idmap_mount()` on `source`. Upon inspection, it appears that the
pointer is never NULL, so we delete the NULL check as cleanup.

Clang's static analyzer complained that the return value of
`write_pid_idmaps()` can be uninitialized if we have no idmaps to write.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Youzhong Yang <yyang@mathworks.com>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14061
2022-10-20 14:46:12 -07:00
Richard Yao a06df8d7c1 Linux: Upgrade random_get_pseudo_bytes() to xoshiro256++ 1.0
The motivation for upgrading our PRNG is the recent buildbot failures in
the ZTS' tests/functional/fault/decompress_fault test. The probability
of a failure in that test is 0.8^256, which is ~1.6e-25 out of 1, yet we
have observed multiple test failures in it. This suggests a problem with
our random number generation.

The xorshift128+ generator that we were using has been replaced by newer
generators that have "better statistical properties". After doing some
reading, it turns out that these generators have "low linear complexity
of the lowest bits", which could explain the ZTS test failures.

We do two things to try to fix this:

	1. We upgrade from xorshift128+ to xoshiro256++ 1.0.

	2. We tweak random_get_pseudo_bytes() to copy the higher order
	   bytes first.

It is hoped that this will fix the test failures in
tests/functional/fault/decompress_fault, although I have not done
simulations. I am skeptical that any simulations I do on a PRNG with a
period of 2^256 - 1 would be meaningful.

Since we have raised the minimum kernel version to 3.10 since this was
first implemented, we have the option of using the Linux kernel's
get_random_int(). However, I am not currently prepared to do performance
tests to ensure that this would not be a regression (for the time
being), so we opt for upgrading our PRNG to a newer one from Sebastiano
Vigna.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13983
2022-10-20 14:14:42 -07:00
Alexander Motin 9dcdee7889 Optimize microzaps
Microzap on-disk format does not include a hash tree, expecting one to
be built in RAM during mzap_open().  The built tree is linked to DMU
user buffer, freed when original DMU buffer is dropped from cache. I've
found that workloads accessing many large directories and having active
eviction from DMU cache spend significant amount of time building and
then destroying the trees.  I've also found that for each 64 byte mzap
element additional 64 byte tree element is allocated, that is a waste
of memory and CPU caches.

Improve memory efficiency of the hash tree by switching from AVL-tree
to B-tree.  It allows to save 24 bytes per element just on pointers.
Save 32 bits on mze_hash by storing only upper 32 bits since lower 32
bits are always zero for microzaps.  Save 16 bits on mze_chunkid, since
microzap can never have so many elements.  Respectively with the 16 bits
there can be no more than 16 bits of collision differentiators.  As
result, struct mzap_ent now drops from 48 (rounded to 64) to 8 bytes.

Tune B-trees for small data.  Reduce BTREE_CORE_ELEMS from 128 to 126
to allow struct zfs_btree_core in case of 8 byte elements to pack into
2KB instead of 4KB.  Aside of the microzaps it should also help 32bit
range trees.  Allow custom B-tree leaf size to reduce memmove() time.

Split zap_name_alloc() into zap_name_alloc() and zap_name_init_str().
It allows to not waste time allocating/freeing memory when processing
multiple names in a loop during mzap_open().

Together on a pool with 10K directories of 1800 files each and DMU
cache limited to 128MB this reduces time of `find . -name zzz` by 41%
from 7.63s to 4.47s, and saves additional ~30% of CPU time on the DMU
cache reclamation.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #14039
2022-10-20 11:57:15 -07:00
Richard Yao 9650b35e95 Fix multiple definitions of struct mount_attr on recent glibc versions
The ifdef used would never work because the CPP is not aware of C
structure definitions. Rather than use an autotools check, we can just
use a nameless structure that we typedef to mount_attr_t. This is a
Linux kernel interface, which means that it is stable and this is fine
to do.
    
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Youzhong Yang <yyang@mathworks.com>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14057
Closes #14058
2022-10-20 09:12:21 -07:00
Richard Yao 411d327c67 Add defensive assertion to vdev_queue_aggregate()
a6ccb36b94 had been intended to include
this to silence Coverity reports, but this one was missed by mistake.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14043
2022-10-19 17:11:06 -07:00
Richard Yao d692e6c36e abd_return_buf() should call zfs_refcount_remove_many() early
Calling zfs_refcount_remove_many() after freeing memory means we pass a
reference to freed memory as the holder. This is not believed to be able
to cause a problem, but there is a bit of a tradition of fixing these
issues when they appear so that they do not obscure more serious issues
in static analyzer output, so we fix this one too.

Clang's static analyzer found this with the help of CodeChecker's CTU
analysis.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14043
2022-10-19 17:11:01 -07:00
Richard Yao c77d2d7415 crypto_get_ptrs() should always write to *out_data_2
Callers will check if it has been set to NULL before trying to access
it, but never initialize it themselves. Whenever "one block spans two
iovecs", `crypto_get_ptrs()` will return, without ever setting
`*out_data_2 = NULL`. The caller will then do a NULL check against the
uninitailized pointer and if it is not zero, pass it to `memcpy()`.

The only reason this has not caused horrible runtime issues is because
`memcpy()` should be told to copy zero bytes when this happens. That
said, this is technically undefined behavior, so we should correct it so
that future changes to the code cannot trigger it.

Clang's static analyzer found this with the help of CodeChecker's CTU
analysis.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14043
2022-10-19 17:10:56 -07:00
Richard Yao 44f71818f8 Silence static analyzer warnings about spa_sync_props()
Both Coverity and Clang's static analyzer complain about reading an
uninitialized intval if the property is not passed as DATA_TYPE_UINT64
in the nvlist. This is impossible becuase spa_prop_validate() already
checked this, but they are unlikely to be the last static analyzers to
complain about this, so lets just refactor the code to suppress the
warnings.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14043
2022-10-19 17:10:52 -07:00
Richard Yao 4ecd96371b Fix theoretical use of uninitialized values
Clang's static analyzer complains about this.

In get_configs(), if we have an invalid configuration that has no top
level vdevs, we can read a couple of uninitialized variables. Aborting
upon seeing this would break the userland tools for healthy pools, so we
instead initialize the two variables to 0 to allow the userland tools to
continue functioning for the pools with valid configurations.

In zfs_do_wait(), if no wait activities are enabled, we read an
uninitialized error variable.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14043
2022-10-19 17:10:21 -07:00
Richard Yao 219cf0f928 Fix userland memory leak in zfs_do_send()
Clang 15's static analyzer caught this.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14045
2022-10-19 17:08:33 -07:00
Akash B 5405be0365 Add options to zfs redundant_metadata property
Currently, additional/extra copies are created for metadata in
addition to the redundancy provided by the pool(mirror/raidz/draid),
due to this 2 times more space is utilized per inode and this decreases
the total number of inodes that can be created in the filesystem. By
setting redundant_metadata to none, no additional copies of metadata
are created, hence can reduce the space consumed by the additional
metadata copies and increase the total number of inodes that can be
created in the filesystem.  Additionally, this can improve file create
performance due to the reduced amount of metadata which needs
to be written.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Dipak Ghosh <dipak.ghosh@hpe.com>
Signed-off-by: Akash B <akash-b@hpe.com>
Closes #13680
2022-10-19 17:07:51 -07:00
samwyc 2be0a124af Fix sequential resilver drive failure race condition
This patch handles the race condition on simultaneous failure of
2 drives, which misses the vdev_rebuild_reset_wanted signal in
vdev_rebuild_thread. We retry to catch this inside the
vdev_rebuild_complete_sync function.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Dipak Ghosh <dipak.ghosh@hpe.com>
Reviewed-by: Akash B <akash-b@hpe.com>
Signed-off-by: Samuel Wycliffe J <samwyc@hpe.com>
Closes #14041
Closes #14050
2022-10-19 15:48:13 -07:00
youzhongyang 2a068a1394 Support idmapped mount
Adds support for idmapped mounts.  Supported as of Linux 5.12 this 
functionality allows user and group IDs to be remapped without changing 
their state on disk.  This can be useful for portable home directories
and a variety of container related use cases.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Youzhong Yang <yyang@mathworks.com>
Closes #12923
Closes #13671
2022-10-19 11:17:09 -07:00
Richard Yao eaaed26ffb Fix memory leaks in dmu_send()/dmu_send_obj()
If we encounter an EXDEV error when using the redacted snapshots
feature, the memory used by dspp.fromredactsnaps is leaked.

Clang's static analyzer caught this during an experiment in which I had
annotated various headers in an attempt to improve the results of static
analysis.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13973
2022-10-18 16:03:33 -07:00
Richard Yao 84243acb91 Cleanup: Remove NULL pointer check from dmu_send_impl()
The pointer is to a structure member, so it is never NULL.

Coverity complained about this.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14042
2022-10-18 15:40:04 -07:00
Richard Yao fb823de9fb Cleanup: Delete dead code from send_merge_thread()
range is always deferenced before it reaches this check, such that the
kmem_zalloc() call is never executed.

There is also no need to set `range->eos_marker = B_TRUE` because it is
already set.

Coverity incorrectly complained about a potential NULL pointer
dereference because of this.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14042
2022-10-18 15:39:56 -07:00
Richard Yao 717641ac09 Cleanup: zvol_add_clones() should not NULL check dp
It is never NULL because we return early if dsl_pool_hold() fails.

This caused Coverity to complain.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14042
2022-10-18 15:39:48 -07:00
Richard Yao ef55679a75 Cleanup: metaslab_alloc_dva() should not NULL check mg->mg_next
This is a circularly linked list. mg->mg_next can never be NULL.

This caused 3 defect reports in Coverity.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14042
2022-10-18 15:39:40 -07:00
Richard Yao d953bcbf6b Cleanup: Delete unnecessary pointer check from vdev_to_nvlist_iter()
This confused Clang's static analyzer, making it think there was a
possible NULL pointer dereference. There is no NULL pointer dereference.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14042
2022-10-18 15:39:32 -07:00
Richard Yao 9a8039439a Cleanup: Simplify userspace abd_free_chunks()
Clang's static analyzer complained that we could use after free here if
the inner loop ever iterated. That is a false positive, but upon
inspection, the userland abd_alloc_chunks() function never will put
multiple consecutive pages into a `struct scatterlist`, so there is no
need to loop. We delete the inner loop.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14042
2022-10-18 15:39:21 -07:00
Richard Yao 6ae2f90888 Fix possible NULL pointer dereference in sha2_mac_init()
If mechanism->cm_param is NULL, passing mechanism to
PROV_SHA2_GET_DIGEST_LEN() will dereference a NULL pointer.

Coverity reported this.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14044
2022-10-18 15:35:23 -07:00
Richard Yao c6b161e390 set_global_var() should not pass NULL pointers to dlclose()
Both Coverity and Clang's static analyzer caught this.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14044
2022-10-18 15:35:13 -07:00
Richard Yao 1bd02680c0 Fix NULL pointer dereference in spa_open_common()
Calling spa_open() will pass a NULL pointer to spa_open_common()'s
config parameter. Under the right circumstances, we will dereference the
config parameter without doing a NULL check.

Clang's static analyzer found this.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14044
2022-10-18 15:34:54 -07:00
Richard Yao 3146fc7edf Fix NULL pointer passed to strlcpy from zap_lookup_impl()
Clang's static analyzer pointed out that whenever zap_lookup_by_dnode()
is called, we have the following stack where strlcpy() is passed a NULL
pointer for realname from zap_lookup_by_dnode():

strlcpy()
zap_lookup_impl()
zap_lookup_norm_by_dnode()
zap_lookup_by_dnode()

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14044
2022-10-18 15:34:44 -07:00
Richard Yao 711b35dc24 fm_fmri_hc_create() must call va_end() before returning
clang-tidy caught this.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14044
2022-10-18 15:34:36 -07:00
Richard Yao aa822e4d9c Fix NULL pointer dereference in zdb
Clang's static analyzer complained that we dereference a NULL pointer in
dump_path() if we return 0 when there is an error.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14044
2022-10-18 15:34:24 -07:00
Richard Yao 09453dea6a ZED: Fix uninitialized value reads
Coverity complained about a couple of uninitialized value reads in ZED.

 * zfs_deliver_dle() can pass an uninitialized string to zed_log_msg()
 * An uninitialized sev.sigev_signo is passed to timer_create()

The former would log garbage while the latter is not a real issue, but
we might as well suppress it by initializing the field to 0 for
consistency's sake.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14047
2022-10-18 12:42:14 -07:00
Coleman Kane ecb6a50819 Linux 6.1 compat: change order of sys/mutex.h includes
After Linux 6.1-rc1 came out, the build started failing to build a
couple of the files in the linux spl code due to the mutex_init
redefinition. Moving the sys/mutex.h include to a lower position within
these two files appears to fix the problem.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #14040
2022-10-18 12:29:44 -07:00
Richard Yao d10bd7d288 Coverity model file update
Upon review, it was found that the model for malloc() was incorrect.

In addition, several general purpose memory allocation functions were
missing models:

 * kmem_vasprintf()
 * kmem_asprintf()
 * kmem_strdup()
 * kmem_strfree()
 * spl_vmem_alloc()
 * spl_vmem_zalloc()
 * spl_vmem_free()
 * calloc()

As an experiment to try to find more bugs, some less than general
purpose memory allocation functions were also given models:

 * zfsvfs_create()
 * zfsvfs_free()
 * nvlist_alloc()
 * nvlist_dup()
 * nvlist_free()
 * nvlist_pack()
 * nvlist_unpack()

Finally, the models were improved using additional coverity primitives:

 * __coverity_negative_sink__()
 * __coverity_writeall0__()
 * __coverity_mark_as_uninitialized_buffer__()
 * __coverity_mark_as_afm_allocated__()

In addition, an attempt to inform coverity that certain modelled
functions read entire buffers was used by adding the following to
certain models:

int first = buf[0];
int last = buf[buflen-1];

It was inspired by the QEMU model file.

No additional false positives were found by this, but it is believed
that the more accurate model file will help to catch false positives in
the future.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14048
2022-10-18 11:06:35 -07:00
Tino Reichardt 27218a32fc Fix declarations of non-global variables
This patch inserts the `static` keyword to non-global variables,
which where found by the analysis tool smatch.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #13970
2022-10-18 11:05:32 -07:00
Alexander ab49df487b Linux compat: fix DECLARE_EVENT_CLASS() test when ZFS is built-in
ZFS_LINUX_TRY_COMPILE_HEADER macro doesn't take CONFIG_ZFS=y into
account. As a result, on several latest Linux versions, configure
script marks DECLARE_EVENT_CLASS() available for non-GPL when ZFS
is being built as a module, but marks it unavailable when ZFS is
built-in.
Follow the logic of the neighbor macros and adjust
ZFS_LINUX_TRY_COMPILE_HEADER accordingly, so that it doesn't try
to look for a .ko when ZFS is built-in.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: Alexander Lobakin <alobakin@pm.me>
Closes #14006
2022-10-17 11:08:36 -07:00
Richard Yao 0aae8a449b Fix theoretical array overflow in lua_typename()
Out of the 12 defects in lua that coverity reports, 5 of them involve
`lua_typename()` and out of the dozens of defects in ZFS that lua
reports, 3 of them involve `lua_typename()` due to the ZCP code. Given
all of the uses of `lua_typename()` in the ZCP code, I was surprised
that there were not more. It appears that only 2 were reported because
only 3 called `lua_type()`, which does a defective sanity check that
allows invalid types to be passed.

lua/lua@d4fb848be7 addressed this in
upstream lua 5.3. Unfortunately, we did not get that fix since we use
lua 5.2 and we do not have assertions enabled in lua, so the upstream
solution would not do anything.

While we could adopt the upstream solution and enable assertions, a
simpler solution is to fix the issue by making `lua_typename()` return
`internal_type_error` whenever it is called with an invalid type. This
avoids the array overflow and if we ever see it appear somewhere, we
will know there is a problem with the lua interpreter.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13947
2022-10-14 13:41:56 -07:00
Alan Somers a1034ee909 zstream: allow decompress to fix metadata for uncompressed records
If a record is uncompressed on-disk but the block pointer insists
otherwise, reading it will return EIO.  This commit adds an "off" type
to the "zstream decompress" command.  Using it will set the compression
field in a zfs stream to "off" without changing the record's data.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by:	Alan Somers <asomers@FreeBSD.org>
Sponsored by:	Axcient
Closes #13997
2022-10-14 13:40:00 -07:00
Richard Yao 6a42939fcd Cleanup: Address Clang's static analyzer's unused code complaints
These were categorized as the following:

 * Dead assignment		23
 * Dead increment		4
 * Dead initialization		6
 * Dead nested assignment	18

Most of these are harmless, but since actual issues can hide among them,
we correct them.

That said, there were a few return values that were being ignored that
appeared to merit some correction:

 * `destroy_callback()` in `cmd/zfs/zfs_main.c` ignored the error from
   `destroy_batched()`. We handle it by returning -1 if there is an
   error.

 * `zfs_do_upgrade()` in `cmd/zfs/zfs_main.c` ignored the error from
   `zfs_for_each()`. We handle it by doing a binary OR of the error
   value from the subsequent `zfs_for_each()` call to the existing
   value. This is how errors are mostly handled inside `zfs_for_each()`.
   The error value here is passed to exit from the zfs command, so doing
   a binary or on it is better than what we did previously.

 * `get_zap_prop()` in `module/zfs/zcp_get.c` ignored the error from
   `dsl_prop_get_ds()` when the property is not of type string. We
   return an error when it does. There is a small concern that the
   `zfs_get_temporary_prop()` call would handle things, but in the case
   that it does not, we would be pushing an uninitialized numval onto
   the lua stack. It is expected that `dsl_prop_get_ds()` will succeed
   anytime that `zfs_get_temporary_prop()` does, so that not giving it a
   chance to fix things is not a problem.

 * `draid_merge_impl()` in `tests/zfs-tests/cmd/draid.c` used
   `nvlist_add_nvlist()` twice in ways in which errors are expected to
   be impossible, so we switch to `fnvlist_add_nvlist()`.

A few notable ones did not merit use of the return value, so we
suppressed it with `(void)`:

 * `write_free_diffs()` in `lib/libzfs/libzfs_diff.c` ignored the error
   value from `describe_free()`. A look through the commit history
   revealed that this was intentional.

 * `arc_evict_hdr()` in `module/zfs/arc.c` did not need to use the
   returned handle from `arc_hdr_realloc()` because it is already
   referenced in lists.

 * `spa_vdev_detach()` in `module/zfs/spa.c` has a comment explicitly
   saying not to use the error from `vdev_label_init()` because whatever
   causes the error could be the reason why a detach is being done.

Unfortunately, I am not presently able to analyze the kernel modules
with Clang's static analyzer, so I could have missed some cases of this.
In cases where reports were present in code that is duplicated between
Linux and FreeBSD, I made a conscious effort to fix the FreeBSD version
too.

After this commit is merged, regressions like dee8934 should become
extremely obvious with Clang's static analyzer since a regression would
appear in the results as the only instance of unused code. That assumes
that Coverity does not catch the issue first.

My local branch with fixes from all of my outstanding non-draft pull
requests shows 118 reports from Clang's static anlayzer after this
patch. That is down by 51 from 169.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Cedric Berger <cedric@precidata.com>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13986
2022-10-14 13:37:54 -07:00
Richard Yao 19516b69ee Fix potential NULL pointer dereference in lzc_ioctl()
Users are allowed to pass NULL to resultp, but we unconditionally assume
that they never do. When an external user does pass NULL to resultp, we
dereference a NULL pointer.

Clang's static analyzer complained about this.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14008
2022-10-14 13:33:22 -07:00
Christian Schwarz 4d5aef3ba9 zfs_domount: fix double-disown of dataset / double-free of zfsvfs_t
Before this patch, in zfs_domount, if zfs_root or d_make_root fails, we
leave zfsvfs != NULL. This will lead to execution of the error handling
`if` statement at the `out` label, and hence to a call to
dmu_objset_disown and zfsvfs_free.

However, zfs_umount, which we call upon failure of zfs_root and
d_make_root already does dmu_objset_disown and zfsvfs_free.

I suppose this patch rather adds to the brittleness of this part of the
code base, but I don't want to invest more time in this right now.
To add a regression test, we'd need some kind of fault injection
facility for zfs_root or d_make_root, which doesn't exist right now.
And even then, I think that regression test would be too closely tied
to the implementation.

To repro the double-disown / double-free, do the following:
1. patch zfs_root to always return an error
2. mount a ZFS filesystem

Here's the stack trace you would see then:

  VERIFY3(ds->ds_owner == tag) failed (0000000000000000 == ffff9142361e8000)
  PANIC at dsl_dataset.c:1003:dsl_dataset_disown()
  Showing stack for process 28332
  CPU: 2 PID: 28332 Comm: zpool Tainted: G           O      5.10.103-1.nutanix.el7.x86_64 #1
  Call Trace:
   dump_stack+0x74/0x92
   spl_dumpstack+0x29/0x2b [spl]
   spl_panic+0xd4/0xfc [spl]
   dsl_dataset_disown+0xe9/0x150 [zfs]
   dmu_objset_disown+0xd6/0x150 [zfs]
   zfs_domount+0x17b/0x4b0 [zfs]
   zpl_mount+0x174/0x220 [zfs]
   legacy_get_tree+0x2b/0x50
   vfs_get_tree+0x2a/0xc0
   path_mount+0x2fa/0xa70
   do_mount+0x7c/0xa0
   __x64_sys_mount+0x8b/0xe0
   do_syscall_64+0x38/0x50
   entry_SYSCALL_64_after_hwframe+0x44/0xa9

Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Co-authored-by: Christian Schwarz <christian.schwarz@nutanix.com>
Signed-off-by: Christian Schwarz <christian.schwarz@nutanix.com>
Closes #14025
2022-10-14 11:46:47 -07:00
ColMelvin 642c2dee44 cstyle: Allow URLs in C++ comments
If a C++ comment contained a URL, the `://` part of the URL would
trigger an error because there was no trailing blank, but trailing
blanks make for an invalid URL.  Modify the check to ignore text
within the C++ comment.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chris Lindee <chris.lindee+github@gmail.com>
Closes #13987
2022-10-13 11:05:05 -07:00
Richard Yao ab8d9c1783 Cleanup: 64-bit kernel module parameters should use fixed width types
Various module parameters such as `zfs_arc_max` were originally
`uint64_t` on OpenSolaris/Illumos, but were changed to `unsigned long`
for Linux compatibility because Linux's kernel default module parameter
implementation did not support 64-bit types on 32-bit platforms. This
caused problems when porting OpenZFS to Windows because its LLP64 memory
model made `unsigned long` a 32-bit type on 64-bit, which created the
undesireable situation that parameters that should accept 64-bit values
could not on 64-bit Windows.

Upon inspection, it turns out that the Linux kernel module parameter
interface is extensible, such that we are allowed to define our own
types. Rather than maintaining the original type change via hacks to to
continue shrinking module parameters on 32-bit Linux, we implement
support for 64-bit module parameters on Linux.

After doing a review of all 64-bit kernel parameters (found via the man
page and also proposed changes by Andrew Innes), the kernel module
parameters fell into a few groups:

Parameters that were originally 64-bit on Illumos:

 * dbuf_cache_max_bytes
 * dbuf_metadata_cache_max_bytes
 * l2arc_feed_min_ms
 * l2arc_feed_secs
 * l2arc_headroom
 * l2arc_headroom_boost
 * l2arc_write_boost
 * l2arc_write_max
 * metaslab_aliquot
 * metaslab_force_ganging
 * zfetch_array_rd_sz
 * zfs_arc_max
 * zfs_arc_meta_limit
 * zfs_arc_meta_min
 * zfs_arc_min
 * zfs_async_block_max_blocks
 * zfs_condense_max_obsolete_bytes
 * zfs_condense_min_mapping_bytes
 * zfs_deadman_checktime_ms
 * zfs_deadman_synctime_ms
 * zfs_initialize_chunk_size
 * zfs_initialize_value
 * zfs_lua_max_instrlimit
 * zfs_lua_max_memlimit
 * zil_slog_bulk

Parameters that were originally 32-bit on Illumos:

 * zfs_per_txg_dirty_frees_percent

Parameters that were originally `ssize_t` on Illumos:

 * zfs_immediate_write_sz

Note that `ssize_t` is `int32_t` on 32-bit and `int64_t` on 64-bit. It
has been upgraded to 64-bit.

Parameters that were `long`/`unsigned long` because of Linux/FreeBSD
influence:

 * l2arc_rebuild_blocks_min_l2size
 * zfs_key_max_salt_uses
 * zfs_max_log_walking
 * zfs_max_logsm_summary_length
 * zfs_metaslab_max_size_cache_sec
 * zfs_min_metaslabs_to_flush
 * zfs_multihost_interval
 * zfs_unflushed_log_block_max
 * zfs_unflushed_log_block_min
 * zfs_unflushed_log_block_pct
 * zfs_unflushed_max_mem_amt
 * zfs_unflushed_max_mem_ppm

New parameters that do not exist in Illumos:

 * l2arc_trim_ahead
 * vdev_file_logical_ashift
 * vdev_file_physical_ashift
 * zfs_arc_dnode_limit
 * zfs_arc_dnode_limit_percent
 * zfs_arc_dnode_reduce_percent
 * zfs_arc_meta_limit_percent
 * zfs_arc_sys_free
 * zfs_deadman_ziotime_ms
 * zfs_delete_blocks
 * zfs_history_output_max
 * zfs_livelist_max_entries
 * zfs_max_async_dedup_frees
 * zfs_max_nvlist_src_size
 * zfs_rebuild_max_segment
 * zfs_rebuild_vdev_limit
 * zfs_unflushed_log_txg_max
 * zfs_vdev_max_auto_ashift
 * zfs_vdev_min_auto_ashift
 * zfs_vnops_read_chunk_size
 * zvol_max_discard_blocks

Rather than clutter the lists with commentary, the module parameters
that need comments are repeated below.

A few parameters were defined in Linux/FreeBSD specific code, where the
use of ulong/long is not an issue for portability, so we leave them
alone:

 * zfs_delete_blocks
 * zfs_key_max_salt_uses
 * zvol_max_discard_blocks

The documentation for a few parameters was found to be incorrect:

 * zfs_deadman_checktime_ms - incorrectly documented as int
 * zfs_delete_blocks - not documented as Linux only
 * zfs_history_output_max - incorrectly documented as int
 * zfs_vnops_read_chunk_size - incorrectly documented as long
 * zvol_max_discard_blocks - incorrectly documented as ulong

The documentation for these has been fixed, alongside the changes to
document the switch to fixed width types.

In addition, several kernel module parameters were percentages or held
ashift values, so being 64-bit never made sense for them. They have been
downgraded to 32-bit:

 * vdev_file_logical_ashift
 * vdev_file_physical_ashift
 * zfs_arc_dnode_limit_percent
 * zfs_arc_dnode_reduce_percent
 * zfs_arc_meta_limit_percent
 * zfs_per_txg_dirty_frees_percent
 * zfs_unflushed_log_block_pct
 * zfs_vdev_max_auto_ashift
 * zfs_vdev_min_auto_ashift

Of special note are `zfs_vdev_max_auto_ashift` and
`zfs_vdev_min_auto_ashift`, which were already defined as `uint64_t`,
and passed to the kernel as `ulong`. This is inherently buggy on big
endian 32-bit Linux, since the values would not be written to the
correct locations. 32-bit FreeBSD was unaffected because its sysctl code
correctly treated this as a `uint64_t`.

Lastly, a code comment suggests that `zfs_arc_sys_free` is
Linux-specific, but there is nothing to indicate to me that it is
Linux-specific. Nothing was done about that.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Original-patch-by: Andrew Innes <andrew.c12@gmail.com>
Original-patch-by: Jorgen Lundman <lundman@lundman.net>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13984
Closes #14004
2022-10-13 10:03:29 -07:00
Richard Yao ff7a0a108f Linux: Remove ZFS_AC_KERNEL_SRC_MODULE_PARAM_CALL_CONST autotools check
On older kernels, the definition for `module_param_call()` typecasts
function pointers to `(void *)`, which triggers -Werror, causing the
check to return false when it should return true.

Fixing this breaks the build process on some older kernels because they
define a `__check_old_set_param()` function in their headers that checks
for a non-constified `->set()`. We workaround that through the c
preprocessor by defining `__check_old_set_param(set)` to `(set)`, which
prevents the build failures.

However, it is now apparent that all kernels that we support have
adopted the GRSecurity change, so there is no need to have an explicit
autotools check for it anymore. We therefore remove the autotools check,
while adding the workaround to our headers for the build time
non-constified `->set()` check done by older kernel headers.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13984
Closes #14004
2022-10-13 10:03:09 -07:00
наб 4e195e8254 etc: mask zfs-load-key.service
Otherwise, systemd-sysv-generator will generate a service equivalent
that breaks the boot: under systemd this is covered by
zfs-mount-generator

We already do this for zfs-import.service, and other init scripts are
suppressed automatically by the "actual" .service files

Fixes: commit f04b976200 ("Add init script
 to load keys")
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #14010
Closes #14019
2022-10-12 15:27:55 -07:00
George Melikov 87d5ff8ecd CI: bump actions/upload-artifact to v3
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #14018
2022-10-12 15:18:39 -07:00
George Melikov 4bbd7a0fbe CI: bump actions/checkout to v3
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #14018
2022-10-12 15:18:19 -07:00
Richard Yao a6ccb36b94 Add defensive assertions
Coverity complains about possible bugs involving referencing NULL return
values and division by zero. The division by zero bugs require that a
block pointer be corrupt, either from in-memory corruption, or on-disk
corruption. The NULL return value complaints are only bugs if
assumptions that we make about the state of data structures are wrong.
Some seem impossible to be wrong and thus are false positives, while
others are hard to analyze.

Rather than dismiss these as false positives by assuming we know better,
we add defensive assertions to let us know when our assumptions are
wrong.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13972
2022-10-12 11:25:18 -07:00
Richard Yao bfaa1d98f4 scripts/enum-extract.pl should not hard code perl path
This is a portability issue. The issue had already been fixed for
scripts/cstyle.pl by 2dbf1bf829.
scripts/enum-extract.pl was added to the repository the following year
without this portability fix.

Michael Bishop informed me that this broke his attempt to build ZFS
2.1.6 on NixOS, since he was building manually outside of their package
manager (that usually rewrites the shebangs to NixOS' unusual paths).
NixOS puts all of the paths into $PATH, so scripts that portably rely
on env to find the interpreter still work.

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de> 
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14012
2022-10-11 12:32:07 -07:00
Mark Johnston ed566bf1cd FreeBSD: Fix a pair of bugs in zfs_fhtovp()
- Add a zfs_exit() call in an error path, otherwise a lock is leaked.
- Remove the fid_gen > 1 check.  That appears to be Linux-specific:
  zfsctl_snapdir_fid() sets fid_gen to 0 or 1 depending on whether the
  snapshot directory is mounted.  On FreeBSD it fails, making snapshot
  dirs inaccessible via NFS.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Andriy Gapon <avg@FreeBSD.org>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Fixes: 43dbf88178 ("FreeBSD: vfsops: use setgen for error case")
Closes #14001
Closes #13974
2022-10-11 12:29:55 -07:00
Serapheim Dimitropoulos 4dcc2bde9c Stop ganging due to past vdev write errors
= Problem

While examining a customer's system we noticed unreasonable space
usage from a few snapshots due to gang blocks. Under some further
analysis we discovered that the pool would create gang blocks because
all its disks had non-zero write error counts and they'd be skipped
for normal metaslab allocations due to the following if-clause in
`metaslab_alloc_dva()`:
```
	/*
	 * Avoid writing single-copy data to a failing,
	 * non-redundant vdev, unless we've already tried all
	 * other vdevs.
	 */
	if ((vd->vdev_stat.vs_write_errors > 0 ||
	    vd->vdev_state < VDEV_STATE_HEALTHY) &&
	    d == 0 && !try_hard && vd->vdev_children == 0) {
		metaslab_trace_add(zal, mg, NULL, psize, d,
		    TRACE_VDEV_ERROR, allocator);
		goto next;
	}
```

= Proposed Solution

Get rid of the predicate in the if-clause that checks the past
write errors of the selected vdev. We still try to allocate from
HEALTHY vdevs anyway by checking vdev_state so the past write
errors doesn't seem to help us (quite the opposite - it can cause
issues in long-lived pools like the one from our customer).

= Testing

I first created a pool with 3 vdevs:
```
$ zpool list -v volpool
NAME        SIZE  ALLOC   FREE
volpool    22.5G   117M  22.4G
  xvdb     7.99G  40.2M  7.46G
  xvdc     7.99G  39.1M  7.46G
  xvdd     7.99G  37.8M  7.46G
```

And used `zinject` like so with each one of them:
```
$ sudo zinject -d xvdb -e io -T write -f 0.1 volpool
```

And got the vdevs to the following state:
```
$ zpool status volpool
  pool: volpool
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.
...<cropped>..
action: Determine if the device needs to be replaced, and clear the
...<cropped>..
config:

	NAME        STATE     READ WRITE CKSUM
	volpool     ONLINE       0     0     0
	  xvdb      ONLINE       0     1     0
	  xvdc      ONLINE       0     1     0
	  xvdd      ONLINE       0     4     0

```

I also double-checked their write error counters with sdb:
```
sdb> spa volpool | vdev | member vdev_stat.vs_write_errors
(uint64_t)0  # <---- this is the root vdev
(uint64_t)2
(uint64_t)1
(uint64_t)1
```

Then I checked that I the problem was reproduced in my VM as I the
gang count was growing in zdb as I was writting more data:
```
$ sudo zdb volpool | grep gang
        ganged count:              1384

$ sudo zdb volpool | grep gang
        ganged count:              1393

$ sudo zdb volpool | grep gang
        ganged count:              1402

$ sudo zdb volpool | grep gang
        ganged count:              1414
```

Then I updated my bits with this patch and the gang count stayed the
same.

Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Closes #14003
2022-10-11 12:27:41 -07:00
Richard Yao 70248b82e8 Fix uninitialized value read in vdev_prop_set()
If no errors are encountered, we read an uninitialized error value.

Clang's static analyzer complained about this.

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14007
2022-10-11 12:24:36 -07:00
Serapheim Dimitropoulos e5646c5e37 zvol_wait logic may terminate prematurely
Setups that have a lot of zvols may see zvol_wait terminate prematurely
even though the script is still making progress.  For example, we have a
customer that called zvol_wait for ~7100 zvols and by the last iteration
of that script it was still waiting on ~2900. Similarly another one
called zvol_wait for 2200 and by the time the script terminated there
were only 50 left.

This patch adjusts the logic to stay within the outer loop of the script
if we are making any progress whatsoever.

Reviewed-by: George Wilson <gwilson@delphix.com>
Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed-by: Don Brady <don.brady@delphix.com>
Signed-off-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Closes #13998
2022-10-11 12:12:04 -07:00
Richard Yao 72c99dc959 Handle possible null pointers from malloc/strdup/strndup()
GCC 12.1.1_p20220625's static analyzer caught these.

Of the two in the btree test, one had previously been caught by Coverity
and Smatch, but GCC flagged it as a false positive. Upon examining how
other test cases handle this, the solution was changed from
`ASSERT3P(node, !=, NULL);` to using `perror()` to be consistent with
the fixes to the other fixes done to the ZTS code.

That approach was also used in ZED since I did not see a better way of
handling this there. Also, upon inspection, additional unchecked
pointers from malloc()/calloc()/strdup() were found in ZED, so those
were handled too.

In other parts of the code, the existing methods to avoid issues from
memory allocators returning NULL were used, such as using
`umem_alloc(size, UMEM_NOFAIL)` or returning `ENOMEM`.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13979
2022-10-06 17:18:40 -07:00
Richard Yao 2ba240f358 PAM: Fix unchecked return value from zfs_key_config_load()
9a49c6b782 was intended to fix this issue,
but I had missed the case in pam_sm_open_session(). Clang's static
analyzer had not reported it and I forgot to look for other cases.

Interestingly, GCC gcc-12.1.1_p20220625's static analyzer had caught
this as multiple double-free bugs, since another failure after the
failure in zfs_key_config_load() will cause us to attempt to free the
memory that zfs_key_config_load() was supposed to allocate, but had
cleaned up upon failure.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13978
2022-10-05 17:09:24 -07:00
Jorgen Lundman 4b629d04a5 Avoid calling rw_destroy() on uninitialized rwlock
First the function `memset(&key, 0, ...)` but
any call to "goto error;" would call zio_crypt_key_destroy(key) which
calls `rw_destroy()`. The `rw_init()` is moved up to be right after the
memset. This way the rwlock can be released.

The ctx does allocate memory, but that is handled by the memset to 0
and icp skips NULL ptrs.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: Jorgen Lundman <lundman@lundman.net>
Closes #13976
2022-10-05 17:07:50 -07:00
shodanshok 062d3d056b Remove ambiguity on demand vs prefetch stats reported by arc_summary
arc_summary currently list prefetch stats as "demand prefetch"
However, a hit/miss can be due to demand or prefetch, not both.
To remove any confusion, this patch removes the "Demand" word
from the affected lines.

Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Gionatan Danti <g.danti@assyoma.it>
Closes #13985
2022-10-04 11:00:02 -07:00
Finix1979 6694ca5539 Avoid unnecessary metaslab_check_free calling
The metaslab_check_free() function only needs to be called in the
GANG|DEDUP|etc case because zio_free_sync() will internally call
metaslab_check_free().

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: Finix1979 <yancw@info2soft.com>
Closes #13977
2022-10-04 10:55:35 -07:00
Umer Saleem 383c3eb33d Add membar_sync abi change
It appears membar_sync was not present in libzfs.abi with other
membar_* functions. This commit updates libzfs.abi for membar_sync.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Closes #13969
2022-10-04 09:54:58 -07:00
Umer Saleem d9ac17a57f Expose libzutil error info in libpc_handle_t
In libzutil, for zpool_search_import and zpool_find_config, we use
libpc_handle_t internally, which does not maintain error code and it is
not exposed in the interface. Due to this, the error information is not
propagated to the caller. Instead, an error message is printed on
stderr.

This commit adds lpc_error field in libpc_handle_t and exposes it in
the interface, which can be used by the users of libzutil to get the
appropriate error information and handle it accordingly.

Users of the API can also control if they want to print the error
message on stderr.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Closes #13969
2022-10-04 09:54:35 -07:00
Richard Yao d62bafee9f Fix memory leak found by GCC static analyzer
GCC 12.1.1_p20220625's -fanalyzer found and reported this.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Damian Szuberski <szuberskidamian@gmail.com>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13975
2022-10-03 13:41:58 -07:00
Richard Yao 67395be0c2 Fix userland dereference NULL return value bugs
* `zstream_do_token()` does not handle failures from `libzfs_init()`

 * `ztest_global_vars_to_zdb_args()` does not handle failures from
   `calloc()`.

 * `zfs_snapshot_nvl()` will pass an offset to a NULL pointer as a
   source to `strlcpy()` if the provided nvlist is `NULL`.

We handle these by doing what the existing error handling does for other
errors involving these functions.

Coverity complained about these. It had complained about several more,
but one was fixed by 570ca4441e and
another was a false positive. The remaining complaints labelled
"dereferece null return vaue" involve fetching things stored in
in-kernel data structures via `list_head()/list_next()`,
`AVL_PREV()/AVL_NEXT()` and `zfs_btree_find()`. Most of them occur in
void functions that have no error handling. They are much harder to
analyze than the two fixed in this patch, so they are left for a
follow-up patch.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13971
2022-09-30 17:02:57 -07:00
Richard Yao a36b37d4de Fix potential NULL pointer dereference in dsl_dataset_promote_check()
If the `list_head()` returns NULL, we dereference it, right before we
check to see if it returned NULL.

We have defined two different pointers that both point to the same
thing, which are `origin_head` and `origin_ds`. Almost everything uses
`origin_ds`, so we switch them to use `origin_ds`.

We also promote `origin_ds` to a const pointer so that the compiler
verifies that nothing modifies it.

Coverity complained about this.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Neal Gompa <ngompa@datto.com>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13967
2022-09-30 16:59:51 -07:00
Tino Reichardt a2d5643f88 Fix double const qualifier declarations
Some header files define structures like this one:

typedef const struct zio_checksum_info {
	/* ... */
	const char	*ci_name;
} zio_abd_checksum_func_t;

So we can use `zio_abd_checksum_func_t` for const declarations now.
It's not needed that we use the `const` qualifier again like this:
`const zio_abd_checksum_func_t *varname;`

This patch solves the double const qualifiers, which were found by
smatch.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #13961
2022-09-30 15:34:39 -07:00
Richard Yao 55d7afa4ad Reduce false positives from Static Analyzers
Both Clang's Static Analyzer and Synopsys' Coverity would ignore
assertions. Following Clang's advice, we annotate our assertions:

https://clang-analyzer.llvm.org/annotations.html#custom_assertions

This makes both Clang's Static Analyzer and Coverity properly identify
assertions. This change reduced Clang's reported defects from 246 to
180. It also reduced the false positives reported by Coverityi by 10,
while enabling Coverity to find 9 more defects that previously were
false negatives.

A couple examples of this would be CID-1524417 and CID-1524423. After
submitting a build to coverity with the modified assertions, CID-1524417
disappeared while the report for CID-1524423 no longer claimed that the
assertion tripped.

Coincidentally, it turns out that it is possible to more accurately
annotate our headers than the Coverity modelling file permits in the
case of format strings. Since we can do that and this patch annotates
headers whenever `__coverity_panic__()` would have been used in the
model file, we drop all models that use `__coverity_panic__()` from the
model file.

Upon seeing the success in eliminating false positives involving
assertions, it occurred to me that we could also modify our headers to
eliminate coverity's false positives involving byte swaps. We now have
coverity specific byteswap macros, that do nothing, to disable
Coverity's false positives when we do byte swaps. This allowed us to
also drop the byteswap definitions from the model file.

Lastly, a model file update has been done beyond the mentioned
deletions:

 * The definitions of `umem_alloc_aligned()`, `umem_alloc()` andi
   `umem_zalloc()` were originally implemented in a way that was
   intended to inform coverity that when KM_SLEEP has been passed these
   functions, they do not return NULL. A small error in how this was
   done was found, so we correct it.

 * Definitions for umem_cache_alloc() and umem_cache_free() have been
   added.

In practice, no false positives were avoided by making these changes,
but in the interest of correctness from future coverity builds, we make
them anyway.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13902
2022-09-30 15:30:12 -07:00
Richard Yao dee8934e8f Fix unreachable code in zstreamdump
82226e4f44 was intended to prevent a
warning from being printed in situations where it was inappropriate, but
accidentally disabled it entirely by setting featureflags in the wrong
case statement.

Coverity reported this as dead code.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13946
2022-09-29 10:16:37 -07:00
Serapheim Dimitropoulos 4acc36ed7c Fix panic in dsl_process_sub_livelist for EINTR
= Issue

Recently we hit an assertion panic in `dsl_process_sub_livelist` while
exporting the spa and interrupting `bpobj_iterate_nofree`. In that case
`bpobj_iterate_nofree` stops mid-way returning an EINTR without clearing
the intermediate AVL tree that keeps track of the livelist entries it
has encountered so far. At that point the code has a VERIFY for the
number of elements of the AVL expecting it to be zero (which is not the
case for EINTR).

= Fix

Cleanup any intermediate state before destroying the AVL when
encountering EINTR. Also added a comment documenting the scenario where
the EINTR comes up. There is no need to do anything else for the calles
of `dsl_process_sub_livelist` as they already handle the EINTR case.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Closes #13939
2022-09-29 09:39:48 -07:00
Richard Yao 1b87195c3c Fix unchecked return values
2a493a4c71 was intended to fix all
instances of coverity reported unchecked return values, but
unfortunately, two were missed by mistake. This commit fixes the
unchecked return values that had been missed.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Neal Gompa <ngompa@datto.com>
Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13945
2022-09-29 09:02:57 -07:00
Richard Yao 570ca4441e Miscellaneous ZTS fixes
Coverity had various complaints about minor issues. They are all fairly
straightforward to understand without reading additional files, with the
exception of the draid.c issue. vdev_draid_rand() takes a 128-bit
starting seed, but we were passing a pointer to a 64-bit value, which
understandably made Coverity complain. This is perhaps the only
significant issue fixed in this patch, since it causes stack corruption.

These are not all of the issues in the ZTS that Coverity caught, but a
number of them are already fixed in other PRs. There is also a class of
TOUTOC complaints that involve very minor things in the ZTS (e.g.
access() before unlink()). I have yet to decide whether they are false
positives (since this is not security sensitive code) or something to
cleanup.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Neal Gompa <ngompa@datto.com>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13943
2022-09-29 08:56:42 -07:00
Ameer Hamza 55c12724d3 zed: mark disks as REMOVED when they are removed
ZED does not take any action for disk removal events if there is no
spare VDEV available. Added zpool_vdev_remove_wanted() in libzfs
and vdev_remove_wanted() in vdev.c to remove the VDEV through ZED
on removal event.  This means that if you are running zed and
remove a disk, it will be properly marked as REMOVED.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #13797
2022-09-28 09:48:46 -07:00
Mateusz Guzik eb9bec0a5d Bring per_txg_dirty_frees_percent back to 30
The current value causes significant artificial slowdown during mass
parallel file removal, which can be observed both on FreeBSD and Linux
when running real workloads.

Sample results from Linux doing make -j 96 clean after an allyesconfig
modules build:

before: 4.14s user 6.79s system 48% cpu 22.631 total
after:	4.17s user 6.44s system 153% cpu 6.927 total

FreeBSD results in the ticket.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by:	Mateusz Guzik <mjguzik@gmail.com>
Closes #13932
Closes #13938
2022-09-27 17:38:03 -07:00
Toomas Soome af65073a07 btree_test: smatch did detect few issues
Add missing header.
Properly ignore return values.

Memory leak/unchecked malloc. We do allocate a bit too early (and
fail to validate the result). From this, smatch is angry when we
overwrite the value of 'node' later.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: Toomas Soome <tsoome@me.com>
Closes #13941
2022-09-27 17:09:21 -07:00
Christian Schwarz e872ea16f2 DMU_BACKUP_FEATURE: indicate that bit 28 and 29 are reserved
Bit 28 is used by an internal Nutanix feature which might be
upstreamed in the future.

Bit 29 is the last unused bit. It is reserved to indicate a
to-be-designed extension to the stream format which will accomodate
more feature flags.

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Christian Schwarz <christian.schwarz@nutanix.com>
Issue #13795
Closes #13796
2022-09-27 16:55:32 -07:00
Christian Schwarz 5c9666382a DMU_BACKUP_FEATURE: remove unused BLAKE3 feature
Commit 985c33b132 added DMU_BACKUP_FEATURE_BLAKE3 but it is not used by
the code.

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Christian Schwarz <christian.schwarz@nutanix.com>
Issue #13795
Closes #13796
2022-09-27 16:53:40 -07:00
Richard Yao 9a49c6b782 PAM: Fix uninitialized value read
Clang's static analyzer found that config.uid is uninitialized when
zfs_key_config_load() returns an error.

Oddly, this was not included in the unchecked return values that
Coverity found.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13957
2022-09-27 16:48:35 -07:00
Richard Yao a51288aabb Fix unsafe string operations
Coverity caught unsafe use of `strcpy()` in `ztest_dmu_objset_own()`,
`nfs_init_tmpfile()` and `dump_snapshot()`. It also caught an unsafe use
of `strlcat()` in `nfs_init_tmpfile()`.

Inspired by this, I did an audit of every single usage of `strcpy()` and
`strcat()` in the code. If I could not prove that the usage was safe, I
changed the code to use either `strlcpy()` or `strlcat()`, depending on
which function was originally used. In some cases, `snprintf()` was used
to replace multiple uses of `strcat` because it was cleaner.

Whenever I changed a function, I preferred to use `sizeof(dst)` when the
compiler is able to provide the string size via that. When it could not
because the string was passed by a caller, I checked the entire call
tree of the function to find out how big the buffer was and hard coded
it. Hardcoding is less than ideal, but it is safe unless someone shrinks
the buffer sizes being passed.

Additionally, Coverity reported three more string related issues:

 * It caught a case where we do an overlapping memory copy in a call to
   `snprintf()`. We fix that via `kmem_strdup()` and `kmem_strfree()`.

 * It caught `sizeof (buf)` being used instead of `buflen` in
   `zdb_nicenum()`'s call to `zfs_nicenum()`, which is passed to
   `snprintf()`. We change that to pass `buflen`.

 * It caught a theoretical unterminated string passed to `strcmp()`.
   This one is likely a false positive, but we have the information
   needed to do this more safely, so we change this to silence the false
   positive not just in coverity, but potentially other static analysis
   tools too. We switch to `strncmp()`.

 * There was a false positive in tests/zfs-tests/cmd/dir_rd_update.c. We
   suppress it by switching to `snprintf()` since other static analysis
   tools might complain about it too. Interestingly, there is a possible
   real bug there too, since it assumes that the passed directory path
   ends with '/'. We add a '/' to fix that potential bug.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13913
2022-09-27 16:47:24 -07:00
Richard Yao 88b199c24e Cleanup spa_export_common()
Coverity complains about a possible NULL pointer dereference. This is
impossible, but it suspects it because we do a NULL check against
`spa->spa_root_vdev`. This NULL check was never necessary and makes the
code harder to understand, so we drop it.

In particular, we dereference `spa->spa_root_vdev` when `new_state !=
POOL_STATE_UNINITIALIZED && !hardforce`. The first is only true when
spa_reset is called, which only occurs under fault injection.  The
second is true unless `zpool export -F $POOLNAME` is used.  Therefore,
we effectively *always* dereference the pointer. In the cases where we
do not, there is no reason to think it is unsafe.  Therefore this change
is safe to make.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13905
2022-09-27 16:45:51 -07:00
Richard Yao 31b4e008f1 LUA: Fix CVE-2014-5461
Apply the fix from upstream.

http://www.lua.org/bugs.html#5.2.2-1
https://www.opencve.io/cve/CVE-2014-5461

It should be noted that exploiting this requires the `SYS_CONFIG`
privilege, and anyone with that privilege likely has other opportunities
to do exploits, so it is unlikely that bad actors could exploit this
unless system administrators are executing untrusted ZFS Channel
Programs.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13949
2022-09-27 16:44:13 -07:00
Richard Yao fdc2d30371 Cleanup: Specify unsignedness on things that should not be signed
In #13871, zfs_vdev_aggregation_limit_non_rotating and
zfs_vdev_aggregation_limit being signed was pointed out as a possible
reason not to eliminate an unnecessary MAX(unsigned, 0) since the
unsigned value was assigned from them.

There is no reason for these module parameters to be signed and upon
inspection, it was found that there are a number of other module
parameters that are signed, but should not be, so we make them unsigned.
Making them unsigned made it clear that some other variables in the code
should also be unsigned, so we also make those unsigned. This prevents
users from setting negative values that could potentially cause bad
behaviors. It also makes the code slightly easier to understand.

Mostly module parameters that deal with timeouts, limits, bitshifts and
percentages are made unsigned by this. Any that are boolean are left
signed, since whether booleans should be considered signed or unsigned
does not matter.

Making zfs_arc_lotsfree_percent unsigned caused a
`zfs_arc_lotsfree_percent >= 0` check to become redundant, so it was
removed. Removing the check was also necessary to prevent a compiler
error from -Werror=type-limits.

Several end of line comments had to be moved to their own lines because
replacing int with uint_t caused us to exceed the 80 character limit
enforced by cstyle.pl.

The following were kept signed because they are passed to
taskq_create(), which expects signed values and modifying the
OpenSolaris/Illumos DDI is out of scope of this patch:

	* metaslab_load_pct
	* zfs_sync_taskq_batch_pct
	* zfs_zil_clean_taskq_nthr_pct
	* zfs_zil_clean_taskq_minalloc
	* zfs_zil_clean_taskq_maxalloc
	* zfs_arc_prune_task_threads

Also, negative values in those parameters was found to be harmless.

The following were left signed because either negative values make
sense, or more analysis was needed to determine whether negative values
should be disallowed:

	* zfs_metaslab_switch_threshold
	* zfs_pd_bytes_max
	* zfs_livelist_min_percent_shared

zfs_multihost_history was made static to be consistent with other
parameters.

A number of module parameters were marked as signed, but in reality
referenced unsigned variables. upgrade_errlog_limit is one of the
numerous examples. In the case of zfs_vdev_async_read_max_active, it was
already uint32_t, but zdb had an extern int declaration for it.

Interestingly, the documentation in zfs.4 was right for
upgrade_errlog_limit despite the module parameter being wrongly marked,
while the documentation for zfs_vdev_async_read_max_active (and friends)
was wrong. It was also wrong for zstd_abort_size, which was unsigned,
but was documented as signed.

Also, the documentation in zfs.4 incorrectly described the following
parameters as ulong when they were int:

	* zfs_arc_meta_adjust_restarts
	* zfs_override_estimate_recordsize

They are now uint_t as of this patch and thus the man page has been
updated to describe them as uint.

dbuf_state_index was left alone since it does nothing and perhaps should
be removed in another patch.

If any module parameters were missed, they were not found by `grep -r
'ZFS_MODULE_PARAM' | grep ', INT'`. I did find a few that grep missed,
but only because they were in files that had hits.

This patch intentionally did not attempt to address whether some of
these module parameters should be elevated to 64-bit parameters, because
the length of a long on 32-bit is 32-bit.

Lastly, it was pointed out during review that uint_t is a better match
for these variables than uint32_t because FreeBSD kernel parameter
definitions are designed for uint_t, whose bit width can change in
future memory models.  As a result, we change the existing parameters
that are uint32_t to use uint_t.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Neal Gompa <ngompa@datto.com>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13875
2022-09-27 16:42:41 -07:00
Richard Yao 7584fbe846 Cleanup: Switch to strlcpy from strncpy
Coverity found a bug in `zfs_secpolicy_create_clone()` where it is
possible for us to pass an unterminated string when `zfs_get_parent()`
returns an error. Upon inspection, it is clear that using `strlcpy()`
would have avoided this issue.

Looking at the codebase, there are a number of other uses of `strncpy()`
that are unsafe and even when it is used safely, switching to
`strlcpy()` would make the code more readable. Therefore, we switch all
instances where we use `strncpy()` to use `strlcpy()`.

Unfortunately, we do not portably have access to `strlcpy()` in
tests/zfs-tests/cmd/zfs_diff-socket.c because it does not link to
libspl. Modifying the appropriate Makefile.am to try to link to it
resulted in an error from the naming choice used in the file. Trying to
disable the check on the file did not work on FreeBSD because Clang
ignores `#undef` when a definition is provided by `-Dstrncpy(...)=...`.
We workaround that by explictly including the C file from libspl into
the test. This makes things build correctly everywhere.

We add a deprecation warning to `config/Rules.am` and suppress it on the
remaining `strncpy()` usage. `strlcpy()` is not portably avaliable in
tests/zfs-tests/cmd/zfs_diff-socket.c, so we use `snprintf()` there as a
substitute.

This patch does not tackle the related problem of `strcpy()`, which is
even less safe. Thankfully, a quick inspection found that it is used far
more correctly than strncpy() was used. A quick inspection did not find
any problems with `strcpy()` usage outside of zhack, but it should be
said that I only checked around 90% of them.

Lastly, some of the fields in kstat_t varied in size by 1 depending on
whether they were in userspace or in the kernel. The origin of this
discrepancy appears to be 04a479f706 where
it was made for no apparent reason. It conflicts with the comment on
KSTAT_STRLEN, so we shrink the kernel field sizes to match the userspace
field sizes.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13876
2022-09-27 16:35:29 -07:00
Jitendra Patidar 3ed9d6883b Enforce "-F" flag on resuming recv of full/newfs on existing dataset
When receiving full/newfs on existing dataset, then it should be done
with "-F" flag. Its enforced for initial receive in checks done in
zfs_receive_one function of libzfs. Similarly, on resuming full/newfs
recv on existing dataset, it should be done with "-F" flag.

When dataset doesn't exist, then full/new recv is done on newly created
dataset and it's marked INCONSISTENT. But when receiving on existing
dataset, recv is first done on %recv and its marked INCONSISTENT.
Existing dataset is not marked INCONSISTENT. Resume of full/newfs
receive with dataset not INCONSISTENT indicates that its resuming newfs
on existing dataset. So, enforce "-F" flag in this case.

Also return an error from dmu_recv_resume_begin_check() in zfs kernel,
when its resuming full/newfs recv without force.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Chunwei Chen <david.chen@nutanix.com>
Signed-off-by: Jitendra Patidar <jitendra.patidar@nutanix.com>
Closes #13856
Closes #13857
2022-09-27 16:34:27 -07:00
Richard Yao a2163a96ae Fix bad free in skein code
Clang's static analyzer found a bad free caused by skein_mac_atomic().
It will allocate a context on the stack and then pass it to
skein_final(), which attempts to free it. Upon inspection,
skein_digest_atomic() also has the same problem.

These functions were created to match the OpenSolaris ICP API, so I was
curious how we avoided this in other providers and looked at the SHA2
code. It appears that SHA2 has a SHA2Final() helper function that is
called by the exported sha2_mac_final()/sha2_digest_final() as well as
the sha2_mac_atomic() and sha2_digest_atomic() functions. The real work
is done in SHA2Final() while some checks and the free are done in
sha2_mac_final()/sha2_digest_final().

We fix the use after free in the skein code by taking inspiration from
the SHA2 code. We introduce a skein_final_nofree() that does most of the
work, and make skein_final() into a function that calls it and then
frees the memory.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13954
2022-09-27 12:36:58 -07:00
Richard Yao f7bda2de97 Fix userspace memory leaks found by Clang Static Analzyer
Recently, I have been making a push to fix things that coverity found.
However, I was curious what Clang's static analyzer reported, so I ran
it and found things that coverity had missed.

* contrib/pam_zfs_key/pam_zfs_key.c: If prop_mountpoint is passed more
  than once, we leak memory.
* module/zfs/zcp_get.c: We leak memory on temporary properties in
  userspace.
* tests/zfs-tests/cmd/draid.c: On error from vdev_draid_rand(), we leak
  memory if best_map had been allocated by a prior iteration.
* tests/zfs-tests/cmd/mkfile.c: Memory used by the loop is not freed
  before program termination.

Arguably, these are all minor issues, but if we ignore them, then they
could obscure serious bugs, so we fix them.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13955
2022-09-26 17:18:05 -07:00
Chris Zubrzycki 5e7a2f4665 Update zfs-mount to load before fstab, matches systemd service.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chris Zubrzycki <github@mid-earth.net>
Closes #13895
2022-09-26 17:11:43 -07:00
Richard Yao 8ef15f9322 Cleanup: Remove ineffective unsigned comparisons against 0
Coverity found a number of places where we either do MAX(unsigned, 0) or
do assertions that a unsigned variable is >= 0. These do nothing, so
let us drop them all.

It also found a spot where we do `if (unsigned >= 0 && ...)`. Let us
also drop the unsigned >= 0 check.

Reviewed-by: Neal Gompa <ngompa@datto.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13871
2022-09-26 17:02:38 -07:00
Richard Yao 52afc3443d Linux: Fix uninitialized variable usage in zio_do_crypt_data()
Coverity complained about this. An error from `hkdf_sha512()` before uio
initialization will cause pointers to uninitialized memory to be passed
to `zio_crypt_destroy_uio()`. This is a regression that was introduced
by cf63739191. Interestingly, this never
affected FreeBSD, since the FreeBSD version never had that patch ported.
Since moving uio initialization to the top of this function would slow
down the qat_crypt() path, we only move the `memset()` calls to the top
of the function. This is sufficient to fix this problem.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Neal Gompa <ngompa@datto.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13944
2022-09-26 16:44:22 -07:00
Tino Reichardt bf5b42f9c8 Fix double declaration of getauxval() for FreeBSD PPC
The extern declaration is only for Linux, move this line
into the right #ifdef section.

Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Co-authored-by: Martin Matuska <mm@FreeBSD.org>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #13934
Closes #13936
2022-09-26 10:32:22 -07:00
Richard Yao ebe1d03616 Fix userland resource leaks
Coverity caught these. With the exception of the file descriptor leak in
tests/zfs-tests/cmd/draid.c, they are all memory leaks.

Also, there is a piece of dead code in zfs_get_enclosure_sysfs_path().
We delete it as cleanup.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13921
2022-09-23 16:55:26 -07:00
Richard Yao 2a493a4c71 Fix unchecked return values and unused return values
Coverity complained about unchecked return values and unused values that
turned out to be unused return values.

Different approaches were used to handle the different cases of
unchecked return values:

* cmd/zdb/zdb.c: VERIFY0 was used in one place since the existing code
  had no error handling. An error message was printed in another to
  match the rest of the code.

* cmd/zed/agents/zfs_retire.c: We dismiss the return value with `(void)`
  because the value is expected to be potentially unset.

* cmd/zpool_influxdb/zpool_influxdb.c: We dismiss the return value with
  `(void)` because the values are expected to be potentially unset.

* cmd/ztest.c: VERIFY0 was used since we want failures if something goes
  wrong in ztest.

* module/zfs/dsl_dir.c: We dismiss the return value with `(void)`
  because there is no guarantee that the zap entry will always be there.
  For example, old pools imported readonly would not have it and we do
  not want to fail here because of that.

* module/zfs/zfs_fm.c: `fnvlist_add_*()` was used since the
  allocations sleep and thus can never fail.

* module/zfs/zvol.c: We dismiss the return value with `(void)` because
  we do not need it. This matches what is already done in the analogous
  `zfs_replay_write2()`.

* tests/zfs-tests/cmd/draid.c: We suppress one return value with
  `(void)` since the code handles errors already. The other return value
  is handled by switching to `fnvlist_lookup_uint8_array()`.

* tests/zfs-tests/cmd/file/file_fadvise.c: We add error handling.

* tests/zfs-tests/cmd/mmap_sync.c: We add error handling for munmap, but
  ignore failures on remove() with (void) since it is expected to be
  able to fail.

* tests/zfs-tests/cmd/mmapwrite.c: We add error handling.

As for unused return values, they were all in places where there was
error handling, so logic was added to handle the return values.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13920
2022-09-23 16:52:03 -07:00
Richard Yao d25153d555 set_global_var_parse_kv() should pass the pointer from strdup()
A comment says that the caller should free k_out, but the pointer passed
via k_out is not the same pointer we received from strdup(). Instead,
it is a pointer into the region we received from strdup(). The free
function should always be called with the original pointer, so this is
likely a bug.

We solve this by calling `strdup()` a second time and then freeing the
original pointer.

Coverity reported this as a memory leak.

Reviewed-by: Neal Gompa <ngompa@datto.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13867
2022-09-23 10:51:14 -07:00
Tony Hutter e9b12d4196 zpool: Don't print "repairing" on force faulted drives
If you force fault a drive that's resilvering, it's scan stats can get
frozen in time, giving the false impression that it's being resilvered.
This commit checks the vdev state to see if the vdev is healthy before
reporting "resilvering" or "repairing" in zpool status.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #13927
Closes #13930
2022-09-23 10:24:19 -07:00
John Wren Kennedy ce55d6ae46 ZTS: fallocate tests fail with hard coded values
Currently, these two tests pass on disks with 512 byte sectors. In
environments where the backing store is different, the number of
blocks allocated to write the same file may differ. This change
modifies the reported size check to detect an expected change in the
reported number of blocks without specifying a particular number.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: John Kennedy <john.kennedy@delphix.com>
Closes  #13931
2022-09-22 16:42:34 -06:00
Brian Behlendorf 505df8d133 Dynamically size dbuf hash mutex array
Incorrectly sizing the array of hash locks used to protect the
dbuf hash table can lead to contention and reduce performance.
We could unconditionally allocate a larger array for the locks
but it's wasteful, particularly for a low-memory system.
Instead, dynamically allocate the array of locks and scale
it based on total system memory.

Additionally, add a new `dbuf_mutex_cache_shift` module option
which can be used to override the hash lock array size.  This is
disabled by default (dbuf_mutex_hash_shift=0) and can only be
set at module load time.  The minimum target array size is set
to 8192, this matches the current constant value.

Note that the count of the dbuf hash table and count of the
mutex array were added to the /proc/spl/kstat/zfs/dbufstats
kstat.

Finally, this change removes the _KERNEL conditional checks.
These were not required since for the user space build there
is no difference between the kmem and vmem interfaces.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13928
2022-09-22 12:59:56 -07:00
Brian Behlendorf 223b04d23d Revert "Reduce dbuf_find() lock contention"
This reverts commit 34dbc618f5.  While this
change resolved the lock contention observed for certain workloads, it
inadventantly reduced the maximum hash inserts/removes per second.  This
appears to be due to the slightly higher acquisition cost of a rwlock vs
a mutex.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
2022-09-22 12:59:41 -07:00
Richard Yao e506a0ce40 Cleanup: Change 1 used in bitshifts to 1ULL
Coverity complains about this. It is not a bug as long as we never shift
by more than 31, but it is not terrible to change the constants from 1
to 1ULL as clean up.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13914
2022-09-22 11:28:33 -07:00
Mateusz Guzik c629f0bf62 Retire ZFS_TEARDOWN_TRY_ENTER_READ
There were never any users and it so happens the operation is not even
supported by rrm locks -- the macros were wrong for Linux and FreeBSD
when not using it's RMS locks.
    
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #13906
2022-09-20 15:34:41 -07:00
Mateusz Guzik 402426c7d8 Add membar_sync
Provides the missing full barrier variant to the membar primitive set.

While not used right now, this is probably going to change down the
road.

Name taken from Solaris, to follow the existing routines.

Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #13907
2022-09-20 15:32:44 -07:00
youzhongyang 62e2a2881f Fix minor issues in namespace delegation support
get_user_ns() is only done once for each namespace, so put_user_ns() 
should be done once too.
    
Fix two typos in user_namespace/user_namespace_002.ksh and 
user_namespace/user_namespace_003.ksh.

Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Youzhong Yang <yyang@mathworks.com>
Closes #13918
2022-09-20 15:25:21 -07:00
Mateusz Guzik fbf874a4ac FreeBSD: handle V_PCATCH
See https://cgit.FreeBSD.org/src/commit/?id=a75d1ddd74312f5dd79bc1e965f7077679659f2e

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #13910
2022-09-20 15:22:32 -07:00
Mateusz Guzik 3e5caef4c5 FreeBSD: catch up to 1400068
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #13909
2022-09-20 15:21:30 -07:00
Richard Yao 7c6d94728c Call va_end() before return in zpool_standard_error_fmt()
Commit ecd6cf800b63704be73fb264c3f5b6e0dafc068d by marks in OpenSolaris
at Tue Jun 26 07:44:24 2007 -0700 introduced a bug where we fail to call
`va_end()` before returning.

The man page for va_start() says:

"Each invocation of va_start() must be matched by a corresponding
invocation of va_end() in the same function."

Coverity complained about this.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Chunwei Chen <david.chen@nutanix.com>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13904
2022-09-20 15:20:56 -07:00
Richard Yao de6c0d3d8c Fix potential NULL pointer dereference in zfsdle_vdev_online()
Coverity complained about this.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Chunwei Chen <david.chen@nutanix.com>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13903
2022-09-20 15:20:04 -07:00
Ameer Hamza c50b3f14d3 Delay ZFS_PROP_SHARESMB property to handle it for encrypted raw receive
For encrypted raw receive, objset creation is delayed until a call to
dmu_recv_stream(). ZFS_PROP_SHARESMB property requires objset to be
populated when calling zpl_earlier_version(). To correctly handle the
ZFS_PROP_SHARESMB property for encrypted raw receive, this change
delays setting the property.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #13878
2022-09-20 15:19:05 -07:00
Richard Yao 3f400b0f58 FreeBSD: Cleanup zfs_readdir()
The FreeBSD project's coverity scans found dead code in `zfs_readdir()`.
Also, the comment above `zfs_readdir()` is out of date.

I fixed the comment and deleted all of the dead code, plus additional
dead code that was found upon review.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13924
2022-09-20 14:50:16 -07:00
Richard Yao 9276e202eb FreeBSD: Fix uninitialized pointer read in spa_import_rootpool()
The FreeBSD project's coverity scans found this.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13923
2022-09-20 14:43:03 -07:00
Richard Yao e8bdc74528 Cleanup: Remove unused uu_pname code
Coverity caught a possible NULL pointer dereference in dead code. We can
delete it all.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Chunwei Chen <david.chen@nutanix.com>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13900
2022-09-19 17:33:52 -07:00
Richard Yao f272960d52 Fix usage of zed_log_msg() and zfs_panic_recover()
Coverity complained about the format specifiers not matching variables.
In one case, the variable is a constant, so we fix it. In another, we
were missing an argument (about which coverity also complained).

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13888
2022-09-19 17:32:18 -07:00
Richard Yao 891ac937be Linux: Fix use-after-free in zfsvfs_create()
Coverity reported that we pass a pointer to zfsvfs to
`dmu_objset_disown()` after freeing zfsvfs in zfsvfs_create_impl() after
a failure in zfsvfs_init().

We have nearly identical duplicate versions of this code for FreeBSD and
Linux, but interestingly, the FreeBSD version of this code differs in
such a way that it does not suffer from this bug. We remove the
difference from the FreeBSD version to fix this bug.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13883
2022-09-19 17:30:58 -07:00
Martin Matuška 042d43a1dd FreeBSD: fix static module build broken in 7bb707ffa
param_set_arc_free_target(SYSCTL_HANDLER_ARGS) and
param_set_arc_no_grow_shift(SYSCTL_HANDLER_ARGS) defined in
sysctl_os.c must be made available to arc_os.c.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Martin Matuska <mm@FreeBSD.org>
Closes #13915
2022-09-19 17:21:45 -07:00
Mateusz Guzik 9a671fe7ec FreeBSD: stop passing LK_INTERLOCK to VOP_LOCK
There is an ongoing effort to eliminate this feature.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #13908
2022-09-19 17:17:27 -07:00
Tino Reichardt 48cf170d5a Add PPC cpu feature tests for FreeBSD and Linux
Add needed cpu feature tests for powerpc architecture.

Overview:
zfs_altivec_available() - needed by RAID-Z
zfs_vsx_available()     - needed by BLAKE3
zfs_isa207_available()  - needed by SHA2

Part 1 - Userspace
- use getauxval() for Linux and elf_aux_info() for FreeBSD
- direct including <sys/auxv.h> fails with double definitions
- so we self define the needed functions and definitions

Part 2 - Kernel space FreeBSD
- use exported cpu_features of <powerpc/cpu.h>

Part 3 - Kernel space Linux
- use cpu_has_feature() function of <asm/cpufeature.h>

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #13725
2022-09-16 14:25:53 -07:00
Tino Reichardt eeca9d27d7 Add zfs_blake3_impl to zfs.4
The zfs module parameter zfs_blake3_impl got no manual page entry while
adding BLAKE3 to OpenZFS. This commit adds the required notes about the
parameter into zfs.4

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Co-authored-by: Ryan Moeller <ryan@freqlabs.com>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #13725
2022-09-16 14:25:53 -07:00
Tino Reichardt 75e8b5ad84 Fix BLAKE3 tuneable and module loading on Linux and FreeBSD
Apply similar options to BLAKE3 as it is done for zfs_fletcher_4_impl.

The zfs module parameter on Linux changes from icp_blake3_impl to
zfs_blake3_impl.

You can check and set it on Linux via sysfs like this:
```
[bash]# cat /sys/module/zfs/parameters/zfs_blake3_impl
cycle [fastest] generic sse2 sse41 avx2

[bash]# echo sse2 > /sys/module/zfs/parameters/zfs_blake3_impl
[bash]# cat /sys/module/zfs/parameters/zfs_blake3_impl
cycle fastest generic [sse2] sse41 avx2
```

The modprobe module parameters may also be used now:
```
[bash]# modprobe zfs zfs_blake3_impl=sse41
[bash]# cat /sys/module/zfs/parameters/zfs_blake3_impl
cycle fastest generic sse2 [sse41] avx2
```

On FreeBSD the BLAKE3 implementation can be set via sysctl like this:
```
[bsd]# sysctl vfs.zfs.blake3_impl
vfs.zfs.blake3_impl: cycle [fastest] generic sse2 sse41 avx2
[bsd]# sysctl vfs.zfs.blake3_impl=sse2
vfs.zfs.blake3_impl: cycle [fastest] generic sse2 sse41 avx2 \
  -> cycle fastest generic [sse2] sse41 avx2
```

This commit changes also some Blake3 internals like these:
- blake3_impl_ops_t was renamed to blake3_ops_t
- all functions are named blake3_impl_NAME() now

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Co-authored-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #13725
2022-09-16 14:25:53 -07:00
Brian Behlendorf 7dee043af5 zfs_enter rework followup
The zpl_fadvise() function was recently added and was not included
in the initial patch.  Update it accordingly.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13831
2022-09-16 14:25:53 -07:00
Richard Yao 4df8ccc83d Fix null pointer dereferences in PAM
Coverity caught these.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13889
2022-09-16 14:02:54 -07:00
наб 6c8e9f09c2 Handle ECKSUM as new EZFS_CKSUM ‒ "insufficient replicas"
Add a meaningful error message for ECKSUM to common error messages.

Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #6805 
Closes #13808
Closes #13898
2022-09-16 13:59:25 -07:00
Ameer Hamza 577d41d3b2 zfs recv hangs if max recordsize is less than received recordsize
- Some optimizations for bqueue enqueue/dequeue.
- Added a fix to prevent deadlock when both bqueue_enqueue_impl()
and bqueue_dequeue() waits for signal to be triggered.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #13855
2022-09-16 13:52:25 -07:00
Richard Yao 8da218a7a2 Update coverity model
`uu_panic()` needs to be modelled and the definition of `vpanic()` from
the original coverity model was missing
`__coverity_format_string_sink__()`.

We also model `libspl_assertf()` as part of an attempt to eliminate
false positives.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13901
2022-09-16 13:45:15 -07:00
Chunwei Chen 1b6f3368dd Fix unable to export zpool without nfs-utils
Don't return error in nfs_disable_share when nfs is not available, since
it wouldn't have been able to share in the first place.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chunwei Chen <david.chen@nutanix.com>
Closes #13534
Closes #13800
2022-09-16 13:43:26 -07:00
Chunwei Chen 768eacedef zfs_enter rework
Replace ZFS_ENTER and ZFS_VERIFY_ZP, which have hidden returns, with
functions that return error code. The reason we want to do this is
because hidden returns are not obvious and had caused some missing fail
path unwinding.

This patch changes the common, linux, and freebsd parts. Also fixes
fail path unwinding in zfs_fsync, zpl_fsync, zpl_xattr_{list,get,set}, and
zfs_lookup().

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Chunwei Chen <david.chen@nutanix.com>
Closes #13831
2022-09-16 13:36:47 -07:00
Richard Yao b24d1c77f7 Add zfs_btree_verify_intensity kernel module parameter
I see a few issues in the issue tracker that might be aided by being
able to turn this on. We have no module parameter for it, so I would
like to add one.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13874
2022-09-15 16:22:33 -07:00
Richard Yao ddb1fd91c0 Fix incorrect size given to bqueue_enqueue() call in dmu_redact.c
We pass sizeof (struct redact_record *) rather than sizeof (struct
redact_record). Passing the pointer size is wrong.

Coverity caught this in two places.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13885
2022-09-15 16:21:21 -07:00
Mateusz Piotrowski fa22ec569c Use correct mdoc macros for arguments
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Signed-off-by: Mateusz Piotrowski <0mp@FreeBSD.org>
Closes #13890
2022-09-15 14:22:00 -07:00
Richard Yao e949d36040 Fix assertions in crypto reference helpers
The assertions are racy and the use of `membar_exit()` did nothing to
fix that.

The helpers use atomic functions, so we cleverly get values from the
atomics that we can use to ensure that the assertions operate on the
correct values.

We also use `membar_producer()` prior to decrementing reference counts
so that operations that happened prior to a decrement to 0 will be
guaranteed to happen before the decrement on architectures that reorder
atomics.

This also slightly improves performance by eliminating unnecessary
reads, although I doubt it would be measurable in any benchmark.

Reviewed-by: Mateusz Guzik <mjguzik@gmail.com>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13880
2022-09-15 13:24:00 -07:00
John Wren Kennedy dc2fe24ca2 ZTS: parameter expansion in zfs_unshare_006_pos
zfs_unshare_006 checks to see if a dataset still has an active SMB
share after doing an NFS unshare -a. The test could fail because the
check for the SMB share does not expect dashes in a dataset name to be
converted to underscores as pathname delimiters are.

Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: John Kennedy <john.kennedy@delphix.com>
Closes #13893
2022-09-15 13:14:35 -07:00
Richard Yao 621a7ebe58 Add coverity model to repository
Other projects such as the python project include their coverity models
in their repositories. This provides transparency, which is beneficial
in open source projects. Therefore, it is a good idea to include the
coverity model in our repository too.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13884
2022-09-15 11:50:19 -07:00
Richard Yao fd8c3012b3 Fix use-after-free bugs in icp code
These were reported by Coverity as "Read from pointer after free" bugs.
Presumably, it did not report it as a use-after-free bug because it does
not understand the inline assembly that implements the atomic
instruction.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13881
2022-09-15 11:46:42 -07:00
George Melikov 6f8602a5ed CI: revert --with-config=dist to hotfix Ubuntu 20.04
Recently Github action runners started to fail on kmod build.  
Revert --with-config=dist from ./configure section of github 
runners to stabilize CI for now.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #13894
2022-09-14 16:26:57 -07:00
Richard Yao ccec88f11a FreeBSD: Fix integer conversion for vnlru_free{,_vfsops}()
When reviewing #13875, I noticed that our FreeBSD code has an issue
where it converts from `int64_t` to `int` when calling
`vnlru_free{,_vfsops}()`. The result is that if the int64_t is `1 <<
36`, the int will be 0, since the low bits are 0. Even when some low
bits are set, a value such as `((1 << 36) + 1)` would truncate to 1,
which is wrong.

There is protection against this on 32-bit platforms, but on 64-bit
platforms, there is no check to protect us, so we add a check.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13882
2022-09-14 12:51:55 -07:00
Richard Yao 4a6e8b99f5 Add assertion to dsl_dataset_set_compression_sync
Coverity pointed out that if we somehow receive SPA_FEATURE_NONE, we
will use a negative number as an array index. A defensive assertion
seems appropriate.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Neal Gompa <ngompa@datto.com>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13872
2022-09-14 12:50:03 -07:00
Richard Yao d954ca19ba Fix theoretical "use-after-free" in dbuf_prefetch_indirect_done()
Coverity complains about a "use-after-free" bug in
`dbuf_prefetch_indirect_done()` because we use a pointer value after
freeing its buffer. The pointer is used for refcounting in ARC (as the
reference holder). There is a theoretical situation where the pointer
would be reused in a way that causes the refcounting to collide, so we
change the order in which we call arc_buf_destroy() and
dbuf_prefetch_fini() to match the rest of the function. This prevents
the theoretical situation from being a possibility.

Also, we have a few return statements with a value, despite this being a
void function. We clean those up while we are making changes here.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Neal Gompa <ngompa@datto.com>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13869
2022-09-13 17:58:29 -07:00
Richard Yao fcd7293d4e Remove incorrect free() in zfs_get_pci_slots_sys_path()
Coverity found this. We attempted to free tmp, which is a pointer to a
string that should be freed by the caller.

Reviewed-by: Neal Gompa <ngompa@datto.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13864
2022-09-13 17:00:53 -07:00
Richard Yao cf66e7e594 Cleanup: Make memory barrier definitions consistent across kernels
We inherited membar_consumer() and membar_producer() from OpenSolaris,
but we had replaced membar_consumer() with Linux's smp_rmb() in
zfs_ioctl.c. The FreeBSD SPL consequently implemented a shim for the
Linux-only smp_rmb().

We reinstate membar_consumer() in platform independent code and fix the
FreeBSD SPL to implement membar_consumer() in a way analogous to Linux.

Reviewed-by: Konstantin Belousov <kib@FreeBSD.org>
Reviewed-by: Mateusz Guzik <mjguzik@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Neal Gompa <ngompa@datto.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13843
2022-09-13 16:59:33 -07:00
Richard Yao 8fdc229a9c Fix memory leak in ztest
Coverity found this.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Neal Gompa <ngompa@datto.com>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13863
2022-09-13 16:53:21 -07:00
Richard Yao d5d10f2aef Cleanup dead spa_boot code
Unused code detected by coverity.

Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Neal Gompa <ngompa@datto.com>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13868
2022-09-13 16:40:10 -07:00
Richard Yao 710fd1ded6 zpool_load_compat() should create strings of length ZFS_MAXPROPLEN
Otherwise, `strlcat()` can overflow them.

Coverity found this.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Neal Gompa <ngompa@datto.com>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13866
2022-09-12 12:54:43 -07:00
Richard Yao e5327e7f97 vdev_draid_lookup_map() should not iterate outside draid_maps
Coverity reported this as an out-of-bounds read.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Neal Gompa <ngompa@datto.com>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13865
2022-09-12 12:51:17 -07:00
Richard Yao 7195c04d98 Fix file descriptor handling in zdb_copy_object()
Coverity found a file descriptor leak. Eyeballing it showed that we had
no handling for the `open()` call failing either. We can address both of
these at once.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Neal Gompa <ngompa@datto.com>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13862
2022-09-12 12:34:10 -07:00
Richard Yao 13f2b8fb92 Fix use-after-free in btree code
Coverty static analysis found these.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Neal Gompa <ngompa@datto.com>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #10989
Closes #13861
2022-09-12 11:22:15 -07:00
Richard Yao 0e4c830bc1 Cleanup: Use OpenSolaris functions to call scheduler
In our codebase, `cond_resched() and `schedule()` are Linux kernel
functions that have replaced the OpenSolaris `kpreempt()` functions in
the codebase to such an extent that `kpreempt()` in zfs_context.h was
broken. Nobody noticed because we did not actually use it. The header
had defined `kpreempt()` as `yield()`, which works on OpenSolaris and
Illumos where `sched_yield()` is a wrapper for `yield()`, but that does
not work on any other platform.

The FreeBSD platform specific code implemented shims for these, but the
shim for `schedule()` forced us to wait, which is different than merely
rescheduling to another thread as the original Linux code does, while
the shim for `cond_resched()` had the same definition as its kernel
kpreempt() shim.

After studying this, I have concluded that we should reintroduce the
kpreempt() function in platform independent code with the following
definitions:

	- In the Linux kernel:
		kpreempt(unused)	-> cond_resched()

	- In the FreeBSD kernel:
		kpreempt(unused)	-> kern_yield(PRI_USER)

	- In userspace:
		kpreempt(unused)	-> sched_yield()

In userspace, nothing changes from this cleanup. In the kernels, the
function `fm_fini()` will now call `kern_yield(PRI_USER)` on FreeBSD and
`cond_resched()` on Linux.  This is instead of `pause("schedule", 1)` on
FreeBSD and `schedule()` on Linux. This makes our behavior consistent
across platforms.

Note that Linux's SPL continues to use `cond_resched()` and
`schedule()`.  However, those functions have been removed from both the
FreeBSD code and userspace code.

This should have the benefit of making it slightly easier to port the
code to new platforms by making how things should be mapped less
confusing.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Neal Gompa <ngompa@datto.com>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13845
2022-09-12 09:55:37 -07:00
Don Brady ede037cda7 Make zfs-share service resilient to stale exports
The are a few cases where stale entries in /etc/exports.d/zfs.exports 
will cause the nfs-server service to fail when starting up.

Since the nfs-server startup consumes /etc/exports.d/zfs.exports, the 
zfs-share service (which rebuilds the list of zfs exports) should run 
before the nfs-server service.

To make the zfs-share service resilient to stale exports, this change 
truncates the zfs config file as part of the zfs share -a operation.

Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Don Brady <don.brady@delphix.com>
Closes #13775
2022-09-09 10:54:16 -07:00
Ryan Moeller 60d995727a FreeBSD: Replace legacy make_dev() interface usage
The function make_dev_s() was introduced to replace make_dev() in
FreeBSD 11.0.  It allows further specification of properties and flags
and returns an error code on failure.  Using this we can fail loading
the module more gracefully than a panic in situations such as when a
device named zfs already exists.  We already use it for zvols.

Use make_dev_s() for /dev/zfs.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #13854
2022-09-08 10:40:18 -07:00
Tony Hutter e27e692bcc zed: Fix config_sync autoexpand flood
Users were seeing floods of `config_sync` events when autoexpand was
enabled.  This happened because all "disk status change" udev events
invoke the autoexpand codepath, which calls zpool_relabel_disk(),
which in turn cause another "disk status change" event to happen,
in a feedback loop.  Note that "disk status change" happens every time
a user calls close() on a block device.

This commit breaks the feedback loop by only allowing an autoexpand
to happen if the disk actually changed size.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes: #7132
Closes: #7366
Closes #13729
2022-09-08 10:32:30 -07:00
Alexander Motin 37f6845c6f Improve too large physical ashift handling
When iterating through children physical ashifts for vdev, prefer
ones above the maximum logical ashift, that we can actually use,
but within the administrator defined maximum.

When selecting top-level vdev ashift, do not set it to the defined
maximum in case physical ashift is even higher, but just ignore one.
Using the maximum does not prevent misaligned writes, but reduces
space efficiency.  Since ZFS tries to write data sequentially and
aggregates the writes, in many cases large misanigned writes may be
not as bad as the space penalty otherwise.

Allow internal physical ashifts for vdevs higher than SHIFT_MAX.
May be one day allocator or aggregation could benefit from that.

Reduce zfs_vdev_max_auto_ashift default from 16 (64KB) to 14 (16KB),
so that ZFS may still use bigger ashifts up to SHIFT_MAX (64KB),
but only if it really has to or explicitly told to, but not as an
"optimization".

There are some read-intensive NVMe SSDs that report Preferred Write
Alignment of 64KB, and attempt to build RAIDZ2 of those leads to a
space inefficiency that can't be justified.  Instead these changes
make ZFS fall back to logical ashift of 12 (4KB) by default and
only warn user that it may be suboptimal for performance.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #13798
2022-09-08 10:30:53 -07:00
Finix1979 320f0c6022 Add Linux posix_fadvise support
The purpose of this PR is to accepts fadvise ioctl from userland
to do read-ahead by demand.

It could dramatically improve sequential read performance especially
when primarycache is set to metadata or zfs_prefetch_disable is 1.

If the file is mmaped, generic_fadvise is also called for page cache
read-ahead besides dmu_prefetch.

Only POSIX_FADV_WILLNEED and POSIX_FADV_SEQUENTIAL are supported in
this PR currently.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Finix Yan <yancw@info2soft.com>
Closes #13694
2022-09-08 10:29:41 -07:00
Richard Yao 380b08098e Linux SPL module init: Handle memory allocation failures correctly
Upon inspection of our code, I noticed that we assume that
__alloc_percpu() cannot fail, and while it probably never has failed in
practice, technically, it can fail, so we should handle that.

Additionally, we incorrectly assume that `taskq_create()` in
spl_kmem_cache_init() cannot fail. The same remark applies to it.

Lastly, `spl-init()` failures should always return negative error
values, but in some places, we are returning positive 1, which is
incorrect. We change those values to their correct error codes.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13847
2022-09-08 10:28:20 -07:00
pkubaj dff541f698 Fix build on FreeBSD/powerpc64*
There's no VSX handler on FreeBSD for now.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Piotr Kubaj <pkubaj@FreeBSD.org>
Closes #13848
2022-09-08 10:27:25 -07:00
Christian Schwarz 5724073517 make DMU_OT_IS_METADATA and DMU_OT_IS_ENCRYPTED return B_TRUE or B_FALSE
Without this patch, the

    ASSERT3U(dbuf_is_metadata(db), ==, arc_is_metadata(buf));

at the beginning of dbuf_assign_arcbuf can panic
if the object type is a DMU_OT_NEWTYPE that has
DMU_OT_METADATA set.

While we're at it, fix DMU_OT_IS_ENCRYPTED as well.

Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Christian Schwarz <christian.schwarz@nutanix.com>
Closes #13842
2022-09-07 17:04:15 -07:00
Walter Huf 238cd4b863 Add xattr_handler support for Android kernels
Some ARM BSPs run the Android kernel, which has
a modified xattr_handler->get() function signature.
This adds support to compile against these kernels.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Walter Huf <hufman@gmail.com>
Closes #13824
2022-09-06 10:02:18 -07:00
Rob Wing 983096a1b4 FreeBSD: add kqfilter support for zvol cdev
The only event hooked up is NOTE_ATTRIB, which is triggered when the
device is resized.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Wing <rew@FreeBSD.org>
Closes #13773
2022-09-06 09:49:33 -07:00
Rob Wing 9d0887402b FreeBSD: add knlist_init_sx() for exclusive locks
This will be used to implement kqfilter support for zvol cdevs.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Wing <rew@FreeBSD.org>
Closes #13773
2022-09-06 09:48:57 -07:00
Richard Yao 11df48ab8b Cleanup Raid-Z Typo fixes
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13834
2022-09-06 09:43:21 -07:00
Samuel 7c0e3941cd Fix column width in 'zpool iostat -v' and 'zpool list -v'
This commit fixes a minor spacing issue caused when
enumerating vdev names, which originated from #13031

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Akash B <akash-b@hpe.com>
Signed-off-by: Samuel Wycliffe <samuelwycliffe@gmail.com>
Closes #13811
2022-09-06 09:37:47 -07:00
Umer Saleem 59767479ac Add DD_FIELD string for snapshots_changed property
This commit adds DD_FIELD string used in extensified dsl_dir zap object
for snapshots_changed property.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Closes #13819
2022-09-02 13:33:50 -07:00
Andriy Gapon ee9f3bca55 Add zfs.sync.snapshot_rename
Only the single snapshot rename is provided.
The recursive or more complex rename can be scripted.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Andriy Gapon <avg@FreeBSD.org>
Closes #13802
2022-09-02 13:31:19 -07:00
Ryan Moeller 7bb707ffaf FreeBSD: Organize sysctls
FreeBSD had a few platform-specific ARC tunables in the wrong place:

- Move FreeBSD-specifc ARC tunables into the same vfs.zfs.arc node as
  the rest of the ARC tunables.
- Move the handlers from arc_os.c to sysctl_os.c and add compat sysctls
  for the legacy names.

While here, some additional clean up:

- Most handlers are specific to a particular variable and don't need a
  pointer passed through the args.
- Group blocks of related variables, handlers, and sysctl declarations
  into logical sections.
- Match variable types for temporaries in handlers with the type of the
  global variable.
- Remove leftover comments.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #13756
2022-09-02 13:26:24 -07:00
Ryan Moeller 4723eba8c0 FreeBSD: Mark ZFS_MODULE_PARAM_CALL as MPSAFE
ZFS_MODULE_PARAM_CALL handlers implement their own locking if needed
and do not require Giant.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #13756
2022-09-02 13:26:04 -07:00
Ameer Hamza 899355d293 Add zilstat script to report zil kstats in a user friendly manner
Added a python script to process both global and per dataset
zil kstats and report them in a user friendly manner similar
to arcstat and dbufstat.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #13704
2022-09-02 13:24:07 -07:00
Alexander Motin f933b3fd4d Apply arc_shrink_shift to ARC above arc_c_min
It makes sense to free memory in smaller chunks when approaching
arc_c_min to let other kernel subsystems to free more, since after
that point we can't free anything.  This also matches behavior on
Linux, where to shrinker reported only the size above arc_c_min.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Closes #13794
2022-09-02 13:21:18 -07:00
Richard Yao 0b30dc484f FreeBSD: Cleanup dead code from VFS
The vfs_*_feature() macros turn anything that uses them into dead code,
so we can delete all of it.

As a side effect, zfs_set_fuid_feature() is now identical in
module/os/freebsd/zfs/zfs_vnops_os.c and
module/os/linux/zfs/zfs_vnops_os.c. A few other functions are identical
too. Future cleanup could move these into a common file.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13832
2022-09-02 13:20:10 -07:00
Andrew Innes 58e8054bce Alloc zdb_cd_t to fix stack issue
Alloc zdb_cd_t since it is too large for the stack on windows
which results in `zdb` crashing immediately.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Andrew Innes <andrew.c12@gmail.com>
Co-authored-by: Jorgen Lundman <lundman@lundman.net>
Closes #13807
2022-09-02 13:15:18 -07:00
George Wilson 2d5622f5be Importing from cachefile can trip assertion
When importing from cachefile, it is possible that the builtin retry
logic will trip an assertion because it also fails to find the pool.
This fix addresses that case and returns the correct error message to
the user.

Reviewed-by: Richard Yao <ryao@gentoo.org>
Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Wilson <gwilson@delphix.com>
Closes #13781
2022-08-26 14:04:27 -07:00
Christian Schwarz 5bc0318047 ZTS: zvol_stress: fix race condition with zinject usage
In automated ZTS runs, I'd occasionally hit

    log_fail "Expected to see some write errors"

because there weren't any write errors.

The reason is that we're not syncing the zpool before `zinject -c`.
If the writes by `dd` aren't synced out at the time `zinject -c` runs,
they will not hit an error and we'll hit the log_fail above.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Christian Schwarz <christian.schwarz@nutanix.com>
Closes #13793
2022-08-25 14:22:10 -07:00
Brian Behlendorf 9f346abbe8 Revert "Avoid panic with recordsize > 128k, raw sending and no large_blocks"
This reverts commit 80a650b7bb.  This change
inadvertently introduced a regression in ztest where one of the new ASSERTs
is triggered in dsl_scan_visitbp().

Reviewed-by: George Amanakis <gamanakis@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #12275 
Closes #13799
2022-08-25 13:33:32 -07:00
Umer Saleem a582d52993 Updates for snapshots_changed property
Currently, snapshots_changed property is stored in dd_props_zapobj, due
to which the property is assumed to be local. This causes a difference
in behavior with respect to other readonly properties.

This commit stores the snapshots_changed property in dd_object. Source
is not set to local in this case, which makes it consistent with other
readonly properties.

This commit also updates the date string format to include seconds.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Closes #13785
2022-08-24 14:20:43 -07:00
George Amanakis 0c4064d9a0 Fix zpool status in case of unloaded keys
When scrubbing an encrypted filesystem with unloaded key still report an
error in zpool status.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alek Pinchuk <apinchuk@axcient.com>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #13675
Closes #13717
2022-08-22 17:42:01 -07:00
Paul Dagnelie 17e212652d Prevent zevent list from consuming all of kernel memory
There are a couple changes included here. The first is to introduce 
a cap on the size the ZED will grow the zevent list to. One million 
entries is more than enough for most use cases, and if you are 
overflowing that value, the problem needs to be addressed another 
way. The value is also tunable, for those who want the limit to be 
higher or lower. 
 
The other change is to add a kernel module parameter that allows 
snapshot creation/deletion to be exempted from the history logging; 
for most workloads, having these things logged is valuable, but for 
some workloads it produces large quantities of log spam and isn't 
especially helpful.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Issue #13374 
Closes #13753
2022-08-22 12:36:22 -07:00
gregory-lee-bartholomew d22dd77c4d contrib: dracut: zfs-snapshot-bootfs: exit status fix
When the zfs-snapshot-bootfs service attempts to create a snapshot 
that already exists, the exit status of the command is non-zero and 
the service reports failed to the systemd service manager. This is a 
common occurrence if bootfs.snapshot is left set on the kernel command 
line and it should not be considered a failure. 
 
This service was originally set to ignore this error by prefixing 
the command with - on the ExecStart line, but the leading - appears 
to have been dropped in #13359.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Gregory Bartholomew <gregory.lee.bartholomew@gmail.com>
Closes #13769
2022-08-12 14:28:15 -07:00
r-ricci e713b69e51 arcstat: fix -p option
When the -p option is used, a list of floats is passed to sep.join(),
which expects strings. Fix this by converting each value to a string.

Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Roberto Ricci <ricci@disroot.org>
Closes #12916 
Closes #13767
2022-08-12 14:21:52 -07:00
George Melikov fbc210fab2 Enable relatime by default
Linux sets relatime on mount by default for any file system,
but relatime=off in ZFS disables it explicitly.

Let's be consistent with other file systems on Linux.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #13614
2022-08-12 14:20:25 -07:00
Tony Hutter b3d0568cfd ZTS: Fix zpool_expand_001_pos
`zpool_expand_001_pos` was often failing due to not seeing autoexpand
commands in the `zpool history`.  During testing, I found this to be
unreliable (sometimes the "online" wouldn't appear in `zpool history`)
and unnecessary, as we could simply check that the pool increased in
size.

This commit revamps the test to check for the expanded pool size
and corresponding new free space.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #13743
2022-08-09 13:26:46 -07:00
Christian Schwarz 91983265b6 Add comment on acb_zio_dummy
Thanks to George Wilson for clarifying this on Slack.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Wilson <gwilson@delphix.com>
Signed-off-by: Christian Schwarz <christian.schwarz@nutanix.com>
Closes #13698
2022-08-08 16:55:13 -07:00
Coleman Kane ad0967638b Linux 6.0 compat: register_shrinker() now var-arg
The 6.0 kernel added a printf-style var-arg for args > 0 to the
register_shrinker function, in order to add names to shrinkers, in
commit e33c267ab70de4249d22d7eab1cc7d68a889bac2. This enables the
shrinkers to have friendly names exposed in /sys/kernel/debug/shrinker/.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #13748
2022-08-08 16:18:30 -07:00
Ryan Moeller 947465b984 libzfs: Remove unused zpool_get_physpath()
This is an oddly specific function that has never had any consumers in
the history of this repo.  Get rid of it and the pile of helper
functions that exist for it.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #13724
2022-08-04 17:04:09 -07:00
Stéphane Lesimple 4fc1ea9c6c zpool: fix redundancy check after vdev removal
The presence of indirect vdevs was confusing get_redundancy(), which
considered a pool with e.g. only mirror top-level vdevs and at least
one indirect vdev (due to the removal of a previous vdev) as already
having a broken redundancy, which is not the case. This lead to the
possibility of compromising the redundancy of a pool by adding
mismatched vdevs without requiring the use of `-f`, and with no
visible notice or warning.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Stéphane Lesimple <speed47_github@speed47.net>
Closes #13705
Closes #13711
2022-08-04 17:02:57 -07:00
Brian Behlendorf c26045b435 Linux 5.20 compat: blk_cleanup_disk()
As of the Linux 5.20 kernel blk_cleanup_disk() has been removed,
all callers should use put_disk().

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13728
2022-08-04 16:57:49 -07:00
Brian Behlendorf bebdf52a16 Linux 5.20 compat: bdevname()
As of the Linux 5.20 kernel bdevname() has been removed, all
callers should use snprintf() and the "%pg" format specifier.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13728
2022-08-04 16:57:33 -07:00
Paul Dagnelie 673aa7e6cf Don't double-zero buffers in fault management nvlists
This is a small cleanup for a trivial problem which happened to
be noticed while another issue was being investigated.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #13730
2022-08-04 16:53:47 -07:00
Umer Saleem 9681de4657 Add snapshots_changed as property
Make dd_snap_cmtime property persistent across mount and unmount
operations by storing in ZAP and restore the value from ZAP on hold
into dd_snap_cmtime instead of updating it.

Expose dd_snap_cmtime as 'snapshots_changed' property that provides a
mechanism to quickly determine whether snapshot list for dataset has
changed without having to mount a dataset or iterate the snapshot list.

It specifies the time at which a snapshot for a dataset was last
created or deleted. This allows us to be more efficient how often we
query snapshots.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Closes #13635
2022-08-02 16:45:30 -07:00
Ryan Moeller 5ad44a0ce9 FreeBSD: Ignore symlink to i386 includes
A symlink to i386 includes is created in the build dir on amd64 since
freebsd/freebsd-src@d07600c563

Tell git to ignore it like the other include links.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #13719
2022-08-02 16:34:23 -07:00
Brian Behlendorf 2f157cbe86 Linux 5.19 compat: META
Update the META file to reflect compatibility with the 5.19 kernel.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13715
2022-08-02 10:04:38 -07:00
Tino Reichardt 68aa3379ec Skip checksum benchmarks on systems with slow cpu
The checksum benchmarking on module load may take a really long time
on embedded systems with a slow cpu. Avoid all benchmarks >= 1MiB on
systems, where EdonR is slower then 300 MiB/s.

This limit is currently hardcoded via the define LIMIT_PERF_MBS.

This is the new benchmark output of a slow Intel Atom:

```
 implementation    1k    4k   16k   64k  256k    1m    4m   16m
 edonr-generic    209   257   268   259   262     0     0     0
 skein-generic    129   150   151   150   150     0     0     0
 sha256-generic    50    55    56    56    56     0     0     0
 sha512-generic    76    86    88    89    88     0     0     0
 blake3-generic    63    62    62    62    61     0     0     0
 blake3-sse2      114   292   301   307   309     0     0     0
```

Reviewed-by: Sebastian Gottschall <s.gottschall@dd-wrt.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #13695
2022-08-01 09:51:45 -07:00
Tino Reichardt 51946eda70 Fix checkstyle warning: E275 missing whitespace after keyword
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #13710
2022-08-01 09:49:35 -07:00
Alek P e8cf3a4f76 Implement a new type of zfs receive: corrective receive (-c)
This type of recv is used to heal corrupted data when a replica
of the data already exists (in the form of a send file for example).
With the provided send stream, corrective receive will read from
disk blocks described by the WRITE records. When any of the reads
come back with ECKSUM we use the data from the corresponding WRITE
record to rewrite the corrupted block.

Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Paul Zuchowski <pzuchowski@datto.com>
Signed-off-by: Alek Pinchuk <apinchuk@axcient.com>
Closes #9372
2022-07-28 15:52:46 -07:00
Tino Reichardt 5fae33e047 FreeBSD compile fix
The file module/os/freebsd/zfs/zfs_ioctl_compat.c fails compiling
because of this error: 'static' is not at beginning of declaration

This commit fixes the three places within that file.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #13702
2022-07-28 14:19:41 -07:00
Brian Behlendorf 34aa0f0487 ZTS: Fix io_uring support check
Not all Linux distribution kernels enable io_uring support by
default.  Update the run time check to verify that the booted
kernel was built with CONFIG_IO_URING=y.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Co-authored-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13648
Closes #13685
2022-07-26 14:39:23 -07:00
Ameer Hamza 3a1ce49141 Add createtxg sort support for simple snapshot iterator
- When iterating snapshots with name only, e.g., "-o name -s name",
libzfs uses simple snapshot iterator and results are displayed
in alphabetic order. This PR adds support for faster version of
createtxg sort by avoiding nvlist parsing for properties. Flags
"-o name -s createtxg" will enable createtxg sort while using
simple snapshot iterator.
- Added support to read createtxg property directly from zfs handle
for filesystem, volume and snapshot types instead of parsing nvlist.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #13577
2022-07-25 14:04:46 -07:00
Brian Behlendorf 8792dd24cd ZTS: Fix occasional inherit_001_pos.ksh failure
The mountpoint may still be busy when the `zfs unmount -a` command
is run causing an unexpected failure.  Retry the unmount a couple
of times since it should not remain busy for long.

    19:10:50.29 NOTE: Reading state from .../inheritance/state021.cfg
    19:10:50.32 cannot unmount '/TESTPOOL': pool or dataset is busy
    19:10:50.32 ERROR: zfs unmount -a exited 1

Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13686
2022-07-25 09:52:42 -07:00
Christian Schwarz bf61a507a2 zdb: dump spill block pointer if present
Output will look like so:

  $ sudo zdb -dddd -vv testpool/fs 2
  Dataset testpool/fs [ZPL], ID 260, cr_txg 8, 25K, 7 objects, rootbp DVA[0]=<0:1800be00:200> DVA[1]=<0:1c00be00:200> [L0 DMU objset] fletcher4 lz4 unencrypted LE contiguous unique double size=1000L/200P birth=16L/16P fill=7 cksum=d03b396cd:489ca835517:d4b04a4d0a62:1b413aac454d53

      Object  lvl   iblk   dblk  dsize  dnsize  lsize   %full  type
           2    1   128K    512     1K     512    512    0.00  ZFS plain file (K=inherit) (Z=inherit=lz4)
                                                 192   bonus  System attributes
      dnode flags: USED_BYTES USERUSED_ACCOUNTED USEROBJUSED_ACCOUNTED SPILL_BLKPTR
      dnode maxblkid: 0
      path    /testfile
      uid     0
      gid     0
      atime   Fri Jul 15 12:36:35 2022
      mtime   Fri Jul 15 12:36:35 2022
      ctime   Fri Jul 15 12:36:51 2022
      crtime  Fri Jul 15 12:36:35 2022
      gen 10
      mode    100600
      size    0
      parent  34
      links   1
      pflags  840800000004
      SA xattrs: 248 bytes, 2 entries

          security.selinux = nutanix_u:object_r:unlabeled_t:s0\000
          user.foo = xbLQJjyVvEVPGGuRHV/gjkFFO1MdehKnLjjd36ZaoMVaUqtqFoMMYT5Ya9yywHApJNoK/1hNJfO3\012XCJWv9/QUTKamoWW9xVDE7yi8zn166RNw5QUhf84cZ3JNLnw6oN

Spill block: 0:10005c00:200 0:14005c00:200 200L/200P F=1 B=16/16 cksum=1cdfac47a4:910c5caa557:195d0493dfe5a:332b6fde6ad547
Indirect blocks:

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Christian Schwarz <christian.schwarz@nutanix.com>
Closes #13640
2022-07-20 17:16:29 -07:00
ixhamza fb087146de Add support for per dataset zil stats and use wmsum counters
ZIL kstats are reported in an inclusive way, i.e., same counters are
shared to capture all the activities happening in zil. Added support
to report zil stats for every datset individually by combining them
with already exposed dataset kstats.

Wmsum uses per cpu counters and provide less overhead as compared
to atomic operations. Updated zil kstats to replace wmsum counters
to avoid atomic operations.

Reviewed-by: Christian Schwarz <christian.schwarz@nutanix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #13636
2022-07-20 17:14:06 -07:00
Alexander Motin 33dba8c792 Fix scrub resume from newly created hole
It may happen that scan bookmark points to a block that was turned
into a part of a big hole.  In such case dsl_scan_visitbp() may skip
it and dsl_scan_check_resume() will not be called for it.  As result
new scan suspend won't be possible until the end of the object, that
may take hours if the object is a multi-terabyte ZVOL on a slow HDD
pool, stretching TXG to all that time, creating all sorts of problems.

This patch changes the resume condition to any greater or equal block,
so even if we miss the bookmarked block, the next one we find will
delete the bookmark, allowing new suspend.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #13643
2022-07-20 17:02:36 -07:00
Tino Reichardt 97fd1ea42a Fix memory allocation for the checksum benchmark
Allocation via kmem_cache_alloc() is limited to less then 4m for
some architectures.

This commit limits the benchmarks with the linear abd cache to 1m
on all architectures and adds 4m + 16m benchmarks via non-linear
abd_alloc().

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Co-authored-by: Sebastian Gottschall <s.gottschall@dd-wrt.com>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #13669
Closes #13670
2022-07-20 17:01:32 -07:00
ixhamza f371cc18f8 Expose ZFS dataset case sensitivity setting via sb_opts
Makes the case sensitivity setting visible on Linux in /proc/mounts.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #13607
2022-07-14 10:38:16 -07:00
Tony Hutter 9fe2f262aa zed: Look for NVMe DEVPATH if no ID_BUS
We tried replacing an NVMe drive using autoreplace, only
to see zed reject it with:

zed[27955]: zed_udev_monitor: /dev/nvme5n1 no devid source

This happened because ZED saw that ID_BUS was not set by udev
for the NVMe drive, and thus didn't think it was "real drive".
This commit allows NVMe drives to be autoreplaced even if
ID_BUS is not set.

Reviewed-by: Don Brady <don.brady@intel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #13512 
Closes #13646
2022-07-14 10:19:37 -07:00
Tino Reichardt 1d3ba0bf01 Replace dead opensolaris.org license link
The commit replaces all findings of the link:
http://www.opensolaris.org/os/licensing with this one:
https://opensource.org/licenses/CDDL-1.0

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #13619
2022-07-11 14:16:13 -07:00
Tony Hutter e4ab3f40df zed: Ignore false 'atari' partitions in autoreplace
libudev will sometimes falsely identify an 'atari' partition on a
blank disk, preventing it from being used in an autoreplace.  This
seems to be a known issue.  The workaround is to just ignore the
fake partition and continue with the autoreplace.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #13497
Closes #13632
2022-07-11 13:35:19 -07:00
Tony Hutter 677ca1e825 rpm: Silence "unversioned Obsoletes" warnings on EL 9
Get rid of RPM warnings on AlmaLinux 9:

"It's not recommended to have unversioned Obsoletes"

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #13584
Closes #13638
2022-07-11 11:35:01 -07:00
Brian Behlendorf e6489be347 Linux: Align MODULE_LICENSE macro text
Specify the lua and zstd license text in the manor in which the
kernel MODULE_LICENSE macro requires it.  The now duplicate entries
were merged and a comment added to make it clear what they apply to.

Reviewed-by: Christian Schwarz <christian.schwarz@nutanix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13641
2022-07-11 11:29:12 -07:00
Finix1979 cb01da6805 Call nvlist_free before return
Fixes a small kernel memory leak which would occur if a pool failed
to import because the `DMU_POOL_VDEV_ZAP_MAP` key can't be read from 
a presumably damaged MOS config.  In the case of a missing key there 
was no leak.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Finix1979 <yancw@info2soft.com>
Closes #13629
2022-07-07 11:43:58 -07:00
Alexander Motin 74230a5bc1 Avoid memory copy when verifying raidz/draid parity
Before this change for every valid parity column raidz_parity_verify()
allocated new buffer and copied there existing data, then recalculated
the parity and compared the result with the copy.  This patch removes
the memory copy, simply swapping original buffer pointers with newly
allocated empty ones for parity recalculation and comparison. Original
buffers with potentially incorrect parity data are then just freed,
while new recalculated ones are used for repair.

On a pool of 12 4-wide raidz vdevs, storing 1.5TB of 16MB blocks, this
change reduces memory traffic during scrub by 17% and total unhalted
CPU time by 25%.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #13613
2022-07-05 16:27:29 -07:00
Alexander Motin 1ac7d194e5 Avoid memory copies during mirror scrub
Issuing several scrub reads for a block we may use the parent ZIO
buffer for one of child ZIOs.  If that read complete successfully,
then we won't need to copy the data explicitly.  If block has only
one copy (typical for root vdev, which is also a mirror inside),
then we never need to copy -- succeed or fail as-is.  Previous
code also copied data from buffer of every successfully completed
child ZIO, but that just does not make any sense.

On healthy N-wide mirror this saves all N+1 (or even more in case
of ditto blocks) memory copies for each scrubbed block, allowing
CPU to focus mostly on check-summing.  For other vdev types it
should save one memory copy per block copy at root vdev.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #13606
2022-07-05 16:26:20 -07:00
наб 6fca6195cd Re-fix -Wwrite-strings on FreeBSD
Follow up fix for a926aab902.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13348
Closes #13610
2022-06-30 11:31:09 -07:00
Toyam Cox eefe83eaa6 dracut: fix boot on non-zfs-root systems
Simply prevent overwriting root until it needs to be overwritten.

Dracut could change this value before this module is called, but won't
change the kernel command line.

Reviewed-by: Andrew J. Hesford <ajh@sideband.org>
Signed-off-by: Toyam Cox <vaelatern@voidlinux.org>
Closes #13592
2022-06-30 10:47:58 -07:00
gregory-lee-bartholomew 5a4dd3a262 contrib: dracut: README.md
Change zfs-snapshot-bootfs.service to zfs-rollback-bootfs.service in
cmdline point 4 of README.md.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Gregory Bartholomew <gregory.lee.bartholomew@gmail.com>
Closes #13609
2022-06-30 10:43:27 -07:00
George Amanakis 34e5423f83 Fix dnode byteswapping
If a dnode has a spill pointer, and we use DN_SLOTS_TO_BONUSLEN() then
we will possibly include the spill pointer in the len calculation and it
will be byteswapped. Then dnode_byteswap() will carry on and swap the
spill pointer again. Fix this by using DN_MAX_BONUS_LEN() instead.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #13002 
Closes #13015
2022-06-29 17:06:16 -07:00
gregory-lee-bartholomew 2d434e8ae4 contrib: dracut: zfs-{rollback,snapshot}-bootfs: explicit snapname fix
Due to a missing semicolon on the ExecStart line, it wasn't possible
to specify the snapshot name on the bootfs.{rollback,snapshot}
kernel parameters if the boot dataset name was obtained from the
root=zfs:... kernel parameter.

Reviewed-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Gregory Bartholomew <gregory.lee.bartholomew@gmail.com>
Closes #13585
2022-06-29 16:56:04 -07:00
Brian Behlendorf 07f2793e86 dracut: fix typo in mount-zfs.sh.in
Format the `zpool get` command correctly.  The -o option must
be followed by "all" or the requested field name.

Reviewed-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13602
2022-06-29 15:33:38 -07:00
наб 60e389ca10 module: lua: ldo: fix pragma name
/home/nabijaczleweli/store/code/zfs/module/lua/ldo.c:175:32: warning:
unknown option after ‘#pragma GCC diagnostic’ kind [-Wpragmas]
  175 | #pragma GCC diagnostic ignored "-Winfinite-recursion"a
      |                                ^~~~~~~~~~~~~~~~~~~~~~

Fixes: a6e8113fed ("Silence
-Winfinite-recursion warning in luaD_throw()")

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13348
2022-06-29 14:08:59 -07:00
наб 404601aca0 linux: libzfs: util: don't fallthrough to to end-of-switch
lib/libzfs/os/linux/libzfs_util_os.c:262:3: error: fallthrough
annotation does not directly precede switch label
                zfs_fallthrough;
                ^
./lib/libspl/include/sys/feature_tests.h:34:26: note: expanded from
macro 'zfs_fallthrough'

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13348
2022-06-29 14:08:59 -07:00
наб a2f6bff976 tests: modernise zdb_decompress
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13348
2022-06-29 14:08:59 -07:00
наб dd66857d92 Remaining {=> const} char|void *tag
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13348
2022-06-29 14:08:59 -07:00
наб a926aab902 Enable -Wwrite-strings
Also, fix leak from ztest_global_vars_to_zdb_args()

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13348
2022-06-29 14:08:54 -07:00
gaoyanping e7d90362e5 Fix znode group permission different from acl mask
Zp->z_mode is set at the same time inode->i_mode
is being changed. This has the effect of keeping both
in sync without relying on zfs_znode_update_vfs.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: yanping.gao <yanping.gao@xtaotech.com>
Closes #13581
2022-06-29 13:38:46 -07:00
Kristof Provost 325096545a FreeBSD: only define B_FALSE/B_TRUE if NEED_SOLARIS_BOOLEAN is not set
If NEED_SOLARIS_BOOLEAN is defined we define an enum boolean_t, which
defines B_TRUE/B_FALSE as well. If we have both the define and the enum
things don't build (because that translates to
'enum { 0, 1 }     boolean_t').

While here also remove an incorrect '#else'. With it in place we only
parse a section if the include guard is triggered. So we'd only use that
code if this file is included twice. This is clearly unintended, and
also means we don't get the 'boolean_t' definition. Fix this.

Reviewed-by: Warner Losh <imp@bsdimp.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Kristof Provost <kprovost@netgate.com>
Sponsored-By: Rubicon Communications, LLC ("Netgate")
Closes #13596
2022-06-28 14:11:38 -07:00
Alexander Motin 827322991f Fix and disable blocks statistics during scrub
Block statistics calculation during scrub I/O issue in case of sorted
scrub accounted ditto blocks several times.  Embedded blocks on other
side were not accounted at all.  This change moves the accounting from
issue to scan stage, that fixes both problems and also allows to avoid
pool-wide locking and the lock contention it created.

Since this statistics is quite specific and is not even exposed now
anywhere, disable its calculation by default to not waste CPU time.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #13579
2022-06-28 11:23:31 -07:00
Brian Behlendorf 43569ee374 Fix objtool: missing int3 after ret warning
Resolve straight-line speculation warnings reported by objtool
for x86_64 assembly on Linux when CONFIG_SLS is set.  See the
following LWN article for the complete details.

https://lwn.net/Articles/877845/

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13528
Closes #13575
2022-06-27 14:19:43 -07:00
Brian Behlendorf 8aceded193 Fix -Wformat-overflow warning in zfs_project_handle_dir()
Switch to using asprintf() to satisfy the compiler and resolve the
potential format-overflow warning.  Not the conditional before the
sprintf() would have prevented this regardless.

    cmd/zfs/zfs_project.c: In function ‘zfs_project_handle_dir’:
    cmd/zfs/zfs_project.c:241:38: error: ‘/’ directive writing
    1 byte into a region of size between 0 and 4352
    [-Werror=format-overflow=]
    cmd/zfs/zfs_project.c:241:17: note: ‘sprintf’ output between
    2 and 4609 bytes into a destination of size 4352

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13528
Closes #13575
2022-06-27 14:19:38 -07:00
Brian Behlendorf f11431a317 Fix -Wformat-truncation warning in upgrade_set_callback()
Extend the buffer slightly resolve the warning.

    cmd/zfs/zfs_main.c: In function ‘upgrade_set_callback’:
    cmd/zfs/zfs_main.c:2446:22: error: ‘%llu’ directive output
    may be truncated writing between 1 and 20 bytes into a
    region of size 16 [-Werror=format-truncation=]
    cmd/zfs/zfs_main.c:2445:24: note: ‘snprintf’ output between
    2 and 21 bytes into a destination of size 16

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13528
Closes #13575
2022-06-27 14:19:31 -07:00
Brian Behlendorf c175f5ebb2 Fix -Wuse-after-free warning in dbuf_destroy()
Move the use of the db pointer after it is freed.  It's only used as
a tag so a dereference would never occur, but there's no reason we
can't invert the order to resolve the warning.

    module/zfs/dbuf.c: In function 'dbuf_destroy':
    module/zfs/dbuf.c:2953:17: error:
    pointer 'db' may be used after 'free' [-Werror=use-after-free]

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13528
Closes #13575
2022-06-27 14:19:24 -07:00
Brian Behlendorf 9619bcdefb Fix -Wuse-after-free warning in dbuf_issue_final_prefetch_done()
Move the use of the private pointer after it is freed.  It's only
used as a tag so a dereference would never occur, but there's no
harm in inverting the order to resolve the warning.

    module/zfs/dbuf.c: In function 'dbuf_issue_final_prefetch_done':
    module/zfs/dbuf.c:3204:17: error:
    pointer 'private' may be used after 'free' [-Werror=use-after-free]

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13528
Closes #13575
2022-06-27 14:19:19 -07:00
Brian Behlendorf ff7e405f83 Fix -Wattribute-warning in dsl layer
The memcpy(), memmove(), and memset() functions have been annotated
to perform bounds checking when using FORTIFY_SOURCE.  A warning is
now generted when writing beyond the end of the specified field.

Alternately, the new struct_group() macro could be used to create
an anonymous union member for use by memcpy().  However, since this
is the only place the macro would be helpful it's preferable to
restructure the code slights to avoid the need for additional
compatibility code when the macro does not exist.

https://lore.kernel.org/lkml/20211118183807.1283332-1-keescook@chromium.org/T/

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13528
Closes #13575
2022-06-27 14:19:12 -07:00
Brian Behlendorf 18df6afdfc Fix -Wattribute-warning in edonr
The wrong union memory was being accessed in EdonRInit resulting in
a write beyond size of field compiler warning.  Reference the correct
member to resolve the warning.  The warning was correct and this in
case the mistake was harmless.

    In function ‘fortify_memcpy_chk’,
    inlined from ‘EdonRInit’ at zfs/module/icp/algs/edonr/edonr.c:494:3:
    ./include/linux/fortify-string.h:344:25: error: call to
    ‘__write_overflow_field’ declared with attribute warning:
    detected write beyond size of field (1st parameter);
    maybe use struct_group()? [-Werror=attribute-warning]

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13528
Closes #13575
2022-06-27 14:19:04 -07:00
Brian Behlendorf b0f7dd276c Fix -Wattribute-warning in zfs_log_xvattr()
Restructure the code in zfs_log_xvattr() to use a lr_attr_end
structure when accessing lr_attr_t elements located after the
variable sized array.  This makes the code more understandable
and resolves the accessing beyond the end of the field warnings.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13528
Closes #13575
2022-06-27 14:18:57 -07:00
Brian Behlendorf a6e8113fed Silence -Winfinite-recursion warning in luaD_throw()
This code should be kept inline with the upstream lua version as much
as possible.  Therefore, we simply want to silence the warning.  This
check was enabled by default as part of -Wall in gcc 12.1.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13528
Closes #13575
2022-06-27 14:18:50 -07:00
George Amanakis 80a650b7bb Avoid panic with recordsize > 128k, raw sending and no large_blocks
The current codebase does not support raw sending buffers with block
size > 128kB when large_blocks is not active. This can happen in the
codepath dsl_dataset_sync()->dmu_objset_sync()->zio_nowait() which
calls back dmu_objset_write_done()->dsl_dataset_block_born(). If
dsl_dataset_sync() completes its run before dsl_dataset_block_born() is
called, we will end up not activating some of the necessary flags, while
having blocks based on those flags written in the filesystem. A
subsequent send will then panic.

Fix this by directly deciding in dmu_objset_sync() whether these flags
need to be activated later by dsl_dataset_sync(). Instead of panicking
due to a NULL pointer dereference in dmu_dump_write() in case of a send,
print out an error message. Also during scrub verify there are no
contradicting filesystem flags.

Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #12275
Closes #12438
2022-06-27 14:17:25 -07:00
Alexander Motin 1cd72b9c13 Avoid two 64-bit divisions per scanned block
Change math to make it like the ARC, using multiplications instead.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #13591
2022-06-27 11:08:21 -07:00
Alexander Motin c0bf952c84 Several B-tree optimizations
- Introduce first element offset within a leaf.  It allows to reduce
by ~50% average memmove() size when adding/removing elements.  If the
added/removed element is in the first half of the leaf, we may shift
elements before it and adjust the bth_first instead of moving more
elements after it.
 - Use memcpy() instead of memmove() when we know there is no overlap.
 - Switch from uint64_t to uint32_t.  It does not limit anything,
but 32-bit arches should appreciate it greatly in hot paths.
 - Store leaf capacity in struct btree to avoid 64-bit divisions.
 - Adjust zfs_btree_insert_into_leaf() to always result in balanced
leaves after splitting, no matter where the new element was inserted.
Not that we care about it much, but it should also allow B-trees with
as little as two elements per leaf instead of 4 previously.

When scrubbing pool of 12 SSDs, storing 1.5TB of 4KB zvol blocks this
reduces amount of time spent in memmove() inside the scan thread from
13.7% to 5.7% and total scrub time by ~15 seconds out of 9 minutes.
It should also reduce spacemaps load time, but I haven't measured it.

Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #13582
2022-06-24 13:55:58 -07:00
Alan Somers ccf89b39fe Add a "zstream decompress" subcommand
It can be used to repair a ZFS file system corrupted by ZFS bug #12762.
Use it like this:

zfs send -c <DS> | \
zstream decompress <OBJECT>,<OFFSET>[,<COMPRESSION_ALGO>] ... | \
zfs recv <DST_DS>

Reviewed-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Alan Somers <asomers@gmail.com>
Sponsored-by:  Axcient
Workaround for #12762
Closes #13256
2022-06-24 13:28:42 -07:00
Alexander Motin 1c0c729ab4 Several sorted scrub optimizations
- Reduce size and comparison complexity of q_exts_by_size B-tree.
Previous code used two 64-bit divisions and many other operations to
compare two B-tree elements.  It created enormous overhead.  This
implementation moves the math to the upper level and stores the score
in the B-tree elements themselves.  Since all that we need to store in
that B-tree is the extent score and offset, those can fit into single
8 byte value instead of 24 bytes of q_exts_by_addr element and can be
compared with single operation.
 - Better decouple secondary tree logic from main range_tree by moving
rt_btree_ops and related functions into dsl_scan.c as ext_size_ops.
Those functions are very small to worry about the code duplication and
range_tree does not need to know details such as rt_btree_compare.
 - Instead of accounting number of pending bytes per pool, that needs
atomic on global variable per block, account the number of non-empty
per-vdev queues, that change much more rarely.
 - When extent scan is interrupted by TXG end, continue it in the next
TXG instead of selecting next best extent.  It allows to avoid leaving
one truncated (and so likely not the best any more) extent each TXG.

On top of some other optimizations this saves about 1.5 minutes out of
10 to scrub pool of 12 SSDs, storing 1.5TB of 4KB zvol blocks.

Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tom Caputi <caputit1@tcnj.edu>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #13576
2022-06-24 09:50:37 -07:00
Toomas Soome 83691bebf0 Use macros for quotes and such
Use Dq,Pq/Po/Pc macros. illumos dumpadm is now in section 8.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Toomas Soome <tsoome@me.com>
Closes #13586
2022-06-24 09:48:10 -07:00
Brian Behlendorf ad8b9f940c Scrub mirror children without BPs
When scrubbing a raidz/draid pool, which contains a replacing or
sparing mirror with multiple online children, only one child will
be read.  This is not normally a serious concern because the DTL
records are used to determine where a good copy of the data is.
As long as the data can be read from one child the mirror vdev
will use it to repair gaps in any of its children.  Furthermore,
even if the data which was read is corrupt the raidz code will
detect this and issue its own repair I/O to correct the damage
in the mirror vdev.

However, in the scenario where the DTL is wrong due to silent
data corruption (say due to overwriting one child) and the scrub
happens to read from a child with good data, then the other damaged
mirror child will not be detected nor repaired.

While this is possible for both raidz and draid vdevs, it's most
pronounced when using draid.  This is because by default the zed
will sequentially rebuild a draid pool to a distributed spare,
and the distributed spare half of the mirror is always preferred
since it delivers better performance.  This means the damaged
half of the mirror will go undetected even after scrubbing.

For system administrations this behavior is non-intuitive and in
a worst case scenario could result in the only good copy of the
data being unknowingly detached from the mirror.

This change resolves the issue by reading all replacing/sparing
mirror children when scrubbing.  When the BP isn't available for
verification, then compare the data buffers from each child.  They
must all be identical, if not there's silent damage and an error
is returned to prompt the top-level vdev to issue a repair I/O to
rewrite the data on all of the mirror children.  Since we can't
tell which child was wrong a checksum error is logged against the
replacing or sparing mirror vdev.

Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13555
2022-06-23 10:36:28 -07:00
Tino Reichardt deb1213098 Fix memory allocation issue for BLAKE3 context
The kmem_alloc(sizeof (*ctx), KM_NOSLEEP) call on FreeBSD can't be
used in this code segment. Work around this by pre-allocating a percpu
context array for later use.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #13568
2022-06-21 14:32:09 -07:00
Matthew Thode b17663f571 Remove install of zfs-load-module.service for dracut
The zfs-load-module.service service is not currently provided by
the OpenZFS repository so we cannot safely assume it exists.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Thode <mthode@mthode.org>
Closes #13574
2022-06-21 10:37:20 -07:00
Alexander Motin d51f4ea5f9 FreeBSD: Improve crypto_dispatch() handling
Handle crypto_dispatch() return values same as crp->crp_etype errors.
On FreeBSD 12 many drivers returned same errors both ways, and lack
of proper handling for the first ended up in assertion panic later.
It was changed in FreeBSD 13, but there is no reason to not be safe.

While there, skip waiting for completion, including locking and
wakeup() call, for sessions on synchronous crypto drivers, such as
typical aesni and software.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #13563
2022-06-17 15:38:51 -07:00
Andrew f609739985 expose snapshot count via stat(2) of .zfs/snapshot (#13559)
Increase nlinks in stat results of ./zfs/snapshot based on snapshot
count. This provides quick and efficient method for administrators to
get snapshot counts without having to use libzfs or list the snapdir
contents.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Andrew Walker <awalker@ixsystems.com>
Closes #13559
2022-06-17 11:44:49 -07:00
ixhamza 10891b37fa libzfs: Prevent overridding of error code
zfs_send_cb_impl fails to report error for some flags.

Use second error variable for send_conclusion_record.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #13558
2022-06-15 14:26:12 -07:00
Alexander Motin dd8671459f Reduce ZIO io_lock contention on sorted scrub
During sorted scrub multiple threads (one per vdev) are issuing many
ZIOs same time, all using the same scn->scn_zio_root ZIO as parent.
It causes huge lock contention on the single global lock on that ZIO.
Improve it by introducing per-queue null ZIOs, children to that one,
and using them instead as proxy.

For 12 SSD pool storing 1.5TB of 4KB blocks on 80-core system this
dramatically reduces lock contention and reduces scrub time from 21
minutes down to 12.5, while actual read stages (not scan) are about
3x faster, reaching 100K blocks per second per vdev.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #13553
2022-06-15 14:25:08 -07:00
crass bc00d2c711 Add support for ARCH=um for x86 sub-architectures
When building modules (as well as the kernel) with ARCH=um, the options
-Dsetjmp=kernel_setjmp and -Dlongjmp=kernel_longjmp are passed to the C
preprocessor for C files. This causes the setjmp and longjmp used in
module/lua/ldo.c to be kernel_setjmp and kernel_longjmp respectively in
the object file. However, the setjmp and longjmp that is intended to be
called is defined in an architecture dependent assembly file under the
directory module/lua/setjmp. Since it is an assembly and not a C file,
the preprocessor define is not given and the names do not change. This
becomes an issue when modpost is trying to create the Module.symvers
and sees no defined symbol for kernel_setjmp and kernel_longjmp. To fix
this, if the macro CONFIG_UML is defined, then setjmp and longjmp
macros are undefined.

When building with ARCH=um for x86 sub-architectures, CONFIG_X86 is not
defined. Instead, CONFIG_UML_X86 is defined. Despite this, the UML x86
sub-architecture can use the same object files as the x86 architectures
because the x86 sub-architecture UML kernel is running with the same
instruction set as CONFIG_X86. So the modules/Kbuild build file is
updated to add the same object files that CONFIG_X86 would add when
CONFIG_UML_X86 is defined.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Glenn Washburn <development@efficientek.com>
Closes #13547
2022-06-15 14:22:52 -07:00
Damian Szuberski 9884319666 Fix clang 13 compilation errors
```
os/linux/zfs/zvol_os.c:1111:3: error: ignoring return value of function
  declared with 'warn_unused_result' attribute [-Werror,-Wunused-result]
                add_disk(zv->zv_zso->zvo_disk);
                ^~~~~~~~ ~~~~~~~~~~~~~~~~~~~~

zpl_xattr.c:1579:1: warning: no previous prototype for function
  'zpl_posix_acl_release_impl' [-Wmissing-prototypes]
```

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: szubersk <szuberskidamian@gmail.com>
Closes #13551
2022-06-15 14:20:28 -07:00
Allan Jude 4ff7a8fa2f Replace ZPROP_INVAL with ZPROP_USERPROP where it means a user property
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Sponsored-by: Klara Inc.
Closes #12676
2022-06-14 11:27:53 -07:00
Ryan Moeller 9e605cf155 spl: Use a clearer name for the user namespace fd
This fd has nothing to do with cleanup, that's just the name of the
field in zfs_cmd_t that was used to pass it to the kernel.

Call it what it is, an fd for a user namespace.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #13554
2022-06-14 08:14:19 -07:00
Ryan Moeller def1a401f4 libzfs: zfs_userns: Don't leak the namespace fd
zfs_userns opens a file descriptor for the kernel to look up a
namespace, but does not close it.

Close the fd when we're done with it.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #13554
2022-06-14 08:13:38 -07:00
Julian Brunner 482505fd42 Add weekly and monthly systemd timers for trimming
On machines using systemd, trim timers can be enabled on a per-pool
basis. Weekly and monthly timer units are provided. Timers can be
enabled as follows:

systemctl enable zfs-trim-weekly@rpool.timer --now
systemctl enable zfs-trim-monthly@datapool.timer --now

Each timer will pull in zfs-trim@${poolname}.service, which is not
schedule-specific.

The manpage zpool-trim has been updated accordingly.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Julian Brunner <julian.brunner@gmail.com>
Closes #13544
2022-06-10 18:22:14 -07:00
Alexander Motin 87b46d63b2 Improve sorted scan memory accounting
Since we use two B-trees q_exts_by_size and q_exts_by_addr, we should
count 2x sizeof (range_seg_gap_t) per node.  And since average B-tree
memory efficiency is about 75%, we should increase it to 3x.

Previous code under-counted up to 30% of the memory usage.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #13537
2022-06-10 10:01:46 -07:00
Will Andrews 4ed5e25074 Add Linux namespace delegation support
This allows ZFS datasets to be delegated to a user/mount namespace
Within that namespace, only the delegated datasets are visible
Works very similarly to Zones/Jailes on other ZFS OSes

As a user:
```
 $ unshare -Um
 $ zfs list
no datasets available
 $ echo $$
1234
```

As root:
```
 # zfs list
NAME                            ZONED  MOUNTPOINT
containers                      off    /containers
containers/host                 off    /containers/host
containers/host/child           off    /containers/host/child
containers/host/child/gchild    off    /containers/host/child/gchild
containers/unpriv               on     /unpriv
containers/unpriv/child         on     /unpriv/child
containers/unpriv/child/gchild  on     /unpriv/child/gchild

 # zfs zone /proc/1234/ns/user containers/unpriv
```

Back to the user namespace:
```
 $ zfs list
NAME                             USED  AVAIL     REFER  MOUNTPOINT
containers                       129M  47.8G       24K  /containers
containers/unpriv                128M  47.8G       24K  /unpriv
containers/unpriv/child          128M  47.8G      128M  /unpriv/child
```

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Will Andrews <will.andrews@klarasystems.com>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Mateusz Piotrowski <mateusz.piotrowski@klarasystems.com>
Co-authored-by: Allan Jude <allan@klarasystems.com>
Co-authored-by: Mateusz Piotrowski <mateusz.piotrowski@klarasystems.com>
Sponsored-by: Buddy <https://buddy.works>
Closes #12263
2022-06-10 09:51:46 -07:00
Allan Jude a1aa8f14c8 Revert parts of 938cfeb0f2
When read and writing the UID/GID, we always want the value
relative to the root user namespace, the kernel will take care
of remapping this to the user namespace for us.

Calling from_kuid(user_ns, uid) with a unmapped uid will return -1
as that uid is outside of the scope of that namespace, and will result
in the files inside the namespace all being owned by 'nobody' and not
being allowed to call chmod or chown on them.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Closes #12263
2022-06-10 09:51:32 -07:00
Alexander Motin fc5200aa9b AVL: Remove obsolete branching optimizations
Modern Clang and GCC can successfully implement simple conditions
without branching with math and flag operations.  Use of arrays for
translation no longer helps as much as it was 14+ years ago.

Disassemble of the code generated by Clang 13.0.0 on FreeBSD 13.1,
Clang 14.0.4 on FreeBSD 14 and GCC 10.2.1 on Debian 11 with this
change still shows no branching instructions.

Profiling of CPU-bound scan stage of sorted scrub shows reproducible
reduction of time spent inside avl_find() from 6.52% to 4.58%.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #13540
2022-06-09 15:27:36 -07:00
Ryan Moeller cdeb98a116 libzfs: Rename msg bufs to errbuf for consistency
`libzfs_pool.c` uses the name `msg` where everywhere else in libzfs uses
`errbuf` for the error message buffer.

Use the name consistent with the rest of libzfs and use ERRBUFLEN
instead of 1024.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #13539
2022-06-09 15:24:51 -07:00
Ryan Moeller d68a2bfa99 libzfs: Define the defecto standard errbuf size
Every errbuf array in libzfs is 1024 chars.

Define ERRBUFLEN in a shared header, and use it.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #13539
2022-06-09 15:24:24 -07:00
Tony Hutter 6f73d02168 zvol: Support blk-mq for better performance
Add support for the kernel's block multiqueue (blk-mq) interface in
the zvol block driver.  blk-mq creates multiple request queues on
different CPUs rather than having a single request queue.  This can
improve zvol performance with multithreaded reads/writes.

This implementation uses the blk-mq interfaces on 4.13 or newer
kernels.  Building against older kernels will fall back to the
older BIO interfaces.

Note that you must set the `zvol_use_blk_mq` module param to
enable the blk-mq API.  It is disabled by default.

In addition, this commit lets the zvol blk-mq layer process whole
`struct request` IOs at a time, rather than breaking them down
into their individual BIOs.  This reduces dbuf lock contention
and overhead versus the legacy zvol submit_bio() codepath.

	sequential dd to one zvol, 8k volblocksize, no O_DIRECT:

	legacy submit_bio()     292MB/s write  453MB/s read
	this commit             453MB/s write  885MB/s read

It also introduces a new `zvol_blk_mq_chunks_per_thread` module
parameter. This parameter represents how many volblocksize'd chunks
to process per each zvol thread.  It can be used to tune your zvols
for better read vs write performance (higher values favor write,
lower favor read).

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #13148
Issue #12483
2022-06-09 08:10:38 -06:00
Tino Reichardt 985c33b132 Introduce BLAKE3 checksums as an OpenZFS feature
This commit adds BLAKE3 checksums to OpenZFS, it has similar
performance to Edon-R, but without the caveats around the latter.

Homepage of BLAKE3: https://github.com/BLAKE3-team/BLAKE3
Wikipedia: https://en.wikipedia.org/wiki/BLAKE_(hash_function)#BLAKE3

Short description of Wikipedia:

  BLAKE3 is a cryptographic hash function based on Bao and BLAKE2,
  created by Jack O'Connor, Jean-Philippe Aumasson, Samuel Neves, and
  Zooko Wilcox-O'Hearn. It was announced on January 9, 2020, at Real
  World Crypto. BLAKE3 is a single algorithm with many desirable
  features (parallelism, XOF, KDF, PRF and MAC), in contrast to BLAKE
  and BLAKE2, which are algorithm families with multiple variants.
  BLAKE3 has a binary tree structure, so it supports a practically
  unlimited degree of parallelism (both SIMD and multithreading) given
  enough input. The official Rust and C implementations are
  dual-licensed as public domain (CC0) and the Apache License.

Along with adding the BLAKE3 hash into the OpenZFS infrastructure a
new benchmarking file called chksum_bench was introduced.  When read
it reports the speed of the available checksum functions.

On Linux: cat /proc/spl/kstat/zfs/chksum_bench
On FreeBSD: sysctl kstat.zfs.misc.chksum_bench

This is an example output of an i3-1005G1 test system with Debian 11:

implementation      1k      4k     16k     64k    256k      1m      4m
edonr-generic     1196    1602    1761    1749    1762    1759    1751
skein-generic      546     591     608     615     619     612     616
sha256-generic     240     300     316     314     304     285     276
sha512-generic     353     441     467     476     472     467     426
blake3-generic     308     313     313     313     312     313     312
blake3-sse2        402    1289    1423    1446    1432    1458    1413
blake3-sse41       427    1470    1625    1704    1679    1607    1629
blake3-avx2        428    1920    3095    3343    3356    3318    3204
blake3-avx512      473    2687    4905    5836    5844    5643    5374

Output on Debian 5.10.0-10-amd64 system: (Ryzen 7 5800X)

implementation      1k      4k     16k     64k    256k      1m      4m
edonr-generic     1840    2458    2665    2719    2711    2723    2693
skein-generic      870     966     996     992    1003    1005    1009
sha256-generic     415     442     453     455     457     457     457
sha512-generic     608     690     711     718     719     720     721
blake3-generic     301     313     311     309     309     310     310
blake3-sse2        343    1865    2124    2188    2180    2181    2186
blake3-sse41       364    2091    2396    2509    2463    2482    2488
blake3-avx2        365    2590    4399    4971    4915    4802    4764

Output on Debian 5.10.0-9-powerpc64le system: (POWER 9)

implementation      1k      4k     16k     64k    256k      1m      4m
edonr-generic     1213    1703    1889    1918    1957    1902    1907
skein-generic      434     492     520     522     511     525     525
sha256-generic     167     183     187     188     188     187     188
sha512-generic     186     216     222     221     225     224     224
blake3-generic     153     152     154     153     151     153     153
blake3-sse2        391    1170    1366    1406    1428    1426    1414
blake3-sse41       352    1049    1212    1174    1262    1258    1259

Output on Debian 5.10.0-11-arm64 system: (Pi400)

implementation      1k      4k     16k     64k    256k      1m      4m
edonr-generic      487     603     629     639     643     641     641
skein-generic      271     299     303     308     309     309     307
sha256-generic     117     127     128     130     130     129     130
sha512-generic     145     165     170     172     173     174     175
blake3-generic      81      29      71      89      89      89      89
blake3-sse2        112     323     368     379     380     371     374
blake3-sse41       101     315     357     368     369     364     360

Structurally, the new code is mainly split into these parts:
- 1x cross platform generic c variant: blake3_generic.c
- 4x assembly for X86-64 (SSE2, SSE4.1, AVX2, AVX512)
- 2x assembly for ARMv8 (NEON converted from SSE2)
- 2x assembly for PPC64-LE (POWER8 converted from SSE2)
- one file for switching between the implementations

Note the PPC64 assembly requires the VSX instruction set and the
kfpu_begin() / kfpu_end() calls on PowerPC were updated accordingly.

Reviewed-by: Felix Dörre <felix@dogcraft.de>
Reviewed-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Co-authored-by: Rich Ercolani <rincebrain@gmail.com>
Closes #10058
Closes #12918
2022-06-08 15:55:57 -07:00
Brian Behlendorf b9d98453f9 autoconf: AC_MSG_CHECKING consistency
Make the wording more consistent for the kernel AC_MSG_CHECKING
output (e.g. "checking whether ...".).  Additionally, group some
of the VFS interface checks with the others.  No functional change.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Attila Fülöp <attila@fueloep.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13529
2022-06-01 09:59:37 -07:00
Brian Behlendorf 4c6526208d Linux 5.19 compat: asm/fpu/internal.h
As of the Linux 5.19 kernel the asm/fpu/internal.h header was
entirely removed.  It has been effectively empty since the 5.16
kernel and provides no required functionality.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Attila Fülöp <attila@fueloep.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13529
2022-06-01 09:59:15 -07:00
Alexander Motin 42cf2ad0e4 Remove wrong assertion in log spacemap
It is typical, but not generally true that if log summary has more
blocks it must also have unflushed metaslabs.  Normally with metaslabs
flushed in order it works, but there are known exceptions, such as
device removal or metaslab being loaded during its flush attempt.

Before 600a02b884 if spa_flush_metaslabs() hit loading metaslab it
usually stopped (unless memlimit is also exceeded), but now it may
flush more metaslabs, just skipping that particular one.  This
increased chances of assertion to fire when the skipped metaslab is
flushed on next iteration if all other metaslabs in that summary
entry are already flushed out of order.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #13486 
Closes #13513
2022-06-01 09:54:35 -07:00
Rich Ercolani bc8192cd5b Corrected parameters for zstd early abort
That'll teach me to try and recall them from the definition.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #13519
2022-05-31 15:41:33 -07:00
Allan Jude 2310dba9eb Fix typo in zil_commit() comment block
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Closes #13518
2022-05-31 15:37:46 -07:00
Brian Behlendorf a70e613070 Linux 5.18 compat: META
Update the META file to reflect compatibility with the 5.18 kernel.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13527
2022-05-31 14:38:00 -07:00
Brian Behlendorf 91350681b8 Linux 5.19 compat: zap_flags_t conflict
As of the Linux 5.19 kernel an identically named zap_flags_t typedef
is declared in the include/linux/mm_types.h linux header.  Sadly,
the inclusion of this header cannot be easily avoided.  To resolve
the conflict a #define is used to remap the name in the OpenZFS
sources when building against the Linux kernel.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13515
2022-05-31 12:04:39 -07:00
Brian Behlendorf d41e864181 Linux 5.19 compat: bdev_start_io_acct() / bdev_end_io_acct()
As of the Linux 5.19 kernel the disk_*_io_acct() helper functions
have been replaced by the bdev_*_io_acct() functions.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13515
2022-05-31 12:04:35 -07:00
Brian Behlendorf c2c2e7bb8b Linux 5.19 compat: aops->read_folio()
As of the Linux 5.19 kernel the readpage() address space operation
has been replaced by read_folio().

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13515
2022-05-31 12:04:31 -07:00
Brian Behlendorf a12a5cb5b8 Linux 5.19 compat: blkdev_issue_secure_erase()
Linux 5.19 commit torvalds/linux@44abff2c0 splits the secure
erase functionality from the blkdev_issue_discard() function.
The blkdev_issue_secure_erase() must now be issued to issue
a secure erase.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13515
2022-05-31 12:04:26 -07:00
Brian Behlendorf e2c31f2bc7 Linux 5.19 compat: bdev_max_secure_erase_sectors()
Linux 5.19 commit torvalds/linux@44abff2c0 removed the
blk_queue_secure_erase() helper function.  The preferred
interface is to now use the bdev_max_secure_erase_sectors()
function to check for discard support.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13515
2022-05-31 12:04:22 -07:00
Brian Behlendorf 5e4aedaca7 Linux 5.19 compat: bdev_max_discard_sectors()
Linux 5.19 commit torvalds/linux@70200574cc removed the
blk_queue_discard() helper function.  The preferred interface
is to now use the bdev_max_discard_sectors() function to check
for discard support.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13515
2022-05-31 12:04:17 -07:00
Brian Behlendorf 5f264996f4 Linux 5.18 compat: bio_alloc()
As for the Linux 5.18 kernel bio_alloc() expects a block_device struct
as an argument.  This removes the need for the bio_set_dev() compatibility
code for 5.18 and newer kernels.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13515
2022-05-31 12:04:03 -07:00
Kevin Jin 152d6fda54 Fix inflated quiesce time caused by lwb_tx during zil_commit()
In current zil_commit() process, transaction lwb_tx is assigned in
zil_lwb_write_issue(), and is committed in zil_lwb_flush_vdevs_done().
Thus, during lwb write out process, the txg is held in open or quiesing
state, until zil_lwb_flush_vdevs_done() is called. If the zil's zio
latency is high, it will cause txg_sync_thread() to starve.

The goal here is to defer waiting for zil_lwb_flush_vdevs_done to the
'syncing' txg state. That is, in zil_sync().

In this patch, it achieves the goal without holding transaction.
A new function zil_lwb_flush_wait_all() is introduced. It waits for
the completion of all the zil_lwb_flush_vdevs_done() by given txg.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Signed-off-by: jxdking <lostking2008@hotmail.com>
Closes #12321
2022-05-26 09:36:14 -07:00
Brian Behlendorf d98a67a53a Replace EXTRA_DIST with dist_noinst_DATA
The EXTRA_DIST variable is ignored when used in the FALSE conditional
of a Makefile.am.  This results in the `make dist` target omitting
these files from the generated tarball unless CONFIG_USER is defined.
This issue can be avoided by switching to use the dist_noinst_DATA
variable which is handled as expected by autoconf.

This change also adds support for --with-config=dist as an alias
for --with-config=srpm and updates the GitHub workflows to use it.

Reviewed-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13459
Closes #13505
2022-05-26 09:24:50 -07:00
Ryan Moeller b62829295e Silence unused-but-set-variable warning
This was breaking the kmod port build on FreeBSD with Clang 13.

Use the same trick as we do for ASSERT() to make DNODE_VERIFY() use
its parameter at compile time without actually using it at run time
in non-debug builds.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #13507
2022-05-25 17:26:59 -07:00
Alexander Motin 6aa8c21a2a More speculative prefetcher improvements
- Make prefetch distance adaptive: up to 4MB prefetch doubles for
every, hit same as before, but after that it grows by 1/8 every time
the prefetch read does not complete in time to satisfy the demand.
My tests show that 4MB is sufficient for wide NVMe pool to saturate
single reader thread at 2.5GB/s, while new 64MB maximum allows the
same thread to reach 1.5GB/s on wide HDD pool.  Further distance
increase may increase speed even more, but less dramatic and with
higher latency.

 - Allow early reuse of inactive prefetch streams: streams that never
saw hits can be reused immediately if there is a demand, while others
can be reused after 1s of inactivity, starting with the oldest.  After
2s of inactivity streams are deleted to free resources same as before.
This allows by several times increase strided read performance on HDD
pool in presence of simultaneous random reads, previously filling the
zfetch_max_streams limit for seconds and so blocking most of prefetch.

 - Always issue intermediate indirect block reads with SYNC priority.
Each of those reads if delayed for longer may delay up to 1024 other
block prefetches, that may be not good for wide pools.

Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #13452
2022-05-25 10:12:52 -07:00
наб 1d89b989c1 automake: don't install /e/d/zfs or /e/z/zfs-functions +x
_SCRIPTS means it's made +x when installing; _DATA is made -x.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13496
Closes #13503
2022-05-25 09:29:47 -07:00
Paul Dagnelie 7829b465a7 Cancel in-progress rebuilds when we finish removal
This issue was discovered by zloop runs. When a mirror or other 
redundant top-level vdev has a disk failure, and the disk is replaced, 
the rebuild process occurs. A removal can happen while this is in 
progress. If the removal completes before the rebuild does, the 
removal process will try to free the vdev that is still in use.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #13498
2022-05-25 09:25:13 -07:00
Umer Saleem b37093a188 rpm: Keep debug symbols if configured with '--enable-debuginfo'
Do not strip debug information from packages if '--enable-debuginfo' is
configured.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Closes #13500
2022-05-25 09:22:11 -07:00
Brian Behlendorf 61ef68727b Standardize RHEL version check in packages
This is a follow up to 3c35662299 which standardizes how the RHEL
version check is done.  This simpler "0%{?rhel}" check is used
elsewhere in the packages so we do the same here.

Reviewed-by: Neal Gompa <ngompa@datto.com>
Reviewed-by: Rich Ercolani <rincebrain@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13501
2022-05-25 09:20:17 -07:00
Rich Ercolani 3bbc26097e Unbreak zstd build on sparc64
It turns out that wrapping the atomic macro in () breaks build
on Linux/SPARC64. Oops.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #13506
2022-05-25 09:18:49 -07:00
Brian Behlendorf 82aa4f6f85 Switch sed -E to -r for better portability
GNU sed 4.1.2 does not support the -E flag and this version is used by
some cross-compiling tool chains.  Switch -E to -r which is understood.

Reviewed-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13502
2022-05-25 09:13:51 -07:00
Neal Gompa (ニール・ゴンパ) 4dc1c8a0b8 rpm: Use the correct version-release information in dependencies
This tightly links the subpackages together and ensures that everything
is upgraded together.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Neal Gompa <ngompa@datto.com>
Closes #13489
2022-05-24 14:07:01 -07:00
Alexander Motin 84d0a03f3e Refactor Log Size Limit
Original Log Size Limit implementation blocked all writes in case of
limit reached until the TXG is committed and the log is freed.  It
caused huge delays and following speed spikes in application writes.

This implementation instead smoothly throttles writes, using exactly
the same mechanism as used for dirty data.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: jxdking <lostking2008@hotmail.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Issue #12284
Closes #13476
2022-05-24 09:46:35 -07:00
Rich Ercolani f375b23c02 Tiered early abort, zstd edition
It turns out that "do LZ4 and zstd-1 both fail" is a great heuristic
for "don't even bother trying higher zstd tiers".

By way of illustration:
$ cat /incompress | mbuffer | zfs recv -o compression=zstd-12 evenfaster/lowcomp_1M_zstd12_normal
summary: 39.8 GiByte in  3min 40.2sec - average of  185 MiB/s
$ echo 3 | sudo tee /sys/module/zzstd/parameters/zstd_lz4_pass
3
$ cat /incompress | mbuffer -m 4G | zfs recv -o compression=zstd-12 evenfaster/lowcomp_1M_zstd12_patched
summary: 39.8 GiByte in 48.6sec - average of  839 MiB/s
$ sudo zfs list -p -o name,used,lused,ratio evenfaster/lowcomp_1M_zstd12_normal evenfaster/lowcomp_1M_zstd12_patched
NAME                                         USED        LUSED  RATIO
evenfaster/lowcomp_1M_zstd12_normal   39549931520  42721221632   1.08
evenfaster/lowcomp_1M_zstd12_patched  39626399744  42721217536   1.07
$ python3 -c "print(39626399744 - 39549931520)"
76468224
$

I'll take 76 MB out of 42 GB for > 4x speedup.

Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Reviewed-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #13244
2022-05-24 09:43:22 -07:00
Ryan Moeller 2e05765006 FreeBSD: libspl: Add locking around statfs globals
Makes getmntent and getmntany thread-safe for external consumers of
libzfs zpool_disable_datasets, zfs_iter_mounted, libzfs_mnttab_update,
libzfs_mnttab_find.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #13484
2022-05-24 09:40:20 -07:00
Rich Ercolani 3c35662299 Modified ncompress requirement in RPM to exclude RHEL9
The bug this was working around is no longer present.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #13480 
Closes #13490
2022-05-24 09:39:32 -07:00
Brian Behlendorf cf70c0f8ae zed: Take no action on scrub/resilver checksum errors
When scrubbing/resilvering a pool it can be counter productive to
cancel the scan and kick of a replace operation to a hot spare
when encountering checksum errors.  In this case, the best course
of action is to allow the scrub/resilver to complete as quickly
as possible and to keep the vdevs fully online if possible.

Realistically, this is less of an issue for a RAIDZ since a
traditional resilver must be used and checksums will be verified.
However, this is not the case for a mirror or dRAID pool which is
sequentially resilvered and checksum verification is deferred
until after the replace operation completes.

Regardless, we apply this policy to all pool types since it's
a good idea for all vdevs.  Degrading additional vdevs has the
potential to make a bad situation worse.  Note the checksum
errors will still be reported as both an event and by
`zpool status`.  This change only prevents the ZED from
proactively taking any action.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13499
2022-05-24 09:36:07 -07:00
Brian Behlendorf 2cd0f98f4a Verify BPs in spa_load_verify_cb() and dsl_scan_visitbp()
We want `zpool import` to be highly robust and never panic, even
when encountering corrupt metadata.  This is already handled in the
arc_read() code path, which covers most cases, but spa_load_verify_cb()
relies on zio_read() and is responsible for verifying the block pointer.

During import it is also possible to encounter blocks pointers which
contain ZIO_COMPRESS_INHERIT and ZIO_CHECKSUM_INHERIT values.  Relax
the verification function slightly to allow this.

Futhermore, extend dsl_scan_recurse() to verify the block pointer
contents of level zero blocks which are not of type DMU_OT_DNODE or
DMU_OT_OBJSET.  This is handled by arc_read() in the other cases.

Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13124 
Closes #13360
2022-05-20 10:36:14 -07:00
Mark Johnston 03df6bad94 zdb: Fix handling of nul termination in symlink targets
The SA attribute containing the symlink target does not include a nul
terminator, so when printing the target zdb would sometimes include
garbage at the end of the string.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes #13482
2022-05-20 10:32:49 -07:00
наб e02d84d3a5 linux: libshare: smb: don't swallow net(1) errors
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13191
Closes #13470
2022-05-18 15:56:38 -07:00
наб 2b4f2fc93c libzfs: return (allocated) strings instead of filling buffers
This also expands the zfs version output from 127 characters to However
Many Are Actually Set

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13330
2022-05-18 12:52:10 -07:00
наб 38f4d99f76 linux: libzfs: simplify module-loaded check
The short-path is now one access() call,
we always modprobe zfs (ZFS_MODULE_LOADING which doesn't use the libzfs
boolean parsing is gone),
and we use a simple inotify IN_CREATE loop with a timerfd timeout
rather than 10ms kernel-style polling

There's one substantial difference: ZFS_MODULE_TIMEOUT=-1
now means "never give up", rather than "wait 10 minutes"

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13330
2022-05-18 12:51:42 -07:00
наб 89e81bc6ad Remove final K&R definitions
Clang trunk now warns -Wstrict-prototypes on this, and they're removed
in C2x

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13447
2022-05-18 12:10:47 -07:00
наб 6b575417e2 libspl/include: remove unused/empty headers
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13447
2022-05-18 12:10:43 -07:00
наб 2f713390d1 kmodtool: cleanup
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13447
2022-05-18 12:10:39 -07:00
наб 7062a956f7 rpm: don't spec obsolete_name/version anymore
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13447
2022-05-18 12:10:35 -07:00
наб 7506f5af92 Add make regen-tests to regenerate the test bundle
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13447
2022-05-18 12:10:09 -07:00
heeplr 08b32c6fa9 zed: support subject as header in zed_notify_email()
Some minimal MUAs don't support passing the subjects as cmdline option.
This commit checks if "@SUBJECT@" is missing in ZED_EMAIL_OPTS and then
prepends a subject header to the notification message.
Also set a default for ${subject}.

Reviewed-by: Ahelenia Ziemia<C5><84>ska <nabijaczleweli@nabijaczleweli.xyz>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Daniel Hiepler <d-git@coderdu.de>
Closes #13440
2022-05-18 10:27:53 -07:00
Andrew 00ac77464e Expose zpool guids through kstats
There are times when end-users may wish to have
a fast and convenient method to get zpool guid
without having to use libzfs. This commit
exposes the zpool guid via kstats in similar
manner to the zpool state.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Andrew Walker <awalker@ixsystems.com>
Closes #13466
2022-05-18 10:25:33 -07:00
Coleman Kane c0cf6ed679 Fix compiler warnings about zero-length arrays in inline bitops
The compiler appears to be expanding the unused NULL pointer into a
zero-length array via the inline bitops code. When -Werror=array-bounds
is used, this causes a build failure. Recommended solution is allocate
temporary structures, fill with zeros (to avoid uninitialized data use
warnings), and pass the pointer to those to the inline calls.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #13463 
Closes #13465
2022-05-17 13:07:39 -07:00
наб 276b08cb00 linux: libzutil: zfs_strip_path: only strip known prefixes
This mirrors FreeBSD:
  # zpool create -o cachefile= testpsko media/testpsko
  # zpool create -o cachefile= testpsko2 $PWD/testpsko2
  $ ./zpool list -v
  NAME                                              SIZE  ALLOC   FREE
  filling                                          25.5T  6.85T  18.6T
    mirror-0                                       3.64T   500G  3.15T
      ata-HGST_HUS726T4TALE6L4_V6K2L4RR                -      -      -
      ata-HGST_HUS726T4TALE6L4_V6K2MHYR                -      -      -
    raidz1-1                                       21.8T  6.36T  15.5T
      ata-HGST_HUS728T8TALE6L4_VDKT237K                -      -      -
      ata-HGST_HUS728T8TALE6L4_VDGY075D                -      -      -
      ata-HGST_HUS728T8TALE6L4_VDKVRRJK                -      -      -
  cache                                                -      -      -
    nvme0n1p4                                      63.0G  12.8G  50.2G
  tarta-boot                                        240M  50.0M   190M
    mirror-0                                        240M  50.0M   190M
      tarta-boot                                       -      -      -
      tarta-boot-nvme                                  -      -      -
  tarta-zoot                                       55.5G  6.96G  48.5G
    mirror-0                                       55.5G  6.96G  48.5G
      tarta-zoot                                       -      -      -
      tarta-zoot-nvme                                  -      -      -
  testpsko                                         39.5G   744K  39.5G
    media/testpsko1                                39.5G   744K  39.5G
  testpsko2                                        39.5G   130K  39.5G
    /home/nabijaczleweli/store/code/zfs/testpsko2  39.5G   130K  39.5G

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13413
Closes #9771
2022-05-16 15:56:57 -07:00
наб 5ac80603bd libzfs: constify zfs_strip_partition(), zfs_strip_path()
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13413
2022-05-16 15:56:53 -07:00
наб ca3b105cd9 libzfs: pool: zpool_vdev_name: use libzfs_envvar_is_set
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13413
2022-05-16 15:56:48 -07:00
наб e9072c76f8 zpool: max_width: monomorphise subtype iteration
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13413
2022-05-16 15:56:27 -07:00
наб de82164518 linux: spl: generic: ddi_strto*: match solaris ddi_strto*(9)
Recognise initial whitespace, + in both cases,
and - also in unsigneds

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13434
2022-05-13 10:15:47 -07:00
наб 354a1bfb8e linux: spl: generic: ddi_strtou##type: elide unused flag
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13434
2022-05-13 10:15:44 -07:00
наб c25b281378 Remove hw_serial, ddi_strtoul()
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13434
2022-05-13 10:15:31 -07:00
Mateusz Piotrowski 0c11a2738d Fix typos in zfs-bookmark examples
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mateusz Piotrowski <0mp@FreeBSD.org>
Closes #13456
2022-05-12 09:34:24 -07:00
наб 53600767ee linux: libshare/nfs: don't do anything unless exportfs is available
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13165
Closes #13324
2022-05-12 09:27:12 -07:00
наб 086af23e68 linux: libshare/smb: cache smb_available
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13165
2022-05-12 09:27:08 -07:00
наб 1ec9218faa libzfs: zfs_unshare: minor cleanup
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13165
2022-05-12 09:27:04 -07:00
наб 88d5580e51 tests: add zfs_unshare_008_pos checking whitespace escaping
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13165
2022-05-12 09:27:00 -07:00
наб 9b06aa634a libshare/nfs: escape mount points when needed
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13165
Closes #13153
2022-05-12 09:26:54 -07:00
наб a31fcd4bad linux: libshare/nfs: bsearch() over valid keys
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13165
2022-05-12 09:26:50 -07:00
наб 5b14feec06 libzfs: mount: zfs_unshare: don't reallocate mountpoint
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13165
2022-05-12 09:26:47 -07:00
наб b4d9a82f62 Replace libzfs sharing _nfs() and _smb() APIs with protocol lists
With the additional benefit of removing all the _all() functions and
treating a NULL list as "all" ‒ the remaining all function is for all
/datasets/, which is consistent with the rest of the API

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13165
2022-05-12 09:26:42 -07:00
наб 471e9a108e Publish libshare protocols, use enum-based API
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13165
2022-05-12 09:26:38 -07:00
наб 21d976a621 libshare: delineate obsolete errors
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13165
2022-05-12 09:26:34 -07:00
наб 63ce6dd988 libshare: use AVL tree with static data, pass all data in arguments
This makes it so we don't leak a consistent 64 bytes anymore,
makes the searches simpler and faster, removes /all allocations/
from the driver (quite trivially, since they were absolutely needless),
and makes libshare thread-safe (except, maybe, linux/smb, but that only
does pointer-width loads/stores so it's also mostly fine, except for
leaking smb_shares)

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13165
2022-05-12 09:26:25 -07:00
наб 4ccbb23971 libshare: interface: {=> const} char *
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13165
2022-05-12 09:26:22 -07:00
наб 2faf05612f libshare/smb: cleanup
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13165
2022-05-12 09:26:18 -07:00
наб 566e4a58b7 libshare/nfs: destaticify nfs_lock_fd
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13165
2022-05-12 09:26:13 -07:00
наб ee668b8331 freebsd: libshare/nfs: write directly in translate_opts()
This renders it thread-safe

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13165
2022-05-12 09:26:08 -07:00
наб 5f0c1c4ebd freebsd: libshare/nfs: constify static const data
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13165
2022-05-12 09:25:31 -07:00
Brian Behlendorf 45588be285 Add missing AC_MSG_RESULT(no) to configure
When the HAVE_IOPS_MKDIR_USERNS check fails output result
as required.
 
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13454
2022-05-12 09:12:32 -07:00
Brian Behlendorf 196e7c7617 ztest: reduce runtile of zloop.sh in CI
The zloop.sh script is primarily designed to randomly stress
the DMU and SPA layers.  This can result in some unrealistic
(or even impossible) scenarios being tested which then fail.

Since the longer we run zloop.sh the more likely this is to occur
this commit reduces the runtime.  The intention being that normally
this will result in a clean CI run unless the PR does introduce
serious breaking change.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13453
2022-05-12 09:11:29 -07:00
Rich Ercolani bd88c036e6 Added a workaround for Linux KASAN builds
Linux passes -Wframe-larger-than=1024, which breaks
our build in a number of places with -Werror.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #13450
2022-05-11 13:26:55 -07:00
наб 5a142533f8 udev: zvol_id: simplify/modernise
zero-alloc, sensibler errors, don't close (or free) before exit.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13337
2022-05-11 10:58:19 -07:00
наб 1f5bc12893 ztest: O_CLOEXEC ztest_fd_rand
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13411
2022-05-11 10:33:12 -07:00
наб 888914486e ztest: take -B ./path/to/ztest, LD_LIBRARY_PATH=./path/lib:$L_L_P
This changes the behaviour of -B from the illumos one which would,
in the example in the manual, take just ./chroots/lenny;
this, however, is more versatile, and scales much better for systems
with ZFS in /usr/local, for example

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13411
Closes #1770
2022-05-11 10:33:12 -07:00
наб 951a3889d1 tests: many_fds: simplify, modernise
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13411
2022-05-11 10:33:12 -07:00
наб 510ee280c0 Remove enable_extended_FILE_stdio()
Even on Illumos it's only available in the 32-bit programming
environment, and, quoth enable_extended_FILE_stdio(3C):
> Historically, 32-bit Solaris applications have been limited to using
> only the file descriptors 0 through 255 with the standard I/O
> functions (see stdio(3C)) in the C library. The extended FILE
> facility allows well-behaved 32-bit applications to use any
> valid file descriptor with the standard I/O functions.
where "well-behaved" means that it
> does not directly access any fields in the FILE structure pointed
> to by the FILE pointer associated with any standard I/O stream,

And the stdio/flush.c implementation reads:
  /*
   * if this is not an internal extended FILE then check
   * if _file is being changed from underneath us.
   * It should not be because if
   * it is then then we lose our ability to guard against
   * silent data corruption.
   */
  if (!iop->__xf_nocheck && bad_fd > -1 && iop->_magic != bad_fd) {
      (void) fprintf(stderr,
          "Application violated extended FILE safety mechanism.\n"
          "Please read the man page for extendedFILE.\nAborting\n");
      abort();
  }

This appears to be an insane workaround for broken implementation with
exposed FILE internals and _file being an u8, both only on non-LP64;
it's shimmed out on all LP64 targets in Illumos,
and we shim it out as well: just get rid of it

This appears to've been originally fixed in illumos-gate
a5f69788de7ac07553de47f7fec8c05a9a94c105 ("PSARC 2006/162 Extended FILE
space for 32-bit Solaris processes", "1085341 32-bit stdio routines
should support file descriptors >255"), which also bears extendedFILE
and enable_extended_FILE_stdio(3C):
  -       unsigned char   _file;  /* UNIX System file descriptor */
  +       unsigned char   _magic; /* Old home of the file descriptor */
  +                               /* Only fileno(3C) can retrieve the
  				value now */
and
  +/*
  + * Macros to aid the extended fd FILE work.
  + * This helps isolate the changes to only the 32-bit code
  + * since 64-bit Solaris is not affected by this.
  + */
  +#ifdef  _LP64
  +#define        GET_FD(iop)             ((iop)->_file)
  +#define        SET_FILE(iop, fd)       ((iop)->_file = (fd))
  +#else
  +#define        GET_FD(iop)             \
  +               (((iop)->__extendedfd) ? _file_get(iop) : (iop)->_magic)
  +#define        SET_FILE(iop, fd)       (iop)->_magic = (fd); (iop)->__extendedfd = 0
  +#endif

Also remove the 1k setrlimit(NOFILE) calls: that's the default on Linux,
with 64k on Illumos and 171k on FreeBSD

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13411
2022-05-11 10:33:12 -07:00
szubersk e0911f7b7f autoconf: Fail when __copy_from_user_inatomic is a non-GPL symbol
A followup to 849c14e048
Fix https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1009242

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: szubersk <szuberskidamian@gmail.com>
Closes #13389
2022-05-11 10:32:51 -07:00
Brian Atkinson f567d67fda Adding ZTS test for O_APPEND
Commit 63b18e4 fixed an issue in zpl_aio_write() to make sure that
kiocb->ki_pos was updated correctly when opening a file with O_APPEND.
Adding a test to verify O_APPEND functionality with lseek can make
sure that all other distros/kernel versions also have the correct
behavior.

Also moved the threadappends_001_pos test into this append test
directory in functional ZTS directory. This way the two append tests
are together for organization purposes.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Brian Atkinson <batkinson@lanl.gov>
Closes #13424
2022-05-11 08:38:16 -07:00
наб 3ed04d66aa Remove constrained path on clean
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13316
2022-05-10 10:21:13 -07:00
наб e8ca724393 ztest: fix in-tree detection for automatic zdb path
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13316
2022-05-10 10:21:09 -07:00
наб c6a5d7d997 ztest: use $ZDB instead of $ZDB_PATH for zdb
Which actually gets zdb as set in common.sh

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13316
2022-05-10 10:21:04 -07:00
наб 1a58941275 scripts: zpool.sh: cleanup
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13316
2022-05-10 10:21:00 -07:00
наб c982359460 cppcheck: explicitly exclude kernel code from userspace checks
Thus extracting the final shred of utility

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13316
2022-05-10 10:20:55 -07:00
наб c1851bf081 Makefile: respect V=1 for checks
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13316
2022-05-10 10:20:51 -07:00
наб 779ac93e96 autogen.sh: paper over automake <1.14's lack of %reldir% support
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13316
2022-05-10 10:20:46 -07:00
наб 612b8dff5b contrib: bash_completion.d: install, fix dist
dist diff:
  -zfs-2.1.99/contrib/bash_completion.d/zfs

install diff:
  +destdir/usr/local/etc/bash_completion.d
  +destdir/usr/local/etc/bash_completion.d/zfs

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13316
2022-05-10 10:20:40 -07:00
наб 0a9aaa7f0c cmd: move single-file binaries up, extract udev programs to udev/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13316
2022-05-10 10:20:34 -07:00
наб eaf94bda6c Replace config/config.awk with simple sed invocation
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13316
2022-05-10 10:20:28 -07:00
наб ea04cc4a22 autoconf: use include directives instead of recursing down test data
We drop /multiple/ seconds off the generation, a dozen off a clean
rebuild, 185 files, and trivialise the distribution,
which can now be trivially generated via the provided snippets

Dist diff:
  -zfs-2.1.99/tests/zfs-tests/tests/functional/pam/utilities.kshlib
  +zfs-2.1.99/tests/zfs-tests/tests/functional/pam/utilities.kshlib.in

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13316
2022-05-10 10:20:19 -07:00
наб 0425d58852 autoconf: use include directives instead of recursing down tests (mostly)
Only down to tests/zfs-tests/tests, but pull out C programs into the
main Makefile ‒ this means we get correct dependency tracking for all
programs (and parallelise across them)

dist diff:
  -zfs-2.1.99/tests/zfs-tests/tests/stress/
  -zfs-2.1.99/tests/zfs-tests/tests/stress/Makefile.am
  -zfs-2.1.99/tests/zfs-tests/tests/stress/Makefile.in

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13316
2022-05-10 10:20:09 -07:00
наб 48f4379974 autoconf: use include directives instead of recursing down etc
dist diff:
  -zfs-2.1.99/etc/systemd/system/50-zfs.preset.in
  +zfs-2.1.99/etc/systemd/system/50-zfs.preset

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13316
2022-05-10 10:19:58 -07:00
наб 50d2c9e4fd autoconf: use include directives instead of recursing down contrib
Also make the pyzfs build actually out-of-tree and quiet by default

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Rapptz <rapptz@gmail.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13316
2022-05-10 10:19:44 -07:00
наб 674a9f3727 autoconf: use include directives instead of recursing down udev
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13316
2022-05-10 10:19:36 -07:00
наб 2820719800 autoconf: use include directives instead of recursing down rpm
Also simplify .gitignore

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13316
2022-05-10 10:19:25 -07:00
наб 93a8c85e7b Move test-runner.1 into top-level man/
dist delta:
  +zfs-2.1.99/man/man1/test-runner.1
  -zfs-2.1.99/tests/test-runner/man/
  -zfs-2.1.99/tests/test-runner/man/test-runner.1

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13316
2022-05-10 10:19:18 -07:00
наб 0f6c4fd00e autoconf: use include directives instead of recursing down man
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13316
2022-05-10 10:19:10 -07:00
наб 0a8b1fc625 autoconf: use include directives instead of recursing down scripts
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13316
2022-05-10 10:19:02 -07:00
наб 09a7ad38a5 autoconf: single-step includes
Still descend, but only once: we get a lot of mileage out of nodist_

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13316
2022-05-10 10:18:51 -07:00
наб 5cdca5b1da autoconf: use include directives instead of recursing down cmd
No installation diff, dist lost
  -zfs-2.1.99/cmd/fsck_zfs/fsck.zfs
which was distributed erroneously, since it's generated

Also clean gitrev on clean

Also add -e 'any possible bashisms' to default checkbashisms flags,
and fully parallelise it and shellcheck, and it works out-of-tree, too

Also align the Release in the dist META file correctly

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13316
2022-05-10 10:18:38 -07:00
наб 3ff81c4601 cmd: zvol_id: don't build with -fno-stack-protector anymore
#569 was opened in 2012 and closed in 2015;
if the issue was still there we'd presumably've seen it?

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13316
2022-05-10 10:18:29 -07:00
наб c8970f52ed autoconf: use include directives instead of recursing down lib
As a bonus, this also adds zfs-mount-generator (previously undescended
down) and libzstd (not included) to CppCheck

As a bonus bonus, abigail rules work out-of-tree, too

Against current trunk:
  $ diff -U0 ./destdir.listing ~/store/code/zfs/destdir.listing
  -destdir/usr/local/include/libspl/sscanf.h

  $ diff --color -U0 ./zfs-2.1.99.tar.gz.listing ../oot/zfs-2.1.99.tar.gz.listing | grep -v @@ | grep -v /Makefile
  -zfs-2.1.99/config/Abigail.am
  -zfs-2.1.99/lib/libspl/include/util/
  -zfs-2.1.99/lib/libspl/include/util/sscanf.h

  $ diff --color -U0 ./zfs-2.1.99.tar.gz.listing ../oot/zfs-2.1.99.tar.gz.listing | grep -v @@ | grep /Makefile
  -zfs-2.1.99/lib/libavl/Makefile.in
  -zfs-2.1.99/lib/libefi/Makefile.in
  -zfs-2.1.99/lib/libicp/Makefile.in
  -zfs-2.1.99/lib/libnvpair/Makefile.in
  -zfs-2.1.99/lib/libshare/Makefile.in
  -zfs-2.1.99/lib/libspl/include/Makefile.in
  -zfs-2.1.99/lib/libspl/include/os/freebsd/Makefile.am
  -zfs-2.1.99/lib/libspl/include/os/freebsd/Makefile.in
  -zfs-2.1.99/lib/libspl/include/os/freebsd/sys/Makefile.am
  -zfs-2.1.99/lib/libspl/include/os/freebsd/sys/Makefile.in
  -zfs-2.1.99/lib/libspl/include/os/linux/Makefile.am
  -zfs-2.1.99/lib/libspl/include/os/linux/Makefile.in
  -zfs-2.1.99/lib/libspl/include/os/linux/sys/Makefile.am
  -zfs-2.1.99/lib/libspl/include/os/linux/sys/Makefile.in
  -zfs-2.1.99/lib/libspl/include/os/Makefile.am
  -zfs-2.1.99/lib/libspl/include/os/Makefile.in
  -zfs-2.1.99/lib/libspl/include/rpc/Makefile.am
  -zfs-2.1.99/lib/libspl/include/rpc/Makefile.in
  -zfs-2.1.99/lib/libspl/include/sys/dktp/Makefile.am
  -zfs-2.1.99/lib/libspl/include/sys/dktp/Makefile.in
  -zfs-2.1.99/lib/libspl/include/sys/Makefile.am
  -zfs-2.1.99/lib/libspl/include/sys/Makefile.in
  -zfs-2.1.99/lib/libspl/include/util/Makefile.am
  -zfs-2.1.99/lib/libspl/include/util/Makefile.in
  -zfs-2.1.99/lib/libspl/Makefile.in
  -zfs-2.1.99/lib/libtpool/Makefile.in
  -zfs-2.1.99/lib/libunicode/Makefile.in
  -zfs-2.1.99/lib/libuutil/Makefile.in
  -zfs-2.1.99/lib/libzfsbootenv/Makefile.in
  -zfs-2.1.99/lib/libzfs_core/Makefile.in
  -zfs-2.1.99/lib/libzfs/Makefile.in
  -zfs-2.1.99/lib/libzpool/Makefile.in
  -zfs-2.1.99/lib/libzstd/Makefile.in
  -zfs-2.1.99/lib/libzutil/Makefile.in
  -zfs-2.1.99/lib/Makefile.in

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13316
2022-05-10 10:18:11 -07:00
наб 6fc34371e1 libzfs: pool: fix false-positives -Wmaybe-uninitialised
As noted by gcc (Debian 10.2.1-6) 10.2.1 20210110

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13316
2022-05-10 10:18:06 -07:00
наб be91239efa module: Makefile: cppcheck: zfs_config.h lives in builddir
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13316
2022-05-10 10:18:02 -07:00
наб ae43120abc config: user: drop ZONENAME, avoid lying about being Linux-only
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13316
2022-05-10 10:17:55 -07:00
наб 0fb1bf6586 libspl: include: remove [util/]sscanf.h
Unused, empty, installs in a weird location

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13316
2022-05-10 10:17:49 -07:00
наб 841fd303b8 copy-builtin: add hooks with sed/>>
The order in fs/Makefile doesn't matter,
the order in fs/Kconfig is preserved (ext2 is included as the first
thing in the first if BUILD block, and only once), but I don't think it
matters much either, realistically

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13316
2022-05-10 10:17:43 -07:00
наб c6f923fba4 autogen.sh: explicitly build its containing directory
No changes in currently-accepted usages (no-argument), but allows
  /src/path/autogen.sh && /src/path/configure
for simpler out-of-tree builds

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13316
2022-05-10 10:17:36 -07:00
наб ea532b0e6e Remove compat spl-x.y.z from the distribution
This was added in 93ce2b4ca5 ("Update
build system and packaging"), which merged the SPL and ZFS trees,
and included in 0.8.0; "the next major release" was 2.0.0

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13316
2022-05-10 10:17:26 -07:00
наб 5fc332ae4b Makefile: zfs-tests lives in srcdir; zfs-tests: accept common.sh path
This fixes out-of-tree builds

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13316
2022-05-10 10:16:47 -07:00
Rich Ercolani ecaccf764a Workaround broken VDEV_UPATH
Sometimes, for reasons I haven't looked into yet, VDEV_UPATH
gets set to /dev/(null), breaking all these scripts.

It'd be nice to have a fallback case to avoid total failure.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #13436
2022-05-10 10:14:07 -07:00
Rich Ercolani a30927f763 Add workaround for broken Linux pipes
Linux has an unresolved hang if you resize a pipe with bytes
in it.

Since there's no obvious way to detect this happening, added a
workaround to disable resizing the pipe buffer if you set an
environment variable.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #13309
2022-05-09 16:33:55 -07:00
hping a18d13c200 abd_os: remove redundant refcount creation for abd_children
Refcount creation for abd_zero_scatter->abd_children is redundant in
abd_alloc_zero_scatter, as it has been done in abd_init_struct.

In addition, abd_children is undefined when ZFS_DEBUG is disabled, the
reference of abd_children in abd_alloc_zero_scatter breaks build of
libzpool when ZFS_DEBUG is disabled.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Ping Huang <huangping@smartx.com>
Closes #13429
2022-05-09 16:30:16 -07:00
наб d1fd6dbf6d man: zpool-import.8: -d -or -c
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13437
2022-05-09 16:03:17 -07:00
Aidan Harris 493b6e5607 Fix functions without a prototype
clang-15 emits the following error message for functions without
a prototype:

fs/zfs/os/linux/spl/spl-kmem-cache.c:1423:27: error:
  a function declaration without a prototype is deprecated
  in all versions of C [-Werror,-Wstrict-prototypes]

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Aidan Harris <me@aidanharr.is>
Closes #13421
2022-05-06 11:57:37 -07:00
наб e77d59ebb2 zpool: guard vs_noalloc and vs_pspace with VDEV_STAT_VALID()
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13412
2022-05-05 09:45:12 -07:00
Rich Ercolani 7bf06f7262 Corrected edge case in uncompressed ARC->L2ARC handling
I genuinely don't know why this didn't come up before,
but adding the LZ4 early abort pointed out this flaw,
in which we're allocating a buffer of one size, and
then telling the compressor that we're handing it buffers
of a different size, which may be Very Different - say,
allocating 512b and then telling it the inputs are 128k.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Amanakis <gamanakis@gmail.com>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #13375
2022-05-04 11:59:30 -07:00
Mateusz Guzik 81b8b2d004 FreeBSD: use zero_region instead of allocating a dedicated page
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #13406
2022-05-04 11:46:37 -07:00
WHR a894ae75b8 Fix incorrect use of unit prefix names in man pages
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Damian Szuberski <szuberskidamian@gmail.com>
Signed-off-by: WHR <msl0000023508@gmail.com>
Closes #13363
2022-05-04 11:44:12 -07:00
Alexander Motin c55b293287 Improve mg_aliquot math
When calculating mg_aliquot alike to #12046 use number of unique data
disks in the vdev, not the total number of children vdev.  Increase
default value of the tunable from 512KB to 1MB to compensate.

Before this change each disk in striped pool was getting 512KB of
sequential data, in 2-wide mirror -- 1MB, in 3-wide RAIDZ1 -- 768KB.
After this change in all the cases each disk should get 1MB.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #13388
2022-05-04 11:33:42 -07:00
Brian Behlendorf 34dbc618f5 Reduce dbuf_find() lock contention
Holding a dbuf is a common operation which can become highly contended
in dbuf_find() when acquiring the dbuf hash mutex.  This is particularly
true on Linux when reading/writing volumes since by default up to 32
threads from the zvol_taskq may be taking a hold of the same dbuf.
This should also be observable on FreeBSD as long as there are enough
processes accessing the volume concurrently.

This is further aggregrated by the fact that only the block id will
be unique when calculating the dbuf hash for a single volume.  The
objset id, object id, and level will be the same for data blocks.
This has been observed to result in a somehwat less than uniform hash
distribution and a longer than expected max hash chain depth (~20)
on a large memory system (256 GB) using volumes.

This commit improves the siutation by switching the hash mutex to
an rwlock to allow concurrent lookups, and increasing DBUF_RWLOCKS
from 2048 to 8192 to further reduce the odds of a hash collision.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13405
2022-05-04 11:17:29 -07:00
Shaan Nobee 411f4a018d Speed up WB_SYNC_NONE when a WB_SYNC_ALL occurs simultaneously
Page writebacks with WB_SYNC_NONE can take several seconds to complete 
since they wait for the transaction group to close before being 
committed. This is usually not a problem since the caller does not 
need to wait. However, if we're simultaneously doing a writeback 
with WB_SYNC_ALL (e.g via msync), the latter can block for several 
seconds (up to zfs_txg_timeout) due to the active WB_SYNC_NONE 
writeback since it needs to wait for the transaction to complete 
and the PG_writeback bit to be cleared.

This commit deals with 2 cases:

- No page writeback is active. A WB_SYNC_ALL page writeback starts 
  and even completes. But when it's about to check if the PG_writeback 
  bit has been cleared, another writeback with WB_SYNC_NONE starts. 
  The sync page writeback ends up waiting for the non-sync page 
  writeback to complete.

- A page writeback with WB_SYNC_NONE is already active when a 
  WB_SYNC_ALL writeback starts. The WB_SYNC_ALL writeback ends up 
  waiting for the WB_SYNC_NONE writeback.

The fix works by carefully keeping track of active sync/non-sync 
writebacks and committing when beneficial.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Shaan Nobee <sniper111@gmail.com>
Closes #12662
Closes #12790
2022-05-03 13:23:26 -07:00
Pawel Jakub Dawidek a64d757aa4 FreeBSD: Clean up the use of ioflags
- Prefer O_* flags over F* flags that mostly mirror O_* flags anyway,
  but O_* flags seem to be preferred.
- Simplify the code as all the F*SYNC flags were defined as FFSYNC flag.
- Don't define FRSYNC flag, so we don't generate unnecessary ZIL commits.
- Remove EXCL define, FreeBSD ignores the excl argument for zfs_create()
  anyway.

Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Pawel Jakub Dawidek <pawel@dawidek.net>
Closes #13400
2022-05-02 16:26:28 -07:00
Jitendra Patidar 159c6fd154 Add missing replay entry in zvol_replay_vector for TX_SETSAXATTR
Commit 361a7e8 (log xattr=sa create/remove/update to ZIL) introduced a
TX_SETSAXATTR, but missed to add a corresponding entry in
zvol_replay_vector. Adding a missing replay entry in zvol_replay_vector.

Reviewed-by: Christian Schwarz <christian.schwarz@nutanix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Jitendra Patidar <jitendra.patidar@nutanix.com>
Closes #13396
Closes #13395
2022-05-02 11:01:26 -07:00
Brian Behlendorf 1bf3abc634 Silence unused-but-set-variable warnings
Clang 13.0.0 added support for `Wunused-but-set-parameter` and
`-Wunused-but-set-variable` which correctly detects two unused
variables in zstd resulting in a build failure.  This commit
annotates these instances accordingly.

  https://releases.llvm.org/13.0.1/tools/clang/docs/ReleaseNotes.html#id6

In FSE_createCTable(), malloc() is intentionally defined as NULL when
compiled in the kernel so the variable is unused.

  zstd/lib/compress/fse_compress.c:307:12: error: variable 'size'
  set but not used [-Werror,-Wunused-but-set-variable]

Additionally, in ZSTD_seqDecompressedSize() the assert is compiled
out similarly resulting in an unused variable.

  zstd/lib/compress/zstd_compress_superblock.c:412:12: error: variable
  'litLengthSum' set but not used [-Werror,-Wunused-but-set-variable]

Reviewed-by: Rich Ercolani <rincebrain@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13382
2022-04-29 14:21:11 -07:00
Rich Ercolani f2330bd156 Default zfs_max_recordsize to 16M
Increase the default allowed maximum recordsize from 1M to 16M.
As described in the zfs(4) man page, there are significant costs
which need to be considered before using very large blocks.
However, there are scenarios where they make good sense and
it should no longer be necessary to artificially restrict their
use behind a module option.

Note that for 32-bit platforms we continue to leave this
restriction in place due to the limited virtual address space
available (256-512MB).  On these systems only a handful
of blocks could be cached at any one time severely impacting
performance and potentially stability.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12830
Closes #13302
2022-04-28 15:12:24 -07:00
Brian Behlendorf 63b18e4097 Fix O_APPEND for Linux 3.15 and older kernels
When using a Linux kernel which predates the iov_iter interface the
O_APPEND flag should be applied in zpl_aio_write() via the call to
generic_write_checks().  The updated pos variable  was incorrectly
ignored resulting in the current offset being used.

This issue should only realistically impact the RHEL/CentOS 7.x
kernels which are based on Linux 3.10.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13370 
Closes #13377
2022-04-27 12:56:17 -07:00
Satadru Pramanik 7dde17e860 Linux 5.18 compat: replace __set_page_dirty_nobuffers
Replace __set_page_dirty_nobuffers with filemap_dirty_folio.

Upstream-commit: 6b1f86f8e9c7f9de7ca1cb987b2cf25e99b1ae3a
("Merge tag 'folio-5.18b' of
git://git.infradead.org/users/willy/pagecache ")

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Authored-by: Satadru Pramanik <satadru@gmail.com>
Signed-off-by: Satadru Pramanik <satadru@gmail.com>
Closes #13325
Closes #13380
2022-04-27 12:54:17 -07:00
наб 2a70a09072 zfs: holds: dequadratify
Before:
  15  0m0.177s
  30  0m0.653s
  45  0m1.289s
  60  0m2.129s
  75  0m3.264s
  90  0m4.397s
  100 0m5.996s
  117 0m8.552s

After:
  30  0m0.053s
  117 0m0.125s

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13372
Closes #13373
2022-04-27 11:55:40 -07:00
наб 55d36e47c8 zfs: holds: general cleanup
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13373
2022-04-27 11:55:25 -07:00
Damian Szuberski 849c14e048 PPC get_user workaround
Linux 5.12 PPC 5.12 get_user() and __copy_from_user_inatomic()
inline helpers very indirectly include a reference to the GPL'd
array mmu_feature_keys[] and fails to build. Workaround this by
using copy_from_user() and throwing EFAULT for any calls to
__copy_from_user_inatomic(). This is a workaround until a fix
for Linux commit 7613f5a66becfd0e43a0f34de8518695888f5458
"powerpc/64s/kuap: Use mmu_has_feature()" is fully addressed.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Authored-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: szubersk <szuberskidamian@gmail.com>
Closes #11958
Closes #12590
Closes #13367
2022-04-26 10:52:40 -07:00
Damian Szuberski a0dfd98a25 autoconf: Pretend CONFIG_MODULES is always on
- Unconditionally inject `CONFIG_MODULES` make variable
  and `#define CONFIG_MODULES` to Kbuild in `ZFS_LINUX_COMPILE`
  autoconf function to emulate loadable kernel modules support.
  This allows OpenZFS to perform Linux checks despite
  `CONFIG_MODULES=n` in the actual Linux config.

- Add `ZFS_AC_KERNEL_CONFIG_MODULES` check which encompasses
  the logic from `ZFS_AC_KERNEL_TEST_MODULE` with additional
  diagnostic messages to the user

- Removed `ZFS_AC_KERNEL_TEST_MODULE` as it merely duplicates
  every check in `ZFS_AC_KERNEL_CONFIG_DEFINED`

- Moved `ZFS_AC_MODULE_SYMVERS` after `ZFS_AC_KERNEL_CONFIG_DEFINED`
  so the user has a chance to see the proper diagnostic from the
  steps before.

A workaround for Linux's

```
commit 3e3005df73b535cb849cf4ec8075d6aa3c460f68
Author: Masahiro Yamada <masahiroy@kernel.org>
Date:   Wed Mar 31 22:38:03 2021 +0900

kbuild: unify modules(_install) for in-tree and external modules

If you attempt to build or install modules ('make modules(_install)'
with CONFIG_MODULES disabled, you will get a clear error message, but
nothing for external module builds.

Factor out the modules and modules_install rules into the common part,
so you will get the same error message when you try to build external
modules with CONFIG_MODULES=n.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
```

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: szubersk <szuberskidamian@gmail.com>
Closes #10832
Closes #13361
2022-04-26 10:47:09 -07:00
Alexander Motin 600a02b884 Improve log spacemap load time
Previous flushing algorithm limited only total number of log blocks to
the minimum of 256K and 4x number of metaslabs in the pool.  As result,
system with 1500 disks with 1000 metaslabs each, touching several new
metaslabs each TXG could grow spacemap log to huge size without much
benefits.  We've observed one of such systems importing pool for about
45 minutes.

This patch improves the situation from five sides:
 - By limiting maximum period for each metaslab to be flushed to 1000
TXGs, that effectively limits maximum number of per-TXG spacemap logs
to load to the same number.
 - By making flushing more smooth via accounting number of metaslabs
that were touched after the last flush and actually need another flush,
not just ms_unflushed_txg bump.
 - By applying zfs_unflushed_log_block_pct to the number of metaslabs
that were touched after the last flush, not all metaslabs in the pool.
 - By aggressively prefetching per-TXG spacemap logs up to 16 TXGs in
advance, making log spacemap load process for wide HDD pool CPU-bound,
accelerating it by many times.
 - By reducing zfs_unflushed_log_block_max from 256K to 128K, reducing
single-threaded by nature log processing time from ~10 to ~5 minutes.

As further optimization we could skip bumping ms_unflushed_txg for
metaslabs not touched since the last flush, but that would be an
incompatible change, requiring new pool feature.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #12789
2022-04-26 10:44:21 -07:00
George Amanakis 0409d33273 Improve zpool status output, list all affected datasets
Currently, determining which datasets are affected by corruption is
a manual process.

The primary difficulty in reporting the list of affected snapshots is
that since the error was initially found, the snapshot where the error
originally occurred in, may have been deleted. To solve this issue, we
add the ID of the head dataset of the original snapshot which the error
was detected in, to the stored error report. Then any time a filesystem
is deleted, the errors associated with it are deleted as well. Any time
a clone promote occurs, we modify reports associated with the original
head to refer to the new head. The stored error reports are identified
by this head ID, the birth time of the block which the error occurred
in, as well as some information about the error itself are also stored.

Once this information is stored, we can find the set of datasets
affected by an error by walking back the list of snapshots in the given
head until we find one with the appropriate birth txg, and then traverse
through the snapshots of the clone family, terminating a branch if the
block was replaced in a given snapshot. Then we report this information
back to libzfs, and to the zpool status command, where it is displayed
as follows:

 pool: test
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
  scan: scrub repaired 0B in 00:00:00 with 800 errors on Fri Dec  3
08:27:57 2021
config:

        NAME        STATE     READ WRITE CKSUM
        test        ONLINE       0     0     0
          sdb       ONLINE       0     0 1.58K

errors: Permanent errors have been detected in the following files:

        test@1:/test.0.0
        /test/test.0.0
        /test/1clone/test.0.0

A new feature flag is introduced to mark the presence of this change, as
well as promotion and backwards compatibility logic. This is an updated
version of #9175. Rebase required fixing the tests, updating the ABI of
libzfs, updating the man pages, fixing bugs, fixing the error returns,
and updating the old on-disk error logs to the new format when
activating the feature.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Co-authored-by: TulsiJain <tulsi.jain@delphix.com>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #9175
Closes #12812
2022-04-25 17:25:42 -07:00
наб 95146fd7ba tests: cli_user: zfs_001_neg: print the problematic lines
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13352
2022-04-25 12:49:56 -07:00
наб 6625262c70 man: zfs-send.8: fix -X synopses and description
Also clean up the horrendously verbose -X handling in zfs_main()

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13352
2022-04-25 12:46:22 -07:00
Richard Laager 7eba3891e9 zvol_wait: Ignore locked zvols
"When an encrypted zvol is locked the zfs-volume-wait service does not
start.  The /sbin/zvol_wait should not wait for links when the volume
has property keystatus=unavailable."
-- https://bugs.launchpad.net/ubuntu/+source/zfs-linux/+bug/1888405

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Damian Szuberski <szuberskidamian@gmail.com>
Thanks: James Dingwall <james-launchpad@dingwall.me.uk>
Signed-off-by: Richard Laager <rlaager@wiktel.com>
Closes #10662
2022-04-22 15:37:03 -07:00
наб 0cdda2edb3 Linux 5.18 compat: kobj_type.default_attrs replaced with default_groups
Upstream-commit: cdb4f26a63c391317e335e6e683a614358e70aeb ("kobject:
 kobj_type: remove default_attrs")
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13357
2022-04-22 14:27:10 -07:00
наб 520a7c8a85 linux: module: zfs: sysfs: constify types and attrs
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13357
2022-04-22 14:27:02 -07:00
наб c8871ff784 scripts: zfs.sh: explicitly unload all modules via rmmod
modprobe -r only works for depmodded modules, but this also means we
have to re-iterate legacy modules, and in the right order

Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13356
2022-04-21 16:33:12 -07:00
наб bee24d6929 scripts: zfs.sh: explicitly ignore unloaded modules when unloading
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13356
2022-04-21 16:32:28 -07:00
Damian Szuberski 2f31b585e7 Strengthen Linux kernel capabilities detection
- Add `CONFIG_BLOCK` Linux config requirement to
  `ZFS_AC_KERNEL_CONFIG_DEFINED`. OpenZFS won't compile without
  that block device support due to large amount of functional
  dependencies on it.

- Remove dependency on `groups_alloc()` in
  `ZFS_AC_KERNEL_SRC_GROUP_INFO_GID` to circumvent the missing stub
  in Linux 4.X kernel headers.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: szubersk <szuberskidamian@gmail.com>
Closes #13351
2022-04-21 09:37:11 -07:00
наб 47a02e3972 contrib: dracut: remove getargbool polyfill
It was originally released in dracut 008 in February 2011;
we can probably drop it now

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13291
2022-04-20 16:45:54 -07:00
наб e3fc330d6c Add dracut.zfs.7
Thorough documentation with a dracut.bootup(7)-style flowchart,
dracut.cmdline(7)-style cmdline listing,
and per-file docs like the old README

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13291
2022-04-20 16:45:47 -07:00
наб 1cc9cc2f89 contrib: dracut: zfs-needshutdown: don't list
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13291
2022-04-20 16:45:39 -07:00
наб 6ebdb0b20d contrib: dracut: zfs-{rollback,snapshot}-bootfs: order after key loading
This fixes at least one race I got with an encrypted root

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13291
2022-04-20 16:45:32 -07:00
наб 30c6dce7f7 contrib: dracut: don't require essentials to be under the same encroot
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13291
2022-04-20 16:45:25 -07:00
наб eaf1e06045 contrib: dracut: inline single-use import_pool, move single-use ask_for_password
Also don't set ROOTFS_MOUNTED; the final mention was removed in dracut
011 from July 2011

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13291
2022-04-20 16:45:18 -07:00
наб dac0b0785a contrib: dracut: zfs-lib: remove find_bootfs
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13291
2022-04-20 16:45:11 -07:00
наб 5d31169d7c contrib: dracut: zfs-lib: simplify ask_for_password
The only user is mount-zfs.sh (non-systemd systems),
so reduce it to what it needs

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13291
2022-04-20 16:45:04 -07:00
наб fec2c613a4 contrib; dracut: flatten zfs-load-key, simplify zfs-env-bootfs
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13291
2022-04-20 16:44:55 -07:00
наб 245529d85f contrib; dracut: centralise root= parsing, actually support root=s
So far, everything parsed root= manually, which meant that while
zfs-parse.sh was updated, and supposedly supported + -> ' ' conversion,
it meant nothing

Instead, centralise parsing, and allow:
  root=
  root=zfs
  root=zfs:
  root=zfs:AUTO

  root=ZFS=data/set
  root=zfs:data/set
  root=zfs:ZFS=data/set (as a side-effect; allowed but undocumented)

  rootfstype=zfs AND root=data/set <=> root=data/set
  rootfstype=zfs AND root=         <=> root=zfs:AUTO

So rootfstype=zfs /also/ behaves as expected, and + decoding works

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13291
2022-04-20 16:44:47 -07:00
наб 2c74617bcf contrib: dracut: parse-zfs: stop pretending we support FILESYSTEM=
It was added in the original ae26d0465a ("Add dracut support") commit
in 2011, and was then broken a bit later with the advent of
dracut-zfs-generator, or maybe earlier as part of other churn

Either way, it's broken, and has been in 2.0+ as well, and no-one
complained. Stop pretending we support it at all

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13291
2022-04-20 16:44:37 -07:00
наб 47636f5661 contrib: dracut: parse-zfs: drop initqueue-finished for i/f
The switch was released in dracut 009 in March 2011,
we can safely get rid of the compatibility hook

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13291
2022-04-20 16:44:25 -07:00
Rich Ercolani b657f2c592 Corrected oversight in ZERO_RANGE behavior
It turns out, no, in fact, ZERO_RANGE and PUNCH_HOLE do
have differing semantics in some ways - in particular,
one requires KEEP_SIZE, and the other does not.

Also added a zero-range test to catch this, corrected a flaw
that made the punch-hole test succeed vacuously, and a typo
in file_write.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #13329 
Closes #13338
2022-04-20 16:07:03 -07:00
Alexander Motin 9209ea69bc FreeBSD: Fix translation from ABD to physical pages
In hypothetical case of non-linear ABD with single segment, multiple
to page size but not aligned to it, vdev_geom_fill_unmap_cb() could
fill one page less into bio_ma array.

I am not sure it is exploitable, but better to be safe than sorry.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reported-by: Mark Johnston <markj@FreeBSD.org>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Closes #13345
2022-04-20 16:05:38 -07:00
наб e37e7dd6a6 man: ... -> … again
zfs-program.8 is left, but that's literal Lua syntax

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13255
2022-04-20 14:31:10 -07:00
Damian Szuberski b1fb4e1ba4 rpm -> deb doesn't fail when optional packages are missing
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: szubersk <szuberskidamian@gmail.com>
Closes #13331
Closes #13336
2022-04-20 13:43:42 -07:00
наб 92295af800 Document zfs inherit -S's interaction with noninheritable properties
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Damian Szuberski <szuberskidamian@gmail.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11894
Closes #13335
2022-04-20 13:39:11 -07:00
Low-power 115a92ca6a zpool_history_unpack: return correct errno on nvlist_unpack failure
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Damian Szuberski <szuberskidamian@gmail.com>
Signed-off-by: WHR <msl0000023508@gmail.com>
Closes #13321
2022-04-20 13:35:44 -07:00
наб 9b80d9e6f9 linux: module: uninstall legacy modules on (un)installation
This can be reverted once we're sure nobody's using them anymore
(post-3.0 release?)

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13274
2022-04-20 13:28:54 -07:00
наб 469b8481aa scripts: zfs.sh: unload zfs with dependencies
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13274
2022-04-20 13:28:48 -07:00
наб d18da59d30 scripts: zfs.sh: make usage make sense
We don't pass the arguments as arguments

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13274
2022-04-20 13:28:39 -07:00
наб f6f505c4d6 scripts: zfs.sh: remove cat
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13274
2022-04-20 13:28:28 -07:00
наб ad9e767657 linux: module: weld all but spl.ko into zfs.ko
Originally it was thought it would be useful to split up the kmods
by functionality.  This would allow external consumers to only load
what was needed.  However, in practice we've never had a case where
this functionality would be needed, and conversely managing multiple
kmods can be awkward.  Therefore, this change merges all but the
spl.ko kmod in to a single zfs.ko kmod.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13274
2022-04-20 13:28:24 -07:00
Allan Jude 310ab9d261 Improve the inline descriptions of the ARC module parameters
These are displayed as the descriptions of the sysctl's on FreeBSD

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Closes #13334
2022-04-20 13:16:25 -07:00
Paul Dagnelie 4d46272913 Fix style checking error introduced by zfs-mount changes
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #13347
2022-04-19 13:55:04 -07:00
Brian Behlendorf 026f126b83 Linux 5.17 compat: GENHD_FL_EXT_DEVT / GENHD_FL_NO_PART_SCAN
As of the 5.17 kernel the GENHD_FL_EXT_DEVT flag has been removed
and the GENHD_FL_NO_PART_SCAN flag renamed GENHD_FL_NO_PART. Update
zvol_alloc() to set GENHD_FL_NO_PART for the newer kernels which
is sufficient.  The behavior for prior kernels remains unchanged.

1ebe2e5f ("block: remove GENHD_FL_EXT_DEVT")
46e7eac6 ("block: rename GENHD_FL_NO_PART_SCAN to GENHD_FL_NO_PART")

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13294
Closes #13297
2022-04-19 10:38:04 -07:00
omni ec4d860eb0 init.d/zfs-mount: Don't fsck or mount/umount fstab entries
This is better handled by existing OS toolset.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Damian Szuberski <szuberskidamian@gmail.com>
Signed-off-by: omni <omni+vagant@hack.org>
Issue #7374
Closes #12780
2022-04-15 14:26:44 -07:00
Low-power 4dced31b98 Fix 'zpool history' sometimes fails without reporting any error
The corresponding function 'zpool_get_history' in libzfs would printing
an error messages only when the ioctl call failed.

Add missing error reporting, specifically memory allocation failures
and error from 'zpool_history_unpack'.

Also avoid possibly reading of uninitialized 'err' variable in case
the requested offset pasts EOF.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Damian Szuberski <szuberskidamian@gmail.com>
Signed-off-by: WHR <msl0000023508@gmail.com>
Issue #13322 
Closes #13320
2022-04-15 14:16:07 -07:00
наб 0dd34a1955 libzfs: import: zpool_clear_label: bool for boolean status
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13304
2022-04-13 11:37:18 -07:00
наб 74e4bfbcff libzfs: import: zpool_clear_label: don't allocate another time for L2ARC header
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13304
2022-04-13 11:37:10 -07:00
наб a4e0cee178 libzfs: import: zpool_clear_label: actually fail if clearing l2arc header fails
Found with -Wunused-but-set-variable on Clang trunk

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13304
2022-04-13 11:37:03 -07:00
наб 16c3290bbe module: zfs: vdev_removal: remove unused num_indirect
Found with -Wunused-but-set-variable on Clang trunk

Fixes: a1d477c24c ("OpenZFS 7614, 9064 - zfs device evacuation/removal")
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13304
2022-04-13 11:36:47 -07:00
наб 63cb3413ea tests: cmd: draid: remove unused and undocumented -v
Found with -Wunused-but-set-variable on Clang trunk

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13304
2022-04-13 11:36:29 -07:00
szubersk 4620342a32 zfs(8): remove errors reported by mandoc 1.14.6
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Signed-off-by: szubersk <szuberskidamian@gmail.com>
Closes #13315
2022-04-13 11:34:46 -07:00
szubersk 4f0be2bdb0 zfs(8): add requirements towards fs names
Provide explicit requirements towards file system naming convention
in OpenZFS man pages.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Signed-off-by: szubersk <szuberskidamian@gmail.com>
Mitigates #13310
Closes #13315
2022-04-13 11:34:28 -07:00
Brian Behlendorf 3d149db1f3 ZTS: Retry auto_spare_multiple.ksh
The auto_spare_multiple.ksh test may incorrectly fail for a similar
reason as the auto_spare_shared.ksh test.  Add it to known list of
exceptions which should be retried to prevent failures in the CI.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13318
2022-04-13 09:44:54 -07:00
Brian Behlendorf 5846f71182 ZTS: Retry redundancy_draid_spare[1,3].ksh
The redundancy_draid_spare1.ksh and redundancy_draid_spare3.ksh test
cases are a little to strict for the sequential resilver case.  While
unlikely it is possible that a handful of correctable checksum errors
will be reported resulting in a test failure.  Update the zts-report.py
script to allow this the test case to be retried if requested.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13318
2022-04-13 09:44:33 -07:00
Mark Johnston 7dcb8ed23d FreeBSD: Return Mach error codes from VOP_(GET|PUT)PAGES
FreeBSD's memory management system uses its own error numbers and gets
confused when these VOPs return EIO.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reported-by: Peter Holm <pho@FreeBSD.org>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes #13311
2022-04-13 09:43:15 -07:00
Mark Johnston e9084d0712 FreeBSD: Parameterize ZFS_ENTER/ZFS_VERIFY_VP with an error code
For legacy reasons, a couple of VOPs have to return error numbers that
don't come from the usual errno namespace.  To handle the cases where
ZFS_ENTER or ZFS_VERIFY_ZP fail, we need to be able to override the
default error return value of EIO.  Extend the macros to permit this.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes #13311
2022-04-13 09:42:51 -07:00
Damian Szuberski 35d81a75a8 initramfs: use mount.zfs instead of mount
A followup to d7a67402a8

For `mount -t zfs -o opts ds mp` command line
some implementations of `mount(8)`, e. g. Busybox in Debian
work as follows:

```
newfstatat(AT_FDCWD, "ds", 0x7fff826f4ab0, 0) = -1
mount("ds", "mp", "zfs", MS_SILENT, NULL) = 0
```

The logic above skips completely `mount.zfs` and prevents us
from reading filesystem properties and applying mount options.

For comparison, the coreutils `mount(8)` implementation does:

```
openat(AT_FDCWD, "/proc/filesystems", O_RDONLY|O_CLOEXEC) = 3
// figure out that zfs is a `nodev` filesystem and look for a helper
newfstatat(AT_FDCWD, "/sbin/mount.zfs" ...) = 0
execve("/sbin/mount.zfs" ...) = 0
```

Using `mount.zfs` in initramfs would help circumvent deficiencies
of some of `mount(8)` implementations. `mount -t zfs` translates
to `mount.zfs` invocation, except for cases when explicitly disabled
by `-i`.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: szubersk <szuberskidamian@gmail.com>
Closes #13305
2022-04-11 15:51:23 -07:00
наб ed715283de cmd: zed: rc: drop "should be owned by root and 0600"
It doesn't matter, 0600 are Weird Permissions, and it's even weirder to
spec them for no reason ‒ it's perfectly fine if it's the usual 0:0 644,
or literally anything else, so long as unprivileged users can't edit it
(which (a) 644 accomplishes and (b) is at the administrator's
 discretion, it's not unheard of to have adm users and having it
 be 664 in that case is just as good; it's not our place to say)

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12544
Closes #13276
2022-04-05 13:50:55 -07:00
Jorgen Lundman 4d972ab5ae Prefer ATTR_ in shared codebase over AT_
An earlier commit introduces AT_MODE into the shared kernel sources,
instead of the preferred existing ATTR_MODE use.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Signed-off-by: Jorgen lundman <lundman@lundman.net>
Closes #13293
2022-04-05 13:02:17 -07:00
наб e3e4c30b0a libzfs: sendrecv: use common progress thread killer
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13284
2022-04-05 09:46:07 -07:00
наб 1217fd1ff8 libzfs: sendrecv: always cancel progress thread in zfs_send_one()
This is in line with all the other uses of the progress thread

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11560
Closes #13284
2022-04-05 09:45:55 -07:00
наб 3c69b539fe Forbid asctime{,_r}(), gmtime(), localtime()
ctime() is only used in binary main threads, which is fine

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13284
2022-04-05 09:45:46 -07:00
наб 9185303b32 libspl: zed: event: use localtime_r()
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13284
2022-04-05 09:45:39 -07:00
наб ae3683ce11 libspl: print_timestamp: use localtime_r()
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13284
2022-04-05 09:45:33 -07:00
наб 8ab279484f libzfs: sendrecv: send_progress_thread: use localtime_r()
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13284
2022-04-05 09:45:24 -07:00
наб d5036491e6 zdb: standardise on ctime()
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13284
2022-04-05 09:45:05 -07:00
наб 95babbe573 scripts: cstyle: remove unused -h
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13285
2022-04-05 09:36:13 -07:00
наб 14ed1f2424 cstyle.1: note -g, $CI being equivalent to -g
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13285
2022-04-05 09:36:06 -07:00
наб 93f3a3517c scripts: cstyle: remove unused -C
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13285
2022-04-05 09:35:46 -07:00
наб 9292cf761e Fix string/index variables being unflexed by default
I got the status backward (B_FALSE for fixed, rather than B_TRUE for
flex); before:

  $ zfs get mountpoint tarta-zoot -r
  NAME                                 PROPERTY    VALUE       SOURCE
  tarta-zoot                           mountpoint  /           local
  tarta-zoot/PAGEFILE.SYS              mountpoint  -           -
  tarta-zoot/etc                       mountpoint  /etc        inherited from tarta-zoot
  tarta-zoot/home                      mountpoint  /home       inherited from tarta-zoot
  tarta-zoot/home/xspon                mountpoint  /home/xspon  inherited from tarta-zoot
  tarta-zoot/home/nabijaczleweli       mountpoint  /home/nabijaczleweli  inherited from tarta-zoot
  tarta-zoot/home/nabijaczleweli/tftp  mountpoint  /home/nabijaczleweli/tftp  inherited from tarta-zoot
  tarta-zoot/home/root                 mountpoint  /root       local

after:

  $ zfs get mountpoint tarta-zoot -r
  NAME                                 PROPERTY    VALUE                      SOURCE
  tarta-zoot                           mountpoint  /                          local
  tarta-zoot/PAGEFILE.SYS              mountpoint  -                          -
  tarta-zoot/etc                       mountpoint  /etc                       inherited from tarta-zoot
  tarta-zoot/home                      mountpoint  /home                      inherited from tarta-zoot
  tarta-zoot/home/xspon                mountpoint  /home/xspon                inherited from tarta-zoot
  tarta-zoot/home/nabijaczleweli       mountpoint  /home/nabijaczleweli       inherited from tarta-zoot
  tarta-zoot/home/nabijaczleweli/tftp  mountpoint  /home/nabijaczleweli/tftp  inherited from tarta-zoot
  tarta-zoot/home/root                 mountpoint  /root                      local

Fixes: be8e1d81bf ("Flex
 non-pretty-printed properties and raw-/pretty-print remaining ones")
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13286
2022-04-05 09:34:46 -07:00
Riccardo Schirone 7d524c068d Linux 5.18 compat: use address_space_operations->readahead
->readpages was removed and replaced by ->readahead. Define
zpl_readahead for kernels that don't have ->readpages.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Riccardo Schirone <rschirone91@gmail.com>
Closes #13278
2022-04-04 09:35:24 -07:00
Riccardo Schirone 036e846abc Linux 5.18 compat: blkg_tryget is moved to private headers
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Riccardo Schirone <rschirone91@gmail.com>
Closes #13278
2022-04-04 09:35:11 -07:00
Ryan Moeller b61507ec1d FreeBSD: Use NDFREE_PNBUF if available
NDF_ONLY_PNBUF has been removed from FreeBSD in favor of NDFREE_PNBUF.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #13277
2022-04-02 12:10:55 -07:00
наб 533aef2b3c tests: zts-report: add #13215 to known list for FreeBSD
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13259
2022-04-01 18:03:32 -07:00
наб e78b7a73fe tests: zts-report: issue numbers are numbers
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13259
2022-04-01 18:03:27 -07:00
наб 6ef2151c80 tests: clean out more temporary files
What remains is a bunch of anonymous untraceable /tmp/tmp.XXXXXXXXXX
files and bak.root.receive.staff1.3835 from an error branch, testdir.1,
testdir.3, and testroot454470 (with children) in testroot

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13259
2022-04-01 18:03:21 -07:00
наб 22ddc7f5ff scripts: zfs-tests: fix setup script detection when ran with -t /absolute
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13259
2022-04-01 18:03:14 -07:00
наб fbe811b054 tests: remove unused functions
As found by
  git -C tests/ grep ^function | grep -vFe '.lua:' -e '.zcp:' | while IFS=":$IFS" read -r _ _ fn _; do [ $(git -C tests/ grep -wF $fn | head -2 | wc -l) -eq 1 ] && echo $fn; done
after all rounds this comes out to, sorted:
  check_slog_state
  chgusr_exec
  cksum_files
  cleanup_pools
  compare_modes
  count_ACE
  dataset_set_defaultproperties
  ds_is_snapshot
  get_ACE
  get_group
  get_min
  get_mode
  get_owner
  get_rand_checksum
  get_rand_checksum_any
  get_rand_large_recsize
  get_rand_recsize
  get_user_group
  getitem
  indirect_vdev_mapping_size
  is_dilos
  log_noresult
  log_notinuse
  log_other
  log_timed_out
  log_uninitiated
  log_warning
  num_jobs_by_cpu
  plus_sign_check_l
  plus_sign_check_v
  record_cksum
  rwx_node
  seconds_mmp_waits_for_activity
  set_cur_usr
  setup_mirrors
  setup_raidzs
  showshares_smb
  zfs_zones_setup

This, of course, doesn't catch recursive ones, or ones that log with
their own function name as a prefix, but

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13259
2022-04-01 18:03:05 -07:00
наб 5c9f744b1a tests: include: use already-set $UNAME instead of shelling out to uname each time
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13259
2022-04-01 18:02:59 -07:00
наб ff0fc5af12 tests: pam: use absolute path to module .so
This is a valid configuration and both (a) skips the tests if it's
unbuilt/not installed and (b) makes it work even if installed outside
the system directory (like in /u/l/l/s instead of /l/s)

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13259
2022-04-01 18:02:54 -07:00
наб 7b6c4cbb1f tests: zts-report: simplify
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13259
2022-04-01 18:02:48 -07:00
наб abccb4b584 tests: zts-report: prune unused reasons
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13259
2022-04-01 18:02:43 -07:00
наб 6b2616435b tests: zts-report: only known-SKIP zfs_unshare_002_pos on Linux
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13259
2022-04-01 18:02:36 -07:00
наб 56692e77e2 linux: libshare: smb: fix more than one smb_is_share_active() call
This also fixes zfs_unshare_006_pos, which exposed this

Fixes: 2f71caf2d9 ("Allow zfs unshare
 <protocol> -a")

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13259
2022-04-01 18:02:24 -07:00
наб 34abca3e2c tests: zfs_003_neg: handle failures correctly
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13259
2022-04-01 18:02:19 -07:00
наб 61f1502246 tests: zfs_share_005: don't fail open
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13259
2022-04-01 18:02:14 -07:00
наб 7b875ee601 module: zstd: zfs_zstd: staticify zstd_ksp
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13259
2022-04-01 18:02:09 -07:00
наб b1b5843ae4 tests: lua_core: use herewords for single-line programs
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13259
2022-04-01 18:02:03 -07:00
наб 598fed7ecd tests: revert back to original coredump patterns on Linux, too
Otherwise, they leak past the tests and contaminate the running system,
breaking coredumps entirely

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Co-authored-by: Yannick Le Pennec <yannick.lepennec@live.fr>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13259
2022-04-01 18:01:51 -07:00
наб 912d2aa7d7 tests: zdb_args_pos.ksh: fix indentation
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13259
2022-04-01 18:01:46 -07:00
наб 20093de25c tests: move C test helpers into test cmd
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13259
2022-04-01 18:01:39 -07:00
наб 1948f6dbbf tests: echo-with-arguments review
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13259
2022-04-01 18:01:31 -07:00
наб b1579689a9 tests: rsend/send-c_props: make it chooch
Original error:
23:47:40.59 SUCCESS: eval zfs receive -dFv testpool2 < /mnt/testroot/backdir-rsend/pool-final-p
23:47:40.61 1,23d0
23:47:40.61 < type      filesystem      -
23:47:40.61 < origin    POOL@psnap      -
23:47:40.61 < volblocksize      -       -
23:47:40.61 < acltype   nfsv4   inherited from POOL
23:47:40.61 < dnodesize legacy  inherited from POOL
23:47:40.61 < atime     off     local
23:47:40.61 < canmount  off     local
23:47:40.61 < checksum  off     local
23:47:40.61 < compression       off     local
23:47:40.61 < copies    3       local
23:47:40.61 < devices   off     local
23:47:40.61 < exec      off     local
23:47:40.61 < quota     none    default
23:47:40.61 < readonly  on      local
23:47:40.61 < recordsize        128K    local
23:47:40.61 < reservation       none    default
23:47:40.61 < setuid    off     local
23:47:40.61 < snapdir   hidden  local
23:47:40.61 < version   5       -
23:47:40.61 < volsize   -       -
23:47:40.61 < xattr     off     local
23:47:40.61 < mountpoint        /PREFIX inherited from POOL
23:47:40.61 < jailed    on      local
23:47:40.62 cannot open 'testpool2/pclone': dataset does not exist
23:47:40.62 ERROR: cmp_ds_prop testpool/pclone testpool2/pclone exited 1

So: (a) actually send all the datasets in -p mode and
    (b) drop origin for clones sent with -p:

00:38:05.46 SUCCESS: eval zfs receive -dFv testpool2 < /mnt/testroot/backdir-rsend/pool-final-p
00:38:05.48 2c2
00:38:05.48 < origin    POOL@psnap
00:38:05.48 ---
00:38:05.48 > origin    POOL
00:38:05.49 ERROR: cmp_ds_prop testpool/pclone testpool2/pclone nosource exited 1

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13250
Closes #13259
2022-04-01 18:01:16 -07:00
наб a5addbbb7d tests: rsend.kshlib: cmp_ds_prop: anonymise "inherited from" sourcces
Fixes rsend_012_pos:
00:04:59.68 SUCCESS: cmp_ds_prop testpool/testfs/vol testpool2/testfs/vol nosource
00:04:59.70 4c4
00:04:59.70 < acltype   posix   inherited from testpool
00:04:59.70 ---
00:04:59.70 > acltype   posix   inherited from testpool2
00:04:59.70 11,12c11,12
00:04:59.70 < devices   off     inherited from testpool
00:04:59.70 < exec      on      inherited from testpool
00:04:59.70 ---
00:04:59.70 > devices   off     inherited from testpool2
00:04:59.70 > exec      on      inherited from testpool2
00:04:59.70 17c17
00:04:59.70 < setuid    off     inherited from testpool
00:04:59.70 ---
00:04:59.70 > setuid    off     inherited from testpool2
00:04:59.70 ERROR: cmp_ds_prop testpool@final testpool2@final exited 1

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13250
Closes #13259
2022-04-01 18:00:59 -07:00
наб b2c5291b7e tests: prune cat (ab)uses
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13259
2022-04-01 18:00:52 -07:00
наб f7cc8dddf7 tests: rsend_012_pos: backup/restore in one invocation
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13259
2022-04-01 18:00:44 -07:00
наб 9da14f3981 tests: rsend.kshlib: cmp_ds_prop: allow skipping source
This fixes rsend_012_pos:
20:28:50.50 SUCCESS: eval zfs receive -d -F testpool2 < /mnt/testroot/backdir-rsend/pool-final-R
20:28:50.53 4,6c4,6
20:28:50.53 < acltype   off     local
20:28:50.53 < dnodesize 4k      local
20:28:50.53 < atime     off     local
20:28:50.53 ---
20:28:50.53 > acltype   off     received
20:28:50.53 > dnodesize 4k      received
20:28:50.53 > atime     off     received
20:28:50.53 8,13c8,13
20:28:50.53 < checksum  sha256  local
20:28:50.53 < compression       off     local
20:28:50.53 < copies    2       local
20:28:50.53 < devices   on      local
20:28:50.53 < exec      on      local
20:28:50.53 < quota     1G      local
20:28:50.53 ---
20:28:50.53 > checksum  sha256  received
20:28:50.53 > compression       off     received
20:28:50.53 > copies    2       received
20:28:50.53 > devices   on      received
20:28:50.53 > exec      on      received
20:28:50.53 > quota     1G      received
20:28:50.53 15c15
20:28:50.53 < recordsize        128K    local
20:28:50.53 ---
20:28:50.53 > recordsize        128K    received
20:28:50.53 17,18c17,18
20:28:50.53 < setuid    off     local
20:28:50.53 < snapdir   visible local
20:28:50.53 ---
20:28:50.53 > setuid    off     received
20:28:50.53 > snapdir   visible received
20:28:50.53 ERROR: cmp_ds_prop testpool testpool2 exited 1

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13250
Closes #13259
2022-04-01 18:00:26 -07:00
наб 91933eb977 tests: rsend.kshlib: cmp_ds_prop: mask out differences in origin pool
This fixes rsend_011_pos:
20:28:26.94 SUCCESS: cmp_ds_prop testpool/testfs/fs1/fs2 testpool2/testfs/fs1/fs2
20:28:26.96 2c2
20:28:26.96 < origin    testpool@psnap  -
20:28:26.96 ---
20:28:26.96 > origin    testpool2@psnap -
20:28:26.97 ERROR: cmp_ds_prop testpool/pclone testpool2/pclone exited 1

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13250
Closes #13259
2022-04-01 17:59:41 -07:00
наб 964d41806f tests: review all wc(1) invocations
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13259
2022-04-01 17:59:35 -07:00
наб 23914a3b91 tests: review every instance of $?
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13259
2022-04-01 17:59:30 -07:00
наб 6586085673 tests: include: math: simplify bc conditions, review $?
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13259
2022-04-01 17:59:24 -07:00
наб caccfc870f tests: clean out unused/single-use/useless commands from the list
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13259
2022-04-01 17:59:18 -07:00
наб df6e0f0092 zed: functions: zed_log_err: forward to zed_log_msg
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13259
2022-04-01 17:59:12 -07:00
наб be866839f0 zed: lets: all-debug: printenv -> env
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13259
2022-04-01 17:59:06 -07:00
наб ee798fee45 scripts: zfs-tests.sh: cleanup, optimise
The single-stage losetup doesn't work on busybox,
but then most of the testsuite doesn't

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13259
2022-04-01 17:58:59 -07:00
наб 0990228ab8 tests: mixed_create_failure: explicitly note the error
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Issue: #13215
Closes #13259
2022-04-01 17:58:44 -07:00
наб 50a33b8c69 tests: logapi: don't cat excessively
This also fixes line welding in test error output

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13259
2022-04-01 17:58:37 -07:00
наб e294acdba8 tests: cmd: don't recurse
This confers an >10x speedup on t/z-t/cmd builds (12s -> 1.1s),
gets rid of 23 redundant identical automake specs and gitignores,
and groups the binaries with their common headers

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13259
2022-04-01 17:58:23 -07:00
наб caeab993ec tests: {read,write}_dos_attributes: despaghettify
Also: actually accept all the flags in write_d_a

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13259
2022-04-01 17:58:17 -07:00
наб d30577c9dd fgrep -> grep -F
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13259
2022-04-01 17:58:11 -07:00
наб f63c9dc70a egrep -> grep -E
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13259
2022-04-01 17:58:07 -07:00
наб 270d0a5a16 tests: nawk -> awk
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13259
2022-04-01 17:58:01 -07:00
наб 75746e9a40 tests: review every awk(1) invocation
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13259
2022-04-01 17:57:55 -07:00
наб 72f3516094 tests: zfs_rollback_commit: talkative failures
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13259
2022-04-01 17:57:50 -07:00
наб d7976a073e tests: README: note non-default mountd requirement
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13259
2022-04-01 17:57:45 -07:00
наб d5cc930b8f tests: README: note that -d *must* be 777 for the deleg tests
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13259
2022-04-01 17:57:40 -07:00
наб 41ebf40375 tests: vdev_zaps: cleanup library
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13259
2022-04-01 17:57:35 -07:00
наб 25aeffadb2 tests: vdev_zaps_007: actually test the new pool
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13259
2022-04-01 17:57:30 -07:00
наб 592cf7f1e2 tests: get rid of which
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13259
2022-04-01 17:57:25 -07:00
наб 62c5ccdf92 tests: don't >-redirect without eval
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13259
2022-04-01 17:57:19 -07:00
наб 053dac9e7d tests: vdev_zaps_007: log_must with > must eval
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13259
2022-04-01 17:57:14 -07:00
наб b9b763111c tests: redacted_send: explicitly assume instant /dev on non-Linux
The users spew udevadm ENOENTs on FreeBSD

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13259
2022-04-01 17:57:08 -07:00
наб 9423c932d4 tests: replace sum(1) with cksum(1)
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13259
2022-04-01 17:57:03 -07:00
наб c2fcc55d97 tests: snapshot_00[15]_pos: use cksum instead of sum -r
This only worked by accident on FreeBSD, where sum doesn't take flags

POSIX guarantees us cksum (Ethernet CRC) ‒ use that instead

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13259
2022-04-01 17:56:53 -07:00
наб 7aa7e6bd0a tests: zfs_create_nomount: undefined local -> typeset
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13259
2022-04-01 17:56:45 -07:00
наб f806d67846 tests: mixed_create_failure: undefined log_err -> log_fail
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Issue #13215
Closes #13259
2022-04-01 17:55:49 -07:00
наб 2d9da5e1c8 tests: don't fail if no fio or python3.sysctl
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13259
2022-04-01 17:55:45 -07:00
наб bd328a588b tests: nonspecific cleanup
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13259
2022-04-01 17:55:40 -07:00
наб 33c319eb1e tests: zfs_share_concurrent_shares: don't use log_musts in subprocesses
This thoroughly destroys logapi and races to the log files horribly

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13259
2022-04-01 17:55:35 -07:00
наб 3886e7081a tests: zfs_unshare_006: log_unsupported iff usershares are actually off
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13259
2022-04-01 17:55:30 -07:00
наб a3dcc0aa0c zfs, zpool: safe_malloc() duplicate argv
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13259
2022-04-01 17:55:24 -07:00
наб 84a3eab6aa zfs: simplify usage_prop_cb values
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13259
2022-04-01 17:55:19 -07:00
наб 5b0e75caef tests: don't use share/unshare exportfs aliases, support FreeBSD NFS
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13259
2022-04-01 17:55:14 -07:00
наб c22c872036 tests: don't always skip zfs_unshare tests on FreeBSD
Previously, they'd all be skipped on FreeBSD where share is showmount

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13259
2022-04-01 17:55:07 -07:00
наб 261f10b717 tests: zfs_unshare_001_pos: print which filesystem failed
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13259
2022-04-01 17:55:02 -07:00
наб bf228f3de0 tests: standardise on no-arg uname with *) case for illumos
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13259
2022-04-01 17:54:52 -07:00
Andrew eebfd28e9d Linux optimize access checks when ACL is trivial
Bypass check of ZFS aces if the ACL is trivial. When an ACL is
trivial its permissions are represented by the mode without any
loss of information. In this case, it is safe to convert the
access request into equivalent mode and then pass desired mask
and inode to generic_permission(). This has the added benefit
of also checking whether entries in a POSIX ACL on the file grant
the desired access.

This commit also skips the ACL check on looking up the xattr dir
since such restrictions don't exist in Linux kernel and it makes
xattr lookup behavior inconsistent between SA and file-based
xattrs. We also don't want to perform a POSIX ACL check while
looking up the POSIX ACL if for some reason it is located in
the xattr dir rather than an SA.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Andrew Walker <awalker@ixsystems.com>
Closes #13237
2022-04-01 09:53:54 -07:00
Rich Ercolani 6a2dda8f05 Ask libtool to stop hiding some errors
For #13083, curiously, it did not print the actual error, just
that the compile failed with "Error 1".

In theory, this flag should cause it to report errors twice sometimes.
In practice, I'm pretty okay with reporting some twice if it avoids
reporting some never.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Damian Szuberski <szuberskidamian@gmail.com>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #13086
2022-03-31 10:09:18 -07:00
hpingfs 4d04e41e4d zfs_ctldir: fix incorrect argument type of rw_destroy
The argument type of rw_destroy is (krwlock_t *) while currently
krwlock_t is passed in zfs_ctldir.c. This error is hidden because
rw_destroy is defined as ((void) 0) in linux. But anyway, this
mismatch should be fixed.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ping Huang <huangping@smartx.com>
Closes #13272
2022-03-30 15:40:31 -07:00
hpingfs abdcef47d2 zvol_os: suppress compiler warning for zvol_open_timeout_ms
When HAVE_BLKDEV_GET_ERESTARTSYS is defined, compiler will complain
"defined but not used" warning for zvol_open_timeout_ms.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ping Huang <huangping@smartx.com>
Closes #13270
2022-03-30 15:39:55 -07:00
наб 7dc782e5c5 cstyle: remove unused -o
Remove handling for allowing doxygen- and embedding in splint(?)-style
comments.  This functionality is unused by OpenZFS.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13264
2022-03-30 15:37:28 -07:00
наб 3cfbeb4e90 zfs: main: don't NULL-check infallible safe_malloc()
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13229
2022-03-30 15:30:51 -07:00
наб 18dbf5c8c3 libzfs: don't NULL-check infallible allocations
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13229
2022-03-30 15:30:16 -07:00
наб bc3f12bfac config: user: check for <aio.h>
And always zpool_read_label_slow() on non-conformant libcs

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: José Luis Salvador Rufo <salvador.joseluis@gmail.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13207
Closes #13254
2022-03-28 10:24:22 -07:00
наб b61595ff86 man: zfs-rename.8: import examples from zfs.8
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13228
2022-03-28 10:13:14 -07:00
наб 4d4fa5215d man: zfs-snapshot.8: import examples from zfs.8
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13228
2022-03-28 10:13:14 -07:00
наб bc2fe49a4f man: zfs-promote.8: import examples from zfs.8
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13228
2022-03-28 10:13:14 -07:00
наб d842ca3ff2 man: zfs-create.8: import examples from zfs.8
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13228
2022-03-28 10:13:14 -07:00
наб 3b887e485c man: zfs-destroy.8: import examples from zfs.8
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13228
2022-03-28 10:13:14 -07:00
наб 9bda775bd9 man: zfs-rollback.8: import examples from zfs.8
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13228
2022-03-28 10:13:14 -07:00
наб c6fd08797f man: zfs-clone.8: import examples from zfs.8
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13228
2022-03-28 10:13:14 -07:00
наб 52f3aebdb2 man: zfs-list.8: import examples from zfs.8
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13228
2022-03-28 10:13:14 -07:00
наб 2057b47b91 man: zfs-send.8, zfs-receive.8: import examples from zfs.8
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13228
2022-03-28 10:13:14 -07:00
наб b7b3f19935 man: zfs-diff.8: import examples from zfs.8
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13228
2022-03-28 10:13:14 -07:00
наб 1e164f89e5 man: zfs-bookmark.8: import examples from zfs.8
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13228
2022-03-28 10:13:14 -07:00
наб c522339fd7 man: zfs-set.8: import examples from zfs.8
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13228
2022-03-28 10:13:13 -07:00
наб 575e20602c man: zfs-allow.8: import examples from zfs.8
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13228
2022-03-28 10:13:13 -07:00
наб 69b06a77b6 man: zpool-iostat.8: import examples from zpool.8
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13228
2022-03-28 10:13:13 -07:00
наб b36069ab88 man: zpool-remove.8: import examples from zpool.8
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13228
2022-03-28 10:13:13 -07:00
наб d8dd89acd1 man: zpool-upgrade.8: import examples from zpool.8
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13228
2022-03-28 10:13:13 -07:00
наб 8ccb3fd6b1 man: zpool-import.8: import examples from zpool.8
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13228
2022-03-28 10:13:13 -07:00
наб 4849523af0 man: zpool-export.8: import examples from zpool.8
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13228
2022-03-28 10:13:13 -07:00
наб 4b660ffe7b man: zpool-destroy.8: import examples from zpool.8
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13228
2022-03-28 10:13:13 -07:00
наб 74181ca44c man: zpool-list.8: import examples from zpool.8
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13228
2022-03-28 10:13:13 -07:00
наб c34eb6f7a1 man: zpool-add.8: import examples from zpool.8
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13228
2022-03-28 10:13:13 -07:00
наб 9cdb067539 man: zpool-create.8: import examples from zpool.8
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13228
Closes #13140
2022-03-28 10:13:13 -07:00
наб a2550c9df9 man: Examples: use subsections instead of lists
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13228
2022-03-28 10:13:13 -07:00
наб 1ebe4c482d man: zpool.8: Examples: unmirrored -> non-redundant
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13228
2022-03-28 10:13:13 -07:00
наб 6571873f1b man: zpool.8: Examples: use Pa for disks and scripts
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13228
2022-03-28 10:13:13 -07:00
наб 7db823bd61 module: zfs: dsl_bookmark: silence false-positive maybe-uninitialised
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Damian Szuberski <szuberskidamian@gmail.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13247
Closes #13258
2022-03-28 10:03:13 -07:00
Jeremy Visser 063c83a853 zfs-dkms rpm: simplify scriptlets, fix uninstall
Two problems led to unexpected behaviour of the scriptlets:

1) Newer DKMS versions change the formatting of "dkms status":

   (old) zfs, 2.1.2, 5.14.10-300.fc35.x86_64, x86_64: installed
   (new) zfs/2.1.2, 5.14.10-300.fc35.x86_64, x86_64: installed

   Which broke a conditional determining whether to uninstall.

2) zfs_config.h not packaged properly, but was attempted to be read
   in the %preun scriptlet:

   CONFIG_H="/var/lib/dkms/zfs/2.1.2/*/*/zfs_config.h"

   Which broke the uninstallation of the module, which left behind a
   dangling symlink, which broke DKMS entirely with this error:

     Error! Could not locate dkms.conf file.
     File: /var/lib/dkms/zfs/2.1.1/source/dkms.conf does not exist.

This change attempts to simplify life by:

*  Avoiding parsing anything (less prone to future breakage)
*  Uses %posttrans instead of %post for module installation, because
   %post happens before %preun, while %posttrans happens afterwards
*  Unconditionally reinstall module on upgrade, which is less
   efficient but the trade-off is that it's more reliable

Alternative approaches could involve fixing the existing parsing bugs
or improving the logic, but this comes at the cost of complexity and
possible future bugs.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Jeremy Visser <jeremyvisser@google.com>
Closes #10463
Closes #13182
2022-03-28 09:55:57 -07:00
наб 86690775b0 Linux 5.18 compat: replace genhd.h with blkdev.h includes
blkdev.h includes genhd.h since dawn of upstream git, so this is
globally safe

Upstream-commit: 322cbb50de711814c42fb088f6d31901502c711a ("block:
 remove genhd.h")

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13251
2022-03-28 09:52:55 -07:00
наб d1325b4fa2 Linux 5.18 compat: 4-argument bio_alloc()
bio_alloc(gfp_t gfp_mask, unsigned short nr_iovecs)

became

  bio_alloc(struct block_device *bdev, unsigned short nr_vecs,
            unsigned int opf, gfp_t gfp_mask)
passing NULL/0 continues previous behaviour

Upstream-commit: 07888c665b405b1cd3577ddebfeb74f4717a84c4 ("block:
 pass a block_device and opf to bio_alloc")

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13251
2022-03-28 09:51:59 -07:00
George Melikov 427fad7f24 CI: Log test name to /dev/kmsg in ZTS
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes  #13249
2022-03-24 08:01:26 -06:00
наб 6322a77ce7 linux: libzutil: zfs_path_order: don't strdup ZPOOL_IMPORT_PATH
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13223
2022-03-23 08:55:38 -07:00
наб 8a2ed86001 libzutil: zfs_resolve_shortname: don't strdup() ZPOOL_IMPORT_PATH
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13223
2022-03-23 08:55:18 -07:00
George Melikov 018937884f zpoolconcepts.7: fix comma typo
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #13239
2022-03-23 08:53:19 -07:00
Brian Behlendorf 706ba5dc1d Linux 5.17 compat: META
Update the META file to reflect compatibility with the 5.17 kernel.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13243
2022-03-23 08:52:30 -07:00
Brian Behlendorf 460748d4ae Switch from _Noreturn to __attribute__((noreturn))
Parts of the Linux kernel build system struggle with _Noreturn.  This
results in the following warnings when building on RHEL 8.5, and likely
other environments.  Switch to using the __attribute__((noreturn)).

  warning: objtool: dbuf_free_range()+0x2b8:
    return with modified stack frame
  warning: objtool: dbuf_free_range()+0x0:
    stack state mismatch: cfa1=7+40 cfa2=7+8
  ...
  WARNING: EXPORT symbol "arc_buf_size" [zfs.ko] version generation
    failed, symbol will not be versioned.
  WARNING: EXPORT symbol "spa_open" [zfs.ko] version generation
    failed, symbol will not be versioned.
  ...

Additionally, __thread_exit() has been renamed spl_thread_exit() and
made a static inline function.  This was needed because the kernel
will generate a warning for symbols which are __attribute__((noreturn))
and then exported with EXPORT_SYMBOL.

While we could continue to use _Noreturn in user space I've also
switched it to __attribute__((noreturn)) purely for consistency
throughout the code base.

Reviewed-by: Ryan Moeller <freqlabs@FreeBSD.org>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13238
2022-03-23 08:51:00 -07:00
Tony Hutter b73505c7e0 ZTS: Log test name to /dev/kmsg on Linux
Add a -K option to the test suite to log each test name to /dev/kmsg
(on Linux), so if there's a kernel warning we'll be able to match
it up to a particular test.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes  #13227
2022-03-23 09:15:02 -06:00
Brian Behlendorf 6b444cb971 Linux 5.16 compat: restore FSR and FSAVE
Commit 3b52ccd introduced a flaw where FSR and FSAVE are not restored
when using a Linux 5.16 kernel.  These instructions are only used when
XSAVE is not supported by the processor meaning only some systems will
encounter this issue.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Attila Fülöp <attila@fueloep.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13210
Closes #13236
2022-03-19 12:46:33 -07:00
Sean Eric Fagan 565089f592 Allow zfs send to exclude datasets
Add support for a -exclude/-X option to `zfs send` to allow dataset 
hierarchies to be excluded.

Snapshots can be excluded using a channel program; however,
this can result in failures with 'zfs send -R'; this option allows 
them to be excluded.  Fortunately, this required a change only to 
cmd/zfs/zfs_main.c, using the already-existing callback argument 
to zfs_send() that is currently unused.

Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Christian Schwarz <christian.schwarz@nutanix.com>
Reviewed-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Sean Eric Fagan <kithrup@mac.com>
Signed-off-by: Sean Eric Fagan <kithrup@mac.com>
Closes #13158
2022-03-18 17:02:12 -07:00
ColMelvin 3ce3d30532 RPM: Split out pam_zfs_key into separate package
Create a separate `pam_zfs_key` package for the PAM module components,
an optional addition to the deliverables, in much the same way as the
Python bindings are released as a separate `python#-pyzfs` package.

This makes it clear when the PAM module is shipped with the package,
since it's now in its own package.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Chris Lindee <chris.lindee+github@gmail.com>
Closes: #13026
2022-03-18 16:54:50 -07:00
наб 045aeabce6 module: zfs: arc: hdr_full_crypt_dest: drop unevaulated-only variable
This explodes as -Wunused-variable on GCC 8.5.0, despite it being used,
just not in an evaluated context

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Ryan Moeller <freqlabs@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13195
2022-03-18 16:53:05 -07:00
наб 4629dfe963 config: always-arch: don't subst TARGET_CPU
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13225
2022-03-18 16:50:52 -07:00
наб bd0955cd8e config: always-arch: prune unused TARGET_CPU_*, add TARGET_CPU catch-all
This fixes (harmless) error spew from configuring on, e.g., armv6l

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13193
Closes #13225
2022-03-18 16:49:08 -07:00
Tony Hutter 515710a1e1 zed: Fix mpath autoreplace on Centos 7
A prior commit included a udev check for MPATH_DEVICE_READY to
determine if a path was multipath when doing an autoreplace:

    f2f6c18 zed: Misc multipath autoreplace fixes

However, MPATH_DEVICE_READY is not provided by the older version of
udev that's on Centos 7 (it is on Centos 8).

This patch instead looks for 'mpath-' in the UUID, which works on
both Centos 7 and 8.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #13222
2022-03-18 14:06:40 -07:00
Ryan Moeller d42979c6ef Fix ACL checks for NFS kernel server
This PR changes ZFS ACL checks to evaluate
fsuid / fsgid rather than euid / egid to avoid
accidentally granting elevated permissions to
NFS clients.

Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Andrew Walker <awalker@ixsystems.com>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #13221
2022-03-18 06:47:57 -06:00
Mateusz Guzik a5920d24c0 FreeBSD: add missing replay check to an assert in zfs_xvattr_set
Reviewed-by: Ryan Moeller <freqlabs@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #13219
2022-03-17 10:30:10 -07:00
Kyle Evans bee314a798 module: freebsd: avoid a taking a destroyed lock in zfs_zevent bits
At shutdown time, we drain all of the zevents and set the
ZEVENT_SHUTDOWN flag.  On FreeBSD, we may end up calling
zfs_zevent_destroy() after the zevent_lock has been destroyed while
the sysevent thread is winding down; we observe ESHUTDOWN, then back
out.

Events have already been drained, so just inline the kmem_free call in
sysevent_worker() to avoid the race, and document the assumption that
zfs_zevent_destroy doesn't do anything else useful at that point.

This fixes a panic that can occur at module unload time.

Reviewed-by: Ryan Moeller <freqlabs@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Kyle Evans <kevans@FreeBSD.org>
Closes #13220
2022-03-17 10:14:00 -07:00
наб 6ef00196db module: zstd: check we don't leak symbols; regenerate symbol map
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Rich Ercolani <rincebrain@gmail.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12988 
Closes #13209
2022-03-15 16:10:10 -07:00
наб 1c41d8941c tests: validate getsubopt(3) expulsion
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12996
2022-03-15 15:14:50 -07:00
наб 77cdc63d09 zfs-list.8: mention -S after -s it calls back to
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12996
2022-03-15 15:14:46 -07:00
наб 40f09cb0f4 zfs: list: only accept whole type for -t, not tp[=whatever]
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12996
2022-03-15 15:14:42 -07:00
наб 539d16c35e libzfs: tokenise consistently with zfs and zpool
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12996
2022-03-15 15:14:38 -07:00
наб 4695230021 zfs: get: only accept whole type for -t, not tp[=whatever]
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12996
2022-03-15 15:14:33 -07:00
наб 7867f430b4 zfs: get: only accept whole source for -s, not src[=whatever]
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12996
2022-03-15 15:14:29 -07:00
наб 7c17e82cbe zfs: get: only accept whole column for -o, not col[=whatever]
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12996
2022-03-15 15:14:23 -07:00
наб c79787845d zpool: get: there's one fewer column than in zfs get
The current code allows -o name,property,value,source,name

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12996
2022-03-15 15:14:18 -07:00
наб 15aca3ad59 zfs: wait: only accept whole activity for -t, not act[=whatever]
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12996
2022-03-15 15:14:14 -07:00
наб 66cd170db3 zpool: get: only accept whole columns for -o, not col[=whatever]
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12996
2022-03-15 15:14:06 -07:00
наб 675508f608 zpool: wait: only accept whole columns for -t, not col[=whatever]
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12996
2022-03-15 15:14:02 -07:00
наб c21819026f Replace FMD_B_{TRUE,FALSE} with B_{TRUE,FALSE}
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12996
2022-03-15 15:13:58 -07:00
наб a9e2b22efb Integrate carcass of libspl/i/s/vtoc.h into i/s/efi_partition.h
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12996
2022-03-15 15:13:54 -07:00
наб d465fc5844 Forbid b{copy,zero,cmp}(). Don't include <strings.h> for <string.h>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12996
2022-03-15 15:13:48 -07:00
наб 861166b027 Remove bcopy(), bzero(), bcmp()
bcopy() has a confusing argument order and is actually a move, not a
copy; they're all deprecated since POSIX.1-2001 and removed in -2008,
and we shim them out to mem*() on Linux anyway

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12996
2022-03-15 15:13:42 -07:00
наб 1d77d62f5a libspl: include: sys/vtoc.h: reduce to absolute barest minimum
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12996
2022-03-15 15:13:36 -07:00
наб 26bbce8173 tests: replace explicit $? || log_fail with log_must
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12996
2022-03-15 15:13:33 -07:00
наб 85c2cce51c tests: zfs_002_pos: simplify ZFS_ABORT tests
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12996
2022-03-15 15:13:21 -07:00
наб e09762c6c2 tests: zfs_set_common: check_prop_inherit: print faulty values
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12996
2022-03-15 15:13:00 -07:00
Brian Behlendorf 2feba9a6d8 Linux 5.16 compat: META
Update the META file to reflect compatibility with the 5.16 kernel.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13212
Closes #13218
2022-03-14 16:39:07 -07:00
Ryan Moeller 8bb9ecf4bf libzfs: Convert to fnvpair functions
Improves readability.  No functional change intended.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #13197
2022-03-14 15:44:56 -07:00
Brian Atkinson becc717f61 Adding ZERO_PAGE detection
On some architectures ZERO_PAGE is unavailable because it references
a GPL exported symbol of empty_zero_page. Originally e08b993 removed
the call to PAGE_ZERO(0) for assignment to the abd_zero_page. However,
a simple check can be done to avoid a kernel allocation and free for
the abd_zero_page if ZERO_PAGE is available.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Brian Atkinson <batkinson@lanl.gov>
Closes #13199
2022-03-14 12:37:39 -07:00
наб dad2b19fff module: zfs: zio_inject: zio_match_handler: don't << -1
Caught by UBSAN: ZI_NO_DVA is passed explicitly in
zio_handle_decrypt_injection() and can be an ENOENT from zio_match_dva()

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Damian Szuberski <szuberskidamian@gmail.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13146
Closes #13190
2022-03-13 13:18:17 -07:00
Brian Behlendorf 56a0699e5e ZTS: Fix send_partial_dataset.ksh
The send_partial_dataset test verifies that partial send streams
can be resumed.  This test may occasionally fail with a "token is
corrupt" error if the `mess_send_file` truncates a send stream
below the size of the DRR_BEGIN record.  Update this function to
set a minimum size to ensure there is at least an intact DDR_BEGIN
record which allows for the receiving dataset to be created.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Rich Ercolani <rincebrain@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13177
2022-03-13 13:16:49 -07:00
Harry Sintonen ebcf12f763 get_key_material_https: removed bogus free() call
The get_key_material_https() function error code path had a bogus
free() call, either resulting in double-free or free() of undefined
pointer.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Co-authored-by: Harry Sintonen <sintonen@iki.fi>
Signed-off-by: Harry Sintonen <sintonen@iki.fi>
Closes #13198
2022-03-13 13:15:40 -07:00
Ryan Moeller 76bcffb7dc libzfs: FreeBSD doesn't resize partitions for you
This code can be failure prone on FreeBSD, where zfsd will pass a guid
as the vdev path to online.  The guid causes zfs_resolve_shortname to
fail because it expects a path.  We can just skip the whole ordeal.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
Closes #12083
2022-03-11 08:52:49 -08:00
Akash B 1282274f33 Add physical device size to SIZE column in 'zpool list -v'
Add physical device size/capacity only for physical devices in
'zpool list -v' instead of displaying "-" in the SIZE column.
This would make it easier to see the individual device capacity and
to determine which spares are large enough to replace which devices.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Dipak Ghosh <dipak.ghosh@hpe.com>
Signed-off-by: Akash B <akash-b@hpe.com>
Closes #12561
Closes #13106
2022-03-08 16:20:41 -08:00
Attila Fülöp ce7a5dbf4b Linux x86 SIMD: factor out unneeded kernel dependencies
Cleanup the kernel SIMD code by removing kernel dependencies.

 - Replace XSTATE_XSAVE with our own XSAVE implementation for all
   kernels not exporting kernel_fpu{begin,end}(), see #13059

 - Replace union fpregs_state by a uint8_t * buffer and get the size
   of the buffer from the hardware via the CPUID instruction

 - Replace kernels xgetbv() by our own implementation which was
   already there for userspace.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Attila Fülöp <attila@fueloep.org>
Closes #13102
2022-03-08 16:19:15 -08:00
наб a86e089415 libzfs_core: lzc_send_wraper: maximise pipe buffer for existing pipes
This reduces scheduler overhead by letting the reader consume bigger
chunks (64k => 128k at full throttle)

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Rich Ercolani <rincebrain@gmail.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13133
2022-03-08 09:33:12 -08:00
Rich Ercolani 7eb179be18 ZTS: /dev/null: accept no substitutes
Instead of writing to "devnull" and rming it later, just
> /dev/null to not have to cleanup later.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Co-authored-by: Rich Ercolani <rincebrain@gmail.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13133
2022-03-08 09:33:11 -08:00
наб dd899641ee Revert "ZTS: Avoid piping send directly to /dev/null"
This reverts commit 1a79f7e860.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Rich Ercolani <rincebrain@gmail.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13133
2022-03-08 09:33:10 -08:00
наб d1a5e9594c Revert "Added error for writing to /dev/ on Linux"
This reverts commit 860051f1d1.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Rich Ercolani <rincebrain@gmail.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13133
2022-03-08 09:33:09 -08:00
наб 3a909fe33e libzfs, libzfs_core: send: always write to pipe
By introducing lzc_send_wrapper() and routing all ZFS_IOC_SEND*
users through it, we fix a Linux 5.10-introduced bug (see comment)

This is all /transparent/ to the users API, ABI, and usage-wise,
and disabled on FreeBSD and if the output is already a pipe,
and transparently nestable (i.e. zfs_send_one() is wrapped,
but so is lzc_send_redacted() it calls to ‒ this wouldn't be strictly
necessary if ZFS_IOC_SEND_PROGRESS wasn't strictly denominational w.r.t.
the descriptor the send is happening on)

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Co-authored-by: Rich Ercolani <rincebrain@gmail.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11445
Closes #13133
2022-03-08 09:33:08 -08:00
наб fbbea09db9 libzfs_core: simplify max_pipe_buffer(), return new max
We do still check and only optionally set because all F_SETPIPE_SZ
directives reallocate the pipe buffer

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rich Ercolani <rincebrain@gmail.com>
Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13133
2022-03-08 09:33:07 -08:00
наб 8a3d77358a libzfs: migrate single-use libzfs_set_pipe_max() to libzfs_core
Notably, this also means that the pipe is expanded before each
dataset is received, so updates to /p/s/f/pipe-max-size are reflected
for each new dataset

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rich Ercolani <rincebrain@gmail.com>
Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13133
2022-03-08 09:33:04 -08:00
Brian Atkinson 08eb2309ce Cleaning up a couple of ZTS tests setup scripts
With the zfs_destroy ZTS test case the setup script needed to call
default_setup_noexit so compression could be turned off. Also, added
log_must to setting compression off in the reservation setup script for
turning off compression.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rich Ercolani <rincebrain@gmail.com>
Signed-off-by: Brian Atkinson <batkinson@lanl.gov>
Closes #13173
2022-03-08 09:20:33 -08:00
Brian Atkinson 1b609d4b03 Added noexit variant for Raidz setup in ZTS tests
The regular default_raidz_setup function in the ZFS test suite called
log_pass after creating the zpool. However, with compression now being
on by default 56fa4aa, there is no way to turn compression off in the
setup.ksh scripts when creating a raidz VDEV. The addition of the
function default_raidz_setup_noexit allows for a raidz VDEV to be
created, additional zfs property settings to be applied and for the
setup.ksh script itself to call log_pass.

With the addition of default_raidz_setup_noexit some stray log_pass
calls were removed from any setup.ksh scripts that call
default_raidz_setup.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rich Ercolani <rincebrain@gmail.com>
Signed-off-by: Brian Atkinson <batkinson@lanl.gov>
Closes #13173
2022-03-08 09:20:00 -08:00
Brian Behlendorf 6df43169b3 Fix ENOSPC when unlinking multiple files from full pool
When unlinking multiple files from a pool at 100% capacity, it was
possible for ENOSPC to be returned after the first unlink.  e.g.

    rm -f /mnt/fs/test1.0.0 /mnt/fs/test1.1.0 /mnt/fs/test1.2.0
    rm: cannot remove '/mnt/fs/test1.1.0': No space left on device
    rm: cannot remove '/mnt/fs/test1.2.0': No space left on device

After waiting for the pending deferred frees from the first unlink to
be processed the remaining files can then be unlinked.  This is caused
by the quota limit in dsl_dir_tempreserve_impl() being temporarily
decreased to the allocatable pool capacity less any deferred free
space.

This is resolved using the existing mechanism of returning ERESTART
when over quota as long as we know enough space will shortly be
available after processing the pending deferred frees.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Ryan Moeller <freqlabs@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13172
2022-03-08 09:16:35 -08:00
Umer Saleem 39a4daf742 Expose additional file level attributes
ZFS allows to update and retrieve additional file level attributes for
FreeBSD. This commit allows additional file level attributes to be
updated and retrieved for Linux. These include the flags stored in the
upper half of z_pflags only.

Two new IOCTLs have been added for this purpose. ZFS_IOC_GETDOSFLAGS
can be used to retrieve the attributes, while ZFS_IOC_SETDOSFLAGS can
be used to update the attributes.

Attributes that are allowed to be updated include ZFS_IMMUTABLE,
ZFS_APPENDONLY, ZFS_NOUNLINK, ZFS_ARCHIVE, ZFS_NODUMP, ZFS_SYSTEM,
ZFS_HIDDEN, ZFS_READONLY, ZFS_REPARSE, ZFS_OFFLINE and ZFS_SPARSE.
Flags can be or'd together while calling ZFS_IOC_SETDOSFLAGS.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Closes #13118
2022-03-07 17:52:03 -08:00
Windel Bouwman 9955b9ba2e Handle aarch64 defines seperate from arm
aarch64 is a different architecture than arm. Some
compilers might choke when both __arm__ and __aarch64__
are defined.

This change separates the checks for arm and for
aarch64 in the isa_defs.h header files.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Windel Bouwman <windel@windel.nl>
Closes #10335 
Closes #13151
2022-03-07 17:49:34 -08:00
Alejandro Colomar db7f1a91de Use _Noreturn (C11; GNU89) properly
A function that returns with no value is a different thing from a
function that doesn't return at all.  Those are two orthogonal
concepts, commonly confused.

pthread_create(3) expects a pointer to a start routine that has a
very precise prototype:

    void *(*start_routine)(void *);

However, other thread functions, such as kernel ones, expect:

    void (*start_routine)(void *);

Providing a different one is incorrect, and has only been working
because the ABIs happen to produce a compatible function.

We should use '_Noreturn void', since it's the natural type, and
then provide a '_Noreturn void *' wrapper for pthread functions.

For consistency, replace most cases of __NORETURN or
__attribute__((noreturn)) by _Noreturn.  _Noreturn is understood
by -std=gnu89, so it should be safe to use everywhere.

Ref: https://github.com/openzfs/zfs/pull/13110#discussion_r808450136
Ref: https://software.codidact.com/posts/285972
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Closes #13120
2022-03-04 16:25:22 -08:00
наб 06b8050678 zpool: main: list: don't pay for printf
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13125
2022-03-04 12:08:34 -08:00
наб cdb3eb5aa4 zpool: main: list: -v: update dash spacing for special vdevs
Before:
nabijaczleweli@tarta:~/store/code/zfs$ /sbin/zpool list -v file
NAME                                                    SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
file                                                    118G   150K   118G        -         -     0%     0%  1.00x    ONLINE  -
  /mnt/filling/store/nabijaczleweli/code/zfs/file      39.5G      0  39.5G        -         -     0%  0.00%      -    ONLINE
dedup                                                      -      -      -        -         -      -      -      -  -
  /mnt/filling/store/nabijaczleweli/code/zfs/adedup    39.5G      0  39.5G        -         -     0%  0.00%      -    ONLINE
special                                                    -      -      -        -         -      -      -      -  -
  /mnt/filling/store/nabijaczleweli/code/zfs/aspecial  39.5G   150K  39.5G        -         -     0%  0.00%      -    ONLINE
logs                                                       -      -      -        -         -      -      -      -  -
  /mnt/filling/store/nabijaczleweli/code/zfs/alog      39.5G      0  39.5G        -         -     0%  0.00%      -    ONLINE
cache                                                      -      -      -        -         -      -      -      -  -
  /mnt/filling/store/nabijaczleweli/code/zfs/acache    40.0G  1.50K  40.0G        -         -     0%  0.00%      -    ONLINE
spare                                                      -      -      -        -         -      -      -      -  -
  /mnt/filling/store/nabijaczleweli/code/zfs/aspare        -      -      -        -         -      -      -      -     AVAIL

After:
nabijaczleweli@tarta:~/store/code/zfs$ cmd/zpool/zpool list -v file
NAME                                                    SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
file                                                    118G   150K   118G        -         -     0%     0%  1.00x    ONLINE  -
  /mnt/filling/store/nabijaczleweli/code/zfs/file      39.5G      0  39.5G        -         -     0%  0.00%      -    ONLINE
dedup                                                      -      -      -        -         -      -      -      -         -
  /mnt/filling/store/nabijaczleweli/code/zfs/adedup    39.5G      0  39.5G        -         -     0%  0.00%      -    ONLINE
special                                                    -      -      -        -         -      -      -      -         -
  /mnt/filling/store/nabijaczleweli/code/zfs/aspecial  39.5G   150K  39.5G        -         -     0%  0.00%      -    ONLINE
logs                                                       -      -      -        -         -      -      -      -         -
  /mnt/filling/store/nabijaczleweli/code/zfs/alog      39.5G      0  39.5G        -         -     0%  0.00%      -    ONLINE
cache                                                      -      -      -        -         -      -      -      -         -
  /mnt/filling/store/nabijaczleweli/code/zfs/acache    40.0G  1.50K  40.0G        -         -     0%  0.00%      -    ONLINE
spare                                                      -      -      -        -         -      -      -      -         -
  /mnt/filling/store/nabijaczleweli/code/zfs/aspare        -      -      -        -         -      -      -      -     AVAIL

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13123
Closes #13125
2022-03-04 12:08:33 -08:00
наб cf5a9d8c39 zpool: main: use ARRAY_SIZE(class_name) instead of 3
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13125
2022-03-04 12:08:33 -08:00
наб f4826a2d46 libzfs: util: don't check for allocation errors from infallible zfs_*()
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13125
2022-03-04 12:08:33 -08:00
наб be8e1d81bf Flex non-pretty-printed properties and raw-/pretty-print remaining ones
Before:
nabijaczleweli@tarta:~/store/code/zfs$ /sbin/zpool list -Td -o name,size,alloc,free,ckpoint,expandsz,guid,load_guid,frag,cap,dedup,health,altroot,guid,dedupditto,load_guid,maxblocksize,maxdnodesize 2>/dev/null
Sun 20 Feb 03:57:44 CET 2022
NAME         SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   GUID  LOAD_GUID   FRAG    CAP  DEDUP    HEALTH  ALTROOT   GUID  DEDUPDITTO  LOAD_GUID  MAXBLOCKSIZE  MAXDNODESIZE
filling     25.5T  6.52T  18.9T        -       64M  11512889483096932869  11656109927366648364     1%    25%  1.00x    ONLINE  -        11512889483096932869           0  11656109927366648364       1048576         16384
tarta-boot   240M  50.6M   189M        -         -  2372068846917849656  7752280792179633787    12%    21%  1.00x    ONLINE  -        2372068846917849656           0  7752280792179633787       1048576           512
tarta-zoot  55.5G  6.42G  49.1G        -         -  12971868889665384604  8622632123393589527    17%    11%  1.00x    ONLINE  -        12971868889665384604           0  8622632123393589527       1048576         16384

nabijaczleweli@tarta:~/store/code/zfs$ /sbin/zfs list -o name,guid,keyguid,ivsetguid,createtxg,objsetid,pbkdf2iters,refratio -r tarta-zoot
NAME                                  GUID  KEYGUID  IVSETGUID  CREATETXG  OBJSETID  PBKDF2ITERS  REFRATIO
tarta-zoot                           1110930838977259561     659P          -          1        54            0     1.03x
tarta-zoot/PAGEFILE.SYS              2202570496672997800    3.20E          -       2163      1539            0     1.07x
tarta-zoot/dupa                      16941280502417785695    9.81E          -    2274707      1322  1000000000000     1.00x
tarta-zoot/etc                       17029963068508333530    12.9E          -       3663      1087            0     1.52x
tarta-zoot/home                      3508163802370032575    8.50E          -       3664       294            0     1.00x
tarta-zoot/home/misio                7283672744014848555    13.0E          -       3665       302            0     2.28x
tarta-zoot/home/nabijaczleweli       12286744508078616303    5.15E          -       3666       200            0     2.05x
tarta-zoot/home/nabijaczleweli/tftp  13551632689932817643    5.16E          -       3667      1095            0     1.00x
tarta-zoot/home/root                 5203106193060067946    15.4E          -       3668       698            0     2.86x
tarta-zoot/home/shared-config        8866040021005142194    14.5E          -       3670      2069            0     1.20x
tarta-zoot/home/tymek                9472751824283011822    4.56E          -       3671      1202            0     1.32x
tarta-zoot/oldboot                   10460192444135730377    13.8E          -    2268398      1232            0     1.01x
tarta-zoot/opt                       9945621324983170410    5.84E          -       3672      1210            0     1.00x
tarta-zoot/opt/icecc                 13178238931846132425    9.04E          -       3673      1103            0     2.83x
tarta-zoot/opt/swtpm                 10172962421514870859    4.13E          -     825669    145132            0     1.87x
tarta-zoot/srv                       217179989022738337    3.90E          -       3674      2469            0     1.00x
tarta-zoot/usr                       12214213243060765090    15.0E          -       3675      2477            0     2.58x
tarta-zoot/usr/local                 7542700368693813134     941P          -       3676      2484            0     2.33x
tarta-zoot/var                       13414177124447929530    10.2E          -       3677      2492            0     1.57x
tarta-zoot/var/lib                   6969944550407159241    5.28E          -       3678      2499            0     2.34x
tarta-zoot/var/tmp                   6399468088048343912    1.34E          -       3679      1218            0     3.95x

After:
nabijaczleweli@tarta:~/store/code/zfs$ cmd/zpool/zpool list -Td -o name,size,alloc,free,ckpoint,expandsz,guid,load_guid,frag,cap,dedup,health,altroot,guid,dedupditto,load_guid,maxblocksize,maxdnodesize 2>/dev/null
Sun 20 Feb 03:57:42 CET 2022
NAME         SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ                  GUID             LOAD_GUID   FRAG    CAP  DEDUP    HEALTH  ALTROOT                  GUID  DEDUPDITTO             LOAD_GUID  MAXBLOCKSIZE  MAXDNODESIZE
filling     25.5T  6.52T  18.9T        -       64M  11512889483096932869  11656109927366648364     1%    25%  1.00x    ONLINE  -        11512889483096932869           0  11656109927366648364            1M           16K
tarta-boot   240M  50.6M   189M        -         -   2372068846917849656   7752280792179633787    12%    21%  1.00x    ONLINE  -         2372068846917849656           0   7752280792179633787            1M           512
tarta-zoot  55.5G  6.42G  49.1G        -         -  12971868889665384604   8622632123393589527    17%    11%  1.00x    ONLINE  -        12971868889665384604           0   8622632123393589527            1M           16K

nabijaczleweli@tarta:~/store/code/zfs$ cmd/zfs/zfs list -o name,guid,keyguid,ivsetguid,createtxg,objsetid,pbkdf2iters,refratio -r tarta-zoot
NAME                                                 GUID               KEYGUID  IVSETGUID  CREATETXG  OBJSETID    PBKDF2ITERS  REFRATIO
tarta-zoot                            1110930838977259561    741529699813639505          -          1        54              0     1.03x
tarta-zoot/PAGEFILE.SYS               2202570496672997800   3689529982640017884          -       2163      1539              0     1.07x
tarta-zoot/dupa                      16941280502417785695  11312442953423259518          -    2274707      1322  1000000000000     1.00x
tarta-zoot/etc                       17029963068508333530  14852574366795347233          -       3663      1087              0     1.52x
tarta-zoot/home                       3508163802370032575   9802810070759776956          -       3664       294              0     1.00x
tarta-zoot/home/misio                 7283672744014848555  14983161489316798151          -       3665       302              0     2.28x
tarta-zoot/home/nabijaczleweli       12286744508078616303   5937870537299886218          -       3666       200              0     2.05x
tarta-zoot/home/nabijaczleweli/tftp  13551632689932817643   5950522828900813054          -       3667      1095              0     1.00x
tarta-zoot/home/root                  5203106193060067946  17718025091255443518          -       3668       698              0     2.86x
tarta-zoot/home/shared-config         8866040021005142194  16716354482778968577          -       3670      2069              0     1.20x
tarta-zoot/home/tymek                 9472751824283011822   5251854710505749954          -       3671      1202              0     1.32x
tarta-zoot/oldboot                   10460192444135730377  15894065034622168157          -    2268398      1232              0     1.01x
tarta-zoot/opt                        9945621324983170410   6737735639539098405          -       3672      1210              0     1.00x
tarta-zoot/opt/icecc                 13178238931846132425  10425145983015238428          -       3673      1103              0     2.83x
tarta-zoot/opt/swtpm                 10172962421514870859   4764783754852521469          -     825669    145132              0     1.87x
tarta-zoot/srv                         217179989022738337   4492810461439647259          -       3674      2469              0     1.00x
tarta-zoot/usr                       12214213243060765090  17306702395865262834          -       3675      2477              0     2.58x
tarta-zoot/usr/local                  7542700368693813134   1059954157997659784          -       3676      2484              0     2.33x
tarta-zoot/var                       13414177124447929530  11764397504176937123          -       3677      2492              0     1.57x
tarta-zoot/var/lib                    6969944550407159241   6084753728494937404          -       3678      2499              0     2.34x
tarta-zoot/var/tmp                    6399468088048343912   1548692824635344277          -       3679      1218              0     3.95x

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13122
Closes #13125
2022-03-04 12:08:33 -08:00
наб a9a89755fa module: zcommon: zprop: common: zprop_width: namespace exceptions
Before this, /all/ numerical properties 1 (ZFS_PROP_CREATION,
ZPOOL_PROP_SIZE, VDEV_PROP_CAPACITY) would be non-fixed and
/all/ numerical properties 5 (ZFS_PROP_COMPRESSRATIO,
ZPOOL_PROP_HEALTH, VDEV_PROP_PSIZE) would be 8-wide

Realistically, this doesn't appear to be much of a problem

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13125
2022-03-04 12:08:30 -08:00
наб 6d8a00ff1f contrib/dracut: README: note rootfstype=zfs
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13093
2022-03-03 10:45:31 -08:00
наб 6a41310c70 contrib/dracut: zfs-lib: export_all: replace with inline zpool export -a
07a3312f17, which introduced this in
October of 2014, didn't have zpool export -a available; we do

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13093
2022-03-03 10:45:24 -08:00
наб 93669c3812 contrib/dracut: export-zfs: simplify
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13093
2022-03-03 10:45:19 -08:00
наб 497bf14a4a zdb.8: cleanup
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13093
2022-03-03 10:44:49 -08:00
Rich Ercolani 56fa4aa96e Default to ON for compression
A simple change, but so many tests break with it,
and those are the majority of this.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #13078
2022-03-03 10:43:38 -08:00
Brian Behlendorf 29a0ffe795 ZTS: Fix import_devices_missing.ksh
Related to commit 90b77a036.  Retry the `zpool export` if the pool
is "busy" indicating there is a process accessing the mount point.
This can happen after an import, allowing it to be retried will
avoid spurious test failures.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13169
2022-03-02 11:03:53 -08:00
Rich Ercolani fe2ea67ddd Re-apply 6ba2e72b, silence lint
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12978
2022-03-01 13:56:00 -08:00
Rich Ercolani e220635995 Re-apply a78f19d3
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12978
2022-03-01 13:55:51 -08:00
Rich Ercolani 234e9605c1 Explode zstd 1.4.5 into separate upstream files
It's much nicer to import from upstream this way, and compiles
faster too.

Everything in lib/ is unmodified 1.4.5.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12978
2022-03-01 13:55:12 -08:00
Aleksa Sarai 669683c4cb ZTS: switch to rsync for directory diffs
While "diff -r" is the most straightforward way of comparing directory
trees for differences, it has two major issues:

 * File metadata is not compared, which means that subtle bugs may be
   missed even if a test is written that exercises the buggy behaviour.
 * diff(1) doesn't know how to compare special files -- it assumes they
   are always different, which means that a test using diff(1) on
   special files will always fail (resulting in such tests not being
   added).

rsync can be used in a very similar manner to diff (with the -ni flags),
but has the additional benefit of being able to detect and resolve many
more differences between directory trees. In addition, rsync has a
standard set of features and flags while diffs feature set depends on
whether you're using GNU or BSD binutils.

Note that for several of the test cases we expect that file timestamps
will not match. For example, the ctime for a file creation or modify
event is stored in the intent log but not the mtime. Thus when replaying
the log the correct ctime is set but the current mtime is used. This is
the expected behavior, so to prevent these tests from failing, there's a
replay_directory_diff function which ignores those kinds of changes.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
Closes #12588
2022-03-01 10:05:32 -08:00
Brian Behlendorf 19229e5f1e ZTS: Modify receive-o-x_props_override.ksh exception
As previously noted in #12272 the receive-o-x_props_override.ksh test
reliably fails on FreeBSD.  Since we don't expect this test to pass
move the exception from the "maybe" to "known" section.  This way we
don't retry the FAILED test when it is not expected to pass.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Ryan Moeller <freqlabs@FreeBSD.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13167
2022-03-01 08:47:30 -08:00
Brian Behlendorf a6bf7ea16a ZTS: Move largest_pool_001_pos.ksh to Linux runfile
On FreeBSD pools are not allowed to be created using vdevs which are
backed by ZFS volumes.  This configuration is not recommended for any
supported platform, nevertheless the largest_pool_001_pos.ksh test
case makes use of it as a convenience.  This causes the test case to
fail reliably on FreeBSD.  The layout is still tolerated on Linux
so only perform this test on Linux.

Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Ryan Moeller <freqlabs@FreeBSD.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13166
2022-03-01 08:46:00 -08:00
Savyasachee Jha b3ab290855 dracut: skip zfsexpandknoweldge when zfs_devs is present in dracut
PR 1711 (https://github.com/dracutdevs/dracut/pull/1711) adds a zfs_devs
function to dracut to detect the physical devices backing zfs pools. If
this function exists in the version of dracut this module is being
called from, then it does not need to run.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Signed-off-by: Savyasachee Jha <hi@savyasacheejha.com>
Closes #13121
2022-02-28 14:35:25 -08:00
наб 9e532d17f3 config: gcc != cc
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rich Ercolani <rincebrain@gmail.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13145
Closes #13152
2022-02-26 11:28:26 -08:00
наб b08153f1c1 freebsd: libzfs: zmount: void-cast unused assert(3) variables
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rich Ercolani <rincebrain@gmail.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13145
Closes #13152
2022-02-26 11:28:18 -08:00
наб 1ea68b6d38 config: check for -Wno-cast-function-type
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rich Ercolani <rincebrain@gmail.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13145
Closes #13152
2022-02-26 11:26:55 -08:00
Paul Dagnelie 82226e4f44 Fix erroneous zstreamdump warning
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Amanakis <gamanakis@gmail.com>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #13154
2022-02-26 11:24:27 -08:00
Brian Behlendorf ce91f973ec ztest: Fix ASSERT in ztest_objset_destroy_cb()
The dsl_destroy_snapshot() call in ztest_objset_destroy_cb() may
encounter a runtime error when the pool is out of space.  This is
similar to the error handling for the dsl_destroy_head() case,
but since dsl_destroy_snapshot() is implemented as a channel
program ECHRNG is returned instead of ENOSPC.  ECHRNG may also
be returned instead of EBUSY if there is a hold on the snapshot.

Reviewed by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13155
2022-02-26 11:20:32 -08:00
наб 4f453dcc1f Fix FreeBSD reporting on reruns
Turns out, when your test-suite fails on FreeBSD the rerun logic
would fail as follows:

Results Summary
PASS	 1358
FAIL	   7
SKIP	  47

Running Time:	04:00:02
Percent passed:	96.2%
Log directory:	/var/tmp/test_results/20220225T092538
mktemp: illegal option -- p
usage: mktemp [-d] [-q] [-t prefix] [-u] template ...
       mktemp [-d] [-q] [-u] -t prefix
mktemp: illegal option -- p
usage: mktemp [-d] [-q] [-t prefix] [-u] template ...
       mktemp [-d] [-q] [-u] -t prefix
/usr/local/share/zfs/zfs-tests.sh: cannot create :
                                   No such file or directory
...

This change resolves a flaw from the original commit, 2320e6eb4
("Add zfs-test  facility to automatically rerun failing tests")

Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13156
2022-02-26 11:19:05 -08:00
Tony Hutter f2f6c18f17 zed: Misc multipath autoreplace fixes
We recently had a case where our operators replaced a bad
multipathed disk, only to see it fail to autoreplace.  The
zed logs showed that the multipath replacement disk did not pass
the 'is_dm' test in zfs_process_add() even though it should have.
is_dm is set if there exists a sysfs entry for to the
underlying /dev/sd* paths for the multipath disk.  It's
possible this path didn't exist due to a race condition where
the sysfs paths weren't created at the time the udev event came
in to zed, but this was never verified.

This patch updates the check to look for udev properties that
indicate if the new autoreplace disk is an empty multipath disk,
rather than looking for the underlying sysfs entries. It also
adds in additional logging, and fixes a bug where zed allowed
you to use an already zfs-formatted disk from another pool
as a multipath auto-replacement disk.

Furthermore, while testing this patch, I also ran across a case
where a force-faulted disk did not have a ZPOOL_CONFIG_PHYS_PATH
entry in its config.  This prevented it from being autoreplaced.
I added additional logic to derive the PHYS_PATH from the PATH if
the PATH was a /dev/disk/by-vdev/ path.  For example, if PATH
was /dev/disk/by-vdev/L28, then PHYS_PATH would be L28.  This is
safe since by-vdev paths represent physical locations and do not
change between boots.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #13023
2022-02-24 11:43:39 -08:00
Damian Szuberski e25bcf906a Fix directory detection in dkms.mkconf
Fix `zfs-dkms` installation on Debian-derived distributions by
aligning the directory detection logic to #13096.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: szubersk <szuberskidamian@gmail.com>
Closes #11449
Closes #13141
2022-02-24 10:33:48 -08:00
Damian Szuberski 78fad47cb3 Add Linux kmemleak support to ZTS
- Kmemleak `clear` is invoked right before every test case run.
- Kmemleak `scan` is requested right after each test case is finished.
- Kmemleak instrumentation is not used for
  setup/cleanup/pretest/posttest/failsafe stages to shorten the test
  case execution time.
- Kmemleak periodic scan is disabled (`scan=0`) before the test suite
  run to avoid interfering with the on-demand scan results.
- There are unavoidable potential false positives coming from kernel
  areas other than OpenZFS module.
- The ZTS with kmemleak enabled duration is increased by ~50%.

Example run
```
Running Time:   07:12:13
Percent passed: 98.3%

unreferenced object 0xffff9da82aea5410 (size 80):
  comm "kworker/u32:10", pid 942206, jiffies 4296749716 (age 2615.516s)
  hex dump (first 32 bytes):
    00 30 30 00 00 00 00 00 ff 8f 30 00 00 00 00 00  .00.......0.....
    51 e6 77 05 a8 9d ff ff 00 00 00 00 00 00 00 00  Q.w.............
  backtrace:
    [<000000005cf1fea2>] alloc_extent_state+0x1d/0xb0 [btrfs]
    [<0000000083f78ae5>] set_extent_bit+0x2ff/0x670 [btrfs]
    [<00000000de29249e>] lock_extent_bits+0x6b/0xa0 [btrfs]
    [<00000000b241f424>] lock_and_cleanup_extent_if_need+0xaf/0x1c0
       [btrfs]
    [<0000000093ca72b5>] btrfs_buffered_write+0x297/0x7d0 [btrfs]
    [<000000002c2938c8>] btrfs_file_write_iter+0x127/0x390 [btrfs]
    [<00000000b888f720>] do_iter_readv_writev+0x152/0x1b0
    [<00000000320f0bcc>] do_iter_write+0x7c/0x1c0
    [<000000000b5a8fe0>] lo_write_bvec+0x62/0x150 [loop]
    [<000000009aa03c73>] loop_process_work+0x250/0xbd0 [loop]
    [<00000000c7487d8a>] process_one_work+0x1f1/0x390
    [<000000000b236831>] worker_thread+0x53/0x3e0
    [<0000000023cb3e57>] kthread+0x127/0x150
    [<000000002d48676a>] ret_from_fork+0x22/0x30
```

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: szubersk <szuberskidamian@gmail.com>
Closes #13084
2022-02-24 10:21:13 -08:00
Attila Fülöp 4d14a285b3 Linux 5.11 compat: x86 SIMD: fix kernel_fpu_{begin,end}() detection
Linux 5.11 changed kernel_fpu_begin() to an inlined function and
moved the functionality to kernel_fpu_begin_mask(). This breaks the
existing detection mechanism since it checks if kernel_fpu_begin is
an exported kernel symbol, which isn't the case for an inlined
function.

To avoid assumptions about internal implementation, replace
ZFS_LINUX_TEST_RESULT_SYMBOL in favor of  ZFS_LINUX_TEST_RESULT
which already makes sure kernel_fpu_{begin,end}() is usable by us.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Attila Fülöp <attila@fueloep.org>
Closes #13147
2022-02-24 09:23:41 -08:00
Jitendra Patidar 361a7e8211 log xattr=sa create/remove/update to ZIL
As such, there are no specific synchronous semantics defined for
the xattrs. But for xattr=on, it does log to ZIL and zil_commit() is
done, if sync=always is set on dataset. This provides sync semantics
for xattr=on with sync=always set on dataset.

For the xattr=sa implementation, it doesn't log to ZIL, so, even with
sync=always, xattrs are not guaranteed to be synced before xattr call
returns to caller. So, xattr can be lost if system crash happens, before
txg carrying xattr transaction is synced.

This change adds xattr=sa logging to ZIL on xattr create/remove/update
and xattrs are synced to ZIL (zil_commit() done) for sync=always.
This makes xattr=sa behavior similar to xattr=on.

Implementation notes:
The actual logging is fairly straight-forward and does not warrant
additional explanation.
However, it has been 14 years since we last added new TX types
to the ZIL [1], hence this is the first time we do it after the
introduction of zpool features. Therefore, here is an overview of the
feature activation and deactivation workflow:

1. The feature must be enabled. Otherwise, we don't log the new
    record type. This ensures compatibility with older software.
2. The feature is activated per-dataset, since the ZIL is per-dataset.
3. If the feature is enabled and dataset is not for zvol, any append to
    the ZIL chain will activate the feature for the dataset. Likewise
    for starting a new ZIL chain.
4. A dataset that doesn't have a ZIL chain has the feature deactivated.

We ensure (3) by activating on the first zil_commit() after the feature
was enabled. Since activating the features requires waiting for txg
sync, the first zil_commit() after enabling the feature will be slower
than usual. The downside is that this is really a conservative
approximation: even if we never append a 'TX_SETSAXATTR' to the ZIL
chain, we pay the penalty for feature activation. The upside is that the
user is in control of when we pay the penalty, i.e., upon enabling the
feature.

We ensure (4) by hooking into zil_sync(), where ZIL destroy actually
happens.

One more piece on feature activation, since it's spread across
multiple functions:

zil_commit()
  zil_process_commit_list()
    if lwb == NULL // first zil_commit since zil_open
      zil_create()
        if no log block pointer in ZIL header:
          if feature enabled and not active:
	    // CASE 1
            enable, COALESCE txg wait with dmu_tx that allocated the
	    log block
         else // log block was allocated earlier than this zil_open
          if feature enabled and not active:
	    // CASE 2
            enable, EXPLICIT txg wait
    else // already have an in-DRAM LWB
      if feature enabled and not active:
        // this happens when we enable the feature after zil_create
	// CASE 3
        enable, EXPLICIT txg wait

[1] https://github.com/illumos/illumos-gate/commit/da6c28aaf62fa55f0fdb8004aa40f88f23bf53f0

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Christian Schwarz <christian.schwarz@nutanix.com>
Reviewed-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Reviewed-by: Ryan Moeller <freqlabs@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Jitendra Patidar <jitendra.patidar@nutanix.com>
Closes #8768 
Closes #9078
2022-02-22 13:06:43 -08:00
Krzysztof Piecuch ccdcc1dbe8 systemd: read initconfdir
Systemd units do not read @initconfdir@ but refer to variables defined
there, also a minor fixup in zfs-scrub service file.

Reviewed-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Damian Szuberski <szuberskidamian@gmail.com>
Signed-off-by: Krzysztof Piecuch <piecuch@kpiecuch.pl>
Closes #12946
2022-02-22 12:59:11 -08:00
наб 7454ca413c zpoo-features.7: raidz -> RAID-Z near dRAID
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13116
2022-02-22 10:56:21 -08:00
наб 1f4376eb01 zpool-features.7: never-return-enabled consistency
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13116
2022-02-22 10:56:14 -08:00
наб 58a4848132 zpool-features.7: zfs sendstreams
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13116
2022-02-22 10:56:06 -08:00
наб 6f763af270 zpool-features.7: spurious line break in enabled_txg
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13116
2022-02-22 10:55:58 -08:00
наб 2d232ca806 man: full stop at EOL
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13116
2022-02-22 10:55:51 -08:00
наб 6985944e2f man: -based when -based
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13116
2022-02-22 10:55:44 -08:00
наб a737b415d6 man: IO -> I/O; I/Os -> I/O operations again
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13116
2022-02-22 10:55:37 -08:00
наб aafb89016b man: final RAIDZ -> RAID-Z
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13116
2022-02-22 10:55:29 -08:00
наб 6bf6f5196b man: final VDEV -> vdev
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13116
2022-02-22 10:55:18 -08:00
Brian Behlendorf a5b3fab341 ZTS: Retry in import_rewind_config_changed.ksh
As explained by the disclaimer in the test case,

    "This test can fail since nothing guarantees that old
    MOS blocks aren't overwritten."

This behavior is expected and correct, but results in a
flaky test case which is problematic for the CI.  The best
we can do to resolve this is to retry the sub-test which
failed when the MOS blocks have clearly been overwritten.

When testing failures were rare enough that a single retry
should normally be sufficient.  However, we allow up to
five for good measure.

Reviewed by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13119
2022-02-20 19:21:31 -08:00
Damian Szuberski 806739f991 Correct compilation errors reported by GCC 10/11
New `zfs_type_t` value `ZFS_TYPE_INVALID` is introduced.
Variable initialization is now possible to make GCC happy.

Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: szubersk <szuberskidamian@gmail.com>
Closes #12167
Closes #13103
2022-02-20 19:20:00 -08:00
Ryan Moeller e41013078a libzfs: Fail making a dataset handle gracefully
When a dataset is in the process of being received it gets marked as
inconsistent and should not be used.  We should check for this when
opening a dataset handle in libzfs and return with an appropriate error
set, rather than hitting an abort because of the incomplete data.

zfs_open() passes errno to zfs_standard_error() after observing
make_dataset_handle() fail, which ends up aborting if errno is 0.
Set errno before returning where we know it has not been set already.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #13077
2022-02-18 13:09:03 -08:00
Damian Szuberski a014378dd0 spl: make 'spl_panic_halt' working for all cases
The default behavior where the serious ZFS errors cause FS thread to
stuck is very bad for some production scenario.

In some production scenarios (Linux), it is recommended to make real
kernel PANIC, where system can be rebooted by watchdog or kernel itself.
This patch enables coherent handling of spl_panic_halt parameter.

Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Authored-by: Wojciech Nizinski <w.nizinski@grinn-global.com>
Signed-off-by: szubersk <szuberskidamian@gmail.com>
Closes #12120
Closes #13109
2022-02-18 11:43:11 -08:00
наб 15b982492a cstyle: forbid ARGSUSED
Reviewed-by: Alejandro Colomar <alx.manpages@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13110
2022-02-18 09:35:06 -08:00
наб a2995e7641 config: add -Wextra (sans sign-compare and missing-field-initializers)
Reviewed-by: Alejandro Colomar <alx.manpages@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13110
2022-02-18 09:35:02 -08:00
наб 642827ecda module: zfs: zcp_get: fix uninitialised warning
Reviewed-by: Alejandro Colomar <alx.manpages@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13110
2022-02-18 09:34:56 -08:00
наб b7c42ce5b2 raidz_test: silence unsigned >=0 warnings
Reviewed-by: Alejandro Colomar <alx.manpages@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13110
2022-02-18 09:34:52 -08:00
наб a215c3e834 libzpool: kernel: silence unrelated-function-pointer cast warning
Reviewed-by: Alejandro Colomar <alx.manpages@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13110
2022-02-18 09:34:47 -08:00
наб d8552689d1 libnvpair: json: suppress wchar_t >=0 warnings
Reviewed-by: Alejandro Colomar <alx.manpages@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13110
2022-02-18 09:34:43 -08:00
наб 4e1f1035d0 config: prune unused -Wno-bool-compare checks
Reviewed-by: Alejandro Colomar <alx.manpages@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13110
2022-02-18 09:34:37 -08:00
наб 72154bd6c9 libtpool: -Wno-clobbered
Also remove -Wno-unused-but-set-variable

Upstream-bug: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61118
Reviewed-by: Alejandro Colomar <alx.manpages@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13110
2022-02-18 09:34:25 -08:00
наб 0ea6510aa0 module: icp: remove useless assert
Which produces a warning since uints are, by definition, >=0

Reviewed-by: Alejandro Colomar <alx.manpages@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13110
2022-02-18 09:34:18 -08:00
наб 5cf3c24fd8 libzfs: sendrecv: fix NULL arithmetic UB
Reviewed-by: Alejandro Colomar <alx.manpages@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13110
2022-02-18 09:34:14 -08:00
наб 46c7a80280 userspace: mark arguments used
Reviewed-by: Alejandro Colomar <alx.manpages@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13110
2022-02-18 09:34:08 -08:00
наб ef70eff198 module: mark arguments used
Reviewed-by: Alejandro Colomar <alx.manpages@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13110
2022-02-18 09:34:03 -08:00
наб 51c747de43 linux: module/zfs: vnops: make null_xattr static
Reviewed-by: Alejandro Colomar <alx.manpages@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13110
2022-02-18 09:33:20 -08:00
Brian Behlendorf 7901b62685 ZTS: Fix vdev_zaps_004_pos.ksh
When attaching a vdev to a mirror wait for the resilver to complete
before invoking `zdb` to inspect the pool.  This ensures the pool is
essentially idle which allows `zdb` to open the imported pool reliably.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13112 
Closes #6935
2022-02-17 12:09:06 -08:00
Savyasachee Jha 5eae5a88b2 Multiple dracut module install script cleanups
- Replaced intances of `dracut_install` with `inst_simple`
- Removed calls to `test -x mark_hostonly` because the function is an
inbuilt dracut function
- Removed redundant installation of `systemd-ask-password` and
`systemd-tty-ask-password-agent` because they are already installed by
the systemd module. There is no need to install them again
- Removed multiple calls to the `mark_hostonly` function because the
`inst_simple` has a command-line switch for it
- Cleaned up the installation of the `zpool.cache`, `vdev_id.conf` and
`hostid` files to make the logic easier to follow
- Cleaned up and simplified the systemd service installation logic by
invoking systemctl instead of creating symlinks manually
- Replaced various hard-coded paths with dracut equivalents to better
conform with expected dracut behaviour
- Removed redundant call to `mkdir` (`inst_simple` creates the parent
directory if it does not exist on the destination initrd)

Reviewed-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Reviewed-by: Andrew J. Hesford <ajh@sideband.org>
Signed-off-by: Savyasachee Jha <hi@savyasacheejha.com>
Closes #13010
2022-02-16 15:28:18 -08:00
Savyasachee Jha ed66eafd05 Remove absolute paths to udev rules and binaries for dracut
Since dracut functions can locate both udev rules and binaries, there is
no point in keeping absolute paths in the module setup script. It also
breaks the --sysroot option in dracut. This commit removes mentions to
absolute paths for binaries and udev rules.

Reviewed-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Reviewed-by: Andrew J. Hesford <ajh@sideband.org>
Signed-off-by: Savyasachee Jha <hi@savyasacheejha.com>
Closes #13010
2022-02-16 15:26:08 -08:00
Savyasachee Jha 7f5a01bdd3 Make dracut fail if essential files cannot be installed
Dracut will now fail in initramfs generation if essential files cannot
be installed.

Reviewed-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Reviewed-by: Andrew J. Hesford <ajh@sideband.org>
Signed-off-by: Savyasachee Jha <hi@savyasacheejha.com>
Closes #13010
2022-02-16 15:25:59 -08:00
Savyasachee Jha 44b0380a35 Make better use of dracut functions when building initramfs
Setting up the module involves multiple redundant calls to a bunch of
dracut functions wheich can be combined into one. Additionally, the mass
of code required to load libgcc_s.so* can be replaced with one dracut
function. This has the additional effect of removing errors involving
the non-installation of libgcc_s.so* which are seen on debian bullseye
when using version 2.1.2-1~bpo11+1 from the backports repository.

The systemd binaries are separated out into their own `dracut_install`
function call so they do not get pulled in when dracut does not load the
systemd module.

Reviewed-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Reviewed-by: Andrew J. Hesford <ajh@sideband.org>
Signed-off-by: Savyasachee Jha <hi@savyasacheejha.com>
Closes #13010
2022-02-16 15:23:30 -08:00
Damian Szuberski 72f3c8b130 Fix Linux kernel directories detection
Most modern Linux distributions have separate locations for bare
source and prebuilt ("build") files. Additionally, there are `source`
and `build` symlinks in `/lib/modules/$(KERNEL_VERSION)` pointing to
them. The order of directory search is now:
- `configure` command line values if both `--with-linux` and
  `--with-linux-obj` were defined
- If only `--with-linux` was defined, `--with-linux-obj` is assumed
  to have the same value as `--with-linux`
- If neither `--with-linux` nor `--with-linux-obj` were defined
  autodetection is used:
  - `/lib/modules/$(uname -r)/{source,build}` respectively, if exist
  - The first directory in `/lib/modules` with the highest version
    number according to `sort -V` which contains `source` and `build`
    symlinks/directories
  - The first directory matching `/usr/src/kernels/*` and
    `/usr/src/linux-*` with the highest version number according to
    `sort -V`. Here the source and prebuilt directories are assumed
    to be the same.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: szubersk <szuberskidamian@gmail.com>
Closes #9935
Closes #13096
2022-02-16 15:17:16 -08:00
George Amanakis 52a36bd41a Enable encrypted raw sending to pools with greater ashift
Raw sending from pool1/encrypted with ashift=9 to pool2/encrypted with
ashift=12 results to failure when mounting pool2/encrypted (Input/Output
error). Notably, the opposite, raw sending from a greater ashift to a
lower one does not fail.

This happens because zio_compress_write() falsely checks only
ZIO_FLAG_RAW_COMPRESS and not ZIO_FLAG_RAW_ENCRYPT which is also set in
encrypted raw send streams. In this case it rounds up the psize and if
not equal to the zio->io_size it modifies the block by zeroing out
the extra bytes. Because this happens in a SA attr. registration object
(type=46), the decryption fails upon mounting the filesystem, and zpool
status falsely reports an error.

Fix this by checking both ZIO_FLAG_RAW_COMPRESS and ZIO_FLAG_RAW_ENCRYPT
before deciding whether to zero-pad a block.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #13067 
Closes #13074
2022-02-16 11:52:02 -08:00
Damian Szuberski ba6005175e checkabi/storeabi relevant only to x86_64
The stored ABI files are for the x86_64 architecture.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: szubersk <szuberskidamian@gmail.com>
Closes #11345
Closes #13104
2022-02-16 11:48:01 -08:00
Rich Ercolani 0df22afea2 Colorize the Github test output
Let's color our workflow test output for better readability.

Reviewed by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #13000
2022-02-16 11:40:25 -08:00
наб 67de71e644 libzfs: calculate receive times with sub-second precision
Provide two digits of precision when reporting send/receive
times.  Tiny snapshots may take significantly less than a second
and rounding up to a full second can introduce a significant error.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13047
2022-02-15 16:42:30 -08:00
Ryan Moeller 5c0061345b Cross-platform xattr user namespace compatibility
ZFS on Linux originally implemented xattr namespaces in a way that is
incompatible with other operating systems.  On illumos, xattrs do not
have namespaces.  Every xattr name is visible.  FreeBSD has two
universally defined namespaces: EXTATTR_NAMESPACE_USER and
EXTATTR_NAMESPACE_SYSTEM.  The system namespace is used for protected
FreeBSD-specific attributes such as MAC labels and pnfs state.  These
attributes have the namespace string "freebsd:system:" prefixed to the
name in the encoding scheme used by ZFS.  The user namespace is used
for general purpose user attributes and obeys normal access control
mechanisms.  These attributes have no namespace string prefixed, so
xattrs written on illumos are accessible in the user namespace on
FreeBSD, and xattrs written to the user namespace on FreeBSD are
accessible by the same name on illumos.

Linux has several xattr namespaces.  On Linux, ZFS encodes the
namespace in the xattr name for every namespace, including the user
namespace.  As a consequence, an xattr in the user namespace with the
name "foo" is stored by ZFS with the name "user.foo" and therefore
appears on FreeBSD and illumos to have the name "user.foo" rather than
"foo".  Conversely, none of the xattrs written on FreeBSD or illumos
are accessible on Linux unless the name happens to be prefixed with one
of the Linux xattr namespaces, in which case the namespace is stripped
from the name.  This makes xattrs entirely incompatible between Linux
and other platforms.

We want to make the encoding of user namespace xattrs compatible across
platforms.  A critical requirement of this compatibility is for xattrs
from existing pools from FreeBSD and illumos to be accessible by the
same names in the user namespace on Linux.  It is also necessary that
existing pools with xattrs written by Linux retain access to those
xattrs by the same names on Linux.  Making user namespace xattrs from
Linux accessible by the correct names on other platforms is important.
The handling of other namespaces is not required to be consistent.

Add a fallback mechanism for listing and getting xattrs to treat xattrs
as being in the user namespace if they do not match a known prefix.

Do not allow setting or getting xattrs with a name that is prefixed
with one of the namespace names used by ZFS on supported platforms.

Allow choosing between legacy illumos and FreeBSD compatibility and
legacy Linux compatibility with a new tunable.  This facilitates
replication and migration of pools between hosts with different
compatibility needs.

The tunable controls whether or not to prefix the namespace to the
name.  If the xattr is already present with the alternate prefix,
remove it so only the new version persists.  By default the platform's
existing convention is used.

Reviewed-by: Christian Schwarz <christian.schwarz@nutanix.com>
Reviewed-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11919
2022-02-15 16:35:30 -08:00
наб 666749806d module: icp: remove provider stats
These were all folded into a single kstat at
  /proc/spl/kstat/kcf/NONAME_provider_stats
with no way to know which one it actually was,
and only the AES and SHA (so not Skein) ones were ever updated

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12901
2022-02-15 16:26:08 -08:00
наб bf86638687 module: icp: enforce KCF_{OPS_CLASSSIZE,MAXMECHTAB}
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12901
2022-02-15 16:26:04 -08:00
наб e013057492 module: icp: remove unused pd_{remove_cv,hash_limit}
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12901
2022-02-15 16:26:00 -08:00
наб de0ec5e7df module: icp: remove vestigia of crypto sessions
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12901
2022-02-15 16:25:56 -08:00
наб cf497e18df module: icp: remove unused (and mostly faked) cm_{{min,max}_key_length,mech_flags}
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12901
2022-02-15 16:25:52 -08:00
наб 11320b4cdf module: icp: remove unused crypto_provider_handle_t
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12901
2022-02-15 16:25:46 -08:00
наб f5e7d918a7 module: icp: remove pre-set entries from mechtabs
They don't do anything except clogging up the AVL tree

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12901
2022-02-15 16:25:41 -08:00
наб df7b54f1d9 module: icp: rip out insane crypto_req_handle_t mechanism, inline KM_SLEEP
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12901
2022-02-15 16:25:37 -08:00
наб 15ec086396 include: crypto: clean out api.h
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12901
2022-02-15 16:25:32 -08:00
наб 42dbc2025a module: icp: remove unused headers. Migrate {ops => sched}_impl
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12901
2022-02-15 16:25:28 -08:00
наб 1949be46c3 include: crypto: clean out unused SYSCALL32 and flags
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12901
2022-02-15 16:25:24 -08:00
наб f43748f6e1 module: icp: remove algorithm name defines used only in the default mechtab
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12901
2022-02-15 16:25:21 -08:00
наб d223af9bbc include: crypto: remove unused algorithm name defines
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12901
2022-02-15 16:25:17 -08:00
наб 64e82cea13 module: icp: remove set-but-unused cd_miscdata
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12901
2022-02-15 16:25:13 -08:00
наб 739afd9475 module: icp: fold away all key formats except CRYPTO_KEY_RAW
It's the only one actually used

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12901
2022-02-15 16:25:07 -08:00
наб 1018e81e30 module: icp: remove unused CRYPTO_* error codes
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12901
2022-02-15 16:25:03 -08:00
наб 7eacb87112 module: icp: rip out modhash. Replace the one user with AVL
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12901
2022-02-15 16:24:59 -08:00
наб 1cb6fa2cb8 module: icp: remove unused me_mutex
It only needs to be locked if dynamic changes can occur. They can't.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12901
2022-02-15 16:24:54 -08:00
наб cb6e9c3f5f module: icp: remove unused me_threshold
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12901
2022-02-15 16:24:50 -08:00
наб 7f90cf3043 module: icp: remove unused struct crypto_ctx::cc_{session,flags,opstate}
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12901
2022-02-15 16:24:46 -08:00
наб a288428d83 module: icp: remove unused gswq, kcfpool, [as]req_cache, reqid_table, obsolete kstat
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12901
2022-02-15 16:24:42 -08:00
наб 1c17d2940c module: icp: remove unused notification framework
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12901
2022-02-15 16:24:37 -08:00
наб 3fd5ead75e module: icp: remove unused kcf_op_{group,type}, req_params, ...
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12901
2022-02-15 16:24:33 -08:00
наб f3c3a6d47e module: icp: remove unused p[di]_flags
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12901
2022-02-15 16:24:29 -08:00
наб d77702035a module: icp: remove unused CRYPTO_{NOTIFY_OPDONE,SKIP_REQID,RESTRICTED}
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12901
2022-02-15 16:24:24 -08:00
наб eb1e09b7ec module: icp: remove unused CRYPTO_ALWAYS_QUEUE
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12901
2022-02-15 16:24:19 -08:00
наб 65a613b70d module: icp: remove unused kcf_digest.c
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12901
2022-02-15 16:24:14 -08:00
наб 255bc38e6f module: icp: drop software provider generation numbers
We register all providers at once, before anything happens

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12901
2022-02-15 16:24:09 -08:00
наб bf3fffe70d module: icp: remove unused kcf_mac operations
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12901
2022-02-15 16:24:04 -08:00
наб 2c2f955aae module: icp: remove unused kcf_cipher operations
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12901
2022-02-15 16:23:59 -08:00
наб 710657f51d module: icp: remove other provider types
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12901
2022-02-15 16:23:53 -08:00
наб 167ced3fb1 module: icp: use original mechanisms
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12901
2022-02-15 16:23:49 -08:00
наб bcee18d4e0 module: icp: use original description
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12901
2022-02-15 16:23:45 -08:00
наб b0502ab097 module: icp: guarantee the ops vector is persistent
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12901
2022-02-15 16:23:40 -08:00
наб d59a7fae40 module: icp: have a static 8 providers
This is currently twice the amount we actually have (sha[12], skein,
aes), and 512 * sizeof(void*) = 4096: 128x more than we need and a waste
of most of a page in the kernel address space

Plus, there's no need to actually allocate it dynamically: it's always
got a static size. Put it in .data

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12901
2022-02-15 16:23:33 -08:00
наб 464700ae02 module: icp: spi: crypto_ops_t: remove unused op types
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12901
2022-02-15 16:23:28 -08:00
наб f5896e2bdf module: icp: spi: flatten struct crypto_ops, crypto_provider_info
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12901
2022-02-15 16:23:24 -08:00
наб 959b9d6392 module: icp: spi: remove crypto_control_ops_t
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12901
2022-02-15 16:23:19 -08:00
наб 9cdf015d0a module: icp: spi: remove crypto_{provider,op}_notification()
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12901
2022-02-15 16:22:57 -08:00
Jorgen Lundman 4759342a5e Add spa _os() hooks
Add hooks for when spa is created, exported, activated and
deactivated. Used by macOS to attach iokit, and lock
kext as busy (to stop unloads).

Userland, Linux, and, FreeBSD have empty stubs.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Jorgen Lundman <lundman@lundman.net>
Closes #12801
2022-02-15 15:54:25 -08:00
George Amanakis 2fb52853dc Avoid dirtying the final TXGs when exporting a pool
There are two codepaths than can dirty final TXGs:

1) If calling spa_export_common()->spa_unload()->
   spa_unload_log_sm_flush_all() after the spa_final_txg is set, then
   spa_sync()->spa_flush_metaslabs() may end up dirtying the final
   TXGs. Then we have the following panic:
   Call Trace:
    <TASK>
    dump_stack_lvl+0x46/0x62
    spl_panic+0xea/0x102 [spl]
    dbuf_dirty+0xcd6/0x11b0 [zfs]
    zap_lockdir_impl+0x321/0x590 [zfs]
    zap_lockdir+0xed/0x150 [zfs]
    zap_update+0x69/0x250 [zfs]
    feature_sync+0x5f/0x190 [zfs]
    space_map_alloc+0x83/0xc0 [zfs]
    spa_generate_syncing_log_sm+0x10b/0x2f0 [zfs]
    spa_flush_metaslabs+0xb2/0x350 [zfs]
    spa_sync_iterate_to_convergence+0x15a/0x320 [zfs]
    spa_sync+0x2e0/0x840 [zfs]
    txg_sync_thread+0x2b1/0x3f0 [zfs]
    thread_generic_wrapper+0x62/0xa0 [spl]
    kthread+0x127/0x150
    ret_from_fork+0x22/0x30
    </TASK>

2) Calling vdev_*_stop_all() for a second time in spa_unload() after
   spa_export_common() unnecessarily delays the final TXGs beyond what
   spa_final_txg is set at.

Fix this by performing the check and call for
spa_unload_log_sm_flush_all() before the spa_final_txg is set in
spa_export_common(). Also check if the spa_final_txg has already been
set in spa_unload() and skip those calls in this case.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
External-issue: https://www.illumos.org/issues/9081
Closes #13048 
Closes #13098
2022-02-15 15:48:59 -08:00
Jorgen Lundman 9a70e97fe1 Rename fallthrough to zfs_fallthrough
Unfortunately macOS has obj-C keyword "fallthrough" in the OS headers.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Damian Szuberski <szuberskidamian@gmail.com>
Signed-off-by: Jorgen Lundman <lundman@lundman.net>
Closes #13097
2022-02-15 08:58:59 -08:00
наб ae07fc1393 libzfs: sendrecv: fix missing error output for invalid properties
Fixes: 7633c0aedd ("libzfs: sendrecv:
fix unused, remove argsused")

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Amanakis <gamanakis@gmail.com>
Reviewed-by: Ryan Moeller <freqlabs@FreeBSD.org>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13100
Closes #13101
2022-02-14 13:07:56 -08:00
наб 0c8aab9586 zfs-receive.8: properly unlight = in option setting
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Amanakis <gamanakis@gmail.com>
Reviewed-by: Ryan Moeller <freqlabs@FreeBSD.org>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13101
2022-02-14 13:07:50 -08:00
наб 4631bf7bfb zfs-receive.8: fix Op Fl x Ar encryption in running text
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Amanakis <gamanakis@gmail.com>
Reviewed-by: Ryan Moeller <freqlabs@FreeBSD.org>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13101
2022-02-14 13:07:45 -08:00
Rich Ercolani dec1eef4c5 Silence uninitialized warnings in dsl_dataset.c
On newer compilers, dsl_dataset.c now warns (or, on DEBUG, errors)
on uninitialized variable usage.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #13083
2022-02-14 10:04:50 -08:00
Brian Behlendorf 9f734e81f4 ZTS: Fix checkpoint_ro_rewind.ksh
Related to commit 90b77a036.  Retry the `zpool export` if the pool is
"busy" indicating there is a process accessing the mount point.  This
can happen after an import and allowing it to be retried will avoid
spurious test failures.

Reviewed by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13092
2022-02-13 14:22:49 -08:00
Brian Behlendorf b7baf49bd3 ZTS: Fix zpool_expand_001_pos
The dRAID section of the zpool_expand_001_pos test would reliably fail
because the calculated expansion size assumed the dRAID top-level vdev
was created with a distributed spare.  Create the vdev as expected to
resolve the test failure.

This test case flaw was accidentally caused by changing the default
number of dRAID distributed spares from one to zero while dRAID was
being developed.

Additionally, remove zpool_expand_005_pos from the list of possible
faulty tests.  It appears to be passing consistently in my testing.

Reviewed by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13091
2022-02-13 14:22:00 -08:00
Brian Behlendorf 10271af05c Fix gcc warning in kfpu_begin()
Observed when building on CentOS 8 Stream.  Remove the `out`
label at the end of the function and instead return.

  linux/simd_x86.h: In function 'kfpu_begin':
  linux/simd_x86.h:337:1: error: label at end of compound statement

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Attila Fülöp <attila@fueloep.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13089
2022-02-11 14:31:45 -08:00
Paul Zuchowski fe804dc412 ZTS: Fix problem with zdb_objset_id test
Use large numbers for datasets with numeric names to avoid name
and id collisions.  Sporadic test failures were observed when the
test would create $TESTPOOL/100 with an objset ID of 100.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Zuchowski <pzuchowski@datto.com>
Closes #13087
2022-02-11 13:32:08 -08:00
наб 4ea4a46c27 zpool-import.8: WARNING should be emphasised
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13082
2022-02-11 11:50:54 -08:00
наб a9e62ec368 zpool-import.8: newpool is Ar, not Sy
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13082
2022-02-11 11:50:14 -08:00
наб 17fd8f6b0a zpoolprops.7: document leaked
It's noted very scarcely in the code as it stands, indeed the only
actual comment on this is

  /*
   * We have finished background destroying, but there is still
   * some space left in the dp_free_dir. Transfer this leaked
   * space to the dp_leak_dir.
   */

Introduced in fbeddd60b7 ("Illumos 4390 -
I/O errors can corrupt space map when deleting fs/vol"),
which explains, alongside the references, that this can only happen
with a corrupted pool

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13081
2022-02-11 11:49:07 -08:00
Zhu Chuang 0a222408f1 Correct a typo in zfs-receive.8
Should be  `-o keyformat=passphrase` instead of `-o -keyformat=passphrase`

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chuang Zhu <chuang@melty.land>
Closes #13072
2022-02-11 11:47:35 -08:00
наб 4a4e168392 contrib: rename initrd READMEs to README.md
It's an unhelpful naming scheme and one that breaks GitHub autoreadme.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13017
2022-02-11 11:44:27 -08:00
наб b8d9679f36 contrib: dracut: zfs-{rollback,snapshot}-bootfs: don't shell for test
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13017
2022-02-11 11:43:07 -08:00
наб 34ec548980 dracut: README: rewrite
The documentation in the dracut README has grown stale and inaccurate.
Remove the stale content and write a short and useful reference manual.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13012
Closes #13017
2022-02-11 11:42:07 -08:00
Brian Behlendorf 399159f7fb ZTS: Fix zvol_misc_volmode test
Changing volmode may need to remove minors, which could be open, so
call udev_wait() before we "zfs set volmode=<value>".  This ensures
no udev process has the zvol open (i.e. blkid) and the kernel
zvol_remove_minor_impl() function won't skip removing the in use
device.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13075
2022-02-09 17:00:03 -08:00
drowfx 3819aaaff9 Add dataset_kstats_update.. to mmap read/write paths
This allows reads/writes caused by accesses to mmap files to be
accounted correctly in the per-dataset kstats for both Linux and
FreeBSD.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <freqlabs@FreeBSD.org>
Signed-off-by: Matthias Blankertz <matthias@blankertz.org>
Closes #12994 
Closes #13044
2022-02-09 14:41:42 -08:00
Attila Fülöp 68ddc06b61 Receive checks should allow unencrypted child datasets
dmu_recv_begin_check() unconditionally sets the DS_HOLD_FLAG_DECRYPT
flag before calling dsl_dataset_hold_flags(). If the key on the
receiving side isn't loaded or the send stream contains embedded
blocks, the receive check fails for a stream which is perfectly
valid and could be received without any problem. This seems like
a remnant of the initial design, where unencrypted datasets below
encrypted ones weren't allowed.

Add a condition to set `DS_HOLD_FLAG_DECRYPT` only for encrypted
datasets, modify an existing test to detect this regression and add
a test for raw replication streams.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Amanakis <gamanakis@gmail.com>
Co-authored-by: George Amanakis <gamanakis@gmail.com>
Signed-off-by: Attila Fülöp <attila@fueloep.org>
Closes #13033 
Closes #13076
2022-02-09 14:38:33 -08:00
Jorgen Lundman c28d6ab08b Rename EMPTY_TASKQ into taskq_empty
To follow a change in illumos taskq

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Jorgen Lundman <lundman@lundman.net>
Closes #12802
2022-02-09 15:04:26 -07:00
Damian Szuberski 4eea717c4f Propagate KERNEL_* to *.spec
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Authored-by: Damian Szuberski <szuberskidamian@gmail.com>
Signed-off-by: Peter Levine <plevine457@gmail.com>
Closes #13046
2022-02-09 13:27:11 -08:00
Peter Levine b66140c6ad Add support for $KERNEL_{CC,LD,LLVM} variables
Currently, $(CC), $(LD), and $(LLVM) variables aren't passed to kbuild
while building modules.  This causes modules to build with the default
GNU GCC toolchain and prevents experimenting with other toolchains such
as CLANG/LLVM.  It can also lead to build failure if the CFLAGS/LDFLAGS
passed are incompatible with gcc/ld.

Pass $KERNEL_CC, $KERNEL_LD, and $KERNEL_LLVM as $(CC), $(LD), and
$(LLVM), respectively, to kbuild for each that is defined in the
environment.  This should take care of the majority of alternative
toolchain use cases.

Reviewed-by: Damian Szuberski <szuberskidamian@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Peter Levine <plevine457@gmail.com>
Closes #13046
2022-02-09 13:24:22 -08:00
Attila Fülöp 8e94ac0e36 Linux 5.16 compat: don't use XSTATE_XSAVE to save FPU state
Linux 5.16 moved XSTATE_XSAVE and XSTATE_XRESTORE out of our reach,
so add our own XSAVE{,OPT,S} code and use it for Linux 5.16.

Please note that this differs from previous behavior in that it
won't handle exceptions created by XSAVE an XRSTOR. This is sensible
for three reasons.

 - Exceptions during XSAVE and XRSTOR can only occur if the feature
   is not supported or enabled or the memory operand isn't aligned
   on a 64 byte boundary. If this happens something else went
   terribly wrong, and it may be better to stop execution.

 - Previously we just printed a warning and didn't handle the fault,
   this is arguable for the above reason.

 - All other *SAVE instruction also don't handle exceptions, so this
   at least aligns behavior.

Finally add a test to catch such a regression in the future.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Attila Fülöp <attila@fueloep.org>
Closes #13042
Closes #13059
2022-02-09 12:50:10 -08:00
Damian Szuberski e2909fae8f Remove unused files from GitHub runners
Majority of the software installed by default in GitHub runners is
irrelevant to OpenZFS. Reclaimed space could be used for more/bigger
vdev files. File deletion happens in the background, leveraging
`systemd-run` - the workflow is not significantly slowed down.

Before
```
Filesystem      Size  Used Avail Use% Mounted on
/dev/root        84G   53G   31G  63% /
```

After
```
Filesystem      Size  Used Avail Use% Mounted on
/dev/root        84G   15G   70G  18% /
```

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Signed-off-by: szubersk <szuberskidamian@gmail.com>
Closes #13066
2022-02-09 11:50:13 -08:00
Damian Szuberski d7a67402a8 mount.zfs -o zfsutil leverages zfs_mount_at()
Using `zfs_mount_at()` gives opportunity to properly propagate
mountopts from what's stored in a pool to the `mount(2)` syscall
invocation. It fixes cases when mount options are set to incorrect
values and rectification is impossible (e. g. Linux initrd boot
sequence in #7947).
Moved debug information printing after all variables are
initialized - printed text reflects what is passed to `mount(2)`.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: szubersk <szuberskidamian@gmail.com>
Issue #7947 
Closes #13021
2022-02-08 10:53:23 -08:00
Tomohiro Kusumi 5f65d008e9 Remove unneeded "extern inline" function declarations
All of these externs are already #included as static inline
functions via corresponding headers.

Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@gmail.com>
Closes #13073
2022-02-08 10:48:57 -08:00
Damian Szuberski 8df0bde321 Add --enable=all to ShellCheck by default
Change enforced shell type from `dash` to `sh` and excluded
`SC2039` and `SC3043` by default. `local` keyword is accepted by all
POSIX shells from practical point of view. There is no need anymore
to enforce dash so `local` is accepted.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: szubersk <szuberskidamian@gmail.com>
Closes #13020
2022-02-07 11:59:09 -08:00
Damian Szuberski add15e9539 Extract workflows dependencies
- Move build dependencies moved to
  `.github/workflows/build-dependencies.txt` shared among workflows.

- Change `ubuntu-latest` -> `ubuntu-20.04` to avoid unexpected
  runner environment updates in `zloop` workflow.

- Change `ubuntu-20.04` -> `ubuntu-latest` to track changes in
  runner environment in `checkstyle` workflow.

- Kernel buffer is flushed before ZTS invocation to avoid storing
  the same data after each test case run.

- `make` is invoked with consistent set of options to reduce
  clutter in logs.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: szubersk <szuberskidamian@gmail.com>
Closes #13037
2022-02-07 11:44:17 -08:00
Christian Schwarz 1dccfd7a38 zvol: make calls to platform ops static
There's no need to make the platform ops dynamic dispatch.

This change replaces the dynamic dispatch with static calls to the
platform-specific functions.
To avoid name collisions, prefix all platform-specific functions
with `zvol_os_`.
I actually find `zvol_..._os` slightly nicer to read in the calling
code, but having it as a prefix is useful.

Advantage:
- easier jump-to-definition / grepping
- potential benefits to static analysis
- better legibility

Future work: also prefix remaining `static` functions in zvol_os.c.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Christian Schwarz <christian.schwarz@nutanix.com>
Closes #12965
2022-02-07 10:24:38 -08:00
Alexander Motin f2c5bc150e Add more control/visibility to spa_load_verify().
Use error thresholds from policy to control whether to scrub data
and/or metadata.  If threshold is set to UINT64_MAX, then caller
probably does not care about result and we may skip that part.

By default import neither set the data error threshold nor read
the error counter, so skip the data scrub for faster import.
Metadata are still scrubbed and fail if even single error found.

While there just for symmetry return number of metadata errors in
case threshold is not set to zero and we haven't reached it.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Closes #13022
2022-02-04 13:06:38 -08:00
Christian Schwarz 2f14adacaa zfs_set_prop_nvlist: make it easier to spot the call to dsl_props_set
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Christian Schwarz <christian.schwarz@nutanix.com>
Closes #12963
2022-02-04 11:52:10 -08:00
Christian Schwarz db87580076 dsl_dir_tempreserve_impl: remove unused deferred variable
The following commit moved the users of `deferred` into function
dsl_pool_unreserved_space:

    commit d2734cce68
    Author: Serapheim Dimitropoulos <serapheim.dimitro@delphix.com>
    Date:   Fri Dec 16 14:11:29 2016 -0800

        OpenZFS 9166 - zfs storage pool checkpoint

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <freqlabs@FreeBSD.org>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Christian Schwarz <christian.schwarz@nutanix.com>
Closes #13056
2022-02-04 10:33:34 -08:00
Brian Behlendorf 100b8950f4 ZTS: Update enospc_002_pos test case
The on-disk cost of creating a snapshot or bookmark is sufficiently low
that it is difficult to make it reliably fail even when the pool is
"full".  In order to avoid false positives remove these two checks from
the test case.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13060
2022-02-04 09:36:46 -08:00
Pawel Jakub Dawidek 3d244b4881 Fix clearing set-uid and set-gid bits on a file when replying a write
POSIX requires that set-uid and set-gid bits to be removed when an
unprivileged user writes to a file and ZFS does that during normal
operation.

The problem arrises when the write is stored in the ZIL and replayed.
During replay we have no access to original credentials of the process
doing the write, so zfs_write() will be performed with the root
credentials. When root is doing the write set-uid and set-gid bits
are not removed from the file.

To correct that, log a separate TX_SETATTR entry that removed those bits
on first write to such file.

Idea from:	Christian Schwarz

Add test for ZIL replay of setuid/setgid clearing.

Improve various edge cases when clearing setid bits:
- The setid bits can be readded during a single write, so make sure to check
  for them on every chunk write.
- Log TX_SETATTR record at most once per transaction group (if the setid bits
  are keep coming back).
- Move zfs_log_setattr() outside of zp->z_acl_lock.

Reviewed-by: Dan McDonald <danmcd@joyent.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Christian Schwarz <me@cschwarz.com>
Signed-off-by: Pawel Jakub Dawidek <pawel@dawidek.net>
Closes #13027
2022-02-03 14:37:57 -08:00
Damian Szuberski 63652e1546 Add --enable-asan and --enable-ubsan switches
`configure` now accepts `--enable-asan` and `--enable-ubsan` switches
which results in passing `-fsanitize=address`
and `-fsanitize=undefined`, respectively, to the compiler. Those
flags are enabled in GitHub workflows for ZTS and zloop. Errors
reported by both instrumentations are corrected, except for:

- Memory leak reporting is (temporarily) suppressed. The cost of
  fixing them is relatively high compared to the gains.

- Checksum computing functions in `module/zcommon/zfs_fletcher*`
  have UBSan errors suppressed. It is completely impractical
  to enforce 64-byte payload alignment there due to performance
  impact.

- There's no ASan heap poisoning in `module/zstd/lib/zstd.c`. A custom
  memory allocator is used there rendering that measure
  unfeasible.

- Memory leaks detection has to be suppressed for `cmd/zvol_id`.
  `zvol_id` is run by udev with the help of `ptrace(2)`. Tracing is
  incompatible with memory leaks detection.

Reviewed-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: szubersk <szuberskidamian@gmail.com>
Closes #12928
2022-02-03 14:35:38 -08:00
Phil Kauffman aa9905d89b zed-functions.sh: escape newline to produce valid json
This was discovered when using Discords Slack compatible webhook.

Slack webhooks works without the escape, however Discord rightly refuses
the POST as it contains invalid JSON.

https://discord.com/developers/docs/resources/webhook#execute-slackcompatible-webhook

Valid (while escaping the newline:
```
+ msg_json='{"text": "*ZFS scrub_finish error for test on quartz*\nZFS has detected a data error:\n\n   eid: 124\n class: scrub_finish\n  host: quartz\n  time: \n error: \n objid: :\n  pool: test\n"}'
```

Invalid (no escape):
```
+ msg_json='{"text": "*ZFS scrub_finish error for test on quartz*
ZFS has detected a data error:\n\n   eid: 124\n class: scrub_finish\n  host: quartz\n  time: \n error: \n objid: :\n  pool: test\n"}'
```
The new line gets rendered and not sent inside the JSON as intended.

```
++ curl -X POST https://discord.com/api/webhooks/{webhook.id}/{webhook.token}/slack --header 'Content-Type: application/json' --data-binary '{"text": "*ZFS scrub_finish error for test on quartz*
ZFS has detected a data error:\n\n   eid: 124\n class: scrub_finish\n  host: quartz\n  time: \n error: \n objid: :\n  pool: test\n"}'
+ msg_out='{"message": "Cannot send an empty message", "code": 50006}'
```

Test method:
`root@quartz:/etc/zfs/zed.d# export ZED_ZEDLET_DIR=/etc/zfs/zed.d; export ZEVENT_EID=124; export ZEVENT_SUBCLASS=scrub_finish; export ZEVENT_POOL=test; export ZED_NOTIFY_DATA=1; bash -x ./data-notify.sh`

Reviewed-by: Damian Szuberski <szuberskidamian@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Philip Kauffman <philip@kauffman.me>
Closes #13049
2022-02-03 14:31:57 -08:00
Akash B 7b468ed2d8 Add enumerated vdev names to 'zpool iostat -v' and 'zpool list -v'
This commit adds enumerated names to disambiguate between the
different vdevs. Previously only 'zpool status' showed enumerated
vdev names, now 'zpool list -v' and 'zpool iostat -v' also shows
the enumerated vdev names.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Dipak Ghosh <dipak.ghosh@hpe.com>
Signed-off-by: Akash B <akash-b@hpe.com>
Closes #12510
Closes #13031
2022-02-03 14:29:29 -08:00
George Amanakis f3b08dfd7f Report dnodes with faulty bonuslen
In files created/modified before 4254acb there may be a corruption of
xattrs which is not reported during scrub and normal send/receive. It
manifests only as an error when raw sending/receiving. This happens
because currently only the raw receive path checks for discrepancies
between the dnode bonus length and the spill pointer flag.

In case we encounter a dnode whose bonus length is greater than the
predicted one, we should report an error. Modify in this regard
dnode_sync() with an assertion at the end, dump_dnode() to error out,
dsl_scan_recurse() to report errors during a scrub, and zstream to
report a warning when dumping. Also added a test to verify spill blocks
are sent correctly in a raw send.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #12720 
Closes #13014
2022-02-03 14:28:19 -08:00
наб 142a5010ab Notice if the test-runner dies
Currently, we seem to only care if the results collector errors.
We also should care if the test-runner died.

Authored-by: Rich Ercolani <rincebrain@gmail.com>
Co-authored-by: Rich Ercolani <rincebrain@gmail.com>
Reviewed-by:  Damian Szuberski <szuberskidamian@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12998
2022-02-02 14:17:46 -08:00
Tomohiro Kusumi 955bf4dc04 Fix trivial calloc(3) arguments order
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <freqlabs@FreeBSD.org>
Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@gmail.com>
Closes #13052
2022-02-02 11:27:35 -08:00
Ryan Moeller 15aa38690e Simplify resume token generation
* Improve naming.
* Reduce indentation.
* Avoid boilerplate logic duplication.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #12967
2022-02-01 17:04:08 -08:00
Ryan Moeller d1a38ee742 libzfs_sendrecv: Factor out lzc_flags_from_resume_nvl
Improve the readability of zfs_send_resume_impl by moving resume nvl
decoding into a separate helper function.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #12967
2022-02-01 17:04:01 -08:00
Ryan Moeller 102eb6733c libzfs_sendrecv: Refactor find_redact_book
Factor out get_bookmarks, find_redact_pair, and get_redact_complete
helper functions to improve the readability of find_redact_book.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #12967
2022-02-01 17:03:53 -08:00
Ryan Moeller 2fafbfde9d libzfs_sendrecv: Fix some comment style nits
* Capitalize and punctuate complete sentences.
* Add a blank line between functions.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #12967
2022-02-01 17:03:45 -08:00
Ryan Moeller 45932229d5 libzfs_sendrecv: Style pass on dump_filesystems
* Add a high level comment.
* Eliminate unnecessarily void arg.
* Capitalize and punctuate complete sentences in comments.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #12967
2022-02-01 17:03:38 -08:00
Ryan Moeller 1910a30848 libzfs_sendrecv: Style pass on dump_filesystem
* Add high level comments.
* Eliminate unnecessarily void arg.
* Avoid unnecessary line wrapping.
* Initialize sdd fields with the correct types.
* Remove extra whitespace.
* Refactor replication checks for clarity.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #12967
2022-02-01 17:03:30 -08:00
Ryan Moeller 1ae7835177 libzfs_sendrecv: Style pass on dump_snapshot
* Add a high level comment.
* Avoid unnecessary line wrapping.
* Simplify size accounting logic.
* Eliminate unnecessary buffer on the stack.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #12967
2022-02-01 17:03:21 -08:00
Ryan Moeller 447e90e360 libzfs_sendrecv: Style pass on send_print_verbose
* Add missing dgettext calls.
* Avoid unnecessary line wraps.
* Factor out duplicated parsable check.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #12967
2022-02-01 17:03:13 -08:00
Ryan Moeller fbcc25c9b2 libzfs_sendrecv: Pull header line out of loop
This makes the header print before the sleep as well, which is fine.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #12967
2022-02-01 17:03:05 -08:00
Ryan Moeller 24f1aa023a libzfs_sendrecv: Initialize in case of failure
In zfs_send_progress, initialize \*bytes_written and \*blocks_visited
in case we have to return early due to ioctl failure.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #12967
2022-02-01 17:02:58 -08:00
Ryan Moeller 25074b472a libzfs_sendrecv: Style pass on dump_ioctl
* Don't bother building a debug nvlist if we can't return it.
* Save errno after ioctl failure in case snprintf clobbers it.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #12967
2022-02-01 17:02:50 -08:00
Ryan Moeller a8747c0403 libzfs_sendrecv: Style pass on zfs_send_space
* Reduce indentation.
* Move locals closer to use.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #12967
2022-02-01 17:02:41 -08:00
Ryan Moeller 0c1c746a74 libzfs_sendrecv: Style pass on send_iterate_fs
* Capitalize and punctuate complete sentences in comments.
* Separate out a group of locals to add a comment on their purpose.
* Remove unnecessary line wrapping.
* Make it clear that dds_origin is a string by using explicit character
  comparison to check for an empty string, rather than implictly
  treating it as a boolean.
* Reorganize manipulation of props and holds nvlists to improve
  clarity.
* There's no need to initialize the snapname buffer with zeros, we're
  immediately overwriting it.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #12967
2022-02-01 17:02:33 -08:00
Ryan Moeller 2b6b7111f4 libzfs_sendrecv: Style pass on send_iterate_prop
* Add a high level comment.
* Move locals closer to point of use.
* Use fnv* routines rather than explicit verification of success.
* Factor out duplicated code by introducing isspacelimit to clarify
  behavior.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #12967
2022-02-01 17:02:25 -08:00
Ryan Moeller dd59c422d3 libzfs_sendrecv: Style pass on send_iterate_snap
* Add a high level comment.
* Use local variables to reduce line wrapping.
* Remove extra braces and insert space for clarity.
* Assert precondition that the dataset name contains '@' for sanity.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #12967
2022-02-01 17:02:13 -08:00
Ryan Moeller f2b36b2db0 libzfs_sendrecv: Fix leaked holds nvlist
There is no need to allocate a holds nvlist.  lzc_get_holds does that
for us.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #12967
2022-02-01 17:02:05 -08:00
Ryan Moeller af2b1fbda6 libzfs_sendrecv: Avoid extra avl_find
avl_add does avl_find internally, then avl_insert.  We're already doing
the avl_find, so using avl_insert directly avoids repeating the search.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #12967
2022-02-01 17:01:56 -08:00
Ryan Moeller 1488e822f2 libzfs_sendrecv: Simplify out guid temporary
De-clutter the clode and make it clear the guid is only used here.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #12967
2022-02-01 17:01:48 -08:00
Ryan Moeller 01a0039ac9 libzfs_sendrecv: Use size_t for payload_len
We won't be passing a negative value here, so make it clear.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #12967
2022-02-01 17:01:21 -08:00
наб 3c617c7921 libzfs: mount: don't leak mnt_param_t if mnt_func fails
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12968
2022-02-01 16:57:16 -08:00
наб 61b5b27853 libtpool: remove useless CSTYLED
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12968
2022-02-01 16:57:09 -08:00
наб 1ef686905b linux: libspl: zone: () -> (void)
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12968
2022-02-01 16:57:00 -08:00
наб a401a2c54a zpool: misc. cleanup
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12968
2022-02-01 16:56:52 -08:00
наб 0481eabfa1 libzfs: share_proto: constify, staticify
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12968
2022-02-01 16:56:40 -08:00
наб bde0f7d4c6 zpool: main: print hostids with PRIx64
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12968
2022-02-01 16:56:31 -08:00
наб 0f7a0cc7c2 libzfs: const correctness
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12968
2022-02-01 16:56:18 -08:00
наб bf12053cb2 zpool: main: const correctness, use time_t for time
The cast will explode on 32-bit big-endian architectures

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12968
2022-02-01 16:55:11 -08:00
Mark Johnston fdcb79b52e spl: Don't check FreeBSD rwlocks for double initialization (#13019)
This checking breaks KMSAN since it effectively loads from uninitialized
memory to see if the lock is already initialized.  This happens in
dnode_cons() for example.  This checking is not very useful, partly due
to UMA's memory trashing, and is already disabled for mutexes.  Make
mutexes and rwlocks consistent: remove double-initialization checking
for rwlocks, and pass SX_NEW to disable the same checking in
lock_init().

No functional change intended, this affects only debug builds.

As a side note, kmem cache constructors/destructors are implemented
suboptimally on FreeBSD.  FreeBSD's slab allocator, UMA, supports two
pairs of constructors/destructors: ctor/dtor and init/fini.  The former
are called upon every allocation and free of an item, while the latter
are called when an item is imported or released from a zone,
respectively.  That is, when a slab is allocated to a particular cache,
it is subdivided into items, and init is called on each.  fini is called
when the slab is being prepared to be freed back to the system.  The
intent is for them to initialize static fields such as locks, which
do not need to be initialized upon each allocation of an item.

In illumos, kmem_cache constructors/destructors correspond to UMA's
init/fini callbacks.  However, in the SPL they are implemented as UMA
ctor/dtors, meaning that they get called far more often than necessary.
This may be difficult to fix, since new code may assume the kmem cache
ctor/dtors are in fact called upon each allocation/free, and there
doesn't seem to be a clear way to implement the intended semantics on
Linux.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes #13019
2022-01-31 10:58:45 -08:00
наб 73e972af7a ZTS: explicitly strip whitespace for broken wc(1) implementations
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13032
2022-01-28 16:59:52 -08:00
наб 7633c0aedd libzfs: sendrecv: fix unused, remove argsused
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12906
2022-01-28 08:34:48 -07:00
наб c70bb2f610 Replace *CTASSERT() with _Static_assert()
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12993
2022-01-26 11:38:52 -08:00
наб 7ada752a93 Clean up CSTYLEDs
69 CSTYLED BEGINs remain, appx. 30 of which can be removed if cstyle(1)
had a useful policy regarding
  CALL(ARG1,
  	ARG2,
  	ARG3);
above 2 lines. As it stands, it spits out *both*
  sysctl_os.c: 385: continuation line should be indented by 4 spaces
  sysctl_os.c: 385: indent by spaces instead of tabs
which is very cool

Another >10 could be fixed by removing "ulong" &al. handling.
I don't foresee anyone actually using it intentionally
(does it even exist in modern headers? why did it in the first place?).

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12993
2022-01-26 11:38:52 -08:00
наб a3fecf4f10 icp: asm_linkage.h: clean out unused bits, CSTYLED
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12993
2022-01-26 11:38:28 -08:00
наб be93512152 libspl: include: ia32: remove
It hosts only asm_linkage.h, which is entirely unused,
and has slightly diverged from the one that's actually used
(module/icp/include/sys/ia32/asm_linkage.h)

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12993
2022-01-26 11:37:45 -08:00
наб 7467281594 tests: simplify check_bg_procs_limit_num(), inline into setup
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12979
2022-01-26 11:30:20 -08:00
наб d9fdba124d tests: prune remaining xargs(1), add missing zfs-project -c0 note
-c0 suppresses diagnoses ‒ it's not just -c but with NULs; cf.
  http://build.zfsonlinux.org/builders/Debian%2010%20x86_64%20%28TEST%29/builds/10605/steps/shell_4/logs/log
search for the second "zfs project -s -p" instance

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12979
2022-01-26 11:30:09 -08:00
наб e0c5a48b3f tests: simplify find_vfstab_dev()
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12979
2022-01-26 11:30:03 -08:00
наб a67a187335 Makefile: simplify away xargs(1)
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12979
2022-01-26 11:29:58 -08:00
наб c5d8cd63ae module: Makefile: simplify clean and install jobs
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12979
2022-01-26 11:29:23 -08:00
Ryan Moeller 3158c2e3cb FreeBSD: Fix zvol_cdev_open locking
First open locking changes were correctly applied to zvol_geom_open but
incorrectly applied to zvol_cdev_open, causing spa_namespace_lock to be
held indefinitely.

Make the first open locking in zvol_cdev_open match zvol_geom_open.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #13016
2022-01-26 11:23:39 -08:00
ColMelvin 8d6a598a20 RPM: Add missing BuildRequires for PAM component
When the optional PAM binaries are included in a build, ./configure will
look for security/pam_modules.h and - if it doesn't find it - recommend
the user install `libpam0g-dev`.  On Red Hat systems, `pam-devel` is the
package that supplies this requirement; `libpam0g-dev` does not exist.

By encoding this requirement in the spec file, we give packagers more
appropriate (and timely) recommendations for completing the build.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chris Lindee <chris.lindee+github@gmail.com>
Closes #13001
2022-01-25 13:14:43 -08:00
Finix1979 a3fbe2b942 Linux <4.8 compat: submit_bio() rw arg
When using the two argument version of submit_bio() in kernel's prior
to 4.8 the first argument should be specified.  It's used by block
dump to report the bio direction.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Finix Yan <yancw@info2soft.com>
Closes #13006
2022-01-25 13:12:49 -08:00
наб 5dccd7e8f3 Linux 5.17 compat: PDE_DATA() renamed to pde_data()
Upstream commit 359745d78351c6f5442435f81549f0207ece28aa
("proc: remove PDE_DATA() completely")

Link: https://lore.kernel.org/all/20211124081956.87711-2-songmuchun@bytedance.com/T/#u

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13004
Closes #12989
2022-01-25 12:53:00 -08:00
наб a46237106c Linux 5.17 compat: dequeue_signal() takes a 4th argument
Linux 5.17's dequeue_signal() takes an additional enum pid_type *
output argument

Upstream commit 5768d8906bc23d512b1a736c1e198aa833a6daa4
("signal: Requeue signals in the appropriate queue")

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12989
2022-01-25 12:52:51 -08:00
наб a9856574cf Linux 5.17 compat: detect complete_and_exit() rename
Linux 5.17 sees a rename from complete_and_exit()
to kthread complete_and_exit()

Upstream commit cead18552660702a4a46f58e65188fe5f36e9dfe
("exit: Rename complete_and_exit to kthread_complete_and_exit")

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12989
2022-01-25 12:52:12 -08:00
наб a9e2788ffe libspl: cast to uintptr_t instead of !!ing
This led to these two warning types:
  debug.h:139:67: warning: the address of ‘ARC_anon’
  will always evaluate as ‘true’ [-Waddress]
    139 | #define ASSERT3P(x, y, z)
              ((void) sizeof (!!(x)), (void) sizeof (!!(z)))
        |                                               ^
  arc.c:1591:2: note: in expansion of macro ‘ASSERT3P’
   1591 |  ASSERT3P(hdr->b_l1hdr.b_state, ==, arc_anon);
        |  ^~~~~~~~
and
  arc.h:66:44: warning: ‘<<’ in boolean context,
  did you mean ‘<’? [-Wint-in-bool-context]
     66 | #define HDR_GET_LSIZE(hdr)
              ((hdr)->b_lsize << SPA_MINBLOCKSHIFT)
  debug.h:138:46: note: in definition of macro ‘ASSERT3U’
    138 | #define ASSERT3U(x, y, z)
              ((void) sizeof (!!(x)), (void) sizeof (!!(z)))
        |                        ^
  arc.c:1760:12: note: in expansion of macro ‘HDR_GET_LSIZE’
   1760 |   ASSERT3U(HDR_GET_LSIZE(hdr), !=, 0);
        |            ^~~~~~~~~~~~~

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13009
2022-01-24 17:05:42 -08:00
Rich Ercolani cd26b217dc Add explicit timeout to test step
If we die from timeout of the whole GH action run, we don't run the
collect step afterward, which can make it hard to investigate the
timeout.

If we timeout first in the test action, though, it qualifies as
failure, and collects appropriately.

(330 minutes seems like an acceptable tradeoff between the 6h
timeout by default on the action and the 4h and change "functional"
usually takes.)

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12999
2022-01-24 15:54:52 -08:00
Rich Ercolani 4372e96f4b Add support for FALLOC_FL_ZERO_RANGE
For us, I think it's always just FALLOC_FL_PUNCH_HOLE with a fake
mustache on.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Coleman Kane <ckane@colemankane.org>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12975
2022-01-24 12:59:25 -08:00
Rich Ercolani 299fbf75ec Linux 5.16 compat: Added mapping for iov_iter_fault_in_readable
Linux decided to rename this for some reason. At some point, we
should probably invert this mapping, but for now...

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Coleman Kane <ckane@colemankane.org>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12975
2022-01-24 12:59:09 -08:00
Rich Ercolani 12fa250d84 Linux 5.16 compat: Added add_disk check for return
add_disk went from void to must-check int return.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Coleman Kane <ckane@colemankane.org>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12975
2022-01-24 12:59:01 -08:00
Rich Ercolani e86f88aa0b Linux 5.16 compat: Check slab.h for kvmalloc
As it says on the tin - the folio work moved a bunch out of mm.h.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Coleman Kane <ckane@colemankane.org>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12975
2022-01-24 12:57:50 -08:00
наб 17b2ae0b24 Fix test-runner on FreeBSD
CLOCK_MONOTONIC_RAW is only a thing on Linux and macOS. I'm not
actually sure why the previous hardcoding of a constant didn't
error out, but when we removed it, it sure does now.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Rich Ercolani <rincebrain@gmail.com>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12995
2022-01-21 15:37:46 -08:00
Mark Johnston 063daa8350 Fix handling of errors from dmu_write_uio_dbuf() on FreeBSD
FreeBSD's implementation of zfs_uio_fault_move() returns EFAULT when a
page fault occurs while copying data in or out of user buffers.  The VFS
treats such errors specially and will retry the I/O operation (which may
have made some partial progress).

When the FreeBSD and Linux implementations of zfs_write() were merged,
the handling of errors from dmu_write_uio_dbuf() changed such that
EFAULT is not handled as a partial write.  For example, when appending
to a file, the z_size field of the znode is not updated after a partial
write resulting in EFAULT.

Restore the old handling of errors from dmu_write_uio_dbuf() to fix
this.  This should have no impact on Linux, which has special handling
for EFAULT already.

Reviewed-by: Andriy Gapon <avg@FreeBSD.org>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes #12964
2022-01-21 11:54:05 -08:00
George Amanakis 63a26454ba Introduce a flag to skip comparing the local mac when raw sending
Raw receiving a snapshot back to the originating dataset is currently
impossible because of user accounting being present in the originating
dataset.

One solution would be resetting user accounting when raw receiving on
the receiving dataset. However, to recalculate it we would have to dirty
all dnodes, which may not be preferable on big datasets.

Instead, we rely on the os_phys flag
OBJSET_FLAG_USERACCOUNTING_COMPLETE to indicate that user accounting is
incomplete when raw receiving. Thus, on the next mount of the receiving
dataset the local mac protecting user accounting is zeroed out.
The flag is then cleared when user accounting of the raw received
snapshot is calculated.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #12981 
Closes #10523
Closes #11221
Closes #11294
Closes #12594
Issue #11300
2022-01-21 11:41:17 -08:00
Mark Johnston 6e2a59181e Avoid memory allocations in the ARC eviction thread
When the eviction thread goes to shrink an ARC state, it allocates a set
of marker buffers used to hold its place in the state's sublists.

This can be problematic in low memory conditions, since
1) the allocation can be substantial, as we allocate NCPU markers;
2) on at least FreeBSD, page reclamation can block in
   arc_wait_for_eviction()

In particular, in stress tests it's possible to hit a deadlock on
FreeBSD when the number of free pages is very low, wherein the system is
waiting for the page daemon to reclaim memory, the page daemon is
waiting for the ARC eviction thread to finish, and the ARC eviction
thread is blocked waiting for more memory.

Try to reduce the likelihood of such deadlocks by pre-allocating markers
for the eviction thread at ARC initialization time.  When evicting
buffers from an ARC state, check to see if the current thread is the ARC
eviction thread, and use the pre-allocated markers for that purpose
rather than dynamically allocating them.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: George Amanakis <gamanakis@gmail.com>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes #12985
2022-01-21 10:28:13 -08:00
наб bc40713a8f libspl: ASSERT*: !! for sizeof
sizeof(bitfield.member) is invalid, and this shows up in some FreeBSD
build configurations: work around this by !!ing ‒
this makes the sizeof target the ! result type (_Bool), instead

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Fixes: 42aaf0e ("libspl: ASSERT*: mark arguments as used")
Closes #12984
Closes #12986
2022-01-21 10:20:11 -08:00
Paul Zuchowski 5a4d282f55 Fix problem with zdb -d
zdb -d <pool>/<objset ID> does not work when
other command line arguments are included i.e.
zdb -U <cachefile> -d <pool>/<objset ID>
This change fixes the command line parsing
to handle this situation.  Also fix issue
where zdb -r <dataset> <file> does not handle
the root <dataset> of the pool. Introduce -N
option to force <objset ID> to be interpreted
as a numeric objsetID.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rich Ercolani <rincebrain@gmail.com>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Paul Zuchowski <pzuchowski@datto.com>
Closes #12845
Closes #12944
2022-01-20 10:28:55 -07:00
наб e1c720de7d libefi: remove efi_type()
All it is right now is some #if 0ed Solaris code that returns ENOSYS,
and is only applicable for the Solaris blockdev layer.
In the Illumos gate, there's a single user: rmformat(1);
I recommend a read of the manual as a blast from the past, but, well

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Issue #12844
Closes #12969
2022-01-18 14:40:43 -08:00
наб 18168da727 module/*.ko: prune .data, global .rodata
Evaluated every variable that lives in .data (and globals in .rodata)
in the kernel modules, and constified/eliminated/localised them
appropriately. This means that all read-only data is now actually
read-only data, and, if possible, at file scope. A lot of previously-
global-symbols became inlinable (and inlined!) constants. Probably
not in a big Wowee Performance Moment, but hey.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12899
2022-01-14 15:37:55 -08:00
Brian Behlendorf 7adc190098 Clarify failmode=wait documentation
Nowhere in the description of the failmode property does it
clearly state how to bring a suspended pool back online.
Add a few words to property description and the zpool-clear(8)
man page.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12907
Closes #9395
2022-01-14 15:30:07 -08:00
Ryan Moeller 5a57d6f73b FreeBSD: Touch up comments in zvol_os
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #12934
2022-01-14 12:43:31 -08:00
Ryan Moeller 020545a95d FreeBSD: Fix zvol_*_open() locking
These are the changes for FreeBSD corresponding to the changes made for
Linux in #12863, see that PR for details.

Changes from #12863 are applied for zvol_geom_open and zvol_cdev_open
on FreeBSD.  This also adds a check for the zvol dying which we had
in zvol_geom_open but was missing in zvol_cdev_open.  The check causes
the open to fail early with ENXIO when we are in the middle of changing
volmode.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #12934
2022-01-14 12:43:05 -08:00
Ryan Moeller 69c5e694a1 FreeBSD: Fix leaked strings in libspl mnttab
The FreeBSD implementations of various libspl functions for getting
mounted device information were found to leak several strings which
were being allocated in statfs2mnttab but never freed.

The Solaris getmntany(3C) and related interfaces are expected to return
strings residing in static buffers that need to be copied rather than
freed by the caller.

Use static thread-local storage to stash the mnttab structure strings
from FreeBSD's statfs info rather than strings allocated on the heap by
strdup(3).

While here, remove some stray commented out lines.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Rich Ercolani <rincebrain@gmail.com>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #12961
2022-01-14 12:38:32 -08:00
наб 12bd322dde man: zfs.4: miscellaneous cleanup
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12941
2022-01-13 11:09:03 -08:00
наб 4737a9eb70 linux: libzfs: mount: fix uninitialised flags
They're later |=d with constants, but never reset

Caught by valgrind while investigating
https://github.com/openzfs/zfs/pull/12928#issuecomment-1007496550

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12954
2022-01-13 11:07:54 -08:00
Damian Szuberski 836136961a Add ShellCheck's --enable=all inside scripts/
Strengthen static code analysis for shell scripts.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: szubersk <szuberskidamian@gmail.com>
Closes #12914
2022-01-13 11:09:19 -07:00
Damian Szuberski 8a7c4efd3c Removed Python 2 and Python 3.5- support
Deprecation of Python versions below 3.6 gives opportunity to unify the
build and install requirements for OpenZFS packages. The minimal
supported Python version is 3.6 as this is the most recent Python
package CentOS/RHEL 7 users can get.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rich Ercolani <rincebrain@gmail.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: szubersk <szuberskidamian@gmail.com>
Closes #12925
2022-01-13 09:51:12 -07:00
Mark Maybee da9c6c0333 Remove VERIFY() in vdev_props_set_sync()
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mark Maybee <mark.maybee@delphix.com>
Closes #12951
2022-01-12 16:15:30 -08:00
Rich Ercolani 63f4bfd6ac lz4: Cherrypick fix for CVE-2021-3520
There should be no risk of us accidentally hitting this since
we'd need maliciously malformed data to wind up in the pipeline,
or a very unfortunate random bit flip at exactly the right moment.
Still since we can handle it we should.

Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Adam Moss <c@yotes.com>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12947
2022-01-12 16:14:36 -08:00
Rich Ercolani d6c1bbdd65 Updated the lz4 decompressor
As an experiment, I stole the lz4 decompressor from
upstream lz4 (1.9.3), and landed it.

Feedback suggested that keeping the vendor lz4 code isolated and
unlinted was probably reasonable, so I lobbed it into its own file.

It also seemed reasonable to put the mostly-untouched* code into
lz4.c proper, and relegate the integrated and ZFS-specific code to
lz4_zfs.c.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12805
2022-01-07 10:36:49 -08:00
Ryan Hirasaki 862d5dfc84 README: Update OpenZFS website url
This change is to first replace the OpenZFS website in the README to
point to openzfs.org as this is what open-zfs.org redirects to.
Along with replacing the URL, the protocol is also upgraded
from http to https.

These changes should prevent web browsers such as Firefox from
complaining about visiting a http site, if the proper security
settings are enabled, when it will still end up on a https page
after the redirect.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Hirasaki <ryanhirasaki@gmail.com>
Closes #12939
2022-01-06 16:25:01 -08:00
Tino Reichardt a798b485ae Remove sha1 hashing from OpenZFS, it's not used anywhere.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Attila Fülöp <attila@fueloep.org>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12895
Closes #12902
2022-01-06 16:16:28 -08:00
наб 3e310f099d module: icp: remove solaris crypto and cryptoadm ioctl definitions
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Attila Fülöp <attila@fueloep.org>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12895
Closes #12902
2022-01-06 16:16:19 -08:00
наб 5c8389a8cb module: icp: rip out the Solaris loadable module architecture
After progressively folding away null cases, it turns out there's
/literally/ nothing there, even if some things are part of the
Solaris SPARC DDI/DKI or the seventeen module types (some doubled for
32-bit userland), or the entire modctl syscall definition.
Nothing.

Initialisation is handled in illumos-crypto.c,
which calls all the initialisers directly

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Attila Fülöp <attila@fueloep.org>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12895
Closes #12902
2022-01-06 16:14:04 -08:00
Damian Szuberski c1d3be19d7 Add ShellCheck's --enable=all inside cmd/
The only exception is `cmd/vdev_id/vdev_id` which might be a subject of
refactoring (see #12084)

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Signed-off-by: szubersk <szuberskidamian@gmail.com>
Closes #12912
2022-01-06 16:07:54 -08:00
Christian Schwarz a8f27ec6c5 l2arc_write_buffers: remove redundant asserts
Probably introduced inadvertently in b525630 (Native Encryption).

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Amanakis <gamanakis@gmail.com>
Signed-off-by: Christian Schwarz <christian.schwarz@nutanix.com>
Closes #12935
2022-01-06 14:39:22 -08:00
Damian Szuberski ae66d3aa90 Add ShellCheck's --enable=all inside etc/
Strengthen static code analysis for shell scripts.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: szubersk <szuberskidamian@gmail.com>
Closes #12913
2022-01-06 14:36:04 -08:00
наб ccc421ec39 module: Makefile: flatten subdir loop, use $PWD instead of pwd
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Issue #12899
2022-01-06 14:29:41 -08:00
наб 1add1a5b3c FreeBSD: remove unused variable
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Issue #12899
2022-01-06 12:46:49 -08:00
наб f788aaeb0e config: check for parallel(1), use it for cstyle
Before:
$ time make cstyle
real    0m23.118s
user    0m23.002s
sys     0m0.114s

After:
$ time make cstyle
real    0m4.577s
user    0m31.487s
sys     0m0.699s

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Issue #12899
2022-01-06 12:46:42 -08:00
наб 95aab0b7d2 libfetch: unquote @LIBFETCH_SONAME@ subst
@LIBFETCH_SONAME@ is no longer quoted. The C define still is.

Ref: 153f7c9f72
Ref: https://github.com/openzfs/zfs/pull/12835#discussion_r776833743
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Damian Szuberski <szuberskidamian@gmail.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12922
2022-01-06 11:26:40 -08:00
наб 7c2eb1c875 zvol: remove unused variable
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12917
2022-01-06 11:20:13 -08:00
наб c25e639f2b fm: remove unused variables
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12917
2022-01-06 11:20:13 -08:00
наб 7c41df4c77 zvol: remove unused variable
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12917
2022-01-06 11:20:06 -08:00
наб 43dbf88178 FreeBSD: vfsops: use setgen for error case
Fix from https://github.com/openzfs/zfs/pull/12844#discussion_r774179413

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12905
2022-01-06 11:15:08 -08:00
Paul Dagnelie 399b98198a Revert "zfs list: Allow more fields in ZFS_ITER_SIMPLE mode"
This reverts commit f6a0dac84a.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #12938
2022-01-06 11:12:53 -08:00
chrisrd 0175272f64 man: speling
Fix spelling.

Reviewed-by: Rich Ercolani <rincebrain@gmail.com>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chris Dunlop <chris@onthe.net.au>
Closes #12911
2022-01-06 11:00:01 -08:00
Allan Jude 7454275a53 ZTS: normalize on use of sync_pool and sync_all_pools
- Replaces use of manual `zpool sync`
- Don't use `log_must sync_pool` as `sync_pool` uses it internally
- Replace many (but not all) uses of `sync` with `sync_pool`

This makes the tests more consistent, and makes searching easier.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Closes #12894
2022-01-06 10:57:09 -08:00
Manoj Joseph 6b2e32019e Long opts for zdb
This change introduces long options for zdb. It updates the usage
message as well to include the long options.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Manoj Joseph <manoj.joseph@delphix.com>
Closes #12818
2022-01-06 10:54:32 -08:00
chrisrd 2a149775b4 zfs_prune: reset sc.nr_to_scan
sc.nr_to_scan is an input to super_cache_clean (via
shrinker->scan_objects), used to set the number of objects to scan
in the various caches. However super_cache_scan also modifies
sc.nr_to_scan, so when used in a loop we need to reset
sc.nr_to_scan back to our desired nr_to_scan for the next
iteration.

Issue discovered and solution suggested by
Tenzin Lhakhang @tlhakhan.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chris Dunlop <chris@onthe.net.au>
Issue #12433
Closes #12908
2022-01-04 17:07:33 -08:00
Brian Behlendorf 3c80e0742a Verify dRAID empty sectors
Verify that all empty sectors are zero filled before using them to
calculate parity.  Failure to do so can result in incorrect parity
columns being generated and written to disk if the contents of an
empty sector are non-zero.  This was possible because the checksum
only protects the data portions of the buffer, not the empty sector
padding.

This issue has been addressed by updating raidz_parity_verify() to
check that all dRAID empty sectors are zero filled.  Any sectors
which are non-zero will be fixed, repair IO issued, and a checksum
error logged.  They can then be safely used to verify the parity.

This specific type of damage is unlikely to occur since it requires
a disk to have silently returned bad data, for an empty sector, while
performing a scrub.  However, if a pool were to have been damaged
in this way, scrubbing the pool with this change applied will repair
both the empty sector and parity columns as long as the data checksum
is valid.  Checksum errors will be reported in the `zpool status`
output for any repairs which are made.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12857
2022-01-04 16:46:32 -08:00
наб 1135d0a5ff FreeBSD: fix unpropagated error
When performing I/O on FreeBSD using a file based vdev ensure all
errors encountered when reading/writing are propagated through the
zio pipeline.  

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12904
2021-12-23 11:39:29 -08:00
наб 18e4f67960 module: icp: fix unused, remove argsused
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12844
2021-12-23 09:42:47 -08:00
наб 14e4e3cb9f module: zfs: fix unused, remove argsused
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12844
2021-12-23 09:42:47 -08:00
наб 868998220e module: zfs: freebsd: fix unused, remove argsused
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12844
2021-12-23 09:42:47 -08:00
наб 66cd33e09b module: zfs: linux: fix unused, remove argsused
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12844
2021-12-23 09:42:47 -08:00
наб dac85132c6 module: zcommon: zprop: fix unused, remove argsused
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12844
2021-12-23 09:42:47 -08:00
наб 2447a1e59c module: avl: fix unused, remove argsused
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12844
2021-12-23 09:42:47 -08:00
наб 1f182103aa libzfs: fix unused, remove argsused
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12844
2021-12-23 09:42:47 -08:00
наб 0cad373e5c libnvpair: fix unused, remove argsused
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12844
2021-12-23 09:42:47 -08:00
наб 9028f99a79 libzutil: import: fix unused, remove argsused
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12844
2021-12-23 09:42:47 -08:00
наб 56050599c9 libzpool: fix unused, remove argsused
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12844
2021-12-23 09:42:47 -08:00
наб 57bff80ebf libshare: nfs: fix unused, remove argsused
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12844
2021-12-23 09:42:47 -08:00
наб bfc1789703 libzfs_core: fix unused, remove argsused
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12844
2021-12-23 09:42:47 -08:00
наб 7fe47a2386 libefi: efi_type: fix unused, remove argsused
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12844
2021-12-23 09:42:47 -08:00
наб c2f94afa0e include: dmu.h: fix unused, remove argsused
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12844
2021-12-23 09:42:47 -08:00
наб 855e49e881 module: zfs: vdev: shim out vdev_indirect_mapping_verify()
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12844
2021-12-23 09:42:41 -08:00
наб ce767d69b0 module: zfs: vdev: shim out vdev_indirect_births_verify()
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12844
2021-12-23 09:42:29 -08:00
наб 36542b065d module: zfs: spa: shim out vdev_count_verify_zaps()
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12844
2021-12-23 09:36:50 -08:00
наб 2c1988e96f module: zfs: multilist: shim out multilist_d2l()
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12844
2021-12-23 09:36:45 -08:00
наб 89495a427f module: zfs: dsl: pool: shim out dsl_early_sync_task_verify()
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12844
2021-12-23 09:36:36 -08:00
наб 16a32ce402 module: zfs: dnode: use debug-only in debug mode only
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12844
2021-12-23 09:36:31 -08:00
наб 83719bd68c include: sys/arc.h: shim out arc_referenced()
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12844
2021-12-23 09:36:26 -08:00
наб 29d033c6b5 include: qat.h: mark unused macro arguments as used
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12844
2021-12-23 09:36:21 -08:00
наб da5f5ec508 libspl: kmem.h: mark unused kmem_*() macro arguments used
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12844
2021-12-23 09:36:12 -08:00
наб 42aaf0e7c4 libspl: ASSERT*: mark arguments as used
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12844
2021-12-23 09:35:47 -08:00
Brian Behlendorf d6885f3209 ZTS: Fix enospc_002_pos.ksh again
This is a follow up commit for e03a41a60 which aimed to resolve
this same test failure.  The core "problem" here is that it takes
very little space to perform a clone/snapshot/bookmark, which
means if we want these commands to reliably fail the pool must
truely have exhausted all free space.

This commit increases the number of fill iterations to try and
consume every block which we can.  This still can't guarantee
the clone/snapshot/bookmark will fail, but it significantly
improves the odds.  The exception was kept since it's still
not a sure thing.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12903
2021-12-23 09:21:40 -08:00
Alexander Motin 462217d1c2 Reduce number of arc_prune threads
On FreeBSD vnode reclamation is single-threaded, protected by single
global lock.  Linux seems to be able to use a thread per mount point,
but at this time it creates more harm than good.

Reduce number of threads to 1, adding tunable in case somebody wants
to try more.

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Chris Dunlop <chris@onthe.net.au>
Reviewed-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Closes #12896
Issue #9966
2021-12-22 17:07:13 -08:00
Brian Behlendorf d2f374c3f2 ZTS: Fix rollback_003_pos.ksh
Under Linux when rolling back a mounted filesystem negative dentries
may not be dropped from the cache.  This can result in an ENOENT
being incorrectly returned on first access.  Issuing a `df` before
the unmount results in the negative dentries being invalidated and
side steps the issue.

This is solely a workaround for the test case on Linux and not
correct behavior.  The core issue of invalidating negative dentries
needs to be handled with a kernel side change.  This is being
tracked as issue #6143.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12898 
Issue #6143
2021-12-22 11:05:07 -08:00
Brian Behlendorf 9ba5d8d204 ZTS: Fix refreserv_raidz.ksh
The rerefreserv_raidz test was failing on Linux because the sync being
issued doesn't guarantee a pool sync.  Switch to using the sync_pool
function and remove the ZTS exception for Linux.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12897
2021-12-22 09:37:27 -08:00
Georgy Yakovlev 2f411512be zfs-test/mmap_seek: fix build on musl
The build on musl needs linux/fs.h for SEEK_DATA and friends,
and sys/sysmacros.h for P2ROUNDUP.  Add the needed headers.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Georgy Yakovlev <gyakovlev@gentoo.org>
Closes #12891
2021-12-21 16:44:18 -08:00
shodanshok bfb3b0d77a zed: send notification email by default
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Gionatan Danti <g.danti@assyoma.it>
Closes #12806
2021-12-21 16:24:05 -08:00
наб 153f7c9f72 contrib/initrd hooks: properly quote @LIBFETCH_SONAME@
Bullseye shellcheck picks these up as SC2140, and it's right!
@LIBFETCH_SONAME@ is already quoted, so dracut had
  "$d/"libcurl.so.4""
and i-t had
  ""libcurl.so.4""

Partially reverts 34eef3e9a7 (#12760),
which broke this

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12835
2021-12-21 12:05:12 -08:00
наб 1f63c64559 contrib/pam_zfs_key: fix unused, remove argsused
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12835
2021-12-21 12:05:12 -08:00
наб cf8d708b7a tests: fix unused, remove argsused
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12835
2021-12-21 12:05:12 -08:00
наб 1fedd01247 zfs: iter: fix unused, remove argsused
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12835
2021-12-21 12:05:12 -08:00
наб d8f32c0eb7 ztest: fix unused, remove argsused
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12835
2021-12-21 12:05:12 -08:00
наб 058cd62432 zdb: il: fix unused, remove argsused
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12835
2021-12-21 12:05:12 -08:00
наб 6fa118b857 zdb: main: fix unused, remove argsused
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12835
2021-12-21 12:05:12 -08:00
наб 724e6c68e5 zed: agents: fmd_api: fix unused, remove argsused
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12835
2021-12-21 12:05:11 -08:00
наб d6fccfe62e zed: agents: zfs_retire: fix unused, remove argsused
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12835
2021-12-21 12:05:11 -08:00
наб 491165c079 zed: agents: zfs_mod: fix unused, remove argsused
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12835
2021-12-21 12:05:11 -08:00
наб e265a082eb zed: agents: zfs_diagnosis: fix unused, remove argsused
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12835
2021-12-21 12:05:11 -08:00
наб 16529f305a zed: agents: fix unused, remove argsused
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12835
2021-12-21 12:05:11 -08:00
наб e2a59aa701 zed: exec: fix unused, remove argsused
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12835
2021-12-21 12:05:11 -08:00
наб 008f30c730 zed: main: fix unused, remove argsused
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12835
2021-12-21 12:05:11 -08:00
наб ab860757f5 zpool: vdev_os: fix unused, remove argsused
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12835
2021-12-21 12:05:11 -08:00
наб e40ca391f8 zpool: main: fix unused, remove argsused
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12835
2021-12-21 12:05:11 -08:00
наб b487738d34 zpool: iter: zpool_compare: fix unused, remove argsused
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12835
2021-12-21 12:05:11 -08:00
наб 964e6a497b zinject: cancel_one_handler: fix unused, remove argsused
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12835
2021-12-21 12:05:11 -08:00
наб 63b6c3e1d1 zhack: space_delta_cb: fix unused, remove argsused
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12835
2021-12-21 12:05:11 -08:00
наб 876b60dcfb raidz_test: init_rand: fix unused, remove argsused
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12835
2021-12-21 12:05:11 -08:00
Brian Behlendorf ff1acbac30 ZTS: speed up rsend tests
With some minor tweaks several of rsend tests can be sped up
considerably without significantly reducing test coverage.

* send-c_verify_ratio:  ~120s -> ~60s
* send_realloc_*_files: ~330s -> ~65s

For the send_realloc* tests this also has the advantage of removing
(most of) the linux/freebsd conditional logic.  Note that for this
test more passes, and thus more incremental send/recvs, are preferable
to a larger number of files.

Total run time of the rsend test group was reduced from roughly 20 to
11 minutes in an environment similar to what's used by the CI.

Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12876
2021-12-21 11:12:38 -08:00
Brian Behlendorf 7b5d783a46 ZTS: rsend_007_pos failures
The rsend_007_pos test reliably fails on Linux in the cleanup
function.  This is caused by an unmount error when attempting to
recursively destroy the newly received datasets.  Invoking `df`
prior to the `zfs destroy` interestingly avoids the unmont error.

Why this should matter is unclear and should be investigated.
However, this minor tweak may allow us to remove the ZTS rsend
exceptions.  The subsequent rsend_010_pos and rsend_011_pos
failures were a result of this initial failure.  The other
"maybe" failures I was unable to reproduce and have not been
recently observed in the master branch.

Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #5665
Closes #6086
Closes #6087
Closes #6446
Closes #12876
2021-12-21 11:11:07 -08:00
Martin Matuška 20f5c5b912 FreeBSD: fix world build after 143476ce8
Do not redefine the fallthrough macro when building with libcpp.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Martin Matuska <mm@FreeBSD.org>
Closes #12880
2021-12-20 14:28:43 -08:00
Philipp Riederer 8623bd962d Fix error propagation from lzc_send_redacted
Any error from lzc_send_redacted is overwritten by the error of
send_conclusion_record; skip writing the conclusion record if there
was an earlier error.

Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Philipp Riederer <philipp@riederer.email>
Closes #12766
2021-12-20 10:50:46 -08:00
Ryan Moeller 3fa5266d72 Linux: Implement FS_IOC_GETVERSION
Provide access to file generation number on Linux.

Add test coverage.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #12856
2021-12-17 16:18:37 -08:00
наб 82e414f1b2 libshare: nfs: always try to mkdir()
This also works out to one syscall if the directory exists,
but is one syscall shorter if it doesn't.

Reviewed-by: Don Brady <don.brady@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12067
2021-12-17 12:54:25 -08:00
наб 1e78b4eee0 libshare: nfs: set export file 644
The shares are publicly known anyway and can be interrogated by any
user, so this is a debugging aid more than anything.

Reviewed-by: Don Brady <don.brady@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12067
2021-12-17 12:54:14 -08:00
наб 9d4a44f0b8 linux/libshare: nfs: don't needlessly strdup() hostspec
Reviewed-by: Don Brady <don.brady@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12067
2021-12-17 12:54:09 -08:00
наб 605e03e51a libshare: nfs: share nfs_is_shared()
Reviewed-by: Don Brady <don.brady@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12067
2021-12-17 12:54:04 -08:00
наб 4e225e7316 libshare: nfs: share nfs_copy_entries()
Reviewed-by: Don Brady <don.brady@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12067
2021-12-17 12:54:00 -08:00
наб c53f2e9b50 libshare: nfs: open temporary file once
Reviewed-by: Don Brady <don.brady@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12067
2021-12-17 12:53:54 -08:00
наб f50697f95b libshare: nfs: retry flock() when interrupted
Reviewed-by: Don Brady <don.brady@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12067
2021-12-17 12:53:49 -08:00
наб bdf6464c6c freebsd/libshare: nfs: don't send SIGHUP to all processes
pidfile_open() sets *pidptr to -1 if the process currently holding
the lock is between pidfile_open() and pidfile_write(),
the subsequent kill(mountdpid) would potentially SIGHUP all
non-system processes except init: just sleep for half a millisecond
and try again in that case

Reviewed-by: Don Brady <don.brady@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12067
2021-12-17 12:53:25 -08:00
наб cf65c33c9c zfs-share.8: document -l flag
Description stolen from zfs-mount.8

Reviewed-by: Don Brady <don.brady@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12067
2021-12-17 12:52:26 -08:00
наб 3a661613df contrib/initrd: systemd-ask-password --no-tty before argument
In systemd 249 (sid), sd-a-p processes its arguments in getopt + mode,
so "systemd-ask-password zupa --no-tty" prompts for "zupa --no-tty",
not "zupa" not on the tty, as expected (bullseye, 247).

Ref: https://github.com/systemd/systemd/commit/4b1c842d95bfd6ab352ade1a4655f9e512f35185
Ref: https://github.com/systemd/systemd/pull/19806
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12870
2021-12-17 12:44:23 -08:00
Rich Ercolani f68b9c81c8 Workaround Debian's fake System.map behavior
Debian ships fake System.map files by default, leading to the
invocation of depmod with them to flood you with errors about
missing symbols.

Let's notice and not do that.

Reviewed-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12862
2021-12-17 12:43:13 -08:00
Brian Behlendorf eecd3f1a21 ZTS: alloc_class.ksh must wait for the process to exit
The alloc_class_* tests may fail on Linux with an EBUSY error if
`zfs destroy` is run before the `dd` process has had a chance to
terminate.  Wait on the pid after the `kill -9` to make sure.

When testing I didn't observe any failures for the alloc_class
tests.  Remove them from the exceptions list, the CI was used to
verify the tests pass on all platforms.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Rich Ercolani <rincebrain@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12873
2021-12-17 12:40:34 -08:00
Rich Ercolani 1a79f7e860 ZTS: Avoid piping send directly to /dev/null
Unfortunately, #11445 means while we fail gracefully now, we still
fail, unless people want to implement a complex workaround just to
support /dev/null.

So let's just use the cheap workaround in a test for now.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12872
2021-12-17 12:39:10 -08:00
Tony Hutter 9aa0915f87 ZTS: Fix zpool_reopen_[1-5] on Fedora 35
The zpool_reopen_[1-5] tests are failing Fedora 35 with:

zpool_reopen_001_pos.ksh[64]: log_must[67]: log_pos[270]:
wait_for_resilver_end[98]: wait_for_action: line 71: func: is read only

Renaming 'func' -> 'funct' fixes the issue.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #12871
2021-12-17 12:37:21 -08:00
Brian Behlendorf 8a02d01e85 Fix zvol_open() lock inversion
When restructuring the zvol_open() logic for the Linux 5.13 kernel
a lock inversion was accidentally introduced.  In the updated code
the spa_namespace_lock is now taken before the zv_suspend_lock
allowing the following scenario to occur:

    down_read <=== waiting for zv_suspend_lock
    zvol_open <=== holds spa_namespace_lock
    __blkdev_get
    blkdev_get_by_dev
    blkdev_open
    ...

     mutex_lock <== waiting for spa_namespace_lock
     spa_open_common
     spa_open
     dsl_pool_hold
     dmu_objset_hold_flags
     dmu_objset_hold
     dsl_prop_get
     dsl_prop_get_integer
     zvol_create_minor
     dmu_recv_end
     zfs_ioc_recv_impl <=== holds zv_suspend_lock via zvol_suspend()
     zfs_ioc_recv
     ...

This commit resolves the issue by moving the acquisition of the
spa_namespace_lock back to after the zv_suspend_lock which restores
the original ordering.

Additionally, as part of this change the error exit paths were
simplified where possible.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Rich Ercolani <rincebrain@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12863
2021-12-17 09:52:13 -08:00
Alan Somers ca1b2bb4b5 FreeBSD: Update argument types for VOP_READDIR
A recent commit to FreeBSD changed the type of
vop_readdir_args.a_cookies to a uint64_t**.  There is no functional
impact to ZFS because ZFS only uses 32-bit cookies, which will be
zero-extended to 64-bits by the existing code.

https://github.com/freebsd/freebsd-src/commit/b214fcceacad6b842545150664bd2695c1c2b34f

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Alan Somers <asomers@gmail.com>
Closes #12874
2021-12-17 09:50:12 -08:00
наб eb51a9d747 zcommon: pre-iterate over sysfs instead of statting every feature
If sufficient memory (<2K, realistically) is available, libzfs_init()
can be significantly shorted by iterating over the correct sysfs
directory before registrations, we can turn 168 stats into 15/18
syscalls (3 opens (6 if built in), 3 fstats, 6 getdentses, and 3
closes), a tenfoldish reduction; this is probably a bit faster, too.

The list is always optional, and registration functions (and one-off
users) can simply pass NULL, which will fall back to the previous
mechanism

Also, don't allocate in zfs_mod_supported_impl, and use use access()
instead of stat(), since existence is really what we care about

Also, fix pre-prop-checking compat in fallback for built-in ZFS

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12089
2021-12-16 16:43:10 -08:00
наб 8fdc6f618c zcommon: *_prop: make all zprop_index_t tables const
They're already static, and there's no point in them being R/W
and living outside .rodata

Reviewed-by: RageLtMan <rageltman@sempervictus>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12836
2021-12-16 13:26:04 -08:00
Ryan Moeller 92a9e8c618 FreeBSD: Provide correct file generation number
va_seq was actually a thin veil over va_gen, so z_gen is a more
appropriate value than z_seq to populate the field with.

Drop the unnecessary compat obfuscation and provide the correct
file generation number.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <freqlabs@freebsd.org>
Closes #12851
2021-12-16 13:22:15 -08:00
Allan Jude f6a0dac84a zfs list: Allow more fields in ZFS_ITER_SIMPLE mode
If the fields to be listed and sorted by are constrained
to those populated by dsl_dataset_fast_stat(), then
zfs list is much faster, as it does not need to open each
objset and reads its properties.

A previous optimization by Pawel Dawidek
(0cee24064a) took advantage
of this to make listing snapshot names sorted only by name
much faster.

However, it was limited to `-o name -s name`, this work
extends this optimization to work with:
  - name
  - guid
  - createtxg
  - numclones
  - inconsistent
  - redacted
  - origin
and could be further extended to any other properties
supported by dsl_dataset_fast_stat() or similar, that do
not require extra locking or reading from disk.

Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Pawel Jakub Dawidek <pawel@dawidek.net>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Closes #11080
2021-12-16 11:56:22 -08:00
Georgy Yakovlev 2300621dc7 systemd: add weekly and monthly scrub timers
Timers can be enabled as follows:

systemctl enable zfs-scrub-weekly@rpool.timer --now
systemctl enable zfs-scrub-monthly@datapool.timer --now

Each timer will pull in zfs-scrub@${poolname}.service, which is not
schedule-specific.

Added PERIODIC SCRUB section to zpool-scrub.8.

Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Georgy Yakovlev <gyakovlev@gentoo.org>
Closes #12193
2021-12-16 11:47:22 -08:00
наб f291fa658e t/z_diff/socket, zfs: main: fix unused argument warnings, ARGSUSED tags
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rich Ercolani <rincebrain@gmail.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12829
2021-12-13 15:50:47 -08:00
наб b7ef2340c2 libzfs: diff: simplify superfluous stdio
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rich Ercolani <rincebrain@gmail.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12829
2021-12-13 15:50:38 -08:00
наб 9bdf0c592b libzfs: diff: print_what() can return the symbol => get_what()
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rich Ercolani <rincebrain@gmail.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12829
2021-12-13 15:50:29 -08:00
наб a72129edcb libzfs: diff: stream_bytes: use fputc, %hho formats chars
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rich Ercolani <rincebrain@gmail.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12829
2021-12-13 15:50:20 -08:00
наб 1cfb6ef36e libzfs: zpool_set_vdev_prop: remove unused vprop
Found by clang 14 with -Wunused-but-set-variable

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rich Ercolani <rincebrain@gmail.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12829
2021-12-13 15:50:09 -08:00
наб 9e184b7c35 linux: libspl: getmntany: remove unused argument
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rich Ercolani <rincebrain@gmail.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12829
2021-12-13 15:49:59 -08:00
наб 344bbc82e7 zfs, libzfs: diff: accept -h/ZFS_DIFF_NO_MANGLE, disabling path escaping
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rich Ercolani <rincebrain@gmail.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12829
2021-12-13 15:49:40 -08:00
ogelpre f04b976200 Add init script to load keys
Add new init scripts which allow automatic loading of keys if
keylocation property is set to a URI.

Reviewed-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Benedikt Neuffer <ogelpre@itfriend.de>
Closes #11659
Closes #11662
2021-12-12 11:17:14 -08:00
Till Maas 4a5b6ced41 zfs-dkms rpm: Fix scriptlets dependencies
To ensure that the necessary packages are available during the %post and
%preun scriptlets, require them properly.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Till Maas <opensource@till.name>
Closes #12822
Closes #12832
2021-12-12 11:15:25 -08:00
Ryan Moeller 23cee221b7 FreeBSD: Add vop_standard_writecount_nomsync
https://cgit.freebsd.org/src/commit?id=3ffcfa599e29686cf2b3c1a6087408c37acaed78

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #12828
2021-12-12 11:13:18 -08:00
Mark Johnston cdf74673bc zfs: Fix a deadlock between page busy and the teardown lock
When rolling back a dataset, ZFS has to purge file data resident in the
system page cache.  To do this, it loops over all vnodes for the
mountpoint and calls vn_pages_remove() to purge pages associated with
the vnode's VM object.  Each page is thus exclusively busied while the
dataset's teardown write lock is held.

When handling a page fault on a mapped ZFS file, FreeBSD's page fault
handler busies newly allocated pages and then uses VOP_GETPAGES to fill
them.  The ZFS getpages VOP acquires the teardown read lock with vnode
pages already busied.  This represents a lock order reversal which can
lead to deadlock.

To break the deadlock, observe that zfs_rezget() need only purge those
pages marked valid, and that pages busied by the page fault handler are,
by definition, invalid.  Furthermore, ZFS pages always transition from
invalid to valid with the teardown lock held, and ZFS never creates
partially valid pages.  Thus, zfs_rezget() can use the new
vn_pages_remove_valid() to skip over pages busied by the fault handler.

PR:		258208
Tested by:	pho
Reviewed by:	avg, sef, kib
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D32931

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #12828
2021-12-12 11:13:18 -08:00
Ryan Moeller d172264d1c FreeBSD: Catch up with more VFS changes
Unused thread argument was removed from NDINIT*

https://cgit.freebsd.org/src/commit?id=7e1d3eefd410ca0fbae5a217422821244c3eeee4

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #12828
2021-12-12 11:13:18 -08:00
наб 510885a84c FreeBSD supports edonr follow up
This chases 269b5dadcf (#12735),
which touched the actual code but didn't fix the comment

Additionally, ignore the name.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rich Ercolani <rincebrain@gmail.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12823
2021-12-08 17:01:36 -08:00
Brian Behlendorf 9699e45d57 Linux 5.15 compat: META (#12824)
The final 5.15 kernel is available and has been tested. 

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2021-12-07 15:35:42 -08:00
наб 5ece420f03 contrib/bash_completion.d: fix error spew from __zfs_match_snapshot()
Given:
  /sbin/zfs list filling/a-zvol<TAB> -o space,refratio
The rest of the cmdline gets vored by:
  /sbin/zfs list filling/a-zvolcannot open 'filling/a-zvol':
  operation not applicable to datasets of this type

With -x (fragment):
  + COMPREPLY=($(compgen -W "$(__zfs_match_snapshot)" -- "$cur"))
  +++ __zfs_match_snapshot
  +++ local base_dataset=filling/dziadtop-nowe-duchy
  +++ [[ filling/dziadtop-nowe-duchy != filling/dziadtop-nowe-duchy ]]
  +++ [[ filling/dziadtop-nowe-duchy != '' ]]
  +++ __zfs_list_datasets filling/dziadtop-nowe-duchy
  +++ /sbin/zfs list -H -o name -s name -t filesystem
                     -r filling/dziadtop-nowe-duchy
  +++ tail -n +2
  cannot open 'filling/dziadtop-nowe-duchy':
  operation not applicable to datasets of this type
  +++ echo filling/dziadtop-nowe-duchy
  +++ echo filling/dziadtop-nowe-duchy@
  ++ compgen -W 'filling/dziadtop-nowe-duchy

This properly completes with:
  $ /sbin/zfs list filling/a-zvol<TAB> -o space,refratio
  filling/a-zvol   filling/a-zvol@
  $ /sbin/zfs list filling/a-zvol<cursor> -o space,refratio

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12820
2021-12-07 12:30:10 -08:00
Coleman Kane a78f19d3ec Linux 5.16: Resolve ZSTD_isError symbol collision in Linux kernel
Newer zstd code introduced in the main kernel tree now creates a symbol
collision with ZSTD_isError in our ZSTD code. This change relabels our
implementation with a ZFS-specific symbol name, and undoes some
macro-based micro-optimizations that conflict with the attempt to rename
our internal-use version.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #12819
2021-12-07 12:28:22 -08:00
Coleman Kane 1e767532f2 Linux 5.16: The blk-cgroup.h header is where struct blkcg_gq is defined
The definition of struct blkcg_gq was moved into blk-cgroup.h, which is
a header that's been in Linux since 2015. This is used by
vdev_blkg_tryget() in module/os/linux/zfs/vdev_disk.c. Since the kernel
for CentOS 7 and similar-generation releases doesn't have this header,
its inclusion is guarded by a configure test.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #12819
2021-12-07 12:28:12 -08:00
Coleman Kane d08b99aca0 Linux 5.16: bio_set_dev is no longer a helper macro
This change adds a confiugre check to determine if bio_set_dev is a
helper macro or not. If not, then the attempt to override its internal
call to bio_associate_blkg(), with a macro definition to our own
version, is no longer possible, as the compiler won't use it when
compiling the new inline function replacement implemented in the header.
This change also creates a new vdev_bio_set_dev() function that performs
the same work, and also performs the work implemented in
vdev_bio_associate_blkg(), as it is the only thing calling that function
in our code. Our custom vdev_bio_associate_blkg() is now only compiled
if the bio_set_dev() is a macro in the Linux headers.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #12819
2021-12-07 12:28:00 -08:00
Coleman Kane f6e22561d2 Linux 5.16: type member of iov_iter renamed iter_type
The iov_iter->type member was renamed iov_iter->iter_type. However,
while looking into this, realized that in 2018 a iov_iter_type(*iov)
accessor function was introduced. So if that is present, use it,
otherwise fall back to trying the existing behavior of directly
accessing type from iov_iter.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #12819
2021-12-07 12:27:49 -08:00
Coleman Kane 435a451e5c Linux 5.16: block_device_operations->submit_bio now returns void
The return type for the submit_bio member of struct
block_device_operations was changed to no longer return a value.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #12819
2021-12-07 12:27:27 -08:00
Paul Dagnelie 376027331d ZFS send/recv with ashift 9->12 leads to data corruption
Improve the ability of zfs send to determine if a block is compressed
or not by using information contained in the blkptr.

Reviewed-by: Rich Ercolani <rincebrain@gmail.com>
Reviewed-by: Matthew Ahrens <matthew.ahrens@delphix.com>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #12770
2021-12-07 11:27:59 -07:00
Arshad Hussain b6fc42b5e1 Update "tests/README.md"
This patch adds detail section on adding and running
test-case. It also changes markdown number list to
more readeable headers

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Closes #12737
2021-12-07 09:49:25 -07:00
Paul Dagnelie 795075e638 Add const to nvlist functions to properly expose their real behavior
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #12728
2021-12-06 18:19:13 -07:00
Brian Behlendorf 14ba514af6 ZTS: import_rewind_device_replaced reliably fails
The import_rewind_device_replaced.ksh test was never entirely reliable
because it depends on MOS data not being overwritten.  The MOS data is
not protected by the snapshot so occasional failures were always
expected.  However, this test is now failing reliably on all platforms
indicating something has changed in the code since the test was marked
"maybe".  Convert the test to a "known" failure until the root cause
is identified and resolved.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12821
2021-12-06 09:45:17 -08:00
Rich Ercolani df42e20ac6 Corrected a case where we could read uninited ABD memory
For my sins, I started running valgrind over ztest to try and fix
that pesky intermittent "zloop dies with malloc errors" problem.

This one seemed exciting enough to merit cutting a PR for before
the rest get polished.

Suggested-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12214
2021-12-03 13:13:21 -08:00
John Wren Kennedy ddc026f59b Strip colons from all test result filenames
The upload artifact functionality in github can't handle colons in
filenames. The current code handles this for files under the most
recent set of results. With the ability to rerun failed tests, now
there can be multiple sets of results, and they all need to be
processed in the same way.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: John Kennedy <john.kennedy@delphix.com>
Closes #12815
2021-12-01 16:18:45 -08:00
Brian Behlendorf 77e2756de0 Linux 5.13 compat: retry zvol_open() when contended
Due to a possible lock inversion the zvol open call path on Linux
needs to be able to retry in the case where the spa_namespace_lock
cannot be acquired.

For Linux 5.12 an older kernel this was accomplished by returning
-ERESTARTSYS from zvol_open() to request that blkdev_get() drop
the bdev->bd_mutex lock, reaquire it, then call the open callback
again.  However, as of the 5.13 kernel this behavior was removed.

Therefore, for 5.12 and older kernels we preserved the existing
retry logic, but for 5.13 and newer kernels we retry internally in
zvol_open().  This should always succeed except in the case where
a pool's vdev are layed on zvols, in which case it may fail.  To
handle this case vdev_disk_open() has been updated to retry when
opening a device when -ERESTARTSYS is returned.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #12301
Closes #12759
2021-12-01 17:07:12 -07:00
John Wren Kennedy 31d2f42b2a Temporarily remove tests from sanity runfile
With the addition of functionality to rerun failing tests, some
tests that fail only sometimes still fail often enough to degrade
the reliability of the sanity runs. Remove them from the runfile
until they reliably pass.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: John Kennedy <john.kennedy@delphix.com>
Closes #12814
2021-12-01 13:22:52 -08:00
Paul Dagnelie 2320e6eb43 Add zfs-test facility to automatically rerun failing tests
This was a project proposed as part of the Quality theme for the
hackthon for the 2021 OpenZFS Developer Summit. The idea is to improve
the usability of the automated tests that get run when a PR is created
by having failing tests automatically rerun in order to make flaky
tests less impactful.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #12740
2021-12-01 10:38:53 -07:00
Attila Fülöp 861dca065e get_key_material: fix style
Reviewed-by: Felix Dörre <felix@dogcraft.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Attila Fülöp <attila@fueloep.org>
Closes #12765
2021-11-30 11:54:36 -08:00
Harald van Dijk 85638aa870 get_key_material: skip passphrase validation when loading keys
The restriction that an encryption key must be at least
MIN_PASSPHRASE_LEN characters long make sense when changing the
encryption key, but not when loading: as this restriction is not
enforced in the libraries, it is possible to bypass zfs change-key's
restrictions and end up with a key that becomes impossible to load with
zfs load-key, for example through pam_zfs_key.

Reviewed-by: Felix Dörre <felix@dogcraft.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Harald van Dijk <harald@gigawatt.nl>
Closes #12765
2021-11-30 11:54:06 -08:00
Attila Fülöp 4234812d1a pam_zfs_key: tests: check if zfs load-key works on short passphrases
The pam_zfs_key pam module does not enforce a minimum password
length while changing the user password and thus the users home
dataset passphrase. To not end up with a dateset `zfs load-key`
can't load the key for, `zfs load-key` should not enforce a minimum
passphrase length. This adds a test for that.

Reviewed-by: Felix Dörre <felix@dogcraft.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Attila Fülöp <attila@fueloep.org>
Closes #12765
Closes #12651
Closes #12656
2021-11-30 11:52:21 -08:00
Attila Fülöp 307db92823 pam_zfs_key: tests: clean up the generated pam service config file
Remove the generated pam service config file
`/etc/pam.d/pam_zfs_key_test` on test cleanup, since the tests
shouldn't alter system state.

While here, move the pam service config file name into a variable.

Reviewed-by: Felix Dörre <felix@dogcraft.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Attila Fülöp <attila@fueloep.org>
Closes #12765
2021-11-30 11:51:45 -08:00
jokersus b5c16861e9 Remove REMAKE_INITRD
The option has been deprecated in dkms and will break packaging in
future versions. See https://github.com/dell/dkms/commit/7114c62

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Amanakis <gamanakis@gmail.com>
Signed-off-by: jokersus <jokersus.cava@gmail.com>
Closes #12781
2021-11-30 11:09:15 -08:00
Brian Behlendorf 05b3eb6d23 Default to zfs_dmu_offset_next_sync=1
Strict hole reporting was previously disabled by default as a
performance optimization.  However, this has lead to confusion
over the expected behavior and a variety of workarounds being
adopted by consumers of ZFS.  Change the default behavior to
always report holes and force the TXG sync.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12746
2021-11-30 10:38:09 -08:00
Rich Ercolani 5dc6fc2b73 Stop segfaulting on unmount error case
After interrupting ZTS runs that errored out, I found that
"zpool export testpool2" was segfaulting.

This seems unnecessary.

Reviewed-by: szubersk <szuberskidamian@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12804
2021-11-30 10:36:36 -08:00
Pawel Jakub Dawidek 547df81641 Code cleanups
- Allocate ve_search on the stack, so we avoid allocating memory for
  every I/O even if the VDEV cache is disabled.
- Reduce lock scope.
- Avoid locking in vdev_cache_read() when the VDEV cache is disabled.
- Sort file names properly.
- Correct comment.

Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Pawel Jakub Dawidek <pawel@dawidek.net>
Closes #12749
2021-11-30 10:32:38 -08:00
maxz cfc62062ae Replace wrong occurrences of affect by effect in the man pages
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Amanakis <gamanakis@gmail.com>
Signed-off-by: Max Zettlmeißl <max@zettlmeissl.de>
Closes #12784
2021-11-30 10:28:57 -08:00
Rich Ercolani c5ccbbdcf6 Allow printing special vdev metaslab groups
Sometimes, we'd like to know info about the metaslab groups
on special vdevs too. So let's make -MM do something useful.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12750
2021-11-30 10:26:45 -08:00
Damian Szuberski 34eef3e9a7 Pass --enable=all to shellcheck within contrib/
- Remove `SHELLCHECK_IGNORE` in favor of inline suppressions
  and more general `SHELLCHECK_OPTS`.

- Exclude `SC2250` (turned on by `--enable=all`) globally

- Pass `--enable=all` to shellcheck for scripts in contrib/: it's
  very important to catch errors early in areas that are not easily
  testable.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: szubersk <szuberskidamian@gmail.com>
Closes #12760
2021-11-30 10:23:10 -08:00
наб 4325de09cd etc/systemd/zfs-mount-generator: serialise, handle keylocation=http[s]://
* etc/systemd/zfs-mount-generator: serialise

The wins for a relatively normal workload are rather slim:
	real	0.02119s/0.00985s=2.15029x
	user	0.02130s/0.00346s=6.15560x
	sys	0.03858s/0.00643s=6.00062x

	wall-total	0.014518s/0.005925s=2.45009x
	wall-init	0.014518s/0.002457s=5.90684x
	wall-real	0.014518s/0.003467s=4.18668x

But this is a big win on machines with a lot of datasets and expensive
forks.

For example, the gain on a VM on my work laptop with 900+ legacy-mount
Docker datasets, the original gains from the C rewrite were
only five-fold:
	real    0.516s/0.102s=5.05882x
	user    0.237s/0.143s=1.65734x
	sys     0.287s/0.100s=2.87x

And this serial variant gains this back there as well:
	real    0.102s/0.008s=12.75x
	user    0.143s/0.007s=20.42857
	sys     0.100s/0.001s=100x

	wall-total	0.09717s/0.00319s=30.40255x
	wall-init	0.00203s/0.00200s=1.015941x
	wall-real	0.09513s/0.00118s=80.02043x

For a total of
	real    0.516s/0.008s=64.5x
	user    0.237s/0.007s=33.85714x
	sys     0.287s/0.001s=287x

Suggested-by: Richard Laager <rlaager@wiktel.com>

* etc/systemd/zfs-mount-generator: pull in network for keylocation=https

Also simplify RequiresMountsFor= handling
Ref: #11956

Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12138
2021-11-30 09:29:50 -07:00
Allan Jude 2a673e76a9 Vdev Properties Feature
Add properties, similar to pool properties, to each vdev.
This makes use of the existing per-vdev ZAP that was added as
part of device evacuation/removal.

A large number of read-only properties are exposed,
many of the members of struct vdev_t, that provide useful
statistics.

Adds support for read-only "removing" vdev property.
Adds the "allocating" property that defaults to "on" and
can be set to "off" to prevent future allocations from that
top-level vdev.

Supports user-defined vdev properties.
Includes support for properties.vdev in SYSFS.

Co-authored-by: Allan Jude <allan@klarasystems.com>
Co-authored-by: Mark Maybee <mark.maybee@delphix.com>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Closes #11711
2021-11-30 07:46:25 -07:00
pstef 5f64bf7fde Fix typo in zpool.8
Update zpool.8 to avoid parseltongue.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Piotr P. Stefaniak <pstef@freebsd.org>
Closes #12763
2021-11-29 10:52:42 -08:00
Coleman Kane 75b309a938 Linux 5.16 compat: asm/fpu/xcr.h is new location for xgetbv/xsetbv
Linux 5.16 moved these functions into this new header in commit
1b4fb8545f2b00f2844c4b7619d64d98440a477c. This change adds code to look
for the presence of this header, and include it so that the code using
xgetbv & xsetbv will compile again.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #12800
2021-11-29 10:49:33 -08:00
Coleman Kane c0fb44c506 Linux 5.16: wait_on_page_bit() no longer available to modules
Instead, linux/pagemap.h offers a number of folio-specific functions to
be called instead. In this case, module/os/linux/zfs/zfs_vnops_os.c
wants to call wait_on_page_bit(pp, PG_writeback). This gets replaced
with folio_wait_bit(folio_page(pp), PG_writeback). This change modifies
the code to conditionally compile that if configure identifies th
presence of the folio_wait_bit() function.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #12800
2021-11-29 10:48:52 -08:00
Mark Johnston ded851b2e0 Fix several bugs in the FreeBSD rename VOP implementation
- To avoid a use-after-free, zfsvfs->z_log needs to be loaded after the
  teardown lock is acquired with ZFS_ENTER().
- Avoid leaking vnode locks in zfs_rename_relock() and zfs_rename_()
  when the ZFS_ENTER() macros forces an early return.

Refactor the rename implementation so that ZFS_ENTER() can be used
safely.  As a bonus, this lets us use the ZFS_VERIFY_ZP() macro instead
of open-coding its implementation.

Reported-by: Peter Holm <pho@FreeBSD.org>
Tested-by: Peter Holm <pho@FreeBSD.org>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Sponsored-by: The FreeBSD Foundation
Closes #12717
2021-11-19 15:26:39 -07:00
Paul Dagnelie f9e39f98a0 Add notes to system_taskq
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #12771
2021-11-19 10:02:45 -07:00
Rich Ercolani 269b5dadcf Enable edonr in FreeBSD
The code is integrated, builds fine, runs fine, there's not really
any reason not to.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12735
2021-11-16 12:40:10 -07:00
Martin Matuška b8dcfb2c9f FreeBSD: fix world build after de198f2d9
The inline function vn_flush_cached_data() in vnode.h
must not be compiled when building BASE.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Martin Matuska <mm@FreeBSD.org>
Closes #12743
2021-11-15 09:07:39 -07:00
Damian Szuberski 8ac58c3f56 Fix zfs:AUTO autodetection in initramfs scripts
Don't exit early in find_rootfs() when zpool.bootfs
is set to `zfs:AUTO`.

Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: szubersk <szuberskidamian@gmail.com>
Closes #12658
2021-11-13 08:02:50 -07:00
Pawel Jakub Dawidek ac32854a6e Remove (now unused) td argument from zfs_lookup()
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Pawel Jakub Dawidek <pawel@dawidek.net>
Closes #12748
2021-11-12 17:06:44 -08:00
George Amanakis c9d62d1356 Introduce a tunable to exclude special class buffers from L2ARC
Special allocation class or dedup vdevs may have roughly the same
performance as L2ARC vdevs. Introduce a new tunable to exclude those
buffers from being cacheable on L2ARC.

Reviewed-by: Don Brady <don.brady@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #11761 
Closes #12285
2021-11-11 12:52:16 -08:00
наб 420b44488f Remove basename(1). Clean up/shorten some coreutils pipelines
Basenames that remain, in cmd/zed/zed.d/statechange-led.sh:
	dev=$(basename "$(echo "$therest" | awk '{print $(NF-1)}')")
	vdev=$(basename "$ZEVENT_VDEV_PATH")
I don't wanna interfere with #11988

scripts/zfs-tests.sh:
	SINGLETESTFILE=$(basename "$SINGLETEST")
tests/zfs-tests/tests/functional/cli_user/zfs_list/zfs_list.kshlib:
	ACTUAL=$(basename $dataset)
	ACTUAL=$(basename $dataset)
tests/zfs-tests/tests/functional/cli_user/zpool_iostat/
	zpool_iostat_-c_homedir.ksh:
	typeset USER_SCRIPT=$(basename "$USER_SCRIPT_FULL")
tests/zfs-tests/tests/functional/cli_user/zpool_iostat/
	zpool_iostat_-c_searchpath.ksh:
	typeset CMD_1=$(basename "$SCRIPT_1")
	typeset CMD_2=$(basename "$SCRIPT_2")
tests/zfs-tests/tests/functional/cli_user/zpool_status/
	zpool_status_-c_homedir.ksh:
	typeset USER_SCRIPT=$(basename "$USER_SCRIPT_FULL")
tests/zfs-tests/tests/functional/cli_user/zpool_status/
	zpool_status_-c_searchpath.ksh
	typeset CMD_1=$(basename "$SCRIPT_1")
	typeset CMD_2=$(basename "$SCRIPT_2")
tests/zfs-tests/tests/functional/migration/migration.cfg:
	export BNAME=`basename $TESTFILE`
tests/zfs-tests/tests/perf/perf.shlib:
	typeset logbase="$(get_perf_output_dir)/$(basename \
tests/zfs-tests/tests/perf/perf.shlib:
	typeset logbase="$(get_perf_output_dir)/$(basename \

These are potentially Of Directories, where basename is actually
useful

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12652
2021-11-11 13:27:37 -07:00
Fedor Uporov 49d42425d6 Check l2cache vdevs pending list inside the vdev_inuse()
The l2cache device could be added twice because vdev_inuse() does not
check spa_l2cache for added devices. Make l2cache vdevs inuse checking
logic more closer to spare vdevs.

Reviewed-by: George Amanakis <gamanakis@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Fedor Uporov <fuporov.vstack@gmail.com>
Closes #9153 
Closes #12689
2021-11-11 11:54:15 -08:00
Fedor Uporov d04b5c9e87 zhack: Add repair label option
In case if all label checksums will be invalid on any vdev, the pool
will become unimportable. The zhack with newly added cli options could
be used to restore label checksums and make pool importable again.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Fedor Uporov <fuporov.vstack@gmail.com>
Closes #2510
Closes #12686
2021-11-11 11:26:18 -08:00
Palash Gandhi 637771a066 ZTS: zfs_list_004_neg should not check paths that belong to ZFS
When ZFS is on root, /tmp is a ZFS. This causes zfs_list_004_neg to
fail since `zfs list` on /tmp passes when the test expects it not to.
The fix is to exclude paths that belong to ZFS.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Palash Gandhi <pbg4930@rit.edu>
Closes #12744
2021-11-11 08:46:44 -07:00
Brian Behlendorf c23803be84 Restore dirty dnode detection logic
In addition to flushing memory mapped regions when checking holes,
commit de198f2d95 modified the dirty dnode detection logic to check
the dn->dn_dirty_records instead of the dn->dn_dirty_link.  Relying
on the dirty record has not be reliable, switch back to the previous
method.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #11900 
Closes #12745
2021-11-10 16:14:32 -08:00
Brian Behlendorf 371e0f7754 Exclude zfs_copies_003_pos on Linux
This test case may fail on 5.13 and newer Linux kernels if the
/dev/zvol/ device is not created by udev.

Reviewed-by: Rich Ercolani <rincebrain@gmail.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #12301
Closes  #12738
2021-11-10 13:56:01 -07:00
Fedor Uporov 2a9c572059 zdb: Report bad label checksum
In case if all label checksums will be invalid on any vdev, the pool
will become unimportable. From other side zdb with -l option will not
provide any useful information why it happened. Add notifications
about corrupted label checksums.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Fedor Uporov <fuporov.vstack@gmail.com>
Closes #2509
Closes #12685
2021-11-10 12:22:00 -07:00
Dimitri John Ledkov 6c8f03232a Upgrade to libabigail 2.0.0
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Dimitri John Ledkov <dimitri.ledkov@canonical.com>
Closes #12722
Closes #12739
2021-11-09 17:36:42 -08:00
Tony Hutter ae70d628ff zed: Control NVMe fault LEDs
The ZED code currently can only turn on the fault LED for
a faulted disk in a JBOD enclosure.  This extends support
for faulted NVMe disks as well.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #12648
Closes #12695
2021-11-09 16:50:18 -08:00
Fedor Uporov e39fe05b69 Skip spacemaps reading in case of pool readonly import
The only zdb utility require to read metaslab-related data during
read-only pool import because of spacemaps validation. Add global
variable which will allow zdb read spacemaps in case of readonly
import mode.

Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Fedor Uporov <fuporov.vstack@gmail.com>
Closes #9095
Closes #12687
2021-11-09 12:50:39 -08:00
Brian Atkinson 345196be18 Single IO issue for raidz writes with skip sector
In order to reduce contention on the vq_lock, optional skip sectors
for Raidz writes can be placed into a single IO request. This is done by
padding out the linear ABD for a parity column to contain the skip
sector and by creating gang ABD to contain the data and skip sector for
data columns.

The vdev_raidz_map_alloc() function now contains specific functions for
both reads and write to allocate the ABD's that will be issued down to
the VDEV chldren.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-By: Mark Maybee <mark.maybee@delphix.com>
Signed-off-by: Brian Atkinson <batkinson@lanl.gov>
Closes #12333
2021-11-09 12:51:33 -07:00
Brian Behlendorf 453c63e9b7 Linux 5.16 compat: submit_bio()
The submit_bio() prototype has changed again.  The version is 5.16
still only expects a single argument but the return type has changed
to void.  Since we never used the returned value before update the
configure check to detect both single arg versions.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Alexander Lobakin <alobakin@pm.me>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12725
2021-11-09 11:27:46 -08:00
Brian Behlendorf 1e7d634867 Linux 5.16 compat: linux/elevator.h
Commit https://github.com/torvalds/linux/commit/2e9bc346 moved
the elevator.h header under the block/ directory as part of some
refactoring.  This turns out not to be a problem since there's
no longer anything we need from the header.  This has been the
case for some time, this change removes the elevator.h include
and replaces it with a major.h include.

Reviewed-by: Alexander Lobakin <alobakin@pm.me>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12725
2021-11-09 11:26:35 -08:00
Rich Ercolani 380b072403 Exclude zvol_misc_volmode for now
It keeps failing, on changes which aren't related at all.

So until someone runs down why, I'd like it to stop being the
sole reason for CI failures.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12733
2021-11-08 19:01:19 -07:00
Brian Behlendorf de198f2d95 Fix lseek(SEEK_DATA/SEEK_HOLE) mmap consistency
When using lseek(2) to report data/holes memory mapped regions of
the file were ignored.  This could result in incorrect results.
To handle this zfs_holey_common() was updated to asynchronously
writeback any dirty mmap(2) regions prior to reporting holes.

Additionally, while not strictly required, the dn_struct_rwlock is
now held over the dirty check to prevent the dnode structure from
changing.  This ensures that a clean dnode can't be dirtied before
the data/hole is located.  The range lock is now also taken to
ensure the call cannot race with zfs_write().

Furthermore, the code was refactored to provide a dnode_is_dirty()
helper function which checks the dnode for any dirty records to
determine its dirtiness.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Rich Ercolani <rincebrain@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #11900
Closes #12724
2021-11-07 14:27:44 -07:00
Michael Franzl 4b87c1981d Update contrib/initramfs/README.initramfs.markdown
Note that Dropbear supports ed25519 keys since version 2020.79.

See https://github.com/mkj/dropbear/pull/91

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Michael Franzl <michael@franzl.name>
Closes #12715
2021-11-04 09:23:50 -06:00
Rich Ercolani 05679465ac Revert behavior of 59eab109 on not-Linux
It turns out that short-circuiting the EFAULT behavior on a short read
breaks things on FreeBSD. So until there's a nicer solution, let's
just revert the behavior for not-Linux.

Reference:
https://reviews.freebsd.org/R10:70f51f0e474ffe1fb74cb427423a2fba3637544d

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12698
2021-11-04 07:49:40 -06:00
Rich Ercolani e79b6807b8 Workaround issue cleaning up automounted snapshots on Linux
On Linux, sometimes, when ZFS goes to unmount an automounted snap,
it fails a VERIFY check on debug builds, because taskq_cancel_id
returned ENOENT after not finding the taskq it was trying to cancel.

This presumably happens when it already died for some reason; in this
case, we don't really mind it already being dead, since we're just
going to dispatch a new task to unmount it right after.

So we just ignore it if we get back ENOENT trying to cancel here,
retry a couple times if we get back the only other possible condition
(EBUSY), and log to dbgmsg if we got anything but ENOENT or success.

(We also add some locking around taskqid, to avoid one or two cases
of two instances of trying to cancel something at once.)

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #11632
Closes #12670
2021-11-03 09:00:08 -06:00
Rich Ercolani a2ffc0e025 Add more explicit warning about dedup being dropped
"has unsupported feature: [number]" seems reasonable when we can't
know what the problem was, but with the send -D removal, we know
what it was, and can explicitly tell people "don't do that; try
this if you must".

So let's.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12708
2021-11-02 15:45:20 -06:00
Damian Szuberski 6d680e61ef Update checkstyle workflow env to ubuntu-20.04
- `checkstyle` workflow uses ubuntu-20.04 environment
- improved `mancheck.sh` readability

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: szubersk <szuberskidamian@gmail.com>
Closes #12713
2021-11-02 14:02:57 -06:00
Paul Dagnelie 58bf6afd3f Fix cpu hotplug atomic sleep issue
We move the spinlock unlock before the thread creation. This should be
safe because the thread creation code doesn't actually manipulate any
taskq data structures; that's done by the thread once it's created.

We also remove the assertion that the maxthreads is the current threads
plus one; that assertion could fail if multiple hotplug events come in
quick succession, and the first new taskq thread hasn't had a chance to
start processing yet.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
eviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #12714
2021-11-02 10:23:48 -06:00
Mike Swanson 321c1b6f39 Disable normalization implicitly when setting "utf8only=off"
When a parent dataset has normalization set to any value other than
"none", and a file system is created with the property "utf8only=off",
implicitly also set "normalization=none" instead of overriding the
desire for a non-UTF8 enforcing file system.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mike Swanson <mikeonthecomputer@gmail.com>
Closes #11892
Closes #12038
2021-10-29 16:59:18 -07:00
Mark Johnston 4d1a3ba6ed Exit the teardown section later in rename on FreeBSD
We have to hold the teardown lock while dereferencing zfsvfs->z_os and,
I believe, when committing to the ZIL.

Note that jumping to the "out" label, "error" is always non-zero.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes #12704
2021-10-29 16:40:25 -07:00
Mark Johnston 68a7a9edc5 Fix potential use-after-frees in FreeBSD getpages and setattr VOPs
The objset object is reallocated during certain dataset operations, such
as rollbacks, so the objset pointer must be loaded after acquiring the
teardown lock.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes #12704
2021-10-29 16:39:47 -07:00
D. Ebdrup 28401f6668 zfsprops.7: Add note about comma-separation
This change primarily seeks to make implicit documentation explicit, as
it is not outright stated that options should be comma-separated, nor is
there a reason given for it.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Daniel Ebdrup Jensen <debdrup@FreeBSD.org>
Closes #12579
2021-10-29 16:30:44 -07:00
Fedor Uporov 475e41b9f5 Do not print UINT64_MAX value for some of zfs properties
The values of next properties: filesystem_limit, filesystem_count,
snapshot_limit, snapshot_count were returned to user as UINT64_MAX
integers in case if -p cli option is used, return 'none' value instead.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Fedor Uporov <fuporov.vstack@gmail.com>
Closes #9306 
Closes #12690
2021-10-29 16:18:13 -07:00
Rich Ercolani 1139e170d4 Add explicit error for device_rebuild being disabled
Currently, you get back "can only attach to mirrors and top-level disks"
unconditionally if zpool attach returns ENOTSUP, but that also happens
if, say, feature@device_rebuild=disabled and you tried attach -s.

So let's print an error for that case, lest people go down a rabbit hole
looking into what they did wrong.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #11414 
Closes #12680
2021-10-29 15:55:22 -07:00
Rich Ercolani 4476ccd906 Normalize property names for zfs receive
It turns out, userland is much more happy with aliased property
names than the kernel is.

So let's normalize those to the expected names before we pass
them off.

Added a test case hacked up from the other recv -o/-x test that fails
on unpatched git and passes here.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12607 
Closes #12609
2021-10-29 15:38:10 -07:00
Rich Ercolani adeccfea17 Python 3.10 fixes, part 2
There was a fallback case I overlooked in the initial patch, with
a similarly imperfect version extractor.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12045 
Closes #12673
2021-10-29 15:36:01 -07:00
Peter Levine 6ef28c526b Set DEFAULT_INIT_SHELL to /sbin/openrc-run for Gentoo and Alpine
Gentoo and Alpine always set the rc init scripts' shebang to
#!/sbin/openrc-run, whether or not openrc is installed.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Peter Levine <plevine457@gmail.com>
Closes #12683 
Closes #12692
2021-10-29 15:34:37 -07:00
Tony Hutter 4d4998ea39 vdev_id: Fix PHY sorting
One of our developers noticed a bug in vdev_id where we were incorrectly
sorting PHYs using alphabetical sorting (which usually works) instead
of natural sorting (-v).  For example:

	[port-0:0]# ls -d phy*
	phy-0:10  phy-0:11  phy-0:8  phy-0:9

	[port-0:0]# ls -vd phy*
	phy-0:8  phy-0:9  phy-0:10  phy-0:11

This fixes the issue.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #12699
2021-10-29 15:33:34 -07:00
Fedor Uporov d5a5ec4693 Remove unused function zvol_set_volblocksize()
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Fedor Uporov <fuporov.vstack@gmail.com>
Closes #12688
2021-10-26 17:07:53 -07:00
Rich Ercolani 6f57f1e382 Make dsl_scan print the pool name in dbgmsg
If you've got multiple scrubs/resilvers going, it's rather helpful
to know which pool each scan line refers to.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes: #12674
2021-10-26 17:24:14 -06:00
Allan Jude 65ad5d1165 spa.c: Replace VERIFY(nvlist_*(...) == 0) with fnvlist_* (#12678)
The fnvlist versions of the functions are fatal if they fail,
saving each call from having to include checking the result.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Signed-off-by: Allan Jude <allan@klarasystems.com>
2021-10-26 17:15:38 -06:00
Brian Behlendorf 90b77a0364 ZTS: Standardize use of destroy_dataset in cleanup
When cleaning up a test case standardize on using the convention:

    datasetexists $ds && destroy_dataset $ds <flags>

By using 'destroy_dataset' instead of 'log_must zfs destroy' we ensure
that the destroy is retried in the event that a ZFS volume is busy.
This helps ensures ensure tests are fully cleaned up and prevents false
positive test failures on Linux.

Note that all of the tests which used 'zfs destroy' in cleanup have
been updated even if they don't use volumes.  This was done to
clearly establish the expected convention.

Reviewed-by: Rich Ercolani <rincebrain@gmail.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12663
2021-10-25 15:13:50 -06:00
Rich Ercolani 731fbb5d22 Workaround cloud-init hotplug issue
cloud-init added a hook which triggers on every device add/rm
event, which results in holding open devices for a while after
they're created/destroyed.

So let's shove an exclusion rule for that into the GH workflows
until it gets fixed.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12644
Closes #12669
2021-10-25 11:27:05 -06:00
Ryan Moeller 14b69c0929 FreeBSD: Catch up with recent VFS changes
cn_thread is always curthread.

https://cgit.freebsd.org/src/commit?id=b4a58fbf640409a1e507d9f7b411c83a3f83a2f3
https://cgit.freebsd.org/src/commit?id=2b68eb8e1dbbdaf6a0df1c83b26f5403ca52d4c3

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Alan Somers <asomers@gmail.com>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #12668
2021-10-25 09:46:28 -07:00
Attila Fülöp 7cc5cb8083 pam_zfs_key: malloc and mlock/munlock won't match
mlock(2) and munlock(2) operate on memory pages whereas malloc(3)
does not. So if you munlock(2) a malloced memory region, the whole
page containing it is freed. Since this page may contain another
malloced and mlocked memory region, used as a password buffer by a
concurrent running instance of pam_zfs_key, there is a slight chance
of leaking passwords. By using mmap(2) we avoid such problems since
it will return whole pages on page aligned addresses.

Although the above concern may be mostly academical, it is still
better to use mmap(2) for allocating memory since the FreeBSD
documentation suggests to call mlock(2) and munlock(2) on page
aligned addresses, and other implementations even require it.

While here, remove duplicate code in alloc_pw_string() by calling
alloc_pw_size().

Reviewed-by: Felix Dörre <felix@dogcraft.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Attila Fülöp <attila@fueloep.org>
Closes #12665
2021-10-22 11:42:34 -07:00
Attila Fülöp 50292e2545 pam_zfs_key: mlock(2) and munlock(2) can fail
Since both syscalls can fail, add error handling, including EAGAIN.

Reviewed-by: Felix Dörre <felix@dogcraft.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Attila Fülöp <attila@fueloep.org>
Closes #12665
2021-10-22 11:42:23 -07:00
Attila Fülöp ee7c30b350 pam_zfs_key: change test user name to conform to standards
The useradd(8) command on my system won't accept login names with
uppercase letters in them, so adjust for that.

Reviewed-by: Felix Dörre <felix@dogcraft.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Attila Fülöp <attila@fueloep.org>
Closes #12665
2021-10-22 11:42:10 -07:00
youzhongyang ec64fdb93d Skip snapshot in zfs_iter_mounted()
The intention of the zfs_iter_mounted() is to traverse the dataset
and its descendants, not the snapshots. The current code can cause
a mounted snapshot to be included and thus zfs_open() on the snapshot
with ZFS_TYPE_FILESYSTEM would print confusing message such as "cannot
open 'rpool/fs@snap': snapshot delimiter '@' is not expected here".

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Youzhong Yang <yyang@mathworks.com>
Closes #12447
Closes #12448
2021-10-20 16:07:19 -07:00
Tony Hutter 1886cdfcfb vdev_id: Fix enclosure_symlinks feature
The vdev_id.conf "enclosure_symlinks" option persistently creates
and maps /dev/by-enclosure symlinks to dynamic /dev/sg* devices.

This patch fixes two issues:

1. The enclosure_symlinks feature was accidentally broken in:

   vdev_id: Support daisy-chained JBODs in multipath mode

2. Even when working, the feature numbered the enclosure
   sequentially rather than by HBA port number.  That meant that
   if a port was down or didn't appear in sysfs, then the
   enclosure_sumlinks numbers would be numbered wrong.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #12660
2021-10-20 15:48:04 -07:00
felixdoerre 6cb5e1e759 libshare: nfs: pass through ipv6 addresses in bracket notation
Recognize when the host part of a sharenfs attribute is an ipv6
Literal and pass that through without modification.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Felix Dörre <felix@dogcraft.de>
Closes: #11171
Closes #11939
Closes: #1894
2021-10-20 10:40:00 -07:00
Toomas Soome a4cecfbdc9 zpool should call zfs_nicestrtonum() with non-NULL handle
When zfs_nicestrtonum() is called and there will be an error,
the message is left in libzfs handle, if provided. We can use
this message, to provide better feedback for user.

Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Toomas Soome <tsoome@me.com>
Closes #12650
2021-10-20 10:34:34 -07:00
Pawel Jakub Dawidek a95c82bed8 Remove code duplication
Remove code duplication by moving code responsible for partial block
zeroing to a separate function: dnode_partial_zero().

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Pawel Jakub Dawidek <pawel@dawidek.net>
Closes #12627
2021-10-18 16:50:33 -07:00
Francesco Mazzoli fd778e44c8 Notify on UNAVAIL statechange
`UNAVAIL` is maybe not quite as concerning as `DEGRADED`, but still an
event of notice, in my opinion. For example it is triggered when a
drive goes missing.

Reviewed-by: Don Brady <don.brady@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Francesco Mazzoli <f@mazzo.li>
Closes #12629 
Closes #12630
2021-10-15 15:57:23 -07:00
Teodor Spæren 01b572bc62 zdb: fix overflow of time estimation
The calculation of estimated time remaining in zdb -cc could overflow,
as reported in #10666. This patch fixes this, by using uint64_t instead
of ints in the calculations.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Teodor Spæren <teodor@sparen.no>
Closes #10666 
Closes #12610
2021-10-15 15:55:34 -07:00
Pawel Jakub Dawidek afbc617921 Remove FreeBSD's local copy of the dmu_buf_hold_array() function
Make the main dmu_buf_hold_array() function non-static.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Pawel Jakub Dawidek <pawel@dawidek.net>
Closes #12628
2021-10-13 11:01:01 -07:00
Teodor Spæren d785245857 zio: use unsigned values for enum
cppcheck complains about the use of 1 << 31, because enums are signed
ints which cannot represent this. As discussed in issue #12611, it
appears that with C99, we can use an unsiged int for the enum, on most
platforms.

I've crafted this commit for just the include/sys/zio.h header, as it's
the only one with a shift of 31. If this is something we want to adopt
in the rest of the project, I will go through and apply it to the rest
of the project.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Signed-off-by: Teodor Spæren <teodor@sparen.no>
Closes #12611 
Closes #12615
2021-10-11 10:58:06 -07:00
Brian Behlendorf 280d0f0ce4 Export minimal zfs_refcount interfaces
Lustre makes light use of the zfs_refcount interfaces which
isn't a problem when using a non-debug build of OpenZFS. However,
when debugging is enabled the required symbols are not exported.

Reviewed-by: Olaf Faaland <faaland1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12613
2021-10-11 10:54:39 -07:00
Brian Behlendorf 648445e007 ZTS: Add known exceptions
Add the following test failures to the exception list for FreeBSD
to ensure we notice new unexpected failures.

   pool_checkpoint/checkpoint_big_rewind
   pool_checkpoint/checkpoint_indirect

And the following for Linux.

   zvol/zvol_misc/zvol_misc_snapdev

Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #12621
Issue #12622
Issue #12623
Closes #12624
2021-10-11 10:52:32 -07:00
Brian Behlendorf 72f06d01b5 ZTS: deadman_sync fix
In the CI environment it's possible for events to be slightly
delayed resulting in 4, instead of 5, events appearing in the
log file.  This isn't a problem and should be considered a
success to avoid false positive test results.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12625
2021-10-11 10:49:13 -07:00
nachtgeist a5b464263b initramfs: use correct dataset for rootfs on rollback=1
When booting with root=zfs:rpool/myrootfs@foosnapshot rollback=1,
myrootfs and its descendants get rolled back to foosnapshot, however
ZFS_BOOTFS still contains myrootfs@foosnapshot instead of the
actually desired value of myrootfs.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Daniel Reichelt <hacking@nachtgeist.net>
Closes #12585 
Closes #12586
2021-10-08 11:16:31 -07:00
Ryan Moeller 97bbeeb938 Fail invalid incremental recursive send gracefully
zfs send -R -i snap1 pool/ds@snap1 is an invalid invocation of zfs send
because the incremental source and target snapshots are the same.  We
have an error message for this condition, but we don't make it there
because of a failed assert while iterating through the dataset's
snapshots.

Check for NULL to avoid the assert so we can make it to the error
message.

Test this form of invalid send invocation in rsend tests.  Fix the
rsend_016_neg test while here: log_neg itself doesn't fail the test,
and writing to /dev/null is not supported on all Linux kernels.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11121 
Closes #12533
2021-10-08 11:14:26 -07:00
Rich Ercolani 9d1407e8f2 Correct refcount_add in dmu_zfetch
refcount_add_many(foo,N) is not the same as
for (i=0; i < N; i++) { refcount_add(foo); }

Unfortunately, this is only actually true with debug kernels and
reference_tracking_enable=1.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12589 
Closes #12602
2021-10-08 11:10:34 -07:00
Valmiky Arquissandas 2d02bba23d arcstat: Fix integer division with python3
The arcstat script requests compatibility with python2 and python3, but
PEP 238 modified the / operator and results in erroneous output when
run under python3.

This commit replaces instances of / with //, yielding the expected
result in both versions of Python.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Valmiky Arquissandas <foss@kayvlim.com>
Closes #12603
2021-10-08 09:32:27 -06:00
Brian Behlendorf 514498fef6 Simplify and document OpenZFS library dependencies
For those not already familiar with the code base it can be a
challenge to understand how the libraries are laid out.  This
has sometimes resulted in functionality being added in the
wrong place.  To help avoid that in the future this commit
documents the high-level dependencies for easy reference in
lib/Makefile.am.  It also simplifies a few things.

- Switched libzpool dependency on libzfs_core to libzutil.
  This change makes it clear libzpool should never depend
  on the ioctl() functionality provided by libzfs_core.

- Moved zfs_ioctl_fd() from libzutil to libzfs_core and
  renamed it lzc_ioctl_fd().  Normal access to the kmods
  should all be funneled through the libzfs_core library.
  The sole exception is the pool_active() which was updated
  to not use lzc_ioctl_fd() to remove the libzfs_core
  dependency.

- Removed libzfs_core dependency on libzutil.

- Removed the lib/libzfs/os/freebsd/libzfs_ioctl_compat.c
  source file which was all dead code.

- Removed libzfs_core dependency from mkbusy and ctime
  test utilities.  It was only needed for some trivial
  wrapper functions and that code is easy to replicate
  to shed the unneeded dependency.

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Don Brady <don.brady@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12602
2021-10-07 11:31:26 -06:00
Serapheim Dimitropoulos 48df24d4ce Remove zdb and libzpool from initramfs image
= Motivation

At Delphix we are heavy users of kernel crash dumps that are captured
through a crash kernel that is spawned whenever the main kernel panics.
The way that this works internally is that a certain amount of memory is
reserved while the main system is running so the initramfs of the crash
kernel can be loaded when a panic occurs.

In order to keep reserved memory at minimum we've been historically
trying to identify the binaries that are part of the kernel's initramfs
that are big and finding ways of either making them smaller or do not
include them in the initramfs image. An example is always stripping the
DWARF info of the ZFS kernel module copy that is included in the
initramfs image of both our running and our crash kernel (the difference
in size there is 76MB vs 4MB).

We've recently identified that libzpool has been the largest binary in
our initramfs images - currently sized around 17MB.

= This Patch

The ZFS scripts do not explicitly copy libzpool to initramfs. They copy
zdb which pulls in libzpool as a dependency. Given that both zdb and
libzpool are not really essential for initramfs (e.g. we'll still have
access to the once the root filesystem is unpacked) this patch removes
them from initramfs.

Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Closes #12616
2021-10-07 10:52:03 -06:00
Rich Ercolani 543ab04cc3 Document additional -c caveat
One might expect "send data as it is on disk, and cannot trigger
compression changes" to imply "does not attempt to compress data
that was not compressed on the sender."

One would be mistaken.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12570
2021-10-05 11:48:17 -07:00
Tony Hutter 2a8430a260 Rescan enclosure sysfs path on import
When you create a pool, zfs writes vd->vdev_enc_sysfs_path with the
enclosure sysfs path to the fault LEDs, like:

    vdev_enc_sysfs_path = /sys/class/enclosure/0:0:1:0/SLOT8

However, this enclosure path doesn't get updated on successive imports
even if enclosure path to the disk changes.  This patch fixes the issue.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #11950 
Closes #12095
2021-10-04 12:32:16 -07:00
Rich Ercolani aad91df075 Reject zfs send -RI with nonexistent fromsnap
Right now, zfs send -I dataset@nonexistent dataset@existent fails, but
zfs send -RI dataset@nonexistent dataset@existent does not.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12574
Closes #12575
2021-10-04 10:17:30 -07:00
Attila Fülöp 60b618a967 ZFS: Remove a redundant if condition (#12598)
Commit 0c03d21ac9 left in a redundant if condition while
removing some code. Just remove it.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Attila Fülöp <attila@fueloep.org>
Closes #12598
2021-10-02 12:50:57 -06:00
José Luis Salvador Rufo ed3a3bdb0d Proper support for DESTDIR and INSTALL_MOD_PATH
The environment variables DESTDIR and INSTALL_MOD_PATH must
be mutually exclusive.

https://www.gnu.org/prep/standards/html_node/DESTDIR.html
https://www.kernel.org/doc/Documentation/kbuild/modules.txt

This issue was discussed in this Buildroot thread:
https://lists.buildroot.org/pipermail/buildroot/2021-August/621350.html

I saw this behavior in other different projects, as:

- Yocto Project:
  https://www.yoctoproject.org/pipermail/meta-freescale/2013-August/004307.html

- Google IA Coral:
  https://coral.googlesource.com/linux-imx-debian/+/refs/heads/master/debian/rules

For the above reasons, INSTALL_MOD_PATH will be set as DESTDIR
by default.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: José Luis Salvador Rufo <salvador.joseluis@gmail.com>
Signed-off-by: Romain Naour <romain.naour@gmail.com>
Closes #12577
2021-10-01 10:44:34 -07:00
Ryan Moeller 96ad227a9d ZTS: Minimize udev_wait in zvol_misc tests
The zvol_misc tests, in particular zvol_misc_volmode, make use of a
common udev_wait function to wait for zvol devices in /dev to quiesce
on Linux.  On other platforms this function currently only sleeps for
one second before returning.  This is insufficient, and
zvol_misc_volmode has been flaky on FreeBSD as a result.

Replace udev_wait with block_device_wait, passing through the optional
device parameter where possible.  Rearrange a few checks to strengthen
the verifications we are making and avoid unnecessarily sleeping.  We
must keep udev_wait in a couple places to pass in Github CI workflows.
Remove zvol_misc_volmode from the maybe failing tests on FreeBSD in
zts-report.py.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #12583
2021-10-01 09:36:02 -06:00
Érico Nogueira Rolim ce2bdcedf5 libspl: fix warning about missing spl_pagesize declaration
Though it's unlikely anyone will alter its signature, avoids any
possible type mismatch.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Érico Nogueira <erico.erc@gmail.com>
Closes #12567
2021-09-24 16:59:26 -06:00
John Wren Kennedy df5ea74ff6 Assorted parameter changes for performance tests
* Add async runs for sequential_writes, random_readwrite_fixed and
  random_writes
* Remove some larger block sizes that give similar results to others
* Remove nthreads == 4 from random_writes_zil test

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: John Kennedy <john.kennedy@delphix.com>
Closes #12576
2021-09-21 16:17:36 -06:00
Rich Ercolani 59eab1093a Handle partial reads in zfs_read
Currently, dmu_read_uio_dnode can read 64K of a requested 1M in one
loop, get EFAULT back from zfs_uiomove() (because the iovec only holds
64k), and return EFAULT, which turns into EAGAIN on the way out. EAGAIN
gets interpreted as "I didn't read anything", the caller tries again
without consuming the 64k we already read, and we're stuck.

This apparently works on newer kernels because the caller which breaks
on older Linux kernels by happily passing along a 1M read request and a
64k iovec just requests 64k at a time.

With this, we now won't return EFAULT if we got a partial read.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12370 
Closes #12509
Closes #12516
2021-09-20 10:30:50 -07:00
Jorgen Lundman 1d901c3ee5 Upstream: unmount snapshots before destroying them on macOS
Add function zfs_destroy_snaps_nvl_os() call. The main issue is that
macOS needs to unmount any mounted snapshots before they can be
destroyed. Other platforms can handle this in the kernel, but sending
a storm of zed events to unmount seems undesirable when we can do it
in userland to start with.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Jorgen Lundman <lundman@lundman.net>
Co-authored-by: ilovezfs <ilovezfs@icloud.com>
Closes #12550
2021-09-20 09:29:59 -06:00
Rich Ercolani 8a3fe59c03 Added test for being able to read various variants of zstd
As detailed in #12022 and #12008, it turns out the current zstd
implementation is quite nonportable, and results in various
configurations of ondisk header that only each platform can read.

So I've added a test which contains a dataset with a file written by
Linux/x86_64 and one written by FBSD/ppc64.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12030
2021-09-20 09:08:20 -06:00
Alexander Motin 139690d6c3 Really zero the zero page
While switching abd_zero_buf allocation KPI I've missed the fact
that kmem_zalloc() zeroed the allocation, while kmem_cache_alloc()
does not.  Add explicit bzero() after it.

I don't think it should have caused real problems, but leaking one
memory page content all over the pool is not good.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Closes #12569
2021-09-17 10:17:18 -07:00
George Amanakis 2a49ebbb4d Avoid panic in case of pool errors and missing L2ARC
In case an ARC buffer is allocated only on L2ARC, and there are
underlying errors in a pool with the cache device in faulty state, a
panic can occur in arc_read_done()->arc_hdr_destroy()->
arc_hdr_l2arc_destroy()->arc_hdr_clear_flags() when trying to free
the ARC buffer.

Fix this by discarding the buffer's identity in arc_hdr_destroy(), in
case the buffer is not empty, before calling arc_hdr_l2hdr_destroy().

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #12392
2021-09-16 09:40:15 -07:00
Brian Behlendorf 6065740811 Linux 5.14 compat: META
Increase the Linux-Maximum version in the META file to 5.14.
All of the required compatibility patches have been merged
and the 5.14 kernel has been officially released.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12565
2021-09-15 14:09:31 -07:00
Allan Jude 4a1195ca50 Temporarily use root credentials to mount snapshots in .zfs
When mounting a snapshot in the .zfs/snapshots control directory,
temporarily assume roots credentials to perform the VFS_MOUNT().

This allows regular users and users inside jails to access these
snapshots.

The regular usermount code is not helpful here, since it requires
that the user performing the mount own the mountpoint, which won't
be the case for .zfs/snapshot/<snapname>

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Sponsored-By: Modirum MDPay
Sponsored-By: Klara Inc.
Closes #11312
2021-09-14 17:10:00 -06:00
Brian Behlendorf 6954c22f35 Use fallthrough macro
As of the Linux 5.9 kernel a fallthrough macro has been added which
should be used to anotate all intentional fallthrough paths.  Once
all of the kernel code paths have been updated to use fallthrough
the -Wimplicit-fallthrough option will because the default.  To
avoid warnings in the OpenZFS code base when this happens apply
the fallthrough macro.

Additional reading: https://lwn.net/Articles/794944/

Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12441
2021-09-14 10:17:54 -06:00
Jorgen Lundman 7443299fe0 Iterate encrypted clones at zvol_create_minor
Userland figures out which encryption-root keys are required to load,
and issues ZFS_IOC_LOAD_KEY.
The tail section of spa_keystore_load_wkey() will call
zvol_create_minors() on the encryption-root object.

Any clones of the encrypted zvol will not be plumbed. This commits
adds additional logic to detect if zvol has clones, and is encrypted,
then adds these to the list of zvols to call zvol_create_minors() on.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Jorgen Lundman <lundman@lundman.net>
Closes #12471
2021-09-13 13:27:07 -07:00
Arun KV f82f0279ed Fixed data integrity issue when underlying disk returns error
Errors in zil_lwb_write_done() are not propagated to
zil_lwb_flush_vdevs_done() which can result in zil_commit_impl()
not returning an error to applications even when zfs was not able
to write data to the disk.

Remove the ZIO_FLAG_DONT_PROPAGATE flag from zio_rewrite() to
allow errors to propagate and consolidate the error handling for
flush and write errors to a single location (rather than having
error handling split between the "write done" and "flush done"
handlers).

Reviewed-by: George Wilson <gwilson@delphix.com>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Signed-off-by: Arun KV <arun.kv@datacore.com>
Closes #12391
Closes #12443
2021-09-13 13:02:39 -07:00
Brian Behlendorf 695d4ae815 ZTS: Waiting for zvols to be available
This is a follow up patch for PR #12515 which addresses some
additional ZTS tests which are unreliable are should explicitly
wait for the required zvols to be available.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: @Theo13111
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12553
2021-09-13 12:18:01 -07:00
Brian Behlendorf b9ec4a15e5 Verify embedded blkptr's in arc_read()
The block pointer verification check in arc_read() should also
cover embedded block pointers.  While highly unlikely, accessing
a damaged block pointer can result in panic.  To further harden
the code extend the existing check to include embedded block
pointers and add a comment explaining the rational for this
sanity check.  Lastly, correct a flaw in zfs_blkptr_verify()
so the error count is checked even when checking a untrusted
config to verify the non-pool-specific portions of a block
pointer.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12535
2021-09-09 19:02:07 -06:00
Jorgen Lundman 5a54a4e051 Upstream: Add snapshot and zvol events
For kernel to send snapshot mount/unmount events to zed.

For kernel to send symlink creates/removes on zvol plumbing.
(/dev/run/dsk/zvol/$pool/$zvol -> /dev/diskX)

If zed misses the ENODEV, all errors after are EINVAL. Treat any error
as kernel module failure.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Jorgen Lundman <lundman@lundman.net>
Closes #12416
2021-09-09 10:44:21 -07:00
Brian Behlendorf 2079111f42 Linux 5.15 compat: get_acl()
Kernel commits

332f606b32b6 ovl: enable RCU'd ->get_acl()
0cad6246621b vfs: add rcu argument to ->get_acl() callback

Added compatibility code to detect the new ->get_acl() interface
and correctly handle the case where the new rcu argument is set.

Reviewed-by: Coleman Kane <ckane@colemankane.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12548
2021-09-09 09:38:35 -07:00
Allan Jude a68e4b5919 Allow sending corrupt snapshots even if metadata is corrupted
When zfs_send_corrupt_data is set, use the TRAVERSE_HARD flag,
so traverse_visitbp() will not fail with ECKSUM if a blockpointer
cannot be read, but rather will continue and send the objects it can.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Sponsored-By: Klara Inc.
Sponsored-By: WHC Online Solutions Inc.
Closes #12541
2021-09-09 08:17:31 -06:00
Rich Ercolani 7676ffc51f arc: Drop an incorrect assert
Unfortunately, there was an overzealous assertion that was (in pretty
specific circumstances) false, causing failure.  This assertion was
added in error, so we're removing it.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: George Wilson <gwilson@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #9897
Closes #12020
Closes #12246
2021-09-08 14:00:03 -07:00
Jorgen Lundman 49d8a99a4d Upstream: zdb inode mapping
Unfortunately macOS reserves inode ID numbers 0-15, and we can
not used them. In macOS port we simply map them really high IDs.

Normally this is hidden inside the _os implementation, but this is
the one place in the common source files.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Jorgen Lundman <lundman@lundman.net>
Closes #12530
2021-09-08 13:56:04 -07:00
Paul Dagnelie c634320e51 Compressed receive with different ashift can result in incorrect PSIZE on disk
We round up the psize to the nearest multiple of the asize or to the
lsize, whichever is smaller. Once that's done, we allocate a new
buffer of the appropriate size, zero the tail, and copy the data
into it. This adds a small performance cost to these kinds of writes,
but fixes the bookkeeping problems.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Co-authored-by: Matthew Ahrens <matthew.ahrens@delphix.com>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #12522
Closes #8462
2021-09-08 13:52:28 -07:00
Alexander a3588c68f7 Linux 5.15 compat: standalone <linux/stdarg.h>
Kernel commits

39f75da7bcc8 ("isystem: trim/fixup stdarg.h and other headers")
c0891ac15f04 ("isystem: ship and use stdarg.h")
564f963eabd1 ("isystem: delete global -isystem compile option")

(for now can be found in linux-next.git tree, will land into the
 Linus' tree during the ongoing 5.15 cycle with one of akpm merges)

removed the -isystem flag and disallowed the inclusion of any
compiler header files. They also introduced a minimal
<linux/stdarg.h> as a replacement for <stdarg.h>.
include/os/linux/spl/sys/cmn_err.h in the ZFS source tree includes
<stdarg.h> unconditionally. Introduce a test for <linux/stdarg.h>
and include it instead of the compiler's one to prevent module
build breakage.

Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Lobakin <alobakin@pm.me>
Closes #12531
2021-09-08 12:59:43 -07:00
Brian Behlendorf f616605821 Linux 5.15 compat: block device readahead
The 5.15 kernel moved the backing_dev_info structure out of
the request queue structure which causes a build failure.

Rather than look in the new location for the BDI we instead
detect this upstream refactoring by the existance of either
the blk_queue_update_readahead() or disk_update_readahead()
functions.  In either case, there's no longer any reason to
manually set the ra_pages value since it will be overridden
with a reasonable default (2x the block size) when
blk_queue_io_opt() is called.

Therefore, we update the compatibility wrapper to do nothing
for 5.9 and newer kernels.  While it's tempting to do the
same for older kernels we want to keep the compatibility
code to preserve the existing behavior.  Removing it would
effectively increase the default readahead to 128k.

Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12532
2021-09-08 09:03:13 -06:00
Don Brady ab15b1fc0e Detect iSCSI in the zpool cmd vdev media script
Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Don Brady <don.brady@delphix.com>
Closes #12206
2021-09-02 16:11:53 -07:00
George Melikov 3eb3e4d14c CI: don't install abigail-tools
We use docker image instead.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #12529
2021-09-02 10:02:27 -07:00
George Melikov 6ea058da16 Update ABI files via new libabigail version
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #12529
2021-09-02 10:02:18 -07:00
George Melikov 1aaebea2f5 Libabigail: make .abi files more consistent
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #12529
2021-09-02 10:02:08 -07:00
George Melikov d510924520 CI: use fresh libabigail via docker image
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #12529
2021-09-02 10:01:58 -07:00
George Melikov a9655fc2bd Check for libabigail version
We need to use 1.8.0+ version, older versions
may segfault and give inconsistent results.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #12529
2021-09-02 10:01:45 -07:00
Jorgen Lundman 157a4d05bd autoconf: allow Release to contain hyphen
To avoid clashing with tags and releases, we'll use "zfs-macOS".

Meta:          1
Name:          zfs-macOS

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Jorgen Lundman <lundman@lundman.net>
Closes #12437
2021-09-02 07:51:29 -06:00
Ryan Moeller c27c124a88 ZTS: Remove exceptions for flaky zhack on FreeBSD
Issue #11854 has been resolved, so we can remove the exceptions for it.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #12527
2021-09-01 13:20:00 -07:00
Jorgen Lundman 3e8d5e4ff3 Add zpool_disable_datasets_os() / zfs_unmount_os()
zpool_disable_datasets_os():
macOS needs to do a bunch of work to kick everything off zvols.

zfs_unmount_os():
This allows us to unmount any zvols that may be mounted. Like with
zfs destroy foo/vol

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Jorgen Lundman <lundman@lundman.net>
Closes #12436
2021-08-31 09:56:00 -06:00
Ryan Moeller 3b89d9518d FreeBSD: Don't remove SA xattr if not SA znode
We attempt to remove an existing SA xattr when setting a dir xattr, but
this only makes sense if the znode has been upgraded to the SA format.
Otherwise, we will hit an assert in zfs_sa_get_xattr.

Make sure this is an SA znode before attempting to remove the SA xattr.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #12514
2021-08-30 16:01:09 -07:00
Rich Ercolani b1a1c64313 Fix cross-endian interoperability of zstd
It turns out that layouts of union bitfields are a pain, and the
current code results in an inconsistent layout between BE and LE
systems, leading to zstd-active datasets on one erroring out on
the other.

Switch everyone over to the LE layout, and add compatibility code
to read both.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12008
Closes #12022
2021-08-30 14:13:46 -07:00
Ka Ho Ng c3cb57ae47 ZTS: Enable punch-hole tests on FreeBSD
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ka Ho Ng <khng@FreeBSD.org>
Sponsored-by: The FreeBSD Foundation
Closes #12458
2021-08-30 13:33:32 -07:00
Ka Ho Ng f3bbeb970e FreeBSD: Implement hole-punching support
This adds supports for hole-punching facilities in the FreeBSD kernel
starting from __FreeBSD_version 1400032.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ka Ho Ng <khng@FreeBSD.org>
Sponsored-by: The FreeBSD Foundation
Closes #12458
2021-08-30 13:33:30 -07:00
Brian Behlendorf 70bf547a98 ZTS: Waiting for zvols to be available
The ZTS block_device_wait helper function should use -e when waiting
for a file to appear since it will be either a block special device
or a symlink.  This didn't cause any failures but when a device path
was specified the function would wait longer than needed.

Additionally update the most flakey test cases to pass the file path
to block_device_wait to try and improve the test reliability.  The
udev behavior on Fedora in particular can result in frequent false
positives.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12515
2021-08-29 09:56:58 -06:00
Ryan Moeller 5bee265908 Correct checking bdev_check_media_change message
We're not looking for bdev_disk_changed.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #12492
2021-08-27 10:02:54 -07:00
Tony Hutter 4dc115328f Make 'zpool labelclear -f' work on offlined disks
This patch allows you to clear the label on offlined disks in an active
pool with `-f`.  Previously, labelclear wouldn't let you do that.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #12511
2021-08-27 10:26:49 -06:00
Trevor Bautista 00888c0898 Extend zpool-iostat to account for ZIO_PRIORITY_REBUILD (#12319)
Previously, zpool-iostat did not display any data regarding rebuild I/Os
in either the latency/size histograms (-w/-l/-r) or the queue data (-q).
This fix essentially utilizes the existing infrastructure for tracking
rebuild queue data and displays this data in the proper places within
zpool-iostat's output.

Signed-off-by: Trevor Bautista <tbautista@newmexicoconsortium.org>
Signed-off-by: Trevor Bautista <tbautista@lanl.gov>
Co-authored-by: Trevor Bautista <tbautista@newmexicoconsortium.org>
Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2021-08-26 11:26:49 -07:00
Anton Gubarkov b0f3f393d1 vdev_id: Return an error if config file is not found (#12508)
Signed-off-by: Anton Gubarkov <anton.gubarkov@gmail.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
2021-08-25 13:01:26 -07:00
Sam Hathaway 2e9e2c8ff5 zpool-remove.8: describe top-level vdev sector size limitation
Document that top-level vdevs cannot be removed unless all top-level
vdevs have the same sector size.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Sam Hathaway <sam@sam-hathaway.com>
Closes #11339
Closes #12472
2021-08-23 14:59:18 -07:00
Mark Johnston 3ee9a997a3 Initialize parity blocks before RAID-Z reconstruction benchmarking
benchmark_raidz() allocates a row to benchmark parity calculation and
reconstruction.  In the latter case, the parity blocks are left
uninitialized, leading to reports from KMSAN.

Initialize parity blocks to 0xAA as we do for the data earlier in the
function.  This does not affect the selected RAID-Z implementation on
any of several systems tested.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes #12473
2021-08-23 11:10:17 -07:00
Ryan Moeller 8ae86e2edc ZTS: Add tests for creation time
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #12432
2021-08-17 10:25:58 -07:00
Richard Yao abbf0bd4eb Linux 4.11 compat: statx support
Linux 4.11 added a new statx system call that allows us to expose crtime
as btime. We do this by caching crtime in the znode to match how atime,
ctime and mtime are cached in the inode.

statx also introduced a new way of reporting whether the immutable,
append and nodump bits have been set. It adds support for reporting
compression and encryption, but the semantics on other filesystems is
not just to report compression/encryption, but to allow it to be turned
on/off at the file level. We do not support that.

We could implement semantics where we refuse to allow user modification
of the bit, but we would need to do a dnode_hold() in zfs_znode_alloc()
to find out encryption/compression information. That would introduce
locking that will have a minor (although unmeasured) performance cost.
It also would be inferior to zdb, which reports far more detailed
information. We therefore omit reporting of encryption/compression
through statx in favor of recommending that users interested in such
information use zdb.

Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Richard Yao <ryao@gentoo.org>
Closes #8507
2021-08-17 10:25:58 -07:00
Gordon Bergling 0f402668f9 zfs.4: Fix typo s/compatiblity/compatibility/
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Gordon Bergling <gbergling@googlemail.com>
Closes #12464
2021-08-17 11:01:07 -06:00
Alexander Motin 6b88b4b501 Remove b_pabd/b_rabd allocation from arc_hdr_alloc()
When a header is allocated for full overwrite it is a waste of time
to allocate b_pabd/b_rabd for it, since arc_write() will free them
without ever being touched.  If it is a read or a partial overwrite
then arc_read() and arc_hdr_decrypt() allocate them explicitly.

Reduced memory allocation in user threads also reduces ARC eviction
throttling there, proportionally increasing it in ZIO threads, that
is not good.  To minimize or even avoid it introduce ARC allocation
reserve, allowing certain arc_get_data_abd() callers to allocate a
bit longer in situations where user threads will already throttle.

Reviewed-by: George Wilson <gwilson@delphix.com>
Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #12398
2021-08-17 10:15:54 -06:00
Alexander Motin 72f0521aba Increase default volblocksize from 8KB to 16KB
Many things has changed since previous default was set many years ago.
Nowadays 8KB does not allow adequate compression or even decent space
efficiency on many of pools due to 4KB disk physical block rounding,
especially on RAIDZ and DRAID.  It effectively limits write throughput
to only 2-3GB/s (250-350K blocks/s) due to sync thread, allocation,
vdev queue and other block rate bottlenecks.  It keeps L2ARC expensive
despite many optimizations and dedup just unrealistic.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Closes #12406
2021-08-17 09:59:46 -06:00
Alexander Motin bb7ad5d326 Optimize arc_l2c_only lists assertions
It is very expensive and not informative to call multilist_is_empty()
for each arc_change_state() on debug builds to check for impossible.
Instead implement special index function for arc_l2c_only->arcs_list,
multilists, panicking on any attempt to use it.

Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #12421
2021-08-17 09:55:34 -06:00
Alexander Motin cfe8e960f1 Fix/improve dbuf hits accounting
Instead of clearing stats inside arc_buf_alloc_impl() do it inside
arc_hdr_alloc() and arc_release().  It fixes statistics being wiped
every time a new dbuf is filled from the ARC.

Remove b_l1hdr.b_l2_hits. L2ARC hits are accounted at b_l2hdr.b_hits.
Since the hits are accounted under hash lock, replace atomics with
simple increments.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #12422
2021-08-17 09:50:31 -06:00
Alexander Motin 7f9d9e6f39 Avoid vq_lock drop in vdev_queue_aggregate()
vq_lock is already too congested for two more operations per I/O.
Instead of dropping and reacquiring it inside vdev_queue_aggregate()
delegate the zio_vdev_io_bypass() and zio_execute() calls for parent
I/Os to callers, that drop the lock any way to execute the new I/O.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #12297
2021-08-17 09:47:00 -06:00
Alexander Motin e829a865bf Use more atomics in refcounts
Use atomic_load_64() for zfs_refcount_count() to prevent torn reads
on 32-bit platforms.  On 64-bit ones it should not change anything.

When built with ZFS_DEBUG but running without tracking enabled use
atomics instead of mutexes same as for builds without ZFS_DEBUG.
Since rc_tracked can't change live we can check it without lock.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #12420
2021-08-17 09:44:34 -06:00
Ryan Moeller 5bfc3a99f9 ZTS: Avoid unset $tmpdir in redacted_panic
The redacted_send tests make use of a $tmpdir variable, except in
redacted_send/redacted_panic the variable is never defined.

Use $TEST_BASE_DIR instead.

Clean up the stream file after the test.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #12455
2021-08-16 16:38:34 -07:00
Allan Jude e945e8d7f4 Restore FreeBSD sysctl processing for arc.min and arc.max
Before OpenZFS 2.0, trying to set the FreeBSD sysctl vfs.zfs.arc_max
to a disallowed value would return an error.
Since the switch, it instead only generates WARN_IF_TUNING_IGNORED

Keep the ability to set the sysctl's specifically to 0, even though
that is less than the minimum, because some tests depend on this.

Also lost, was the ability to set vfs.zfs.arc_max to a value less
than the default vfs.zfs.arc_min at boot time. Restore this as well.

Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Closes #12161
2021-08-16 09:35:19 -06:00
Ryan Moeller 41bee4039c zfs: add missed dependency of zfs module on zlib
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Martin Matuska <mm@FreeBSD.org>
Co-authored-by: Konstantin Belousov <kib@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
External-issue: https://reviews.freebsd.org/D31207
Closes #12442
2021-08-13 13:42:45 -07:00
Ryan Moeller 6daf036ab5 Add zfs.sh -r flag to reload modules
zfs.sh already can load and unload, so why not both?

This is convenient when developing changes to the module and you want
to rapidly make some changes, rebuild the module, reload the module,
and test the changes.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #12450
2021-08-13 13:37:46 -07:00
Ryan Moeller a7491f9990 Fix usage of find in tests/Makefile.am
The path is not optional on FreeBSD.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #12453
2021-08-13 13:13:57 -07:00
Tony Nguyen 6bc61d22c4 Run arc_evict thread at higher priority
Run arc_evict thread at higher priority, nice=0, to give it more CPU
time which can improve performance for workload with high ARC evict
activities.

On mixed read/write and sequential read workloads, I've seen between
10-40% better performance.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Tony Nguyen <tony.nguyen@delphix.com>
Closes #12397
2021-08-10 11:36:26 -06:00
Rich Ercolani f3678d70ff Make get_key_material_file fail more verbosely
It turns out, there are a lot of possible reasons for fopen to fail.
Let's share which reason we failed for today.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12410
2021-08-05 17:48:33 -06:00
Brian Behlendorf ae1e40b329 Enable /proc/diskstats for zvols
The /proc/diskstats accounting needs to be explicitly enabled
for block devices which do not use multi-queue.

Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12440
Closes #12066
2021-08-05 15:35:34 -06:00
George Melikov e1870be450 Man zpool-scrub.8: describe sequential scrub
Describe sequential scrub and add examples of scrub status.

Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #12429
2021-08-05 15:30:28 -06:00
hedongzhang 4357552785 Modify checksum obtain method of QAT
CpaDcGeneratefooter function that obtain the checksum code
does not support the CPA_DC_STATELESS mode. So we get the
adler32 chencksum of the end of the zlib from dc_results.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Chengfei Zhu <chengfeix.zhu@intel.com>
Signed-off-by: hedong.zhang <h_d_zhang@163.com>
Closes #12343
2021-08-03 11:46:33 -06:00
Mark Johnston 46b2623586 Allow disabling of unmapped I/O on FreeBSD
We have a tunable which permits one to disable the use of unmapped I/O
for the buffer cache.  Respect it in ZFS as well.  This is useful for
KMSAN, which cannot easily maintain shadow state for unmapped pages.

No functional change intended, as unmapped I/O is permitted by default
and there's no real reason to disable it in practice except for
debugging.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes #12446
2021-08-02 13:18:24 -06:00
Alexander Motin 7eebcd2be6 Avoid small buffer copying on write
It is wrong for arc_write_ready() to use zfs_abd_scatter_enabled to
decide whether to reallocate/copy the buffer, because the answer is
OS-specific and depends on the buffer size.  Instead of that use
abd_size_alloc_linear(), moved into public header.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Closes #12425
2021-07-27 16:05:47 -07:00
Brian Behlendorf b72611f0f6 Fix format specifier warnings
Commit 5dbf6c5a66 did not address these format specifier warnings
since they were introduced by an unrelated commit which had not
been rebased on 5dbf6c5a66 when merged.  Fix them.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12435
2021-07-27 09:50:00 -07:00
Brian Behlendorf 4bd99c11d7 Remove overlooked __sun_attr__ based macros
The __NORETURN, __CONST, and __PURE macros in the FreeBSD platform
code were based on the __sun_attr__ macro which was removed in
commit 5dbf6c5a6.  This caused a build failure because the
__NORETURN macro was still used in one place in kernel code.
The __CONST and __PURE macros were entirely unused.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12435
2021-07-27 09:49:11 -07:00
George Melikov 9776838cfb Update ABI files with generated in CI worker
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #12388
2021-07-26 16:55:18 -07:00
George Melikov 0b072481af storeabi: add no-corpus-path flag
Try to minimize diffs.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #12388
2021-07-26 16:54:53 -07:00
Jorgen Lundman 273730d5b5 macOS can also set va_type
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Jorgen Lundman <lundman@lundman.net>
Closes #12357
2021-07-26 16:38:06 -07:00
Jorgen Lundman 02601d8aa4 Move check_file to os/$platform section
Keep check_file_generic() in shared code base, and allow special case
code in check_file() in os section. In future, macOS will have
additional checks in check_file().

Linux and FreeBSD wrappers just calls check_file_generic().

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Jorgen Lundman <lundman@lundman.net>
Closes #12385
2021-07-26 16:34:11 -07:00
Alexander Motin dd3bda39cf Add comment on metaslab_class_throttle_reserve() locking
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Issue #12314
Closes #12419
2021-07-26 16:30:20 -07:00
John Wren Kennedy bdd2bfd02c Assorted fixes for the performance tests
- Bail out early if we're running the perf tests and forget to
  specify disks.
- Allow perf tests to run with any number of disks.
- Remove weekly vs. nightly settings
- Move variables with common values to perf.shlib
- Use zinject to clear the ARC over export/import
- Fix dbuf cache size calculation

When the meaning of `dbuf_cache_max_bytes` changed, the performance
test that covers the dbuf cache started to fail. The test would try to
write files for the test using the max possible size of the cache,
inevitably filling the pool and failing. This change uses
`dbuf_cache_shift` to correctly calculate the dbuf cache size.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: John Kennedy <john.kennedy@delphix.com>
Closes #12408
2021-07-26 15:47:08 -06:00
Matthew Ahrens d8381f50d6 Read past end of argv array in zpool_do_import()
`zpool_do_import()` passes `argv[0]`, (optionally) `argv[1]`, and
`pool_specified` to `import_pools()`.  If `pool_specified==FALSE`, the
`argv[]` arguments are not used.  However, these values may be off the
end of the `argv[]` array, so loading them could dereference unmapped
memory.  This error is reported by the asan build:

```
=================================================================
==6003==ERROR: AddressSanitizer: heap-buffer-overflow
READ of size 8 at 0x6030000004a8 thread T0
    #0 0x562a078b50eb in zpool_do_import zpool_main.c:3796
    #1 0x562a078858c5 in main zpool_main.c:10709
    #2 0x7f5115231bf6 in __libc_start_main
    #3 0x562a07885eb9 in _start

0x6030000004a8 is located 0 bytes to the right of 24-byte region
allocated by thread T0 here:
    #0 0x7f5116ac6b40 in __interceptor_malloc
    #1 0x562a07885770 in main zpool_main.c:10699
    #2 0x7f5115231bf6 in __libc_start_main
```

This commit passes NULL for these arguments if they are off the end
of the `argv[]` array.

Reviewed-by: George Wilson <gwilson@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #12339
2021-07-26 12:51:39 -07:00
Václav Skála 31c41aea9c Add missing properties to zfs allow manpage
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Václav Skála <skala@vshosting.cz>
Closes #12402
2021-07-26 12:44:01 -07:00
George Amanakis ab8a8f0745 Fixes in persistent L2ARC
In l2arc_add_vdev() first decide whether the device is eligible for
L2ARC rebuild or whole device trim and then add it to the list of cache
devices. Otherwise l2arc_feed_thread() might already start writing on
the device invalidating previous content as l2ad_hand = l2ad_start.
However l2arc_rebuild_vdev() needs the device present in the cache
device list to figure out its l2arc_dev_t. Fix this by moving most of
l2arc_rebuild_vdev() in a new function l2arc_rebuild_dev() which does
not need to search in the cache device list.

In contrast to l2arc_add_vdev() we do not have to worry about
l2arc_feed_thread() invalidating previous content when onlining a
cache device. The device parameters (l2ad*) are not cleared when
offlining the device and writing new buffers will not invalidate
all previous content. In worst case only buffers that have not had
their log block written to the device will be lost.

Retire persist_l2arc_00{4,5,8} tests since they cover code already
covered by the remaining ones. Test persist_l2arc_006 is renamed to
persist_l2arc_004 and persist_l2arc_007 is renamed to persist_l2arc_005.

Fix a typo in persist_l2arc_004, and remove an assertion that is not
always true from l2arc_arcstats_pos. Also update an assertion in
persist_l2arc_005 and explain why in a comment.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #12365
2021-07-26 12:30:24 -07:00
наб 037af3e0d4 Remove NOTE(CONSTCOND) and note.h
These were mostly used to annotate do {} while(0)s

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Issue #12201
2021-07-26 12:07:53 -07:00
наб 2c69ba6444 Normalise /*FALLTHR{OUGH,U}*/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Issue #12201
2021-07-26 12:07:39 -07:00
наб 90f1c3c946 Prune /*NOTREACHED*/
This includes a simplification of mkbusy and format correctness in zhack
and ztest

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Issue #12201
2021-07-26 12:07:26 -07:00
наб 5dbf6c5a66 Replace /*PRINTFLIKEn*/ with attribute(printf)
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Issue #12201
2021-07-26 12:07:15 -07:00
Mark Johnston 1373709450 Initialize dn_next_type[] in the dnode constructor
It seems nothing ensures that this array is zeroed when a dnode is
freshly allocated, so in principle it retains the values from the
previous allocation.  In practice it seems to be the case that the
fields should end up zeroed, but we can zero the field anyway for
consistency.

This was found using KMSAN.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes #12383
2021-07-26 11:53:47 -07:00
Mark Johnston 3a185275a0 Zero pad bytes following TX_WRITE log data
When logging a TX_WRITE record in the case where file data has to be
copied from the DMU, we pad the log record size to a multiple of 8
bytes.  In this case, any padding bytes should be zeroed, otherwise the
contents of uninitialized memory are written to the ZIL.

This was found using KMSAN.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes #12383
2021-07-26 11:53:47 -07:00
Mark Johnston 58714c2817 Zero pad bytes when allocating a ZIL record
When allocating a record, we round up the allocation size to a multiple
of 8.  In this case, any padding bytes should be zeroed, otherwise the
contents of uninitialized memory are written to the ZIL.

This was found using KMSAN.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes #12383
2021-07-26 11:53:47 -07:00
Mark Johnston 03363b2f86 Initialize all fields in zfs_log_xvattr()
When logging TX_SETATTR, we could otherwise fail to initialize part of
the corresponding ZIL record depending on which fields are present in
the xvattr.  Initialize the creation time and the AV scan timestamp to
zero so that uninitialized bytes are not written to the ZIL.

This was found using KMSAN.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes #12383
2021-07-26 11:53:47 -07:00
Mark Johnston da27b8bc7f Initialize "autoreplace" in spa_ld_get_props()
spa_prop_find() may fail to find the specified property, in which case
it suppresses ENOENT from zap_lookup().  In this case, the return value
is left uninitialized, so spa_autoreplace was being initialized using an
uninitialized stack variable.

This was found using KMSAN.  It appears to be a regression from commit
9eb7b46ed0, which removed the initialization of "autoreplace" from the
definition.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes #12383
2021-07-26 11:53:47 -07:00
Coleman Kane 1c24bf966c Linux 5.14 compat: explicity assign set_page_dirty
Kernel 5.14 introduced a change where set_page_dirty of
struct address_space_operations is no longer implicitly set to
__set_page_dirty_buffers(), which ended up resulting in a NULL
pointer deref in the kernel when it is attempted to be called.
This change sets .set_page_dirty in the structure to
__set_page_dirty_nobuffers(), which was introduced with the
related patch set. The breaking change was introduce in commit
0af573780b0b13fceb7fabd49dc1b073cee9a507 to torvalds/linux.git.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #12427
2021-07-26 10:55:55 -07:00
Rich Ercolani f1ca7999bb Fix unfortunate NULL in spa_update_dspace
After 1325434b, we can in certain circumstances end up calling
spa_update_dspace with vd->vdev_mg NULL, which ends poorly during
vdev removal.

So let's not do that further space adjustment when we can't.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12380 
Closes #12428
2021-07-26 10:51:30 -07:00
Brian Behlendorf 1b06b03a7b Linux 5.14 compat: blk_alloc_disk()
In Linux 5.14, blk_alloc_queue is no longer exported, and its usage
has been superseded by blk_alloc_disk, which returns a gendisk struct
from which we can still retrieve the struct request_queue* that is
needed in the one place where it is used. This also replaces the call
to alloc_disk(minors), and minors is now set via struct member
assignment.

Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Reviewed-by: Olaf Faaland <faaland1@llnl.gov>
Reviewed-by: Coleman Kane <ckane@colemankane.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12362 
Closes #12409
2021-07-23 15:28:03 -07:00
Ryan Moeller 14b43fbd9c zloop: Add a max iterations option, use default run/pass times
It is useful to have control over the number of iterations of zloop so
we can easily produce "x core dumps found *in y iterations*" metrics.

Using random values for run/pass times doesn't improve coverage in a
meaningful way.

Randomizing run time could be seen as a compromise between running a
greater variety of shorter tests versus a smaller variety of longer
tests within a fixed time span.  However, it is not desirable when
running a fixed number of iterations.

Pass time already incorporates randomness within ztest.

Either parameter can be passed to ztest explicitly if the defaults are
not satisfactory.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #12411
2021-07-22 15:29:27 -06:00
Alexander Motin 46197dc858 FreeBSD: Ignore make_dev_s() errors
Since errors returned by zvol_create_minor_impl() are ignored by the
common code, it is more convenient to ignore make_dev_s() errors there.
It allows, for example, to get device created for the zvol after later
rename instead of having it further stuck in half-created state.
zvol_rename_minor() already ignores those errors.

While there, switch from MAXPHYS to maxphys in FreeBSD 13+.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #12375
2021-07-22 10:22:14 -06:00
Jorgen Lundman c14ad80fcb Remove old orig_fd variable from zfs send
Possibly required in the past, but is currently fills no purpose.
Ordinarily such tiny cleanup is not generally worth it, however
on the macOS port, in a future commit, we do unspeakable things to the
"fd" for send/recv, and it would be easier to only have to deal with
one "fd" instead of two.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Jorgen Lundman <lundman@lundman.net>
Closes #12404
2021-07-21 20:22:27 -06:00
Alexander Motin 1b50749ce9 Optimize allocation throttling
Remove mc_lock use from metaslab_class_throttle_*().  The math there
is based on refcounts and so atomic, so the only race possible there
is between zfs_refcount_count() and zfs_refcount_add().  But in most
cases metaslab_class_throttle_reserve() is called with the allocator
lock held, which covers the race.  In cases where the lock is not
held, GANG_ALLOCATION() or METASLAB_MUST_RESERVE are set, and so we
do not use zfs_refcount_count().  And even if we assume some other
non-existing scenario, the worst that may happen from this race is
few more I/Os get to allocation earlier, that is not a problem.

Move locks and data of different allocators into different cache
lines to avoid false sharing.  Group spa_alloc_* arrays together
into single array of aligned struct spa_alloc spa_allocs.  Align
struct metaslab_class_allocator.

Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Don Brady <don.brady@delphix.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #12314
2021-07-21 06:40:36 -06:00
George Melikov bc93935ef0 CI: generate ABI files if changed
So commit author can just download them as
artifacts and commit.

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #12379
2021-07-20 16:21:00 -06:00
Kevin Jin a7bd20e309 Add Module Parameter Regarding Log Size Limit
* Add Module Parameters Regarding Log Size Limit

zfs_wrlog_data_max
The upper limit of TX_WRITE log data. Once it is reached,
write operation is blocked, until log data is cleared out
after txg sync. It only counts TX_WRITE log with WR_COPIED
or WR_NEED_COPY.

Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: jxdking <lostking2008@hotmail.com>
Closes #12284
2021-07-20 09:40:24 -06:00
Alexander Motin 8172df643b Minor ARC optimizations
Remove unneeded global, practically constant, state pointer variables
(arc_anon, arc_mru, etc.), replacing them with macros of real state
variables addresses (&ARC_anon, &ARC_mru, etc.). 

Change ARC_EVICT_ALL from -1ULL to UINT64_MAX, not requiring special
handling in inner loop of ARC reclamation.  Respectively change bytes
argument of arc_evict_state() from int64_t to uint64_t.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Closes #12348
2021-07-20 08:13:21 -06:00
Jorgen Lundman e04210035e dmu_redact.c does not call bqueue_destroy
Ensure all calls to bqueue_init() has a corresponding call to bqueue_destroy()

Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Co-authored-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Jorgen Lundman <lundman@lundman.net>
Closes #12118
2021-07-20 08:08:45 -06:00
Alexander 23c13c7e80 A few fixes of callback typecasting (for the upcoming ClangCFI)
* zio: avoid callback typecasting
* zil: avoid zil_itxg_clean() callback typecasting
* zpl: decouple zpl_readpage() into two separate callbacks
* nvpair: explicitly declare callbacks for xdr_array()
* linux/zfs_nvops: don't use external iput() as a callback
* zcp_synctask: don't use fnvlist_free() as a callback
* zvol: don't use ops->zv_free() as a callback for taskq_dispatch()

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Signed-off-by: Alexander Lobakin <alobakin@pm.me>
Closes #12260
2021-07-20 08:03:33 -06:00
Ryan Moeller 65b9293641 Use SET_ERROR for more errors in FreeBSD vnops
We should use SET_ERROR when we first get an error.

Add it in the FreeBSD xattr implementations where missing.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #12356
2021-07-19 10:52:50 -06:00
Ryan Moeller de12cd2511 Remove unused fields from zvol_task_t
We don't use or need the pool name or value source in the zvol tasks.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #12361
2021-07-19 10:02:35 -06:00
Alexander Motin eecceeae9f FreeBSD: Switch from MAXPHYS to maxphys on FreeBSD 13+
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #12378
2021-07-19 09:56:58 -06:00
Kevin Bowling ca14e08cbf Detect HAVE_LARGE_STACKS at compile time
Move HAVE_LARGE_STACKS definitions to header and set when appropriate.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Kevin Bowling <kbowling@FreeBSD.org>
Closes #12350
2021-07-16 14:28:55 -06:00
George Melikov b17b19943e zpool_influxdb: fix -Werror=stringop-truncation
Use strlcpy instead of problematic strncpy

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #12344
2021-07-16 14:04:00 -06:00
Rich Ercolani b7ec530233 Correct zfs-send(8) on readonly sends
zfs-send(8) claimed in the flags list you could use -pR when sending
a readonly filesystem or volume. You cannot.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12336
2021-07-16 13:58:01 -06:00
Alexander Motin c1b5869bab Introduce dsl_dir_diduse_transfer_space()
Most of dsl_dir_diduse_space() and dsl_dir_transfer_space() CPU time
is a dd_lock overhead and time spent in dmu_buf_will_dirty(). Calling
them one after another is a waste of time and even more contention.
Doing that twice for each rewritten block within dbuf_write_done()
via dsl_dataset_block_kill() and dsl_dataset_block_born() created one
of the biggest CPU overheads in case of small blocks rewrite.

dsl_dir_diduse_transfer_space() combines functionality of these two
functions for cases where it is needed, but without double overhead,
practically for the cost of dsl_dir_diduse_space() or even cheaper.

While there, optimize dsl_dir_phys() calls in dsl_dir_diduse_space()
and dsl_dir_transfer_space().  It seems Clang detects some aliasing
there, repeating dd->dd_dbuf->db_data dereference multiple times,
increasing dd_lock scope and contention.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Author: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #12300
2021-07-16 13:39:24 -06:00
Jorgen Lundman 41eba77061 pass handle to do_unmount()
The same change has already been done for domount(). On macOS platform
we need to have access to zhp to handle devdisks and snapshots.
Also, symmetry is pleasing.

In addition, the code in zpool_disable_datasets which sorts the
mountpoints did not sort the related handle, which meant that the
mountpoint, and the handle that it is paired with, was lost.
You'd get a random handle with the mountpoint.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Jorgen Lundman <lundman@lundman.net>
Closes #12296
2021-07-15 12:31:00 -06:00
наб d9f0f1582c config/libatomic: require -latomic iff atomic.c doesn't link w/o it
In absence of LTO, and dynamic libatomic, la.so ends up in the needs
section of every toolchain executable; some consider this an issue.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12345 
Closes #12359
2021-07-13 13:50:48 -07:00
Rich Ercolani 1325434b2d Tinker with slop space accounting with dedup
* Tinker with slop space accounting with dedup

Do not include the deduplicated space usage in the slop space
reservation, it leads to surprising outcomes.

* Update spa_dedup_dspace sometimes

Sometimes, we get into spa_get_slop_space() with
spa_dedup_dspace=~0ULL, AKA "unset", while spa_dspace is correctly set.

So call the code to update it before we use it if we hit that case.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12271
2021-07-13 09:47:57 -06:00
Alexander Motin f7de776da2 Fix ARC ghost states eviction accounting
arc_evict_hdr() returns number of evicted bytes in scope of specific
state.  For ghost states it does not mean the amount of really freed
memory, but the logical buffer size.  It is correct for the eviction
process, but not for waking up threads waiting for ARC size reduction,
as added in "Revise ARC shrinker algorithm" commit, causing premature
wakeups while ARC is still overflowed, allowing even bigger overflow,
plus processing overhead when next allocation will also get blocked,
probably also for too short time.

To fix that make arc_evict_hdr() also return the amount of really
freed memory, which for the ghost states is only the header, and use
it to update arc_evict_count instead.  Originally I was thinking to
not return it at all, since arc_get_data_impl() does not account for
the headers, but decided that some slow allocation progress is better
than long waits, reaching on my tests up to 100ms.

To reduce negative latency effects of long time periods when reclaim
thread can free little real memory, start reclamation process earlier,
before we actually reached the overflow threshold, when we have to
throttle new allocations.  We can also do it without taking global
arc_evict_lock, reducing the contention.

Reviewed-by: George Wilson <gwilson@delphix.com>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #12279
2021-07-13 09:41:59 -06:00
Brian Behlendorf 07a4c76e90 Update bug report template
- Remove the "SPL Version" line, the repositories have been merged
  since the 0.8 release and we no longer need to ask about this.

- Simply ask for the kernel version / patch level and add a hint
  about how to get this information on Linux and FreeBSD.

- Remove "Status: Triage Needed" from the template, in practice
  we really haven't been using this label so let's step setting it.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes: #12340
2021-07-12 14:05:50 -06:00
George Wilson 958826be7a file reference counts can get corrupted
Callers of zfs_file_get and zfs_file_put can corrupt the reference
counts for the file structure resulting in a panic or a soft lockup.
When zfs send/recv runs, it will add a reference count to the
open file, and begin to send or recv the stream. If the file descriptor
is closed, then when dmu_recv_stream() or dmu_send() return we will
call zfs_file_put to remove the reference we placed on the file
structure. Unfortunately, because zfs_file_put() uses the file
descriptor to lookup the file structure, it may end up finding that
the file descriptor table no longer contains the file struct, thus
leaking the file structure. Or it might end up finding a file
descriptor for a different file and blindly updating its reference
counts. Other failure modes probably exists.

This change reworks the zfs_file_[get|put] interface to not rely
on the file descriptor but instead pass the zfs_file_t pointer around.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Co-authored-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: George Wilson <gwilson@delphix.com>
External-issue: DLPX-76119
Closes #12299
2021-07-10 19:00:37 -06:00
Jorgen Lundman 03dba7ae31 dprintf_dnode: strcpy -> strlcpy
Missed a couple of strcpy() in earlier commit, this is only used with
--enable-debug.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Jorgen Lundman <lundman@lundman.net>
Closes #12311
2021-07-07 21:13:40 -06:00
Jorgen Lundman d2119d0e6f Replace strchrnul() with strrchr()
Could have gone either way with this one, either adding it to
macOS/Windows SPL, or returning it to "classic" usage with strrchr().
Since the new special way isn't really used, and only used once,
we have this commit.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Jorgen Lundman <lundman@lundman.net>
Closes  #12312
2021-07-07 21:08:13 -06:00
Alexander Motin eb5983e1b7 FreeBSD: Use unmapped I/O for scattered/gang ABD buffers
Many FreeBSD disk drivers support "unmapped" I/O mode, when data
buffer represented not with a virtually contiguous KVA-mapped address
range, but with a list of physical memory pages.  Originally it was
designed to do I/O from buffers without KVA mapping (unmapped).  But
moving virtual addresses out of equation allows us to operate even
non-contiguous data buffers with one condition: all buffer discon-
tinuities must be aligned to memory page borders.

Doing I/O to capable GEOM device this patch traverses through non-
linear ABD buffers, validating the chunks borders.  If the condition
is met, it supplies GEOM with the list of original physical memory
pages instead of copying the data into temporary contiguous buffer.
On capable hardware on pools with ashift=12 and default ABD chunk of
4KB it should handle all the I/O without additional memory copying.

Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Closes #12320
2021-07-07 16:39:00 -07:00
Alexander Motin bdd11cbb90 FreeBSD: Hardcode abd_chunk_size to PAGE_SIZE
It makes no sense to set it below PAGE_SIZE, since it increases all
overheads and makes returning memory to OS problematic.  It makes no
sense to set it above PAGE_SIZE, since such allocations and especially
frees are too expensive and cause KVA fragmentation to benefit from
fewer chunks.  After that it makes no sense to keep more complicated
math here.

What may have sense though is just a tunable border between linear and
scatter ABDs, previously also controlled by this tunable.  Retain that
functionality by taking abd_scatter_min_size tunable from Linux, just
with different default value.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Closes #12328
2021-07-06 17:39:23 -07:00
Alexander Motin 97752ba22a Move gethrtime() calls out of vdev queue lock
This dramatically reduces the lock contention on systems with slower
(non-TSC) timecounters.  With TSC the difference is minimal, but since
this lock is pretty congested, any improvement counts.  Plus I don't
see any reason to do it under the lock other than the latency of the
lock itself, which this change actually reduces.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #12281
2021-07-06 14:38:00 -07:00
Justin Gottula 6e4e3c3ab6 Udev rules: remove zvol compat symlinks (without the leading zvol/)
This is a potentially arguable change, because it removes some
compatibility cruft that certain systems or people may have come to rely
on (either a very long time ago, or unwisely in recent times).

On the other hand, it's been literally over a decade since OpenZFS
switched to the strategy of using opaque numbered /dev/zd* device nodes,
with the canonical zvol access path being a directory tree of symlinks
created by udev rules inside /dev/zvol/*. (See #102.) Even at the time,
the /dev/* scheme was labeled as being for "compatibility".

This commit removes the second tree of symlinks located directly at
/dev/*, under the assumption that anybody with any sense has been using
the intended /dev/zvol/* path for a very very long time now.

(The more I think about this, the more I anticipate that some large
fraction of people will have been blissfully unaware that the intention
has been for them to use the /dev/zvol/* tree all along, and they will
have come to rely upon the /dev/* tree simply because it's been there
this whole time despite being a compat thing.)

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed-by: Neal Gompa <ngompa@datto.com>
Signed-off-by: Justin Gottula <justin@jgottula.com>
Closes #12303
2021-07-06 13:41:17 -07:00
Justin Gottula f24c7c359e Use substantially more robust program exit status logic in zvol_id
Currently, there are several places in zvol_id where the program logic
returns particular errno values, or even particular ioctl return values,
as the program exit status, rather than a straightforward system of
explicit zero on success and explicit nonzero value(s) on failure.

This is problematic for multiple reasons. One particularly interesting
problem that can arise, is that if any of these values happens to have
all 8 least significant bits unset (i.e., it is a positive or negative
multiple of 256), then although the C program sees a nonzero int value
(presumed to be a failure exit status), the actual exit status as seen
by the system is only the bottom 8 bits of that integer: zero.

This can happen in practice, and I have encountered it myself. In a
particularly weird situation, the zvol_open code in the zfs kernel
module was behaving in such a manner that it caused the open() syscall
to fail and for errno to be set to a kernel-private value (ERESTARTSYS,
which happens to be defined as 512). It turns out that 512 is evenly
divisible by 256; or, in other words, its least significant 8 bits are
all-zero. So even though zvol_id believed it was returning a nonzero
(failure) exit status of 512, the system modulo'd that value by 256,
resulting in the actual exit status visible by other programs being 0!
This actually-zero (non-failure) exit status caused problems: udev
believed that the program was operating successfully, when in fact it
was attempting to indicate failure via a nonzero exit status integer.
Combined with another problem, this led to the creation of nonsense
symlinks for zvol dev nodes by udev.

Let's get rid of all this problematic logic, and simply return
EXIT_SUCCESS (0) is everything went fine, and EXIT_FAILURE (1) if
anything went wrong.

Additionally, let's clarify some of the variable names (error is similar
to errno, etc) and clean up the overall program flow a bit.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Signed-off-by: Justin Gottula <justin@jgottula.com>
Closes #12302
2021-07-02 13:10:36 -07:00
Justin Gottula b19e2bdfb5 Print zvol_id error messages to stderr rather than stdout
The zvol_id program is invoked by udev, via a PROGRAM key in the
60-zvol.rules.in rule file, to determine the "pretty" /dev/zvol/*
symlink paths paths that should be generated for each opaquely named
/dev/zd* dev node.

The udev rule uses the PROGRAM key, followed by a SYMLINK+= assignment
containing the %c substitution, to collect the program's stdout and then
"paste" it directly into the name of the symlink(s) to be created.

Unfortunately, as currently written, zvol_id outputs both its intended
output (a single string representing the symlink path that should be
created to refer to the name of the dataset whose /dev/zd* path is
given) AND its error messages (if any) to stdout.

When processing PROGRAM keys (and others, such as IMPORT{program}), udev
uses only the data written to stdout for functional purposes. Any data
written to stderr is used solely for the purposes of logging (if udev's
log_level is set to debug).

The unintended consequence of this is as follows: if zvol_id encounters
an error condition; and then udev fails to halt processing of the
current rule (either because zvol_id didn't return a nonzero exit
status, or because the PROGRAM key in the rule wasn't written properly
to result in a "non-match" condition that would stop the current rule on
a nonzero exit); then udev will create a space-delimited list of symlink
names derived directly from the words of the error message string!

I've observed this exact behavior on my own system, in a situation where
the open() syscall on /dev/zd* dev nodes was failing sporadically (for
reasons that aren't especially relevant here). Because the open() call
failed, zvol_id printed "Unable to open device file: /dev/zd736\n" to
stdout and then exited.

The udev rule finished with SYMLINK+="zvol/%c %c". Assuming a volume
name like pool/foo/bar, this would ordinarily expand to
   SYMLINK+="zvol/pool/foo/bar pool/foo/bar"
and would cause symlinks to be created like this:
   /dev/zvol/pool/foo/bar -> /dev/zd736
   /dev/pool/foo/bar      -> /dev/zd736

But because of the combination of error messages being printed to
stdout, and the udev syntax freely accepting a space-delimited sequence
of names in this context, the error message string
   "Unable to open device file: /dev/zd736\n"
in reality expanded to
   SYMLINK+="zvol/Unable to open device file: /dev/zd736"
which caused the following symlinks to actually be created:
   /dev/zvol/Unable -> /dev/zd736
   /dev/to          -> /dev/zd736
   /dev/open        -> /dev/zd736
   /dev/device      -> /dev/zd736
   /dev/file:       -> /dev/zd736
   /dev//dev/zd736  -> /dev/zd736

(And, because multiple zvols had open() syscall errors, multiple zvols
attempted to claim several of those symlink names, resulting in numerous
udev errors and timeouts and general chaos.)

This commit rectifies all this silliness by simply printing error
messages to stderr, as Dennis Ritchie originally intended.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Signed-off-by: Justin Gottula <justin@jgottula.com>
Closes #12302
2021-07-02 13:10:06 -07:00
Justin Gottula f1aef8d4df Udev rules: use match (==) rather than assign (=) for PROGRAM
Assignment syntax (=) can be used for the PROGRAM key. But the PROGRAM
key is really a match key, not an assign key. The internal logic used by
udev to decide whether a PROGRAM key "matched" or not (which determines
whether the remainder of the rule is evaluated) depends on whether the
operator was OP_MATCH (==) or OP_NOMATCH (!=). [1]

The man page claims that '"=", ":=", and "+=" have the same effect as
"=="' for PROGRAM keys. And, after a brief perusal, the udev source code
does seem to confirm that operators other than OP_MATCH (==) or
OP_NOMATCH (!=) are implicitly converted to OP_MATCH (==). [2] But it's
not entirely clear that this is definitely the case: anecdotal testing
seems to indicate that when OP_ASSIGN (=) is used, the program's exit
status is disregarded and the remainder of the rule is processed
regardless of whether it was, in fact, a successful exit.

The bottom line here is that, if zvol_id hits some snag and returns a
nonzero exit status, then we almost certainly do NOT want to continue on
with the rule and use whatever the stdout contents may have been to
mindlessly create /dev/zvol/* symlinks. Therefore, let's be extra-sure
and use the match (==) operator explicitly, to eliminate any possibility
that udev might do the wrong thing, and ensure that a nonzero exit
status will definitely short-circuit the rest of the rule, bypassing the
SYMLINK+= assignments.

[1]
udev,
 file src/udev/udev-rules.c,
  func udev_rule_apply_token_to_event,
   switch case TK_M_PROGRAM if r != 0 (nonzero exit status):
      return token->op == OP_NOMATCH;
   switch case TK_M_PROGRAM if r == 0 (zero exit status):
      return token->op == OP_MATCH;
   func retval 0 => key is considered to have matched
   func retval 1 => key is considered to have NOT matched

[2]
udev,
 file src/udev/udev-rules.c,
  func parse_token,
   at func start:
      bool is_match = IN_SET(op, OP_MATCH, OP_NOMATCH);
   in else-if case streq(key, "PROGRAM"):
      if (!is_match) op = OP_MATCH;

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Signed-off-by: Justin Gottula <justin@jgottula.com>
Closes #12302
2021-07-02 13:09:39 -07:00
Justin Gottula 17c794e7b0 Udev rules: replace deprecated $tempnode with $devnode
The $tempnode substitution is so old that it's not even mentioned in the
man page anymore. It is still technically supported by udev, but with
plenty of "deprecated" comments surrounding it.

The preferred modern equivalent of $tempnode is $devnode (or
alternatively, %N).

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Signed-off-by: Justin Gottula <justin@jgottula.com>
Closes #12302
2021-07-02 13:09:09 -07:00
Justin Gottula a5398f8782 Udev rules: use non-ancient comma syntax
This file is old as dirt. It's entirely possible that commas were
optional in udev back at that time. But they're definitely supposed to
be there nowadays.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Signed-off-by: Justin Gottula <justin@jgottula.com>
Closes #12302
2021-07-02 13:08:42 -07:00
Alexander Motin b192a2c0a1 Remove avl_size field from struct avl_tree
This field is used only by illumos mdb.  On other platforms it only
increases the struct size from 32 to 40 bytes.  For struct vdev_queue
including 13 instances of avl_tree_t size means active cache lines.

Keep the padding in user-space for now to not break the ABI.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #12290
2021-07-01 09:32:31 -06:00
Alexander Motin 490c845efe Compact dbuf/buf hashes and lock arrays
With default dbuf cache size of 1/32 of ARC, it makes no sense to have
hash table of the same size (or even bigger on Linux).  Reduce it to
1/8 of ARC's one, still leaving some slack, assuming higher I/O rate
via dbuf cache than via ARC.

Remove padding from ARC hash locks array.  The idea behind padding
is to avoid false sharing between locks.  It would have sense if
there would be a limited number of very busy locks.  But since we
have no limit on the number, using the same memory for more locks we
can achieve even lower lock contention with the same false sharing,
or we can use less memory for the same contention level.

Reduce number of hash locks from 8192 to 2048.  The number is still
big enough to not cause contention, but reduced memory size improves
cache hit rate for mutex_tryenter() in ARC eviction thread, saving
about 1% of the thread time.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #12289
2021-07-01 09:30:31 -06:00
Jorgen Lundman c6d1112bf4 Fix abd leak, kmem_free correct size of abd_t
Fix a leak of abd_t that manifested mostly when using
raidzN with at least as many columns as N (e.g. a
four-disk raidz2 but not a three-disk raidz2).
Sufficiently heavy raidz use would eventually run a system
out of memory.

Additionally:

* Switch abd_cache arena to FIRSTFIT, which empirically
improves perofrmance.

* Make abd_chunk_cache more performant and debuggable.

* Allocate the abd_zero_buf from abd_chunk_cache rather
than the heap.

* Don't try to reap non-existent qcaches in abd_cache arena.

* KM_PUSHPAGE->KM_SLEEP when allocating chunks from their
own arena

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Jorgen Lundman <lundman@lundman.net>
Co-authored-by: Sean Doran <smd@use.net>
Closes #12295
2021-07-01 09:28:15 -06:00
Jorgen Lundman eca174527e Upstream: dmu_zfetch_stream_fini leaks refcount
dmu_zfetch_stream_fini() is missing calls to destroy the refcounts,
leaking them and the mutex inside.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Jorgen Lundman <lundman@lundman.net>
Closes #12294
2021-07-01 09:22:16 -06:00
Kevin Jin 50e09eddd0 Optimize txg_kick() process (#12274)
Use dp_dirty_pertxg[] for txg_kick(), instead of dp_dirty_total in
original code. Extra parameter "txg" is added for txg_kick(), thus it
knows which txg to kick. Also txg_kick() call is moved from
dsl_pool_need_dirty_delay() to dsl_pool_dirty_space() so that we can
know the txg number assigned for txg_kick().

Some unnecessary code regarding dp_dirty_total in txg_sync_thread() is
also cleaned up.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: jxdking <lostking2008@hotmail.com>
Closes #12274
2021-07-01 09:20:27 -06:00
Alexander Motin 42afb12da7 Remove refcount from spa_config_*()
The only reason for spa_config_*() to use refcount instead of simple
non-atomic (thanks to scl_lock) variable for scl_count is tracking,
hard disabled for the last 8 years.  Switch to simple int scl_count
reduces the lock hold time by avoiding atomic, plus makes structure
fit into single cache line, reducing the locks contention.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #12287
2021-07-01 09:16:54 -06:00
Ryan Moeller cfc564f9b1 ZED: Match added disk by pool/vdev GUID if found (#12217)
This enables ZED to auto-online vdevs that are not wholedisk managed by
ZFS.

Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Don Brady <don.brady@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2021-06-30 07:37:20 -07:00
Brian Behlendorf 4694131a0a Linux 5.13 compat: META
Increase the Linux-Maximum version in the META file to 5.13.
All of the required compatibility patches have been merged
and the 5.13 kernel has been officially released.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2021-06-29 13:16:38 -07:00
Laurențiu Nicola 3482c2b79a zed: fix sending emails (#12292)
Commit 6fc3099 broke the quoting when invoking the mail program, revert
that change.

Signed-off-by: Laurențiu Nicola <lnicola@dend.ro>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2021-06-29 12:33:49 -07:00
Alexander 6a19dea7f6 module/zfs: simplify ddt_stat_add() loop
LLVM's Polly (ISL to be precise) is unhappy with the loop from
ddt_stat_add():

  CC [M]  fs/zfs/zfs/ddt.o
../lib/External/isl/isl_schedule_node.c:2470: cannot insert node
between set or sequence node and its filter children

(building with the custom patch which adds Polly support to Kbuild)

The mentioned loop is rather suboptimal. All that we need is to just
treat ddt_stat_t as an array of u64 and perform 1:1 addition or
substraction. This can be done in simpler for-loop with the
determined index and bounds. Compiler will expand d_end - d into
a number of ddt_stat_t fields at compile time.
This prevents Polly from failing on this file.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Lobakin <alobakin@pm.me>
Closes #12253
2021-06-29 08:26:11 -06:00
Alexander Motin 5b7053a9a5 Avoid 64bit division in multilist index functions
The number of sublists in a multilist is relatively small. We dont need
64 bits to calculate an index. 32 bits is sufficient and makes the
code more efficient.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> 
Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #12288
2021-06-29 06:59:14 -06:00
Michal Vasilek f20fb199e6 Fix plymouth passphrase prompt with dracut
plymouth --command splits the command on spaces which means
that zfs-load-key was getting the filesystem name enclosed
in single quotes (since 13c59bb76) and failing. This commit
fixes it by piping the password directly to the command
similar to how it's done in other scripts (initramfs,
dracut without plymouth).

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Michal Vasilek <michal@vasilek.cz>
Related-to: #9193
Related-to: #9202
Closes #12147
2021-06-25 22:43:25 -07:00
Rich Ercolani 2084e9f748 Fix build with KASAN
The stock zstd code expects some helpers from ASAN if present.
This works fine in userland, but in kernel, KASAN also gets detected,
and lacks those helpers. So let's make some empty substitutes for
that case.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12232
2021-06-25 22:28:12 -07:00
Alexander Motin 5e2c8338bf Help compiller optimize out abd_verify()
While abd_verify() does nothing when built without debug, compiler
can't optimize it out by itself due to calls to external list_*()
and abd_verify_scatter().  This commit makes it explicit.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Adam Moss <c@yotes.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #12280
2021-06-25 16:38:31 -07:00
Martin Matuška 14d2841b53 FreeBSD: fix compilation of FreeBSD world after 29274c9f6
prng32_bounded() is available to kernel only on FreeBSD 13+.

Call inline random_get_pseudo_bytes() with correct pointer type.
To be consistent, apply to Linux as well.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Martin Matuska <mm@FreeBSD.org>
Closes #12282
2021-06-25 10:28:51 -07:00
Brian Behlendorf 88a4833039 Update cache file when setting compatibility property
Unlike most other properties the 'compatibility' property is stored
in the pool config object and not the DMU_OT_POOL_PROPS object.

This had the advantage that the compatibility information is available
without needing to fully import the pool (it can be read with zdb).
However, this means we need to make sure to update both the copy of
the config in the MOS and the cache file.  This wasn't being done.

This commit adds a call to spa_async_request() to ensure the copy of
the config in the cache file gets updated as well as the one stored
in the pool.  This same change is made for the 'comment' property
which suffers from the same inconsistency.

Reviewed-by: Sean Eric Fagan <sef@ixsystems.com>
Reviewed-by: Colm Buckley <colm@tuatha.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12261 
Closes #12276
2021-06-24 14:30:02 -07:00
Paul Dagnelie 8f11b1d26e Fix flag copying in resume case
A couple flags weren't being copied in the case where we're doing size
estimation on a resume.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes: #12266
2021-06-24 13:42:01 -06:00
jumbi77 86f5e0bbce zfs_metaslab_mem_limit should be 25 instead of 75
According to current zfs man page zfs_metaslab_mem_limit should be
25 instead of 75.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Signed-off-by: jumbi77@users.noreply.github.com
Closes #12273
2021-06-24 10:02:54 -07:00
Rich Ercolani 126615303d Stop using "zstreamdump" in tests/
zstreamdump was replaced with "zstream dump"; let's stop using the
old name, compat symlink or no.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12277
2021-06-24 09:38:33 -07:00
Jonathon 942121cfcb Update libera webchat client URL
Libera have made a webchat client available. This change builds on #12127.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Jonathon Fernyhough <jonathon@m2x.dev>
Closes #12251
2021-06-23 17:14:58 -07:00
Attila Fülöp 1b610ae45f gcc 11 cleanup
Compiling with gcc 11.1.0 produces three new warnings.
Change the code slightly to avoid them.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Attila Fülöp <attila@fueloep.org>
Closes #12130
Closes #12188
Closes #12237
2021-06-23 17:57:06 -06:00
Brian Behlendorf 63f4b959a6 ZTS: Add known exceptions
The receive-o-x_props_override test case reliably fails on the
FreeBSD main builders (but not on Linux), until the root cause is
understood add this test to the FreeBSD exception list.

On Linux the alloc_class_012_pos test case may occasionally fail.
This is a known false positive which has also been added to the
Linux exception list until the test can be made entirely reliable.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12272
2021-06-23 15:53:13 -07:00
Rich Ercolani 8e739b2c9f Annotated dprintf as printf-like
ZFS loves using %llu for uint64_t, but that requires a cast to not 
be noisy - which is even done in many, though not all, places.
Also a couple places used %u for uint64_t, which were promoted
to %llu. 

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12233
2021-06-22 21:53:45 -07:00
Antonio Russo a81b812495 Revert Consolidate arc_buf allocation checks
This reverts commit 13fac09868.

Per the discussion in #11531, the reverted commit---which intended only
to be a cleanup commit---introduced a subtle, unintended change in
behavior.

Care was taken to partially revert and then reapply 10b3c7f5e4
which would otherwise have caused a conflict.  These changes were
squashed in to this commit.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Suggested-by: @chrisrd
Suggested-by: robn@despairlabs.com
Signed-off-by: Antonio Russo <aerusso@aerusso.net>
Closes #11531 
Closes #12227
2021-06-22 21:39:15 -07:00
Alexander Motin 29274c9f6d Optimize small random numbers generation
In all places except two spa_get_random() is used for small values,
and the consumers do not require well seeded high quality values.
Switch those two exceptions directly to random_get_pseudo_bytes()
and optimize spa_get_random(), renaming it to random_in_range(),
since it is not related to SPA or ZFS in general.

On FreeBSD directly map random_in_range() to new prng32_bounded() KPI
added in FreeBSD 13.  On Linux and in user-space just reduce the type
used to uint32_t to avoid more expensive 64bit division.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #12183
2021-06-22 17:35:23 -06:00
Brian Behlendorf ba91311561 Linux 5.12 compat: META
Increase the Linux-Maximum version in the META file to 5.12.
All of the required compatibility patches have been merged.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2021-06-21 08:22:40 -07:00
Alexander Motin c4c162c1e8 Use wmsum for arc, abd, dbuf and zfetch statistics. (#12172)
wmsum was designed exactly for cases like these with many updates
and rare reads.  It allows to completely avoid atomic operations on
congested global variables.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #12172
2021-06-16 18:19:34 -06:00
George Amanakis 9ffcaa370a Avoid deadlock when removing L2ARC devices under I/O
In case we have I/O and try to remove an L2ARC device a deadlock might
occur. arc_read()->zio_read()->zfs_blkptr_verify() waits for SCL_VDEV
to be dropped while holding the hash_lock. However, spa_l2cache_load()
holds SCL_ALL and waits for the hash_lock in l2arc_evict().

Fix this by moving zfs_blkptr_verify() to the top top arc_read() before
the hash_lock is taken. Verify the block pointer and return a checksum
error if damaged rather than halting the system, by using
BLK_VERIFY_LOG instead of BLK_VERIFY_HALT.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #12054
2021-06-16 18:17:42 -06:00
Rich Ercolani ff31750405 Fix importing with symlinks
It turns out that symlinks are heavily used on Linux in /dev/disk.
So let's allow importing from them.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12238
2021-06-14 09:59:54 -07:00
наб 83ba91adf6 systemd: import: expand $ZPOOL_IMPORT_OPTS correctly
Turns out $ZPOOL_IMPORT_OPTS expands in a shell-like fashion,
yielding 'import' '-aN' '-o' 'cachefile=none' for an unset variable,
and 'import' '-aN' '-o' 'cachefile=none' 'word1' 'word2' for a
white-spaced one, but ${ZPOOL_IMPORT_OPTS} expands like "${Z_I_O}"
would in a shell, yielding 'import' '-aN' '-o' 'cachefile=none' ''
(empty) and 'import' '-aN' '-o' 'cachefile=none' 'word1 word2' (spaced)

Fixes eec5ba113e "dracut: 90zfs: respect
zfs_force=1 on systemd systems"

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes: #12231
2021-06-14 09:48:53 -06:00
Matthew Ahrens 069bf406b4 vdev_draid_min_asize() ignores reserved space
vdev_draid_min_asize() returns the minimum size of a child vdev.  This
is used when determining if a disk is big enough to replace a child.
It's also used by zdb to determine how big of a child to make to test
replacement.

vdev_draid_min_asize() says that the child’s asize has to be at least
1/Nth of the entire draid’s asize, which is the same logic as raidz.
However, this contradicts the code in vdev_draid_open(), which
calculates the draid’s asize based on a reduced child size:

  An additional 32MB of scratch space is reserved at the end of each
  child for use by the dRAID expansion feature

So the problem is that you can replace a draid disk with one that’s
vdev_draid_min_asize(), but it actually needs to be larger to accommodate
the additional 32MB.  The replacement is allowed and everything works at
first (since the reserved space is at the end, and we don’t try to use
it yet), but when you try to close and reopen the pool,
vdev_draid_open() calculates a smaller asize for the draid, because of
the smaller leaf, which is not allowed.

I think the confusion is that vdev_draid_min_asize() is correctly
returning the amount of required *allocatable* space in a leaf, but the
actual *size* of the leaf needs to be at least 32MB more than that.
ztest_vdev_attach_detach() assumes that it can attach that size of
device, and it actually can (the kernel/libzpool accepts it), but it
then later causes zdb to not be able to open the pool.

This commit changes vdev_draid_min_asize() to return the required size
of the leaf, not the size that draid will make available to the metaslab
allocator.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #11459
Closes #12221
2021-06-13 10:48:53 -07:00
Paul Zuchowski afa7b34845 Do not hash unlinked inodes
In zfs_znode_alloc we always hash inodes.  If the
znode is unlinked, we do not need to hash it.  This
fixes the problem where zfs_suspend_fs is doing zrele
(iput) in an async fashion, and zfs_resume_fs unlinked
drain processing will try to hash an inode that could
still be hashed, resulting in a panic.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alan Somers <asomers@gmail.com>
Signed-off-by: Paul Zuchowski <pzuchowski@datto.com>
Closes #9741
Closes #11223
Closes #11648
Closes #12210
2021-06-11 17:00:33 -07:00
наб 10bcc4da6c scripts/commitcheck.sh: fix false positive for signed commits
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12105
2021-06-11 09:10:25 -07:00
наб feb04e6680 Forbid basename(3) and dirname(3)
There are at least two interpretations of basename(3),
in addition to both functions being allowed to /both/ return a static
buffer (unsuitable in multi-threaded environments) /and/ raze the input
(which encourages overallocations, at best)

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12105
2021-06-11 09:10:21 -07:00
наб 64dfdaba37 libzutil: import: filter out unsuitable files earlier
Only accept the right type of file, if available, and reject too-small
files before opening them on Linux

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12105
2021-06-11 09:10:13 -07:00
наб bf80fb53f5 linux/libzutil: zpool_open_func: don't dup name, extract untouchables
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12105
2021-06-11 09:10:09 -07:00
наб 0854d4c186 libzutil: add zfs_{base,dir}name()
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12105
2021-06-11 09:10:05 -07:00
наб 3aa81a6635 linux/libzutil: use ARRAY_SIZE instead of constant for search paths
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12105
2021-06-11 09:10:00 -07:00
Rich Ercolani 1a345d645a Added uncompress requirement
Having an old enough version of "file" and no "uncompress" program
installed can cause rpmbuild as root to crash and mangle rpmdb.

So let's add a build dependency for RPM-based systems.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes: #12071
Closes: #12168
2021-06-11 09:38:23 -06:00
Brian Behlendorf 9d639d8799 ZTS: Add zfs_clone_livelist_dedup.ksh to Makefile.am
Commit 86b5f4c12 added a new zfs_clone_livelist_dedup.ksh test case
but didn't include it in the Makefile.am.  This results in the test
not being included in the dist tarball so it's never run by the CI.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes: #12224
2021-06-11 09:21:36 -06:00
Alexander Motin ffdf019cb3 Re-embed multilist_t storage
This commit partially reverts changes to multilists in PR 7968
(multi-threaded spa-sync()) and adds some cache line alignments to
separate read-only multilists and heavily modified refcount's to different
cache lines.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-by: iXsystems, Inc.
Closes #12158
2021-06-10 10:42:31 -06:00
наб eec5ba113e dracut: 90zfs: respect zfs_force=1 on systemd systems
On systemd systems provide an environment generator in order
to respect the zfs_force=1 kernel command line option.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11403
Closes #12195
2021-06-10 09:26:37 -07:00
Alexander Motin 371f88d96f Remove pool io kstats (#12212)
This mostly reverts "3537 want pool io kstats" commit of 8 years ago.

From one side this code using pool-wide locks became pretty bad for
performance, creating significant lock contention in I/O pipeline.
From another, there are more efficient ways now to obtain detailed
statistics, while this statistics is illumos-specific and much less
usable on Linux and FreeBSD, reported only via procfs/sysctls.

This commit does not remove KSTAT_TYPE_IO implementation, that may
be removed later together with already unused KSTAT_TYPE_INTR and
KSTAT_TYPE_TIMER.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #12212
2021-06-10 08:27:33 -07:00
Rich Ercolani 860051f1d1 Added error for writing to /dev/ on Linux
Starting in Linux 5.10, trying to write to /dev/{null,zero} errors out.
Prefer to inform people when this happens rather than hoping they guess
what's wrong.

Reviewed-by: Antonio Russo <aerusso@aerusso.net>
Reviewed-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes:  #11991
2021-06-09 18:57:57 -06:00
наб 327c904615 lib{efi,avl,share,tpool,zfs_core,zfsbootenv,zutil}: -fvisibility=hidden
No symbols affected in libavl
No symbols affected by libtpool, but pre-ANSI declarations got purged
No symbols affected by libzfs_core
No symbols affected by libzfs_bootenv

libefi got cleaned, gained efi_debug documentation in efi_partition.h,
and removes one undocumented and unused symbol from libzfs_core:
  D default_vtoc_map

libnvpair saw removal of these symbols:
  D nv_alloc_nosleep_def
  D nv_alloc_sleep
  D nv_alloc_sleep_def
  D nv_fixed_ops_def
  D nvlist_hashtable_init_size
  D nvpair_max_recursion

libshare saw removal of these symbols from libzfs:
  T libshare_nfs_init
  T libshare_smb_init
  T register_fstype
  B smb_shares

libzutil saw removal of these internal symbols from libzfs_core:
  T label_paths
  T slice_cache_compare
  T zpool_find_import_blkid
  T zpool_open_func
  T zutil_alloc
  T zutil_strdup

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12191
2021-06-09 17:04:32 -07:00
наб d406a695c6 libefi: remove efi_auto_sense()
It's present (but undocumented) in the illumos gate and used exclusively
by rmformat(1) (which I recommend as a nice blast from the past),
and also the math assumes 512B sectors and is therefore wrong

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12191
2021-06-09 17:03:42 -07:00
наб 4b7ed6a286 zgenhostid.8: revisit
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12212
2021-06-09 14:36:03 -07:00
наб 1b37cc1abe Consistentify miscellaneous style on remaining manpages
Most notably this fixes the vdev_id(8) non-.Xrs in vdev_id.conf.5

Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12212
2021-06-09 14:35:53 -07:00
наб 2badb3457a Move properties, parameters, events, and concepts around manual sections
The pages moved as follows:
  zpool-features.{5 => 7}
  spl{-module-parameters.5 => .4}
  zfs{-module-parameters.5 => .4}
  zfs-events.5 => into zpool-events.8
  zfsconcepts.{8 => 7}
  zfsprops.{8 => 7}
  zpoolconcepts.{8 => 7}
  zpoolprops.{8 => 7}

Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Co-authored-by: Daniel Ebdrup Jensen <debdrup@FreeBSD.org>
Closes #12149
Closes #12212
2021-06-09 14:35:30 -07:00
наб b0f3e8a6eb man: use one Makefile, use OpenZFS for .Os
The prevailing style is to use either nothing, or the originating
organisational umbrella (here: OpenZFS), and these aren't Linux manpages

This also deduplicates the substitution code, and makes adding/removing
sexions simpler in future

Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12212
2021-06-09 14:34:47 -07:00
Brian Behlendorf 88af959b24 Fix minor shellcheck 0.7.2 warnings
The first warning of a misspelling is a false positive, so we annotate
the script accordingly.  As for the x-prefix warnings update the check
to use the conventional '[ -z <string> ]' syntax.

all-syslog.sh:46:47: warning: Possible misspelling: ZEVENT_ZIO_OBJECT
    may not be assigned, but ZEVENT_ZIO_OBJSET is. [SC2153]
make_gitrev.sh:53:6: note: Avoid x-prefix in comparisons as it no
    longer serves a purpose [SC2268]
man-dates.sh:10:7: note: Avoid x-prefix in comparisons as it no
    longer serves a purpose [SC2268]

Reviewed-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12208
2021-06-09 12:21:24 -07:00
Rich Ercolani 08cd071735 Correct a flaw in the Python 3 version checking
It turns out the ax_python_devel.m4 version check assumes that
("3.X+1.0" >= "3.X.0") is True in Python, which is not when X+1
is 10 or above and X is not. (Also presumably X+1=100 and ...)

So let's remake the check to behave consistently, using the
"packaging" or (if absent) the "distlib" modules.

(Also, update the Github workflows to use the new packages.)

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes: #12073
2021-06-08 18:20:16 -06:00
наб 1fcfc21cd8 zed.d/history_event-zfs-list-cacher.sh.in: parallelise, simplify
This:
  (a) improves the error log message,
  (b) locks per pool instead of globally,
  (c) locks the actual output file instead of /var/lock/zfs-list,
      which would otherwise linger there forever (well, still will,
      but you can remove it and it won't come back), and
  (d) preserves attributes of the output file
      instead of reverting them to 0:0 644

It is imperative that the previous commit
("zed-functions.sh: zed_lock(): don't truncate lock")
be included in any series that contains this one

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12042
2021-06-08 16:09:39 -07:00
наб e20c9330d7 zed.d/all-debug.sh: simplify
By locking the log file itself, we can omit arduous rebinding and
explicit umask setting, but, perhaps more importantly, avoid permanently
littering /var/lock/ with zed.debug.log.lock we will never delete

It is imperative that the previous commit
("zed-functions.sh: zed_lock(): don't truncate lock")
be included in any series that contains this one

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12042
2021-06-08 16:09:33 -07:00
наб b828cf1d01 zed-functions.sh: zed_lock(): don't truncate lock
By appending instead of truncating, we can lock on any file (with write
permissions) instead of only dedicated lock files, since the locking
process itself no longer alters the file in any way

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12042
2021-06-08 16:09:22 -07:00
Alan Somers 75b4cbf625 libzfs: On FreeBSD, use MNT_NOWAIT with getfsstat
`getfsstat(2)` is used to retrieve the list of mounted file systems,
which libzfs uses when fetching properties like mountpoint, atime,
setuid, etc.  The `mode` parameter may be `MNT_NOWAIT`, which uses
information in the VFS's cache, or `MNT_WAIT`, which effectively does a
`statfs` on every single mounted file system in order to fetch the most
up-to-date information.  As far as I can tell, the only fields that
libzfs cares about are the filesystem's name, mountpoint, fstypename,
and mount flags.  Those things are always updated on mount and unmount,
so they will always be accurate in the VFS's mount cache except in two
circumstances:

1) When a file system is busy unmounting
2) When a ZFS file system changes the value of a mount-overridable
   property like atime or setuid, but doesn't remount the file system.
   Right now that only happens when the property is changed by an
   unprivileged user who has delegated authority to change the property
   but not to mount the dataset.  But perhaps libzfs could choose to do
   it for other reasons in the future.

Switching to `MNT_NOWAIT` will greatly improve speed with no downside,
as long as we explicitly update the mount cache whenever we change a
mount-overridable property.

For comparison, Illumos gets this information using the native
`getmntany` and `getmntent` functions, which also use cached
information.  The illumos function that would refresh the cache,
`resetmnttab`, is never called by libzfs.

And on GNU/Linux, `getmntany` and `getmntent` don't even communicate
with the kernel directly.  They simply parse the file they are given,
which is usually /etc/mtab or /proc/mounts.  Perhaps the implementation
of /proc/mounts is synchronous, ala MNT_WAIT; I don't know.

Sponsored-by:	Axcient
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alan Somers <asomers@gmail.com>
Closes: #12091
2021-06-08 07:36:43 -06:00
наб 9685f363c3 tests/file_check: remove unused variable
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12187
2021-06-07 20:59:01 -07:00
наб f423411c44 module/zfs: vdev_removal: spa_vdev_remove_thread: remove unused variable
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12187
2021-06-07 20:58:56 -07:00
наб 6bab6ee838 module/zfs: vdev_indirect: vdev_indirect_repair: remove unused variable
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12187
2021-06-07 20:58:51 -07:00
наб f70f3e2fbc module/zfs: dbuf: dbuf_read_impl: remove unused variable
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12187
2021-06-07 20:58:47 -07:00
наб f719d3b160 module/zfs: arc: arc_hdr_realloc_crypt: remove unused variables
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12187
2021-06-07 20:58:42 -07:00
наб 4ff49c5a06 libzfs: zfs_send: remove unused variable
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12187
2021-06-07 20:58:28 -07:00
наб 4f1009face zdb: zdb_decompress_block: don't needlessly set buf
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12187
2021-06-07 20:58:23 -07:00
наб 77f3b67520 libzutil: zpool_find_config: remove unused variable
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12187
2021-06-07 20:58:10 -07:00
наб 2d815d955e Modernise/fix/rewrite unlinted manpages
zpool-destroy.8: flatten, fix description
zfs-wait.8: flatten, fix description, use list for events
zpool-reguid.8: flatten, fix description
zpool-history.8: flatten, fix description
zpool-export.8: flatten, fix description, remove -f "unmount" reference
  AFAICT no such command exists even in Illumos (as of today, anyway),
  and we definitely don't call it
zpool-labelclear.8: flatten, fix description
zpool-features.5: modernise
spl-module-parameters.5: modernise
zfs-mount-generator.8: rewrite
zfs-module-parameters.5: modernise

Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12169
2021-06-07 13:41:54 -06:00
Rich Ercolani afb96fa6ee Force --enable-debug on FreeBSD if INVARIANTS is set
There's already logic to force INVARIANTS on for building if it's
present in the running kernel; however, not having DEBUG enabled
when DEBUG and INVARIANTS are can cause strange panics.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12185
Closes #12163
2021-06-07 13:29:27 -06:00
Serapheim Dimitropoulos 86b5f4c121 Livelist logic should handle dedup blkptrs
Update the logic to handle the dedup-case of consecutive
FREEs in the livelist code. The logic still ensures that
all the FREE entries are matched up with a respective
ALLOC by keeping a refcount for each FREE blkptr that we
encounter and ensuring that this refcount gets to zero
by the time we are done processing the livelist.

zdb -y no longer panics when encountering double frees

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Don Brady <don.brady@delphix.com>
Signed-off-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Closes #11480
Closes #12177
2021-06-07 13:09:07 -06:00
Alexander Motin ea400129c3 More aggsum optimizations
- Avoid atomic_add() when updating as_lower_bound/as_upper_bound.
Previous code was excessively strong on 64bit systems while not
strong enough on 32bit ones.  Instead introduce and use real
atomic_load() and atomic_store() operations, just an assignments
on 64bit machines, but using proper atomics on 32bit ones to avoid
torn reads/writes.

 - Reduce number of buckets on large systems.  Extra buckets not as
much improve add speed, as hurt reads.  Unlike wmsum for aggsum
reads are still important.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #12145
2021-06-07 09:02:47 -07:00
наб e5e76bd643 libzfs: write_inuse_diffs_one: format strerror() with "%s"
Fixes 50353dbd ("Let zfs diff be more  permissive") which accidentally
introduced a build warning.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12197
2021-06-04 16:04:37 -07:00
наб 2ba7a9e5cc i-t: don't try to import from empty cache
Chases 7c64ee9e77
 ("zfs-import-{cache,scan}: change condition to FileNotEmpty")

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12108
2021-06-04 14:01:08 -07:00
наб a0242eceff dracut: 90zfs: zfs-load-key: wait for key to appear for up to 10 seconds
Also reduce password retries to 3 to match i-t

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12065
Closes #12108
2021-06-04 14:01:08 -07:00
наб b2c68bea50 Use %%/* instead of awk -F/ {print $1} to strip datasets
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12108
2021-06-04 14:01:08 -07:00
наб c38bc221b2 dracut: 90zfs: zfs-load-key: don't load unencrypted bootfs' keylocation
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11800
Closes #12108
2021-06-04 14:01:08 -07:00
наб cfc8dd1983 dracut: 90zfs: module-setup: try /lib*/libgcc_s.so*, relax /u/l/gcc path
SUSE stores the library at /lib64/libgcc_s.so.1 (/lib/libgcc_s.so.1 for
i686 glibc), which is in the search path

Also relax the /usr/lib path to catch systems similar to SUSE
(/usr/lib64/gcc/x86_64-suse-linux/10/libgcc_s.so) but without
the top-level lib64

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11750
Closes #12108
2021-06-04 14:01:08 -07:00
Rich Ercolani 50353dbd05 Let zfs diff be more permissive
In the current world, `zfs diff` will die on certain kinds of errors
that come up on ordinary, not-mangled filesystems - like EINVAL,
which can come from a file with multiple hardlinks having the one
whose name is referenced deleted.

Since it should always be safe to continue, let's relax about all
error codes - still print something for most, but don't immediately
abort when we encounter them.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12072
2021-06-04 15:00:39 -06:00
jharmening 8dddb25d2c FreeBSD: incorporate changes to the VFS_QUOTACTL(9) KPI
VFS_QUOTACTL(9) has been updated to allow each filesystem to indicate
whether it has changed the busy state of the mount.  The filesystem
may still assume that its .vfs_quotactl entrypoint is always called
with the mount busied, but only needs to unbusy the mount (and clear
*mp_busy) if it does something that actually requires the mount to be
unbusied.  It no longer needs to blindly copy-paste the UFS protocol
for calling vfs_unbusy(9) for the Q_QUOTAOFF and Q_QUOTAON commands.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Jason Harmening <jason.harmening@gmail.com>
Closes #12052
2021-06-04 14:11:08 -06:00
Ryan Moeller 1f8e5b6cb8 Fix error check in nvlist_print_json_string
Move check for errors from mbrtowc() into the loop.  The error values
are not actually negative, so we don't break out of the loop when they
are encountered.

Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #12175
Closes #12176
2021-06-04 13:53:44 -06:00
наб f84fe3fc87 Lint most manpages
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12129
2021-06-04 12:48:31 -07:00
наб 4a98300feb mancheck: accept lints, accept lint overrides
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12129
2021-06-04 12:48:26 -07:00
Brian Behlendorf 7837845822 Linux: Set spl_kmem_cache_slab_limit when page size !4K
For small objects the kernel's slab implementation is very fast and
space efficient. However, as the allocation size increases to
require multiple pages performance suffers. The SPL kmem cache
allocator was designed to better handle these large allocation
sizes. Therefore, on Linux the kmem_cache_* compatibility wrappers
prefer to use the kernel's slab allocator for small objects and
the custom SPL kmem cache allocator for larger objects.

This logic was effectively disabled for all architectures using
a non-4K page size which caused all kmem caches to only use the
SPL implementation. Functionally this is fine, but the SPL code
which calculates the target number of objects per-slab does not
take in to account that __vmalloc() always returns page-aligned
memory. This can result in a massive amount of wasted space when
allocating tiny objects on a platform using large pages (64k).

To resolve this issue we set the spl_kmem_cache_slab_limit cutoff
to 16K for all architectures. 

This particular change does not attempt to update the logic used
to calculate the optimal number of pages per slab. This remains
an issue which should be addressed in a future change.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12152
Closes #11429
Closes #11574
Closes #12150
2021-06-03 14:37:45 -06:00
наб 739cfb965b libzfs: convert to -fvisibility=hidden
Also mark all printf-like funxions in libzfs_impl.h as printf-like
and add --no-show-locs to storeabi, in hopes diffs will make more sense
in future

This removes these symbols from libzfs:
  D nfs_only
  T SHA256Init
  T SHA2Final
  T SHA2Init
  T SHA2Update
  T SHA384Init
  T SHA512Init
  D share_all_proto
  D smb_only
  T zfs_is_shared_proto
  W zpool_mount_datasets
  W zpool_unmount_datasets

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12048
2021-06-03 13:17:55 -07:00
наб e00aae4be2 libefi: efi_get_devname: don't allocate procfs path
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12048
2021-06-03 13:17:49 -07:00
наб eefaa55f64 libzfs: don't distribute libzfs_impl.h
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12048
2021-06-03 13:17:35 -07:00
наб 94f942c658 libspl: staticify buf and pagesize, rename aok to libspl_assert_ok
Exporting names this short can easily cause nasty collisions with user code.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12050
2021-06-03 11:04:13 -06:00
Colm f97142c748 A couple of small style cleanups
In `zpool_load_compat()`:

  * initialize `l_features[]` with a loop rather than a static
    initializer.

  * don't redefine system constants; use private names instead

Rationale here:

When an array is initialized using a static {foo}, only the specified
members are initialized to the provided values, the rest are
initialized to zero. While B_FALSE is of course zero, it feels
unsafe to rely on this being true forever, so I'm inclined to sacrifice
a few microseconds of runtime here and initialize using a loop.

When looking for the correct combination of system constants to use
(in open() and mmap()), I prefer to use private constants rather than
redefining system ones; due to the small chance that the system
ones might be referenced later in the file. So rather than defining
O_PATH and MAP_POPULATE, I use distinct constant names.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Colm Buckley <colm@tuatha.org>
Closes #12156
2021-06-03 09:13:42 -06:00
наб f645d4416f zfs-module-parameters.5: remove nonexistent parameters
zfs_arc_overflow_shift was never a parameter:
ca0bf58d65 ("Illumos 5497 - lock
contention on arcs_mtx") is the only result in
git log -Soverflow_shift, and it wasn't exposed then, nor is it now

zfs_read_chunk_size was renamed to zfs_vnops_read_chunk_size in
e53d678d4a ("Share zfs_fsync, zfs_read,
zfs_write, et al between Linux and FreeBSD")

zio_decompress_fail_fraction was never a parameter: it was added in
c3bd3fb4ac ("OpenZFS 9403 - assertion
failed in arc_buf_destroy()") as a developer aid for setting in zdb, but
it's a dangerous test tunable and has no place in public documentation,
(not to mention that it obviously doesn't work):
> Although this did uncover a few low priority issues, this
  unfortuantely also causes ztest to ASSERT in many locations where the
  code is working correctly since it is designed to fail on IO errors.
  Developers can manually set this variable with the '-o' option to find
  and debug issues.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12157
2021-06-02 12:53:22 -07:00
наб ace760a0b4 spl-module-parameters.5: remove spl_kmem_cache_{expire,obj_per_slab_min}
Both were removed in 4fbdb10c7b ("remove
kmem_cache module parameter KMC_EXPIRE_AGE")

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12157
2021-06-02 12:52:32 -07:00
Rich Ercolani 6c7c7201d9 Quick fixes for two ZTS failures
On FreeBSD 14, these two tests started erroring out like the
objects they're attempting to examine don't exist.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12165
2021-06-01 15:34:19 -06:00
Rich Ercolani 8bd41ebd54 Added another missed case to arc_summary3
It turns out that sometimes, evidently only when run inside the
ZTS handler, arc_summary3 | head > /dev/null will die with ENOTCONN,
and ruin the test run.

Added handling for that.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12160
2021-06-01 15:20:50 -06:00
Ryan Moeller e1af0d0d73 libzfs_core: Fix some style violations
Made function names start on a new line. Added a blank line between
functions. This helps when grepping for functions.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #12137
2021-06-01 15:13:26 -06:00
grembo 65d9212aee FreeBSD boot code reminder after zpool upgrade
There used to be a warning after upgrading a zpool in FreeBSD, so users
won't forget to update the boot loader that pool is booted from.

This change brings this warning back, but only if the bootfs property
is set on the pool, which should be sufficient for the vast majority of
FreeBSD installations. People running something custom are most likely
aware of what to do after an upgrade in their specific environment.

Functionality is implemented in an OS specific helper function.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Co-authored-by: Michael Gmelin <grembo@FreeBSD.org>
Signed-off-by: Michael Gmelin <grembo@FreeBSD.org>
Closes #12099
Closes #12104
2021-06-01 15:03:49 -06:00
Rich Ercolani 3f81aba766 Remove iov_iter_advance() for iter_write
The additional iter advance is incorrect, as copy_from_iter() has
already done the right thing.  This will result in the following
warning being printed to the console as of the 5.12 kernel.

    Attempted to advance past end of bvec iter

This change should have been included with #11378 when a
similar change was made on the read side.

Suggested-by: @siebenmann
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Issue #11378
Closes #12041
Closes #12155
2021-06-01 11:58:08 -07:00
наб f7d7ee0583 Turn checkbashisms into a make target
make_gitrev.sh actually breaks checkbashisms' parser,
which /insists/ that the end-of-line " is actually a string start

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12101
2021-06-01 11:38:54 -07:00
наб c3ef9f7528 Turn shellcheck into a normal make target. Fix new files it caught
This checks every file it checked (and a few more),
but explicitly instead of "if it works it works" best-effort
(which wasn't that good anyway)

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #10512
Closes #12101
2021-06-01 11:38:49 -07:00
наб d3858ab788 udev/rules.d: .gitignore: glob all rules
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12101
2021-06-01 11:38:45 -07:00
наб 8d5f211fc8 i-t: don't suggest zpool-import with altroot to /root
This *will fail* when remounted by the real root

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12148
2021-05-29 21:33:53 -07:00
наб 7b5bf4758b i-t: let rootdelay= set $ZFS_INITRD_PRE_MOUNTROOT_SLEEP
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Ref: https://github.com/openzfs/zfs/issues/11420#issuecomment-850338673
Closes #11663
Closes #12148
2021-05-29 21:32:48 -07:00
наб d484a7255b zstream: force-install zstreamdump link
Accidentally introduced by commit dd00925e8d.

Force-install the zstreamdump link, this is a supported configuration
and the install should not fail if it needs to overwrite an existing
file.

Also cd to work around some funny platforms as noted in AC_PROG_LN_S doc

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12143
2021-05-29 20:37:05 -07:00
наб 102c91b4f8 Widen mancheck to all of man and test-runner
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12125
2021-05-29 20:22:37 -07:00
наб 4910903c07 test-runner.1: modernise
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12125
2021-05-29 20:22:32 -07:00
наб d44449025c zfs-events.5: modernise
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12125
2021-05-29 20:22:26 -07:00
наб c3fb6653b5 vdev_id.conf.5: modernise
Also yeet pci_slot since it doesn't seem to exist?

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12125
2021-05-29 20:22:21 -07:00
наб 56454b0391 man: use Nm/Cm/Fl consistently
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12125
2021-05-29 20:22:19 -07:00
наб 1f3cbcfcc5 zed.8: modernise
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12125
2021-05-29 20:22:14 -07:00
наб 3bcbb6af16 cstyle.1: modernise
Also remove note about the OS/Net consolidation, now the illumos gate

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12125
2021-05-29 20:22:07 -07:00
наб 76b8f7cf53 vdev_id.8: modernise, note scsi topology
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12125
2021-05-29 20:22:01 -07:00
наб 5aace42ce7 zhack.1: modernise
The spacing on zhack    feature stat    pool is a bit iffy(?)

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12125
2021-05-29 20:22:00 -07:00
наб 2c9c5bc859 zpool_influxdb.8: modernise
Also rip out the section about potentially including in the OpenZFS
distribution and simplify -e description

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12125
2021-05-29 20:21:59 -07:00
наб 3d00fbf9a0 zinject.8: modernise
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12125
2021-05-29 20:21:58 -07:00
наб 3365d2cf71 raidz_test.1: modernise
Also re-add articles left out by the slav who wrote this

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12125
2021-05-29 20:21:56 -07:00
наб defa5a0ae5 zpoolprops.8: fix spacing in ashift
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12125
2021-05-29 20:21:55 -07:00
наб 2b42a1e57e fsck.zfs.8: modernise
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12125
2021-05-29 20:21:54 -07:00
наб 639871363a arcstat.1: modernise
Also slim down the description a tad

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12125
2021-05-29 20:21:53 -07:00
наб 8134178f3b ztest.1: modernise
I fixed a few typos, but avoided changing anything beyond that;
the sould of the document should be preserved

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12125
2021-05-29 20:21:52 -07:00
наб b1821d06c3 zgenhostid.8: use single-line indent macro for single-line examples
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12125
2021-05-29 20:21:46 -07:00
наб 757df52928 libzfs: add zfs_get_underlying_type. Stop including libzfs_impl.h in cmd
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12116
2021-05-29 14:26:38 -07:00
наб e618e4a4ff include: move SPA_MINBLOCKSHIFT and zio_encrypt to sys/fs/zfs.h
These are used by userspace, so should live in a public header

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12116
2021-05-29 14:26:32 -07:00
наб f00f469052 libzfs: format safety
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12116
2021-05-29 14:26:25 -07:00
наб c4e5a07fd0 libzfs: expose zfs_mount_delegation_check
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12116
2021-05-29 14:25:48 -07:00
Manoj Joseph 45516b4a0a long options for ztest
This change introduces long options for ztest. It builds the usage
message as well as the long_options array from a single table. It also
adds #defines for the default values.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Manoj Joseph <manoj.joseph@delphix.com>
Closes #12117
2021-05-28 16:06:07 -06:00
Paul Dagnelie 5d59178e98 Don't direct to freenode in issue template
While Libera doesn't yet have a webchat client, we should at least
direct them to the right network. Once a webchat client is available,
we can direct them to it.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #12127
2021-05-28 15:08:41 -06:00
Armin Wehrfritz 6465e590dc RPM: Explicitly set the required min/max kernel version for the DKMS package
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Armin Wehrfritz <dkxls23@gmail.com>
Closes #12124
2021-05-27 23:06:45 -07:00
Rich Ercolani d66b817f0a Minor fix to configure on s390x
configure on s390x has a key check fail with an error about
a variable being used uninitialized. So let's initialize it.

Reviewed-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12126
2021-05-27 22:39:53 -07:00
Alexander Motin 86706441a8 Introduce write-mostly sums
wmsum counters are a reduced version of aggsum counters, optimized for
write-mostly scenarios.  They do not provide optimized read functions,
but instead allow much cheaper add function.  The primary usage is
infrequently read statistic counters, not requiring exact precision.

The Linux implementation is directly mapped into percpu_counter KPI.
The FreeBSD implementation is directly mapped into counter(9) KPI.
In user-space due to lack of better implementation mapped to aggsum.

Unfortunately neither Linux percpu_counter nor FreeBSD counter(9)
provide sufficient functionality to completelly replace aggsum, so
it still remains to be used for several hot counters.

Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #12114
2021-05-27 14:27:29 -06:00
Alexander Motin 2041d6eecd Improve scrub maxinflight_bytes math.
Previously, ZFS scaled maxinflight_bytes based on total number of
disks in the pool.  A 3-wide mirror was receiving a queue depth of 3
disks, which it should not, since it reads from all the disks inside.
For wide raidz the situation was slightly better, but still a 3-wide
raidz1 received a depth of 3 disks instead of 2.

The new code counts only unique data disks, i.e. 1 disk for mirrors
and non-parity disks for raidz/draid.  For draid the math is still
imperfect, since vdev_get_nparity() returns number of parity disks
per group, not per vdev, but still some better than it was.

This should slightly reduce scrub influence on payload for some pool
topologies by avoiding excessive queuing.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored-By:	iXsystems, Inc.
Closing #12046
2021-05-27 10:11:39 -06:00
Rich Ercolani ba646e3e89 Bend zpl_set_acl to permit the new userns* parameter
Just like #12087, the set_acl signature changed with all the bolted-on
*userns parameters, which disabled set_acl usage, and caused #12076.

Turn zpl_set_acl into zpl_set_acl and zpl_set_acl_impl, and add a
new configure test for the new version.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12076
Closes #12093
2021-05-27 08:55:49 -07:00
наб 5611bdc8f0 etc/systemd/zfs-mount-generator: output tweaks
git-diff--w-dirty, but:
  * zfs-load-key-$DSET.service -> zfs-load-key@$DSET.service
  * flattened set -eu into other /bin/sh flags
  * simpler (for 1 2 3 vs while [ counter ]; counter+=1) prompt loop
  * exec $ZFS where applicable

Reviewed-by: Antonio Russo <aerusso@aerusso.net>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: InsanePrawn <insane.prawny@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Issue #11915
Closes #11917
2021-05-27 08:49:26 -07:00
наб 5a1fb060fd etc/systemd/zfs-mount-generator: rewrite in C
A plain rewrite of the shell version, and generates identical
units, save for replacing some empty lines with nothing, having fewer
meaningless spaces in After=s and different spacing in the lock scripts,
for a clean git diff -w

This is a gain of anywhere from 0m0.336s vs 0m0.022s (15.27x)
to 0m0.202s vs 0m0.006s (33.67x), depending on the hardware,
a.k.a. from "absolutely unusable" to "perfectly fine"

This also properly deals with canmount=noauto units across multiple
pools

See PR for detailed timings (of an early version) and diffs

Reviewed-by: Antonio Russo <aerusso@aerusso.net>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: InsanePrawn <insane.prawny@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Issue #11915
Closes #11917
2021-05-27 08:45:51 -07:00
Rich Ercolani 0bb736ce0b Reinstate the old zpool read label logic as a fallback
In case of AIO failure, we should probably fallback to the old
behavior and still work.

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Alan Somers <asomers@gmail.com>
Reviewed-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12032
Closes #12040
2021-05-26 22:07:31 -07:00
наб 20bd864edc mount.zfs.8: match to reality; zfsprops.8: add missing temporary options
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12111
2021-05-26 21:44:56 -07:00
наб eae3598ae4 mount.zfs.8: modernise
No changes to the text itself

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12111
2021-05-26 21:44:50 -07:00
наб fb9baa9b20 zfsprops.8: remove nbmand-not-used-on-Linux and pointer to mount(8)
Linux man-pages' mount(8) points at fcntl(2), as does mount(2),
and support for it is little-used, deprecated, and configurable
since 4.5.

As far as I can tell, FreeBSD doesn't support nbmand at all ‒
mandatory locks are mostly dead

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12111
2021-05-26 21:44:29 -07:00
наб 69cbd0a360 Various Linux kABI cosmetics
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12103
2021-05-26 15:26:06 -07:00
наб 202498c958 linux: don't fall through to 3-arg vfs_getattr
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12103
2021-05-26 15:25:34 -07:00
наб a1ca39655b Forbid strtok(3)
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12094
2021-05-26 14:51:30 -07:00
наб a0d7e27a13 zdb: remove strtok
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12094
2021-05-26 14:51:18 -07:00
наб 1ce6d70c52 zpool: print_zpool_script_list: remove strtok
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12094
2021-05-26 14:51:03 -07:00
наб a281f7690d zpool: import: use realloc for realloc, remove strtok
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12094
2021-05-26 14:50:59 -07:00
наб 31f4c8cb19 linux/libzutil: zfs_path_order: remove strtok
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12094
2021-05-26 14:50:52 -07:00
наб 2a8c606082 libzutil: zfs_resolve_shortname: remove strtok
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12094
2021-05-26 14:50:46 -07:00
наб 91d6780671 libzutil: zfs_strcmp_shortname: remove strtok
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12094
2021-05-26 14:50:41 -07:00
наб 2fdd61a30b libzutil: zfs_strcmp_pathname: don't allocate, remove strtok
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12094
2021-05-26 14:50:35 -07:00
наб a0cb347cea libzfs_core: fini: don't check for refcount twice
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12094
2021-05-26 14:50:08 -07:00
Alexander Motin c71a2bb52f FreeBSD: Update dataset_kstats for zvols in dev mode
Previous commit added accounting for geom mode, but not for dev.
In geom mode we actually have GEOM statistics, while in dev mode
additional accounting actually makes more sense.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Closes #12097
2021-05-26 12:14:26 -06:00
наб 671ea40f62 freebsd/libzfs: import execvPe() from FreeBSD 13
It allocates less and properly deals with argv={NULL}

With minor cosmetic changes to match cstyle, remove whitespace damage,
and restore direct string printing

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12051
2021-05-26 11:03:47 -06:00
Rich Ercolani f172c3088f Correct flaws in arc_summary[23] and their test.
The change correctly handles BrokenPipeError and improves the
associated tests.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12037
Closes #12036
2021-05-25 20:02:01 -06:00
Alexander Motin 211cee4fcf FreeBSD: avoid memory allocation in arc_prune_async
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Closes #12049
2021-05-25 19:38:34 -06:00
Rich Ercolani 90c0524535 Add note for printing all dbgmsg entries on FreeBSD
I looked for a bit, and couldn't find any documentation on
how to print all logged dbgmsg entries, just messages since
the DTrace probe started, until @allanjude kindly pointed me
toward the sysctl.

So let's add that note where the DTrace probe is mentioned for
FreeBSD, so other people can find it.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12113
2021-05-25 18:08:27 -07:00
vermavipinkumar dce1bf99ec Propagate vdev state due to invalid label corruption
Propagate vdev child state to parents on invalid label
Add VDEV_AUX_BAD_LABEL to print_import_config()

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Co-authored-by: Srikanth N S <srikanth.nagasubbaraoseetharaman@hpe.com>
Signed-off-by: Vipin Kumar Verma <vipin.verma@hpe.com>
Closes #12088
2021-05-25 12:32:07 -06:00
Toomas Soome 1a1302f8c4 zdb: dump_history needs space
One space is missing from zdb -h output causing strings to be concatenated. (fixing #11940)

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Toomas Soome <tsoome@me.com>
Closes  #12098
2021-05-25 11:33:18 -06:00
Christian Schwarz 0989d798fa ZTS: remove verify_slog_support helper
verify_slog_support no longer applies to ZFS since slog support
is always available.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Christian Schwarz <me@cschwarz.com>
Closes #12092
2021-05-24 14:57:29 -06:00
Alexander Motin f8646c871a FreeBSD: Retry OCF ENOMEM errors.
ZFS does not expect transient errors from crypto.  For read they are
counted as checksum errors, while for write end up in panic.  To not
panic on random low memory conditions retry ENOMEM errors in the OCF
wrapper function.

While there remove unneeded timeout and priority from msleep().

External-issue: https://reviews.freebsd.org/D30339
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #12077
2021-05-24 14:42:45 -06:00
наб 5d1a32a542 linux/libshare: smb: don't leak share name in smb_disable_share_one()
Fixes: 645fb9cc21 "Implemented sharing datasets via SMB using libshare"

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12015
2021-05-21 10:16:18 -07:00
наб dd00925e8d zstreamdump: replace with link to zstream
zstreamdump(8) was in quite a bad state,
and the wrapper didn't work if invoked without /sbin in $PATH

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12015
2021-05-21 10:16:14 -07:00
наб 93ef500388 Don't abuse vfork()
According to POSIX.1, "vfork() has the same effect as fork(2),
except that the behavior is undefined if the process created by vfork()
either modifies any data other than a variable of type pid_t
used to store the return value from vfork(), [...],
or calls any other function before successfully calling _exit(2)
or one of the exec(3) family of functions."

These do all three, and work by pure chance
(or maybe they don't, but we blisfully don't know).
Either way: bad idea to call vfork() from C,
unless you're the standard library, and POSIX.1-2008 removes it entirely

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12015
2021-05-21 10:16:06 -07:00
наб 5da6353987 libzfs: run_process: don't leak fd on reopen failure
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12082
2021-05-21 09:49:05 -07:00
наб 7c20ceebdd libzfs: run_process: reuse line, don't leak it
line will grow as wide as it needs (glibc starts off at 120),
we can store a narrower view; this also fixes leaks in a few scenarios

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12082
2021-05-21 09:48:59 -07:00
наб 30dadd5c04 libzfs: run_process: set O_NONBLOCK on lines pipe
Without this, we can deadlock: the child is stuck writing to the pipe,
and we are stuck waiting on the child

With this, we the child fills up the pipe (a few hundred kBish)
and starts getting EAGAINs, which allows it to either crash
or ignore them

libzfs_run_process_get_stdout*() is used only by zpool -c scripts,
which output short runs of K=V pairs, so the likelihood of losing
legitimate data there is relatively low

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12082
2021-05-21 09:47:53 -07:00
наб e72383825b raidz_test: use only async-signal-safe functions in signal handler
execl*() before glibc 2.24 could allocate, but only if called with at
least 1024 arguments, which five isn't

errno modification is also fine, so long as we restore it at the end

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12086
2021-05-20 16:37:38 -07:00
Rich Ercolani 0b1b66b473 Update tmpfile() existence detection
Linux changed the tmpfile() signature again in torvalds/linux@6521f89,
which in turn broke our HAVE_TMPFILE detection in configure.

Update that macro to include the new case, and change the signature of
zpl_tmpfile as appropriate.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes: #12060
Closes: #12087
2021-05-20 16:02:36 -07:00
Brian Behlendorf 8fb577ae6d Fix dRAID sequential resilver silent damage handling
This change addresses two distinct scenarios which are possible
when performing a sequential resilver to a dRAID pool with vdevs
that contain silent unknown damage. Which in this circumstance
took the form of the devices being intentionally overwritten with
zeros. However, it could also result from a device returning incorrect
data while a sequential resilver was in progress.

Scenario 1) A sequential resilver is performed while all of the
dRAID vdevs are ONLINE and there is silent damage present on the
vdev being resilvered. In this case, nothing will be repaired
by vdev_raidz_io_done_reconstruct_known_missing() because
rc->rc_error isn't set on any of the raid columns. To address
this vdev_draid_io_start_read() has been updated to always mark
the resilvering column as ESTALE for sequential resilver IO.

Scenario 2) Multiple columns contain silent damage for the same
block and a sequential resilver is performed. In this case it's
impossible to generate the correct data from parity unless all of
the damaged columns are being sequentially resilvered (and thus
only good data is used to generate parity). This is as expected
and there's nothing which can be done about it. However, we need
to be careful not to make to situation worse. Since we can't
verify the data is actually good without a checksum, we must
only repair the devices which are being sequentially resilvered.
Otherwise, an incorrect repair to a device which previously
contained good data could effectively lock in the damage and
make reconstruction impossible. A check for this was added to
vdev_raidz_io_done_verified() along with a new test case.

Lastly, this change updates the redundancy_draid_spare1 and
redundancy_draid_spare3 test cases to be more representative
of normal dRAID replacement operation.  Specifically, what we
care about is that the scrub run after a sequential resilver
does not find additional blocks which need repair.  This would
indicate the sequential resilver failed to rebuild a section of
one of the devices. Note also the tests were switched to using
the verify_pool() function which still checks for checksum errors.

Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12061
2021-05-20 15:05:26 -07:00
Lauri Tirkkonen 6ac2d7f76f zfs-allow.8: mention 'bookmark' permission
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Lauri Tirkkonen <lauri@hacktheplanet.fi>
Closes #12064
2021-05-20 09:03:03 -07:00
наб adc851d9f8 d/zfsutils.zfs.init derivatives: shellcheck, fix header
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12042
2021-05-20 08:55:47 -07:00
наб 9627bdc697 contrib/bash_completion.d: fix obvious shellcheck problems
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12042
2021-05-20 08:55:38 -07:00
наб 359b6cca0f zgenhostid: use argument path directly
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12042
2021-05-20 08:55:31 -07:00
наб 6fc3099248 Trim excess shellcheck annotations. Widen to all non-Korn scripts
Before, make shellcheck checked
  scripts/{commitcheck,make_gitrev,man-dates,paxcheck,zfs-helpers,zfs,
           zfs-tests,zimport,zloop}.sh
  cmd/zed/zed.d/{{all-debug,all-syslog,data-notify,generic-notify,
                 resilver_finish-start-scrub,scrub_finish-notify,
                 statechange-led,statechange-notify,trim_finish-notify,
                 zed-functions}.sh,history_event-zfs-list-cacher.sh.in}
  cmd/zpool/zpool.d/{dm-deps,iostat,lsblk,media,ses,smart,upath}
now it also checks
  contrib/dracut/{02zfsexpandknowledge/module-setup,
                  90zfs/{export-zfs,parse-zfs,zfs-needshutdown,
                         zfs-load-key,zfs-lib,module-setup,
                         mount-zfs,zfs-generator}}.sh.in
  cmd/zed/zed.d/{pool_import-led,vdev_attach-led,
                 resilver_finish-notify,vdev_clear-led}.sh
  contrib/initramfs/{zfsunlock,hooks/zfs.in,scripts/local-top/zfs}
  tests/zfs-tests/tests/perf/scripts/prefetch_io.sh
  scripts/common.sh.in
  contrib/bpftrace/zfs-trace.sh
  autogen.sh

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12042
2021-05-20 08:55:23 -07:00
наб 2ca77988a5 Fix SC2181 ("[ $?") outside tests/
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12042
2021-05-20 08:54:47 -07:00
Rich Ercolani 1d106ab57a Simple change to fix building in recent environments
Renamed _fini too for symmetry.

Suggested-by: @ensch
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12059
Closes: #11987
Closes: #12056
2021-05-19 20:46:42 -07:00
Alexander Motin 7457b024ba Scale worker threads and taskqs with number of CPUs
While use of dynamic taskqs allows to reduce number of idle threads,
hardcoded 8 taskqs of each kind is a big overkill for small systems,
complicating CPU scheduling, increasing I/O reorder, etc, while
providing no real locking benefits, just not needed there.

On another side, 12*8 worker threads per kind are able to overload
almost any system nowadays.  For example, pool of several fast SSDs
with SHA256 checksum makes system barely responsive during scrub, or
with dedup enabled barely responsive during large file deletion.

To address both problems this patch introduces ZTI_SCALE macro, alike
to ZTI_BATCH, but with multiple taskqs, depending on number of CPUs,
to be used in places where lock scalability is needed, while request
ordering is not so much.  The code is made to create new taskq for
~6 worker threads (less for small systems, but more for very large)
up to 80% of CPU cores (previous 75% was not good for rounding down).
Both number of threads and threads per taskq are now tunable in case
somebody really wants to use all of system power for ZFS.

While obviously some benchmarks show small peak performance reduction
(not so big really, especially on systems with SMT, where use of the
second threads does not give as much performance as the first ones),
they also show dramatic latency reduction and much more smooth user-
space operation in case of high CPU usage by ZFS.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #11966
2021-05-14 09:13:53 -07:00
Brian Behlendorf 6a13add559 ZTS: Increase redundancy test timeout
The redundancy_draid.ksh and redundancy_raidz.ksh tests were updated
by commit 93c8e91fe to additionally verify self-healing.  This
additional check increased the run time which can now occasionally
exceed the default maximum timeout in the CI environment.  To prevent
this from causing failures increase the default timeout for the
redundancy test cases.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12043
2021-05-14 09:11:56 -07:00
наб 5236eafdcd i-t: rewrite hooks
This produces a leaner image, doesn't fail if zdb doesn't exist,
properly handles hostnameless systems, doesn't mention crypto modules
for no reason, doesn't add useless empty executable in hopes an
eight-year-old PR is merged, uses i-t builtins for all copies

Also optimize the checkbashisms filter to spawn one (or a few) awks
instead of one per regular file and remove initramfs/hooks therefrom due
to a command -v false positive

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12017
2021-05-13 21:47:53 -07:00
Paul Zuchowski fce29d6aa4 Fix dmu_recv_stream test for resumable
Use dsl_dataset_has_resume_receive_state()
not dsl_dataset_is_zapified() to check if
stream is resumable.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Alek Pinchuk <apinchuk@axcient.com>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Paul Zuchowski <pzuchowski@datto.com>
Closes #12034
2021-05-13 21:46:14 -07:00
Ryan Moeller 210231ede0 FreeBSD: Implement xattr=sa
FreeBSD historically has not cared about the xattr property; it was
always treated as xattr=on.  With xattr=on, xattrs are stored as files
in a hidden xattr directory.  With xattr=sa, xattrs are stored as
system attributes and get cached in nvlists during xattr operations.
This makes SA xattrs simpler and more efficient to manipulate.  FreeBSD
needs to implement the SA xattr operations for feature parity with
Linux and to ensure that SA xattrs are accessible when migrated or
replicated from Linux.

Following the example set by Linux, refactor our existing extattr vnops
to split off the parts handling dir style xattrs, and add the
corresponding SA handling parts.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11997
2021-05-13 15:14:12 -07:00
Ryan Moeller d86debf576 FreeBSD: Use SET_ERROR to trace xattr name errors
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11997
2021-05-13 15:14:01 -07:00
Ryan Moeller 099ca8186b FreeBSD: Don't force xattr mount option
The kernel will use the xattr property by default when not overridden
by a mount option.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11997
2021-05-13 15:13:20 -07:00
Scott Colby da124ad8ec zed: Add Pushover notifier
Add zed_notify_pushover to zed-functions.sh, along with the necessary
configuration variables in zed.rc.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Signed-off-by: Scott Colby <scott@scolby.com>
Closes #12012
2021-05-13 10:02:24 -07:00
Brian Behlendorf 6217656da3 Revert "Fix raw sends on encrypted datasets when copying back snapshots"
Commit d1d4769 takes into account the encryption key version to
decide if the local_mac could be zeroed out. However, this could lead
to failure mounting encrypted datasets created with intermediate
versions of ZFS encryption available in master between major releases.
In order to prevent this situation revert d1d4769 pending a more
comprehensive fix which addresses the mount failure case.

Reviewed-by: George Amanakis <gamanakis@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #11294
Issue #12025
Issue #12300
Closes #12033
2021-05-13 10:00:17 -07:00
наб 618a65cd7a Widen mancheck target to all pages, fix them
mandoc: ./man/man8/zfs-mount-generator.8.in:188:2:
        ERROR: skipping end of block that is not open: RE
mandoc: ./man/man8/zfs_ids_to_path.8:38:2:
        ERROR: skipping unknown macro: .LP
mandoc: ./man/man8/zfs_ids_to_path.8:48:2:
        ERROR: inserting missing end of block: Sh breaks Bl
mandoc: ./man/man8/zfs-wait.8:69:2:
        ERROR: skipping end of block that is not open: El
mandoc: ./man/man8/zfs-program.8:460:2:
        ERROR: inserting missing end of block: It breaks Bd
mandoc: ./man/man8/zfs-mount-generator.8:188:2:
        ERROR: skipping end of block that is not open: RE
mandoc: ./man/man8/zstream.8:43:2:
        ERROR: skipping unknown macro: .LP
mandoc: ./man/man8/zstream.8:107:2:
        ERROR: inserting missing end of block: Sh breaks Bl
mandoc: ./man/man8/zstream.8:107:2:
        ERROR: inserting missing end of block: Sh breaks Bl
make: *** [Makefile:1529: mancheck] Error 1

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Issue #12017
2021-05-12 21:32:48 -07:00
наб 37086897b0 libzfs: add keylocation=https://, backed by fetch(3) or libcurl
Add support for http and https to the keylocation properly to
allow encryption keys to be fetched from the specified URL.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Issue #9543
Closes #9947 
Closes #11956
2021-05-12 21:21:35 -07:00
Brian Behlendorf 7d07d1be39 ZTS: Add known exceptions
The following seven tests been observed to occasionally fail during
CI testing.  This commit adds them to the list of known somewhat
flaky test cases.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12023
2021-05-11 19:55:12 -07:00
Coleman Kane 48c7b0e444 linux 5.13 compat: bdevops->revalidate_disk() removed
Linux kernel commit 0f00b82e5413571ed225ddbccad6882d7ea60bc7 removes the
revalidate_disk() handler from struct block_device_operations. This
caused a regression, and this commit eliminates the call to it and the
assignment in the block_device_operations static handler assignment
code, when configure identifies that the kernel doesn't support that
API handler.

Reviewed-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #11967 
Closes #11977
2021-05-11 19:53:02 -07:00
Ryan Moeller 4704be2879 Remove unimplemented virus scanning hooks
Reviewed-by: Adam Moss <c@yotes.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11972
2021-05-10 22:02:25 -07:00
наб 38c6d6cedd module/zfs: remove zfs_zevent_console and zfs_zevent_cols
zfs_zevent_console committed multiple printk()s per line without
properly continuing them ‒ a single event could easily be fragmented
across over thirty lines, making it useless for direct application

zfs_zevent_cols exists purely to wrap the output from zfs_zevent_console

The niche this was supposed to fill can be better served by something
akin to the all-syslog ZEDLET

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #7082 
Closes #11996
2021-05-10 11:00:15 -07:00
наб 2babd20045 libzfs: zfs_asprintf(): don't return undefined pointer
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11993
2021-05-08 09:37:40 -07:00
наб 87b671f3ac libzfsbootenv: lzbe_set_boot_device(): don't free undefined pointer
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11993
2021-05-08 09:27:53 -07:00
наб c493943404 zfs_get_enclosure_sysfs_path(): don't free undefined pointer
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11993
2021-05-08 09:25:59 -07:00
наб 8dfb9e57c7 zfs_get_enclosure_sysfs_path(): don't leak dev path
Also always free tmp2 at the end

Before:
nabijaczleweli@tarta:~/uwu$ valgrind --leak-check=full ./blergh
==8947== Memcheck, a memory error detector
==8947== Using Valgrind-3.14.0 and LibVEX
==8947== Command: ./blergh
==8947==
(null)
==8947==
==8947== HEAP SUMMARY:
==8947==     in use at exit: 23 bytes in 1 blocks
==8947==   total heap usage: 3 allocs, 2 frees, 1,147 bytes allocated
==8947==
==8947== 23 bytes in 1 blocks are definitely lost in loss record 1 of 1
==8947==    at 0x483577F: malloc (vg_replace_malloc.c:299)
==8947==    by 0x48D74B7: vasprintf (vasprintf.c:73)
==8947==    by 0x48B7833: asprintf (asprintf.c:35)
==8947==    by 0x401258: zfs_get_enclosure_sysfs_path
                         (zutil_device_path_os.c:191)
==8947==    by 0x401482: main (blergh.c:107)
==8947==
==8947== LEAK SUMMARY:
==8947==    definitely lost: 23 bytes in 1 blocks
==8947==    indirectly lost: 0 bytes in 0 blocks
==8947==      possibly lost: 0 bytes in 0 blocks
==8947==    still reachable: 0 bytes in 0 blocks
==8947==         suppressed: 0 bytes in 0 blocks
==8947==
==8947== For counts of detected and suppressed errors, rerun with: -v
==8947== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)

nabijaczleweli@tarta:~/uwu$ sed -n 191p zutil_device_path_os.c
        tmpsize = asprintf(&tmp1, "/sys/block/%s/device", dev_name);

After:
nabijaczleweli@tarta:~/uwu$ valgrind --leak-check=full ./blergh
==9512== Memcheck, a memory error detector
==9512== Using Valgrind-3.14.0 and LibVEX
==9512== Command: ./blergh
==9512==
(null)
==9512==
==9512== HEAP SUMMARY:
==9512==     in use at exit: 0 bytes in 0 blocks
==9512==   total heap usage: 3 allocs, 3 frees, 1,147 bytes allocated
==9512==
==9512== All heap blocks were freed -- no leaks are possible
==9512==
==9512== For counts of detected and suppressed errors, rerun with: -v
==9512== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11993
2021-05-08 09:24:57 -07:00
наб ca46fa602b zpool: vdev_run_cmd(): don't free undefined pointers
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11993
2021-05-08 09:21:59 -07:00
наб 68ebbd9a93 libzfs: zpool_load_compat(): don't free undefined pointers
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11993
2021-05-08 09:19:20 -07:00
наб 8bc357ba92 libzfs: zpool_load_compat(): open feature file cloexec
As a bonus, this also passes the open flags into the open flags instead
of the mode (it worked by accident because O_RDONLY is 0),
correctly detects a failed map,
and prefaults the entire file since we're always writing to every page

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11993
2021-05-08 09:16:26 -07:00
illiliti 36e8abee95 copy-builtin: posix conformance
This commits contains changes to allow running `copy-builtin` without
bash + some minor improvements.

changed shebang to /bin/sh
added -f option to `set` to globally disable unneeded globbing
replaced all `echo` commands within add_after() with `printf`
alternative to avoid possible issues with options (-neE)
dropped non-portable superfluous `readlink` command
replaced superfluous `true` command with `:` builtin alternative
replaced non-portable `--recursive` option of `cp` command with `-R`
alternative
dropped non-portable `local` keyword

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: illiliti <illiliti@protonmail.com>
Closes #12004
2021-05-08 08:58:26 -07:00
Brian Behlendorf 93c8e91fe7 Fix dRAID self-healing short columns
When dRAID performs a normal read operation only the data columns
in the raid map are read from disk.  This is enough information to
calculate the checksum, verify it, and return the needed data to the
application.  It's only in the event of a checksum failure that the
additional parity and any empty columns must be read since they are
required for parity reconstruction.

Reading these additional columns is handled by vdev_raidz_read_all()
which calls vdev_draid_map_alloc_empty() to expand the raid_map_t
and submit IOs for the missing columns.  This all works correctly,
but it fails to account for any "short" columns.  These are data
columns which are padded with a empty skip sector at the end.
Since that empty sector is not needed for a normal read it's not
read when columns is first read from disk.  However, like the parity
and empty columns the skip sector is needed to perform reconstruction.

The fix is to mark any "short" columns as never being read by clearing
the rc_tried flag when expanding the raid_map_t.  This will cause
the entire column to re-read from disk in the event of a checksum
failure allowing the self-healing functionality to repair the block.

Note that this only effects the self-healing feature because when
scrubbing a pool the parity, data, and empty columns are all read
initially to verify their contents.  Furthermore, only blocks which
contain "short" columns would be effected, and only when the memory
backing the skip sector wasn't already zeroed out.

This change extends the existing redundancy_raidz.ksh test case to
verify self-healing (as well as resilver and scrub).  Then applies
the same test case to dRAID with a slightly modified version of
the test script called redundancy_draid.ksh.  The unused variable
combrec was also removed from both test cases.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12010
2021-05-08 08:57:25 -07:00
наб a27ab6d43b dracut/90/module-setup: mainly shellcheck cleanup
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Issue #11956
2021-05-07 17:20:42 -07:00
наб 1966e959ca Replace ZoL with OpenZFS where applicable
Afterward, git grep ZoL matches:
  * README.md:  * [ZoL Site](https://zfsonlinux.org)
  - Correct
  * etc/default/zfs.in:# ZoL userland configuration.
  - Changing this would induce a needless upgrade-check,
    if the user has modified the configuration;
    this can be updated the next time the defaults change
  * module/zfs/dmu_send.c:   * ZoL < 0.7 does not handle [...]
  - Before 0.7 is ZoL, so fair enough

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Issue #11956
2021-05-07 17:20:37 -07:00
Ryan Moeller 8c0991e813 FreeBSD: Remove !FreeBSD ifdef'd code
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11994
2021-05-07 15:13:44 -07:00
Ryan Moeller 0dd7da9d7a Clean up use of zfs_log_create in zfs_dir
zfs_log_create returns void, so there is no reason to cast its return
value to void at the call site.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11994
2021-05-07 15:13:10 -07:00
наб 3bd6b0e05a zed: protect against wait4()/fork() races to the global PID table
This can be very easily triggered by adding a sleep(1) before
the wait4() on a PID-starved system: the reaper thread would wait
for a child before its entry appeared, letting old entries accumulate:

  Invoking "all-debug.sh" eid=3021 pid=391
  Finished "(null)" eid=0 pid=391 time=0.002432s exit=0
  Invoking "all-syslog.sh" eid=3021 pid=336
  Finished "(null)" eid=0 pid=336 time=0.002432s exit=0
  Invoking "history_event-zfs-list-cacher.sh" eid=3021 pid=347
  Invoking "all-debug.sh" eid=3022 pid=349
  Finished "history_event-zfs-list-cacher.sh" eid=3021 pid=347
                                              time=0.001669s exit=0
  Finished "(null)" eid=0 pid=349 time=0.002404s exit=0
  Invoking "all-syslog.sh" eid=3022 pid=370
  Finished "(null)" eid=0 pid=370 time=0.002427s exit=0
  Invoking "history_event-zfs-list-cacher.sh" eid=3022 pid=391
  avl_find(tree, new_node, &where) == NULL
  ASSERT at ../../module/avl/avl.c:641:avl_add()
  Thread 1 "zed" received signal SIGABRT, Aborted.

By employing this wider lock, we atomise [wait, remove] and [fork, add]:
slowing down the reaper thread now just causes some zombies
to accumulate until it can get to them

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Don Brady <don.brady@delphix.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11963
Closes #11965
2021-05-07 15:10:16 -07:00
Alyssa Ross c074a7de13 Return required size when encode_fh size too small
Quoting <linux/exportfs.h>:

> encode_fh() should return the fileid_type on success and on error
> returns 255 (if the space needed to encode fh is greater than
> @max_len*4 bytes). On error @max_len contains the minimum size (in 4
> byte unit) needed to encode the file handle.

ZFS was not setting max_len in the case where the handle was too
small.  As a result of this, the `t_name_to_handle_at.c' example in
name_to_handle_at(2) did not work on ZFS.

zfsctl_fid() will itself set max_len if called with a fid that is too
small, so if we give zfs_fid() that behavior as well, the fix is quite
easy: if the handle is too small, just use a zero-size fid instead of
the handle.

Tested by running t_name_to_handle_at on a normal file, a directory, a
.zfs directory, and a snapshot.

Thanks-to: Puck Meerburg <puck@puckipedia.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Alyssa Ross <hi@alyssa.is>
Closes #11995
2021-05-07 15:08:16 -07:00
Alexander Motin 4fb9e5638b Simplify/fix dnode_move() for dn_zfetch
Previous code tried to keep prefetch streams while moving dnode.  But
it was at least not updating per-stream zs_fetchback pointers, causing
use-after-free on next access.  Instead of that I see much easier and
cleaner to just drop old prefetch state and start new from scratch.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #11936
Closes #11998
2021-05-07 15:07:03 -07:00
Matthew Ahrens 610cb4fb8c undocumented libzfs API changes broke "zfs list"
While OpenZFS does permit breaking changes to the libzfs API, we should
avoid these changes when reasonably possible, and take steps to mitigate
the impact to consumers when changes are necessary.

Commit e4288a8397 made a libzfs API change that is especially
difficult for consumers because there is no change to the function
signatures, only to their behavior.  Therefore, consumers can't notice
that there was a change at compile time.  Also, the API change was
incompletely and incorrectly documented.

The commit message mentions `zfs_get_prop()` [sic], but all callers of
`get_numeric_property()` are impacted: `zfs_prop_get()`,
`zfs_prop_get_numeric()`, and `zfs_prop_get_int()`.

`zfs_prop_get_int()` always calls `get_numeric_property(src=NULL)`, so
it assumes that the filesystem is not mounted.  This means that e.g.
`zfs_prop_get_int(ZFS_PROP_MOUNTED)` always returns 0.

The documentation says that to preserve the previous behavior, callers
should initialize `*src=ZPROP_SRC_NONE`, and some callers were changed
to do that.  However, the existing behavior is actually preserved by
initializing `*src=ZPROP_SRC_ALL`, not `NONE`.

The code comment above `zfs_prop_get()` says, "src: ... NULL will be
treated as ZPROP_SRC_ALL.".  However, the code actually treats NULL as
ZPROP_SRC_NONE.  i.e. `zfs_prop_get(src=NULL)` assumes that the
filesystem is not mounted.

There are several existing calls which use `src=NULL` which are impacted
by the API change, most noticeably those used by `zfs list`, which now
assumes that filesystems are not mounted.  For example,
`zfs list -o name,mounted` previously indicated whether a filesystem was
mounted or not, but now it always (incorrectly) indicates that the
filesystem is not mounted (`MOUNTED: no`).  Similarly, properties that
are set at mount time are ignored.  E.g. `zfs list -o name,atime` may
display an incorrect value if it was set at mount time.

To address these problems, this commit reverts commit e4288a8397:
"zfs get: don't lookup mount options when using "-s local""

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #11999
2021-05-06 11:24:56 -07:00
Ryan Moeller b1e44cdcea FreeBSD: Initialize/destroy zp->z_lock
zp->z_lock is used in shared code for protecting projid and scantime.
We don't exercise these paths much if at all on FreeBSD, so have been
lucky enough not to have issues with the uninitialized locks so far.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
Closes #12003
2021-05-06 09:45:16 -07:00
Rich Ercolani 1f77b58789 Updated zfs_dbgmsg_enable documentation to be more accurate
Changed the default specified for zfs_dbgmsg_enable, added
clarification of interaction with zfs_flags.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #11984
Closes #11986
2021-05-05 21:16:51 -07:00
Ryan Moeller c903a756ac Miscellaneous code cleanup
Remove some extra whitespace.

Use pointer-typed asserts in Linux's znode cache destructor for more
info when debugging.

Simplify a couple of conversions from inode to znode when we already
have the znode.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11974
2021-04-30 16:39:07 -07:00
Ryan Moeller e4efb70950 FreeBSD: Clean up ASSERT/VERIFY use in module
Convert use of ASSERT() to ASSERT0(), ASSERT3U(), ASSERT3S(), 
ASSERT3P(), and likewise for VERIFY().  In some cases it ended up 
making more sense to change the code, such as VERIFY on nvlist 
operations that I have converted to use fnvlist instead.  In one 
place I changed an internal struct member from int to boolean_t to 
match its use.  Some asserts that combined multiple checks with && 
in a single assert have been split to separate asserts, to make it 
apparent which check fails.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11971
2021-04-30 16:36:10 -07:00
наб ec4f330816 zed.d/zed-functions.sh: fix zed_guid_to_pool() on dash
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11935
Closes #11954
2021-04-30 15:04:41 -07:00
наб 208675a09b zed.d/history_event-zfs-list-cacher.sh: no grep for snapshot detection
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11935
2021-04-30 15:04:38 -07:00
наб 8dca000040 zed.d/*-notify.sh: use mktemp instead of generating temp path manually
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11935
2021-04-30 15:04:33 -07:00
наб 71def603cd zed.d/pool_import-led.sh: fix for current zpool scripts
Also minor clean-up with folding state_to_val() into a case,
unrolling the lesser-available seq into numbers,
ignoring vdev states we don't care about,
and documentation comments

Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11934
Closes #11935
2021-04-30 15:04:28 -07:00
наб 1a7d7182ac libzutil: fix dm_get_underlying_path() return if not a DM device
For example, this would happily return "/dev/(null)" for /dev/sda1

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11935
2021-04-30 15:04:19 -07:00
Ryan Moeller ccb46cab50 ZTS: Fix xattr_002_neg passing too soon
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11970
2021-04-30 07:37:02 -07:00
Ryan Moeller 801c76149b FreeBSD: Prune some unneeded definitions
IS_XATTRDIR is never used.
v_count is only used in two places, one immediately followed by the
use of the real name, v_usecount.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
Closes #11973
2021-04-30 07:34:53 -07:00
наб 6f4e132fec ZTS: cli_root/zfs_load-key: add separate key files
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Issue: #11956
Closes #11976
2021-04-30 07:31:22 -07:00
Toomas Soome 17b83525f5 zdb: dump_history can be improved
We only recognize some history records, instead, use
same logic as in print_history_records() in zpool_main.c.

Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Toomas Soome <tsoome@me.com>
Closes #11940
2021-04-29 16:44:07 -07:00
Alan Somers e4288a8397 zfs get: don't lookup mount options when using "-s local"
Looking up mount options can be very expensive on servers with many
mounted file systems.  When doing "zfs get" with any "-s" option that
does not include "temporary", the mount list will never be used.  This
commit optimizes for that case.

This is a breaking commit for libzfs!  Callers of zfs_get_prop are now
required to initialize src.  To preserve existing behavior, they should
initialize it to ZPROP_SRC_NONE.

Sponsored by: Axcient
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alan Somers <asomers@gmail.com>
Closes #11955
2021-04-29 14:19:44 -07:00
Arshad Hussain bc9c7265ae vdev_id: variable not getting expanded under map_slot()
Under function map_slot() variable passed as args
were not getting properly substituted or expanded.
This patch fixes the substitution issue.

Reviewed-by: Niklas Edmundsson <nikke@acc.umu.se>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Closes #11951 
Closes #11959
2021-04-29 13:58:49 -07:00
Nathaniel Wesley Filardo 056a658dee vdev_mirror: don't scrub/resilver devices that can't be read
This ensures that we don't accumulate checksum errors against offline or
unavailable devices but, more importantly, means that we don't
needlessly create DTL entries for offline devices that are already
up-to-date.

Consider a 3-way mirror, with disk A always online (and so always with
an empty DTL) and B and C only occasionally online.  When A & B resilver
with C offline, B's DTL will effectively be appended to C's due to these
spurious ZIOs even as the resilver empties B's DTL:

  * These ZIOs land in vdev_mirror_scrub_done() and flag an error

  * That flagged error causes vdev_mirror_io_done() to see
    unexpected_errors, so it issues a ZIO_TYPE_WRITE repair ZIO, which
    inherits ZIO_FLAG_SCAN_THREAD because zio_vdev_child_io() includes
    that flag in ZIO_VDEV_CHILD_FLAGS.

  * That ZIO fails, too, and eventually zio_done() gets its hands on it
    and calls vdev_stat_update().

  * vdev_stat_update() sees the error and this zio...

    * is not speculative,
    * is not due to EIO (but rather ENXIO, since the device is closed)
    * has an ->io_vd != NULL (specifically, the offline leaf device)
    * is a write
    * is for a txg != 0 (but rather the read block's physical birth txg)
    * has ZIO_FLAG_SCAN_THREAD asserted

  * So: vdev_stat_update() calls vdev_dtl_dirty() on the offline vdev.

Then, when A & C resilver with B offline, that story gets replayed and
C's DTL will be appended to B's.

In fact, one does not need this permanently-broken-mirror scenario to
induce badness: breaking a mirror with no DTLs and then scrubbing will
create DTLs for all offline devices.  These DTLs will persist until the
entire mirror is reassembled for the duration of the *resilver*, which,
incidentally, will not consider the devices with good data to be sources
of good data in the case of a read failure.

Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Nathaniel Wesley Filardo <nwfilardo@gmail.com>
Closes #11930
2021-04-27 17:48:11 -07:00
Brian Behlendorf 919714554b zfs.spec.in: remove post ldconfig scriptlets
In Fedora 28 the packaging guidelines were changed such that ldconfig
should no longer be called in either the %post or %postun scriptlets.
Instead the new compatibility macros %ldconfig_post, %ldconfig_postun,
and %ldocnfig_scriptlets should be used.

Since we only currently support Fedora 31 and newer, we could drop
%post or %postun scriptlets entirely according to the guidelines.
However, since we also use the same spec file for CentOS / RHEL
it's convenient to call the macros which are available starting
with CentOS / RHEL 8.  For CentOS / RHEL 7 we must still call
ldconfig in the traditional way.

https://fedoraproject.org/wiki/Changes/Removing_ldconfig_scriptlets

Reviewed-by: Olaf Faaland <faaland1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11931
2021-04-27 17:45:40 -07:00
Toomas Soome 09131144b7 zdb: ASSERT issues when DEBUG is not defined
If zdb is not built with DEBUG mode, the ASSERT macros will be
eliminated.

This will leave vim defined, but not used (gcc warning) and
checkpoint spacemap validation loop will do nothing.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Toomas Soome <tsoome@me.com>
Closes #11932
2021-04-27 08:33:37 -07:00
Brian Behlendorf b1f7341203 ZTS: Add known exceptions
Both the zpool_initialize_import_export and checkpoint_discard_busy
test cases a known to occasionally fail.  Add them to the list of
known possible failures and reference the appropriate issue on the
tracker.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11949
2021-04-27 08:27:03 -07:00
Martin Matuška a69356cf26 Drop "All rights reserved" from files by trasz@FreeBSD.org
This obeys the change in freebsd/freebsd-src@bce7ee9d4

External-issue: https://reviews.freebsd.org/D26980
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Martin Matuska <mm@FreeBSD.org>
Closes #11947
2021-04-27 08:25:48 -07:00
Prawn b0269cd8ce receive: don't fail inheriting (-x) properties on wrong dataset type
Receiving datasets while blanket inheriting properties like zfs 
receive -x mountpoint can generally be desirable, e.g. to avoid 
unexpected mounts on backup hosts.

Currently this will fail to receive zvols due to the mountpoint 
property being applicable to filesystems only.  This limitation 
currently requires operators to special-case their minds and tools 
for zvols.

This change gets rid of this limitation for inherit (-x) by
Spiting up the dataset type handling: Warnings for inheriting (-x), 
errors for overriding (-o).

Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: InsanePrawn <insane.prawny@gmail.com>
Closes #11416
Closes #11840
Closes #11864
2021-04-26 17:23:51 -07:00
Mateusz Guzik f172f759b9 FreeBSD: damage control racing .. lookups in face of mkdir/rmdir
External-issue: https://reviews.freebsd.org/D29769
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #11926
2021-04-26 12:44:40 -07:00
Romain Dolbeau 0375465536 Fix AVX512BW Fletcher code on AVX512-but-not-BW machines
Introduce a specific valid function for avx512f+avx512bw (instead 
of checking only for avx512f).

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Adam Moss <c@yotes.com>
Signed-off-by: Romain Dolbeau <romain@dolbeau.org>
Closes #11937
Closes #11938
2021-04-26 12:42:42 -07:00
наб f9bece92e2 zed: protect against wait4()/fork() races to the launched process tree
As soon as wait4() returns, fork() can immediately return with the same
PID, and race to lock _launched_processes_lock, then try to add the new
(duplicate) PID to _launched_processes, which asserts

By locking before wait4(), we ensure, that, given that same
unfortunate scheduling, _launched_processes_lock cannot be locked by the
spawner before we pop the process in the reaper, and only afterward will
it be added

This moves where the reaper idles when there are children from the
wait4() to the pause(), locking for the duration of that single syscall
in both the no-children and running-children cases; the impact of this
is one to two syscalls (depending on _launched_processes_lock state)
per loop

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Don Brady <don.brady@delphix.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11924
Closes #11928
2021-04-22 17:49:21 -07:00
Daniel Stevenson 60ffc1c460 Fixed incorrect man page reference in zfsprops(8)
The special_small_blocks section directed readers to zpool(8) for
documentation on special allocation classes, while they are actually
documented in zpoolconcepts(8).

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Daniel Stevenson <daniel@dstev.net>
Closes #11918
2021-04-20 10:16:00 -07:00
наб 44287ffa0d etc/systemd/zfs-mount-generator: don't fail if no cached pools
If $FSLIST exists but is empty, the generator fails with
  sort: cannot read: '/etc/zfs/zfs-list.cache/*':
  No such file or directory

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11915
2021-04-19 10:52:44 -07:00
наб dc3a56d38b libshare: nfs: commonify nfs_enable_share()
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Wilson <gwilson@delphix.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11886
2021-04-19 09:07:09 -07:00
наб 62866fc96c freebsd/libshare: nfs: make nfs_is_shared() thread-safe
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Wilson <gwilson@delphix.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11886
2021-04-19 09:07:04 -07:00
наб 8357bbff1f libshare: nfs: commonify nfs_{init,fini}_tmpfile(), nfs_disable_share()
Also open the temp file cloexec

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Wilson <gwilson@delphix.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11886
2021-04-19 09:06:57 -07:00
наб a3d387b31d libshare: nfs: commonify nfs_exports_[un]lock(), FILE_HEADER
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Wilson <gwilson@delphix.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11886
2021-04-19 09:06:52 -07:00
наб 0f4d83117a libshare: nfs: don't leak nfs_lock_fd when lock fails
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Wilson <gwilson@delphix.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11886
2021-04-19 09:06:31 -07:00
наб fef8bd41fc libspl: implement atomics in terms of atomics
This replaces the generic libspl atomic.c atomics implementation
with one based on builtin gcc atomics.  This functionality was added
as an experimental feature in gcc 4.4.  Today even CentOS 7 ships
with gcc 4.8 as the default compiler we can make this the default.

Furthermore, the builtin atomics are as good or better than our
hand-rolled implementation so it's reasonable to drop that custom code.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11904
2021-04-18 22:13:24 -07:00
Brian Behlendorf 50d9ff93df ZTS: Improve redundancy test scripts
- Add additional logging to provide more information about why the
  test failed.  This including logging more of the individual commands
  and the contents and differences of the record files on failure.

- Updated get_vdevs() to properly exclude all top-level vdevs
  including raidz3 and draid[1-3].

- Replaced gnudd with dd.  This is the only remaining place in the
  test suite gnudd is used and it shouldn't be needed.

- The refill_test_env function expects the pool as the first argument
  but never sets the pool variable.

- Only fill the test pools to 50% of capacity instead of 75% to help
  speed up the tests.

- Fix replace_missing_devs() calculation, MINDEVSIZE should be
  MINVDEVSIZE.

- Fix damage_devs() so it overwrites almost all of the device so
  we're guaranteed to damage filesystem blocks.

- redundancy_stripe.ksh should not use log_mustnot to check if the
  pool is healthy since the return value may be misinterpreted.
  Just perform a normal conditional check and log the failure.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11906
2021-04-18 21:58:36 -07:00
Attila Fülöp 7c9702e2a7 ICP: Silence objtool "stack pointer realignment" warnings
Objtool requires the use of a DRAP register while aligning the
stack. Since a DRAP register is a gcc concept and we are
notoriously low on registers in the crypto code, it's not worth
the effort to mimic gcc generated stack realignment.

We simply silence the warning by adding the offending object files
to OBJECT_FILES_NON_STANDARD.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Attila Fülöp <attila@fueloep.org>
Closes #6950
Closes #11914
2021-04-17 13:11:18 -07:00
наб a31ac10185 libzfs: refresh property cache after inheriting userprop
This matches what happens when inheriting a system property

Consider the following program:
	int main() {
		void *zhp = libzfs_init();
		void *dataset = zfs_open(zhp, "zest/__test", 1);

		printf("before:");
		dump_nvlist(zfs_get_user_props(dataset), 2);
		printf("\n");

		zfs_prop_inherit(dataset, "xyz.nabijaczleweli:test", 0);
		printf("after:");
		dump_nvlist(zfs_get_user_props(dataset), 2);
		printf("\n");

		zfs_refresh_properties(dataset);
		printf("refreshed:");
		dump_nvlist(zfs_get_user_props(dataset), 2);
		printf("\n");
	}

And the output before:
	# zfs set xyz.nabijaczleweli:test=hehe zest/__test
	# ./a.out
	before:  xyz.nabijaczleweli:test:
	      value: 'hehe'
	      source: 'zest/__test'

	after:  xyz.nabijaczleweli:test:
	      value: 'hehe'
	      source: 'zest/__test'

	refreshed:

As compared to the output after:
	# zfs set xyz.nabijaczleweli:test=hehe zest/__test
	# ./a.out
	before:  xyz.nabijaczleweli:test:
	      value: 'hehe'
	      source: 'zest/__test'

	after:
	refreshed:

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11064
Closes #11911
2021-04-17 12:39:54 -07:00
наб bfe8b9fff3 libzfs: don't mark prompt+raw as retriable
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11911
Closes #11031
2021-04-17 12:39:28 -07:00
Mateusz Guzik 309c32c954 Combine zio caches if possible
This deduplicates 2 sets of caches which use the same allocation size.

Memory savings fluctuate a lot, one sample result is FreeBSD running
"make buildworld" saving ~180MB RAM in reduced page count associated
with zio caches.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #11877
2021-04-17 12:36:04 -07:00
наб c682b67827 contrib/dracut: 90: zfs-{rollback,snapshot}-bootfs: use @sbindir@
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11898
2021-04-16 15:25:12 -07:00
наб ac541438a2 contrib/i-t: properly mount root's children with spaces
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11898
2021-04-16 15:24:55 -07:00
наб d8ced6613d contrib/dracut: 90: mount essential datasets under root
This partly mirrors what the i-t script does (though that mounts all
children, recursively) ‒ /etc, /usr, /lib*, and /bin are all essential,
if present, to successfully invoke the real init, which will then mount
everything else it might need in the right order

The following extreme-case set-up boots w/o issues now:
  /               zoot            zfs  rw,relatime,xattr,noacl
  ├─/etc          zoot/etc        zfs  rw,relatime,xattr,noacl
  ├─/usr          zoot/usr        zfs  rw,relatime,xattr,noacl
  │ └─/usr/local  zoot/usr/local  zfs  rw,relatime,xattr,noacl
  ├─/var          zoot/var        zfs  rw,relatime,xattr,noacl
  │ ├─/var/lib    zoot/var/lib    zfs  rw,relatime,xattr,noacl
  │ ├─/var/log    zoot/var/log    zfs  rw,relatime,xattr,posixacl
  │ ├─/var/cache  zoot/var/cache  zfs  rw,relatime,xattr,noacl
  │ └─/var/tmp    zoot/var/tmp    zfs  rw,relatime,xattr,noacl
  ├─/home         zoot/home       zfs  rw,relatime,xattr,noacl
  │ └─/home/nab   zoot/home/nab   zfs  rw,relatime,xattr,noacl
  ├─/boot         zoot/boot       zfs  rw,relatime,xattr,noacl
  ├─/root         zoot/home/root  zfs  rw,relatime,xattr,noacl
  ├─/opt          zoot/opt        zfs  rw,relatime,xattr,noacl
  └─/srv          zoot/srv        zfs  rw,relatime,xattr,noacl

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11898
2021-04-16 15:24:22 -07:00
наб 51bc60e582 contrib/dracut: 90: generator: only log to kmsg if debug set on cmdline
"debug" is also used by systemd itself, and there's really no reason for
the generator to write this much garbage by default

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11898
2021-04-16 15:24:06 -07:00
наб 626a792bb3 contrib/dracut: 02: don't spill device names across multiple lines
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11898
2021-04-16 15:23:35 -07:00
Attila Fülöp a9c93ac533 ICP: Add missing stack frame info to SHA asm files
Since the assembly routines calculating SHA checksums don't use
a standard stack layout, CFI directives are needed to unroll the
stack.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Attila Fülöp <attila@fueloep.org>
Closes #11733
2021-04-16 15:11:26 -07:00
Paul Zuchowski f2286383d0 Fix crash in zio_done error reporting
Fix NULL pointer dereference when reporting
checksum error for gang block in zio_done.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Zuchowski <pzuchowski@datto.com>
Closes #11872
Closes #11896
2021-04-16 11:00:53 -07:00
Ryan Moeller d92af6fc8d zfs-send(8): Restore sorting of flags
Before #11710 the flags in zfs-send(8) were sorted.
Restore order and bump the date.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11905
2021-04-15 17:43:07 -07:00
наб 23b6f17abb linux/spl: proc: use global table_{min,max} values instead of local ones
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11879
2021-04-15 14:55:50 -07:00
наб a0978d15aa linux/libspl: gethostid: read from /proc/sys/kernel/spl/hostid, simplify
Fixes get_system_hostid() if it was set via the aforementioned sysctl
and simplifies the code a bit.  The kernel and user-space must agree,
after all.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11879
2021-04-15 14:55:47 -07:00
наб 7de4c88b39 linux/spl: base proc_dohostid() on proc_dostring()
This fixes /proc/sys/kernel/spl/hostid on kernels with mainline commit
32927393dc1ccd60fb2bdc05b9e8e88753761469 ("sysctl: pass kernel pointers
to ->proc_handler") ‒ 5.7-rc1 and up

The access_ok() check in copy_to_user() in proc_copyout_string() would
always fail, so all userspace reads and writes would fail with EINVAL

proc_dostring() strips only the final new-line,
but simple_strtoul() doesn't actually need a back-trimmed string ‒
writing "012345678   \n" is still allowed, as is "012345678zupsko", &c.

This alters what happens when an invalid value is written ‒
previously it'd get set to what-ever simple_strtoul() returned
(probably 0, thereby resetting it to default), now it does nothing

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11878
Closes #11879
2021-04-15 14:55:43 -07:00
наб 2d14207c98 libspl: lift common bits of getexecname()
Merge the actual implementations of getexecname() and slightly clean
up the FreeBSD one.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11879
2021-04-15 14:55:40 -07:00
наб 375bdb2b20 module/zfs/zvol.c: purge unused zvol_volmode_cb_arg
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11879
2021-04-15 14:55:37 -07:00
Jitendra Patidar 08795ab8d3 ZFS traverse_visitbp optimization to limit prefetch
Traversal code, traverse_visitbp() does visit blocks recursively.
Indirect (Non L0) Block of size 128k could contain, 1024 block pointers
of 128 bytes. In case of full traverse OR incremental traverse, where
all blocks were modified, it could traverse large number of blocks
pointed by indirect. Traversal code does issue prefetch of blocks
traversed below indirect. This could result into large number of
async reads queued on vdev queue. So, account for prefetch issued for
blocks pointed by indirect and limit max prefetch in one go.

Module Param:
zfs_traverse_indirect_prefetch_limit: Limit of prefetch while traversing
an indirect block.

Local counters:
prefetched: Local counter to account for number prefetch done.
pidx: Index for which next prefetch to be issued.
ptidx: Index at which next prefetch to be triggered.

Keep "ptidx" somewhere in the middle of blocks prefetched, so that
blocks prefetch read gets the enough time window before their demand
read is issued.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Jitendra Patidar <jitendra.patidar@nutanix.com>
Closes #11802 
Closes #11803
2021-04-15 13:49:27 -07:00
наб 86418090d7 ZTS: add zed_fd_spill to verify the fds ZEDLETs inherit
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11891
2021-04-15 13:46:05 -07:00
наб aa6a14c0d5 zed: set O_CLOEXEC on persistent fds, remove closefrom() from pre-exec
Also don't dup /dev/null over stdio if daemonised

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11891
2021-04-15 13:46:02 -07:00
Paul Dagnelie 414f7249dc Add SIGSTOP and SIGTSTP handling to issig
This change adds SIGSTOP and SIGTSTP handling to the issig function; 
this mirrors its behavior on Solaris. This way, long running kernel 
tasks can be stopped with the appropriate signals. Note that doing 
so with ctrl-z on the command line doesn't return control of the tty 
to the shell, because tty handling is done separately from stopping 
the process. That can be future work, if people feel that it is a 
necessary addition.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Issue #810 
Issue #10843 
Closes #11801
2021-04-15 13:34:35 -07:00
Brian Behlendorf c95c5176b6 Fix 'make checkbashisms` warnings
The awk command used by the checkbashisms target incorrectly
adds the escape character before the ! and # characters.  This
results in the following warnings because these characters do not
need to be escaped.

    awk: cmd. line:1: warning: regexp escape sequence
        `\!' is not a known regexp operator
    awk: cmd. line:1: warning: regexp escape sequence
        `\#' is not a known regexp operator

Remove the unneeded escape character before ! and #.

Valid escape sequences are:

    https://www.gnu.org/software/gawk/manual/html_node/Escape-Sequences.html

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11902
2021-04-15 08:53:08 -07:00
Yuri Pankov 96904d879c Fix vdev health padding in zpool list -v
Do not (incorrectly, right instead left) pad health string itself,
it will be taken care of when printing property value below.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Yuri Pankov <yuripv@FreeBSD.org>
Closes #11899
2021-04-14 09:02:16 -07:00
Brian Behlendorf 3e660534ea Obsolete earlier packages due to version bump
Follow up to d5ef91af which adds a missing 'obsoletes' for the
libzfs-devel package.

Add a comment to the zfs.spec file as a reminder that previous
versions of the package should be marked as obsolete.

Reviewed-by: Olaf Faaland <faaland1@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11844 
Closes #11895
2021-04-13 16:33:06 -07:00
наб d197a150b4 libzfs: get rid of unused libzfs_handle::libzfs_{storeerr,chassis_id}
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11868
2021-04-13 14:15:06 -07:00
наб 533527725b libzfs: get rid of libzfs_handle::libzfs_mnttab
All users did a freopen() on it. Even some non-users did!
This is point-less ‒ just open the mtab when needed

If I understand Solaris' getextmntent(3C) correctly, the non-user
freopen()s are very likely an odd, twisted vestigial tail of that ‒
but it's got a completely different calling convention and caching
semantics than any platform we support

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11868
2021-04-13 14:14:44 -07:00
наб 74e48f470e linux/libspl: getextmntent(): don't leak mnttab FILE*
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11868
2021-04-13 14:14:44 -07:00
наб b3530c4262 libzfs: zfs_mount_at(): load key for encryption root if MS_CRYPT
zfs_crypto_load_key() only works on encryption roots,
and zfs mount -la would fail if it encounters a datasets that
is sorted before their encroots.

To trigger:
  truncate -s 40G /tmp/test
  dd if=/dev/urandom of=/tmp/k bs=128 count=1 status=none
  zpool create -O encryption=on -O keylocation=file:///tmp/k \
               -O keyformat=passphrase test /tmp/test
  zfs create -o mountpoint=/a test/a
  zfs create -o mountpoint=/b test/b
  zfs umount test
  zfs unload-key test
  zfs mount -la

The final mount errored out with:
  Key load error: Keys must be loaded for
    encryption root of 'test/a' (test).
  Key load error: Keys must be loaded for
    encryption root of 'test/b' (test).

And only /test was mounted

This technically breaks the libzfs API, but the previous behavior was
decidedly a bug.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11870 
Closes #11875
2021-04-12 21:26:55 -07:00
Mateusz Guzik 93f81eb721 FreeBSD: use vnlru_free_vfsops if available
Fixes issues when zfs is used along with other filesystems.

External-issue: https://cgit.freebsd.org/src/commit/?id=e9272225e6bed840b00eef1c817b188c172338ee
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #11881
2021-04-12 11:01:46 -07:00
Mateusz Guzik 5ad86e973c FreeBSD: add missing seqc write begin/end around zfs_acl_chown_setattr
It happens to trip over an assert but does not matter for correctness at
this time. Done for future proofing.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #11884
2021-04-12 10:59:57 -07:00
Mateusz Guzik d8c09f3fcc FreeBSD: add support for lockless symlink lookup
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #11883
2021-04-12 10:59:22 -07:00
наб 53e9540e41 .gitmodules: link to openzfs github repository
Update the test images link to reference the openzfs github repository.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Issue #11868
2021-04-12 09:37:23 -07:00
Prawn ee6615e07a cmd/zfs receive: allow dry-run (-n) to check property args
zfs recv -n does not report some errors it could.  The code to bail 
out of the receive if in dry-run mode came a little early, skipping 
validation of cmdprops (recv -x and -o) among others.  Move the
check down to enable these additional checks.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: InsanePrawn <insane.prawny@gmail.com>
Closes #11862
2021-04-12 09:35:55 -07:00
наб 722b7f9a4c libuutil: purge unused functions
Remove vestigial uu_open_tmp().  The problems with this implementation
are many, but the primary one is the TMPPATHFMT macro, which is
unused, and always has been.

Searching around for any users leads only to earlier imports of the
same, identical file, i.a. into an apple repository (which does patch
gethrtime() into it and gives us a copyright date of 2007),
and a MidnightBSD one from 2008.

Searching illumos-gate, uu_open_tmp appears, in current HEAD, three
times: in the header, libuutil's mapfile ABI, and the implementation.

This slowly grows up to eight occurrences as one moves back to the root
"OpenSolaris Launch" commit: the header, implementation, twice in
libuutil's spec ABI, twice (with multilib and non-multilib paths) in
libuutil.so's i386 and SPARC binary db ABIs.

That's 2005, and this file was abandonware even then, it's dead code.

The situation is similar for the uu_dprintf() family of functions and
uu_dump().  Nothing in accessibly recorded history has ever used them.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11873
2021-04-12 09:32:43 -07:00
Colm e086db1656 Improvements to the 'compatibility' property
Several improvements to the operation of the 'compatibility' property:

1) Improved handling of unrecognized features:
Change the way unrecognized features in compatibility files are handled.

 * invalid features in files under /usr/share/zfs/compatibility.d
   only get a warning (as these may refer to future features not yet in
   the library),
 * invalid features in files under /etc/zfs/compatibility.d
   get an error (as these are presumed to refer to the current system).

2) Improved error reporting from zpool_load_compat.
Note: slight ABI change to zpool_load_compat for better error reporting.

3) compatibility=legacy inhibits all 'zpool upgrade' operations.

4) Detect when features are enabled outside current compatibility set
   * zpool set compatibility=foo <-- print a warning
   * zpool set feature@xxx=enabled <-- error
   * zpool status <-- indicate this state

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Colm Buckley <colm@tuatha.org>
Closes #11861
2021-04-12 09:08:56 -07:00
Brian Behlendorf 888700bc6b ZTS: fix removal_condense_export test case
It's been observed in the CI that the required 25% of obsolete bytes
in the mapping can be to high a threshold for this test resulting in
condensing never being triggered and a test failure.  To prevent these
failures make the existing zfs_condense_indirect_obsolete_pct tuning
available so the obsolete percentage can be reduced from 25% to 5%
during this test.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11869
2021-04-11 21:49:13 -07:00
Brian Behlendorf 4ef03d077c Update libzfs.abi for zfs_send() change
Commit 099fa7e4 intentionally modified the libzfs ABI.  However, it
failed to include an update for the libzfs.abi file.  This commit
resolves the `make checkabi` warning due to that omission.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11710
2021-04-11 17:06:54 -07:00
pstef 458f82319a Balance parentheses in parameter descriptions
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Piotr Paweł Stefaniak <pstef@freebsd.org>
Closes #11882
2021-04-11 16:35:07 -07:00
Brian Behlendorf ea3cd8e420 ZTS: Add known exceptions
The fault/auto_spare_shared, l2arc/persist_l2arc_007_pos, and
alloc_class/alloc_class_013_pos test cases are not entirely reliable
and may occasionally fail resulting in a false positive in the CI.
Add these tests to known list of possible failures until they can
be made 100% reliable.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11890
2021-04-11 15:55:38 -07:00
наб 10b575d04c lib/: set O_CLOEXEC on all fds
As found by
  git grep -E '(open|setmntent|pipe2?)\(' |
    grep -vE '((zfs|zpool)_|fd|dl|lzc_re|pidfile_|g_)open\('

FreeBSD's pidfile_open() says nothing about the flags of the files it
opens, but we can't do anything about it anyway; the implementation does
open all files with O_CLOEXEC

Consider this output with zpool.d/media appended with
"pid=$$; (ls -l /proc/$pid/fd > /dev/tty)":
  $ /sbin/zpool iostat -vc media
  lrwx------ 0 -> /dev/pts/0
  l-wx------ 1 -> 'pipe:[3278500]'
  l-wx------ 2 -> /dev/null
  lrwx------ 3 -> /dev/zfs
  lr-x------ 4 -> /proc/31895/mounts
  lrwx------ 5 -> /dev/zfs
  lr-x------ 10 -> /usr/lib/zfs-linux/zpool.d/media
vs
  $ ./zpool iostat -vc vendor,upath,iostat,media
  lrwx------ 0 -> /dev/pts/0
  l-wx------ 1 -> 'pipe:[3279887]'
  l-wx------ 2 -> /dev/null
  lr-x------ 10 -> /usr/lib/zfs-linux/zpool.d/media

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11866
2021-04-11 15:45:59 -07:00
наб 92ffd87aaf libzfs{,_core}: set O_CLOEXEC on persistent (ZFS_DEV and MNTTAB) fds
These were fd 3, 4, and 5 by the time zfs change-key hit
execute_key_fob()

glibc appends "e" to setmntent() mode, but musl's just returns fopen()

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11866
2021-04-11 15:45:31 -07:00
наб 0fc401a7ef libzfs: zfs_crypto_create() requires a new key by definition: set newkey
This changes the password prompt for new encryption roots from
  Enter passphrase:
  Re-enter passphrase:
to
  Enter new passphrase:
  Re-enter new passphrase:
which makes more sense and is more consistent with "new passphrase"
now always meaning "come up with something" and plain "passphrase"
"remember that thing"

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11866
2021-04-11 15:44:54 -07:00
наб e568853f96 libzfs_crypto.c: remove unused key_locator enum
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11866
2021-04-11 15:43:15 -07:00
наб 0395cf92e0 zfprops(8): fix spacing in jailed= arguments
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11866
2021-04-11 15:42:24 -07:00
наб 0241dfac47 zfs-[un]jail(8): fix "zfs-jail [un]jail" leftovers
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11866
2021-04-11 15:41:55 -07:00
наб e0779d1e20 zed: untangle _zed_conf_parse_path()
Dunno, maybe it's just me, but the previous style was /really/ confusing

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11860
2021-04-11 15:27:34 -07:00
наб 346c85b722 zed: don't malloc() global zed_conf instance, optimise zed_conf layout
It's all of 40 bytes with 4-byte pointers and 64 with 8-byte ones
(previously 44 and 88, respectively) ‒
there's no reason it can't live on the stack

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11860
2021-04-11 15:27:30 -07:00
наб 46d50eaf56 zed: remove zed_conf::{min,max}_events and ZED_{MIN,MAX}_EVENTS
No users, fields marked "reserved for future use", macros defined to 0

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11860
2021-04-11 15:27:25 -07:00
наб d09bd19629 zed: remove zed_conf::syslog_facility
No users, nobody sets it, main() hard-codes LOG_DAEMON, which is the
only correct value for this

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11860
2021-04-11 15:27:13 -07:00
наб 88bf37d91a zed: _zed_conf_display_help(): be consistent about what got_err means
Users passed in EXIT_SUCCESS and EXIT_FAILURE, despite it being a bool

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11860
2021-04-11 15:25:39 -07:00
наб d622f16b6b zed: untangle -h option listing
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11860
2021-04-11 15:25:12 -07:00
наб 83cc6bbf79 zed: print out licence string as one big chunk
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11860
2021-04-11 15:23:59 -07:00
pablofsf 099fa7e475 Allow zfs to send replication streams with missing snapshots
A tentative implementation and discussion was done in #5285.
According to it a send --skip-missing|-s flag has been added.
In a replication stream, when there are snapshots missing in
the hierarchy, if -s is provided print a warning and ignore
dataset (and its children) instead of throwing an error

Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Pablo Correa Gómez <ablocorrea@hotmail.com>
Closes #11710
2021-04-11 12:05:35 -07:00
Olaf Faaland 2ec0b0dd71 kmod-zfs should obsolete kmod-spl as well as spl-kmod
Without this Obsoletes, using packages built --with-spec=redhat, an
upgrade from zfs-0.7 to zfs-2.x does not cause the kmod-spl-0.7 package
to be removed.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Closes #11865
2021-04-11 12:02:26 -07:00
наб d08dc34515 zvol_wait: properly handle zvol_volmode sysctl being 3/none
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11859
2021-04-11 11:58:36 -07:00
наб 4640baab69 zfs_ids_to_path: print correct wrong values
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11859
2021-04-11 11:58:16 -07:00
наб ecbf7c6707 zfs_ids_to_path: the -v comes after the executable name
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11859
2021-04-11 11:57:56 -07:00
наб e09318829b contrib/bpftrace: exec bpftrace, remove useless cat
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11859
2021-04-11 11:57:34 -07:00
наб 0f2915602e arc_summary3: just read /s/m/{mod}/version instead of spawning cat
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11859
2021-04-11 11:57:14 -07:00
наб 519aec83f5 zvol_wait: fix for zvols with spaces in name, optimise
list_zvols() would happily, for zvols with spaces in their names,
assign the second half to volmode, &c., so use a normal read
and set IFS to a tab instead of using 4 separate AWK processes(?)

Similarly, in filter_out_deleted_zvols(), run zfs(8) once and use the
output directly instead of spawning a zfs(8) process per zvol

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11859
2021-04-11 11:56:53 -07:00
наб ea4541e4c6 zstreamdump: exec zstream dump
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11859
2021-04-11 11:55:58 -07:00
Ryan Moeller a631283b74 Move zfsdev_state_{init,destroy} to common code
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #11833
2021-04-08 21:17:43 -07:00
Ryan Moeller 1dff545278 Eliminate zfsdev_get_state_impl
After 3937ab20f zfsdev_get_state_impl can become zfsdev_get_state.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #11833
2021-04-08 21:17:18 -07:00
TerraTech 161ed825ca zpl_inode.c: Fix SMACK interoperability
SMACK needs to have the ZFS dentry security field setup before
SMACK's d_instantiate() hook is called as it requires functioning
'__vfs_getxattr()' calls to properly set the labels.

Fxes:
1) file instantiation properly setting the object label to the
   subject's label
2) proper file labeling in a transmutable directory

Functions Updated:
1) zpl_create()
2) zpl_mknod()
3) zpl_mkdir()
4) zpl_symlink()

External-issue: https://github.com/cschaufler/smack-next/issues/1
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: TerraTech <TerraTech@users.noreply.github.com>
Closes #11646 
Closes #11839
2021-04-08 21:15:29 -07:00
Ryan Moeller 5d508d92d2 ZTS: Improve cleanup in removal_with_export
Kill the removal operation on every platform, not just Linux.
The test has been fixed and is now stable on FreeBSD.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11856
2021-04-08 21:10:28 -07:00
Rich Ercolani 46fb478326 Added check for broken alien version
Added a check for alien 8.95.{1,2,3}, which is known to fail to 
generate debs 100% of the time, and instead print out a message 
informing the developer that it's known to be broken and linking 
them to more information.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #11848 
Closes #11850
2021-04-08 14:37:51 -07:00
Brian Behlendorf 600a1dc54c Use dsl_scan_setup_check() to setup a scrub
When a rebuild completes it will automatically schedule a follow up
scrub to verify all of the block checksums.  Before setting up the
scrub execute the counterpart dsl_scan_setup_check() function to
confirm the scrub can be started.  Prior to this change we'd only
check vdev_rebuild_active() which isn't as comprehensive, and using
the check function keeps all of this logic in one place.

Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11849
2021-04-08 14:33:15 -07:00
Tino Reichardt 9c3b926b0e Fix double sha1/sha1.o line in module/icp/Makefile.in
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #11852
2021-04-08 13:25:24 -07:00
Ryan Moeller 383401589e ZTS: Tests using zhack may fail on FreeBSD
As described in #11854, zhack is occasionally segfaulting on FreeBSD. 
Debugging this is proving to be tricky. To avoid false positives in
the CI add entries for the tests that use zhack in zts-report to 
accept that they may occasionally fail on FreeBSD.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Issue #11854
Closes #11855
2021-04-08 13:21:53 -07:00
Ryan Moeller e778b0485b Ratelimit deadman zevents as with delay zevents
Just as delay zevents can flood the zevent pipe when a vdev becomes
unresponsive, so do the deadman zevents.

Ratelimit deadman zevents according to the same tunable as for delay
zevents.

Enable deadman tests on FreeBSD and add a test for deadman event
ratelimiting. 

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Don Brady <don.brady@delphix.com>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11786
2021-04-07 16:23:57 -07:00
наб c9c6537731 zed: only go up to current limit in close_from() fallback
Consider the following strace log:
  prlimit64(0, RLIMIT_NOFILE,
            NULL, {rlim_cur=1024, rlim_max=1024*1024}) = 0
  dup2(0, 30)                         = 30
  dup2(0, 300)                        = 300
  dup2(0, 3000)                       = -1 EBADF (Bad file descriptor)
  dup2(0, 30000)                      = -1 EBADF (Bad file descriptor)
  dup2(0, 300000)                     = -1 EBADF (Bad file descriptor)
  prlimit64(0, RLIMIT_NOFILE,
            {rlim_cur=1024*1024, rlim_max=1024*1024}, NULL) = 0
  dup2(0, 30)                         = 30
  dup2(0, 300)                        = 300
  dup2(0, 3000)                       = 3000
  dup2(0, 30000)                      = 30000
  dup2(0, 300000)                     = 300000

Even a privileged process needs to bump its rlimit before being able
to use fds higher than rlim_cur.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11834
2021-04-07 14:52:51 -07:00
наб 509a2dcf7d zed.8: the Diagnosis Engine is implemented
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11834
2021-04-07 14:52:42 -07:00
наб 54f6daea7a zed: replace zed_file_write_n() with write(2), purge it
We set SA_RESTART early on, which will prevent EINTRs (indeed, to the
point of needing to clear it in the reaper, since it interferes with
pause(2)), which is the only error zed_file_write_n() actually handled
(plus, the pid write is no bigger than 12 bytes anyway)

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11834
2021-04-07 14:52:30 -07:00
наб 3d62acf0ad zed: merge all _NOT_IMPLEMENTED_ events
These events should currently never be generated.

Also untag _zed_event_add_nvpair() from merge with
zpool_do_events_nvprint() ‒ they serve different purposes (machine,
usually script vs human consumption) and format the output differently
as it stands

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11834
2021-04-07 14:51:34 -07:00
наб 0d0720eb52 zed: remove unused zed_file_read_n()
Same deal as zed_file_close_on_exec()

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11834
2021-04-07 14:51:24 -07:00
наб 1a05182ba0 zed: bump zfs_zevent_len_max if we miss any events
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11834
2021-04-07 14:51:15 -07:00
наб c52612ba03 zed.8: don't pretend an unprivileged user could change the script owner
And add a note on /why/ ZEDLETs need to be owned by root

Quoth chown(2), Linux man-pages project:
  Only a privileged process (Linux: one with the CAP_CHOWN capability)
  may change the owner of a file.

Quoth chown(2), FreeBSD:
     [EPERM]  The operation would change the ownership,
              but the effective user ID is not the super-user.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11834
2021-04-07 14:51:06 -07:00
наб ed519ad495 zed: purge all mentions of a configuration file
There simply isn't a need for one, since the flags the daemon takes
are all short (mostly just toggles) and administrative in nature,
and are therefore better served by the age-old tradition of sourcing an
environment file and preparing the cmdline in the init-specific handler
itself, if needed at all

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11834
2021-04-07 14:50:52 -07:00
наб 64c03a0a27 zed: implement close_from() in terms of /proc/self/fd, if available
/dev/fd on Darwin

Consider the following strace output:
  prlimit64(0, RLIMIT_NOFILE, NULL, {rlim_cur=1024, rlim_max=1024*1024}) = 0

Yes, that is well over a million file descriptors!

This reduces the ZED start-up time from "at least a second" to
"instantaneous", and, under strace, from "don't even try" to "usable"
by simple virtue of doing five syscalls instead of over a million;
in most cases the main loop does nothing

Recent Linuxes (5.8+) have close_range(2) for this, but that's an
overoptimisation (and libcs don't have wrappers for it yet)

This is also run by the ZEDLET pre-exec. Compare:
  Finished "all-syslog.sh" eid=13 pid=6717 time=1.027100s exit=0
  Finished "history_event-zfs-list-cacher.sh" eid=13 pid=6718 time=1.046923s exit=0
to
  Finished "all-syslog.sh" eid=12 pid=4834 time=0.001836s exit=0
  Finished "history_event-zfs-list-cacher.sh" eid=12 pid=4835 time=0.001346s exit=0
lol

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11834
2021-04-07 14:50:38 -07:00
наб 3bc3eef9c3 zed: print combined system/user time after ZEDLET death
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11834
2021-04-07 14:50:03 -07:00
Marcin Skarbek 7367be0da9 Add kmodtool fix to detect different System.map location
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Marcin Skarbek <git@skarbek.name>
Closes #7807 
Closes #11836
2021-04-07 10:17:39 -07:00
Olaf Faaland cfd4a25fce fix misplaced quotes in kmod-preamble
rpm/redhat/zfs-kmod.spec.in has a typo in the shell code that
creates the kmod-preamble file.  This typo results in the
preamble file having the wrong name,

./SOURCES/kmod-preamblenObsoletes

and missing the Obsoletes clause that has become part of the name.

Because the filename is incorrect, the built package does not have
"obsoletes" or "conflicts" set.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Closes #11851
2021-04-07 10:10:34 -07:00
Brian Behlendorf d5ef91af02 Obsolete earlier packages due to version bump
In order for package managers such as dnf to upgrade cleanly after
the package SONAME bump the obsolete package names must be known.
Update the new packages to correctly obsolete the old ones.

Reviewed-by: Olaf Faaland <faaland1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11844
Closes #11847
2021-04-07 10:09:21 -07:00
наб aa5a4eb5d0 i-t: don't brokenly set the scheduler for root pool vdev's disks
This effectively reverts
  4fc411f7a3 (part of #6807) and
  f6fbe25664 (#9042) ‒
the code itself and latter PR cite symmetry with whole-disk-vdev
behaviour (presumably because rootfs vdevs are rarely whole disks),
but the code is broken for NVME devices (indeed, it'd strip the
controller number instead of the (potential) partition number, turning
"nvme0n1p1" into "nvmen1p1", which would then subsequently fail the
sysfs existence check); it could be fixed to handle those (and any
others) rather easily by dereferencing /sys/class/block/$devname,
but this isn't the place for setting this ‒ as noted in the commit that
removed setting the scheduler by default
(9e17e6f254) ‒ use an udev rule

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11838
2021-04-06 18:29:31 -07:00
наб 55419e0a72 i-t: fix root=zfs:AUTO
IFS= would break loops in import_pool(), which would fault
any automatic import

Additionally $ZFS_BOOTFS from cmdline would interfere with find_rootfs()

If many pools were present, same thing could happen across multiple
find_rootfs() runs, so bail out early and clean up in error path

Suggested-by: @nachtgeist
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11278
Closes #11838
2021-04-06 18:28:38 -07:00
matt-fidd a03b288cf0 zfs get -p only outputs 3 columns if "clones" property is empty
get_clones_string currently returns an empty string for filesystem
snapshots which have no clones. This breaks parsable `zfs get` output as
only three columns are output, instead of 4.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Fiddaman <github@m.fiddaman.uk>
Co-authored-by: matt <matt@fiddaman.net>
Closes #11837
2021-04-06 16:05:54 -07:00
Matthew Ahrens bbcec73783 kmem_alloc(KM_SLEEP) should use kvmalloc()
`kmem_alloc(size>PAGESIZE, KM_SLEEP)` is backed by `kmalloc()`, which
finds contiguous physical memory.  If there isn't enough contiguous
physical memory available (e.g. due to physical page fragmentation), the
OOM killer will be invoked to make more memory available.  This is not
ideal because processes may be killed when there is still plenty of free
memory (it just happens to be in individual pages, not contiguous runs
of pages).  We have observed this when allocating the ~13KB `zfs_cmd_t`,
for example in `zfsdev_ioctl()`.

This commit changes the behavior of
`kmem_alloc(size>PAGESIZE, KM_SLEEP)` when there are insufficient
contiguous free pages.  In this case we will find individual pages and
stitch them together using virtual memory.  This is accomplished by
using `kvmalloc()`, which implements the described behavior by trying
`kmalloc(__GFP_NORETRY)` and falling back on `vmalloc()`.

The behavior of `kmem_alloc(KM_NOSLEEP)` is not changed; it continues to
use `kmalloc(GPF_ATOMIC | __GFP_NORETRY)`.  This is because `vmalloc()`
may sleep.

Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Wilson <gwilson@delphix.com>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #11461
2021-04-06 12:44:54 -07:00
наб 57a1214e3a zpool-features.5: remove "booting not possible with this feature"s
The exact limitations on what features are supported when booting
vary considerably depending on the environment.  In order to minimize
confusion avoid categorical statements which assume GRUB2 is being 
used.  The supported GRUB2 features are covered earlier in this man 
page for easy reference.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11842
2021-04-06 12:39:54 -07:00
George Melikov d35708b187 man: fix wrong .Xr macros usages
In addition, html doc will have working hyperlinks.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #11845
2021-04-06 12:27:40 -07:00
наб 61b50107a5 libzutil: zfs_isnumber(): return false if input empty
zpool list, which is the only user, would mistakenly try to parse the
empty string as the interval in this case:

  $ zpool list "a"
  cannot open 'a': no such pool
  $ zpool list ""
  interval cannot be zero
  usage: <usage string follows>
which is now symmetric with zpool get:
  $ zpool list ""
  cannot open '': name must begin with a letter

Avoid breaking the  "interval cannot be zero" string.
There simply isn't a need for this, and it's user-facing.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11841 
Closes #11843
2021-04-06 12:25:53 -07:00
Brian Behlendorf ec580225d2 ZTS: pool_checkpoint improvements
The pool_checkpoint tests may incorrectly fail because several of
them invoke zdb for an imported pool.  In this scenario it's not
unexpected for zdb to fail if the pool is modified.  To resolve
this these zdb checks are now done after the pool has been exported.

Additionally, the default cleanup functions assumed the pool would
be imported when they were run.  If this was not the case they're
exit early and fail to cleanup all of the test state causing
subsequent tests to fail.  Add a check to only destroy the pool
when it is imported.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11832
2021-04-03 08:33:22 -07:00
Andrea Gelmini bf169e9f15 Fix various typos
Correct an assortment of typos throughout the code base.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Andrea Gelmini <andrea.gelmini@gelma.net>
Closes #11774
2021-04-02 18:52:15 -07:00
наб 943df59ed9 bash_completion.d: always call zfs/zpool binaries directly
/dev/zfs is 0:0 666 on most systems, so the [ -w /dev/zfs ] check always
succeeds, but if zfs isn't in $PATH (e.g. when completing from
"/sbin/zfs list" on a regular account) this can lead to error spew like

  nabijaczleweli@szarotka:~$ /sbin/zfs list bash: zfs: command not found
  @ bash: zfs: command not found

We only do read-only commands, and quite general ones at that,
so there's no need to elevate one way or another.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11828
2021-04-02 16:34:58 -07:00
Brian Behlendorf c0af3c7b2c Add RELEASES.md file
Document the project's policy regarding publishing and maintaining
official OpenZFS releases.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11821
2021-04-02 16:33:40 -07:00
наб 73218f41b4 zed: allow limiting concurrent jobs
200ms time-out is relatively long, but if we already hit the cap,
then we'll likely be able to spawn multiple new jobs when we wake up

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11807
2021-04-02 16:30:53 -07:00
наб 02a0fa1999 zed: remove unused zed_file_close_on_exec()
The FIXME comment was there since the initial implementation in 2014,
there are no users

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11807
2021-04-02 16:30:34 -07:00
наб ca2ce9c50b zed: use separate reaper thread and collect ZEDLETs asynchronously
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11807
2021-04-02 16:30:08 -07:00
наб 3ef80eefff zed: set names for all threads
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11807
2021-04-02 16:29:35 -07:00
Ryan Moeller 583e320546 ZTS: inheritance/inherit_001_pos is flaky
Add inheritance/inherit_001_pos to the maybe fails on FreeBSD list.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11830
2021-04-02 11:11:52 -07:00
Ryan Moeller dce3176349 Avoid taking global lock to destroy zfsdev state
We have exclusive access to our zfsdev state object in this section
until it is invalidated by setting zs_minor to -1, so we can destroy
the state without taking a lock if we do the invalidation last, after
a member to ensure correct ordering.

While here, strengthen the assertions that zs_minor is valid when we
enter.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #11751
2021-04-02 11:09:05 -07:00
Ryan Moeller 02aaf11fc7 FreeBSD: Fix stable/12 after AT_BENEATH removal
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11827
2021-04-02 11:06:44 -07:00
Brian Behlendorf fe6babced2 Bump libzfs.so and libzpool.so versions
Bump the library versions as advised by the libtool guidelines.

https://www.gnu.org/software/libtool/manual/html_node/Updating-version-info.html

Two new functions were added but no existing functions were changed,
so we increase the version and the age (version:revision:age).

Added functions (2):
- boolean_t zpool_is_draid_spare(const char *);
- zpool_compat_status_t zpool_load_compat(const char *,
      boolean_t *, char *, char *);

Additionally bump the libzpool.so version information.  This library
is for internal use but we still want to update the version to track
major changes to the interfaces.

The libzfsbootenv, libuutil, libnvpair and libzfs_core libraries
have not been updated.

Reviewed-by: Richard Laager <rlaager@wiktel.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11817
2021-04-01 16:53:05 -07:00
Ryan Moeller c05eec32a7 Allow pool names that look like Solaris disk names
Nothing bad happens if a prefix of your pool name matches a disk name.
This is a bit of a silly restriction at this point.

Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #11781 
Closes #11813
2021-04-01 08:49:41 -07:00
Ryan Moeller 032a213e2e Don't scale zfs_zevent_len_max by CPU count
The lower bound for this scaling to too low and the upper bound is too
high.  Use a fixed default length of 512 instead, which is a reasonable
value on any system.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11822
2021-04-01 08:45:04 -07:00
Ryan Moeller 3ba10f9a6a Atomically check and set dropped zevent count
ratelimit_dropped isn't protected by a lock and is expected to
be updated atomically.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11822
2021-04-01 08:43:01 -07:00
Brian Behlendorf 9f9e1b5425 CI: Increase free space in workflow
Recently we've been running out of free space in the ubuntu 20.04
environment resulting in test failures.  This appears to be caused
by a change in the default available free space and not because of
any change in OpenZFS. Try and avoid this failure by applying a
suggested workaround which removes some unnecessary files.

https://github.com/actions/virtual-environments/issues/2840

Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11826
2021-04-01 08:39:27 -07:00
Brian Atkinson 19c9247d57 Fixing m4 iops rename check
The configure check for iops->rename wanting flags was missing the
AC_MSG_CHECKING() so it would just print yes without saying what was
being checked.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Brian Atkinson <batkinson@lanl.gov>
Closes #11825
2021-04-01 08:37:41 -07:00
наб 0c2eb3f540 fsck.zfs: implement 4/8 exit codes as suggested in manpage
Update the fsck.zfs helper to bubble up some already-known-about 
errors if they are detected in the pool.

health=degraded => 4/"Filesystem errors left uncorrected"
health=faulted && dataset in /etc/fstab => 8/"Operational error"
pool not found => 8/"Operational error"
everything else => 0

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11806
2021-03-31 10:49:56 -07:00
Mike Swanson 67859aedd1 Add compatibility file sets (ZoL 0.6.1, 0.6.4, OpenZFS 2.1)
ZoL 0.6.1 introduced feature flags with the three features that all
implementations at the time were guaranteed to have.  0.6.4 introduced
a few more until 0.6.5 added two after that.  OpenZFS 2.1 added the
dRAID feature.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mike Swanson <mikeonthecomputer@gmail.com>
Closes #11818
2021-03-31 09:40:25 -07:00
Brian Behlendorf 9ac82cab2d Update META
Increase the version to 2.1.99 to indicate the master branch is
newer than the 2.1.x release.  This ensures packages built from
master branch are considered to be newer than the last release.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2021-03-30 10:50:55 -07:00
Brian Behlendorf 3522f57b6a Tag 2.1.0-rc1
New features:
- Distributed Spare (dRAID) Feature
- Added "compatibility" property for zpool feature sets
- Added zpool_influxdb command to collect zpool statistics

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2021-03-29 16:39:52 -07:00
наб 38280c3526 zed: reap child after killing on time-out
When a child process is killed waitpid() must be called on the
pid the reap the zombie process.

Update BUGS section to reflect reality by replacing "zedlets
aren't time limited with "zedlets can be interrupted".

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11769 
Closes #11798
2021-03-26 14:21:00 -07:00
Matthew Ahrens 2b56a63457 Use a helper function to clarify gang block size
For gang blocks, `DVA_GET_ASIZE()` is the total space allocated for the
gang DVA including its children BP's.  The space allocated at each DVA's
vdev/offset is `vdev_psize_to_asize(vd, SPA_GANGBLOCKSIZE)`.

This commit makes this relationship more clear by using a helper
function, `vdev_gang_header_asize()`, for the space allocated at the
gang block's vdev/offset.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #11744
2021-03-26 11:19:35 -07:00
Matthew Ahrens b85f47efd0 When specifying raidz vdev name, parity count should match
When specifying the name of a RAIDZ vdev on the command line, it can be
specified as raidz-<vdevID> or raidzP-<vdevID>.
e.g. `zpool clear poolname raidz-0` or `zpool clear poolname raidz2-0`

If the parity is specified in the vdev name, it should match the actual
parity of that RAIDZ vdev, otherwise the command should fail.  This
commit makes it so.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Stuart Maybee <stuart.maybee@comcast.net>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #11742
2021-03-26 11:12:22 -07:00
Luis Henriques 2037edbdaa Fix error code on __zpl_ioctl_setflags()
Other (all?) Linux filesystems seem to return -EPERM instead of -EACCESS
when trying to set FS_APPEND_FL or FS_IMMUTABLE_FL without the
CAP_LINUX_IMMUTABLE capability.  This was detected by generic/545 test
in the fstest suite.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Luis Henriques <henrix@camandro.org>
Closes #11791
2021-03-26 10:46:45 -07:00
Jessica Clarke ef977fce66 Support running FreeBSD buildworld on Arm-based macOS hosts
Arm-based Macs are like FreeBSD and provide a full 64-bit stat from the
start, so have no stat64 variants. Thus, define stat64 and fstat64 as
aliases for the normal versions.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Jessica Clarke <jrtc27@jrtc27.com>
Closes #11771
2021-03-26 10:45:12 -07:00
Andrea Gelmini 8a915ba1f6 Removed duplicated includes
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Andrea Gelmini <andrea.gelmini@gelma.net>
Closes #11775
2021-03-22 12:34:58 -07:00
Andrea Gelmini be1e69f31c Fix typo in Python method name
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Andrea Gelmini <andrea.gelmini@gelma.net>
Closes #11776
2021-03-22 12:32:38 -07:00
Alexander Motin 891568c990 Split dmu_zfetch() speculation and execution parts
To make better predictions on parallel workloads dmu_zfetch() should
be called as early as possible to reduce possible request reordering.
In particular, it should be called before dmu_buf_hold_array_by_dnode()
calls dbuf_hold(), which may sleep waiting for indirect blocks, waking
up multiple threads same time on completion, that can significantly
reorder the requests, making the stream look like random.  But we
should not issue prefetch requests before the on-demand ones, since
they may get to the disks first despite the I/O scheduler, increasing
on-demand request latency.

This patch splits dmu_zfetch() into two functions: dmu_zfetch_prepare()
and dmu_zfetch_run().  The first can be executed as early as needed.
It only updates statistics and makes predictions without issuing any
I/Os.  The I/O issuance is handled by dmu_zfetch_run(), which can be
called later when all on-demand I/Os are already issued.  It even
tracks the activity of other concurrent threads, issuing the prefetch
only when _all_ on-demand requests are issued.

For many years it was a big problem for storage servers, handling
deeper request queues from their clients, having to either serialize
consequential reads to make ZFS prefetcher usable, or execute the
incoming requests as-is and get almost no prefetch from ZFS, relying
only on deep enough prefetch by the clients.  Benefits of those ways
varied, but neither was perfect.  With this patch deeper queue
sequential read benchmarks with CrystalDiskMark from Windows via
iSCSI to FreeBSD target show me much better throughput with almost
100% prefetcher hit rate, comparing to almost zero before.

While there, I also removed per-stream zs_lock as useless, completely
covered by parent zf_lock.  Also I reused zs_blocks refcount to track
zf_stream linkage of the stream, since I believe previous zs_fetch ==
NULL check in dmu_zfetch_stream_done() was racy.

Delete prefetch streams when they reach ends of files.  It saves up
to 1KB of RAM per file, plus reduces searches through the stream list.

Block data prefetch (speculation and indirect block prefetch is still
done since they are cheaper) if all dbufs of the stream are already
in DMU cache.  First cache miss immediately fires all the prefetch
that would be done for the stream by that time.  It saves some CPU
time if same files within DMU cache capacity are read over and over.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Adam Moss <c@yotes.com>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #11652
2021-03-19 22:56:11 -07:00
Chunwei Chen 296a4a369b Fix zfs_get_data access to files with wrong generation
If TX_WRITE is create on a file, and the file is later deleted and a new
directory is created on the same object id, it is possible that when
zil_commit happens, zfs_get_data will be called on the new directory.
This may result in panic as it tries to do range lock.

This patch fixes this issue by record the generation number during
zfs_log_write, so zfs_get_data can check if the object is valid.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chunwei Chen <david.chen@nutanix.com>
Closes #10593
Closes #11682
2021-03-19 22:53:31 -07:00
Andrew 66e6d3f128 Fix regression in POSIX mode behavior
Commit 235a85657 introduced a regression in evaluation of POSIX modes
that require group DENY entries in the internal ZFS ACL. An example
of such a POSX mode is 007. When write_implies_delete_child is set,
then ACE_WRITE_DATA is added to `wanted_dirperms` in prior to calling
zfs_zaccess_common(). This occurs is zfs_zaccess_delete().

Unfortunately, when zfs_zaccess_aces_check hits this particular DENY
ACE, zfs_groupmember() is checked to determine whether access should be
denied, and since zfs_groupmember() always returns B_TRUE on Linux and
so this check is failed, resulting ultimately in EPERM being returned.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Andrew Walker <awalker@ixsystems.com>
Closes #11760
2021-03-19 22:50:46 -07:00
Palash Gandhi c23850759f ZTS: New test for kernel panic induced by redacted send
This change adds a new test that covers a bug fix in the binary search
in the redacted send resume logic that causes a kernel panic.
The bug was fixed in https://github.com/openzfs/zfs/pull/11297.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Palash Gandhi <palash.gandhi@delphix.com>
Closes #11764
2021-03-19 22:47:50 -07:00
Martin Matuška cd5b812818 Allow setting bootfs property on pools with indirect vdevs
The FreeBSD boot loader relies on the bootfs property and is capable
of booting from removed (indirect) vdevs.

Reviewed-by Eric van Gyzen
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Martin Matuska <mm@FreeBSD.org>
Closes #11763
2021-03-19 22:46:43 -07:00
Ryan Moeller 0ab84bff55 Fix typo in zgenhostid.8
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11770
2021-03-19 22:39:42 -07:00
Brian Atkinson f52124dce8 Removing old code for k(un)map_atomic
It used to be required to pass a enum km_type to kmap_atomic() and
kunmap_atomic(), however this is no longer necessary and the wrappers
zfs_k(un)map_atomic removed these. This is confusing in the ABD code as
the struct abd_iter member iter_km no longer exists and the wrapper
macros simply compile them out.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Adam Moss <c@yotes.com>
Signed-off-by: Brian Atkinson <batkinson@lanl.gov>
Closes #11768
2021-03-19 22:38:44 -07:00
Serapheim Dimitropoulos 793c958f6f Initialize metaslab range trees in metaslab_init
= Motivation

We've noticed several zloop crashes within Delphix generated
due to the following sequence of events:

- A device gets expanded and new metaslabas are allocated for
  it. These metaslabs go through `metaslab_init()` but haven't
  gone through `metaslab_sync_done()` yet. This meas that the
  only range tree that's actually set is the `ms_allocatable`.
  All the others are NULL.

- A vdev_initialization is issues and `vdev_initialize_thread`
  starts processing one of these new metaslabs of the expanded
  vdev.

- As part of `vdev_initialize_calculate_progress()` we call
  into `metaslab_load()` and `metaslab_load_impl()` which
  in turn tries to dereference the metaslabs trees that
  are still NULL and therefore we crash.

The same failure can come up from the `vdev_trim` code paths.

= This Patch

We considered the following solutions to deal with this issue:

[A] Add logic to `vdev_initialize/trim` to skip those new
    metaslabs. We decided against this as it would be good
    to avoid exposing this lower-level detail to higer-level
    operations.

[B] Have `metaslab_load_impl()` return early for new metaslabs
    and thus never touch those range_trees that are NULL at
    that time. This seemed more of a work-around for the bug
    and not a clear-cut solution.

[C] Refactor our logic so all metaslabs have their range_trees
    created at the time of their creatin in `metaslab_init()`.

In this patch we decided to go with [C] because:

(1) It doesn't expose more metaslab details to higher level
    operations such as vdev initialize and trim.

(2) The current behavior of creating the range trees lazily
    in `metaslab_sync_done()` is unnecessarily complicated.

(3) Always initializing the metaslab range_trees makes other
    parts of the codebase cleaner. For example, we used to
    use `ms_freed` as the reference value for knowing whether
    all the range_trees have been initialized. Now we no
    longer need to do that check in most places (and in the
    few that we do we use the `ms_new` boolean field now
    which is more readable).

= Side Changes

Probably due to a mismerge we set `ms_loaded` to `B_TRUE` twice
in `metasloab_load_impl()`. In this patch we remove the extraneous
assignment.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Closes #11737
2021-03-19 22:36:02 -07:00
Coleman Kane ffd6978ef5 Linux 5.12 update: bio_max_segs() replaces BIO_MAX_PAGES
The BIO_MAX_PAGES macro is being retired in favor of a bio_max_segs()
function that implements the typical MIN(x,y) logic used throughout the
kernel for bounding the allocation, and also the new implementation is
intended to be signed-safe (which the former was not).

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #11765
2021-03-19 22:33:42 -07:00
Coleman Kane e2a8296131 Linux 5.12 compat: idmapped mounts
In Linux 5.12, the filesystem API was modified to support ipmapped
mounts by adding a "struct user_namespace *" parameter to a number
functions and VFS handlers. This change adds the needed autoconf
macros to detect the new interfaces and updates the code appropriately.
This change does not add support for idmapped mounts, instead it
preserves the existing behavior by passing the initial user namespace
where needed.  A subsequent commit will be required to add support
for idmapped mounted.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #11712
2021-03-19 21:00:59 -07:00
Matthew Ahrens 330c6c0523 Clean up RAIDZ/DRAID ereport code
The RAIDZ and DRAID code is responsible for reporting checksum errors on
their child vdevs.  Checksum errors represent events where a disk
returned data or parity that should have been correct, but was not.  In
other words, these are instances of silent data corruption.  The
checksum errors show up in the vdev stats (and thus `zpool status`'s
CKSUM column), and in the event log (`zpool events`).

Note, this is in contrast with the more common "noisy" errors where a
disk goes offline, in which case ZFS knows that the disk is bad and
doesn't try to read it, or the device returns an error on the requested
read or write operation.

RAIDZ/DRAID generate checksum errors via three code paths:

1. When RAIDZ/DRAID reconstructs a damaged block, checksum errors are
reported on any children whose data was not used during the
reconstruction.  This is handled in `raidz_reconstruct()`.  This is the
most common type of RAIDZ/DRAID checksum error.

2. When RAIDZ/DRAID is not able to reconstruct a damaged block, that
means that the data has been lost.  The zio fails and an error is
returned to the consumer (e.g. the read(2) system call).  This would
happen if, for example, three different disks in a RAIDZ2 group are
silently damaged.  Since the damage is silent, it isn't possible to know
which three disks are damaged, so a checksum error is reported against
every child that returned data or parity for this read.  (For DRAID,
typically only one "group" of children is involved in each io.)  This
case is handled in `vdev_raidz_cksum_finish()`. This is the next most
common type of RAIDZ/DRAID checksum error.

3. If RAIDZ/DRAID is not able to reconstruct a damaged block (like in
case 2), but there happens to be additional copies of this block due to
"ditto blocks" (i.e. multiple DVA's in this blkptr_t), and one of those
copies is good, then RAIDZ/DRAID compares each sector of the data or
parity that it retrieved with the good data from the other DVA, and if
they differ then it reports a checksum error on this child.  This
differs from case 2 in that the checksum error is reported on only the
subset of children that actually have bad data or parity.  This case
happens very rarely, since normally only metadata has ditto blocks.  If
the silent damage is extensive, there will be many instances of case 2,
and the pool will likely be unrecoverable.

The code for handling case 3 is considerably more complicated than the
other cases, for two reasons:

1. It needs to run after the main raidz read logic has completed.  The
data RAIDZ read needs to be preserved until after the alternate DVA has
been read, which necessitates refcounts and callbacks managed by the
non-raidz-specific zio layer.

2. It's nontrivial to map the sections of data read by RAIDZ to the
correct data.  For example, the correct data does not include the parity
information, so the parity must be recalculated based on the correct
data, and then compared to the parity that was read from the RAIDZ
children.

Due to the complexity of case 3, the rareness of hitting it, and the
minimal benefit it provides above case 2, this commit removes the code
for case 3.  These types of errors will now be handled the same as case
2, i.e. the checksum error will be reported against all children that
returned data or parity.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #11735
2021-03-19 16:22:10 -07:00
Mateusz Guzik 2f385c913f FreeBSD: make seqc asserts conditional on replay
Avoids tripping on asserts when doing pool recovery.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #11739
2021-03-17 22:09:45 -07:00
Matthew Ahrens 46df6e98aa Remove unused rr_code
The `rr_code` field in `raidz_row_t` is unused.

This commit removes the field, as well as the code that's used to set
it.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #11736
2021-03-17 21:57:09 -07:00
Ryan Moeller ec3e4c6784 FreeBSD: Fix memory leaks in kstats
Don't handle (incorrectly) kmem_zalloc() failure.  With KM_SLEEP,
will never return NULL.

Free the data allocated for non-virtual kstats when deleting the object.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11767
2021-03-17 21:55:18 -07:00
Adam D. Moss 1daad98176 Linux: always check or verify return of igrab()
zhold() wraps igrab() on Linux, and igrab() may fail when the inode 
is in the process of being deleted.  This means zhold() must only be
called when a reference exists and therefore it cannot be deleted. 
This is the case for all existing consumers so add a VERIFY and a
comment explaining this requirement.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Adam Moss <c@yotes.com>
Closes #11704
2021-03-16 16:33:34 -07:00
Dries Michiels 5f9d61d06b Update FreeBSD versions
Update supported FreeBSD versions in documentation.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Dries Michiels <driesm.michiels@gmail.com>
Closes #11718
2021-03-16 15:03:28 -07:00
gldisater 07dff5cffe Hold and release permissions exist
The man page was missing these two permissions.
Add the missing permissions to the man page. 

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Jeremy Faulkner <gldisater@gldis.ca>
Closes #11727
2021-03-16 15:01:21 -07:00
Ryan Moeller 5638803b6a ZTS: Add tests for DOS mode attributes
Create a new section of tests to run with acltype=off.

For now the only test we have is for the DOS mode READONLY attribute on
FreeBSD.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11734
2021-03-16 15:00:14 -07:00
Don Brady dd0b5c8559 Reference_tracking_enable should be a module param
To make use of zfs_refcount_held tunable it should be a module 
parameter in open-zfs.  Also, since the macros will auto-generate OS 
specific tunables, removed the existing zfs_refcount_held reference 
in module/os/freebsd/zfs/sysctl_os.c.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Don Brady <don.brady@delphix.com>
Closes #11753
2021-03-16 14:56:17 -07:00
Ryan Moeller 9305ff2edf ZTS: Fix incorrect use of libtest in user_run by xattr_003_neg
You can't use user_run to eval ksh functions defined in libtest unless
you include libtest in the user shell.

Fix xattr_003_neg by:
* include libtest in the user shell
* *then* run get_xattr
* assert this fails
* use variables for filenames so they don't change in the user's shell
* don't log the contents of /etc/passwd
* cleanup all byproducts

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11185
2021-03-12 16:17:30 -08:00
Ryan Moeller e0b53a5dbb ZTS: Use ksh and current environment for user_run
The current user_run often does not work as expected.  Commands are run
in a different shell, with a different environment, and all output is
discarded.

Simplify user_run to retain the current environment, eliminate eval,
and feed the command string into ksh.  Enhance the logging for
user_run so we can see out and err.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11185
2021-03-12 16:17:01 -08:00
Mariusz Zaborski e464f7c7cc FreeBSD: bring back possibility to rewind the checkpoint from bootloader
Add parsing of the rewind options.

When I was upstreaming the change [1], I omitted the part where we
detect that the pool should be rewind. When the FreeBSD repo has
synced with the OpenZFS, this part of the code was removed.

[1] FreeBSD repo: 277f38abffc6a8160b5044128b5b2c620fbb970c
[2] OpenZFS repo: f2c027bd6a

External-issue: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=254152
Originally reviewed by: tsoome, allanjude
Originally reviewed by: kevans (ok from high-level overview)
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mariusz Zaborski <oshogbo@vexillium.org>
Closes #11730
2021-03-12 16:12:14 -08:00
Ryan Moeller f845b2dd1c FreeBSD: Clean up zfsdev_close to match Linux
Resolve some oddities in zfsdev_close() which could result in a
panic and were not present in the equivalent function for Linux.

- Remove unused definition ZFS_MIN_MINOR
- FreeBSD: Simplify zfsdev state destruction
- Assert zs_minor is valid in zfsdev_close
- Make locking around zfsdev state match Linux

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11720
2021-03-12 16:09:15 -08:00
Mateusz Guzik e3e82dcc51 FreeBSD: switch teardown lock to rms
This deserializes otherwise non-contending operations.

The previous scheme of using 17 locks hashed by curthread runs into
conflicts very quickly. Check the pull request for sample results.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #11153
2021-03-12 15:51:48 -08:00
Mateusz Guzik 5ebe425a5b Macroify teardown lock handling
This will allow platforms to implement it as they see fit, in particular
in a different manner than rrm locks.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #11153
2021-03-12 15:51:39 -08:00
Mateusz Guzik 9847f77f01 FreeBSD: rename teardown inactive macros to mimick rrm convention
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #11153
2021-03-12 15:51:31 -08:00
Mateusz Guzik f9acd578f0 FreeBSD: remove 2 assertions that teardown lock is not held
They are not very useful and hard to implement in the rms routine
the code is about to start using.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #11153
2021-03-12 15:51:20 -08:00
Mateusz Guzik 300f68e017 FreeBSD: rework asserts in zfs_dd_lookup
1. even up ifdefs
2. drop the arguably useless teardown lock asserts -- nothing else
   checks for it

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #11153
2021-03-12 15:51:07 -08:00
Mateusz Guzik 446400346d Add branch prediction to ZFS_ENTER and ZFS_VERIFY_ZP macros
They are expected to fail only in corner cases.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #11153
2021-03-12 15:51:03 -08:00
George Wilson 0936981d86 zpool import cachefile improvements
Importing a pool using the cachefile is ideal to reduce the time
required to import a pool. However, if the devices associated with
a pool in the cachefile have changed, then the import would fail.
This can easily be corrected by doing a normal import which would
then read the pool configuration from the labels.

The goal of this change is make importing using a cachefile more
resilient and auto-correcting. This is accomplished by having
the cachefile import logic automatically fallback to reading the
labels of the devices similar to a normal import. The main difference
between the fallback logic and a normal import is that the cachefile
import logic will only look at the device directories that were
originally used when the cachefile was populated. Additionally,
the fallback logic will always import by guid to ensure that only
the pools in the cachefile would be imported.

External-issue: DLPX-71980
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Wilson <gwilson@delphix.com>
Closes #11716
2021-03-12 15:42:27 -08:00
Martin Matuška b8fa03efbc Fix whitespace introduced in ecc277cff
The manual page change in ecc277c has introduced whitespace on
line ends.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Martin Matuska <mm@FreeBSD.org>
Closes #11722
2021-03-11 19:42:04 -08:00
Ryan Moeller 35aa9dc6df FreeBSD: Fix scope of deadman tunables
A few deadman tunables ended up in the wrong sysctl node.

Move them to vfs.zfs.deadman.*

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11715
2021-03-11 19:23:24 -08:00
Adam D. Moss c94d648b1c Microoptimizations for VERIFY() and friends
Add branch hints and constify the intermediate evaluations of 
left/right params in VERIFY3*().

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Adam Moss <c@yotes.com>
Closes #11708
2021-03-11 17:16:09 -08:00
Allan Jude 92e8fb6395 Add missing files to Makefile
Some .h files that were added were missed in this Makefile. Since 
they are .h files, their being missing only resulted in them 
disappeared from the dist archive.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Closes #11705
2021-03-11 17:13:34 -08:00
George Melikov 8d534c37ac CI checkstyle: pin ubuntu version
Our checkstyle doesn't work well on Ubuntu 20.04,
temporary pin it to 18.04.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #11713
2021-03-11 17:11:31 -08:00
Don Brady f5ada6538d Return finer grain errors in libzfs unmount_one
Added errno mappings to unmount_one() in libzfs.  Changed do_unmount() 
implementation to return errno errors directly like is done for 
do_mount() and others.

Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Don Brady <don.brady@delphix.com>
Closes #11681
2021-03-08 08:46:45 -08:00
Tony Hutter 4fdbd43450 vdev_id: Create symlinks even if no /dev/mapper/
vdev_id uses the /dev/mapper/ symlinks to resolve a UUID to a dm name
(like dm-1).  However on some multipath setups, there is no /dev/mapper/
entry for the UUID at the time vdev_id is called by udev.  However,
this isn't necessarily needed, as we may be able to resolve the dm
name from the $DEVNAME that udev passes us (like DEVNAME="/dev/dm-1").

This patch tries to resolve the dm name from $DEVNAME first, before
falling back to looking in /dev/mapper/.  This fixed an issue where the
by-vdev names weren't reliably showing up on one of our nodes.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #11698
2021-03-08 08:43:30 -08:00
Antonio Russo b2eebe3ae7 ZTS events_002: Improve speed and reliability
events_002 exercises the ZED, ensuring that it neither misses events,
nor reporting events twice.

On slow test hardware, some of the timeouts are insufficient to allow
the ZED to properly settle.  Conversely, on fast hardware these same
timeouts are too long, unnecessarily slowing the test run.

Instead of using a fixed timeout, wait for the expected final event
before returning.  Additionally, wait with a timeout for unexpected
events to avoid missing them if they show up late.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Antonio Russo <aerusso@aerusso.net>
Closes #11703
2021-03-08 08:42:45 -08:00
Christian Schwarz 93e3658035 zvol: call zil_replaying() during replay
zil_replaying(zil, tx) has the side-effect of informing the ZIL that an
entry has been replayed in the (still open) tx.  The ZIL uses that
information to record the replay progress in the ZIL header when that
tx's txg syncs.

ZPL log entries are not idempotent and logically dependent and thus
calling zil_replaying() is necessary for correctness.

For ZVOLs the question of correctness is more nuanced: ZVOL logs only
TX_WRITE and TX_TRUNCATE, both of which are idempotent. Logical
dependencies between two records exist only if the write or discard
request had sync semantics or if the ranges affected by the records
overlap.

Thus, at a first glance, it would be correct to restart replay from
the beginning if we crash before replay completes. But this does not
address the following scenario:
Assume one log record per LWB.
The chain on disk is

    HDR -> 1:W(1, "A") -> 2:W(1, "B") -> 3:W(2, "X") -> 4:W(3, "Z")

where N:W(O, C) represents log entry number N which is a TX_WRITE of C
to offset A.
We replay 1, 2 and 3 in one txg, sync that txg, then crash.
Bit flips corrupt 2, 3, and 4.
We come up again and restart replay from the beginning because
we did not call zil_replaying() during replay.
We replay 1 again, then interpret 2's invalid checksum as the end
of the ZIL chain and call replay done.
The replayed zvol content is "AX".

If we had called zil_replaying() the HDR would have pointed to 3
and our resumed replay would not have replayed anything because
3 was corrupted, resulting in zvol content "BX".

If 3 logically depends on 2 then the replay corrupted the ZVOL_OBJ's
contents.

This patch adds the zil_replaying() calls to the replay functions.
Since the callbacks in the replay function need the zilog_t* pointer
so that they can call zil_replaying() we open the ZIL while
replaying in zvol_create_minor(). We also verify that replay has
been done when on-demand-opening the ZIL on the first modifying
bio.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Christian Schwarz <me@cschwarz.com>
Closes #11667
2021-03-07 09:49:58 -08:00
Ryan Moeller b30cd70599 ZTS: Improve cleanup in zpool tests
* Restore original kern.corefile value after the test.
* Don't leave behind a frozen pool.
* Clean up leftover vdev files.
* Make zpool_002_pos and zpool_003_pos consistent in their handling of
core files while here.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11694
2021-03-07 09:41:01 -08:00
manfromafar ecc277cff7 Clarify compressed zfs send/recv behavior
Docs for send and receive do not explain behavior when sending a 
compressed stream then receiving on a host that overrides compression 
with -o compress=value.

The data from the send stream is written as it was from the send is 
the compressed form but the compression algorithm set on the receiver 
is the overridden version which causes some confusion as to what 
algorithm was actually used.

Updated man docs to clarify behavior

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed By: Allan Jude <allanjude@freebsd.org>
Signed-off-by: manfromafar <manfromafar@outlook.com>
Closes #11690
2021-03-07 09:39:16 -08:00
Ryan Moeller 4b2e20824b Intentionally allow ZFS_READONLY in zfs_write
ZFS_READONLY represents the "DOS R/O" attribute.
When that flag is set, we should behave as if write access
were not granted by anything in the ACL.  In particular:
We _must_ allow writes after opening the file r/w, then
setting the DOS R/O attribute, and writing some more.
(Similar to how you can write after fchmod(fd, 0444).)

Restore these semantics which were lost on FreeBSD when refactoring
zfs_write.  To my knowledge Linux does not actually expose this flag,
but we'll need it to eventually so I've added the supporting checks.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11693
2021-03-07 09:31:52 -08:00
Brian Behlendorf e7a06356c1 Suppress cppcheck invalidSyntax warninigs
For some reason cppcheck 1.90 is generating an invalidSyntax warning
when the BF64_SET macro is used in the zstream source.  The same
warning is not reported by cppcheck 2.3, nor is their any evident
problem with the expanded macro.  This appears to be an issue with
this version of cppcheck.  This commit annotates the source to suppress
the warning.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11700
2021-03-05 17:56:35 -08:00
Brian Behlendorf 6bbb44e157 Initialize ZIL buffers
When populating a ZIL destination buffer ensure it is always
zeroed before its contents are constructed.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Tom Caputi <caputit1@tcnj.edu>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11687
2021-03-05 14:45:13 -08:00
Jorgen Lundman 8a6d444825 Fix abd_get_offset_struct() may allocate new abd
Even when supplied with an abd to abd_get_offset_struct(), the call
to abd_get_offset_impl() can allocate a different abd. Ensure to
call abd_fini_struct() on the abd that is not used.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Jorgen Lundman <lundman@lundman.net>
Closes #11683
2021-03-05 12:22:57 -08:00
Ryan Moeller ba74de88c0 FreeBSD module --enable-debug --enable-invariants
Wire up the --enable-debug flag for configure to the FreeBSD module
build.  Add --enable-invariants.

The running FreeBSD kernel config is used to detect whether to enable
INVARIANTS if not explicitly specified with --enable-invariants or
--disable-invariants.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11678
2021-03-05 12:16:41 -08:00
Thomas Lamprecht fd1c366f82 zpool: use tab to intend continuation from removal status
Bring the output of the removal status in line with the other
"fields" that zpool status outputs, and thus allows an parser to
easier detect this as continuation of the 'remove:' output.

Before:
remove: Removal of vdev 0 copied 282G in 0h9m, completed on [...]
    776K memory used for removed device mappings

Now:
remove: Removal of vdev 0 copied 282G in 0h9m, completed on [...]
	776K memory used for removed device mappings

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Closes #11674
2021-03-05 12:15:35 -08:00
James Wah 92fb29b9f9 Don't bomb out when using keylocation=file://
Avoid following the error path when the operation in fact succeeded.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: James Wah <james@laird-wah.net>
Closes #11651
2021-03-03 08:28:49 -08:00
Christian Schwarz e439ee83c1 linux: zvol: avoid heap allocation for zvol_request_sync=1
The spl_kmem_alloc showed up in some flamegraphs in a single-threaded
4k sync write workload at 85k IOPS on an
Intel(R) Xeon(R) Silver 4215 CPU @ 2.50GHz.
Certainly not a huge win but I believe the change is clean and
easy to maintain down the road.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Christian Schwarz <me@cschwarz.com>
Closes #11666
2021-03-03 08:15:28 -08:00
Jake Howard 3242b5358e Add "zstd-fast" to help options for "compression" property
This value does work as expected, and is documented in the manpage.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Jake Howard <git@theorangeone.net>
Closes #11670
2021-03-03 08:14:19 -08:00
nssrikanth bedbc13daa Cancel TRIM / initialize on FAULTED non-writeable vdevs
When a device which is actively trimming or initializing becomes
FAULTED, and therefore no longer writable, cancel the active
TRIM or initialization.  When the device is merely taken offline
with `zpool offline` then stop the operation but do not cancel it.
When the device is brought back online the operation will be
resumed if possible.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Vipin Kumar Verma <vipin.verma@hpe.com>
Signed-off-by: Srikanth N S <srikanth.nagasubbaraoseetharaman@hpe.com>
Closes #11588
2021-03-02 10:27:27 -08:00
Andriy Gapon 2e160dee97 Fix assert in FreeBSD-specific dmu_read_pages
The function has three similar pieces of code: for read-behind pages,
requested pages and read-ahead pages.  All three pieces had an
assert to ensure that the page is not mapped.  Later the assert was
relaxed to require that the page is not mapped for writing.  But that
was done in two places out of three.  This change fixes the third piece,
read-ahead.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Andriy Gapon <avg@FreeBSD.org>
Closes #11654
2021-02-27 17:23:09 -08:00
Brian Behlendorf 3e73ea0c10 ZTS: zpool_trim_start_and_cancel_pos.ksh
Several of the TRIM tests were based of the initialize tests and
then adapted for TRIM.  The zpool_trim_start_and_cancel_pos.ksh
test was intended to be one such test but it was overlooked and
actually never adapted.  Update it accordingly.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11649
2021-02-27 17:19:50 -08:00
Martin Matuška 03ef8f09e1 Add missing checks for unsupported features
After 35ec517 it has become possible to import ZFS pools witn an
active org.illumos:edonr feature on FreeBSD, leading to a panic.

In addition, "zpool status" reported all pools without edonr
as upgradable and "zpool upgrade -v" reported edonr in the list
of upgradable features.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Martin Matuska <mm@FreeBSD.org>
Closes #11653
2021-02-27 17:16:02 -08:00
Coleman Kane 778fa36ee7 Linux 5.12 compat: replace bio_*_io_acct with disk_*_io_acct
The bio_*_acct functions became GPL exports, which causes the
kernel modules to refuse to compile. This replaces code with
alternate function calls to the disk_*_io_acct interfaces, which
are not GPL exports. This change was added in kernel commit
99dfc43ecbf67f12a06512918aaba61d55863efc.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #11639
2021-02-24 10:06:05 -08:00
Coleman Kane d939930fcc Linux 5.12 compat: bio->bi_disk member moved
The struct bio member bi_disk was moved underneath a new member named
bi_bdev. So all attempts to reference bio->bi_disk need to now become
bio->bi_bdev->bd_disk.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #11639
2021-02-24 10:04:34 -08:00
Brian Behlendorf 8e43fa12c5 Fix vdev_rebuild_thread deadlock
The metaslab_disable() call may block waiting for a txg sync.
Therefore it's important that vdev_rebuild_thread release the
SCL_CONFIG read lock it is holding before this call.  Failure
to do so can result in the txg_sync thread getting blocked
waiting for this lock which results in a deadlock.

Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Reviewd-by: Srikanth N S <srikanth.nagasubbaraoseetharaman@hpe.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11647
2021-02-24 10:01:00 -08:00
Brian Behlendorf 75a089ed34 Fix overly broad locking in spa_vdev_config_exit()
Calling vdev_free() only requires the we acquire the spa config
SCL_STATE_ALL locks, not the SCL_ALL locks.  In particular, we need
need to avoid taking the SCL_CONFIG lock (included in SCL_ALL) as a
writer since this can lead to a deadlock.  The txg_sync_thread() may
block in spa_txg_history_init_io() when taking the SCL_CONFIG lock
as a reading when it detects there's a pending writer.

Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11585
2021-02-24 10:00:21 -08:00
Tony Hutter 3ee4e6d8b7 vdev_id: Fix partition regular expression
Given a DM device name, the old vdev_id script would extract any text
after a 'p' as the partition number.  It then appends "-part" + the
partition number to the name, giving a by-vdev name like "L0-part5".

This works fine if the DM name is like 'dm-2p5', but doesn't work if
the DM name is a multipath name like "mpatha".  In those cases it
incorrectly matches the 'p' in "mpatha", giving by-vdev names like
"L0-partatha".

This patch fixes the issue by making the partition regex match stricter.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #11637
2021-02-24 09:58:46 -08:00
Brian Behlendorf 1dfc82a14e Linux: increase max nvlist_src size
On Linux increase the maximum allowed size of the src nvlist which
can be passed to the /dev/zfs ioctl.  Originally, this was set
to a maximum of KMALLOC_MAX_SIZE (4M) because it was kmalloc'd.
Since that time it's been converted to a vmalloc so that's no
longer a hard limit, and it's desirable for `zfs send/recv` to
allow larger nvlists so more snapshots can be sent at once.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #6572
Closes #11638
2021-02-24 09:57:18 -08:00
Prakash Surya f01eaed455 Add upper bound for slop space calculation
This change modifies the behavior of how we determine how much slop
space to use in the pool, such that now it has an upper limit. The
default upper limit is 128G, but is configurable via a tunable.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Prakash Surya <prakash.surya@delphix.com>
Closes #11023
2021-02-24 09:52:43 -08:00
Ryan Moeller 5156862960 Wrap bare EINVAL returns with SET_ERROR
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11636
2021-02-24 09:51:10 -08:00
Ryan Moeller 94fa1c3d96 Force symlink creation for zpool.d compat links
gmake install fails when zpool.d compat links already exist.

Force the symlinks to be recreated if already present.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11633
2021-02-24 09:49:59 -08:00
Cedric Maunoury b9c07ec71b send_iterate_snap : doall send without fromsnap
The behavior of a NULL fromsnap was inadvertently changed for a doall
send when the send/recv logic in libzfs was updated.  Restore the
previous behavior by correcting send_iterate_snap() to include all
the snapshots in the nvlist for this case. 

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Cedric Maunoury <cedric.maunoury@gmail.com>
Closes #11608
2021-02-24 09:48:58 -08:00
Adam D. Moss 9312e0fd1e Fix error message when zfs module are already unloaded
Using zfs-sh -u on linux will fail with inaccurate message when the
zfs modules are already unloaded.  Deal with the case where a module 
is already unloaded; its USE_COUNT will be the empty string

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Adam Moss <c@yotes.com>
Closes #11627
2021-02-20 20:23:10 -08:00
fbynite 11f2e9a491 vdev_ops: don't try to call vdev_op_hold or vdev_op_rele when NULL
This prevents a panic after a SLOG add/removal on the root pool followed
by a zpool scrub.

When a SLOG is removed, a hole takes its place - the vdev_ops for a hole
is vdev_hole_ops, which defines the handler functions of vdev_op_hold
and vdev_op_rele as NULL.

This bug has been reported in illumos and FreeBSD, a different trigger
in the FreeBSD report though.

Credit for this patch goes to Patrick Mooney <pmooney@pfmooney.com>

Obtained from: illumos-gate commit: c65bd18728f34725
External-issue: https://www.illumos.org/issues/12981
External-issue: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=252396
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Wing <rob.fx907@gmail.com>
Closes #11623
2021-02-20 20:19:20 -08:00
Tony Hutter 7e56b05058 Better zfs_get_enclosure_sysfs_path() enclosure support
A multpathed disk will have several 'underlying' paths to the disk.  For
example, multipath disk 'dm-0' may be made up of paths:
/dev/{sda,sdb,sdc,sdd}.  On many enclosures those underlying sysfs
paths will have a symlink back to their enclosure device entry
(like 'enclosure_device0/slot1').  This is used by the
statechange-led.sh script to set/clear the fault LED for a disk, and
by 'zpool status -c'.

However, on some enclosures, those underlying paths may not all have
symlinks back to the enclosure device.  Maybe only two out of four
of them might.

This patch updates zfs_get_enclosure_sysfs_path() to favor returning
paths that have symlinks back to their enclosure devices, rather
than just returning the first path.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #11617
2021-02-20 20:17:45 -08:00
Brian Atkinson c0801bf35a Cleaning up uio headers
Making uio_impl.h the common header interface between Linux and FreeBSD
so both OS's can share a common header file. This also helps reduce code
duplication for zfs_uio_t for each OS.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Brian Atkinson <batkinson@lanl.gov>
Closes #11622
2021-02-20 20:16:50 -08:00
Christian Schwarz 52cb284f7b ztest: propagate -o to the zdb child process
I think this is the behavior that most users expect.

Future work: have a separate flag, e.g., -O, to specify separate
set_global_vars for the zdb child than for the ztest children.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Signed-off-by: Christian Schwarz <me@cschwarz.com>
Closes #11602
2021-02-19 22:45:16 -08:00
Christian Schwarz 2801e68d1b ztest: fix -o by calling set_global_var in child processes
Without set_global_var() in the child processes the -o option provides
little use.

Before this change set_global_var() was called as a side-effect of
getopt processing which only happens for the parent ztest process.

This change limits the set of options that can be set and makes them
available to the child through ztest_shared_opts_t.

Future work: support arbitrary option count and length.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Signed-off-by: Christian Schwarz <me@cschwarz.com>
Closes #11602
2021-02-19 22:45:10 -08:00
Christian Schwarz edc508ac0b libzpool: set_global_var: refactor to not modify 'arg'
Also fixes leak of the dlopen handle in the error case.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Signed-off-by: Christian Schwarz <me@cschwarz.com>
Closes #11602
2021-02-19 22:45:04 -08:00
Christian Schwarz b5fffa1d29 libzpool: set_global_var: fix endianness handling (fixes zdb -o )
Without this patch I get the error

  Setting global variables is only supported on little-endian systems

when using `zdb -o` on my amd64 machine.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Signed-off-by: Christian Schwarz <me@cschwarz.com>
Closes #11602
2021-02-19 22:44:05 -08:00
Ryan Moeller 64e0fe14ff Restore FreeBSD resource usage accounting
Add zfs_racct_* interfaces for platform-dependent read/write accounting.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11613
2021-02-19 22:34:33 -08:00
Don Brady 03e02e5b56 Checksum errors may not be counted
Fix regression seen in issue #11545 where checksum errors 
where not being counted or showing up in a zpool event.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Don Brady <don.brady@delphix.com>
Closes #11609
2021-02-19 22:33:15 -08:00
Mark Johnston e7adccf7f5 FreeBSD: disable the use of hardware crypto offload drivers for now
First, the crypto request completion handler contains a bug in that it
fails to reset fs_done correctly after the request is completed.  This
is only a problem for asynchronous drivers.  Second, some hardware
drivers have input constraints which ZFS does not satisfy.  For
instance, ccp(4) apparently requires the AAD length for AES-GCM to be a
multiple of the cipher block size, and with qat(4) the AES-GCM AAD
length may not be longer than 240 bytes.  FreeBSD's generic crypto
framework doesn't have a mechanism to automatically fall back to a
software implementation if a hardware driver cannot process a request,
and ZFS does not tolerate such errors.

The plan is to implement such a fallback mechanism, but with FreeBSD
13.0 approaching we should simply disable the use hardware drivers for
now.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes #11612
2021-02-18 15:51:20 -08:00
Andriy Gapon 778869fa13 Fix report_mount_progress never calling set_progress_header
That happens because of an off-by-one mistake.
share_mount_one_cb() calls report_mount_progress(current=sm_done) after
having incremented sm_done by one.  Then report_mount_progress()
increments the parameter again.  It appears that that logic became
obsolete after commit a10d50f999, parallel zfs mount.

On FreeBSD I observe that zfs mount -a -v prints, for example,
    (null): (201/248)
That happens because set_progress_header() is never called.

With this change the output becomes correct:
    Mounting ZFS filesystems: (209/248)

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Andriy Gapon <avg@FreeBSD.org>
Closes #11607
2021-02-18 13:53:05 -08:00
Ryan Libby bf156c966b Remove unused abd_alloc_scatter_offset_chunkcnt
Remove function that become unused after refactoring in
e2af2acce3.

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Libby <rlibby@FreeBSD.org>
Closes #11614
2021-02-17 21:39:13 -08:00
Colm 658fb8020f Add "compatibility" property for zpool feature sets
Property to allow sets of features to be specified; for compatibility
with specific versions / releases / external systems. Influences
the behavior of 'zpool upgrade' and 'zpool create'. Initial man
page changes and test cases included.

Brief synopsis:

zpool create -o compatibility=off|legacy|file[,file...] pool vdev...

compatibility = off : disable compatibility mode (enable all features)
compatibility = legacy : request that no features be enabled
compatibility = file[,file...] : read features from specified files.
Only features present in *all* files will be enabled on the
resulting pool. Filenames may be absolute, or relative to
/etc/zfs/compatibility.d or /usr/share/zfs/compatibility.d (/etc
checked first).

Only affects zpool create, zpool upgrade and zpool status.

ABI changes in libzfs:

* New function "zpool_load_compat" to load and parse compat sets.
* Add "zpool_compat_status_t" typedef for compatibility parse status.
* Add ZPOOL_PROP_COMPATIBILITY to the pool properties enum
* Add ZPOOL_STATUS_COMPATIBILITY_ERR to the pool status enum

An initial set of base compatibility sets are included in
cmd/zpool/compatibility.d, and the Makefile for cmd/zpool is
modified to install these in $pkgdatadir/compatibility.d and to
create symbolic links to a reasonable set of aliases.

Reviewed-by: ericloewe
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Colm Buckley <colm@tuatha.org>
Closes #11468
2021-02-17 21:30:45 -08:00
Brian Behlendorf 35ec51796f FreeBSD: disable edonr in zfs_mod_supported_feature()
Rather than conditionally compiling out the edonr code for FreeBSD
update zfs_mod_supported_feature() to indicate this feature is
unsupported.  This ensures that all spa features are defined on
every platform, even if they are not supported.

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11605 
Issue #11468
2021-02-17 08:14:51 -08:00
José Luis Salvador Rufo aef1830f93 Support uClibc for the tests compilations
There are two issues that don't allow ZFS to be compiled using uClibc.
`backtrace()`, and `program_invocation_short_name` as a `const`.
This patch adds uClibc to the conditionals in the same way there are
already for Glibc for `backtrace()`; and removes the external param
`program_invocation_short_name` because its only used here for the
whole project.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: José Luis Salvador Rufo <salvador.joseluis@gmail.com>
Closes #11600
2021-02-16 21:51:46 -08:00
Ryan Moeller 436ab35a53 Make inline ABD predicates compatible with C++
FreeBSD's zfsd fails to build after e2af2acce3 due to strict type
checking errors from the implicit conversion between bool and boolean_t
in the inline predicate definitions in abd.h.

Use conditionals to return the correct value type from these functions.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #11592
2021-02-15 10:15:50 -08:00
Brian Behlendorf c1c31a835a Linux 5.11 compat: META
Increase the Linux-Maximum version in the META file to 5.11.
All of the required compatibility patches have been merged.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11586
2021-02-10 10:11:21 -08:00
Arshad Hussain 677cdf7865 vdev_id: Support daisy-chained JBODs in multipath mode
Within function sas_handler() userspace commands like
'/usr/sbin/multipath' have been replaced with sourcing
device details from within sysfs which reduced a
significant amount of overhead and processing time.
Multiple JBOD enclosures and their order are sourced
from the bsg driver (/sys/class/enclosure) to isolate
chassis top-level expanders, which are then dynamically
indexed based on host channel of the multipath subordinate
disk member device being processed. Additionally added a
"mixed" mode for slot identification for environments where
a ZFS server system may contain SAS disk slots where there
is no expander (direct connect to HBA) while an attached
external JBOD with an expander have different slot identifier
methods.

How Has This Been Tested?
~~~~~~~~~~~~~~~~~~~~~~~~~

Testing was performed on a AMD EPYC based dual-server
high-availability multipath environment with multiple
HBAs per ZFS server and four SAS JBODs. The two primary
JBODs were multipath/cross-connected between the two
ZFS-HA servers. The secondary JBODs were daisy-chained
off of the primary JBODs using aligned SAS expander
channels (JBOD-0 expanderA--->JBOD-1 expanderA,
          JBOD-0 expanderB--->JBOD-1 expanderB, etc).
Pools were created, exported and re-imported, imported
globally with 'zpool import -a -d /dev/disk/by-vdev'.
Low level udev debug outputs were traced to isolate
and resolve errors.

Result:
~~~~~~~

Initial testing of a previous version of this change
showed how reliance on userspace utilities like
'/usr/sbin/multipath' and '/usr/bin/lsscsi' were
exacerbated by increasing numbers of disks and JBODs.
With four 60-disk SAS JBODs and 240 disks the time to
process a udevadm trigger was 3 minutes 30 seconds
during which nearly all CPU cores were above 80%
utilization. By switching reliance on userspace
utilities to sysfs in this version, the udevadm
trigger processing time was reduced to 12.2 seconds
and negligible CPU load.

This patch also fixes few shellcheck complains.

Reviewed-by: Gabriel A. Devenyi <gdevenyi@gmail.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Jeff Johnson <jeff.johnson@aeoncomputing.com>
Signed-off-by: Jeff Johnson <jeff.johnson@aeoncomputing.com>
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Closes #11526
2021-02-09 13:04:09 -08:00
khng300 fc273894d2 Rename zfs_inode_update to zfs_znode_update_vfs
zfs_znode_update_vfs is a more platform-agnostic name than
zfs_inode_update. Besides that, the function's prototype is moved to
include/sys/zfs_znode.h as the function is also used in common code.

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ka Ho Ng <khng300@gmail.com>
Sponsored by: The FreeBSD Foundation
Closes #11580
2021-02-09 11:17:29 -08:00
Kleber Tarcísio 4f22619ae3 Add an assert to clarify code
The first time through the loop prevdb and prevhdl are NULL.  They 
are then both set, but only prevdb is checked.  Add an ASSERT to 
make it clear that prevhdl must be set when prevdb is.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Kleber <klebertarcisio@yahoo.com.br>
Closes #10754
Closes #11575
2021-02-09 11:14:59 -08:00
Antonio Russo f8ce8aed0c Set file mode during zfs_write
3d40b65 refactored zfs_vnops.c, which shared much code verbatim between
Linux and BSD.  After a successful write, the suid/sgid bits are reset,
and the mode to be written is stored in newmode.  On Linux, this was
propagated to both the in-memory inode and znode, which is then updated
with sa_update.

3d40b65 accidentally removed the initialization of newmode, which
happened to occur on the same line as the inode update (which has been
moved out of the function).

The uninitialized newmode can be saved to disk, leading to a crash on
stat() of that file, in addition to a merely incorrect file mode.

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Antonio Russo <aerusso@aerusso.net>
Closes #11474 
Closes #11576
2021-02-08 09:15:05 -08:00
наб 7c64ee9e77 zfs-import-{cache,scan}: change condition to FileNotEmpty
When all pools are exported ZFS will generate an empty cache file.
This will cause the import service to fail, which is sub-optimal, 
since this means that dracut fails, and it necessary to run 
`zpool import -a` to boot, delete the file, and regenerate+reinstall 
the initrd.

This resolves the issue by treating an zero-length cache files the
same as a missing cache file.  This aligns the behavior with that
of the `zpool` command itself.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11568
2021-02-05 11:25:22 -08:00
nssrikanth 32366649d3 Fixed issue with processing of EC_dev_remove event
The pool guid and vdev guid received by zfs_agent_post_event(),
which calls zfs_retire_recv(), are normally non-zero.  However,
later in this same method they may be unconditionally reset to
zero by the code which is intended to handle  multipath, spare
and l2arc vdevs.  This will result in the EC_dev_remove not
being handled.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>\
Co-authored-by: Vipin Kumar Verma <vipin.verma@hpe.com>
Signed-off-by: Srikanth N S <srikanth.nagasubbaraoseetharaman@hpe.com>
Closes #11564
2021-02-05 08:30:50 -08:00
Brian Behlendorf d66f017c17 zfs-list.8: clarify listing snapshots
Clarify how to include snapshots in the `zpool list` output by
referencing the full name of the `listsnapshots` pool property,
and the `zpool list -t snapshot` option.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11562
Closes #11565
2021-02-04 09:56:28 -08:00
Christian Schwarz 84268b099b Document monotonicity of dmu_tx_assign() and txg_hold_open()
Expand the comments to make it clear exactly what is guaranteed
by dmu_tx_assign() and txg_hold_open().  Additionally, update
the comment which refers to txg_exit() when it should reference
txg_rele_to_sync().

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Christian Schwarz <me@cschwarz.com>
Closes #11521
2021-02-02 10:11:37 -08:00
George Melikov 9eee7fce3b zts-report.py: ignore some skipped tests in Github CI
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #11554
2021-02-02 09:37:23 -08:00
George Melikov aa743acc79 CI: add ubuntu-* functional tests runner
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #11554
2021-02-02 09:37:23 -08:00
George Melikov 1051499aef CI: rename zfs-tests workflow
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #11554
2021-02-02 09:37:11 -08:00
Brian Behlendorf 96a6629872 Remove unused iov_iter_init_compat() wrapper
This compatibility code is no longer needed.  For it a while
iov_iter_init_compat() was used by zfs_uio_prefaultpages() but
this code should have been dropped as part of commit 83b91ae1.
Take care of that oversight and remove it.

Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11543
2021-01-30 10:06:14 -08:00
Matthew Ahrens 2d4bbd14fc The abd child/parent relationship does not need to be tracked
ABD's currently track their parent/child relationship.  This applies to
`abd_get_offset()` and `abd_borrow_buf()`.  However, nothing depends on
knowing this relationship, it's only used for consistency checks to
verify that we are not destroying an ABD that's still in use.  When we
are creating/destroying ABD's frequently, the performance impact of
maintaining these data structures (in particular the atomic
increment/decrement operations) can be measurable.

This commit removes this verification code on production builds, but
keeps it when ZFS_DEBUG is set.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #11535
2021-01-30 10:04:42 -08:00
nssrikanth cf0a6dd3ed Added extra check to replace Faulted VDEV with Distributed Spare
In ZED zfs_retire agent added a check to handle Distributed Spare
replacement for Faulted VDEV also.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Vipin Kumar Verma <vipin.verma@hpe.com>
Signed-off-by: Mark Maybee <mark.maybee@hpe.com>
Closes #11354 
Closes #11355
2021-01-28 17:00:26 -08:00
Brian Atkinson 2993698eb3 Fixing gang ABD when adding another gang
I originally applied a fix in #11539 to fix a parent's child references
when a gang ABD is free'd. However, I did not take into account
abd_gang_add_gang(). We still need to make sure to update the child
references in this function as well. In order to resolve this I removed
decreasing the gang ABD's size in abd_free_gang() as well as moved back
the original placeent of zfs_refcount_remove_many() in abd_free().

Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Brian Atkinson <batkinson@lanl.gov>
Closes #11542
2021-01-28 16:54:12 -08:00
George Melikov 9f8c7e6a76 ZTS: add userspace_send_encrypted.ksh to Makefile
All tests need to be included in the Makefiles.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #11541
2021-01-28 13:39:38 -08:00
Matthew Ahrens f8c0d7e1f6 fix abd_nr_pages_off for gang abd
`__vdev_disk_physio()` uses `abd_nr_pages_off()` to allocate a bio with
a sufficient number of iovec's to process this zio (i.e.
`nr_iovecs`/`bi_max_vecs`).  If there are not enough iovec's in the bio,
then additional bio's will be allocated.  However, this is a sub-optimal
code path.  In particular, it requires several abd calls (to
`abd_nr_pages_off()` and `abd_bio_map_off()`) which will have to walk
the constituents of the ABD (the pages or the gang children) because
they are looking for offsets > 0.

For gang ABD's, `abd_nr_pages_off()` returns the number of iovec's
needed for the first constituent, rather than the sum of all
constituents (within the requested range).  This always under-estimates
the required number of iovec's, which causes us to always need several
bio's.  The end result is that `__vdev_disk_physio()` is usually O(n^2)
for gang ABD's (and occasionally O(n^3), when more than 16 bio's are
needed).

This commit fixes `abd_nr_pages_off()`'s handling of gang ABD's, to
correctly determine how many iovec's are needed, by adding up the number
of iovec's for each of the gang children in the requested range.

Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #11536
2021-01-28 09:28:20 -08:00
George Amanakis 0ae184a6ba Avoid updating the L2ARC device header unnecessarily
If we do not write any buffers to the cache device and the evict hand
has not advanced do not update the cache device header.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #11522 
Closes #11537
2021-01-28 09:20:03 -08:00
Brian Atkinson 416015ef54 Removing ABD Parent Child Reference Before Freeing ABD
Moving the call to zfs_refcount_remove_many() in abd_free() to be called
before any of the ABD free variants are called. This is necessary
because abd_free_gang() adjusts the abd_size for the gang ABD. If the
parent's child references are removed after free'ing the gang ABD the
refcount is not adjusted correctly for the parent's children.

I also removed some stray abd_put() in comments and changed
abd_free_gang_abd() -> abd_free_gang().

Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Brian Atkinson <batkinson@lanl.gov>
Closes #11539
2021-01-28 09:15:17 -08:00
Allan Jude 393e69241e Add zdb -r <dataset> <object-id | file> <output>
While you can use zdb -R poolname vdev:offset:[<lsize>/]<psize>[:flags] 
to extract individual DVAs from a vdev, it would be handy for be able 
copy an entire file out of the pool.

Given a file or object number, add support to copy the contents to a 
file. Useful for debugging and recovery.

Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Closes #11027
2021-01-27 21:36:01 -08:00
Mark Maybee b2c5904a78 Revert special case code from pre-hashtable nvlist era
Before a hash table was added on top of the nvlist code, there were
cases where the nvlist allocation was changed from fnvlist_alloc()
to nvlist_alloc() to avoid expensive NV_UNIQUE_NAME checks. Now
this is no longer necessary. These changes should be reverted to be
consistent with other code. There are some cases where this change
will also reduce the number of iterations.

Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mark Maybee <mark.maybee@delphix.com>
Closes #11464
2021-01-27 21:31:51 -08:00
Paul Dagnelie 2921ad6cba Fix zrele race in zrele_async that can cause hang
There is a race condition in zfs_zrele_async when we are checking if 
we would be the one to evict an inode. This can lead to a txg sync 
deadlock.

Instead of calling into iput directly, we attempt to perform the atomic 
decrement ourselves, unless that would set the i_count value to zero. 
In that case, we dispatch a call to iput to run later, to prevent a 
deadlock from occurring.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #11527 
Closes #11530
2021-01-27 21:29:58 -08:00
George Melikov b8e6401b79 ZTS: pool_state test check for pool existence in cleanup
If there is no scsi_debug module, then this test
must be skipped, in this case cleanup routine should
be prepared for absent pool.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #11534
2021-01-27 17:33:30 -08:00
Alan Somers 6b2e7203ae Fix a resource leak in uu_avl_pool_destroy
Need to destroy the pthread mutex created in uu_avl_pool_create.

https://svnweb.freebsd.org/base?view=revision&revision=262912

Obtained from: FreeBSD
Sponsored by:	Spectra Logic Corporation
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alan Somers <asomers@gmail.com>
Closes #11528
2021-01-26 19:39:28 -08:00
Alan Somers cf0977ad72 Parallelize vdev_validate
The runtime of vdev_validate is dominated by the disk accesses in
vdev_label_read_config.  Speed it up by validating all vdevs in
parallel using a taskq.

Sponsored by: Axcient
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alan Somers <asomers@gmail.com>
Closes #11470
2021-01-26 19:36:51 -08:00
Alan Somers 67874d5487 Read all disk labels concurrently in vdev_label_read_config
This is similar to what we already do in vdev_geom_read_config.

Sponsored by: Axcient
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alan Somers <asomers@gmail.com>
Closes #11470
2021-01-26 19:36:02 -08:00
Alan Somers a0e01997ec Parallelize vdev_load
metaslab_init is the slowest part of importing a mature pool, and it
must be repeated hundreds of times for each top-level vdev.  But its
speed is dominated by a few serialized disk accesses.  That can lead to
import times of > 1 hour for pools with many top-level vdevs on spinny
disks.

Speed up the import by using a taskqueue to parallelize vdev_load across
all top-level vdevs.

This also requires adding mutex protection to
metaslab_class_t.mc_historgram.  The mc_histogram fields were
unprotected when that code was first written in "Illumos 4976-4984 -
metaslab improvements" (OpenZFS
f3a7f6610f).  The lock wasn't added until
3dfb57a35e, though it's unclear exactly
which fields it's supposed to protect.  In any case, it wasn't until
vdev_load was parallelized that any code attempted concurrent access to
those fields.

Sponsored by: Axcient
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alan Somers <asomers@gmail.com>
Closes #11470
2021-01-26 19:35:59 -08:00
Alan Somers dfb44c500e Fix a man page link in zfs-program.8
zfs-program.8 has an orphan link, fix it.

https://svnweb.freebsd.org/base?view=revision&revision=360080

Obtained from: FreeBSD
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alan Somers <asomers@gmail.com>
Closes #11529
2021-01-26 16:17:11 -08:00
Brian Behlendorf 0e6c493fec cppcheck: integrete cppcheck
In order for cppcheck to perform a proper analysis it needs to be
aware of how the sources are compiled (source files, include
paths/files, extra defines, etc).  All the needed information is
available from the Makefiles and can be leveraged with a generic
cppcheck Makefile target.  So let's add one.

Additional minor changes:

* Removing the cppcheck-suppressions.txt file.  With cppcheck 2.3
  and these changes it appears to no longer be needed.  Some inline
  suppressions were also removed since they appear not to be
  needed.  We can add them back if it turns out they're needed
  for older versions of cppcheck.

* Added the ax_count_cpus m4 macro to detect at configure time how
  many processors are available in order to run multiple cppcheck
  jobs.  This value is also now used as a replacement for nproc
  when executing the kernel interface checks.

* "PHONY =" line moved in to the Rules.am file which is included
  at the top of all Makefile.am's.  This is just convenient becase
  it allows us to use the += syntax to add phony targets.

* One upside of this integration worth mentioning is it now allows
  `make cppcheck` to be run in any directory to check that subtree.

* For the moment, cppcheck is not run against the FreeBSD specific
  kernel sources.  The cppcheck-FreeBSD target will need to be
  implemented and testing on FreeBSD to support this.

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11508
2021-01-26 16:12:26 -08:00
Brian Behlendorf a06ba74a1e cppcheck: return value always 0
Identical condition and return expression 'rc', return value is
always 0.

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11508
2021-01-26 16:12:18 -08:00
Brian Behlendorf 2cdd75bed7 cppcheck: remove redundant ASSERTs
The ASSERT that the passed pointer isn't NULL appears after the
pointer has already been dereferenced.  Remove the redundant check.

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11508
2021-01-26 16:12:10 -08:00
Brian Behlendorf 6fc1ce0723 cppcheck: resolve double free
The double free reported for the realloc() failure branch is a
false positive.  It should be resolved in cppcheck 2.4 but for
the benefit of older versions we supress the warning.

    https://trac.cppcheck.net/ticket/9292

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11508
2021-01-26 16:12:02 -08:00
Brian Behlendorf 7454d2bb8d cppcheck: zpool_main.c possible null pointer dereference
Explicitly check for NULL to satisfy cppcheck that "val" can never
be NULL when passed to printf().  This looks like a false positive
since is_blank_str() can never take the false conditional branch
when passed a NULL.  But there's no harm in adding the extra check.

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11508
2021-01-26 16:11:46 -08:00
Matthew Ahrens 62d4287f27 RAIDZ2/3 fails to heal silently corrupted parity w/2+ bad disks
When scrubbing, (non-sequential) resilvering, or correcting a checksum
error using RAIDZ parity, ZFS should heal any incorrect RAIDZ parity by
overwriting it.  For example, if P disks are silently corrupted (P being
the number of failures tolerated; e.g. RAIDZ2 has P=2), `zpool scrub`
should detect and heal all the bad state on these disks, including
parity.  This way if there is a subsequent failure we are fully
protected.

With RAIDZ2 or RAIDZ3, a block can have silent damage to a parity
sector, and also damage (silent or known) to a data sector.  In this
case the parity should be healed but it is not.

The problem can be noticed by scrubbing the pool twice.  Assuming there
was no damage concurrent with the scrubs, the first scrub should fix all
silent damage, and the second scrub should be "clean" (`zpool status`
should not report checksum errors on any disks).  If the bug is
encountered, then the second scrub will repair the silently-damaged
parity that the first scrub failed to repair, and these checksum errors
will be reported after the second scrub.  Since the first scrub repaired
all the damaged data, the bug can not be encountered during the second
scrub, so subsequent scrubs (more than two) are not necessary.

The root cause of the problem is some code that was inadvertently added
to `raidz_parity_verify()` by the DRAID changes.  The incorrect code
causes the parity healing to be aborted if there is damaged data
(`rc_error != 0`) or the data disk is not present (`!rc_tried`).  These
checks are not necessary, because we only call `raidz_parity_verify()`
if we have the correct data (which may have been reconstructed using
parity, and which was verified by the checksum).

This commit fixes the problem by removing the incorrect checks in
`raidz_parity_verify()`.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #11489 
Closes #11510
2021-01-26 16:05:05 -08:00
Will Andrews d7265b3309 ZTS: zpool_export test improvements
- refactor cleanup routines into common kshlib zpool_export_cleanup func
- don't require physical disks to test, just use files

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Will Andrews <will@firepipe.net>
Closes #11518
2021-01-26 13:14:04 -08:00
Lorenz Hüdepohl caedada66e dracut: Fix race condition between load-key and import
zfs-load-key.sh is called by the dracut-pre-mount.service unit which has
no explicit 'After' dependency on zfs-import.target. That way it can be
that the pool has not yet been imported and the zfs-load-key.sh finishes
without ever seeing the relevant pool.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Lorenz Hüdepohl <dev@stellardeath.org>
Closes #11500
2021-01-26 12:14:22 -08:00
Will Andrews f4f50a7048 spa_export_common: refactor common exit points
Create a common exit point for spa_export_common (a very long 
function), which avoids missing steps on failure.  This work
is helpful for the planned forced pool export changes.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Will Andrews <will@firepipe.net>
Closes #11514
2021-01-25 15:04:11 -08:00
Will Andrews 35ac0ed1fd ZTS: improve output clarity of check_prop_source
Instead of just failing, indicate the expected and actual value and
source as a NOTE.  Tests using this failed in an earlier version of
the changeset and this information helped find the cause.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Will Andrews <will@firepipe.net>
Closes #11517
2021-01-25 14:39:58 -08:00
Will Andrews a57acbb627 ZTS: remove duplicate check_prop_source from zfs_receive
There is an identical definition in zfs_set_common.kshlib already.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Will Andrews <will@firepipe.net>
Closes #11516
2021-01-25 14:38:19 -08:00
Will Andrews de1a6cc387 logapi: cat output file instead of printing
This avoids globbing together multiple lines in the log, if you happen
to specify LOGAPI_DEBUG because you want to see it.

Signed-off-by:	Will Andrews <will@firepipe.net>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11515
2021-01-25 14:33:57 -08:00
Matthew Macy a4134da2b2 spl-taskq: Make sure thread tsd hash entry is cleared
Like any other thread created by thread_create() we need to call
thread_exit() to properly clean it up.  In particular, this ensures the
tsd hash for the thread is cleared.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #11512
2021-01-25 11:18:28 -08:00
Alan Somers fd95af8dd4 Speed up "zpool import" in the presence of many zvols
By default, FreeBSD does not allow zpools to be backed by zvols (that
can be changed with the "vfs.zfs.vol.recursive" sysctl). When that
sysctl is set to 0, the kernel does not attempt to read zvols when
looking for vdevs. But the zpool command still does. This change brings
the zpool command into line with the kernel's behavior. It speeds "zpool
import" when an already imported pool has many zvols, or a zvol with
many snapshots.

https://svnweb.freebsd.org/base?view=revision&revision=357235
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=241083
https://reviews.freebsd.org/D22077

Obtained from: FreeBSD
Reported by: Martin Birgmeier <d8zNeCFG@aon.at>
Sponsored by: Axcient
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Alan Somers <asomers@gmail.com>
Closes #11502
2021-01-24 16:02:45 -08:00
наб 8887760aef zfsprops.8: fix mispluralisation in "Default values is"
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11509
2021-01-24 15:57:51 -08:00
Ryan Moeller a4acc47e4b ZTS: Use swapctl to list swap devices on FreeBSD
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11503
2021-01-24 15:56:59 -08:00
Arshad Hussain e2870fb24a vdev_id: Add error message when $CONFIG is missing
It was observed that vdev_id exists silently when
the $CONFIG file is missing.

This patch adds error message in case vdev_id is
called without default $CONFIG or '-c'. This makes
end user observe the exit message more easily.

Before Patch:
~~~~~~~~~~~~~
$ ./cmd/vdev_id/vdev_id
$

After Patch:
~~~~~~~~~~~~
$ ./cmd/vdev_id/vdev_id
Error: Config file "/etc/zfs/vdev_id.conf" not found
$

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Closes #11498
2021-01-23 15:52:29 -08:00
Colm 4a90d4d6fc Fix two minor lint errors (cppcheck)
Fix two minor errors reported by cppcheck:

In module/zfs/abd.c (abd_get_offset_impl), add non-NULL
assertion to prevent NULL dereference warning.

In module/zfs/arc.c (l2arc_write_buffers), change 'try'
variable to 'pass' to avoid C++ reserved word.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Colm Buckley <colm@tuatha.org>
Closes #11507
2021-01-23 15:49:32 -08:00
Alexander Motin 5aa69a57da Relax special_small_blocks assertion.
Follow up for commit 624222a, value asserted <= SPA_OLD_MAXBLOCKSIZE
instead of SPA_MAXBLOCKSIZE as it should be after the previous change.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Closes #11501
2021-01-23 15:45:27 -08:00
Matthew Macy 716408f560 Add basic io_uring test
Provide a basic test coverage for io_uring I/O.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #11497
2021-01-23 15:42:42 -08:00
Ryan Moeller 1c94345103 FreeBSD: upstream changes to VFS interface
Set VIRF_MOUNTPOINT flag on snapshot mountpoint.

Authored-by: Mateusz Guzik <mjg@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11458
2021-01-23 15:40:43 -08:00
Matt Macy 0e9bcd5d4f FreeBSD: fix HEAD build, conditionally remove FDSYNC defines
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #11458
2021-01-23 15:39:55 -08:00
Brian Behlendorf 76e1f78d4b Only add supported features during pool creation
When creating a pool only features supported by both user and
kernel space should be enabled.  Furthermore, improve the error
messages when attempting to create, or add, a dRAID vdev when
the dRAID feature is not supported by the kernel modules.

Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11492
2021-01-22 09:47:06 -08:00
Matthew Ahrens aa755b3549 Set aside a metaslab for ZIL blocks
Mixing ZIL and normal allocations has several problems:

1. The ZIL allocations are allocated, written to disk, and then a few
seconds later freed.  This leaves behind holes (free segments) where the
ZIL blocks used to be, which increases fragmentation, which negatively
impacts performance.

2. When under moderate load, ZIL allocations are of 128KB.  If the pool
is fairly fragmented, there may not be many free chunks of that size.
This causes ZFS to load more metaslabs to locate free segments of 128KB
or more.  The loading happens synchronously (from zil_commit()), and can
take around a second even if the metaslab's spacemap is cached in the
ARC.  All concurrent synchronous operations on this filesystem must wait
while the metaslab is loading.  This can cause a significant performance
impact.

3. If the pool is very fragmented, there may be zero free chunks of
128KB or more.  In this case, the ZIL falls back to txg_wait_synced(),
which has an enormous performance impact.

These problems can be eliminated by using a dedicated log device
("slog"), even one with the same performance characteristics as the
normal devices.

This change sets aside one metaslab from each top-level vdev that is
preferentially used for ZIL allocations (vdev_log_mg,
spa_embedded_log_class).  From an allocation perspective, this is
similar to having a dedicated log device, and it eliminates the
above-mentioned performance problems.

Log (ZIL) blocks can be allocated from the following locations.  Each
one is tried in order until the allocation succeeds:
1. dedicated log vdevs, aka "slog" (spa_log_class)
2. embedded slog metaslabs (spa_embedded_log_class)
3. other metaslabs in normal vdevs (spa_normal_class)

The space required for the embedded slog metaslabs is usually between
0.5% and 1.0% of the pool, and comes out of the existing 3.2% of "slop"
space that is not available for user data.

On an all-ssd system with 4TB storage, 87% fragmentation, 60% capacity,
and recordsize=8k, testing shows a ~50% performance increase on random
8k sync writes.  On even more fragmented systems (which hit problem #3
above and call txg_wait_synced()), the performance improvement can be
arbitrarily large (>100x).

Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed-by: George Wilson <gwilson@delphix.com>
Reviewed-by: Don Brady <don.brady@delphix.com>
Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #11389
2021-01-21 15:12:54 -08:00
Lorenz Hüdepohl 984362a71e dracut: Support /usr/bin as 'systemctl' path
On openSUSE the initrd has systemctl in /usr/bin, check this path as
well.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Signed-off-by: Lorenz Hüdepohl <dev@stellardeath.org>
Closes #11487
2021-01-21 12:59:24 -08:00
Antonio Russo 0ae733c7a4 Install zgenhostid to sbindir
zgenhostid(8) is used to modify or create /etc/hostid.  This
administrative tool is currently installed to bindir.  System utilities
are typically placed in sbin.

Modify the installation directory for zgenhostid.  Additionally, track
this change in its use in dracut and the rpm installation.

Authored-by: наб <nabijaczleweli@nabijaczleweli.xyz>
Authored-by: Antonio Russo <aerusso@aerusso.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Antonio Russo <aerusso@aerusso.net>
Closes #11485
2021-01-21 12:58:24 -08:00
Alan Somers 2d8f72d76c zpool: speed up importing large pools (#11469)
The ZFS_IOC_POOL_TRYIMPORT ioctl returns an nvlist from the kernel to a
preallocated buffer in  userland.  Userland must guess how large the
buffer should be.  If it undersizes it, it must reallocate and try
again.  That can cost a lot of time for large pools.

OpenZFS commit 28b40c8a6e set the guess at "zc.zc_nvlist_conf_size * 4"
without explanation.  On my system, that is too small.  From experiment,
x 32 is a better multiplier.  But I don't know how to calculate it
theoretically.

Sponsored by:	Axcient
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alek Pinchuk <apinchuk@axcient.com>
Signed-off-by:	Alan Somers <asomers@gmail.com>
Closes #11469
2021-01-21 12:55:54 -08:00
Alan Somers e50b5217e7 libzutil: optimize zpool_read_label with AIO
Read all labels in parallel instead of sequentially.

Originally committed as
https://cgit.freebsd.org/src/commit/?id=b49e9abcf44cafaf5cfad7029c9a6adbb28346e8

Obtained from: FreeBSD
Sponsored by: Spectra Logic, Axcient
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alek Pinchuk <apinchuk@axcient.com>
Signed-off-by: Alan Somers <asomers@gmail.com>
Closes #11467
2021-01-21 11:24:35 -08:00
Alan Somers ec40ce8405 libzutil: don't read extraneous data in zpool_read_label
zpool_read_label doesn't need the full labels including uberblocks.  It
only needs the vdev_phys_t.  This reduces by half the amount of data
read to check for a label, speeding up "zpool import", "zpool
labelclear", etc.

Originally committed as
https://cgit.freebsd.org/src/commit/?id=63f8025d6acab1b334373ddd33f940a69b3b54cc

Obtained from: FreeBSD
Sponsored by: Spectra Logic Corp, Axcient
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alek Pinchuk <apinchuk@axcient.com>
Signed-off-by: Alan Somers <asomers@gmail.com>
Closes #11467
2021-01-21 11:18:32 -08:00
Brian Behlendorf 83b91ae1a4 Linux 5.10 compat: restore custom uio_prefaultpages()
As part of commit 1c2358c1 the custom uio_prefaultpages() code
was removed in favor of using the generic kernel provided
iov_iter_fault_in_readable() interface.  Unfortunately, it
turns out that up until the Linux 4.7 kernel the function would
only ever fault in the first iovec of the iov_iter.  The result
being uiomove_iov() may hang waiting for the page.

This commit effectively restores the custom uio_prefaultpages()
pages code for Linux 4.9 and earlier kernels which contain the
troublesome version of iov_iter_fault_in_readable().

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11463 
Closes #11484
2021-01-21 10:43:39 -08:00
Brian Atkinson d0cd9a5cc6 Extending FreeBSD UIO Struct
In FreeBSD the struct uio was just a typedef to uio_t. In order to
extend this struct, outside of the definition for the struct uio, the
struct uio has been embedded inside of a uio_t struct.

Also renamed all the uio_* interfaces to be zfs_uio_* to make it clear
this is a ZFS interface.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Brian Atkinson <batkinson@lanl.gov>
Closes #11438
2021-01-20 21:27:30 -08:00
Matthew Ahrens e2af2acce3 allow callers to allocate and provide the abd_t struct
The `abd_get_offset_*()` routines create an abd_t that references
another abd_t, and doesn't allocate any pages/buffers of its own.  In
some workloads, these routines may be called frequently, to create many
abd_t's representing small pieces of a single large abd_t.  In
particular, the upcoming RAIDZ Expansion project makes heavy use of
these routines.

This commit adds the ability for the caller to allocate and provide the
abd_t struct to a variant of `abd_get_offset_*()`.  This eliminates the
cost of allocating the abd_t and performing the accounting associated
with it (`abdstat_struct_size`).  The RAIDZ/DRAID code uses this for
the `rc_abd`, which references the zio's abd.  The upcoming RAIDZ
Expansion project will leverage this infrastructure to increase
performance of reads post-expansion by around 50%.

Additionally, some of the interfaces around creating and destroying
abd_t's are cleaned up.  Most significantly, the distinction between
`abd_put()` and `abd_free()` is eliminated; all types of abd_t's are
now disposed of with `abd_free()`.

Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Issue #8853 
Closes #11439
2021-01-20 11:24:37 -08:00
sterlingjensen 03f036cbcc Re-apply path sanitizer, as mount(8) still mangles it
Prior to util-linux 2.36.2, if a file or directory in the
current working directory was named 'dataset' then mount(8)
would prepend the current working directory to the dataset.

Eventually, we should be able to drop this workaround.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Sterling Jensen <sterlingjensen@users.noreply.github.com>
Closes #11295 
Closes #11462
2021-01-19 11:57:31 -08:00
Antonio Russo f8c4d63a26 ZTS: avoid piping to special devices
As described in #11445, the kernel interface kernel_{read,write} no
longer act on special devices.  In the ZTS, zfs send and receive are
tested by piping to these devices, leading to spurious failures (for
positive tests) and may mask errors (for negative tests).

Until a more permanent mechanism to address this deficiency is
developed, clean up the output from the ZTS by avoiding directly piping
to or from /dev/null and /dev/zero.

For /dev/zero input, simply use a pipe: `cat </dev/zero |` .

However, for /dev/null output, the shell semantics for pipe failures
means that zfs send error codes will be masked by the successful
`| cat >/dev/null` command execution.  In that case, use a temporary
file under $TEST_BASE_DIR for output in favor.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Attila Fülöp <attila@fueloep.org>
Signed-off-by: Antonio Russo <aerusso@aerusso.net>
Closes #11478
2021-01-19 11:53:35 -08:00
Ryan Moeller 60a2434b29 libzfs_sendrecv: Use fnv* to verify nvlist/nvpair*
Use verified variants of nvlist/nvpair functions where applicable.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11460
2021-01-14 09:53:09 -08:00
Ryan Moeller a9eaae065b ztest: Clean up use of ASSERT and VERIFY
Try to use more appropriate ASSERT and VERIFY variants in ztest.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11454
2021-01-12 17:21:01 -08:00
Antonio Russo 8752f7e320 ZTS: avoid race to unmount in zfs_rollback_001
The zfs_rollback_001 test modifies files in a temporary, test dataset
repeatedly.  Before each iteration, any preexisting dataset is removed,
after unmounted with umount -f, if necessary.

Add a short delay after the forced unmount, avoiding a race that can
prevent zfs destroy from succeeding, leading to a test failure.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Antonio Russo <aerusso@aerusso.net>
Closes #11451
2021-01-12 17:20:02 -08:00
Ryan Moeller bea3fc71ae ztest: Use fnvlist_* instead of VERIFYing nvlist_*
Simplify ztest by using fnvlist functions to verify success.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Ahrens <matt@delphix.com>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11441
2021-01-11 09:30:33 -08:00
Matthew Ahrens 2ac90457f5 record ioctl elapsed time in zpool history
Each zfs ioctl that changes on-disk state (e.g. set property, create
snapshot, destroy filesystem) is recorded in the zpool history, and is
printed by `zpool history -i`.

For performance diagnostic purposes, it would be useful to know how long
each of these ioctls took to run.  This commit adds that functionality,
with a new `ZPOOL_HIST_ELAPSED_NS` member of the history nvlist.

Additionally, the time recorded in this history log is currently the
time that the history record is written to disk.  But in many cases (CLI
args logging and ioctl logging), this happens asynchronously,
potentially many seconds after the operation completed.  This commit
changes the timestamp to reflect when the history event was created,
rather than when it was written to disk.

Reviewed-by: Mark Maybee <mmaybee@cray.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #11440
2021-01-11 09:29:25 -08:00
Matthew Ahrens dc303dcf5b assertion failed in arc_wait_for_eviction()
If the system is very low on memory (specifically,
`arc_free_memory() < arc_sys_free/2`, i.e. less than 1/16th of RAM
free), `arc_evict_state_impl()` will defer wakups.  In this case, the
arc_evict_waiter_t's remain on the list, even though `arc_evict_count`
has been incremented past their `aew_count`.

The problem is that `arc_wait_for_eviction()` assumes that if there are
waiters on the list, the count they are waiting for has not yet been
reached.  However, the deferred wakeups may violate this, causing
`ASSERT(last->aew_count > arc_evict_count)` to fail.

This commit resolves the issue by having new waiters use the greater of
`arc_evict_count` and the last `aew_count`.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Wilson <gwilson@delphix.com>
Reviewed-by: George Amanakis <gamanakis@gmail.com>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #11285
Closes #11397
2021-01-07 20:06:32 -08:00
Matthew Macy f11b09dec3 FreeBSD: minor_t needs to be signed so that -1 is recognized as such
zfsdev_close sets zs_minor to -1 to avoid duplicate calls to
destroy. This doesn't mix well with the current u_int used.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #11437
2021-01-07 10:41:27 -08:00
Brian Behlendorf 06346cc5b5 Autoconf 2.70 compatibility
Several m4 macros have been retired in autoconf 2.70.  Update the
the build system to use the new macros provided to replace them.

* Replaced AC_HELP_STRING with AS_HELP_STRING.

* Replaced AC_TRY_COMPILE with AC_COMPILE_IFELSE/AC_LANG_PROGRAM.

* Replaced AC_CANONICAL_SYSTEM with AC_CANONICAL_TARGET

* Replaced AC_PROG_LIBTOOL with LT_INIT

* $CPP is not defined in ZFS_AC_KERNEL and really shouldn't be
  directly used like this.  Replace it with an $AWK command
  to extract the kernel source version.

Reviewed-by: Eli Schwartz <eschwartz@archlinux.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #11413 
Closes #11419
2021-01-02 16:55:55 -08:00
Toomas Soome 4ba8c6b584 zfs_mount_all_mountpoints: cleanup_all should leave pool root mounted
if pool root is not mounted, then zpool umount in next test will leave
dataset mountpoint directory around and next zfs mount -a will fail
with error: cannot mount '/testpool': directory is not empty

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Toomas Soome <tsoome@me.com>
Closes #11417
2021-01-02 16:54:53 -08:00
Konstantin Khorenko 064c2cf40e VZ 7 kernel compat: introduce ITER-enabled .direct_IO() via IOVECs
Virtuozzo 7 kernels starting 3.10.0-1127.18.2.vz7.163.46
have the following configuration:

  * no HAVE_VFS_RW_ITERATE
  * HAVE_VFS_DIRECT_IO_ITER_RW_OFFSET

=> let's add implementation of zpl_direct_IO() via
zpl_aio_{read,write}() in this case.

https://bugs.openvz.org/browse/OVZ-7243

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Konstantin Khorenko <khorenko@virtuozzo.com>
Closes #11410 
Closes #11411
2020-12-30 14:18:29 -08:00
Matthew Ahrens 808e681238 Memory leak in zdb:import_checkpointed_state()
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #11396
2020-12-28 10:05:55 -08:00
Matthew Ahrens f014700a37 Memory leak in ztest_dmu_objset_own()
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #11396
2020-12-28 10:05:44 -08:00
Matthew Ahrens a0316ad268 Memory leak in ztest_vdev_attach_detach()
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #11396
2020-12-28 10:05:32 -08:00
Matthew Ahrens b6722b871b nvlist leaked in zpool_find_config()
In `zpool_find_config()`, the `pools` nvlist is leaked.  Part of it (a
sub-nvlist) is returned in `*configp`, but the callers also leak that.

Additionally, in `zdb.c:main()`, the `searchdirs` is leaked.

The leaks were detected by ASAN (`configure --enable-asan`).

This commit resolves the leaks.

Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #11396
2020-12-28 10:05:31 -08:00
Toomas Soome 40ab927ae8 implicit conversion from 'boolean_t' to 'ds_hold_flags_t'
Build error on illumos with gcc 10 did reveal:

In function 'dmu_objset_refresh_ownership':
../../common/fs/zfs/dmu_objset.c:857:25: error: implicit conversion
from 'boolean_t' to 'ds_hold_flags_t' {aka 'enum ds_hold_flags'}
[-Werror=enum-conversion]
      857 |  dsl_dataset_disown(ds, decrypt, tag);
          |                         ^~~~~~~
cc1: all warnings being treated as errors

libzfs_input_check.c: In function 'zfs_ioc_input_tests':
libzfs_input_check.c:754:28: error: implicit conversion from
'enum dmu_objset_type' to 'enum lzc_dataset_type'
[-Werror=enum-conversion]
  754 |  err = lzc_create(dataset, DMU_OST_ZFS, NULL, NULL, 0);
      |                            ^~~~~~~~~~~
cc1: all warnings being treated as errors

The same issue is present in openzfs, and also the same issue about
ds_hold_flags_t, which currently defines exactly one valid value.

Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Toomas Soome <tsoome@me.com>
Closes #11406
2020-12-27 16:31:02 -08:00
Brian Behlendorf c449d4b06d Linux 5.11 compat: blk_{un}register_region()
As of 5.11 the blk_register_region() and blk_unregister_region()
functions have been retired. This isn't a problem since add_disk()
has implicitly allocated minor numbers for a very long time.

Reviewed-by: Rafael Kitover <rkitover@gmail.com>
Reviewed-by: Coleman Kane <ckane@colemankane.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11387
Closes #11390
2020-12-27 16:20:46 -08:00
Brian Behlendorf 19697e4545 Linux 5.11 compat: revalidate_disk_size()
Both revalidate_disk_size() and revalidate_disk() have been removed.
Functionally this isn't a problem because we only relied on these
functions to call zvol_revalidate_disk() for us and to perform any
additional handling which might be needed for that kernel version.
When neither are available we know there's no additional handling
needed and we can directly call zvol_revalidate_disk().

Reviewed-by: Rafael Kitover <rkitover@gmail.com>
Reviewed-by: Coleman Kane <ckane@colemankane.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11387
Closes #11390
2020-12-27 16:20:40 -08:00
Brian Behlendorf 72ba4b2a4c Linux 5.11 compat: bdev_whole()
The bd_contains member was removed from the block_device structure.
Callers needing to determine if a vdev is a whole block device should
use the new bdev_whole() wrapper.  For older kernels we provide our
own bdev_whole() wrapper which relies on bd_contains for compatibility.

Reviewed-by: Rafael Kitover <rkitover@gmail.com>
Reviewed-by: Coleman Kane <ckane@colemankane.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11387
Closes #11390
2020-12-27 16:20:33 -08:00
Brian Behlendorf a970f0594e Linux 5.11 compat: bio_start_io_acct() / bio_end_io_acct()
The generic IO accounting functions have been removed in favor of the
bio_start_io_acct() and bio_end_io_acct() functions which provide a
better interface.  These new functions were introduced in the 5.8
kernels but it wasn't until the 5.11 kernel that the previous generic
IO accounting interfaces were removed.

This commit updates the blk_generic_*_io_acct() wrappers to provide
and interface similar to the updated kernel interface.  It's slightly
different because for older kernels we need to pass the request queue
as well as the bio.

Reviewed-by: Rafael Kitover <rkitover@gmail.com>
Reviewed-by: Coleman Kane <ckane@colemankane.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11387
Closes #11390
2020-12-27 16:20:24 -08:00
Brian Behlendorf b7281c88bc Linux 5.11 compat: lookup_bdev()
The lookup_bdev() function has been updated to require a dev_t
be passed as the second argument. This is actually pretty nice
since the major number stored in the dev_t was the only part we
were interested in. This allows to us avoid handling the bdev
entirely.  The vdev_lookup_bdev() wrapper was updated to emulate
the behavior of the new lookup_bdev() for all supported kernels.

Reviewed-by: Rafael Kitover <rkitover@gmail.com>
Reviewed-by: Coleman Kane <ckane@colemankane.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11387
Closes #11390
2020-12-27 16:20:08 -08:00
Brian Behlendorf c347fac586 Linux 5.11 compat: conftest
Update the ZFS_LINUX_TEST_PROGRAM macro to always set the module
license.  As of the 5.11 kernel not setting a license has been
converted from a warning to an error.

Reviewed-by: Rafael Kitover <rkitover@gmail.com>
Reviewed-by: Coleman Kane <ckane@colemankane.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11387
Closes #11390
2020-12-27 16:15:19 -08:00
Ryan Moeller 8fbd31d761 dbufstat: Fix warnings with Python 3.8
Replace "is" with "==" and "is not" with "!=".

Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11394
2020-12-23 15:10:35 -08:00
Brian Behlendorf 4879c4d198 Linux 5.10 compat: META
Increase the Linux-Maximum version in the META file to 5.10.
All of the required compatibility patches have been merged.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11391
2020-12-23 08:55:02 -08:00
Brian Behlendorf 0c763f76b1 Remove unused check from dmu_tx_count_write()
Individual transactions may not be larger than DMU_MAX_ACCESS.
This is enforced by the assertions in dmu_tx_hold_write() and
dmu_tx_hold_write_by_dnode().  There's an additional check in
dmu_tx_count_write() however it has no effect and only sets a
local err variable.  We could enable this check, however since
it's already enforced by ASSERTs elsewhere I opted to remove it
instead.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3731 
Closes #11384
2020-12-21 20:17:13 -08:00
Christian Schwarz af9593903e zfs-kmods: install to /lib/modules instead of /usr/lib/modules
Before this patch, dracut wouldn't find zfs.ko for inclusion in
initramfs. This was caused by the packages installing in to
/lib/modules instead of /usr/lib/modules.  Correcting this allows
dracut to do the right thing, even without

    # /etc/dracut.conf
    add_drivers+=" zfs "

Notably, rpm/redhat/zfs-kmod.spec.in does not contain the definition of
the `prefix` macro that this commit removes in the generic kmod spec.
And https://rpmfusion.org/Packaging/KernelModules/Kmods2 does not
mention `prefix` at all.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Christian Schwarz <me@cschwarz.com>
Closes #11381
2020-12-21 20:14:32 -08:00
Kjeld Schouten-Lebbing 7bbe563a0e Forward questions to github discussions
Instead of creating issues with type "question"
Forward to the GitHub Discussion system.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl>
Closes #11383
2020-12-21 20:09:02 -08:00
Andy Fiddaman 39372fa25b Dangling reference from dmu_objset_upgrade
After porting the fix for https://github.com/openzfs/zfs/issues/5295
over to illumos, we started hitting an assertion failure when running
the testsuite:

	assertion failed: rc->rc_count == number, file: .../refcount.c

and the unexpected hold has this stack:

	dsl_dataset_long_hold+0x59 dmu_objset_upgrade+0x73
dmu_objset_id_quota_upgrade+0x15 dmu_objset_own+0x14f

The simplest reproducer for this in illumos is

    zpool create -f -O version=1 testpool c3t0d0; zpool destroy testpool

which is run as part of the zpool_create_tempname test, but I can't get
this to trigger on FreeBSD. This appears to be because of the call to
txg_wait_synced() in dmu_objset_upgrade_stop() (which was missing in
illumos), slows down dmu_objset_disown() enough to avoid the condition.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Andy Fiddaman <andy@omnios.org>
Closes #11368
2020-12-21 10:13:23 -08:00
Brian Behlendorf 6152014d1a Linux 4.18.0-257.el8 compat: blk_alloc_queue()
The CentOS stream 4.18.0-257 kernel appears to have backported
the Linux 5.9 change to make_request_fn and the associated API.
To maintain weak modules compatibility the original symbol was
retained and the new interface blk_alloc_queue_rh() was added.

Unfortunately, blk_alloc_queue() was replaced in the blkdev.h
header by blk_alloc_queue_bh() so there doesn't seem to be a way
to build new kmods against the old interfces.  Even though they
appear to still be available for weak module binding.

To accommodate this a configure check is added for the new _rh()
variant of the function and used if available.  If compatibility
code gets added to the kernel for the original blk_alloc_queue()
interface this should be fine.  OpenZFS will simply continue to
prefer the new interface and only fallback to blk_alloc_queue()
when blk_alloc_queue_rh() isn't available.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11374
2020-12-21 10:11:56 -08:00
Brian Behlendorf 8947fa4495 Fix maybe uninitialized variable warning
Commit 1c2358c12 restructured this code and introduced a warning
about the variable maybe not being initialized.  This cannot happen
with the updated code but we should initialize the variable anyway
to silence the warning.

    zpl_file.c: In function ‘zpl_iter_write’:
    zpl_file.c:324:9: warning: ‘count’ may be used uninitialized
        in this function [-Wmaybe-uninitialized]

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11373
2020-12-20 09:50:13 -08:00
Brian Behlendorf 9ac535e662 Remove iov_iter_advance() from iter_read
There's no need to call iov_iter_advance() in zpl_iter_read().
This was preserved from the previous code where it wasn't needed
but also didn't cause any problems.  Now that the iter functions
also handle pipes that's no longer the case.  When fully reading a
pipe buffer iov_iter_advance() may results in the pipe buf release
function being called which will not be registered resulting in
a NULL dereference.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11375 
Closes #11378
2020-12-20 09:49:29 -08:00
Christian Schwarz 49c482fde3 dsl_pool: extend comment on DSL Pool Configuration Lock
Based on a conversation with Matt on the OpenZFS Slack.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Christian Schwarz <me@cschwarz.com>
Closes #11370
2020-12-19 18:04:05 -08:00
Michael D Labriola 1c0bbd52c3 Linux 5.10 compat: also zvol_revalidate_disk()
Commit 59b68723 added a configure check for 5.10, which removed
revalidate_disk(), and conditionally replaced it's usage with a call to
the new revalidate_disk_size() function.  However, the old function also
invoked the device's registered callback, in our case
zvol_revalidate_disk().  This commit adds a call to zvol_revalidate_disk()
in zvol_update_volsize() to make sure the code path stays the same.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Michael D Labriola <michael.d.labriola@gmail.com>
Closes #11358
2020-12-18 09:36:19 -08:00
Prawn 77f09e2932 ZED/zfs-list-cacher.sh: don't exit on ignored event type
Check for the history_event type instead.

The zfs-list-cacher.sh script currently respects the event types
excluded from syslog(!) in ZED_SYSLOG_SUBCLASS_EXCLUDE.
This makes little sense in this single-purpose script and
silently breaks when history_events are excluded from syslog,
which is the default since 13d65987a9.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: InsanePrawn <insane.prawny@gmail.com>
Closes #11164
Closes #11347
2020-12-18 09:34:10 -08:00
Brian Behlendorf 1c2358c12a Linux 5.10 compat: use iov_iter in uio structure
As of the 5.10 kernel the generic splice compatibility code has been
removed.  All filesystems are now responsible for registering a
->splice_read and ->splice_write callback to support this operation.

The good news is the VFS provided generic_file_splice_read() and
iter_file_splice_write() callbacks can be used provided the ->iter_read
and ->iter_write callback support pipes.  However, this is currently
not the case and only iovecs and bvecs (not pipes) are ever attached
to the uio structure.

This commit changes that by allowing full iov_iter structures to be
attached to uios.  Ever since the 4.9 kernel the iov_iter structure
has supported iovecs, kvecs, bvevs, and pipes so it's desirable to
pass the entire thing when possible.  In conjunction with this the
uio helper functions (i.e uiomove(), uiocopy(), etc) have been
updated to understand the new UIO_ITER type.

Note that using the kernel provided uio_iter interfaces allowed the
existing Linux specific uio handling code to be simplified.  When
there's no longer a need to support kernel's older than 4.9, then
it will be possible to remove the iovec and bvec members from the
uio structure and always use a uio_iter.  Until then we need to
maintain all of the existing types for older kernels.

Some additional refactoring and cleanup was included in this change:

- Added checks to configure to detect available iov_iter interfaces.
  Some are available all the way back to the 3.10 kernel and are used
  when available.  In particular, uio_prefaultpages() now always uses
  iov_iter_fault_in_readable() which is available for all supported
  kernels.

- The unused UIO_USERISPACE type has been removed.  It is no longer
  needed now that the uio_seg enum is platform specific.

- Moved zfs_uio.c from the zcommon.ko module to the Linux specific
  platform code for the zfs.ko module.  This gets it out of libzfs
  where it was never needed and keeps this Linux specific code out
  of the common sources.

- Removed unnecessary O_APPEND handling from zfs_iter_write(), this
  is redundant and O_APPEND is already handled in zfs_write();

Reviewed-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11351
2020-12-18 08:48:26 -08:00
Brian Behlendorf 2844ad60d4 ZTS: Simplify zpool_initialize_verify_initialized
Consider the test to be a success as long as the initializing pattern
is found at least once per metaslab.  This indicates that at least
part of the free space was initialized.  Ideally we'd check that the
pattern was written to all free space but that's much trickier so this
check is a reasonable compromise.

Using a here-string to feed the loop in this test causes an empty
string to still trigger the loop so we miss the `spacemaps=0` case.
Pipe into the loop instead.

While here, we can use `zpool wait -t initialize $TESTPOOL` to wait for
the pool to initialize.

Co-authored-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11365
2020-12-18 08:42:59 -08:00
Matthew Ahrens 71e4ce0e52 special device removal space accounting fixes
The space in special devices is not included in spa_dspace (or
dsl_pool_adjustedsize(), or the zfs `available` property).  Therefore
there is always at least as much free space in the normal class, as
there is allocated in the special class(es).  And therefore, there is
always enough free space to remove a special device.

However, the checks for free space when removing special devices did not
take this into account.  This commit corrects that.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Don Brady <don.brady@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #11329
2020-12-17 12:11:56 -08:00
Ryan Moeller 1531506d23 Avoid extra work updating ARC kstats and tunables
After e357046 it should not be necessary to periodically update ARC
kstats and tunables.  Tunable updates are applied when modified, and
kstats are updated on demand.

Update kstats in `arc_evict_cb_check()` for `ZFS_DEBUG` builds only.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11237
2020-12-17 11:16:42 -08:00
sterlingjensen fb188409f1 Use the correct return type for getopt
Use the correct return type for getopt otherwise clang complains
about tautological-constant-out-of-range-compare.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Sterling Jensen <sterlingjensen@users.noreply.github.com>
Closes #11359
2020-12-17 10:19:30 -08:00
Matthew Ahrens be5c6d9653 Only examine best metaslabs on each vdev
On a system with very high fragmentation, we may need to do lots of gang
allocations (e.g. most indirect block allocations (~50KB) may need to
gang). Before failing a "normal" allocation and resorting to ganging, we
try every metaslab.  This has the impact of loading every metaslab (not
a huge deal since we now typically keep all metaslabs loaded), and also
iterating over every metaslab for every failing allocation. If there are
many metaslabs (more than the typical ~200, e.g. due to vdev expansion
or very large vdevs), the CPU cost of this iteration can be very
impactful.  This iteration is done with the mg_lock held, creating long
hold times and high lock contention for concurrent allocations,
ultimately causing long txg sync times and poor application performance.

To address this, this commit changes the behavior of "normal" (not
try_hard, not ZIL) allocations.  These will now only examine the 100
best metaslabs (as determined by their ms_weight).  If none of these
have a large enough free segment, then the allocation will fail and
we'll fall back on ganging.

To accomplish this, we will now (normally) gang before doing a
`try_hard` allocation.  Non-try_hard allocations will only examine the
100 best metaslabs of each vdev.  In summary, we will first try normal
allocation.  If that fails then we will do a gang allocation.  If that
fails then we will do a "try hard" gang allocation.  If that fails then
we will have a multi-layer gang block.

Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #11327
2020-12-16 14:40:05 -08:00
Alexander Motin f8020c9363 Make metaslab class rotor and aliquot per-allocator.
Metaslab rotor and aliquot are used to distribute workload between
vdevs while keeping some locality for logically adjacent blocks.  Once
multiple allocators were introduced to separate allocation of different
objects it does not make much sense for different allocators to write
into different metaslabs of the same metaslab group (vdev) same time,
competing for its resources.  This change makes each allocator choose
metaslab group independently, colliding with others only sporadically.

Test including simultaneous write into 4 files with recordsize of 4KB
on a striped pool of 30 disks on a system with 40 logical cores show
reduction of vdev queue lock contention from 54 to 27% due to better
load distribution.  Unfortunately it won't help much ZVOLs yet since
only one dataset/ZVOL is synced at a time, and so for the most part
only one allocator is used, but it may improve later.

While there, to reduce the number of pointer dereferences change
per-allocator storage for metaslab classes and groups from several
separate malloc()'s to variable length arrays at the ends of the
original class and group structures.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Closes #11288
2020-12-15 10:55:44 -08:00
gregory-lee-bartholomew e2d952cda0 DKMS: Disable weak modules
Fedora does not guarantee a stable kABI, so weak modules should be dis-
abled. See the dkms man page for a more detailed explanation of the weak
module feature.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Gregory Bartholomew <gregory.lee.bartholomew@gmail.com>
Closes #9891
Closes #11128
Closes #11242
Closes #11335
2020-12-15 09:22:30 -08:00
Ryan Libby d8a09b3a04 lua: avoid gcc -Wreturn-local-addr bug
Avoid a bug with gcc's -Wreturn-local-addr warning with some
obfuscation.  In buggy versions of gcc, if a return value is an
expression that involves the address of a local variable, and even if
that address is legally converted to a non-pointer type, a warning may
be emitted and the value of the address may be replaced with zero.
Howerver, buggy versions don't emit the warning or replace the value
when simply returning a local variable of non-pointer type.

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90737

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Libby <rlibby@FreeBSD.org>
Closes #11337
2020-12-15 09:20:48 -08:00
Ryan Libby 956f94010f spa: avoid type narrowing warning
Building the spa module for i386 caused gcc to emit
-Wint-to-pointer-cast "cast to pointer from integer of different size"
because spa.spa_did was uint64_t but pthread_join (via thread_join in
spa_deactivate) takes a pointer (32-bit on i386).  Define spa_did to be
pointer-size instead.  For now spa_did is in fact never non-zero and the
thread_join could instead be ifdef'd out, but changing the size of
spa_did may be more useful for the future.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Libby <rlibby@FreeBSD.org>
Closes #11336
2020-12-15 09:20:06 -08:00
Ryan Libby c7500ded3e FreeBSD libzfs: gcc requires __thread after static
Building libzfs with gcc on FreeBSD failed because gcc is picky about
the order of keywords in declarations with __thread, whereas clang is
more relaxed.

https://gcc.gnu.org/onlinedocs/gcc/Thread-Local.html

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ryan Libby <rlibby@FreeBSD.org>
Closes #11331
2020-12-14 09:28:24 -08:00
Matthew Macy 923d730329 dmu_zfetch: fix memory leak
The last change caused the read completion callback to not be called
if the IO was still in progress. This change restores allocation
of the arc buf callback, but in the callback path checks the new
acb_nobuf field to know to skip buffer allocation.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #11324
2020-12-12 16:00:00 -08:00
George Amanakis c76a40bfda Fix reporting of CKSUM errors in indirect vdevs
When removing and subsequently reattaching a vdev, CKSUM errors may
occur as vdev_indirect_read_all() reads from all children of a mirror
in case of a resilver.

Fix this by checking whether a child is missing the data and setting a
flag (ic_error) which is then checked in vdev_indirect_repair() and
suppresses incrementing the checksum counter.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #11277
2020-12-11 12:15:37 -08:00
Brian Behlendorf 1ad07b01bc Remove draid.d symlink from zfs_helpers.sh
In an earlier revision of dRAID there existed an /etc/zfs/draid.d
directory.  This was removed before the final version was integrated
but a little bit was accidentally overlooked in the zfs_helpers.sh
script.  Remove this remnant. 

Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11326
2020-12-11 11:00:58 -08:00
Ryan Moeller 695ac5850b arc_summary3: Handle overflowing value width
Some tunables shown by arc_summary3 have string values that may exceed
the normal line length, leaving a negative offset between the name and
value fields.  The negative space is of course not valid and Python
rightly barfs up an exception traceback.

Handle an overflowing value field width by ignoring the line length
and separating the name from the value by a single space instead.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11270
2020-12-11 10:29:53 -08:00
Ryan Moeller 439dc034e9 FreeBSD: Implement sysctl for fletcher4 impl
There is a tunable to select the fletcher 4 checksum implementation on
Linux but it was not present in FreeBSD.

Implement the sysctl handler for FreeBSD and use ZFS_MODULE_PARAM_CALL
to provide the tunable on both platforms.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11270
2020-12-11 10:29:01 -08:00
Matthew Ahrens ba67d82142 Improve zfs receive performance with lightweight write
The performance of `zfs receive` can be bottlenecked on the CPU consumed
by the `receive_writer` thread, especially when receiving streams with
small compressed block sizes.  Much of the CPU is spent creating and
destroying dbuf's and arc buf's, one for each `WRITE` record in the send
stream.

This commit introduces the concept of "lightweight writes", which allows
`zfs receive` to write to the DMU by providing an ABD, and instantiating
only a new type of `dbuf_dirty_record_t`.  The dbuf and arc buf for this
"dirty leaf block" are not instantiated.

Because there is no dbuf with the dirty data, this mechanism doesn't
support reading from "lightweight-dirty" blocks (they would see the
on-disk state rather than the dirty data).  Since the dedup-receive code
has been removed, `zfs receive` is write-only, so this works fine.

Because there are no arc bufs for the received data, the received data
is no longer cached in the ARC.

Testing a receive of a stream with average compressed block size of 4KB,
this commit improves performance by 50%, while also reducing CPU usage
by 50% of a CPU.  On a per-block basis, CPU consumed by receive_writer()
and dbuf_evict() is now 1/7th (14%) of what it was.

Baseline: 450MB/s, CPU in receive_writer() 40% + dbuf_evict() 35%
New: 670MB/s, CPU in receive_writer() 17% + dbuf_evict() 0%

The code is also restructured in a few ways:

Added a `dr_dnode` field to the dbuf_dirty_record_t.  This simplifies
some existing code that no longer needs `DB_DNODE_ENTER()` and related
routines.  The new field is needed by the lightweight-type dirty record.

To ensure that the `dr_dnode` field remains valid until the dirty record
is freed, we have to ensure that the `dnode_move()` doesn't relocate the
dnode_t.  To do this we keep a hold on the dnode until it's zio's have
completed.  This is already done by the user-accounting code
(`userquota_updates_task()`), this commit extends that so that it always
keeps the dnode hold until zio completion (see `dnode_rele_task()`).

`dn_dirty_txg` was previously zeroed when the dnode was synced.  This
was not necessary, since its meaning can be "when was this dnode last
dirtied".  This change simplifies the new `dnode_rele_task()` code.

Removed some dead code related to `DRR_WRITE_BYREF` (dedup receive).

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: George Wilson <gwilson@delphix.com>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #11105
2020-12-11 10:26:02 -08:00
Paul Dagnelie 7d4b365ce3 Fix kernel panic induced by redacted send
In the redaction list traversal code, there is a bug in the binary search
logic when looking for the resume point. Maxbufid can be decremented to -1,
causing us to read the last possible block of the object instead of the one we
wanted. This can cause incorrect resume behavior, or possibly even a hang in
some cases. In addition, when examining non-last blocks, we can treat the
block as being the same size as the last block, causing us to miss entries in
the redaction list when determining where to resume. Finally, we were ignoring
the case where the resume point was found in the buffer being searched, and
resuming from minbufid. All these issues have been corrected, and the code has
been significantly simplified to make future issues less likely.

Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #11297
2020-12-11 10:22:29 -08:00
Ryan Moeller 8c5606ca0b FreeBSD: Fix format of vfs.zfs.arc_no_grow_shift
vfs.zfs.arc_no_grow_shift has an invalid type (15) and this causes
py-sysctl to format it as a bytearray when it should be an integer.

"U" is not a valid format, it should be "I" and the type should match
the variable type, int.  We can return EINVAL if the value is set below
zero.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11318
2020-12-10 15:28:56 -08:00
Ryan Moeller 513c196200 FreeBSD: Update usage of py-sysctl
py-sysctl now includes the CTLTYPE_NODE type nodes in the list returned
by sysctl.filter() on FreeBSD head.  It also provides descriptions now.

Eliminate the subprocess call to get descriptions, and filter out the
nodes so we only deal with values.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11318
2020-12-10 15:28:31 -08:00
Brian Behlendorf e5f732edbb Fix possibly uninitialized 'root_inode' variable warning
Resolve an uninitialized variable warning when compiling.

    In function ‘zfs_domount’:
    warning: ‘root_inode’ may be used uninitialized in this
        function [-Wmaybe-uninitialized]
    sb->s_root = d_make_root(root_inode);

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11306
2020-12-10 15:23:26 -08:00
Paul Dagnelie 60a4c7d2a2 Implement memory and CPU hotplug
ZFS currently doesn't react to hotplugging cpu or memory into the 
system in any way. This patch changes that by adding logic to the ARC 
that allows the system to take advantage of new memory that is added 
for caching purposes. It also adds logic to the taskq infrastructure 
to support dynamically expanding the number of threads allocated to a 
taskq.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Matthew Ahrens <matthew.ahrens@delphix.com>
Co-authored-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #11212
2020-12-10 14:09:23 -08:00
Brian Behlendorf f483daa870 CI: add zloop workflow
Run ztest via zloop for 20 minutes, total run time is ~30 minutes.

Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #11319
2020-12-10 10:55:53 -08:00
George Melikov 5053dfa08c CI: add zloop workflow
Run ztest via zloop for 20 minutes, total run time is ~30 minutes.

Signed-off-by: George Melikov <mail@gmelikov.ru>
2020-12-10 20:46:15 +03:00
Ryan Moeller e0716250bf FreeBSD: Do zcommon_init sooner to avoid FPU panic
There has been a panic affecting some system configurations where the
thread FPU context is disturbed during the fletcher 4 benchmarks,
leading to a panic at boot.

module_init() registers zcommon_init to run in the last subsystem
(SI_SUB_LAST).  Running it as soon as interrupts have been configured
(SI_SUB_INT_CONFIG_HOOKS) makes sure we have finished the benchmarks
before we start doing other things.

While it's not clear *how* the FPU context was being disturbed, this
does seem to avoid it.

Add a module_init_early() macro to run zcommon_init() at this earlier
point on FreeBSD.  On Linux this is defined as module_init().

Authored by: Konstantin Belousov <kib@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11302
2020-12-09 21:29:00 -08:00
Attila Fülöp b9916b4064 ZTS: three small follow up fixes for #11167
Follow up fix for 0cb40fa3. Remove unused variables, don't source
unused libs and add missed cleanup.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Attila Fülöp <attila@fueloep.org>
Closes #11311
2020-12-09 21:27:12 -08:00
Érico Nogueira Rolim 957f9681eb mount_zfs: print strerror instead of errno for error reporting
Tracking down an error message with the errno value can be difficult,
using strerror makes the error message clearer.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Érico Rolim <erico.erc@gmail.com>
Closes #11303
2020-12-09 21:24:59 -08:00
sterlingjensen 1e4667af32 Drop path prefix workaround
Canonicalization, the source of the trouble, was disabled in 9000a9f.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Sterling Jensen <sterlingjensen@users.noreply.github.com>
Closes #11295
2020-12-09 21:24:26 -08:00
Orivej Desh ab4fb9b74e Delete rw_semaphore.wait_lock configure check
Last use of wait_lock was removed in "Linux 5.3 compat: retire
rw_tryupgrade()" (e7a99dab2b).

Fixes the issue reported in
https://github.com/openzfs/zfs/issues/11097#issuecomment-714532367

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Orivej Desh <orivej@gmx.fr>
Closes #11309
2020-12-09 21:22:54 -08:00
Matthew Macy 1e4732cbda Decouple arc_read_done callback from arc buf instantiation
Add ARC_FLAG_NO_BUF to indicate that a buffer need not be
instantiated.  This fixes a ~20% performance regression on
cached reads due to zfetch changes.

Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #11220 
Closes #11232
2020-12-09 15:05:06 -08:00
Brian Behlendorf edb20ff3ba Fix optional "force" arg handing in zfs_ioc_pool_sync()
The fnvlist_lookup_boolean_value() function should not be used
to check the force argument since it's optional.  It may not be
provided or may have been created with the wrong flags.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11281
Closes #11284
2020-12-09 14:52:45 -08:00
George Melikov 1a735e763a CI: add new zfs-tests-sanity workflow
Run zfs-tests with sanity.run for brief results.  Timeouts
are rare, so minimize false positives by increasing the
default from 60 to 180 seconds.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #11304
2020-12-08 09:53:45 -08:00
George Melikov 8e8fdce682 ZTS: zpool_trim tests throttle trim process
Otherwise trim may finish before progress checks.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #11296
2020-12-07 10:06:10 -08:00
Brian Behlendorf 83b698dc42 Reduce fletcher4 and raidz benchmark times
During module load time all of the available fetcher4 and raidz
implementations are benchmarked for a fixed amount of time to
determine the fastest available.  Manual testing has shown that this
time can be significantly reduced with negligible effect on the final
results.

This commit changes the benchmark time to 1ms which can reduce the
module load time by over a second on x86_64.  On an x86_64 system
with sse3, ssse3, and avx2 instructions the benchmark times are:

    Fletcher4    603ms   -> 15ms
    RAIDZ        1,322ms -> 64ms

Reviewed-by: Matthew Macy <mmacy@freebsd.org>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11282
2020-12-06 09:57:20 -08:00
Alexander Motin 8136b9d73b Avoid some spa_has_pending_synctask() calls.
Since 8c4fb36a24 (PR #7795) spa_has_pending_synctask() started to
take two more locks per write inside txg_all_lists_empty().  I am
surprised those pool-wide locks are not contended, but still their
operations are visible in CPU profiles under contended vdev lock.

This commit slightly changes vdev_queue_max_async_writes() flow to
not call the function if we are going to return max_active any way
due to high amount of dirty data.  It allows to save some CPU time
exactly when the pool is busy.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-By: Tom Caputi <caputit1@tcnj.edu>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Closes #11280
2020-12-06 09:55:02 -08:00
Alexander Motin 6366ef2240 Bring consistency to ABD chunk count types.
With both abd_size and abd_nents being uint_t it makes no sense for
abd_chunkcnt_for_bytes() to return size_t.  Random mix of different
types used to count chunks looks bad and makes compiler more difficult
to optimize the code.

In particular on FreeBSD this change allows compiler to completely
optimize out abd_verify_scatter() when built without debug, removing
pointless 64-bit division and even more pointless empty loop.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Closes #11279
2020-12-06 09:53:40 -08:00
Brian Behlendorf eed2bfe06a Enable ABI checks for the checkstyle workflow
Extend the CI checkstyle workflow to perform the library ABI
checks in the master branch.  The intent is not to prevent any
ABI changes but to detect them immediately so when they're
made it's done intentionally.

When the changing the ABI the `make storeabi` target can be
used to generate a new .abi file which can be included with
the commit.  This depends on the libabigail utility which is
available from the majority of distribution package managers.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11287
2020-12-06 09:50:47 -08:00
Brian Behlendorf 0484e8722f ZTS: adjust zpool_import_012_pos timeout
When running in the CI the zpool_import_012_pos test case occasionally
takes longer than the maximum 600 seconds.  When this happens the test
case is considered to have failed but always completes a few minutes
latter.  Since the logs suggest nothing has actually failed this commit
increases timeout and removes the exception.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11286
2020-12-06 09:48:36 -08:00
Brian Behlendorf 81638c999d ZTS: Update zfs_share_concurrent_shares.ksh
Occasionally an out of memory error is hit by this test case
when mounting the filesystems.  Try and reduce the likelihood
of this occurring by reducing the thread count from 100 to 50.
It also has the advantage of slightly speeding up the test.

    cannot mount 'testpool/testfs3/79': Cannot allocate memory
        filesystem successfully created, but not mounted

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11283
2020-12-06 09:47:33 -08:00
George Amanakis d1d47691c2 Fix raw sends on encrypted datasets when copying back snapshots
When sending raw encrypted datasets the user space accounting is present
when it's not expected to be. This leads to the subsequent mount failure
due a checksum error when verifying the local mac.
Fix this by clearing the OBJSET_FLAG_USERACCOUNTING_COMPLETE and reset
the local mac. This allows the user accounting to be correctly updated
on first mount using the normal upgrade process.

Reviewed-By: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-By: Tom Caputi <caputit1@tcnj.edu>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #10523 
Closes #11221
2020-12-04 14:34:29 -08:00
Attila Fülöp 0cb40fa389 zpool: Dryrun fails to list some devices
`zpool create -n` fails to list cache and spare vdevs.
`zpool add -n` fails to list spare devices.
`zpool split -n` fails to list `special` and `dedup` labels.
`zpool add -n` and `zpool split -n` shouldn't list hole devices.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Attila Fülöp <attila@fueloep.org>
Closes #11122
Closes #11167
2020-12-04 14:04:39 -08:00
Ryan Moeller 4b6e2a5a33 Add -u option to 'zfs create'
Add -u option to 'zfs create' that prevents file system from being
automatically mounted. This is similar to the 'zfs receive -u'.

Authored by: pjd <pjd@FreeBSD.org>
FreeBSD-commit: freebsd/freebsd@35c58230e2

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Ported-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11254
2020-12-04 14:01:42 -08:00
Brian Behlendorf 8f158ae6ad Add sanity.run file
This run file contains a subset of functional tests which exercise
as much functionality as possible while still executing relatively
quickly.  The included tests should take no more than a few seconds
each to run at most.  This provides a convenient way to sanity test a
change before committing to a full test run which takes several hours.

    $ ./scripts/zfs-tests.sh -r sanity
    ...
    Results Summary
    PASS	 813

    Running Time:	00:14:42
    Percent passed:	100.0%

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11271
2020-12-03 10:49:39 -08:00
melak 766e06695f Fix trivial typo in zfs-diff.8
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tamas TEVESZ <ice@extreme.hu>
Closes #11268 
Closes #11272
2020-12-03 10:18:26 -08:00
Alexander Motin dcf7044522 Fix for "Reduce latency effects of non-interactive I/O"
It was found that setting min_active tunables for non-interactive I/Os
makes them stuck.  It is caused by zfs_vdev_nia_delay, that can never
be reached if we never issue any I/Os due to min_active set to zero.

Fix this by issuing at least one non-interactive I/O at a time when
there are no interactive I/Os.  When there are interactive I/Os, zero
min_active allows to completely block any non-interactive I/O.  It may
min_active starvation in some scenarios, but who we are to deny foot
shooting?

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Closes #11261
2020-12-03 10:02:39 -08:00
qzdanis 9109b89cd7 Add compatibility for busybox mktemp
Busybox's mktemp requires at least six X's in the template, causing
the current sed --in-place check to fail because the file does not
exist. This change adds additional X's to mktemp templates that do
not already have at least six X's in them.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Quentin Zdanis <zdanisq@gmail.com>
Closes #11269
2020-12-03 10:01:16 -08:00
Ryan Moeller 0aacde2e9a FreeBSD: notify userspace when a vdev is removed
This is needed for zfsd to autoreplace vdevs.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11260
2020-12-02 10:20:02 -08:00
Finix1979 ec50cd24ba Avoid unneccessary zio allocation and wait
In function dmu_buf_hold_array_by_dnode, the usage of zio is only for 
the reading operation. Only create the zio and wait it in the reading 
scenario as a performance optimization.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Finix Yan <yancw@info2soft.com>
Closes #11251 
Closes #11256
2020-12-02 09:28:55 -08:00
Andrew Sun 95a78a035a Make zpool status "remove:" label print in bold
When ZFS_COLOR is set, zpool status shows row headings in bold,
except for the "remove:" heading. This is a quick fix that makes
it print in bold too.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Andrew Sun <me@andrewsun.com>
Closes #11255
2020-12-01 15:22:51 -08:00
George Melikov aa2778d100 CI: simplify checkstyle runner
Remove excess steps.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #11262
2020-12-01 12:15:55 -08:00
Pavel Snajdr 52c8537513 zpool_influxdb: move to libexec dir
Move the zpool_influxdb command to /usr/libexec/zfs,
and include the /usr/libexec/zfs path in the system search
directory when running the test suite.

Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Pavel Snajdr <snajpa@snajpa.net>
Closes #11156
Closes #11160
Closes #11224
2020-11-28 11:15:57 -08:00
Brian Behlendorf b2a54a28b5 Verify zfs module loaded before starting services
Extend the change made in ae12b02 to verify the zfs kernel
modules are loaded to the rest of the OpenZFS services.  If
the modules aren't loaded the neither the share, volume, or
and zed services can be started.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11243
2020-11-28 11:11:18 -08:00
Đoàn Trần Công Danh 16692e6ba0 dracut: use /bin/sh instead of bash as the intepreter
Despite that dracut has a hard dependency on bash,
its modules doesn't, dracut only has a hard dependency on bash for
module-setup (on a fully usable machine). Inside initramfs, dracut
allows users choose from a list of handful other shells, e.g. bash,
busybox, dash, mkfsh.

In fact, my local machine's initramfs is being built with dash,
and it's functional for a very long time.

Before 64025fa3a (Silence 'make checkbashisms', 2020-08-20), we also
allows our users to have that right, too.

Let's fix the problem 'make checkbashisms' reported and allows our users
to have that right, again.

For 'plymouth' case, let's simply run the command inside the if instead
of checking for the existence of command before running it, because the
status is also failture if plymouth is unavailable.

While we're at it, let's remove an unnecessary fork for grep in
zfs-generator.sh.in and its following complicated 'if elif fi' with
a simple 'case ... esac'.

To support this change, also exclude 90zfs from "make checkbashisms"
because the current CI infrastructure ships an old version of
"checkbashisms", which complains about "command -v", while the current
latest "checkbashisms" thinks it's fine. In the near future, we can
revert that change to "Makefile.am" when CI infrastructure is updated.

Reviewed-by: Gabriel A. Devenyi <gdevenyi@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Đoàn Trần Công Danh <congdanhqx@gmail.com>
Closes #11244
2020-11-28 11:02:08 -08:00
Brian Behlendorf 04a82e043d Remove incorrect assertion
Commit 85703f6 added a new ASSERT to zfs_write() as part of the
cleanup which isn't correct in the case where multiple processes
are concurrently extending a file.  The `zp->z_size` is updated
atomically while holding a range lock on only a portion of the
file.  Therefore, it's possible for the file size to increase
after a same check is performed earlier in the loop causing this
ASSERT to fail.  The code itself handles this case correctly so
only the invalid ASSERT needs to be removed.

Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11235
2020-11-24 09:28:42 -08:00
Alexander Motin 6f5aac3ca0 Reduce latency effects of non-interactive I/O
Investigating influence of scrub (especially sequential) on random read
latency I've noticed that on some HDDs single 4KB read may take up to 4
seconds!  Deeper investigation shown that many HDDs heavily prioritize
sequential reads even when those are submitted with queue depth of 1.

This patch addresses the latency from two sides:
 - by using _min_active queue depths for non-interactive requests while
   the interactive request(s) are active and few requests after;
 - by throttling it further if no interactive requests has completed
   while configured amount of non-interactive did.

While there, I've also modified vdev_queue_class_to_issue() to give
more chances to schedule at least _min_active requests to the lowest
priorities.  It should reduce starvation if several non-interactive
processes are running same time with some interactive and I think should
make possible setting of zfs_vdev_max_active to as low as 1.

I've benchmarked this change with 4KB random reads from ZVOL with 16KB
block size on newly written non-fragmented pool.  On fragmented pool I
also saw improvements, but not so dramatic.  Below are log2 histograms
of the random read latency in milliseconds for different devices:

4 2x mirror vdevs of SATA HDD WDC WD20EFRX-68EUZN0 before:
0, 0, 2,  1,  12,  21,  19,  18, 10, 15, 17, 21
after:
0, 0, 0, 24, 101, 195, 419, 250, 47,  4,  0,  0
, that means maximum latency reduction from 2s to 500ms.

4 2x mirror vdevs of SATA HDD WDC WD80EFZX-68UW8N0 before:
0, 0,  2,  31,  38,  28,  18,  12, 17, 20, 24, 10, 3
after:
0, 0, 55, 247, 455, 470, 412, 181, 36,  0,  0,  0, 0
, i.e. from 4s to 250ms.

1 SAS HDD SEAGATE ST14000NM0048 before:
0,  0,  29,   70, 107,   45,  27, 1, 0, 0, 1, 4, 19
after:
1, 29, 681, 1261, 676, 1633,  67, 1, 0, 0, 0, 0,  0
, i.e. from 4s to 125ms.

1 SAS SSD SEAGATE XS3840TE70014 before (microseconds):
0, 0, 0, 0, 0, 0, 0, 0,  70, 18343, 82548, 618
after:
0, 0, 0, 0, 0, 0, 0, 0, 283, 92351, 34844,  90

I've also measured scrub time during the test and on idle pools.  On
idle fragmented pool I've measured scrub getting few percent faster
due to use of QD3 instead of QD2 before.  On idle non-fragmented pool
I've measured no difference.  On busy non-fragmented pool I've measured
scrub time increase about 1.5-1.7x, while IOPS increase reached 5-9x.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #11166
2020-11-24 09:26:42 -08:00
Brian Behlendorf f67bebbc34 Obsolete earlier packages due to version bump
In order for package managers such as dnf to upgrade cleanly after
the package SONAME bump the obsolete package names must be known.
Update the new packages to correctly obsolete the old ones.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11230 
Closes #11233
2020-11-24 09:24:24 -08:00
Matthew Macy cd44f5be37 FreeBSD: decouple ZFS_DEBUG from kernel debug settings
Reviewed-by: Martelli Nikola @martellini
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #11213
2020-11-24 09:16:46 -08:00
Brian Behlendorf 0657326f9c Update dRAID short feature description
The documentation describes dRAID as a distributed spare, not
parity, RAID implementation.  Update the short feature description
to match the rest of the documentation.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11229
2020-11-23 14:49:17 -08:00
Antonio Russo d45267183f libzfsbootenv: do not depend on libnvpair
We do not build libnvpair.pc.  Moreover, it is automatically pulled in
by libzfs.pc, so no additional specific dependency is required.

Reviewed by: Toomas Soome <tsoome@me.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Antonio Russo <aerusso@aerusso.net>
Closes #11227
2020-11-22 15:16:42 -08:00
cragw dc6d39a85e pam_zfs_key: accommodate different dataset naming scheme
Name of dataset for user home directory may vary from the expected
$homes_prefix/$username, if different naming scheme is being used.

We can use property mountpoint to specify the dataset for $username
as long as its value is identical to passwd's pw_dir.

For example:
    NAME                       PROPERTY     VALUE
    rpool/home/myuser_123456   mountpoint   /home/myuser

Reviewed-by: Felix Dörre <felix@dogcraft.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Crag Wang <crag0715@gmail.com>
Closes #11165
2020-11-22 09:32:34 -08:00
Brian Behlendorf f1ece319fd Include the ABI with dist tarball
The ABI should be included when generating the `make dist` tarball
since it's required by the `make checkabi` target.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11225
2020-11-21 10:44:52 -08:00
Brian Behlendorf 4d0ba94113 Correct missing zil_claim() DTL updates
Commit a1d477c2 accidentally disabled DTL updates for the zil_claim()
case described at the end of vdev_stat_update() by unconditionally
disabling all DTL updates when loading.  This was done to avoid
a deadlock on the vd_dtl_lock when loading the DTLs from disk.

    vdev_dtl_contains <--- Takes vd->vd_dtl_lock
    vdev_mirror_child_missing
    vdev_mirror_io_start
    zio_vdev_io_start
    __zio_execute
    arc_read
    dbuf_issue_final_prefetch
    dbuf_prefetch_impl
    dbuf_prefetch
    dmu_prefetch
    space_map_iterate
    space_map_load_length
    space_map_load
    vdev_dtl_load <--- Takes vd->vd_dtl_lock
    vdev_load
    spa_ld_load_vdev_metadata
    spa_tryimport

The missing DTL updates can be restored by moving the space_map_load()
call outside the vd_dtl_lock.  A private range tree is populated by
reading the space map and then merged in to the DTL_MISSING tree
under the lock.

Furthermore, the SPA_LOAD_NONE check in vdev_dtl_contains() leads to an
additional problem.  Any resilvering which occurs before SPA_LOAD_NONE
is set will incorrectly determine that there's nothing to repair.  This
can result in full redundancy not being restored for some blocks.

Reviewed-by: Matt Ahrens <matt@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11218
2020-11-20 13:14:45 -08:00
Antonio Russo 8434017a27 Track SONAME version bump in packaging
RPM and DEB packages are named after the SONAME version of the library
they contain.  After bumping this version, the packaging should be
renamed.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Antonio Russo <aerusso@aerusso.net>
Closes #11219
2020-11-19 16:25:24 -08:00
наб 567e4d0dfa dracut/mount-zfs.sh: quote expansion on zpool test
Bring over some of the improvements from dracut/zfs-load-key.sh,
shellcheck is slightly quieter as well

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11198
2020-11-19 16:20:57 -08:00
наб 2d9f82d891 dracut/zfs-load-key.sh: simplify import loop, quote variable assignments
The loop now has a less confusing condition and properly uses
systemctl(1) is-failed's return code instead of that entire mess

The assignments could turn into "var=val program" if encryptionroot
or keylocation had whitespace in them

As a bonus, this (mostly) silences shellcheck

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11198
2020-11-19 16:20:42 -08:00
Ryan Moeller 85703f616d Reduce confusion in zfs_write
Is this block when abuf != NULL ever reached? Yes, it is.

Add asserts and comments to prove that when we get here, we have a full
block write at an aligned offset extending past EOF.

Simplify by removing the check that tx_bytes == max_blksz, since we can
assert that it is always true.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11191
2020-11-18 15:06:59 -08:00
Matthew Macy 0ca45cb310 Fix problems in zvol_set_volmode_impl
- Don't leave fstrans set when passed a snapshot
- Don't remove minor if volmode already matches new value
- (FreeBSD) Wait for GEOM ops to complete before trying
  remove (at create time GEOM will be "tasting" in parallel)
- (FreeBSD) Don't leak zvol_state_lock on open if zv == NULL
- (FreeBSD) Don't try to unlock zv->zv_state lock if zv == NULL

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #11199
2020-11-17 09:50:52 -08:00
Brian Behlendorf 82611cdfe5 Add ABI snapshot
Add a snapshot of the current ABI using libabigail-1.7-2.  The
included ABI passes `make checkabi` for CentOS 7, Fedora 33,
Debian 10, and Ubuntu 20.04.  This covers a fairly wide range
of glibc, gcc, and libabigail versions plus other changes which
are platform specific.

Reviewed-by: Antonio Russo <aerusso@aerusso.net>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11144
2020-11-17 09:21:39 -08:00
Antonio Russo 14c34c3d49 Library ABI tracking with abigail
Provide two make targets: checkabi and storeabi.

storeabi uses libabigail to generate a reference copy of the ABI for the
public libraries.

checkabi compares such a reference to the compiled version, failing if
they are not compatible.  No ABI is generated for libzpool.so, it is
only used by ztest and zdb and not external consumers.

Co-authored-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Antonio Russo <aerusso@aerusso.net>
Closes #11144
2020-11-17 09:18:52 -08:00
наб e6c59cd171 zpool: correctly align columns with -p
zpool_expand_proplist() now ignores pl_fixed if its new literal
argument is true.  The rest is a consequence of needing to pass
that down.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiao?=~Dska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11202
2020-11-16 09:26:20 -08:00
наб fd654e412e zpool(8): fix pool-wi[sd]e typo
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11202
2020-11-16 09:26:16 -08:00
loli10K 4072f465bc Fix 'zfs userspace' for received datasets in encrypted root
For encrypted receives, where user accounting is initially disabled on
creation, both 'zfs userspace' and 'zfs groupspace' fails with
EOPNOTSUPP: this is because dmu_objset_id_quota_upgrade_cb() forgets to
set OBJSET_FLAG_USERACCOUNTING_COMPLETE on the objset flags after a
successful dmu_objset_space_upgrade().

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #9501 
Closes #9596
2020-11-16 09:10:29 -08:00
George Amanakis 2c210f6818 Fix ASSERT logic in l2arc_evict()
In case of cache device removal it is possible that at the end of
l2arc_evict() we have l2ad_hand = l2ad_evict. This can lead to the
following panic in case of a debug build:

VERIFY3(dev->l2ad_hand < dev->l2ad_evict) failed (321920512 < 321920512)
Call Trace:
 dump_stack+0x66/0x90
 spl_panic+0xef/0x117 [spl]
 l2arc_remove_vdev+0x11d/0x290 [zfs]
 spa_load_l2cache+0x275/0x5b0 [zfs]
 spa_vdev_remove+0x4a5/0x6e0 [zfs]
 zfs_ioc_vdev_remove+0x59/0xa0 [zfs]
 zfsdev_ioctl_common+0x5b3/0x630 [zfs]
 zfsdev_ioctl+0x53/0xe0 [zfs]
 do_vfs_ioctl+0x42e/0x6b0
 ksys_ioctl+0x5e/0x90
 do_syscall_64+0x5b/0x1a0
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

In case of cache device removal it also possible that l2ad_hand +
distance > l2ad_end since we do not iterate l2arc_evict() and l2ad_hand
is not reset. This has no functional consequence however as the cache
device is about to be removed.

Fix this by omitting the ASSERT in case of device removal.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #11205
2020-11-16 09:08:11 -08:00
Érico Rolim 24c12b48a1 config/dracut/90zfs: handle cases where hostid(1) returns all zeros
On systems with musl libc, hostid(1) always prints "00000000", which
will cause improper behavior when the 90zfs module is configured in a
dracut initramfs. Work around this by copying the host /etc/hostid if
the file exists, and otherwise only write /etc/hostid if hostid(1)
returns something meaningful. This avoids zgenhostid creating a random
/etc/hostid for the initramfs, which could lead to errors when trying to
import the pool if spl_hostid isn't defined in the kernel command line.

Furthermore, tag the /etc/hostid file as hostonly, since it is system
specific and shouldn't be taken into account when trying to use an
initramfs generated in one system to boot into a different system.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Georgy Yakovlev <gyakovlev@gentoo.org>
Co-authored-by: Andrew J. Hesford <ajh@sideband.org>
Signed-off-by: Érico Rolim <erico.erc@gmail.com>
Closes #11174
Closes #11189
2020-11-14 17:21:54 -08:00
Érico Rolim 9c4b6dbb31 zgenhostid: accept hostid arguments equal to zero.
A common usage pattern for zgenhostid, including in the ZFS dracut
module, is running it as:

  zgenhostid $(hostid)

However, zgenhostid only accepted hostid arguments greater than 0, which
meant that, when the output of hostid(1) was "00000000", zgenhostid
would error out, even though 0 is a possible return value for the
gethostid(3) function used by hostid(1):

- On current musl libc, gethostid(3) is a stub that always returns 0.
- On glibc, gethostid(3) will return 0 if /etc/hostid exists but is
  smaller than 4 bytes.

In these cases, it makes more sense for zgenhostid to treat a value of 0
as other parts of the zfs codebase do, meaning that a hostid value
couldn't be determined; therefore, it should attempt to generate a
random value to write into /etc/hostid.

The manpage and usage output have been updated to reflect this.

Whitespace has also been fixed in the usage output.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Georgy Yakovlev <gyakovlev@gentoo.org>
Reviewed-by: Andrew J. Hesford <ajh@sideband.org>
Signed-off-by: Érico Rolim <erico.erc@gmail.com>
Closes #11174
Closes #11189
2020-11-14 17:20:54 -08:00
Brian Behlendorf 4352edaafb Linux: Fix ZFS_ENTER/ZFS_EXIT/ZFS_VERFY_ZP usage
The ZFS_ENTER/ZFS_EXIT/ZFS_VERFY_ZP macros should not be used
in the Linux zpl_*.c source files.  They return a positive error
value which is correct for the common code, but not for the Linux
specific kernel code which expects a negative return value.  The
ZPL_ENTER/ZPL_EXIT/ZPL_VERFY_ZP macros should be used instead.

Furthermore, the ZPL_EXIT macro has been updated to not call the
zfs_exit_fs() function.  This prevents a possible deadlock which
can occur when a snapshot is automatically unmounted because the
zpl_show_devname() must never wait on in progress automatic
snapshot unmounts.

Reviewed-by: Adam Moss <c@yotes.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11169 
Closes #11201
2020-11-14 10:19:00 -08:00
Matthew Ahrens d66aab7c08 Assertion failure when logging large output of channel program
The output of ZFS channel programs is logged on-disk in the zpool
history, and printed by `zpool history -i`.  Channel programs can use
10MB of memory by default, and up to 100MB by using the `zfs program -m`
flag.  Therefore their output can be up to some fraction of 100MB.

In addition to being somewhat wasteful of the limited space reserved for
the pool history (which for large pools is 1GB), in extreme cases this
can result in a failure of `ASSERT(length <= DMU_MAX_ACCESS);` in
`dmu_buf_hold_array_by_dnode()`.

This commit limits the output size that will be logged to 1MB.  Larger
outputs will not be logged, instead a entry will be logged indicating
the size of the omitted output.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #11194
2020-11-14 10:17:16 -08:00
Ryan Moeller 7e3617de35 Return EFAULT at the end of zfs_write() when set
FreeBSD's VFS expects EFAULT from zfs_write() if we didn't complete
the full write so it can retry the operation.  Add some missing
SET_ERRORs in zfs_write().

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11193
2020-11-14 10:16:26 -08:00
Brian Behlendorf b2255edcc0 Distributed Spare (dRAID) Feature
This patch adds a new top-level vdev type called dRAID, which stands
for Distributed parity RAID.  This pool configuration allows all dRAID
vdevs to participate when rebuilding to a distributed hot spare device.
This can substantially reduce the total time required to restore full
parity to pool with a failed device.

A dRAID pool can be created using the new top-level `draid` type.
Like `raidz`, the desired redundancy is specified after the type:
`draid[1,2,3]`.  No additional information is required to create the
pool and reasonable default values will be chosen based on the number
of child vdevs in the dRAID vdev.

    zpool create <pool> draid[1,2,3] <vdevs...>

Unlike raidz, additional optional dRAID configuration values can be
provided as part of the draid type as colon separated values. This
allows administrators to fully specify a layout for either performance
or capacity reasons.  The supported options include:

    zpool create <pool> \
        draid[<parity>][:<data>d][:<children>c][:<spares>s] \
        <vdevs...>

    - draid[parity]       - Parity level (default 1)
    - draid[:<data>d]     - Data devices per group (default 8)
    - draid[:<children>c] - Expected number of child vdevs
    - draid[:<spares>s]   - Distributed hot spares (default 0)

Abbreviated example `zpool status` output for a 68 disk dRAID pool
with two distributed spares using special allocation classes.

```
  pool: tank
 state: ONLINE
config:

    NAME                  STATE     READ WRITE CKSUM
    slag7                 ONLINE       0     0     0
      draid2:8d:68c:2s-0  ONLINE       0     0     0
        L0                ONLINE       0     0     0
        L1                ONLINE       0     0     0
        ...
        U25               ONLINE       0     0     0
        U26               ONLINE       0     0     0
        spare-53          ONLINE       0     0     0
          U27             ONLINE       0     0     0
          draid2-0-0      ONLINE       0     0     0
        U28               ONLINE       0     0     0
        U29               ONLINE       0     0     0
        ...
        U42               ONLINE       0     0     0
        U43               ONLINE       0     0     0
    special
      mirror-1            ONLINE       0     0     0
        L5                ONLINE       0     0     0
        U5                ONLINE       0     0     0
      mirror-2            ONLINE       0     0     0
        L6                ONLINE       0     0     0
        U6                ONLINE       0     0     0
    spares
      draid2-0-0          INUSE     currently in use
      draid2-0-1          AVAIL
```

When adding test coverage for the new dRAID vdev type the following
options were added to the ztest command.  These options are leverages
by zloop.sh to test a wide range of dRAID configurations.

    -K draid|raidz|random - kind of RAID to test
    -D <value>            - dRAID data drives per group
    -S <value>            - dRAID distributed hot spares
    -R <value>            - RAID parity (raidz or dRAID)

The zpool_create, zpool_import, redundancy, replacement and fault
test groups have all been updated provide test coverage for the
dRAID feature.

Co-authored-by: Isaac Huang <he.huang@intel.com>
Co-authored-by: Mark Maybee <mmaybee@cray.com>
Co-authored-by: Don Brady <don.brady@delphix.com>
Co-authored-by: Matthew Ahrens <mahrens@delphix.com>
Co-authored-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Mark Maybee <mmaybee@cray.com>
Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #10102
2020-11-13 13:51:51 -08:00
Matthew Ahrens a724db0374 Channel program may spuriously fail with "memory limit exhausted"
ZFS channel programs (invoked by `zfs program`) are executed in a LUA
sandbox with a limit on the amount of memory they can consume.  The
limit is 10MB by default, and can be raised to 100MB with the `-m` flag.
If the memory limit is exceeded, the LUA program exits and the command
fails with a message like `Channel program execution failed: Memory
limit exhausted.`

The LUA sandbox allocates memory with `vmem_alloc(KM_NOSLEEP)`, which
will fail if the requested memory is not immediately available.  In this
case, the program fails with the same message, `Memory limit exhausted`.
However, in this case the specified memory limit has not been reached,
and the memory may only be temporarily unavailable.

This commit changes the LUA memory allocator `zcp_lua_alloc()` to use
`vmem_alloc(KM_SLEEP)`, so that we won't spuriously fail when memory is
temporarily low.  Instead, we rely on the system to be able to free up
memory (e.g. by evicting from the ARC), and we assume that even at the
highest memory limit of 100MB, the channel program will not truly
exhaust the system's memory.

External-issue: DLPX-71924
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #11190
2020-11-11 17:16:15 -08:00
Brian Behlendorf c08d442e45 Linux: Fix mount/unmount when dataset name has a space
The custom zpl_show_devname() helper should translate spaces in
to the octal escape sequence \040.  The getmntent(2) function
is aware of this convention and properly translates the escape
character back to a space when reading the fsname.

Without this change the `zfs mount` and `zfs unmount` commands
incorrectly detect when a dataset with a name containing spaces
is mounted.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11182 
Closes #11187
2020-11-11 17:14:24 -08:00
Mateusz Guzik 18ca574f0a G/C data_alloc_arena
It is a leftover from illumos always set to NULL and introducing a
spurious difference between zio_buf and zio_data_buf.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #11188
2020-11-11 17:11:32 -08:00
Tony Perkins 9bd14b8724 Start snapdir_iterate traversals to begin wtih the value of zero.
The microzap hash can sometimes be zero for single digit snapnames.
The zap cursor can then have a serialized value of two (for . and ..),
and skip the first entry in the avl tree for the .zfs/snapshot directory
listing, and therefore does not return all snapshots.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Cedric Berger <cedric@precidata.com>
Signed-off-by: Tony Perkins <tperkins@datto.com>
Closes #11039
2020-11-11 17:06:16 -08:00
Adrian Chadd 3fcd737478 Fix compiling on FreeBSD + gcc - don't assume illmnos bits
This looks like it was once from the illumnos compat code.
FreeBSD doesn't have cmn_err as a compiler format attribute, so
it definitely errors out.

It doesn't show up on LLVM because it doesn't trigger at all.

Add in the format flags but keep them behind #if 0 for now;
there are too many format issues that trigger when one does
format checking in the shared code.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: adrian chadd <adrian@freebsd.org>
Closes #11068
Closes #11069
2020-11-10 15:54:12 -08:00
Adrian Chadd 79a357c2a1 Fix pointer-is-uint64_t-sized assumption in the ioctl path
This shows up when compiling freebsd-head on amd64 using gcc-6.4.
The lib32 compat build ends up tripping over this assumption.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: adrian chadd <adrian@freebsd.org>
Closes #11068
Closes #11069
2020-11-10 15:53:13 -08:00
sterlingjensen a4ae4998cb Fix memleak in cmd/mount_zfs.c
Convert dynamic allocation to static buffer, simplify parse_dataset
function return path. Add tests specific to the mount helper.

Reviewed-by: Mateusz Guzik <mjguzik@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Sterling Jensen <sterlingjensen@users.noreply.github.com>
Closes #11098
2020-11-10 15:50:44 -08:00
наб b60ae3a5dc zpoolprops.8: clarify vdev expansion rules
Remove reference to EFI(?), explain that the new space
is beyond the GPT for whole-disk vdevs, and add section noting how it
behaves with partition vdevs in terms of how the user is most likely to
encounter it ‒ the previous phrasing was confusing
and seemed to indicate that "zpool online -e" will be able to claim

  GPT[whatever, ZFS, free space, whatever]

into

  GPT[whatever, ZFS, whatever]
but that's not the case, as it'll only be able to do so after manually
resizing the ZFS partition to include the free space beforehand, i.e.:
  GPT[whatever, ZFS, free space, whatever]
  GPT[whatever, [ZFS + free space], potentially left-overs, whatever]
  # zpool online -e
  GPT[whatever, ZFS, whatever]

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11158
2020-11-10 12:48:26 -08:00
Mateusz Guzik 1a0b4f566c G/C struct znode -> z_moved
The field is yet another leftover from unsupported zfs_znode_move.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #11186
2020-11-10 12:42:47 -08:00
Pavel Zakharov 0f66201dbc initramfs: zfsunlock hook breaks /usr/bin
The copy_exec() function expects that the full path of the target 
file is passed rather than just the directory, and will take care 
of creating the underlying directories if they don't exist.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Closes #11162
2020-11-10 11:12:07 -08:00
Ryan Moeller 7b42f09049 FreeBSD: Simplify zvol_geom_open and zvol_cdev_open
We can consolidate the unlocking procedure into one place by starting
with drop_suspend set to B_FALSE and moving the open count check up.

While here, a little code cleanup. Match the out labels between
zvol_geom_open and zvol_cdev_open, and add a missing period in some
comments.

Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11175
2020-11-10 11:08:10 -08:00
Ryan Moeller 2186ed33f1 FreeBSD: Avoid spurious EINTR in zvol_cdev_open
zvol_first_open can fail with EINTR if spa_namespace_lock is not held
and cannot be taken without waiting.

Apply the same logic that was done for zvol_geom_open to take
spa_namespace_lock if not already held on first open in zvol_cdev_open.

Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11175
2020-11-10 11:07:25 -08:00
Ryan Moeller d1dd72a2c5 Simplify offset and length limit in zfs_write
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11176
2020-11-10 10:58:59 -08:00
Ryan Moeller 9a764716fc Const some unchanging variables in zfs_write
Show that these values will not be changing later.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11176
2020-11-10 10:58:59 -08:00
Ryan Moeller 2074dfd0e9 FreeBSD: Move uio_prefaultpages def to uio.h
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11176
2020-11-10 10:58:59 -08:00
Ryan Moeller 8a9634e2f3 Remove redundant oid parameter to update_pages
The oid comes from the znode we are already passing.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11176
2020-11-10 10:54:30 -08:00
Ryan Moeller eec6646ea9 Factor uid, gid, and projid out of loop in zfs_write
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11176
2020-11-10 10:53:19 -08:00
Alexander Motin daabddaac1 Fix dmu_tx_dirty_throttle after arc_c reduction
After initial arc_c was reduced to arc_c_min it became possible that
on datasets with primarycache=metadata or none dirty data make up most
of ARC capacity and easily more than configured 50% of initial arc_c,
that causes forced txg commits by arc_tempreserve_space() and periodic
very long write delays.

This patch makes arc_tempreserve_space() to use arc_c only after ARC
warmed up once and arc_c really means something, but use arc_c_max
before that.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #11178
2020-11-10 10:39:26 -08:00
Matthew Macy 570d7038d0 Fix dnode refcount tracking
Fix a couple of places where the wrong tag is passed
to dnode_{hold, rele}

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #11184
2020-11-10 10:37:10 -08:00
Ryan Moeller 52e585a822 ZTS: Add L1 corruption test
Add a new test case which corrupts all level 1 block in a file.
Then verifies that corruption is detected and repaired.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11141
2020-11-05 17:16:25 -08:00
Ryan Moeller cce66dfa8e ZTS: Output all block copies in list_file_blocks
The second part of list_file_blocks transforms the object description
output by zdb -ddddd $ds $objnum into a stream of lines of the form
"level path offset length" for the indirect blocks in the given file.
The current code only works for the first copy of L0 blocks.  L1 and
L2 indirect blocks have more than one copy on disk.

Add one more -d to the zdb command so we get all block copies and
rewrite the transformation to match more than L0 and output all DVAs.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11141
2020-11-05 17:16:21 -08:00
Ryan Moeller 94deb47872 ZTS: Fix list_file_blocks for mirror vdevs, level > 0
The first part of list_file_blocks transforms the pool configuration
output by zdb -C $pool into shell code to set up a shell variable,
VDEV_MAP, that maps from vdev id to the underlying vdev path. This
variable is a simple indexed array. However, the vdev id in a DVA is
only the id of the top level vdev.

When the pool is mirrored, the top level vdev is a mirror and its
children are the mirrored devices. So, what we need is to map from
the top level vdev id to a list of the underlying vdev paths.
ist_file_blocks does not need to work for raidz vdevs, so we can
disregard that case.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11141
2020-11-05 17:16:16 -08:00
Mariusz Zaborski ae37ceadaa FreeBSD: Prevent a NULL reference in zvol_cdev_open
Check if the ZVOL has been written before calling zil_async_to_sync.
The ZIL will be opened on the first write, not earlier.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mariusz Zaborski <oshogbo@vexillium.org>
Closes #11152
2020-11-05 17:02:19 -08:00
khng300 a4246bce50 FreeBSD: Prevent NULL pointer dereference of resid
spa_config_load() passes NULL into resid when doing zfs_file_read().
This would trip over when vfs.zfs.autoimport_disable=0.

Sponsored by: The FreeBSD Foundation
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Ka Ho Ng <khng@freebsdfoundation.org>
Closes #11149
2020-11-04 16:50:08 -08:00
Antonio Russo 71ae6a9d23 Synchronize library ABI levels
Bump library SOVERSION under Linux to match FreeBSD's.

Additionally, this bump properly accounts for the ABI changes relative
to ZoL 0.8.5 for the Linux build.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Antonio Russo <aerusso@aerusso.net>
Issue #11144
2020-11-03 09:24:43 -08:00
Ryan Moeller 181b2adc2a FreeBSD: zvol_os: Use SET_ERROR more judiciously
SET_ERROR is useful to trace errors, so use it where the errors occur
rather than factored out to the end of a function.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11146
2020-11-03 09:21:09 -08:00
Brian Behlendorf 2d7843401a ZTS: zdb_block_size_histogram increase variance
The expected variance for this test case was originally set at 10%
based on local testing.  Additional testing via the CI has show it
can be as large as 11%.  Increase the expected maximum to 12% to
prevent this test from incorrectly failing.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11148
2020-11-03 09:20:34 -08:00
Brian Behlendorf 123d2803bf ZTS: Wait on all events in events_001_pos.ksh
The events_001_pos.ksh test case can fail because it's possible,
and correct, for the config_sync event to be posted after the last
"expected" event.  To accommodate this the run_and_verify() function
has been updated to wait for all non-history events, not just the
last event.  This does not increase the run time of the test as
long as all the events do get generated.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11147
2020-11-03 09:19:45 -08:00
Coleman Kane 59b6872327 Linux 5.10 compat: revalidate_disk_size() added
A new function was added named revalidate_disk_size() and the old
revalidate_disk() appears to have been deprecated. As the only ZFS
code that calls this function is zvol_update_volsize, swapping the
old function call out for the new one should be all that is required.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #11085
2020-11-02 22:01:19 +00:00
Coleman Kane ae15f1c1d8 Linux 5.10 compat: check_disk_change() removed
Kernel 5.10 removed check_disk_change() in favor of callers using
the faster bdev_check_media_change() instead, and explicitly forcing
bdev revalidation when they desire that behavior. To preserve prior
behavior, I have wrapped this into a zfs_check_media_change() macro
that calls an inline function for the new API that mimics the old
behavior when check_disk_change() doesn't exist, and just calls
check_disk_change() if it exists.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #11085
2020-11-02 22:01:19 +00:00
Coleman Kane 838a249012 Linux 5.10 compat: percpu_ref added data member
Kernel commit 2b0d3d3e4fcfb brought in some changes to the struct
percpu_ref structure that moves most of its fields into a member
struct named "data" of type struct percpu_ref_data. This includes
the "count" member which is updated by vdev_blkg_tryget(), so update
this function to chase the API change, and detect it via configure.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #11085
2020-11-02 22:01:19 +00:00
Brian Behlendorf 8c7d604c62 Linux 5.10 compat: frame.h renamed objtool.h
In Linux 5.10 the linux/frame.h header was renamed linux/objtool.h.
Add a configure check to detect and use the correctly named header.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11085
2020-11-02 22:01:10 +00:00
Sebastian Gottschall 7eefaf0ca0 Optimize locking checks in mempool allocator
Avoid checking the whole array of objects each time by removing the self
organized memory reaping. this can be managed by the global memory reap
callback which is called every 60 seconds. this will reduce the use if
locking operations significant.

Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Reviewed-by: Mateusz Guzik <mjguzik@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Sebastian Gottschall <s.gottschall@dd-wrt.com>
Closes #11126
2020-11-02 12:10:07 -08:00
Christian Schwarz ab8c935ea6 zfs_vnops: make zfs_get_data OS-independent
Move zfs_get_data() in to platform-independent code. The only
platform-specific aspect of it is the way we release an inode 
(Linux) / vnode_t (FreeBSD). I am not aware of a platform that
could be supported by ZFS that couldn't implement zfs_rele_async 
itself. It's sibling zvol_get_data already is platform-independent.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Christian Schwarz <me@cschwarz.com>
Closes #10979
2020-11-02 12:07:07 -08:00
Mateusz Guzik 09eb36ce3d Introduce CPU_SEQID_UNSTABLE
Current CPU_SEQID users don't care about possibly changing CPU ID, but
enclose it within kpreempt disable/enable in order to fend off warnings
from Linux's CONFIG_DEBUG_PREEMPT.

There is no need to do it. The expected way to get CPU ID while allowing
for migration is to use raw_smp_processor_id.

In order to make this future-proof this patch keeps CPU_SEQID as is and
introduces CPU_SEQID_UNSTABLE instead, to make it clear that consumers
explicitly want this behavior.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #11142
2020-11-02 11:51:12 -08:00
Matthew Macy 8583540c6e Consolidate zfs_holey and zfs_access
The zfs_holey() and zfs_access() functions can be made common
to both FreeBSD and Linux.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #11125
2020-10-31 09:40:08 -07:00
Ryan Moeller 2f94e8f09e Remove duplicate cond_resched() definition
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11131
2020-10-31 09:37:56 -07:00
Ryan Moeller 65a343bbd3 zvol_os: Fix handling of zvol private data
zvol private data is supposed to be nulled by zvol_clear_private before
zvol_free is called as an indicator that the zvol is going away.

Implement zvol_clear_private for volmode=dev.

Assert that zvol_clear_private has been called before zvol_free.

Check that zvol_clear_private has not been called when updating
volsize.  If it has, fail with ENXIO.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11117
2020-10-30 15:34:49 -07:00
Ryan Moeller 277884ab42 zvol_os: Don't leak doi in cdev error path
Make sure to free doi in zvol_create_minor impl when make_dev_s fails.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11117
2020-10-30 15:34:43 -07:00
Ryan Moeller 9a0ef216e5 zvol_os: Properly ignore error in volmode lookup
We fall back to a default volmode and continue when looking up a zvol's
volmode property fails.  After this we should set the error to 0 to
ensure we take the success paths in the out section.

While here, make sure we only log that the zvol was created on success.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11117
2020-10-30 15:34:36 -07:00
Ryan Moeller 1a6a75ac07 zvol_os: Code cleanup in zvol_create_minor_impl
Nonfunctional changes for readability and consistency.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11117
2020-10-30 15:34:30 -07:00
Ryan Moeller 260f6a28af zvol_os: Keep better track of open count in close
zvol_geom_close gets a count of the number of close operations to do.

Make sure we're always using this count to check if this will be the
last close operation performed on the zvol.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11117
2020-10-30 15:34:23 -07:00
Ryan Moeller 0b32d81783 zvol_os: Tidy up asserts
Using more specific assert variants gives better messages on failure.

No functional change.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11117
2020-10-30 15:34:15 -07:00
Mateusz Guzik c4ede65bdf zstd: track allocator statistics
Note that this only tracks sizes as requested by the caller.
Actual allocated space will almost always be bigger (e.g., rounded up to
the next power of 2 or page size). Additionally the allocated buffer may
be holding other areas hostage. Nonetheless, this is a starting point
for tracking memory usage in zstd.

Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #11129
2020-10-30 15:26:10 -07:00
Attila Fülöp e8beeaa111 ICP: gcm: Allocate hash subkey table separately
While evaluating other assembler implementations it turns out that
the precomputed hash subkey tables vary in size, from 8*16 bytes
(avx2/avx512) up to 48*16 bytes (avx512-vaes), depending on the
implementation.

To be able to handle the size differences later, allocate
`gcm_Htable` dynamically rather then having a fixed size array, and
adapt consumers.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Attila Fülöp <attila@fueloep.org>
Closes #11102
2020-10-30 15:24:21 -07:00
Attila Fülöp d9655c5b37 Add some missing cfi frame info in aesni-gcm-x86_64.S
While preparing #9749 some .cfi_{start,end}proc directives
were missed. Add the missing ones.

See upstream https://github.com/openssl/openssl/commit/275a048f

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Attila Fülöp <attila@fueloep.org>
Closes #11101
2020-10-30 15:23:18 -07:00
Mateusz Guzik 115216cc92 FreeBSD: catch up with 1300124 version bump
- use cache_vop_mkdir
- cache_rename -> cache_vop_rename

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #11136
2020-10-30 15:22:04 -07:00
Ryan Moeller d1e4ded7bc FreeBSD: Fix 12.2-STABLE after AT_BENEATH MFC
AT_BENEATH was merged to stable/12, where kern_unlinkat takes a 
non-const path.  DECONST the path passed to kern_unlinkat in the 
case where AT_BENEATH is defined.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11139
2020-10-30 15:19:02 -07:00
Matthew Macy 5fa356ea44 Remove UIO_ZEROCOPY functions structures
The original xuio zero copy functionality has always been unused 
on Linux and FreeBSD.  Remove this disabled code to avoid any
confusion and improve readability.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #11124
2020-10-30 10:00:33 -07:00
Alexander Motin 1199c3e8fb Yield periodically when rebuilding L2ARC
L2ARC devices of several terabytes filled with 4KB blocks may take 15
minutes to rebuild.  Due to the way L2ARC log reading is implemented
it is quite likely that for all that time rebuild thread will never
sleep.  At least on FreeBSD kernel threads have absolute priority and
can not be preempted by threads with lower priorities.  If some thread
is also bound to that specific CPU it may not get any CPU time for all
the 15 minutes.

Reviewed-by: Cedric Berger <cedric@precidata.com>
Reviewed-by: Ryan Moeller <freqlabs@FreeBSD.org>
Reviewed-by: George Amanakis <gamanakis@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Closes #11116
2020-10-30 08:57:54 -07:00
Ryan Moeller 76d04993a6 Update references to nonexistent man pages in code
Refer to the correct section or alternative for FreeBSD and Linux.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11132
2020-10-30 08:55:59 -07:00
Alexander Motin e3a6ac8d06 FreeBSD: Remove BIO_ORDERED flag from BIO_FLUSH
ZFS always waits for the write completion before flushing the cache.
That is why it does not require explicit ordering fences around it,
which are pretty difficult to implement for NVMe, since one has no
internal concept of strict request ordering.

This was already removed from FreeBSD once, but got resurrected
by mistake during OpenZFS merge.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Closes #11130
2020-10-30 08:50:57 -07:00
Tony Hutter f829227f49 ZTS: Fix xattr_004_pos failure, don't use tmpfs
Previously, xattr_004_pos would create files with xattrs on both
tmpfs and ext2, and then copy them to zfs to verify that their
xattrs were preserved.  However tmpfs doesn't support xattrs.

This was never noticed until Fedora 33.  In Fedora 32 and older,
/tmp was on the root partition (like ext4), whereas on Fedora 33
/tmp is actually tmpfs.  That caused this test to fail on Fedora 33.

This fix updates the test to only create the file on ext2, not tmpfs.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #11133
2020-10-30 08:47:42 -07:00
Mateusz Guzik 973ba682f5 Linux: g/c leftover fence in zfs_znode_alloc
The port removed provisions for zfs_znode_move but the cleanup missed
this bit. To quote the original:

[snip]
    list_insert_tail(&zfsvfs->z_all_znodes, zp);
    membar_producer();
    /*
     * Everything else must be valid before assigning z_zfsvfs makes the
     * znode eligible for zfs_znode_move().
     */
    zp->z_zfsvfs = zfsvfs;
[/snip]

In the current code it is immediately followed by unlock which issues
the same fence, thus plays no role in correctness.

Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #11115
2020-10-29 09:54:20 -07:00
Mateusz Guzik 082ff328f2 FreeBSD: g/c unused zfs_znode_move support
The allocator does not provide the functionality to begin with.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #11114
2020-10-29 09:52:50 -07:00
Brian Behlendorf 4ce728d028 Use known license string for zlua
The Linux kernel MODULE_LICENSE macro only recognizes a handful of
license strings and "MIT" is not one of the them.  Update the macro
to use "Dual MIT/GPL" which is recognized and what the kernel expects
MIT licensed modules to use.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11112
Closes #11113
2020-10-27 09:43:36 -07:00
Ryan Moeller 5c810ac499 FreeBSD: Skip RAW kstat sysctls by default
These kstats are often expensive to compute so we want to avoid them
unless specifically requested.

The following kstats are affected by this change:

kstat.zfs.${pool}.multihost
kstat.zfs.${pool}.misc.state
kstat.zfs.${pool}.txgs
kstat.zfs.misc.fletcher_4_bench
kstat.zfs.misc.vdev_raidz_bench
kstat.zfs.misc.dbufs
kstat.zfs.misc.dbgmsg

In FreeBSD 13, sysctl(8) has been updated to still list the
names/description/type of skipped sysctls so they are still
discoverable.

Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Mateusz Guzik <mjguzik@gmail.com>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11099
2020-10-26 14:34:28 -07:00
Mateusz Guzik 01a65c5861 FreeBSD: catch up with 1300123 version bump
- removed thread argument from VOP_INACTIVE
- removed cred argument from VOP_VPTOCNP

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #11104
2020-10-26 14:32:17 -07:00
Cy Schubert 3928ec5339 Restore identification of VDEVs using non-native block size
NAME         STATE     READ WRITE CKSUM
dsk02        ONLINE       0     0     0
  mirror-0   ONLINE       0     0     0
    ada1s4a  ONLINE       0     0     0
    ada2s4a  ONLINE       0     0     0  block size: 512B configured, 4096B native

Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Reviewed-by: Toomas Soome <tsoome@me.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed off by: Cy Schubert <cy@FreeBSD.org>
Closes #11088
2020-10-22 12:15:17 -07:00
xtouqh 1e36af8c7b Properly format NAME subsection of zfs/zpool subcommands
Use proper names (i.e. zfs-allow and zpool-add) in NAME subsections
of zfs/zpool subcommands instead of current "pretty-printed" ones as
makewhatis utilities (or some implementations of it, namely the one
from mandoc suite used in FreeBSD) look not only at the document title
but also in NAME subsection, adding zfs(8)/zpool(8) to search results
which is not correct. (Common sense and other utilities splitting
subcommands in multiple man pages, e.g. git, do the same.)

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: xtouqh <xtouqh@hotmail.com>
Closes #11086
2020-10-22 11:28:10 -07:00
Ryan Moeller eb02a4c6fb Add missing zfs_arc_evict_batch_limit tunable
It's even documented already.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11094
2020-10-22 10:18:26 -07:00
Ryan Moeller 2aaab887bb arcstat: Add -a and -p options from FreeNAS
Added -a option to automatically print all valid statistics.
Added -p option to suppress scaling of printed data.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Authored by: Nick Principe <32284693+powernap@users.noreply.github.com>
Ported-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11090
2020-10-21 14:09:14 -07:00
Matthew Macy e53d678d4a Share zfs_fsync, zfs_read, zfs_write, et al between Linux and FreeBSD
The zfs_fsync, zfs_read, and zfs_write function are almost identical
between Linux and FreeBSD.  With a little refactoring they can be
moved to the common code which is what is done by this commit.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #11078
2020-10-21 14:08:06 -07:00
Adam D. Moss 666aa69f32 Non-l2arc pool reads shouldn't be l2arc misses
The current l2_misses accounting behavior treats all reads to pools 
without a configured l2arc as an l2arc miss, IFF there is at least 
one other pool on the system which does have an l2arc configured.

This makes it extremely hard to tune for an improved l2arc hit/miss 
ratio because this ratio will be modulated by reads from pools which 
do not (and should not) have l2arc devices; its upper limit will 
depend on the ratio of reads from l2arc'd pools and non-l2arc'd pools.

This PR prevents ARC reads affecting l2arc stats (n.b. l2_misses is 
the only relevant one) where the target spa doesn't have an l2arc.

Includes new test - l2arc_l2miss_pos.ksh

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Amanakis <gamanakis@gmail.com>
Signed-off-by: Adam Moss <c@yotes.com>
Closes #10921
2020-10-20 11:39:52 -07:00
Kyle Evans 241c62bdd7 Makefile.bsd: remove directory that no longer exists
This was removed in a reorganization of directories preparing for the
merge of FreeBSD support, 006e9a4088 by mmacy. While llvm is perfectly
happy with the nonexistent -I directory, the gcc6 and gcc9 we can elect
to use as cross-toolchains both trip over it.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Kyle Evans <kevans@FreeBSD.org>
Closes #11077
2020-10-20 11:34:59 -07:00
Matthew Macy ff2f54246d FreeBSD: delete unreferenced file
zfs_onexit_os.c was not deleted when it was removed from the build

Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #11079
2020-10-20 08:53:16 -07:00
Ryan Moeller 777b8ccc35 Fix commitcheck on FreeBSD
Convert from bash to sh, avoid Perl regexes and \s, prune unused
functions.

Reviewed-by: Mateusz Piotrowski <0mp@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #11070
2020-10-20 08:35:53 -07:00
Don Brady 13d65987a9 zed syslog entries drop important info
ZED will log zevents summaries to the syslog, however the log entries 
tend to drop event details that can be useful for diagnosis. This is 
especially true for ereport events, like io, checksum, and delay.

Update the all-syslog.sh script to log additional event information.

Add an optional config option, ZED_SYSLOG_DISPLAY_GUIDS, to zed.rc
for choosing GUIDs over names for pool and vdev.

Change the default ZED_SYSLOG_SUBCLASS_EXCLUDE to exclude history_event 
events. These events tend to be frequent, convey no meaningful info, 
and are already logged in the zpool history.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Don Brady <don.brady@delphix.com>
Closes #10967
2020-10-19 11:01:00 -07:00
Ryan Moeller ab6a0e236e Ignore zpool_influxdb binary
This was requested but forgotten in #10786.

Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #11071
2020-10-16 13:21:28 -07:00
Mateusz Guzik 41e2b3de13 FreeBSD: add missing fplookup_vexec handler to special vop vectors
Otherwise lookup can fail with EOPNOTSUPP or panic.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #11066
2020-10-15 14:49:06 -07:00
Mateusz Guzik 34cda44af6 FreeBSD: g/c unused vop vector zfsctl_ops_shares_dir
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #11066
2020-10-15 14:48:20 -07:00
Don Brady dff71c7936 Ignore special vdev ashift for spa ashift min/max
The removal of a vdev in the normal class would fail if there was a 
special or deup vdev that had a different ashift than the vdevs in 
the normal class.

Moved the initialization of spa_min_ashift / spa_max_ashift from 
vdev_open so that it occurs after the vdev allocation bias was 
initialized (i.e. after vdev_load).

Caveat -- In order to remove a special/dedup vdev it must have the 
same ashift as the normal pool vdevs.  This could perhaps be lifted 
in the future (i.e. for the case where there is ample space in any 
surviving special class vdevs)

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Don Brady <don.brady@delphix.com>
Closes #9363
Closes #9364
Closes #11053
2020-10-15 14:45:16 -07:00
Christian Schwarz 15a4ca4620 Fix crash caused by invalid snapshot names in redactnvl
This is a follow up fix for commit 0fdd6106bb.  The VERIFY is
only true when we haven't hit an error code path.  See added
test case for a reproducer.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Christian Schwarz <me@cschwarz.com>
Closes #11048
2020-10-14 14:04:19 -07:00
Paul Dagnelie 6a60ef80e2 Fix incorrect deletion order in range_tree_add_impl gap case
After a side-effectful call like add or remove, references to range 
segs stored in btrees can no longer be used safely.  We move the 
remove call to just before the reinsertion call so that the seg 
remains valid for as long as we need it.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #11044 
Closes #11056
2020-10-14 08:59:54 -07:00
Mateusz Guzik 47a7e99939 FreeBSD: fix panic due to tqid overflow
The 32-bit counter eventually wraps to 0 which is a sentinel for invalid
id.

Make it 64-bit on LP64 platforms and 0-check otherwise.

Note: Linux counterpart uses id stored per queue instead of a global.
I did not check going that way is feasible with the goal being the
minimal fix doing the job.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #11059
2020-10-14 08:57:03 -07:00
Ryan Moeller 485b50bb9e Cross-platform acltype
The acltype property is currently hidden on FreeBSD and does not
reflect the NFSv4 style ZFS ACLs used on the platform.  This makes it
difficult to observe that a pool imported from FreeBSD on Linux has a
different type of ACL that is being ignored, and vice versa.

Add an nfsv4 acltype and expose the property on FreeBSD.

Make the default acltype nfsv4 on FreeBSD.

Setting acltype to an unhanded style is treated the same as setting
it to off.  The ACLs will not be removed, but they will be ignored.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10520
2020-10-13 21:25:48 -07:00
Warner Losh b302185a92 FreeBSD: make adjustments for the standalone environment
In FreeBSD, there are three compile environments that are supported:
user land, the kernel and the bootloader / standalone. Adjust the
headers to compile in the standalone environment. Limit kernel-only
items from view when _STANDALONE is defined.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Warner Losh <imp@FreeBSD.org>
Closes #10998
2020-10-13 21:05:49 -07:00
Matthew Macy 57dc5d42b1 dmu_zfetch: don't leak unreferenced stream when zfetch is freed
Currently streams are only freed when:
  - They have no referencing zfetch and and their I/O references
    go to zero.
  - They are more than 2s old and a new I/O request comes in on
    the same zfetch.

This means that we will leak unreferenced streams when their zfetch
structure is freed.

This change checks the reference count on a stream at zfetch free
time. If it is zero we free it immediately. If it has remaining
references we allow the prefetch callback to free it at I/O
completion time.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Adam Moss <c@yotes.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #11052
2020-10-13 21:03:36 -07:00
Warner Losh 6ba2e72b78 aarch64: Use proper guards for NEON instructions
The zstd code assumes that if you are on aarch64, you have NEON
instructions. This is not necessarily true. In a boot loader, where
you might not have the VFP properly initialized, these instructions
may not be available. It's also an error to include arm_neon.h when
the NEON insturctions aren't enabled. Change the guards for using the
NEON instructions from __aarch64__ to __ARM_NEON which is the standard
symbol for knowing if they are available.

__ARM_NEON is the proper symbol, defined in ARM C Language Extensions
Release 2.1 (https://developer.arm.com/documentation/ihi0053/d/). Some
sources suggest __ARM_NEON__, but that's the obsolete spelling from
prior versions of the standard.

Updated based on zstd pull request https://github.com/facebook/zstd/pull/2356

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Warner Losh <imp@bsdimp.com>
Closes #11055
2020-10-13 21:01:40 -07:00
Adam D. Moss 92286311f8 Add zfs.sh module unload error message
If modules fail to unload because of outstanding users, don't 
consider this a success.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Adam Moss <c@yotes.com>
Closes #11042
2020-10-13 16:51:54 -07:00
Christian Schwarz 701f656b97 dmu.h: remove stale declaration dmu_objset_snapshot_tmp
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Adam Moss <c@yotes.com>
Signed-off-by: Christian Schwarz <me@cschwarz.com>
Closes #11047
2020-10-13 16:46:00 -07:00
Mateusz Guzik 2ce14cdf68 FreeBSD: use cache_rename if available
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #11045
2020-10-13 16:41:26 -07:00
Mathieu Velten b3a216fba0 blkg_tryget config test: initialize struct
Missing struct initialization in a config test results in the
interface being incorrectly detected.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Adam Moss <c@yotes.com>
Signed-off-by: Mathieu Velten <matmaul@gmail.com>
Closes #10713 
Closes #11049
2020-10-13 16:36:36 -07:00
Kjeld Schouten-Lebbing 4b59c195ff Increase Supported Linux Kernel to 5.9
This increases the Linux kernel version to 5.9 from 5.8
as most compatibility fixes should already be included.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Pavel Snajdr <snajpa@snajpa.net>
Signed-off-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl>
Closes #11050
2020-10-13 09:51:13 -07:00
Ryan Moeller e191b60ddc FreeBSD: Improve libzfs_error_init messages
It is a common mistake to have failed to autoload the module due to
permission issues when running a ZFS command as a user.  "Operation
not permitted" is an unhelpfully vague error message.

Use a thread-local message buffer to format a nicer error message.
We can infer that loading the kernel module failed if the module is
not loaded.  This can be extended with heuristics for other errors
in the future.

While looking at this stuff, remove an unused thread-local message
buffer found in libspl and remove some inaccurate verbiage from the
comment on libzfs_load_module.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11033
2020-10-13 09:38:40 -07:00
Ryan Moeller 7dfc56d866 Expose zfetch_max_idistance tunable
FreeBSD had this value tunable before the switch to the new OpenZFS.
The tunable name has changed, breaking legacy compat.

Restore legacy compat for this tunable, properly expose the tunable
with the new name on all platforms, and document it in
zfs-module-parameters(5).

While here, clean up the documentation for zfetch_max_distance a bit.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11038
2020-10-13 09:32:34 -07:00
Christian Schwarz 61868bb14d zil_parse: make callback parameters const
Code cleanup, a follow up commit to 4d55ea81.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Co-authored-by: Ryan Moeller <ryan@freqlabs.com>
Signed-off-by: Christian Schwarz <me@cschwarz.com>
Closes #11020
2020-10-09 09:34:54 -07:00
Richard Elling e9527d44e6 Add zpool_influxdb command
A zpool_influxdb command is introduced to ease the collection
of zpool statistics into the InfluxDB time-series database.
Examples are given on how to integrate with the telegraf
statistics aggregator, a companion to influxdb.

Finally, a grafana dashboard template is included to show
how pool latency distributions can be visualized in a
ZFS + telegraf + influxdb  + grafana environment.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Richard Elling <Richard.Elling@RichardElling.com>
Closes #10786
2020-10-09 09:29:21 -07:00
Ryan Moeller b7ab7ae241 Linux: Initialize zp in zfs_setattr_dir
The value of zp is used without having been initialized under some
conditions.  Initialize the pointer to NULL.

Add a regression test case using chown in acl/posix.  However, this is
not enough because the setup sets xattr=sa, which means zfs_setattr_dir
will not be called.  Create a second group of acl tests in acl/posix-sa
duplicating the acl/posix tests with symlinks, and remove xattr=sa from
the original acl/posix tests.  This provides more coverage for the
default xattr=on code.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10043
Closes #11025
2020-10-09 09:27:14 -07:00
Brian Behlendorf d0249a4bd0 Replace ZFS on Linux references with OpenZFS
This change updates the documentation to refer to the project
as OpenZFS instead ZFS on Linux.  Web links have been updated
to refer to https://github.com/openzfs/zfs.  The extraneous
zfsonlinux.org web links in the ZED and SPL sources have been
dropped.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11007
2020-10-08 20:10:13 -07:00
Jacob Adams 07f5d4d663 Fix Linux modules uninstall
A missing semicolon between kmoddir variable declaration and the
uninstall for loop caused modules_uninstall-Linux to fail with:

    Syntax error: "do" unexpected

Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Jacob Adams <jacob@tookmund.com>
Closes #11032
2020-10-08 20:07:10 -07:00
Ryan Moeller 84cfe5d4f8 ZTS: Fix path to /dev/null in nopwrite_recsize
Don't direct stdout and stderr of dd to $TEST_BASE_DIR/null,
direct it to /dev/null.

Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11026
2020-10-08 16:39:23 -07:00
Chuck Tuffli a8fc1b8743 Fix ubsan: shift exponent is too large
When running libzpool with the Undefined Behavior Sanitizer (ubsan)
enabled, a zpool create causes a run-time error:

    module/zfs/vdev_label.c:600:14: runtime error: shift exponent 64 is
    too large for 64-bit type 'long long unsigned int'`

in vdev_config_generate()

Fix is to convert vdev_removal_max_span to its base-2 logarithm, using
highbit64(), and then compare the "shifts".

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Chuck Tuffli <ctuffli@gmail.com>
Closes #9744
Closes #11024
2020-10-08 16:37:27 -07:00
Christian Schwarz 36482bf607 libzfs_sendrecv: zfs_send: remove unused pipefd and tid variables
fixup of 196bee4

On gcc (GCC) 9.2.1 20190827 (Red Hat 9.2.1-1), the code removed
caused `-Wmaybe-uninitialized` errors.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Christian Schwarz <me@cschwarz.com>
Closes #11021
2020-10-08 09:43:51 -07:00
Ryan Moeller 73989f4b9e Make dbufstat work on FreeBSD
With procfs_list kstats implemented for FreeBSD, dbufs are now exposed
as kstat.zfs.misc.dbufs.

On FreeBSD, dbufstats can use the sysctl instead of procfs when no
input file has been given.

Enable the dbufstats tests on FreeBSD.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #11008
2020-10-08 09:40:23 -07:00
Ryan Moeller 82b81a2acd FreeBSD: Sort and dedup includes in kmod_core
Code cleanup. Sort includes, remove duplicates, and drop
some extra blank lines in kmod_core.c.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #11000
2020-10-08 09:37:56 -07:00
George Melikov b0c9239f17 docs: update README's installation link
OpenZFS is a cross-OS project now.

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #11022
2020-10-08 09:33:53 -07:00
George Amanakis a76e4e6761 Make L2ARC tests more robust
Instead of relying on arbitrary timers after pool export/import or cache
device off/online rely on arcstats. This makes the L2ARC tests more
robust. Also cleanup some functions related to persistent L2ARC.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Adam Moss <c@yotes.com>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #10983
2020-10-05 15:29:05 -07:00
Toomas Soome 4e84f67a96 zdb should not output binary data on terminal
The zdb is interpreting byte array as textual string in dump_zap,
but there are also binary arrays and we should not output binary
data on terminal.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Signed-off-by: Toomas Soome <tsoome@me.com>
External-issue: https://www.illumos.org/issues/12012
External-issue: https://www.illumos.org/issues/11713
Closes #11006
2020-10-05 14:05:28 -07:00
Ryan Moeller 79f0935fab FreeBSD: Sort out kernel FPU headers for 12.1-REL
We were missing an include for kernel FPU functions, breaking the build
on FreeBSD 12.1-RELEASE.  This was apparently being pulled in from
elsewhere on stable/12 and head.

Sorted the other includes in these files while here.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11005
2020-10-02 17:48:45 -07:00
Alan Somers a132c2b413 Fix EIO after resuming receive of new dataset over an existing one
When resuming an interrupted ZFS send stream that creates a new dataset
with the same name as an existing dataset, if the existing dataset is
accessed after the failed receive, then after the subsequent successful
receive it will return EIO. This happens because nothing mounts the new
dataset, leaving the old, no longer valid dataset still mounted.

This commit fixes zfs receive to always unmount and remount the
destination, regardless of whether the stream is a new stream or a
resumed stream.

Sponsored by: Axcient
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Alan Somers <asomers@gmail.com>
External-issue: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=249579
Closes #10995
Closes #10999
2020-10-02 17:47:09 -07:00
Ryan Moeller 4d55ea811d Throw const on some strings
In C, const indicates to the reader that mutation will not occur.
It can also serve as a hint about ownership.

Add const in a few places where it makes sense.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #10997
2020-10-02 17:44:10 -07:00
John Poduska 5b525165e9 Mismatched nvlist names in zfs_keys_send_space
This causes "zfs send -vt ..." to fail with:

    cannot resume send: Unknown error 1030

It turns out that some of the name/value pairs in the verification
list for zfs_ioc_send_space(), zfs_keys_send_space, had the wrong
name, so the ioctl got kicked out in zfs_check_input_nvpairs().
Update the names accordingly.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: John Poduska <jpoduska@datto.com>
Closes #10978
2020-10-02 17:40:46 -07:00
Brian Behlendorf 266d1121c3 Fix buggy procfs_list_seq_next warning
The kernel seq_read() helper function expects ->next() to update
the passed position even there are no more entries.  Failure to
do so results in the following warning being logged.

    seq_file: buggy .next function procfs_list_seq_next [spl]
    did not update position index

Functionally there is no issue with the way procfs_list_seq_next()
is implemented and the warning is harmless.  However, we want to
silence this some what scary incorrect warning.  This commit
updates the Linux procfs code to advance the position even for
the last entry.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #10984 
Closes #10996
2020-09-30 13:27:51 -07:00
Ryan Moeller d688beb191 FreeBSD: Fix legacy compat for platform IOCs
The request number is out of bounds of the platform table.

Subtract the starting offset to get the correct subscript.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10994
2020-09-30 13:25:50 -07:00
Matthew Macy 1cb8202b1b Eliminate gratuitous bzeroing in dbuf_stats_hash_table_data
`dbuf_stats_hash_table_data` can take much longer than it needs to
by repeatedly bzeroing its buffer when in fact the buffer only needs
to be NULL terminated.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #10993
2020-09-30 13:24:38 -07:00
Sebastian Gottschall 8a171ccd92 do a cyclic seek for unused memory objects in pool
In non regular use cases allocated memory might stay persistent in memory
pool. This small patch checks every minute if there are old objects which
can be released from memory pool.

Right now with regular use, the pool is checked for old objects on each
allocation attempt from this pool. so basically polling by its use. Now
consider what happens if someone writes a lot of files and stops use of
the volume or even unmounts it. So the code will no longer check if
objects can be released from the pool. Already allocated objects will
still stay in pool cache. this is no big issue for common use. But
someone discovered this issue while doing tests. personally i know this
behavior and I'm aware of it. Its no big issue. just a enhancement

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl>
Signed-off-by: Sebastian Gottschall <s.gottschall@dd-wrt.com>
Closes #10938 
Closes #10969
2020-09-30 13:22:34 -07:00
Ryan Moeller c0bd2e0fe2 Drop references when skipping dmu_send due to EXDEV
When an invalid incremental send is requested where the "to" ds is
before the "from" ds, make sure to drop the reference to the pool
and the dataset before returning the error.

Add an assert on FreeBSD to make sure we don't hold any locks after
returning from an ioctl.

Add some test coverage.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10919
2020-09-30 13:19:49 -07:00
Kjeld Schouten-Lebbing 68cdafdbb8 Add intel_QAT patches
Add community compatibility patches for Intel QAT
Due to incompatibility with higher kernel versions.

Also includes basic instructions.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl>
Closes #10961 
Closes #10962
2020-09-30 13:17:30 -07:00
Brian Behlendorf 5aa3e3d3be Use known license string for zzstd
The Linux kernel MODULE_LICENSE macro only recognizes a handful of
license strings and "BSD" is not one of the them.  Update the macro
to use "Dual BSD/GPL" which is recognized and what the kernel expects
BSD licensed module to use.

Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #10982
Closes #10992
2020-09-28 18:43:27 -07:00
Brian Behlendorf 4c5159cc5c Fix CONFIG_DEBUG_LOCK_ALLOC configure check
This check was accidentally broken when the kABI checks were updated
to run in parallel, commit 608f874.  The check must be for the
config_debug_lock_alloc_license name to determine if the symbol
is license compatible.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #10991
2020-09-28 16:42:54 -07:00
Brian Behlendorf 96951e0327 Fix objtool configure check
The m4 objtool configure check can incorrectly fail because of a
missing header in the test.  This appears to be the result of a
recent kernel change and was observed on the Fedora 5.8.11-200
kernel.

  In file included from /home/fedora/zfs/build/objtool/objtool.c:75:
  ./arch/x86/include/asm/frame.h:100:57: error: 'struct pt_regs'
      declared inside parameter list will not be visible outside
      of this definition or declaration [-Werror]

The consequence of this is that the "stack_frame_non_standard"
check is never run and HAVE_STACK_FRAME_NON_STANDARD is set
incorrectly which results in a build failure.  This change adds
the appropriate header to the "objtool" check so it now behaves
as intended.

Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #10990
2020-09-28 16:40:50 -07:00
grodik 53245fe337 Note that keys must be loaded for 'zpool remove'
The error returned by `zpool remove` when the encryption keys aren't
loaded isn't very helpful.  Furthermore, the man pages make no
mention that the keys need to be loaded. This change doesn't resolve
the error message but it does update the man page to mention this
requirement.

Authored-by: grodik <pat@litke.dev>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #10939
Closes #10948
2020-09-28 16:11:57 -07:00
Kjeld Schouten-Lebbing 595b7e99d3 Document branching structure
This change documents the currently used branching structure.
It has been cut down to not include any controversial changes.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl>
Closes #10976
2020-09-28 13:23:49 -07:00
Matthew Macy af20b97078 zfetch: Don't issue new streams when old have not completed
The current dmu_zfetch code implicitly assumes that I/Os complete
within min_sec_reap seconds. With async dmu and a readonly workload
(and thus no exponential backoff in operations from the "write
throttle") such as L2ARC rebuild it is possible to saturate the drives
with I/O requests. These are then effectively compounded with prefetch
requests.

This change reference counts streams and prevents them from being
recycled after their min_sec_reap timeout if they still have
outstanding I/Os.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #10900
2020-09-27 17:08:38 -07:00
Allan Jude cf2667759f zfs userspace: use zfs_path_to_zhandle so argument can be a path
Change zfs userspace subcommand to use zfs_path_to_zhandle() so that
the provided dataset can be a path (/usr) or a dataset (rpool/usr).

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Closes #8915
2020-09-25 14:37:10 -07:00
Adam D. Moss acfd2d4641 Add DB_RF_NOPREFETCH to dbuf_read()s in dnode.c
Prefetching of dnodes in dbuf_read() can cause significant mutex 
contention for some workloads and isn't very helpful.  This is  
because we already get 32 dnodes for each block read, and when 
iterating over a directory we prefetch the dnodes in the directory.
Disable this prefetching to prevent the lock contention.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Submitted-by: Adam Moss <c@yotes.com>
Submitted-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Adam Moss <c@yotes.com>
Closes #10877 
Closes #10953
2020-09-25 13:49:22 -07:00
Brian Behlendorf 2e407941a2 Fix PREEMPTION=y and BLK_CGROUP=y config on arm64
With PREEMPTION=y and BLK_CGROUP=y preempt_schedule_notrace() is being
used on arm64 which is a GPL-only function and hence the build of the
DKMS kernel module fails.

Fix that by redefining preempt_schedule_notrace() to preempt_schedule()
which should be safe as long as tracing is not used.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Juerg Haefliger <juergh@canonical.com>
Closes #8545 
Closes #9948 
Closes #10416 
Closes #10973
2020-09-25 13:28:35 -07:00
Mateusz Guzik f6bb7c029c FreeBSD: update cache_purgevfs usage after 1300117 version bump
Reviewed-by: Ryan Moeller <freqlabs@FreeBSD.org>
Reviewed-by: Nick Wolff <darkfiberiru@gmail.com>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #10970
2020-09-25 13:23:43 -07:00
Ryan Moeller fa1912e80f FreeBSD: Code cleanup in zio_crypt
Address some unused value and control flow issues flagged by Coverity.

Unreachable code is pruned and unused values are avoided.
Some scattered sections are reordered for coherence.

We can assume kmem_alloc(n, KM_SLEEP) doesn't fail, so there is no need
to check if it returned NULL.  The allocated memory doesn't need to be
zeroed, other than the last iovec (the MAC).

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10884
2020-09-25 13:12:35 -07:00
Ryan Moeller 863e38453e Prune dead branch reported by Coverity
wkey is NULL at every `goto error;`.
dcp is never NULL.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10884
2020-09-25 13:11:53 -07:00
George Wilson bd76bcb36d zpool command complains about /etc/exports.d
If the /etc/exports.d directory does not exist, then we should only
create it when we're performing an action which already requires root
privileges.

This commit moves the directory creation to the enable/disable code
path which ensures that we have the appropriate privileges.

Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Wilson <gwilson@delphix.com>
Closes #10785
Closes #10934
2020-09-25 13:09:40 -07:00
Christian Schwarz a5c77dc4d5 zfs_log_write: simplify data copying code for WR_COPIED records
lr_write_t records that are WR_COPIED have the record data directly
appended to them (see lr_write_t type definition).

The data is copied from the debuf using dmu_read_by_dnode.

This function was called, only for WR_COPIED records, as part of a
short-circuiting if-statement's if-expression.

I found this side-effectful call to dmu_read_by_dnode pretty
hard to spot.
This patch improves readability by moving the call to its own line.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Wilson <gwilson@delphix.com>
Signed-off-by: Christian Schwarz <me@cschwarz.com>
Closes #10956
2020-09-25 13:06:34 -07:00
Matthew Macy 7b8363d7f0 FreeBSD: Add support for procfs_list
The procfs_list interface is required by several kstats. Implement
this functionality for FreeBSD to provide access to these kstats.
                           
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #10890
2020-09-23 16:43:51 -07:00
Matthew Macy 3dad29fb4b FreeBSD: Don't save user FPU context in kernel threads
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Ryan Moeller <freqlabs@FreeBSD.org>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #10899
2020-09-23 11:09:48 -07:00
Kjeld Schouten-Lebbing eb267f08cf Update issue templates, commitcheck and Contributing.md
- Removes OpenZFS ports from commit check
- Removes OpenZFS ports from CONTRIBUTING.md
- Adds mailings lists and IRC to issue template selector
- Remove blank issue option from issue creator

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl>
Closes #10965
2020-09-23 09:53:26 -07:00
Paul Dagnelie 20dfe8cd3b Don't set numobjs to UINT64_MAX or near it
Resolves an issue with `zfs send` streams from 0.8.4 which
prevents them from being received by versions < 0.7.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Paul Zuchowski <pzuchowski@datto.com>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #10911 
Closes #10916
2020-09-22 16:16:07 -07:00
наб e865e7809e contrib/initramfs: fix shellcheck and checkbashisms errors with shebang
Reviewed-by: Gabriel A. Devenyi <gdevenyi@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #10908 
Closes #10917
2020-09-22 16:10:09 -07:00
George Amanakis c6f5e9d92f Restore clearing of L2CACHE flag in arc_read_done()
Commit 45152dc removed clearing of L2CACHE flag in arc_read_done() and
moved related code in l2arc_write_eligible(). After careful code
inspection arc_read_done() is not bypassed in the case of prefetches.
Thus restore the old behavior.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: adam moss <c@yotes.com>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #10951
2020-09-22 16:08:05 -07:00
Mark Johnston 0daa0320e9 Fix a logic bug in the FreeBSD getpages VOP
In commit cd32b4f5b7 ("Fix a deadlock in the FreeBSD getpages VOP") I
introduced a bug while porting the patch originally committed to
FreeBSD: the rangelock pointer may be NULL if the try operation failed,
so we must avoid calling zfs_rangelock_unlock() in that case.

Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Reported-by: Steve Wills <swills@FreeBSD.org>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes #10519 
Closes #10960
2020-09-22 16:05:52 -07:00
Ryan Moeller 5f8a9e6a02 FreeBSD: Reduce stack usage of Lua
Use the same reduced buffer size for lauxlib that is used on Linux.

Fixes panic on HEAD in lua gsub test designed to exhaust stack space.

With this we can remove the special case to reserve more stack space
on FreeBSD.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Kyle Evans <kevans@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10959
2020-09-22 16:03:11 -07:00
Mark Johnston 6bdb09510b Annontate FreeBSD sysctls with CTLFLAG_MPSAFE
Without this, the sysctl system calls will acquire a global lock before
invoking the handler.  This is noticeable in some situations when
running top(1).  The global lock is mostly vestigal but continues to see
some use and so contention is still a problem; until the default sense
of the MPSAFE flag changes, we have to annotate each and every handler.

Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes #10836
2020-09-21 09:49:50 -07:00
Mark Johnston c50f3c902f Fix switch statement indentation in the FreeBSD kstat code
This is in preparation for some functional changes.

Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes #10950
2020-09-21 09:49:12 -07:00
George Amanakis 37b00fb039 Update documentation of l2arc_mfuonly
with regard to evicted_l2_eligibile_mru. Even if l2arc_mfuonly is
enabled, this is not reflected in evicted_l2_eligible_mru as this
information is useful for deciding whether to toggle l2arc_mfuonly
depending on the current workload.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #10945
2020-09-21 09:26:24 -07:00
George Wilson c494aa7f57 vdev_ashift should only be set once
== Motivation and Context

The new vdev ashift optimization prevents the removal of devices when
a zfs configuration is comprised of disks which have different logical
and physical block sizes. This is caused because we set 'spa_min_ashift'
in vdev_open and then later call 'vdev_ashift_optimize'. This would
result in an inconsistency between spa's ashift calculations and that
of the top-level vdev.

In addition, the optimization logical ignores the overridden ashift
value that would be provided by '-o ashift=<val>'.

== Description

This change reworks the vdev ashift optimization so that it's only
set the first time the device is configured. It still allows the
physical and logical ahsift values to be set every time the device
is opened but those values are only consulted on first open.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Cedric Berger <cedric@precidata.com>
Signed-off-by: George Wilson <gwilson@delphix.com>
External-Issue: DLPX-71831
Closes #10932
2020-09-18 12:13:47 -07:00
Allan Jude 908d43d0a9 libzfs: Don't leak buf if nvlist is too large
Resolves FreeBSD Coverity defect:
CID 1432398:  Resource leaks  (RESOURCE_LEAK)

libzfs: don't leak hdl if there is an error reading env var

Resolves FreeBSD Coverity defect:
CID 1432395:  Resource leaks  (RESOURCE_LEAK)

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Allan Jude <allanjude@freebsd.org>
Closes #10882
2020-09-18 10:23:29 -07:00
George Wilson 8e82ffba7b pool may become suspended during device expansion
When expanding a device zfs needs to rescan the partition table to
get the correct size. This can only happen when we're in the kernel
and requires the device to be closed. As part of the rescan, udev is
notified and the device links are removed and recreated. This leave a
window where the vdev code may try to reopen the device before udev
has recreated the link. If that happens, then the pool may end up in
a suspended state.

To correct this, we leverage the BLKPG_RESIZE_PARTITION ioctl which
allows the partition information to be modified even while it's in use.
This ioctl also does not remove the device link associated with the zfs
data partition so it eliminates the race condition that can occur in
the kernel.

Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Wilson <gwilson@delphix.com>
Closes #10897
2020-09-17 20:03:10 -07:00
Matthew Ahrens a57f954226 zdb leak detection fails with in-progress device removal
When a device removal is in progress, there are 2 locations for the data
that's already been moved: the original location, on the device that's
being removed; and the new location, which is pointed to by the indirect
mapping.  When doing leak detection, zdb needs to know about both
locations.  To determine what's already been copied, we load the
spacemaps of the removing vdev, omit the blocks that are yet to be
copied, and then use the vdev's remap op to find the new location.

The problem is with an optimization to the spacemap-loading code in zdb.
When processing the log spacemaps, we ignore entries that are not
relevant because they are past the point that's been copied.  However,
entries which span the point that's been copied (i.e. they are partly
relevant and partly irrelevant) are processed normally.  This can lead
to an illegal spacemap operation, for example if offsets up to 100KB
have been copied, and the spacemap log has the following entries:

	ALLOC 50KB-150KB (partly relevant)
	FREE 50KB-100KB (entirely relevant)
	FREE 100KB-150KB (entirely irrlevant - ignored)
	ALLOC 50KB-150KB (partly relevant)

Because the entirely irrelevant entry was ignored, its space remains in
the spacemap.  When the last entry is processed, we attempt to add it to
the spacemap, but it partially overlaps with the 100-150KB entry that
was left over.

This problem was discovered by ztest/zloop.

One solution would be to also ignore the irrelevant parts of
partially-irrelevant entries (i.e. when processing the ALLOC 50-150, to
only add 50-100 to the spacemap).  However, this commit implements a
simpler solution, which is to remove this optimization entirely.  I.e.
to process the entire spacemap log, without regard for the point that's
been copied.  After reconstructing the entire allocatable range tree,
there's already code to remove the parts that have not yet been copied.

Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
External-issue: DLPX-71820
Closes #10920
2020-09-17 10:55:30 -07:00
Ryan Moeller 3c7566cb0d FreeBSD: Do not copy vp into f_data for DTYPE_VNODE files
https://reviews.freebsd.org/D26346

Do not copy vp into f_data for DTYPE_VNODE files.  The vnode pointer is
already stored in f_vnode.  Use that so f_data can be reused.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10929
2020-09-17 10:54:14 -07:00
John Poduska 5bed68bdc4 Need a long hold in zpl_mount_impl
In zpl_mount_impl, there is:
    dmu_objset_hold	; returns with pool & ds held
    dsl_pool_rele

    sget

    dsl_dataset_rele

As spelled out in the "DSL Pool Configuration Lock" in dsl_pool.c,
this requires a long hold.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Paul Zuchowski <pzuchowski@datto.com>
Signed-off-by: John Poduska <jpoduska@datto.com>
Closes #10936
2020-09-17 10:53:02 -07:00
Toomas Soome 741b20ce0c libzfsbootenv: lzbe_nvlist_set needs to store bootenv version VB_NVLIST
A small bug did slip into initial libzfsbootenv; while storing nvlist
in nvlist, we should make sure the bootenv is using VB_NVLIST format.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Toomas Soome <tsoome@me.com>
Closes #10937
2020-09-17 10:51:09 -07:00
Ryan Moeller 7ead2be3d2 Rename acltype=posixacl to acltype=posix
Prefer acltype=off|posix, retaining the old names as aliases.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10918
2020-09-16 12:26:06 -07:00
Georgy Yakovlev 9cc177baa0 cmd/zgenhostid: replace with simple c implementation
It was discovered that dracut scripts and zgenhostid
always generate little-endian /etc/hostid.

This commit provides simple endianess-aware binary
and updates the scripts to use it.

New features include:
 -f flag to force overwrite.
 -o flag to write to different file (for dracut)
 accepting both 0x01234567 and 01234567 values as input

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Olaf Faaland <faaland1@llnl.gov>
Signed-off-by: Georgy Yakovlev <gyakovlev@gentoo.org>
Closes #10887 
Closes #10925
2020-09-16 12:25:12 -07:00
Pavel Snajdr 9569c31161 Fix stack frame size: dnode_dirty_l1range()
Reviewed-by: Ryan Moeller <freqlabs@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Pavel Snajdr <snajpa@snajpa.net>
Closes #10879
2020-09-15 15:55:55 -07:00
Pavel Snajdr a1c5578ce0 dmu_redact_snap: fix possible memleak
Reviewed-by: Ryan Moeller <freqlabs@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Pavel Snajdr <snajpa@snajpa.net>
Closes #10879
2020-09-15 15:55:45 -07:00
Pavel Snajdr 8c0b16e6e9 Fix stack frame size: dmu_redact_snap()
Reviewed-by: Ryan Moeller <freqlabs@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Pavel Snajdr <snajpa@snajpa.net>
Closes #10879
2020-09-15 15:55:35 -07:00
Pavel Snajdr c95625769d Fix stack frame size: spa_livelist_delete_cb()
Reviewed-by: Ryan Moeller <freqlabs@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Pavel Snajdr <snajpa@snajpa.net>
Closes #10879
2020-09-15 15:55:03 -07:00
наб 5797d7cc4c zpoolprops.8: fix raidz par[i]ty typo
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #10923
2020-09-15 15:43:42 -07:00
Toomas Soome 1db9e6e4e4 zfs label bootenv should store data as nvlist
nvlist does allow us to support different data types and systems.

To encapsulate user data to/from nvlist, the libzfsbootenv library is
provided.

Reviewed-by: Arvind Sankar <nivedita@alum.mit.edu>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Signed-off-by: Toomas Soome <tsoome@me.com>
Closes #10774
2020-09-15 15:42:27 -07:00
Ryan Moeller 37325e4749 Linux: Prevent destruction while showing mount devname
Use ZFS_ENTER and ZFS_EXIT to protect datasets while their mount
devname is being retrieved.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10892
Closes #10927
2020-09-15 15:40:03 -07:00
George Amanakis 085321621e Add L2ARC arcstats for MFU/MRU buffers and buffer content type
Currently the ARC state (MFU/MRU) of cached L2ARC buffer and their
content type is unknown. Knowing this information may prove beneficial
in adjusting the L2ARC caching policy.

This commit adds L2ARC arcstats that display the aligned size
(in bytes) of L2ARC buffers according to their content type
(data/metadata) and according to their ARC state (MRU/MFU or
prefetch). It also expands the existing evict_l2_eligible arcstat to
differentiate between MFU and MRU buffers.

L2ARC caches buffers from the MRU and MFU lists of ARC. Upon caching a
buffer, its ARC state (MRU/MFU) is stored in the L2 header
(b_arcs_state). The l2_m{f,r}u_asize arcstats reflect the aligned size
(in bytes) of L2ARC buffers according to their ARC state (based on
b_arcs_state). We also account for the case where an L2ARC and ARC
cached MRU or MRU_ghost buffer transitions to MFU. The l2_prefetch_asize
reflects the alinged size (in bytes) of L2ARC buffers that were cached
while they had the prefetch flag set in ARC. This is dynamically updated
as the prefetch flag of L2ARC buffers changes.

When buffers are evicted from ARC, if they are determined to be L2ARC
eligible then their logical size is recorded in
evict_l2_eligible_m{r,f}u arcstats according to their ARC state upon
eviction.

Persistent L2ARC:
When committing an L2ARC buffer to a log block (L2ARC metadata) its
b_arcs_state and prefetch flag is also stored. If the buffer changes
its arcstate or prefetch flag this is reflected in the above arcstats.
However, the L2ARC metadata cannot currently be updated to reflect this
change.
Example: L2ARC caches an MRU buffer. L2ARC metadata and arcstats count
this as an MRU buffer. The buffer transitions to MFU. The arcstats are
updated to reflect this. Upon pool re-import or on/offlining the L2ARC
device the arcstats are cleared and the buffer will now be counted as an
MRU buffer, as the L2ARC metadata were not updated.

Bug fix:
- If l2arc_noprefetch is set, arc_read_done clears the L2CACHE flag of
  an ARC buffer. However, prefetches may be issued in a way that
  arc_read_done() is bypassed. Instead, move the related code in
  l2arc_write_eligible() to account for those cases too.

Also add a test and update manpages for l2arc_mfuonly module parameter,
and update the manpages and code comments for l2arc_noprefetch.
Move persist_l2arc tests to l2arc.

Reviewed-by: Ryan Moeller <freqlabs@FreeBSD.org>
Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #10743
2020-09-14 10:10:44 -07:00
Harald van Dijk d0cea309e7 config/zfs-build.m4: never define _initramfs in RPM_DEFINE_UTIL
The zfs-initramfs package has never worked as no RPM-based distribution
uses initramfs-tools, which is listed as a dependency of zfs-initramfs.

This would not ordinarily be a problem, as it is only enabled when
/usr/share/initramfs-tools is present, which should not normally be the
case on RPM-based distributions. However, other packages may install
unused files there even if initramfs-tools is not used, so remove this
auto-detection for the rpm-utils target.

This does not fully remove the logic for the zfs-initramfs package. This
splits it out into a separate rpm-utils-initramfs target so that the
Debian builds can still use it.

Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Harald van Dijk <harald@gigawatt.nl>
Closes #10898
2020-09-12 08:22:07 -07:00
Matthew Ahrens 9b77c57d5a libzutil depends on libnvpair
libzutil depends on libnvpair, but this dependency is undeclared in the
build system.  Therefore it isn't possible to make a new command that
depends on libzutil, but does not (directly) depend on libnvpair.

This commit makes this dependency explicit.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reivewed-by: Ryan Moeller <freqlabs@FreeBSD.org>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #10915
2020-09-12 08:19:48 -07:00
Mateusz Guzik 8e7fe49b25 FreeBSD: convert teardown inactive lock to a read-mostly sleepable lock
The lock is taken all the time and as a regular read-write lock
avoidably serves as a mount point-wide contention point.

This forward ports FreeBSD revision r357322.

To quote aforementioned commit:

Sample result doing an incremental -j 40 build:
before: 173.30s user 458.97s system 2595% cpu 24.358 total
after:  168.58s user 254.92s system 2211% cpu 19.147 total

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Ryan Moeller <freqlabs@FreeBSD.org>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #10896
2020-09-09 10:15:52 -07:00
xdch47 c2c7ca0d6d Force the use of '.' as decimal separator.
This solves issues occurring with a different decimal operator and
keeps the command line interface consistent for all locales .
E.g. `zfs set quota=0.5T`

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Felix Neumärker <xdch47@posteo.de>
Closes #10878
2020-09-09 10:14:04 -07:00
Olaf Faaland a74259cea0 Initialize mmp_last_write when the mmp thread starts
A great deal of time may go by between when mmp_init() is called and
the MMP thread starts, particularly if there are bad devices, because
there is I/O checking configs etc.  If this time is too long,

    (gethrtime() - mmp_last_write) > mmp_fail_ns

at the time the MMP thread starts.  If MMP is configured to suspend
the pool, the pool will be suspended immediately.

This can be seen in issue #10838

The value of mmp_last_write doesn't matter before the mmp thread
starts.  To give the MMP thread time to issue and land MMP writes,
initialize mmp_last_write when the MMP thread starts.

Reviewed-by: Giuseppe Di Natale <guss80@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Closes #10873
2020-09-09 10:12:54 -07:00
Ryan Moeller cba6d86638 FreeBSD: drop dependency on cryptodev module
We only need the kernel interfaces in crypto, not the device node in
cryptodev.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10901
2020-09-09 10:10:32 -07:00
George Amanakis feb3a7eef1 Introduce ZFS module parameter l2arc_mfuonly
In certain workloads it may be beneficial to reduce wear of L2ARC
devices by not caching MRU metadata and data into L2ARC. This commit
introduces a new tunable l2arc_mfuonly for this purpose.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #10710
2020-09-08 11:44:37 -07:00
Ryan Moeller ebc4b52369 Avoid possibility of division by zero
When hz > 1000, msec / (1000 / hz) results in division by zero.

I found somewhere in FreeBSD using howmany(msec * hz, 1000) to convert
ms to ticks, avoiding the potential for a zero in the divisor.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #10894
2020-09-08 11:39:16 -07:00
Toomas Soome 189272f78a dnode_special_open() error: unchecked function return 'zrl_tryenter'
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Toomas Soome <tsoome@me.com>
Closes #10876
2020-09-08 11:36:52 -07:00
Peter Dave Hello 5a877a894e Add a missing option prefix - in zfs-tests.sh usage()
Reviewed-by: Giuseppe Di Natale <guss80@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Peter Dave Hello <hsu@peterdavehello.org>
Closes #10893
2020-09-08 09:04:36 -07:00
Fabio Buso 5266cf4826 Display pbkdf2iters property as plain number
The pbkdf2iters property is an iteration counter
and should be displayed as plain number rather
than in binary unit.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Fabio Buso <buso.fabio@gmail.com>
Closes #10871
2020-09-08 08:49:55 -07:00
alaviss 75bf636cd0 libshare: Add missing headers for nfs.c
On musl libc, zfs failed to compile due to the missing <fcntl.h>
include, which is required for `open()` per POSIX.

This commit add the missing <fcntl.h> include.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Hiếu Lê <leorize+oss@disroot.org>
Closes #10880
2020-09-04 12:03:57 -07:00
Matthew Macy 7432d29760 FreeBSD: reduce priority of ZIO_TASKQ_ISSUE writes by a larger value
On FreeBSD, if priorities divided by four (RQ_PPQ) are equal then
a difference between them is insignificant. In other words,
incrementing pri by only one as on Linux is insufficient.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #10872
2020-09-04 11:13:27 -07:00
Ryan Moeller ef55446a9c Spruce up pkg-config files for libzfs/libzfs_core
Several of the listed library dependencies are not relevant on FreeBSD.
Have ./configure save libraries that are found via pkg-config as
${LIB}_PC and use the configured automake variables instead of hard
coded names so we only get what was actually needed.

While here, update the URL to point at the OpenZFS Github repo.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10869
2020-09-04 11:11:18 -07:00
Ryan Moeller 3ec88ab205 man: Cross-reference zfs-load-key(8) for ENCRYPTION mention
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Submitted-by: Harry Schmalzbauer
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10866
2020-09-04 10:53:59 -07:00
Ryan Moeller 782b1c1249 man: Add zfs rename -r to zfs-rename(8) SYNOPSIS
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10866
2020-09-04 10:53:22 -07:00
Brian Behlendorf dce63135ae Sequential scrub and resilver updated comments
Commit d4a72f2 which introduced multi-phase scrubs and resilvers
continued the work presented by Nexenta at the 2016 ZFS developer
summit.  Update the source to reflect their contribution.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2020-09-04 10:51:51 -07:00
Don Brady 4f07282786 Avoid posting duplicate zpool events
Duplicate io and checksum ereport events can misrepresent that 
things are worse than they seem. Ideally the zpool events and the 
corresponding vdev stat error counts in a zpool status should be 
for unique errors -- not the same error being counted over and over. 
This can be demonstrated in a simple example. With a single bad 
block in a datafile and just 5 reads of the file we end up with a 
degraded vdev, even though there is only one unique error in the pool.

The proposed solution to the above issue, is to eliminate duplicates 
when posting events and when updating vdev error stats. We now save 
recent error events of interest when posting events so that we can 
easily check for duplicates when posting an error. 

Reviewed by: Brad Lewis <brad.lewis@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Don Brady <don.brady@delphix.com>
Closes #10861
2020-09-04 10:34:28 -07:00
Matthew Ahrens 3808032489 nowait synctask must succeed
If a `zfs_space_check_t` other than `ZFS_SPACE_CHECK_NONE` is used with
`dsl_sync_task_nowait()`, the sync task may fail due to ENOSPC.
However, there is no way to notice or communicate this failure, so it's
extremely difficult to use this functionality correctly, and in fact
almost all callers use `ZFS_SPACE_CHECK_NONE`.

This commit removes the `zfs_space_check_t` argument from
`dsl_sync_task_nowait()`, and always uses `ZFS_SPACE_CHECK_NONE`.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #10855
2020-09-04 10:29:39 -07:00
Ryan Moeller cd80273909 Retain thread name when resuming a zthr
When created, a zthr is given a name to identify it by.  This name is
lost when a cancelled zthr is resumed.

Retain the name of a zthr so it can be used when resuming.

Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10881
2020-09-03 20:09:52 -07:00
Alexander Richardson f3064162ba Fixes for running FreeBSD buildworld on Linux/macOS hosts
Adding an #ifdef __FreeBSD__ to a FreeBSD-specific header may seem odd,
but these headers are used on non-FreeBSD systems during the bootstrap
tools phase.
Originally submitted downstream as https://reviews.freebsd.org/D26193

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alex Richardson <Alexander.Richardson@cl.cam.ac.uk>
Closes #10863
2020-09-03 20:06:03 -07:00
Matthew Macy ac6e5fb202 Replace cv_{timed}wait_sig with cv_{timed}wait_idle where appropriate
There are a number of places where cv_?_sig is used simply for
accounting purposes but the surrounding code has no ability to
cope with actually receiving a signal. On FreeBSD it is possible
to send signals to individual kernel threads so this could
enable undesirable behavior.

This patch adds routines on Linux that will do the same idle
accounting as _sig without making the task interruptible. On
FreeBSD cv_*_idle  are all aliases for cv_*

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #10843
2020-09-03 20:04:09 -07:00
Garrett Fields f30703f6fc Remove 'ZFS on Linux' references from PR Template
As mentioned in the #OpenZFS IRC channel (thanks "Toomas Soome"):
The OpenZFS PR Template still mentions "ZFS on Linux".
This changes that reference and updates the URLs.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Garrett Fields <ghfields@gmail.com>
Closes #10868
2020-09-03 16:31:05 -07:00
Spencer Kinny 6fffc88bf3 Links in Source Files
Added comments in following files
with links to Illumos manual pages:

./module/avl/avl.c
./module/nvpair/nvpair.c
./module/os/linux/spl/spl-kstat.c
./module/os/freebsd/spl/spl_kstat.c

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Spencer Kinny <spencerkinny1995@gmail.com>
Closes #5113 
Closes #10859
2020-09-02 09:42:12 -07:00
Toomas Soome 0db1b84a6d zvol: unsigned off can not be less than zero
Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Toomas Soome <tsoome@me.com>
Closes #10867
2020-09-02 09:30:29 -07:00
Alexander Richardson 417e646722 Fix -Werror,-Wmacro-redefined in limits.h
Those macros are also defined by the compiler-provided float.h which
will be included later on (at least in the FreeBSD buildworld case) and
triggers these -Werror warnings. Including <float.h> first and only
defining the macros when DBL_DIG/FLT_DIG is missing fixes this problem.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alex Richardson <Alexander.Richardson@cl.cam.ac.uk>
Closes #10864
2020-09-01 16:22:09 -07:00
Ryan Moeller 964791acdc Make spa_stats.c tunables visible on FreeBSD
Use ZFS_MODULE_PARAM for cross-platform tunables in spa_stats.c, and
add update tunables.cfg in tests for the newly supported ones.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10858
2020-09-01 16:19:19 -07:00
Matthew Macy e84e49218f FreeBSD: Fix up after spa_stats.c move
Moving spa_stats added the additional burden of supporting
KSTAT_TYPE_IO.

spa_state_addr will always return a valid value regardless of
the value of 'n'. On FreeBSD this will cause an infinite loop
as it relies on the raw ops addr routine to indicate that there
is no more data.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <freqlabs@FreeBSD.org>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #10860
2020-09-01 16:16:56 -07:00
Ryan Moeller 7b4e27232d Add 'zfs rename -u' to rename without remounting
Allow to rename file systems without remounting if it is possible.
It is possible for file systems with 'mountpoint' property set to
'legacy' or 'none' - we don't have to change mount directory for them.
Currently such file systems are unmounted on rename and not even
mounted back.

This introduces layering violation, as we need to update
'f_mntfromname' field in statfs structure related to mountpoint (for
the dataset we are renaming and all its children).

In my opinion it is worth it, as it allow to update FreeBSD in even
cleaner way - in ZFS-only configuration root file system is ZFS file
system with 'mountpoint' property set to 'legacy'. If root dataset is
named system/rootfs, we can snapshot it (system/rootfs@upgrade), clone
it (system/oldrootfs), update FreeBSD and if it doesn't boot we can
boot back from system/oldrootfs and rename it back to system/rootfs
while it is mounted as /. Before it was not possible, because
unmounting / was not possible.

Authored by: Pawel Jakub Dawidek <pjd@FreeBSD.org>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Ported by: Matt Macy <mmacy@freebsd.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10839
2020-09-01 16:14:16 -07:00
Ryan Moeller 88d19d7cc2 FreeBSD: Remove unused SECLABEL code
SECLABEL is undefined on FreeBSD and should be pruned.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #10847
2020-08-31 19:52:46 -07:00
Ryan Moeller 46b7d53baf libspl: Provide platform-specific zone implementations
FreeBSD has the concept of jails, a precursor to Solaris's zones, which
can be mapped to the required zones interface with relative ease.  The
previous ZFS implementation in FreeBSD did so, and we should continue
to provide an appropriate implementation in OpenZFS as well.

Move lib/libspl/zone.c into platform code and adopt the correct
implementation for FreeBSD.

While here, prune unused code.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #10851
2020-08-31 19:43:30 -07:00
Ryan Moeller eff621071f FreeBSD: Simplify INGLOBALZONE
FreeBSD's previous ZFS implemented INGLOBALZONE(thread) as
(!jailed((thread)->td_ucred)) and passed curthread to INGLOBALZONE.

We pass curproc instead of curthread, so we can achieve the same effect
with (!jailed((proc)->p_ucred)).  The implementation is trivial enough
to fit on a single line in a define.  We don't really need a whole
separate function for something that's already macros all the way down.

Eliminate in_globalzone.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #10851
2020-08-31 19:43:08 -07:00
Ryan Moeller 2f65c7a608 FreeBSD: Define crgetzoneid appropriately
The previous ZFS implementation on FreeBSD had ifdefs to use jailed()
instead of crgetzoneid() in dsl_dir.c, however we can simply provide an
appropriate definition of crgetzoneid for the same effect.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #10851
2020-08-31 19:42:26 -07:00
Toomas Soome 1144586b57 zio_ereport_post() and zio_ereport_start() return values are ignored
use (void) to silence analyzers.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Toomas Soome <tsoome@me.com>
Closes #10857
2020-08-31 19:35:11 -07:00
Spencer Kinny abe4fbfd01 Typo Correction
Corrected the typo in zfs/cmd/zfs/zfs_main.c
line number 404 pbkfd2iters to pbkdf2iters

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Spencer Kinny <spencerkinny1995@gmail.com>
Closes #10850
2020-08-30 14:14:32 -07:00
Matthew Macy 7bb18b94c7 Move spa_stats.c to common code
Initially it was considered simplest to stub out all
of the functions on FreeBSD. Now that FreeBSD supports
KSTAT_TYPE_RAW at least some of the functionality should
be made available.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #10842
2020-08-30 14:12:46 -07:00
Matthew Macy cd23154a8c FreeBSD: Fix spurious failure in zvol_geom_open
In zvol_geom_open on first open we need to guarantee
that the namespace lock is held to avoid spurious
failures in zvol_first_open.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <freqlabs@FreeBSD.org>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #10841
2020-08-30 14:11:33 -07:00
Kjeld Schouten-Lebbing 0e6a7aaac0 Auto close "Status: Feedback requested" after a month
This commit closes issues labeled with:
"Status: Feedback requested" after 1 month, if the
label is not removed or the author has not responded

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Signed-off-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl>
Closes #10807 
Closes #10808
2020-08-30 14:09:54 -07:00
Matthew Macy d436de6389 FreeBSD: add support for KSTAT_TYPE_RAW
A few kstats use KSTAT_TYPE_RAW to provide a string generated on
demand.  Implementing these as sysctls was punted until now.

Reviewed by: Toomas Soome <tsoome@me.com>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #10836
2020-08-29 20:59:50 -07:00
Brian Behlendorf 3e29e1971b Linux 5.9 compat: NR_SLAB_RECLAIMABLE
Commit dcdc12e added compatibility code to treat NR_SLAB_RECLAIMABLE_B
as if it were the same as NR_SLAB_RECLAIMABLE.  However, the new value
is in bytes while the old value was in pages which means they are not
interchangeable.

The only place the reclaimable slab size is used is as a component of
the calculation done by arc_free_memory().  This function returns the
amount of memory the ARC considers to be free or reclaimable at little
cost.  Rather than switch to a new interface to get this value it has
been removed it from the calculation.  It is normally a minor component
compared to the number of inactive or free pages, and removing it
aligns the behavior with the FreeBSD version of arc_free_memory().

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Coleman Kane <ckane@colemankane.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #10834
2020-08-29 20:57:45 -07:00
Richard Laager 62663fb7ec Fix another dependency loop
zfs-load-key-DATASET.service was gaining an
After=systemd-journald.socket due to its stdout/stderr going to the
journal (which is the default).  systemd-journald.socket has an After
(via RequiresMountsFor=/run/systemd/journal) on -.mount.  If the root
filesystem is encrypted, -.mount gets an After
zfs-load-key-DATASET.service.

By setting stdout and stderr to null on the key load services, we avoid
this loop.

Reviewed-by: Antonio Russo <antonio.e.russo@gmail.com>
Reviewed-by: InsanePrawn <insane.prawny@gmail.com>
Signed-off-by: Richard Laager <rlaager@wiktel.com>
Closes #10356
Closes #10388
2020-08-28 10:17:14 -07:00
Richard Laager ec41cafee1 Fix a dependency loop
When generating units with zfs-mount-generator, if the pool is already
imported, zfs-import.target is not needed.  This avoids a dependency
loop on root-on-ZFS systems:
  systemd-random-seed.service After (via RequiresMountsFor)
  var-lib.mount After
  zfs-import.target After
  zfs-import-{cache,scan}.service After
  cryptsetup.service After
  systemd-random-seed.service

Reviewed-by: Antonio Russo <antonio.e.russo@gmail.com>
Reviewed-by: InsanePrawn <insane.prawny@gmail.com>
Signed-off-by: Richard Laager <rlaager@wiktel.com>
Closes #10388
2020-08-28 10:16:13 -07:00
Georgy Yakovlev 7ddcfe7c00 config/zfs-build.m4: add --with-vendor flag
This will allow an override of auto-detection of distribution, which
is based on checking presence of /etc/*-release files.

Build systems makes a lot of file location assumptions based on
detected distribution.

Some distributions (like gentoo) may prefer explicitly
setting --with-vendor=gentoo to avoid auto-detection.

Since auto-detection checks all files in order, current script may
misdetect even on gentoo system if /etc/redhat-release file is present

Default behavior is unchanged and default is --with-vendor=check

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Georgy Yakovlev <gyakovlev@gentoo.org>
Closes #10835
2020-08-28 09:43:44 -07:00
Alexander Richardson 2b07c5aa3e Fix definition of BLKGETSIZE64 on FreeBSD
The matching ioctl is DIOCGMEDIASIZE.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <freqlabs@FreeBSD.org>
Signed-off-by: Alex Richardson <Alexander.Richardson@cl.cam.ac.uk>
Closes #10818
2020-08-27 16:09:26 -07:00
Georgy Yakovlev 735ba76104 module/zstd: pass -U__BMI__
If kernel is compiled with -march=znver1 or -march=znver2 zstd module 
compilation will fail due to SSE register return with SSE disabled.
What's interesting, is that -march=skylake also implies -mbmi which 
defines __BMI__ but compilation succeeds.  It is probably due to 
different BMI implementations on AMD and INTEL processors and the 
way compiler uses instructions.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Georgy Yakovlev <gyakovlev@gentoo.org>
Closes #10758 
Closes #10829
2020-08-27 15:50:13 -07:00
John-Mark Gurney 770269ef3a Add the Xr's to the SEE ALSO as well
There are a ton of zfs-* and zpool-* man pages. This adds them to 
the SEE ALSO section so that people can more quickly look through 
what all the options are, now that the pages have been split.

Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: John-Mark Gurney <jmg@funkthat.com>
Closes #10589
2020-08-26 22:29:00 -07:00
Patrick Mooney 8d42c98d95 dnode_sync is careless with range tree
Because dnode_sync_free_range() must drop dn_mtx during its processing,
using it as a callback to range_tree_vacate() is not safe.  No other
operations (besides destroy) are allowed once range_tree_vacate() has
begun, and dropping dn_mtx would leave a window open for another thread
to observe that invalid (and unsafe) state via dnode_block_freed().

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Signed-off-by: Patrick Mooney <pmooney@oxide.computer>
Closes #10708 
Closes #10823
2020-08-26 21:48:29 -07:00
Cédric Berger 600def792e Fix NEWS file
Points to https://github.com/openzfs/zfs/releases

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Cédric Berger <cedric@precidata.com>
Closes #10824
2020-08-26 21:44:41 -07:00
Ryan Moeller a2f944a140 zpool: Change base URL for ZFS messages to openzfs-docs
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10820
2020-08-26 21:43:06 -07:00
Brian Behlendorf 03f5d2fd6a Remove duplicate dnode.h include
The zfs/sa.c source file accidentally includes sys/dnode.h twice.
Remove the second occurrence.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #10816 
Closes #10819
2020-08-26 21:41:09 -07:00
Paul Dagnelie 4aa3b3bd47 Always track temporary fses and snapshots for accounting
The root cause of the issue is that we only occasionally do as the 
comments in the code suggest and actually ignore the %recv dataset when 
it comes to filesystem limit tracking. Specifically, the only time we 
ignore it is when initializing the filesystem and snapshot limit values; 
when creating a new %recv dataset or deleting one, we always update 
the bookkeeping. This causes a problem if you init the fs count on a 
filesystem that already has a %recv dataset, since the bookmarking 
will be decremented but not incremented. This is resolved in this 
patch by simply always tracking the %recv dataset as a child.

Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed by: Jerry Jelinek <jerry.jelinek@joyent.com>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #10791
2020-08-26 21:38:27 -07:00
Kjeld Schouten-Lebbing ad52de77c2 Fix broken bug report form
By accident previous PR broke the bug report form.
This commit fixes it
(and is actually tested completely to work)

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl>
Closes #10821
2020-08-26 10:49:51 -07:00
Toomas Soome ec0c480c14 Remove pragma ident lines
The #pragma ident is a historical relic and not needed any more, this
pragma is actually unknown for common compilers and is only causing
trouble.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Signed-off-by: Toomas Soome <tsoome@me.com>
Closes #10810
2020-08-26 10:35:50 -07:00
Matthew Macy 2dbad44710 FreeBSD: disable neon usage
The neon support code does not build on FreeBSD,
ifdef out references to fix linker issues on arm64.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #10809
2020-08-26 09:54:37 -07:00
George Melikov d6f90c78ab Github CI: Enable checkbashism
Run checkbashisms on checkstyle too.

Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #10811
2020-08-26 09:52:28 -07:00
Kjeld Schouten-Lebbing 59ad15d3c5 StaleBot Tweaks
- Add Status: Triage Needed to bug reports

Currently "Type: Defect" is auto added.
Adding a triage tag, makes sure all issues are reviewed by a maintainer
It also opens up some options to priorities defects in the near future.

- Prevent future StaleBot Spam

StaleBot will limit itself to 6 actions per hour
This should prevent future floods of StaleBot activity
(aka Spam)

- StaleBot: Ignore issues that are being worked on

Ignore the following Issues:
- tagged: "Status: Work in Progress"
- Having a maintainer assigned
- Being part of a project
- Having a milestone tag

- Rename Ignore "Type: Understood" to "Bot: Not Stale"

This Commits changes the general ignore tag for StaleBot from:
 "Type: Understood"
to
"Bot: Not Stale"

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl>
Closes #10813
2020-08-26 09:49:58 -07:00
Alexander Motin 523e1295fe Introduce limit on size of L2ARC headers
Since L2ARC buffers are not evicted on memory pressure, too large
amount of headers on system with irrationally large L2ARC can render
it slow or even unusable.  This change limits L2ARC writes and
rebuild if unevictable L2ARC-only headers reach dangerous level.

While there, call arc_adapt() on L2ARC rebuild, so that it could
properly grow arc_c, reflecting potentially significant ARC size
increase and avoiding slow growth with hopeless eviction attempts
later when "overflow" is detected.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reported-by: Richard Elling <Richard.Elling@RichardElling.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Closes #10765
2020-08-25 14:33:36 -07:00
3807 changed files with 275030 additions and 144611 deletions
+21
View File
@@ -0,0 +1,21 @@
env:
CIRRUS_CLONE_DEPTH: 1
ARCH: amd64
build_task:
matrix:
freebsd_instance:
image_family: freebsd-12-4
freebsd_instance:
image_family: freebsd-13-2
freebsd_instance:
image_family: freebsd-14-0-snap
prepare_script:
- pkg install -y autoconf automake libtool gettext-runtime gmake ksh93 py39-packaging py39-cffi py39-sysctl
configure_script:
- env MAKE=gmake ./autogen.sh
- env MAKE=gmake ./configure --with-config="user" --with-python=3.9
build_script:
- gmake -j `sysctl -n kern.smp.cpus`
install_script:
- gmake install
+24 -77
View File
@@ -126,8 +126,8 @@ feature needed? What problem does it solve?
#### General
* All pull requests must be based on the current master branch and apply
without conflicts.
* All pull requests, except backports and releases, must be based on the current master branch
and should apply without conflicts.
* Please attempt to limit pull requests to a single commit which resolves
one specific issue.
* Make sure your commit messages are in the correct format. See the
@@ -145,22 +145,18 @@ Once everything is in good shape and the details have been worked out you can re
Any required reviews can then be finalized and the pull request merged.
#### Tests and Benchmarks
* Every pull request will by tested by the buildbot on multiple platforms by running the [zfs-tests.sh and zloop.sh](
* Every pull request is tested using a GitHub Actions workflow on multiple platforms by running the [zfs-tests.sh and zloop.sh](
https://openzfs.github.io/openzfs-docs/Developer%20Resources/Building%20ZFS.html#running-zloop-sh-and-zfs-tests-sh) test suites.
`.github/workflows/scripts/generate-ci-type.py` is used to determine whether the pull request is nonbehavior, i.e., not introducing behavior changes of any code, configuration or tests. If so, the CI will run on fewer platforms and only essential sanity tests will run. You can always override this by adding `ZFS-CI-Type` line to your commit message:
* If your last commit (or `HEAD` in git terms) contains a line `ZFS-CI-Type: quick`, quick mode is forced regardless of what files are changed.
* Otherwise, if any commit in a PR contains a line `ZFS-CI-Type: full`, full mode is forced.
* To verify your changes conform to the [style guidelines](
https://github.com/openzfs/zfs/blob/master/.github/CONTRIBUTING.md#style-guides
), please run `make checkstyle` and resolve any warnings.
* Static code analysis of each pull request is performed by the buildbot; run `make lint` to check your changes.
* Test cases should be provided when appropriate.
This includes making sure new features have adequate code coverage.
* Code analysis is performed by [CodeQL](https://codeql.github.com/) for each pull request.
* Test cases should be provided when appropriate. This includes making sure new features have adequate code coverage.
* If your pull request improves performance, please include some benchmarks.
* The pull request must pass all required [ZFS
Buildbot](http://build.zfsonlinux.org/) builders before
being accepted. If you are experiencing intermittent TEST
builder failures, you may be experiencing a [test suite
issue](https://github.com/openzfs/zfs/issues?q=is%3Aissue+is%3Aopen+label%3A%22Type%3A+Test+Suite%22).
There are also various [buildbot options](https://openzfs.github.io/openzfs-docs/Developer%20Resources/Buildbot%20Options.html)
to control how changes are tested.
* The pull request must pass all CI checks before being accepted.
### Testing
All help is appreciated! If you're in a position to run the latest code
@@ -175,6 +171,21 @@ to verify ZFS is behaving as intended.
## Style Guides
### Repository Structure
OpenZFS uses a standardised branching structure.
- The "development and main branch", is the branch all development should be based on.
- "Release branches" contain the latest released code for said version.
- "Staging branches" contain selected commits prior to being released.
**Branch Names:**
- Development and Main branch: `master`
- Release branches: `zfs-$VERSION-release`
- Staging branches: `zfs-$VERSION-staging`
`$VERSION` should be replaced with the `major.minor` version number.
_(This is the version number without the `.patch` version at the end)_
### Coding Conventions
We currently use [C Style and Coding Standards for
SunOS](http://www.cis.upenn.edu/%7Elee/06cse480/data/cstyle.ms.pdf) as our
@@ -215,70 +226,6 @@ attempting to solve.
Signed-off-by: Contributor <contributor@email.com>
```
#### OpenZFS Patch Ports
If you are porting OpenZFS patches, the commit message must meet
the following guidelines:
* The first line must be the summary line from the most important OpenZFS commit being ported.
It must begin with `OpenZFS dddd, dddd - ` where `dddd` are OpenZFS issue numbers.
* Provides a `Authored by:` line to attribute each patch for each original author.
* Provides the `Reviewed by:` and `Approved by:` lines from each original
OpenZFS commit.
* Provides a `Ported-by:` line with the developer's name followed by
their email for each OpenZFS commit.
* Provides a `OpenZFS-issue:` line with link for each original illumos
issue.
* Provides a `OpenZFS-commit:` line with link for each original OpenZFS commit.
* If necessary, provide some porting notes to describe any deviations from
the original OpenZFS commits.
An example OpenZFS patch port commit message for a single patch is provided
below.
```
OpenZFS 1234 - Summary from the original OpenZFS commit
Authored by: Original Author <original@email.com>
Reviewed by: Reviewer One <reviewer1@email.com>
Reviewed by: Reviewer Two <reviewer2@email.com>
Approved by: Approver One <approver1@email.com>
Ported-by: ZFS Contributor <contributor@email.com>
Provide some porting notes here if necessary.
OpenZFS-issue: https://www.illumos.org/issues/1234
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/abcd1234
```
If necessary, multiple OpenZFS patches can be combined in a single port.
This is useful when you are porting a new patch and its subsequent bug
fixes. An example commit message is provided below.
```
OpenZFS 1234, 5678 - Summary of most important OpenZFS commit
1234 Summary from original OpenZFS commit for 1234
Authored by: Original Author <original@email.com>
Reviewed by: Reviewer Two <reviewer2@email.com>
Approved by: Approver One <approver1@email.com>
Ported-by: ZFS Contributor <contributor@email.com>
Provide some porting notes here for 1234 if necessary.
OpenZFS-issue: https://www.illumos.org/issues/1234
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/abcd1234
5678 Summary from original OpenZFS commit for 5678
Authored by: Original Author2 <original2@email.com>
Reviewed by: Reviewer One <reviewer1@email.com>
Approved by: Approver Two <approver2@email.com>
Ported-by: ZFS Contributor <contributor@email.com>
Provide some porting notes here for 5678 if necessary.
OpenZFS-issue: https://www.illumos.org/issues/5678
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/efgh5678
```
#### Coverity Defect Fixes
If you are submitting a fix to a
[Coverity defect](https://scan.coverity.com/projects/zfsonlinux-zfs),
+8 -6
View File
@@ -25,14 +25,16 @@ Type | Version/Name
--- | ---
Distribution Name |
Distribution Version |
Linux Kernel |
Kernel Version |
Architecture |
ZFS Version |
SPL Version |
OpenZFS Version |
<!--
Commands to find ZFS/SPL versions:
modinfo zfs | grep -iw version
modinfo spl | grep -iw version
Command to find OpenZFS version:
zfs version
Commands to find kernel version:
uname -r # Linux
freebsd-version -r # FreeBSD
-->
### Describe the problem you're observing
+14
View File
@@ -0,0 +1,14 @@
blank_issues_enabled: false
contact_links:
- name: OpenZFS Questions
url: https://github.com/openzfs/zfs/discussions/new
about: Ask the community for help
- name: OpenZFS Community Support Mailing list (Linux)
url: https://zfsonlinux.topicbox.com/groups/zfs-discuss
about: Get community support for OpenZFS on Linux
- name: FreeBSD Community Support Mailing list
url: https://lists.freebsd.org/mailman/listinfo/freebsd-fs
about: Get community support for OpenZFS on FreeBSD
- name: OpenZFS on IRC
url: https://web.libera.chat/#openzfs
about: Use IRC to get community support for OpenZFS
-37
View File
@@ -1,37 +0,0 @@
---
name: Code Question
about: Ask a question about the code
title: ''
labels: 'Type: Question'
assignees: ''
---
<!--
Thank you for taking an interest in the OpenZFS codebase.
Please be aware that most questions are preferably asked in the mailing list first.
This form is primarily meant for asking questions about the code itself.
Please also check our issue tracker before opening a new question.
Filling out the following template will help other contributors better understand your question.
-->
### Ask your question!
<!--
Please provide a clear and concise question.
-->
### Which portion of the codebase does your question involve?
<!--
Optional: Please describe what portion of the codebase your issue involved.
Example: "Testsuite", "Buildbots", "CLI", a code snippet etc.
-->
### Additional context
<!--
Any additional information you want to add?
-->
+5 -4
View File
@@ -28,14 +28,15 @@ https://openzfs.github.io/openzfs-docs/Developer%20Resources/Buildbot%20Options.
- [ ] Performance enhancement (non-breaking change which improves efficiency)
- [ ] Code cleanup (non-breaking change which makes code smaller or more readable)
- [ ] Breaking change (fix or feature that would cause existing functionality to change)
- [ ] Library ABI change (libzfs, libzfs\_core, libnvpair, libuutil and libzfsbootenv)
- [ ] Documentation (a change to man pages or other documentation)
### Checklist:
<!--- Go over all the following points, and put an `x` in all the boxes that apply. -->
<!--- If you're unsure about any of these, don't hesitate to ask. We're here to help! -->
- [ ] My code follows the ZFS on Linux [code style requirements](https://github.com/zfsonlinux/zfs/blob/master/.github/CONTRIBUTING.md#coding-conventions).
- [ ] My code follows the OpenZFS [code style requirements](https://github.com/openzfs/zfs/blob/master/.github/CONTRIBUTING.md#coding-conventions).
- [ ] I have updated the documentation accordingly.
- [ ] I have read the [**contributing** document](https://github.com/zfsonlinux/zfs/blob/master/.github/CONTRIBUTING.md).
- [ ] I have added [tests](https://github.com/zfsonlinux/zfs/tree/master/tests) to cover my changes.
- [ ] I have read the [**contributing** document](https://github.com/openzfs/zfs/blob/master/.github/CONTRIBUTING.md).
- [ ] I have added [tests](https://github.com/openzfs/zfs/tree/master/tests) to cover my changes.
- [ ] I have run the ZFS Test Suite with this change applied.
- [ ] All commit messages are properly formatted and contain [`Signed-off-by`](https://github.com/zfsonlinux/zfs/blob/master/.github/CONTRIBUTING.md#signed-off-by).
- [ ] All commit messages are properly formatted and contain [`Signed-off-by`](https://github.com/openzfs/zfs/blob/master/.github/CONTRIBUTING.md#signed-off-by).
+4
View File
@@ -0,0 +1,4 @@
name: "Custom CodeQL Analysis"
queries:
- uses: ./.github/codeql/custom-queries/cpp/deprecatedFunctionUsage.ql
+4
View File
@@ -0,0 +1,4 @@
name: "Custom CodeQL Analysis"
paths-ignore:
- tests
@@ -0,0 +1,59 @@
/**
* @name Deprecated function usage detection
* @description Detects functions whose usage is banned from the OpenZFS
* codebase due to QA concerns.
* @kind problem
* @severity error
* @id cpp/deprecated-function-usage
*/
import cpp
predicate isDeprecatedFunction(Function f) {
f.getName() = "strtok" or
f.getName() = "__xpg_basename" or
f.getName() = "basename" or
f.getName() = "dirname" or
f.getName() = "bcopy" or
f.getName() = "bcmp" or
f.getName() = "bzero" or
f.getName() = "asctime" or
f.getName() = "asctime_r" or
f.getName() = "gmtime" or
f.getName() = "localtime" or
f.getName() = "strncpy"
}
string getReplacementMessage(Function f) {
if f.getName() = "strtok" then
result = "Use strtok_r(3) instead!"
else if f.getName() = "__xpg_basename" then
result = "basename(3) is underspecified. Use zfs_basename() instead!"
else if f.getName() = "basename" then
result = "basename(3) is underspecified. Use zfs_basename() instead!"
else if f.getName() = "dirname" then
result = "dirname(3) is underspecified. Use zfs_dirnamelen() instead!"
else if f.getName() = "bcopy" then
result = "bcopy(3) is deprecated. Use memcpy(3)/memmove(3) instead!"
else if f.getName() = "bcmp" then
result = "bcmp(3) is deprecated. Use memcmp(3) instead!"
else if f.getName() = "bzero" then
result = "bzero(3) is deprecated. Use memset(3) instead!"
else if f.getName() = "asctime" then
result = "Use strftime(3) instead!"
else if f.getName() = "asctime_r" then
result = "Use strftime(3) instead!"
else if f.getName() = "gmtime" then
result = "gmtime(3) isn't thread-safe. Use gmtime_r(3) instead!"
else if f.getName() = "localtime" then
result = "localtime(3) isn't thread-safe. Use localtime_r(3) instead!"
else
result = "strncpy(3) is deprecated. Use strlcpy(3) instead!"
}
from FunctionCall fc, Function f
where
fc.getTarget() = f and
isDeprecatedFunction(f)
select fc, getReplacementMessage(f)
@@ -0,0 +1,4 @@
name: openzfs-cpp-queries
version: 0.0.0
libraryPathDependencies: codeql-cpp
suites: openzfs-cpp-suite
+13
View File
@@ -0,0 +1,13 @@
# Configuration for probot-no-response - https://github.com/probot/no-response
# Number of days of inactivity before an Issue is closed for lack of response
daysUntilClose: 31
# Label requiring a response
responseRequiredLabel: "Status: Feedback requested"
# Comment to post when closing an Issue for lack of response. Set to `false` to disable
closeComment: >
This issue has been automatically closed because there has been no response
to our request for more information from the original author. With only the
information that is currently in the issue, we don't have enough information
to take action. Please reach out if you have or find the answers we need so
that we can investigate further.
+10 -1
View File
@@ -7,7 +7,14 @@ only: issues
# Issues with these labels will never be considered stale
exemptLabels:
- "Type: Feature"
- "Type: Understood"
- "Bot: Not Stale"
- "Status: Work in Progress"
# Set to true to ignore issues in a project (defaults to false)
exemptProjects: true
# Set to true to ignore issues in a milestone (defaults to false)
exemptMilestones: true
# Set to true to ignore issues with an assignee (defaults to false)
exemptAssignees: true
# Label to use when marking an issue as stale
staleLabel: "Status: Stale"
# Comment to post when marking an issue as stale. Set to `false` to disable
@@ -15,3 +22,5 @@ markComment: >
This issue has been automatically marked as "stale" because it has not had
any activity for a while. It will be closed in 90 days if no further activity occurs.
Thank you for your contributions.
# Limit the number of actions per hour, from 1-30. Default is 30
limitPerRun: 6
+61
View File
@@ -0,0 +1,61 @@
## The testings are done this way
```mermaid
flowchart TB
subgraph CleanUp and Summary
CleanUp+Summary
end
subgraph Functional Testings
sanity-checks-20.04
zloop-checks-20.04
functional-testing-20.04-->Part1-20.04
functional-testing-20.04-->Part2-20.04
functional-testing-20.04-->Part3-20.04
functional-testing-20.04-->Part4-20.04
functional-testing-22.04-->Part1-22.04
functional-testing-22.04-->Part2-22.04
functional-testing-22.04-->Part3-22.04
functional-testing-22.04-->Part4-22.04
sanity-checks-22.04
zloop-checks-22.04
end
subgraph Code Checking + Building
Build-Ubuntu-20.04
codeql.yml
checkstyle.yml
Build-Ubuntu-22.04
end
Build-Ubuntu-20.04-->sanity-checks-20.04
Build-Ubuntu-20.04-->zloop-checks-20.04
Build-Ubuntu-20.04-->functional-testing-20.04
Build-Ubuntu-22.04-->sanity-checks-22.04
Build-Ubuntu-22.04-->zloop-checks-22.04
Build-Ubuntu-22.04-->functional-testing-22.04
sanity-checks-20.04-->CleanUp+Summary
Part1-20.04-->CleanUp+Summary
Part2-20.04-->CleanUp+Summary
Part3-20.04-->CleanUp+Summary
Part4-20.04-->CleanUp+Summary
Part1-22.04-->CleanUp+Summary
Part2-22.04-->CleanUp+Summary
Part3-22.04-->CleanUp+Summary
Part4-22.04-->CleanUp+Summary
sanity-checks-22.04-->CleanUp+Summary
```
1) build zfs modules for Ubuntu 20.04 and 22.04 (~15m)
2) 2x zloop test (~10m) + 2x sanity test (~25m)
3) 4x functional testings in parts 1..4 (each ~1h)
4) cleanup and create summary
- content of summary depends on the results of the steps
When everything runs fine, the full run should be done in
about 2 hours.
The codeql.yml and checkstyle.yml are not part in this circle.
+45 -13
View File
@@ -2,31 +2,63 @@ name: checkstyle
on:
push:
pull_request_target:
pull_request:
concurrency:
group: ${{ github.workflow }}-${{ github.head_ref || github.run_id }}
cancel-in-progress: true
jobs:
checkstyle:
runs-on: ubuntu-latest
runs-on: ubuntu-22.04
steps:
- uses: actions/checkout@v2
- uses: actions/checkout@v4
with:
ref: ${{ github.event.pull_request.head.sha }}
- name: Install dependencies
run: |
sudo apt-get update
sudo apt-get install --yes -qq build-essential autoconf libtool gawk alien fakeroot linux-headers-$(uname -r)
sudo apt-get install --yes -qq zlib1g-dev uuid-dev libattr1-dev libblkid-dev libselinux-dev libudev-dev libssl-dev python-dev python-setuptools python-cffi python3 python3-dev python3-setuptools python3-cffi
# packages for tests
sudo apt-get install --yes -qq parted lsscsi ksh attr acl nfs-kernel-server fio
sudo apt-get install --yes -qq mandoc cppcheck pax-utils # devscripts - enable then bashisms fixed
sudo -E pip --quiet install flake8
# for x in lxd core20 snapd; do sudo snap remove $x; done
sudo apt-get purge -y snapd google-chrome-stable firefox
ONLY_DEPS=1 .github/workflows/scripts/qemu-3-deps.sh ubuntu22
sudo apt-get install -y cppcheck devscripts mandoc pax-utils shellcheck
sudo python -m pipx install --quiet flake8
# confirm that the tools are installed
# the build system doesn't fail when they are not
checkbashisms --version
cppcheck --version
flake8 --version
scanelf --version
shellcheck --version
- name: Prepare
run: |
sh ./autogen.sh
sed -i '/DEBUG_CFLAGS="-Werror"/s/^/#/' config/zfs-build.m4
./autogen.sh
- name: Configure
run: |
./configure
- name: Make
run: |
make -j$(nproc) --no-print-directory --silent
- name: Checkstyle
run: |
make checkstyle
make -j$(nproc) --no-print-directory --silent checkstyle
- name: Lint
run: |
make lint
make -j$(nproc) --no-print-directory --silent lint
- name: CheckABI
id: CheckABI
run: |
docker run -v $PWD:/source ghcr.io/openzfs/libabigail make -j$(nproc) --no-print-directory --silent checkabi
- name: StoreABI
if: failure() && steps.CheckABI.outcome == 'failure'
run: |
docker run -v $PWD:/source ghcr.io/openzfs/libabigail make -j$(nproc) --no-print-directory --silent storeabi
- name: Prepare artifacts
if: failure() && steps.CheckABI.outcome == 'failure'
run: |
find -name *.abi | tar -cf abi_files.tar -T -
- uses: actions/upload-artifact@v4
if: failure() && steps.CheckABI.outcome == 'failure'
with:
name: New ABI files (use only if you're sure about interface changes)
path: abi_files.tar
+45
View File
@@ -0,0 +1,45 @@
name: "CodeQL"
on:
push:
pull_request:
concurrency:
group: ${{ github.workflow }}-${{ github.head_ref || github.run_id }}
cancel-in-progress: true
jobs:
analyze:
name: Analyze
runs-on: ubuntu-22.04
permissions:
actions: read
contents: read
security-events: write
strategy:
fail-fast: false
matrix:
language: [ 'cpp', 'python' ]
steps:
- name: Set make jobs
run: |
echo "MAKEFLAGS=-j$(nproc)" >> $GITHUB_ENV
- name: Checkout repository
uses: actions/checkout@v4
- name: Initialize CodeQL
uses: github/codeql-action/init@v3
with:
config-file: .github/codeql-${{ matrix.language }}.yml
languages: ${{ matrix.language }}
- name: Autobuild
uses: github/codeql-action/autobuild@v3
- name: Perform CodeQL Analysis
uses: github/codeql-action/analyze@v3
with:
category: "/language:${{matrix.language}}"
+14
View File
@@ -0,0 +1,14 @@
Workflow for each operating system:
- install qemu on the github runner
- download current cloud image of operating system
- start and init that image via cloud-init
- install dependencies and poweroff system
- start system and build openzfs and then poweroff again
- clone build system and start 2 instances of it
- run functional testings and complete in around 3h
- when tests are done, do some logfile preparing
- show detailed results for each system
- in the end, generate the job summary
/TR 14.09.2024
+107
View File
@@ -0,0 +1,107 @@
#!/usr/bin/env python3
"""
Determine the CI type based on the change list and commit message.
Prints "quick" if (explicity required by user):
- the *last* commit message contains 'ZFS-CI-Type: quick'
or if (heuristics):
- the files changed are not in the list of specified directories, and
- all commit messages do not contain 'ZFS-CI-Type: full'
Otherwise prints "full".
"""
import sys
import subprocess
import re
"""
Patterns of files that are not considered to trigger full CI.
Note: not using pathlib.Path.match() because it does not support '**'
"""
FULL_RUN_IGNORE_REGEX = list(map(re.compile, [
r'.*\.md',
r'.*\.gitignore'
]))
"""
Patterns of files that are considered to trigger full CI.
"""
FULL_RUN_REGEX = list(map(re.compile, [
r'cmd.*',
r'configs/.*',
r'META',
r'.*\.am',
r'.*\.m4',
r'autogen\.sh',
r'configure\.ac',
r'copy-builtin',
r'contrib',
r'etc',
r'include',
r'lib/.*',
r'module/.*',
r'scripts/.*',
r'tests/.*',
r'udev/.*'
]))
if __name__ == '__main__':
prog = sys.argv[0]
if len(sys.argv) != 3:
print(f'Usage: {prog} <head_ref> <base_ref>')
sys.exit(1)
head, base = sys.argv[1:3]
def output_type(type, reason):
print(f'{prog}: will run {type} CI: {reason}', file=sys.stderr)
print(type)
sys.exit(0)
# check last (HEAD) commit message
last_commit_message_raw = subprocess.run([
'git', 'show', '-s', '--format=%B', 'HEAD'
], check=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
for line in last_commit_message_raw.stdout.decode().splitlines():
if line.strip().lower() == 'zfs-ci-type: quick':
output_type('quick', f'explicitly requested by HEAD commit {head}')
# check all commit messages
all_commit_message_raw = subprocess.run([
'git', 'show', '-s',
'--format=ZFS-CI-Commit: %H%n%B', f'{head}...{base}'
], check=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
all_commit_message = all_commit_message_raw.stdout.decode().splitlines()
commit_ref = head
for line in all_commit_message:
if line.startswith('ZFS-CI-Commit:'):
commit_ref = line.lstrip('ZFS-CI-Commit:').rstrip()
if line.strip().lower() == 'zfs-ci-type: full':
output_type('full', f'explicitly requested by commit {commit_ref}')
# check changed files
changed_files_raw = subprocess.run([
'git', 'diff', '--name-only', head, base
], check=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
changed_files = changed_files_raw.stdout.decode().splitlines()
for f in changed_files:
for r in FULL_RUN_IGNORE_REGEX:
if r.match(f):
break
else:
for r in FULL_RUN_REGEX:
if r.match(f):
output_type(
'full',
f'changed file "{f}" matches pattern "{r.pattern}"'
)
# catch-all
output_type('quick', 'no changed file matches full CI patterns')
+109
View File
@@ -0,0 +1,109 @@
#!/bin/awk -f
#
# Merge multiple ZTS tests results summaries into a single summary. This is
# needed when you're running different parts of ZTS on different tests
# runners or VMs.
#
# Usage:
#
# ./merge_summary.awk summary1.txt [summary2.txt] [summary3.txt] ...
#
# or:
#
# cat summary*.txt | ./merge_summary.awk
#
BEGIN {
i=-1
pass=0
fail=0
skip=0
state=""
cl=0
el=0
upl=0
ul=0
# Total seconds of tests runtime
total=0;
}
# Skip empty lines
/^\s*$/{next}
# Skip Configuration and Test lines
/^Test:/{state=""; next}
/Configuration/{state="";next}
# When we see "test-runner.py" stop saving config lines, and
# save test runner lines
/test-runner.py/{state="testrunner"; runner=runner$0"\n"; next}
# We need to differentiate the PASS counts from test result lines that start
# with PASS, like:
#
# PASS mv_files/setup
#
# Use state="pass_count" to differentiate
#
/Results Summary/{state="pass_count"; next}
/PASS/{ if (state=="pass_count") {pass += $2}}
/FAIL/{ if (state=="pass_count") {fail += $2}}
/SKIP/{ if (state=="pass_count") {skip += $2}}
/Running Time/{
state="";
running[i]=$3;
split($3, arr, ":")
total += arr[1] * 60 * 60;
total += arr[2] * 60;
total += arr[3]
next;
}
/Tests with results other than PASS that are expected/{state="expected_lines"; next}
/Tests with result of PASS that are unexpected/{state="unexpected_pass_lines"; next}
/Tests with results other than PASS that are unexpected/{state="unexpected_lines"; next}
{
if (state == "expected_lines") {
expected_lines[el] = $0
el++
}
if (state == "unexpected_pass_lines") {
unexpected_pass_lines[upl] = $0
upl++
}
if (state == "unexpected_lines") {
unexpected_lines[ul] = $0
ul++
}
}
# Reproduce summary
END {
print runner;
print "\nResults Summary"
print "PASS\t"pass
print "FAIL\t"fail
print "SKIP\t"skip
print ""
print "Running Time:\t"strftime("%T", total, 1)
if (pass+fail+skip > 0) {
percent_passed=(pass/(pass+fail+skip) * 100)
}
printf "Percent passed:\t%3.2f%", percent_passed
print "\n\nTests with results other than PASS that are expected:"
asort(expected_lines, sorted)
for (j in sorted)
print sorted[j]
print "\n\nTests with result of PASS that are unexpected:"
asort(unexpected_pass_lines, sorted)
for (j in sorted)
print sorted[j]
print "\n\nTests with results other than PASS that are unexpected:"
asort(unexpected_lines, sorted)
for (j in sorted)
print sorted[j]
}
+93
View File
@@ -0,0 +1,93 @@
#!/usr/bin/env bash
######################################################################
# 1) setup qemu instance on action runner
######################################################################
set -eu
# install needed packages
export DEBIAN_FRONTEND="noninteractive"
sudo apt-get -y update
sudo apt-get install -y axel cloud-image-utils daemonize guestfs-tools \
ksmtuned virt-manager linux-modules-extra-$(uname -r) zfsutils-linux
# generate ssh keys
rm -f ~/.ssh/id_ed25519
ssh-keygen -t ed25519 -f ~/.ssh/id_ed25519 -q -N ""
# we expect RAM shortage
cat << EOF | sudo tee /etc/ksmtuned.conf > /dev/null
# /etc/ksmtuned.conf - Configuration file for ksmtuned
# https://docs.redhat.com/en/documentation/red_hat_enterprise_linux/7/html/virtualization_tuning_and_optimization_guide/chap-ksm
KSM_MONITOR_INTERVAL=60
# Millisecond sleep between ksm scans for 16Gb server.
# Smaller servers sleep more, bigger sleep less.
KSM_SLEEP_MSEC=30
KSM_NPAGES_BOOST=0
KSM_NPAGES_DECAY=0
KSM_NPAGES_MIN=1000
KSM_NPAGES_MAX=25000
KSM_THRES_COEF=80
KSM_THRES_CONST=8192
LOGFILE=/var/log/ksmtuned.log
DEBUG=1
EOF
sudo systemctl restart ksm
sudo systemctl restart ksmtuned
# not needed
sudo systemctl stop docker.socket
sudo systemctl stop multipathd.socket
# remove default swapfile and /mnt
sudo swapoff -a
sudo umount -l /mnt
DISK="/dev/disk/cloud/azure_resource-part1"
sudo sed -e "s|^$DISK.*||g" -i /etc/fstab
sudo wipefs -aq $DISK
sudo systemctl daemon-reload
sudo modprobe loop
sudo modprobe zfs
# partition the disk as needed
DISK="/dev/disk/cloud/azure_resource"
sudo sgdisk --zap-all $DISK
sudo sgdisk -p \
-n 1:0:+16G -c 1:"swap" \
-n 2:0:0 -c 2:"tests" \
$DISK
sync
sleep 1
# swap with same size as RAM
sudo mkswap $DISK-part1
sudo swapon $DISK-part1
# 60GB data disk
SSD1="$DISK-part2"
# 10GB data disk on ext4
sudo fallocate -l 10G /test.ssd1
SSD2=$(sudo losetup -b 4096 -f /test.ssd1 --show)
# adjust zfs module parameter and create pool
exec 1>/dev/null
ARC_MIN=$((1024*1024*256))
ARC_MAX=$((1024*1024*512))
echo $ARC_MIN | sudo tee /sys/module/zfs/parameters/zfs_arc_min
echo $ARC_MAX | sudo tee /sys/module/zfs/parameters/zfs_arc_max
echo 1 | sudo tee /sys/module/zfs/parameters/zvol_use_blk_mq
sudo zpool create -f -o ashift=12 zpool $SSD1 $SSD2 \
-O relatime=off -O atime=off -O xattr=sa -O compression=lz4 \
-O mountpoint=/mnt/tests
# no need for some scheduler
for i in /sys/block/s*/queue/scheduler; do
echo "none" | sudo tee $i > /dev/null
done
+232
View File
@@ -0,0 +1,232 @@
#!/usr/bin/env bash
######################################################################
# 2) start qemu with some operating system, init via cloud-init
######################################################################
set -eu
# short name used in zfs-qemu.yml
OS="$1"
# OS variant (virt-install --os-variant list)
OSv=$OS
# compressed with .zst extension
REPO="https://github.com/mcmilk/openzfs-freebsd-images"
FREEBSD="$REPO/releases/download/v2024-12-14"
URLzs=""
# Ubuntu mirrors
#UBMIRROR="https://cloud-images.ubuntu.com"
#UBMIRROR="https://mirrors.cloud.tencent.com/ubuntu-cloud-images"
UBMIRROR="https://mirror.citrahost.com/ubuntu-cloud-images"
# default nic model for vm's
NIC="virtio"
case "$OS" in
almalinux8)
OSNAME="AlmaLinux 8"
URL="https://repo.almalinux.org/almalinux/8/cloud/x86_64/images/AlmaLinux-8-GenericCloud-latest.x86_64.qcow2"
;;
almalinux9)
OSNAME="AlmaLinux 9"
URL="https://repo.almalinux.org/almalinux/9/cloud/x86_64/images/AlmaLinux-9-GenericCloud-latest.x86_64.qcow2"
;;
archlinux)
OSNAME="Archlinux"
URL="https://geo.mirror.pkgbuild.com/images/latest/Arch-Linux-x86_64-cloudimg.qcow2"
# dns sometimes fails with that url :/
echo "89.187.191.12 geo.mirror.pkgbuild.com" | sudo tee /etc/hosts > /dev/null
;;
centos-stream10)
OSNAME="CentOS Stream 10"
# TODO: #16903 Overwrite OSv to stream9 for virt-install until it's added to osinfo
OSv="centos-stream9"
URL="https://cloud.centos.org/centos/10-stream/x86_64/images/CentOS-Stream-GenericCloud-10-latest.x86_64.qcow2"
;;
centos-stream9)
OSNAME="CentOS Stream 9"
URL="https://cloud.centos.org/centos/9-stream/x86_64/images/CentOS-Stream-GenericCloud-9-latest.x86_64.qcow2"
;;
debian11)
OSNAME="Debian 11"
URL="https://cloud.debian.org/images/cloud/bullseye/latest/debian-11-generic-amd64.qcow2"
;;
debian12)
OSNAME="Debian 12"
URL="https://cloud.debian.org/images/cloud/bookworm/latest/debian-12-generic-amd64.qcow2"
;;
fedora40)
OSNAME="Fedora 40"
OSv="fedora-unknown"
URL="https://download.fedoraproject.org/pub/fedora/linux/releases/40/Cloud/x86_64/images/Fedora-Cloud-Base-Generic.x86_64-40-1.14.qcow2"
;;
fedora41)
OSNAME="Fedora 41"
OSv="fedora-unknown"
URL="https://download.fedoraproject.org/pub/fedora/linux/releases/41/Cloud/x86_64/images/Fedora-Cloud-Base-Generic-41-1.4.x86_64.qcow2"
;;
freebsd13-3r)
OSNAME="FreeBSD 13.3-RELEASE"
OSv="freebsd13.0"
URLzs="$FREEBSD/amd64-freebsd-13.3-RELEASE.qcow2.zst"
BASH="/usr/local/bin/bash"
NIC="rtl8139"
;;
freebsd13-4r)
OSNAME="FreeBSD 13.4-RELEASE"
OSv="freebsd13.0"
URLzs="$FREEBSD/amd64-freebsd-13.4-RELEASE.qcow2.zst"
BASH="/usr/local/bin/bash"
NIC="rtl8139"
;;
freebsd14-1r)
OSNAME="FreeBSD 14.1-RELEASE"
OSv="freebsd14.0"
URLzs="$FREEBSD/amd64-freebsd-14.1-RELEASE.qcow2.zst"
BASH="/usr/local/bin/bash"
;;
freebsd14-2r)
OSNAME="FreeBSD 14.2-RELEASE"
OSv="freebsd14.0"
URLzs="$FREEBSD/amd64-freebsd-14.2-RELEASE.qcow2.zst"
BASH="/usr/local/bin/bash"
;;
freebsd13-4s)
OSNAME="FreeBSD 13.4-STABLE"
OSv="freebsd13.0"
URLzs="$FREEBSD/amd64-freebsd-13.4-STABLE.qcow2.zst"
BASH="/usr/local/bin/bash"
NIC="rtl8139"
;;
freebsd14-2s)
OSNAME="FreeBSD 14.2-STABLE"
OSv="freebsd14.0"
URLzs="$FREEBSD/amd64-freebsd-14.2-STABLE.qcow2.zst"
BASH="/usr/local/bin/bash"
;;
freebsd15-0c)
OSNAME="FreeBSD 15.0-CURRENT"
OSv="freebsd14.0"
URLzs="$FREEBSD/amd64-freebsd-15.0-CURRENT.qcow2.zst"
BASH="/usr/local/bin/bash"
;;
tumbleweed)
OSNAME="openSUSE Tumbleweed"
OSv="opensusetumbleweed"
MIRROR="http://opensuse-mirror-gce-us.susecloud.net"
URL="$MIRROR/tumbleweed/appliances/openSUSE-MicroOS.x86_64-OpenStack-Cloud.qcow2"
;;
ubuntu20)
OSNAME="Ubuntu 20.04"
OSv="ubuntu20.04"
URL="$UBMIRROR/focal/current/focal-server-cloudimg-amd64.img"
;;
ubuntu22)
OSNAME="Ubuntu 22.04"
OSv="ubuntu22.04"
URL="$UBMIRROR/jammy/current/jammy-server-cloudimg-amd64.img"
;;
ubuntu24)
OSNAME="Ubuntu 24.04"
OSv="ubuntu24.04"
URL="$UBMIRROR/noble/current/noble-server-cloudimg-amd64.img"
;;
*)
echo "Wrong value for OS variable!"
exit 111
;;
esac
# environment file
ENV="/var/tmp/env.txt"
echo "ENV=$ENV" >> $ENV
# result path
echo 'RESPATH="/var/tmp/test_results"' >> $ENV
# FreeBSD 13 has problems with: e1000+virtio
echo "NIC=$NIC" >> $ENV
# freebsd15 -> used in zfs-qemu.yml
echo "OS=$OS" >> $ENV
# freebsd14.0 -> used for virt-install
echo "OSv=\"$OSv\"" >> $ENV
# FreeBSD 15 (Current) -> used for summary
echo "OSNAME=\"$OSNAME\"" >> $ENV
sudo mkdir -p "/mnt/tests"
sudo chown -R $(whoami) /mnt/tests
# we are downloading via axel, curl and wget are mostly slower and
# require more return value checking
IMG="/mnt/tests/cloudimg.qcow2"
if [ ! -z "$URLzs" ]; then
echo "Loading image $URLzs ..."
time axel -q -o "$IMG.zst" "$URLzs"
zstd -q -d --rm "$IMG.zst"
else
echo "Loading image $URL ..."
time axel -q -o "$IMG" "$URL"
fi
DISK="/dev/zvol/zpool/openzfs"
FORMAT="raw"
sudo zfs create -ps -b 64k -V 80g zpool/openzfs
while true; do test -b $DISK && break; sleep 1; done
echo "Importing VM image to zvol..."
sudo qemu-img dd -f qcow2 -O raw if=$IMG of=$DISK bs=4M
rm -f $IMG
PUBKEY=$(cat ~/.ssh/id_ed25519.pub)
cat <<EOF > /tmp/user-data
#cloud-config
fqdn: $OS
users:
- name: root
shell: $BASH
- name: zfs
sudo: ALL=(ALL) NOPASSWD:ALL
shell: $BASH
ssh_authorized_keys:
- $PUBKEY
growpart:
mode: auto
devices: ['/']
ignore_growroot_disabled: false
EOF
sudo virsh net-update default add ip-dhcp-host \
"<host mac='52:54:00:83:79:00' ip='192.168.122.10'/>" --live --config
sudo virt-install \
--os-variant $OSv \
--name "openzfs" \
--cpu host-passthrough \
--virt-type=kvm --hvm \
--vcpus=4,sockets=1 \
--memory $((1024*12)) \
--memballoon model=virtio \
--graphics none \
--network bridge=virbr0,model=$NIC,mac='52:54:00:83:79:00' \
--cloud-init user-data=/tmp/user-data \
--disk $DISK,bus=virtio,cache=none,format=$FORMAT,driver.discard=unmap \
--import --noautoconsole >/dev/null
# in case the directory isn't there already
mkdir -p $HOME/.ssh
cat <<EOF >> $HOME/.ssh/config
# no questions please
StrictHostKeyChecking no
# small timeout, used in while loops later
ConnectTimeout 1
EOF
+229
View File
@@ -0,0 +1,229 @@
#!/usr/bin/env bash
######################################################################
# 3) install dependencies for compiling and loading
######################################################################
set -eu
function archlinux() {
echo "##[group]Running pacman -Syu"
sudo btrfs filesystem resize max /
sudo pacman -Syu --noconfirm
echo "##[endgroup]"
echo "##[group]Install Development Tools"
sudo pacman -Sy --noconfirm base-devel bc cpio cryptsetup dhclient dkms \
fakeroot fio gdb inetutils jq less linux linux-headers lsscsi nfs-utils \
parted pax perf python-packaging python-setuptools qemu-guest-agent ksh \
samba sysstat rng-tools rsync wget xxhash
echo "##[endgroup]"
}
function debian() {
export DEBIAN_FRONTEND="noninteractive"
echo "##[group]Running apt-get update+upgrade"
sudo apt-get update -y
sudo apt-get upgrade -y
echo "##[endgroup]"
echo "##[group]Install Development Tools"
sudo apt-get install -y \
acl alien attr autoconf bc cpio cryptsetup curl dbench dh-python dkms \
fakeroot fio gdb gdebi git ksh lcov isc-dhcp-client jq libacl1-dev \
libaio-dev libattr1-dev libblkid-dev libcurl4-openssl-dev libdevmapper-dev \
libelf-dev libffi-dev libmount-dev libpam0g-dev libselinux-dev libssl-dev \
libtool libtool-bin libudev-dev libunwind-dev linux-headers-$(uname -r) \
lsscsi nfs-kernel-server pamtester parted python3 python3-all-dev \
python3-cffi python3-dev python3-distlib python3-packaging \
python3-setuptools python3-sphinx qemu-guest-agent rng-tools rpm2cpio \
rsync samba sysstat uuid-dev watchdog wget xfslibs-dev xxhash zlib1g-dev
echo "##[endgroup]"
}
function freebsd() {
export ASSUME_ALWAYS_YES="YES"
echo "##[group]Install Development Tools"
sudo pkg install -y autoconf automake autotools base64 checkbashisms fio \
gdb gettext gettext-runtime git gmake gsed jq ksh93 lcov libtool lscpu \
pkgconf python python3 pamtester pamtester qemu-guest-agent rsync xxhash
sudo pkg install -xy \
'^samba4[[:digit:]]+$' \
'^py3[[:digit:]]+-cffi$' \
'^py3[[:digit:]]+-sysctl$' \
'^py3[[:digit:]]+-packaging$'
echo "##[endgroup]"
}
# common packages for: almalinux, centos, redhat
function rhel() {
echo "##[group]Running dnf update"
echo "max_parallel_downloads=10" | sudo -E tee -a /etc/dnf/dnf.conf
sudo dnf clean all
sudo dnf update -y --setopt=fastestmirror=1 --refresh
echo "##[endgroup]"
echo "##[group]Install Development Tools"
# Alma wants "Development Tools", Fedora 41 wants "development-tools"
if ! sudo dnf group install -y "Development Tools" ; then
echo "Trying 'development-tools' instead of 'Development Tools'"
sudo dnf group install -y development-tools
fi
sudo dnf install -y \
acl attr bc bzip2 cryptsetup curl dbench dkms elfutils-libelf-devel fio \
gdb git jq kernel-rpm-macros ksh libacl-devel libaio-devel \
libargon2-devel libattr-devel libblkid-devel libcurl-devel libffi-devel \
ncompress libselinux-devel libtirpc-devel libtool libudev-devel \
libuuid-devel lsscsi mdadm nfs-utils openssl-devel pam-devel pamtester \
parted perf python3 python3-cffi python3-devel python3-packaging \
kernel-devel python3-setuptools qemu-guest-agent rng-tools rpcgen \
rpm-build rsync samba sysstat systemd watchdog wget xfsprogs-devel xxhash \
zlib-devel
echo "##[endgroup]"
}
function tumbleweed() {
echo "##[group]Running zypper is TODO!"
sleep 23456
echo "##[endgroup]"
}
# Install dependencies
case "$1" in
almalinux8)
echo "##[group]Enable epel and powertools repositories"
sudo dnf config-manager -y --set-enabled powertools
sudo dnf install -y epel-release
echo "##[endgroup]"
rhel
echo "##[group]Install kernel-abi-whitelists"
sudo dnf install -y kernel-abi-whitelists
echo "##[endgroup]"
;;
almalinux9|centos-stream9|centos-stream10)
echo "##[group]Enable epel and crb repositories"
sudo dnf config-manager -y --set-enabled crb
sudo dnf install -y epel-release
echo "##[endgroup]"
rhel
echo "##[group]Install kernel-abi-stablelists"
sudo dnf install -y kernel-abi-stablelists
echo "##[endgroup]"
;;
archlinux)
archlinux
;;
debian*)
echo 'debconf debconf/frontend select Noninteractive' | sudo debconf-set-selections
debian
echo "##[group]Install Debian specific"
sudo apt-get install -yq linux-perf dh-sequence-dkms
echo "##[endgroup]"
;;
fedora*)
rhel
;;
freebsd*)
freebsd
;;
tumbleweed)
tumbleweed
;;
ubuntu*)
debian
echo "##[group]Install Ubuntu specific"
sudo apt-get install -yq linux-tools-common libtirpc-dev \
linux-modules-extra-$(uname -r)
if [ "$1" != "ubuntu20" ]; then
sudo apt-get install -yq dh-sequence-dkms
fi
echo "##[endgroup]"
echo "##[group]Delete Ubuntu OpenZFS modules"
for i in $(find /lib/modules -name zfs -type d); do sudo rm -rvf $i; done
echo "##[endgroup]"
;;
esac
# This script is used for checkstyle + zloop deps also.
# Install only the needed packages and exit - when used this way.
test -z "${ONLY_DEPS:-}" || exit 0
# Start services
echo "##[group]Enable services"
case "$1" in
freebsd*)
# add virtio things
echo 'virtio_load="YES"' | sudo -E tee -a /boot/loader.conf
for i in balloon blk console random scsi; do
echo "virtio_${i}_load=\"YES\"" | sudo -E tee -a /boot/loader.conf
done
echo "fdescfs /dev/fd fdescfs rw 0 0" | sudo -E tee -a /etc/fstab
sudo -E mount /dev/fd
sudo -E touch /etc/zfs/exports
sudo -E sysrc mountd_flags="/etc/zfs/exports"
echo '[global]' | sudo -E tee /usr/local/etc/smb4.conf >/dev/null
sudo -E service nfsd enable
sudo -E service qemu-guest-agent enable
sudo -E service samba_server enable
;;
debian*|ubuntu*)
sudo -E systemctl enable nfs-kernel-server
sudo -E systemctl enable qemu-guest-agent
sudo -E systemctl enable smbd
;;
*)
# All other linux distros
sudo -E systemctl enable nfs-server
sudo -E systemctl enable qemu-guest-agent
sudo -E systemctl enable smb
;;
esac
echo "##[endgroup]"
# Setup Kernel cmdline
CMDLINE="console=tty0 console=ttyS0,115200n8"
CMDLINE="$CMDLINE selinux=0"
CMDLINE="$CMDLINE random.trust_cpu=on"
CMDLINE="$CMDLINE no_timer_check"
case "$1" in
almalinux*|centos*|fedora*)
GRUB_CFG="/boot/grub2/grub.cfg"
GRUB_MKCONFIG="grub2-mkconfig"
CMDLINE="$CMDLINE biosdevname=0 net.ifnames=0"
echo 'GRUB_SERIAL_COMMAND="serial --speed=115200"' \
| sudo tee -a /etc/default/grub >/dev/null
;;
ubuntu24)
GRUB_CFG="/boot/grub/grub.cfg"
GRUB_MKCONFIG="grub-mkconfig"
echo 'GRUB_DISABLE_OS_PROBER="false"' \
| sudo tee -a /etc/default/grub >/dev/null
;;
*)
GRUB_CFG="/boot/grub/grub.cfg"
GRUB_MKCONFIG="grub-mkconfig"
;;
esac
case "$1" in
archlinux|freebsd*)
true
;;
*)
echo "##[group]Edit kernel cmdline"
sudo sed -i -e '/^GRUB_CMDLINE_LINUX/d' /etc/default/grub || true
echo "GRUB_CMDLINE_LINUX=\"$CMDLINE\"" \
| sudo tee -a /etc/default/grub >/dev/null
sudo $GRUB_MKCONFIG -o $GRUB_CFG
echo "##[endgroup]"
;;
esac
# reset cloud-init configuration and poweroff
sudo cloud-init clean --logs
sleep 2 && sudo poweroff &
exit 0
+153
View File
@@ -0,0 +1,153 @@
#!/usr/bin/env bash
######################################################################
# 4) configure and build openzfs modules
######################################################################
set -eu
function run() {
LOG="/var/tmp/build-stderr.txt"
echo "****************************************************"
echo "$(date) ($*)"
echo "****************************************************"
($@ || echo $? > /tmp/rv) 3>&1 1>&2 2>&3 | stdbuf -eL -oL tee -a $LOG
if [ -f /tmp/rv ]; then
RV=$(cat /tmp/rv)
echo "****************************************************"
echo "exit with value=$RV ($*)"
echo "****************************************************"
echo 1 > /var/tmp/build-exitcode.txt
exit $RV
fi
}
function freebsd() {
export MAKE="gmake"
echo "##[group]Autogen.sh"
run ./autogen.sh
echo "##[endgroup]"
echo "##[group]Configure"
run ./configure \
--prefix=/usr/local \
--with-libintl-prefix=/usr/local \
--enable-pyzfs \
--enable-debug \
--enable-debuginfo
echo "##[endgroup]"
echo "##[group]Build"
run gmake -j$(sysctl -n hw.ncpu)
echo "##[endgroup]"
echo "##[group]Install"
run sudo gmake install
echo "##[endgroup]"
}
function linux() {
echo "##[group]Autogen.sh"
run ./autogen.sh
echo "##[endgroup]"
echo "##[group]Configure"
run ./configure \
--prefix=/usr \
--enable-pyzfs \
--enable-debug \
--enable-debuginfo
echo "##[endgroup]"
echo "##[group]Build"
run make -j$(nproc)
echo "##[endgroup]"
echo "##[group]Install"
run sudo make install
echo "##[endgroup]"
}
function rpm_build_and_install() {
EXTRA_CONFIG="${1:-}"
echo "##[group]Autogen.sh"
run ./autogen.sh
echo "##[endgroup]"
echo "##[group]Configure"
run ./configure --enable-debug --enable-debuginfo $EXTRA_CONFIG
echo "##[endgroup]"
echo "##[group]Build"
run make pkg-kmod pkg-utils
echo "##[endgroup]"
echo "##[group]Install"
run sudo dnf -y --nobest install $(ls *.rpm | grep -v src.rpm)
echo "##[endgroup]"
}
function deb_build_and_install() {
echo "##[group]Autogen.sh"
run ./autogen.sh
echo "##[endgroup]"
echo "##[group]Configure"
run ./configure \
--prefix=/usr \
--enable-pyzfs \
--enable-debug \
--enable-debuginfo
echo "##[endgroup]"
echo "##[group]Build"
run make native-deb-kmod native-deb-utils
echo "##[endgroup]"
echo "##[group]Install"
# Do kmod install. Note that when you build the native debs, the
# packages themselves are placed in parent directory '../' rather than
# in the source directory like the rpms are.
run sudo apt-get -y install $(find ../ | grep -E '\.deb$' \
| grep -Ev 'dkms|dracut')
echo "##[endgroup]"
}
# Debug: show kernel cmdline
if [ -f /proc/cmdline ] ; then
cat /proc/cmdline || true
fi
# save some sysinfo
uname -a > /var/tmp/uname.txt
cd $HOME/zfs
export PATH="$PATH:/sbin:/usr/sbin:/usr/local/sbin"
# build
case "$1" in
freebsd*)
freebsd
;;
alma*|centos*)
rpm_build_and_install "--with-spec=redhat"
;;
fedora*)
rpm_build_and_install
;;
debian*|ubuntu*)
deb_build_and_install
;;
*)
linux
;;
esac
# building the zfs module was ok
echo 0 > /var/tmp/build-exitcode.txt
# reset cloud-init configuration and poweroff
sudo cloud-init clean --logs
sync && sleep 2 && sudo poweroff &
exit 0
+126
View File
@@ -0,0 +1,126 @@
#!/usr/bin/env bash
######################################################################
# 5) start test machines and load openzfs module
######################################################################
set -eu
# read our defined variables
source /var/tmp/env.txt
# wait for poweroff to succeed
PID=$(pidof /usr/bin/qemu-system-x86_64)
tail --pid=$PID -f /dev/null
sudo virsh undefine openzfs
# default values per test vm:
VMs=2
CPU=2
# cpu pinning
CPUSET=("0,1" "2,3")
case "$OS" in
freebsd*)
# FreeBSD can't be optimized via ksmtuned
RAM=6
;;
*)
# Linux can be optimized via ksmtuned
RAM=8
;;
esac
# this can be different for each distro
echo "VMs=$VMs" >> $ENV
# create snapshot we can clone later
sudo zfs snapshot zpool/openzfs@now
# setup the testing vm's
PUBKEY=$(cat ~/.ssh/id_ed25519.pub)
for i in $(seq 1 $VMs); do
echo "Creating disk for vm$i..."
DISK="/dev/zvol/zpool/vm$i"
FORMAT="raw"
sudo zfs clone zpool/openzfs@now zpool/vm$i
sudo zfs create -ps -b 64k -V 80g zpool/vm$i-2
cat <<EOF > /tmp/user-data
#cloud-config
fqdn: vm$i
users:
- name: root
shell: $BASH
- name: zfs
sudo: ALL=(ALL) NOPASSWD:ALL
shell: $BASH
ssh_authorized_keys:
- $PUBKEY
growpart:
mode: auto
devices: ['/']
ignore_growroot_disabled: false
EOF
sudo virsh net-update default add ip-dhcp-host \
"<host mac='52:54:00:83:79:0$i' ip='192.168.122.1$i'/>" --live --config
sudo virt-install \
--os-variant $OSv \
--name "vm$i" \
--cpu host-passthrough \
--virt-type=kvm --hvm \
--vcpus=$CPU,sockets=1 \
--cpuset=${CPUSET[$((i-1))]} \
--memory $((1024*RAM)) \
--memballoon model=virtio \
--graphics none \
--cloud-init user-data=/tmp/user-data \
--network bridge=virbr0,model=$NIC,mac="52:54:00:83:79:0$i" \
--disk $DISK,bus=virtio,cache=none,format=$FORMAT,driver.discard=unmap \
--disk $DISK-2,bus=virtio,cache=none,format=$FORMAT,driver.discard=unmap \
--import --noautoconsole >/dev/null
done
# check the memory state from time to time
cat <<EOF > cronjob.sh
# $OS
exec 1>>/var/tmp/stats.txt
exec 2>&1
echo "*******************************************************"
date
uptime
free -m
df -h /mnt/tests
zfs list
EOF
sudo chmod +x cronjob.sh
sudo mv -f cronjob.sh /root/cronjob.sh
echo '*/5 * * * * /root/cronjob.sh' > crontab.txt
sudo crontab crontab.txt
rm crontab.txt
# check if the machines are okay
echo "Waiting for vm's to come up... (${VMs}x CPU=$CPU RAM=$RAM)"
for i in $(seq 1 $VMs); do
while true; do
ssh 2>/dev/null zfs@192.168.122.1$i "uname -a" && break
done
done
echo "All $VMs VMs are up now."
# Save the VM's serial output (ttyS0) to /var/tmp/console.txt
# - ttyS0 on the VM corresponds to a local /dev/pty/N entry
# - use 'virsh ttyconsole' to lookup the /dev/pty/N entry
for i in $(seq 1 $VMs); do
mkdir -p $RESPATH/vm$i
read "pty" <<< $(sudo virsh ttyconsole vm$i)
sudo nohup bash -c "cat $pty > $RESPATH/vm$i/console.txt" &
done
echo "Console logging for ${VMs}x $OS started."
+105
View File
@@ -0,0 +1,105 @@
#!/usr/bin/env bash
######################################################################
# 6) load openzfs module and run the tests
#
# called on runner: qemu-6-tests.sh
# called on qemu-vm: qemu-6-tests.sh $OS $2/$3
######################################################################
set -eu
function prefix() {
ID="$1"
LINE="$2"
CURRENT=$(date +%s)
TSSTART=$(cat /tmp/tsstart)
DIFF=$((CURRENT-TSSTART))
H=$((DIFF/3600))
DIFF=$((DIFF-(H*3600)))
M=$((DIFF/60))
S=$((DIFF-(M*60)))
CTR=$(cat /tmp/ctr)
echo $LINE| grep -q "^Test[: ]" && CTR=$((CTR+1)) && echo $CTR > /tmp/ctr
BASE="$HOME/work/zfs/zfs"
COLOR="$BASE/scripts/zfs-tests-color.sh"
CLINE=$(echo $LINE| grep "^Test[ :]" | sed -e 's|/usr/local|/usr|g' \
| sed -e 's| /usr/share/zfs/zfs-tests/tests/| |g' | $COLOR)
if [ -z "$CLINE" ]; then
printf "vm${ID}: %s\n" "$LINE"
else
# [vm2: 00:15:54 256] Test: functional/checksum/setup (run as root) [00:00] [PASS]
printf "[vm${ID}: %02d:%02d:%02d %4d] %s\n" \
"$H" "$M" "$S" "$CTR" "$CLINE"
fi
}
# called directly on the runner
if [ -z ${1:-} ]; then
cd "/var/tmp"
source env.txt
SSH=$(which ssh)
TESTS='$HOME/zfs/.github/workflows/scripts/qemu-6-tests.sh'
echo 0 > /tmp/ctr
date "+%s" > /tmp/tsstart
for i in $(seq 1 $VMs); do
IP="192.168.122.1$i"
daemonize -c /var/tmp -p vm${i}.pid -o vm${i}log.txt -- \
$SSH zfs@$IP $TESTS $OS $i $VMs $CI_TYPE
# handly line by line and add info prefix
stdbuf -oL tail -fq vm${i}log.txt \
| while read -r line; do prefix "$i" "$line"; done &
echo $! > vm${i}log.pid
# don't mix up the initial --- Configuration --- part
sleep 0.13
done
# wait for all vm's to finish
for i in $(seq 1 $VMs); do
tail --pid=$(cat vm${i}.pid) -f /dev/null
pid=$(cat vm${i}log.pid)
rm -f vm${i}log.pid
kill $pid
done
exit 0
fi
# this part runs inside qemu vm
export PATH="$PATH:/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/sbin:/usr/local/bin"
case "$1" in
freebsd*)
sudo kldstat -n zfs 2>/dev/null && sudo kldunload zfs
sudo -E ./zfs/scripts/zfs.sh
TDIR="/usr/local/share/zfs"
;;
*)
# use xfs @ /var/tmp for all distros
sudo mv -f /var/tmp/*.txt /tmp
sudo mkfs.xfs -fq /dev/vdb
sudo mount -o noatime /dev/vdb /var/tmp
sudo chmod 1777 /var/tmp
sudo mv -f /tmp/*.txt /var/tmp
sudo -E modprobe zfs
TDIR="/usr/share/zfs"
;;
esac
# run functional testings and save exitcode
cd /var/tmp
TAGS=$2/$3
if [ "$4" == "quick" ]; then
export RUNFILES="sanity.run"
fi
sudo dmesg -c > dmesg-prerun.txt
mount > mount.txt
df -h > df-prerun.txt
$TDIR/zfs-tests.sh -vK -s 3GB -T $TAGS
RV=$?
df -h > df-postrun.txt
echo $RV > tests-exitcode.txt
sync
exit 0
+123
View File
@@ -0,0 +1,123 @@
#!/usr/bin/env bash
######################################################################
# 7) prepare output of the results
# - this script pre-creates all needed logfiles for later summary
######################################################################
set -eu
# read our defined variables
cd /var/tmp
source env.txt
mkdir -p $RESPATH
# check if building the module has failed
if [ -z ${VMs:-} ]; then
cd $RESPATH
echo ":exclamation: ZFS module didn't build successfully :exclamation:" \
| tee summary.txt | tee /tmp/summary.txt
cp /var/tmp/*.txt .
tar cf /tmp/qemu-$OS.tar -C $RESPATH -h . || true
exit 0
fi
# build was okay
BASE="$HOME/work/zfs/zfs"
MERGE="$BASE/.github/workflows/scripts/merge_summary.awk"
# catch result files of testings (vm's should be there)
for i in $(seq 1 $VMs); do
rsync -arL zfs@192.168.122.1$i:$RESPATH/current $RESPATH/vm$i || true
scp zfs@192.168.122.1$i:"/var/tmp/*.txt" $RESPATH/vm$i || true
done
cp -f /var/tmp/*.txt $RESPATH || true
cd $RESPATH
# prepare result files for summary
for i in $(seq 1 $VMs); do
file="vm$i/build-stderr.txt"
test -s $file && mv -f $file build-stderr.txt
file="vm$i/build-exitcode.txt"
test -s $file && mv -f $file build-exitcode.txt
file="vm$i/uname.txt"
test -s $file && mv -f $file uname.txt
file="vm$i/tests-exitcode.txt"
if [ ! -s $file ]; then
# XXX - add some tests for kernel panic's here
# tail -n 80 vm$i/console.txt | grep XYZ
echo 1 > $file
fi
rv=$(cat vm$i/tests-exitcode.txt)
test $rv != 0 && touch /tmp/have_failed_tests
file="vm$i/current/log"
if [ -s $file ]; then
cat $file >> log
awk '/\[FAIL\]|\[KILLED\]/{ show=1; print; next; }; \
/\[SKIP\]|\[PASS\]/{ show=0; } show' \
$file > /tmp/vm${i}dbg.txt
fi
file="vm${i}log.txt"
fileC="/tmp/vm${i}log.txt"
if [ -s $file ]; then
cat $file >> summary
cat $file | $BASE/scripts/zfs-tests-color.sh > $fileC
fi
done
# create summary of tests
if [ -s summary ]; then
$MERGE summary | grep -v '^/' > summary.txt
$MERGE summary | $BASE/scripts/zfs-tests-color.sh > /tmp/summary.txt
rm -f summary
else
touch summary.txt /tmp/summary.txt
fi
# create file for debugging
if [ -s log ]; then
awk '/\[FAIL\]|\[KILLED\]/{ show=1; print; next; }; \
/\[SKIP\]|\[PASS\]/{ show=0; } show' \
log > summary-failure-logs.txt
rm -f log
else
touch summary-failure-logs.txt
fi
# create debug overview for failed tests
cat summary.txt \
| awk '/\(expected PASS\)/{ if ($1!="SKIP") print $2; next; } show' \
| while read t; do
cat summary-failure-logs.txt \
| awk '$0~/Test[: ]/{ show=0; } $0~v{ show=1; } show' v="$t" \
> /tmp/fail.txt
SIZE=$(stat --printf="%s" /tmp/fail.txt)
SIZE=$((SIZE/1024))
# Test Summary:
echo "##[group]$t ($SIZE KiB)" >> /tmp/failed.txt
cat /tmp/fail.txt | $BASE/scripts/zfs-tests-color.sh >> /tmp/failed.txt
echo "##[endgroup]" >> /tmp/failed.txt
# Job Summary:
echo -e "\n<details>\n<summary>$t ($SIZE KiB)</summary><pre>" >> failed.txt
cat /tmp/fail.txt >> failed.txt
echo "</pre></details>" >> failed.txt
done
if [ -e /tmp/have_failed_tests ]; then
echo ":warning: Some tests failed!" >> failed.txt
else
echo ":thumbsup: All tests passed." >> failed.txt
fi
if [ ! -s uname.txt ]; then
echo ":interrobang: Panic - where is my uname.txt?" > uname.txt
fi
# artifact ready now
tar cf /tmp/qemu-$OS.tar -C $RESPATH -h . || true
+71
View File
@@ -0,0 +1,71 @@
#!/usr/bin/env bash
######################################################################
# 8) show colored output of results
######################################################################
set -eu
# read our defined variables
source /var/tmp/env.txt
cd $RESPATH
# helper function for showing some content with headline
function showfile() {
content=$(dd if=$1 bs=1024 count=400k 2>/dev/null)
if [ -z "$2" ]; then
group1=""
group2=""
else
SIZE=$(stat --printf="%s" "$file")
SIZE=$((SIZE/1024))
group1="##[group]$2 ($SIZE KiB)"
group2="##[endgroup]"
fi
cat <<EOF > tmp$$
$group1
$content
$group2
EOF
cat tmp$$
rm -f tmp$$
}
# overview
cat /tmp/summary.txt
echo ""
if [ -f /tmp/have_failed_tests -a -s /tmp/failed.txt ]; then
echo "Debuginfo of failed tests:"
cat /tmp/failed.txt
echo ""
cat /tmp/summary.txt | grep -v '^/'
echo ""
fi
echo -e "\nFull logs for download:\n $1\n"
for i in $(seq 1 $VMs); do
rv=$(cat vm$i/tests-exitcode.txt)
if [ $rv = 0 ]; then
vm="vm$i"
else
vm="vm$i"
fi
file="vm$i/dmesg-prerun.txt"
test -s "$file" && showfile "$file" "$vm: dmesg kernel"
file="/tmp/vm${i}log.txt"
test -s "$file" && showfile "$file" "$vm: test results"
file="vm$i/console.txt"
test -s "$file" && showfile "$file" "$vm: serial console"
file="/tmp/vm${i}dbg.txt"
test -s "$file" && showfile "$file" "$vm: failure logfile"
done
test -f /tmp/have_failed_tests && exit 1
exit 0
+57
View File
@@ -0,0 +1,57 @@
#!/usr/bin/env bash
######################################################################
# 9) generate github summary page of all the testings
######################################################################
set -eu
function output() {
echo -e $* >> "out-$logfile.md"
}
function outfile() {
cat "$1" >> "out-$logfile.md"
}
function outfile_plain() {
output "<pre>"
cat "$1" >> "out-$logfile.md"
output "</pre>"
}
function send2github() {
test -f "$1" || exit 0
dd if="$1" bs=1023k count=1 >> $GITHUB_STEP_SUMMARY
}
# https://docs.github.com/en/enterprise-server@3.6/actions/using-workflows/workflow-commands-for-github-actions#step-isolation-and-limits
# Job summaries are isolated between steps and each step is restricted to a maximum size of 1MiB.
# [ ] can not show all error findings here
# [x] split files into smaller ones and create additional steps
# first call, generate all summaries
if [ ! -f out-1.md ]; then
logfile="1"
for tarfile in Logs-functional-*/qemu-*.tar; do
rm -rf vm* *.txt
if [ ! -s "$tarfile" ]; then
output "\n## Functional Tests: unknown\n"
output ":exclamation: Tarfile $tarfile is empty :exclamation:"
continue
fi
tar xf "$tarfile"
test -s env.txt || continue
source env.txt
# when uname.txt is there, the other files are also ok
test -s uname.txt || continue
output "\n## Functional Tests: $OSNAME\n"
outfile_plain uname.txt
outfile_plain summary.txt
outfile failed.txt
logfile=$((logfile+1))
done
send2github out-1.md
else
send2github out-$1.md
fi
+203
View File
@@ -0,0 +1,203 @@
name: zfs-qemu
on:
push:
pull_request:
workflow_dispatch:
inputs:
include_stream9:
type: boolean
required: false
default: false
description: 'Test on CentOS 9 stream'
include_stream10:
type: boolean
required: false
default: false
description: 'Test on CentOS 10 stream'
concurrency:
group: ${{ github.workflow }}-${{ github.head_ref || github.run_id }}
cancel-in-progress: true
jobs:
test-config:
name: Setup
runs-on: ubuntu-24.04
outputs:
test_os: ${{ steps.os.outputs.os }}
ci_type: ${{ steps.os.outputs.ci_type }}
steps:
- uses: actions/checkout@v4
with:
fetch-depth: 0
- name: Generate OS config and CI type
id: os
run: |
FULL_OS='["almalinux8", "almalinux9", "debian11", "debian12", "fedora40", "fedora41", "freebsd13-3r", "freebsd13-4s", "freebsd14-1r", "freebsd14-2s", "freebsd15-0c", "ubuntu20", "ubuntu22", "ubuntu24"]'
QUICK_OS='["almalinux8", "almalinux9", "debian12", "fedora41", "freebsd13-3r", "freebsd14-2r", "ubuntu24"]'
# determine CI type when running on PR
ci_type="full"
if ${{ github.event_name == 'pull_request' }}; then
head=${{ github.event.pull_request.head.sha }}
base=${{ github.event.pull_request.base.sha }}
ci_type=$(python3 .github/workflows/scripts/generate-ci-type.py $head $base)
fi
if [ "$ci_type" == "quick" ]; then
os_selection="$QUICK_OS"
else
os_selection="$FULL_OS"
fi
os_json=$(echo ${os_selection} | jq -c)
# Add optional runners
if [ "${{ github.event.inputs.include_stream9 }}" == 'true' ]; then
os_json=$(echo $os_json | jq -c '. += ["centos-stream9"]')
fi
if [ "${{ github.event.inputs.include_stream10 }}" == 'true' ]; then
os_json=$(echo $os_json | jq -c '. += ["centos-stream10"]')
fi
echo $os_json
echo "os=$os_json" >> $GITHUB_OUTPUT
echo "ci_type=$ci_type" >> $GITHUB_OUTPUT
qemu-vm:
name: qemu-x86
needs: [ test-config ]
strategy:
fail-fast: false
matrix:
# rhl: almalinux8, almalinux9, centos-stream9, fedora40, fedora41
# debian: debian11, debian12, ubuntu20, ubuntu22, ubuntu24
# misc: archlinux, tumbleweed
# FreeBSD variants of 2024-12:
# FreeBSD Release: freebsd13-3r, freebsd13-4r, freebsd14-1r, freebsd14-2r
# FreeBSD Stable: freebsd13-4s, freebsd14-2s
# FreeBSD Current: freebsd15-0c
os: ${{ fromJson(needs.test-config.outputs.test_os) }}
runs-on: ubuntu-24.04
steps:
- uses: actions/checkout@v4
with:
ref: ${{ github.event.pull_request.head.sha }}
- name: Setup QEMU
timeout-minutes: 10
run: .github/workflows/scripts/qemu-1-setup.sh
- name: Start build machine
timeout-minutes: 10
run: .github/workflows/scripts/qemu-2-start.sh ${{ matrix.os }}
- name: Install dependencies
timeout-minutes: 20
run: |
echo "Install dependencies in QEMU machine"
IP=192.168.122.10
while pidof /usr/bin/qemu-system-x86_64 >/dev/null; do
ssh 2>/dev/null zfs@$IP "uname -a" && break
done
scp .github/workflows/scripts/qemu-3-deps.sh zfs@$IP:qemu-3-deps.sh
PID=`pidof /usr/bin/qemu-system-x86_64`
ssh zfs@$IP '$HOME/qemu-3-deps.sh' ${{ matrix.os }}
# wait for poweroff to succeed
tail --pid=$PID -f /dev/null
sleep 5 # avoid this: "error: Domain is already active"
rm -f $HOME/.ssh/known_hosts
- name: Build modules
timeout-minutes: 30
run: |
echo "Build modules in QEMU machine"
sudo virsh start openzfs
IP=192.168.122.10
while pidof /usr/bin/qemu-system-x86_64 >/dev/null; do
ssh 2>/dev/null zfs@$IP "uname -a" && break
done
rsync -ar $HOME/work/zfs/zfs zfs@$IP:./
ssh zfs@$IP '$HOME/zfs/.github/workflows/scripts/qemu-4-build.sh' ${{ matrix.os }}
- name: Setup testing machines
timeout-minutes: 5
run: .github/workflows/scripts/qemu-5-setup.sh
- name: Run tests
timeout-minutes: 270
run: .github/workflows/scripts/qemu-6-tests.sh
env:
CI_TYPE: ${{ needs.test-config.outputs.ci_type }}
- name: Prepare artifacts
if: always()
timeout-minutes: 10
run: .github/workflows/scripts/qemu-7-prepare.sh
- uses: actions/upload-artifact@v4
id: artifact-upload
if: always()
with:
name: Logs-functional-${{ matrix.os }}
path: /tmp/qemu-${{ matrix.os }}.tar
if-no-files-found: ignore
- name: Test Summary
if: always()
run: .github/workflows/scripts/qemu-8-summary.sh '${{ steps.artifact-upload.outputs.artifact-url }}'
cleanup:
if: always()
name: Cleanup
runs-on: ubuntu-latest
needs: [ qemu-vm ]
steps:
- uses: actions/checkout@v4
with:
ref: ${{ github.event.pull_request.head.sha }}
- uses: actions/download-artifact@v4
- name: Generating summary
run: .github/workflows/scripts/qemu-9-summary-page.sh
- name: Generating summary...
run: .github/workflows/scripts/qemu-9-summary-page.sh 2
- name: Generating summary...
run: .github/workflows/scripts/qemu-9-summary-page.sh 3
- name: Generating summary...
run: .github/workflows/scripts/qemu-9-summary-page.sh 4
- name: Generating summary...
run: .github/workflows/scripts/qemu-9-summary-page.sh 5
- name: Generating summary...
run: .github/workflows/scripts/qemu-9-summary-page.sh 6
- name: Generating summary...
run: .github/workflows/scripts/qemu-9-summary-page.sh 7
- name: Generating summary...
run: .github/workflows/scripts/qemu-9-summary-page.sh 8
- name: Generating summary...
run: .github/workflows/scripts/qemu-9-summary-page.sh 9
- name: Generating summary...
run: .github/workflows/scripts/qemu-9-summary-page.sh 10
- name: Generating summary...
run: .github/workflows/scripts/qemu-9-summary-page.sh 11
- name: Generating summary...
run: .github/workflows/scripts/qemu-9-summary-page.sh 12
- name: Generating summary...
run: .github/workflows/scripts/qemu-9-summary-page.sh 13
- name: Generating summary...
run: .github/workflows/scripts/qemu-9-summary-page.sh 14
- name: Generating summary...
run: .github/workflows/scripts/qemu-9-summary-page.sh 15
- name: Generating summary...
run: .github/workflows/scripts/qemu-9-summary-page.sh 16
- name: Generating summary...
run: .github/workflows/scripts/qemu-9-summary-page.sh 17
- name: Generating summary...
run: .github/workflows/scripts/qemu-9-summary-page.sh 18
- name: Generating summary...
run: .github/workflows/scripts/qemu-9-summary-page.sh 19
- uses: actions/upload-artifact@v4
with:
name: Summary Files
path: out-*
+77
View File
@@ -0,0 +1,77 @@
name: zloop
on:
push:
pull_request:
concurrency:
group: ${{ github.workflow }}-${{ github.head_ref || github.run_id }}
cancel-in-progress: true
jobs:
zloop:
runs-on: ubuntu-24.04
env:
TEST_DIR: /var/tmp/zloop
steps:
- uses: actions/checkout@v4
with:
ref: ${{ github.event.pull_request.head.sha }}
- name: Install dependencies
run: |
sudo apt-get purge -y snapd google-chrome-stable firefox
ONLY_DEPS=1 .github/workflows/scripts/qemu-3-deps.sh ubuntu24
- name: Autogen.sh
run: |
sed -i '/DEBUG_CFLAGS="-Werror"/s/^/#/' config/zfs-build.m4
./autogen.sh
- name: Configure
run: |
./configure --prefix=/usr --enable-debug --enable-debuginfo \
--enable-asan --enable-ubsan \
--enable-debug-kmem --enable-debug-kmem-tracking
- name: Make
run: |
make -j$(nproc)
- name: Install
run: |
sudo make install
sudo depmod
sudo modprobe zfs
- name: Tests
run: |
sudo mkdir -p $TEST_DIR
# run for 10 minutes or at most 6 iterations for a maximum runner
# time of 60 minutes.
sudo /usr/share/zfs/zloop.sh -t 600 -I 6 -l -m 1 -- -T 120 -P 60
- name: Prepare artifacts
if: failure()
run: |
sudo chmod +r -R $TEST_DIR/
- name: Ztest log
if: failure()
run: |
grep -B10 -A1000 'ASSERT' $TEST_DIR/*/ztest.out || tail -n 1000 $TEST_DIR/*/ztest.out
- name: Gdb log
if: failure()
run: |
sed -n '/Backtraces (full)/q;p' $TEST_DIR/*/ztest.gdb
- name: Zdb log
if: failure()
run: |
cat $TEST_DIR/*/ztest.zdb
- uses: actions/upload-artifact@v4
if: failure()
with:
name: Logs
path: |
/var/tmp/zloop/*/
!/var/tmp/zloop/*/vdev/
if-no-files-found: ignore
- uses: actions/upload-artifact@v4
if: failure()
with:
name: Pool files
path: |
/var/tmp/zloop/*/vdev/
if-no-files-found: ignore
+56 -36
View File
@@ -1,6 +1,7 @@
#
# N.B.
# This is the toplevel .gitignore file.
# This is the top-level .gitignore file:
# ignore everything except a list of allowed files.
#
# This is not the place for entries that are specific to
# a subdirectory. Instead add those files to the
# .gitignore file in that subdirectory.
@@ -10,6 +11,57 @@
# command after changing this file, to see if there are
# any tracked files which get ignored after the change.
*
!.github
!cmd
!config
!contrib
!etc
!include
!lib
!man
!module
!rpm
!scripts
!tests
!udev
!.github/**
!cmd/**
!config/**
!contrib/**
!etc/**
!include/**
!lib/**
!man/**
!module/**
!rpm/**
!scripts/**
!tests/**
!udev/**
!.editorconfig
!.cirrus.yml
!.gitignore
!.gitmodules
!.mailmap
!AUTHORS
!autogen.sh
!CODE_OF_CONDUCT.md
!configure.ac
!copy-builtin
!COPYRIGHT
!LICENSE
!Makefile.am
!META
!NEWS
!NOTICE
!README.md
!RELEASES.md
!TEST
!zfs.release.in
#
# Normal rules
#
@@ -31,40 +83,8 @@
modules.order
Makefile
Makefile.in
#
# Top level generated files specific to this top level dir
#
/bin
/build
/configure
/config.log
/config.status
/libtool
/zfs_config.h
/zfs_config.h.in
/zfs.release
/stamp-h1
/aclocal.m4
/autom4te.cache
#
# Top level generic files
#
!.gitignore
tags
TAGS
current
cscope.*
*.rpm
*.deb
*.tar.gz
changelog
*.patch
*.orig
*.log
*.tmp
venv
*.so
*.so.debug
*.so.full
*.log
+1 -1
View File
@@ -1,3 +1,3 @@
[submodule "scripts/zfs-images"]
path = scripts/zfs-images
url = https://github.com/zfsonlinux/zfs-images
url = https://github.com/openzfs/zfs-images
+216
View File
@@ -0,0 +1,216 @@
# This file maps the name+email seen in a commit back to a canonical
# name+email. Git will replace the commit name/email with the canonical version
# wherever it sees it.
#
# If there is a commit in the history with a "wrong" name or email, list it
# here. If you regularly commit with an alternate name or email address and
# would like to ensure that you are always listed consistently in the repo, add
# mapping here.
#
# On the other hand, if you use multiple names or email addresses legitimately
# (eg you use a company email address for your paid OpenZFS work, and a
# personal address for your evening side projects), then don't map one to the
# other here.
#
# The most common formats are:
#
# Canonical Name <canonical-email>
# Canonical Name <canonical-email> <commit-email>
# Canonical Name <canonical-email> Commit Name <commit-email>
#
# See https://git-scm.com/docs/gitmailmap for more info.
# These maps are making names consistent where they have varied but the email
# address has never changed. In most cases, the full name is in the
# Signed-off-by of a commit with a matching author.
Ahelenia Ziemiańska <nabijaczleweli@gmail.com>
Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Alex John <alex@stty.io>
Andreas Dilger <adilger@dilger.ca>
Andrew Walker <awalker@ixsystems.com>
Benedikt Neuffer <github@itfriend.de>
Chengfei Zhu <chengfeix.zhu@intel.com>
ChenHao Lu <18302010006@fudan.edu.cn>
Chris Lindee <chris.lindee+github@gmail.com>
Colm Buckley <colm@tuatha.org>
Crag Wang <crag0715@gmail.com>
Damian Szuberski <szuberskidamian@gmail.com>
Daniel Kolesa <daniel@octaforge.org>
Debabrata Banerjee <dbavatar@gmail.com>
Finix Yan <yanchongwen@hotmail.com>
Gaurav Kumar <gauravk.18@gmail.com>
Gionatan Danti <g.danti@assyoma.it>
Glenn Washburn <development@efficientek.com>
Gordan Bobic <gordan.bobic@gmail.com>
Gregory Bartholomew <gregory.lee.bartholomew@gmail.com>
hedong zhang <h_d_zhang@163.com>
Ilkka Sovanto <github@ilkka.kapsi.fi>
InsanePrawn <Insane.Prawny@gmail.com>
Jason Cohen <jwittlincohen@gmail.com>
Jason Harmening <jason.harmening@gmail.com>
Jeremy Faulkner <gldisater@gmail.com>
Jinshan Xiong <jinshan.xiong@gmail.com>
John Poduska <jpoduska@datto.com>
Justin Scholz <git@justinscholz.de>
Ka Ho Ng <khng300@gmail.com>
Kash Pande <github@tripleback.net>
Kay Pedersen <christianpe96@gmail.com>
KernelOfTruth <kerneloftruth@gmail.com>
Liu Hua <liu.hua130@zte.com.cn>
Liu Qing <winglq@gmail.com>
loli10K <ezomori.nozomu@gmail.com>
Mart Frauenlob <allkind@fastest.cc>
Matthias Blankertz <matthias@blankertz.org>
Michael Gmelin <grembo@FreeBSD.org>
Olivier Mazouffre <olivier.mazouffre@ims-bordeaux.fr>
Piotr Kubaj <pkubaj@anongoth.pl>
Quentin Zdanis <zdanisq@gmail.com>
Roberto Ricci <ricci@disroot.org>
Rob Norris <robn@despairlabs.com>
Rob Norris <rob.norris@klarasystems.com>
Sam Lunt <samuel.j.lunt@gmail.com>
Sanjeev Bagewadi <sanjeev.bagewadi@gmail.com>
Sebastian Wuerl <s.wuerl@mailbox.org>
Stoiko Ivanov <github@nomore.at>
Tamas TEVESZ <ice@extreme.hu>
WHR <msl0000023508@gmail.com>
Yanping Gao <yanping.gao@xtaotech.com>
Youzhong Yang <youzhong@gmail.com>
# Signed-off-by: overriding Author:
Ryan <errornointernet@envs.net> <error.nointernet@gmail.com>
Sietse <sietse@wizdom.nu> <uglymotha@wizdom.nu>
Qiuhao Chen <chenqiuhao1997@gmail.com> <haohao0924@126.com>
Yuxin Wang <yuxinwang9999@gmail.com> <Bi11gates9999@gmail.com>
Zhenlei Huang <zlei@FreeBSD.org> <zlei.huang@gmail.com>
# Commits from strange places, long ago
Brian Behlendorf <behlendorf1@llnl.gov> <behlendo@7e1ea52c-4ff2-0310-8f11-9dd32ca42a1c>
Brian Behlendorf <behlendorf1@llnl.gov> <behlendo@fedora-17-amd64.(none)>
Brian Behlendorf <behlendorf1@llnl.gov> <behlendo@myhost.(none)>
Brian Behlendorf <behlendorf1@llnl.gov> <ubuntu@ip-172-31-16-145.us-west-1.compute.internal>
Brian Behlendorf <behlendorf1@llnl.gov> <ubuntu@ip-172-31-20-6.us-west-1.compute.internal>
Herb Wartens <wartens2@llnl.gov> <wartens2@7e1ea52c-4ff2-0310-8f11-9dd32ca42a1c>
Ned Bass <bass6@llnl.gov> <bass6@zeno1.(none)>
Tulsi Jain <tulsi.jain@delphix.com> <tulsi.jain@Tulsi-Jains-MacBook-Pro.local>
# Mappings from Github no-reply addresses
ajs124 <git@ajs124.de> <ajs124@users.noreply.github.com>
Alek Pinchuk <apinchuk@axcient.com> <alek-p@users.noreply.github.com>
Alexander Lobakin <alobakin@pm.me> <solbjorn@users.noreply.github.com>
Alexey Smirnoff <fling@member.fsf.org> <fling-@users.noreply.github.com>
Allen Holl <allen.m.holl@gmail.com> <65494904+allen-4@users.noreply.github.com>
Alphan Yılmaz <alphanyilmaz@gmail.com> <a1ea321@users.noreply.github.com>
Ameer Hamza <ahamza@ixsystems.com> <106930537+ixhamza@users.noreply.github.com>
Andrew J. Hesford <ajh@sideband.org> <48421688+ahesford@users.noreply.github.com>>
Andrew Sun <me@andrewsun.com> <as-com@users.noreply.github.com>
Aron Xu <happyaron.xu@gmail.com> <happyaron@users.noreply.github.com>
Arun KV <arun.kv@datacore.com> <65647132+arun-kv@users.noreply.github.com>
Ben Wolsieffer <benwolsieffer@gmail.com> <lopsided98@users.noreply.github.com>
bernie1995 <bernie.pikes@gmail.com> <42413912+bernie1995@users.noreply.github.com>
Bojan Novković <bnovkov@FreeBSD.org> <72801811+bnovkov@users.noreply.github.com>
Boris Protopopov <boris.protopopov@actifio.com> <bprotopopov@users.noreply.github.com>
Brad Forschinger <github@bnjf.id.au> <bnjf@users.noreply.github.com>
Brandon Thetford <brandon@dodecatec.com> <dodexahedron@users.noreply.github.com>
buzzingwires <buzzingwires@outlook.com> <131118055+buzzingwires@users.noreply.github.com>
Cedric Maunoury <cedric.maunoury@gmail.com> <38213715+cedricmaunoury@users.noreply.github.com>
Charles Suh <charles.suh@gmail.com> <charlessuh@users.noreply.github.com>
Chris Peredun <chris.peredun@ixsystems.com> <126915832+chrisperedun@users.noreply.github.com>
Dacian Reece-Stremtan <dacianstremtan@gmail.com> <35844628+dacianstremtan@users.noreply.github.com>
Damian Szuberski <szuberskidamian@gmail.com> <30863496+szubersk@users.noreply.github.com>
Daniel Hiepler <d-git@coderdu.de> <32984777+heeplr@users.noreply.github.com>
Daniel Kobras <d.kobras@science-computing.de> <sckobras@users.noreply.github.com>
Daniel Reichelt <hacking@nachtgeist.net> <nachtgeist@users.noreply.github.com>
David Quigley <david.quigley@intel.com> <dpquigl@users.noreply.github.com>
Dennis R. Friedrichsen <dennis.r.friedrichsen@gmail.com> <31087738+dennisfriedrichsen@users.noreply.github.com>
Dex Wood <slash2314@gmail.com> <slash2314@users.noreply.github.com>
DHE <git@dehacked.net> <DeHackEd@users.noreply.github.com>
Dmitri John Ledkov <dimitri.ledkov@canonical.com> <19779+xnox@users.noreply.github.com>
Dries Michiels <driesm.michiels@gmail.com> <32487486+driesmp@users.noreply.github.com>
Edmund Nadolski <edmund.nadolski@ixsystems.com> <137826107+ednadolski-ix@users.noreply.github.com>
Érico Nogueira <erico.erc@gmail.com> <34201958+ericonr@users.noreply.github.com>
Fedor Uporov <fuporov.vstack@gmail.com> <60701163+fuporovvStack@users.noreply.github.com>
Felix Dörre <felix@dogcraft.de> <felixdoerre@users.noreply.github.com>
Felix Neumärker <xdch47@posteo.de> <34678034+xdch47@users.noreply.github.com>
Finix Yan <yancw@info2soft.com> <Finix1979@users.noreply.github.com>
Gaurav Kumar <gauravk.18@gmail.com> <gaurkuma@users.noreply.github.com>
George Gaydarov <git@gg7.io> <gg7@users.noreply.github.com>
Georgy Yakovlev <gyakovlev@gentoo.org> <168902+gyakovlev@users.noreply.github.com>
Gerardwx <gerardw@alum.mit.edu> <Gerardwx@users.noreply.github.com>
Gian-Carlo DeFazio <defazio1@llnl.gov> <defaziogiancarlo@users.noreply.github.com>
Giuseppe Di Natale <dinatale2@llnl.gov> <dinatale2@users.noreply.github.com>
Hajo Möller <dasjoe@gmail.com> <dasjoe@users.noreply.github.com>
Harry Mallon <hjmallon@gmail.com> <1816667+hjmallon@users.noreply.github.com>
Hiếu Lê <leorize+oss@disroot.org> <alaviss@users.noreply.github.com>
Jake Howard <git@theorangeone.net> <RealOrangeOne@users.noreply.github.com>
James Cowgill <james.cowgill@mips.com> <jcowgill@users.noreply.github.com>
Jaron Kent-Dobias <jaron@kent-dobias.com> <kentdobias@users.noreply.github.com>
Jason King <jason.king@joyent.com> <jasonbking@users.noreply.github.com>
Jeff Dike <jdike@akamai.com> <52420226+jdike@users.noreply.github.com>
Jitendra Patidar <jitendra.patidar@nutanix.com> <53164267+jsai20@users.noreply.github.com>
João Carlos Mendes Luís <jonny@jonny.eng.br> <dioni21@users.noreply.github.com>
John Eismeier <john.eismeier@gmail.com> <32205350+jeis2497052@users.noreply.github.com>
John L. Hammond <john.hammond@intel.com> <35266395+jhammond-intel@users.noreply.github.com>
John-Mark Gurney <jmg@funkthat.com> <jmgurney@users.noreply.github.com>
John Ramsden <johnramsden@riseup.net> <johnramsden@users.noreply.github.com>
Jonathon Fernyhough <jonathon@m2x.dev> <559369+jonathonf@users.noreply.github.com>
Jose Luis Duran <jlduran@gmail.com> <jlduran@users.noreply.github.com>
Justin Hibbits <chmeeedalf@gmail.com> <chmeeedalf@users.noreply.github.com>
Kevin Greene <kevin.greene@delphix.com> <104801862+kxgreene@users.noreply.github.com>
Kevin Jin <lostking2008@hotmail.com> <33590050+jxdking@users.noreply.github.com>
Kevin P. Fleming <kevin@km6g.us> <kpfleming@users.noreply.github.com>
Krzysztof Piecuch <piecuch@kpiecuch.pl> <3964215+pikrzysztof@users.noreply.github.com>
Kyle Evans <kevans@FreeBSD.org> <kevans91@users.noreply.github.com>
Laurențiu Nicola <lnicola@dend.ro> <lnicola@users.noreply.github.com>
loli10K <ezomori.nozomu@gmail.com> <loli10K@users.noreply.github.com>
Lorenz Hüdepohl <dev@stellardeath.org> <lhuedepohl@users.noreply.github.com>
Luís Henriques <henrix@camandro.org> <73643340+lumigch@users.noreply.github.com>
Marcin Skarbek <git@skarbek.name> <mskarbek@users.noreply.github.com>
Matt Fiddaman <github@m.fiddaman.uk> <81489167+matt-fidd@users.noreply.github.com>
Maxim Filimonov <che@bein.link> <part1zano@users.noreply.github.com>
Max Zettlmeißl <max@zettlmeissl.de> <6818198+maxz@users.noreply.github.com>
Michael Niewöhner <foss@mniewoehner.de> <c0d3z3r0@users.noreply.github.com>
Michael Zhivich <mzhivich@akamai.com> <33133421+mzhivich@users.noreply.github.com>
MigeljanImeri <ImeriMigel@gmail.com> <78048439+MigeljanImeri@users.noreply.github.com>
Mo Zhou <cdluminate@gmail.com> <5723047+cdluminate@users.noreply.github.com>
Nick Mattis <nickm970@gmail.com> <nmattis@users.noreply.github.com>
omni <omni+vagant@hack.org> <79493359+omnivagant@users.noreply.github.com>
Pablo Correa Gómez <ablocorrea@hotmail.com> <32678034+pablofsf@users.noreply.github.com>
Paul Zuchowski <pzuchowski@datto.com> <31706010+PaulZ-98@users.noreply.github.com>
Peter Ashford <ashford@accs.com> <pashford@users.noreply.github.com>
Peter Dave Hello <hsu@peterdavehello.org> <PeterDaveHello@users.noreply.github.com>
Peter Wirdemo <peter.wirdemo@gmail.com> <4224155+pewo@users.noreply.github.com>
Petros Koutoupis <petros@petroskoutoupis.com> <pkoutoupis@users.noreply.github.com>
Ping Huang <huangping@smartx.com> <101400146+hpingfs@users.noreply.github.com>
Piotr P. Stefaniak <pstef@freebsd.org> <pstef@users.noreply.github.com>
Richard Allen <belperite@gmail.com> <33836503+belperite@users.noreply.github.com>
Rich Ercolani <rincebrain@gmail.com> <214141+rincebrain@users.noreply.github.com>
Rick Macklem <rmacklem@uoguelph.ca> <64620010+rmacklem@users.noreply.github.com>
Rob Wing <rob.wing@klarasystems.com> <98866084+rob-wing@users.noreply.github.com>
Roman Strashkin <roman.strashkin@nexenta.com> <Ramzec@users.noreply.github.com>
Ryan Hirasaki <ryanhirasaki@gmail.com> <4690732+RyanHir@users.noreply.github.com>
Samuel Wycliffe J <samwyc@hpe.com> <115969550+samwyc@users.noreply.github.com>
Samuel Wycliffe <samuelwycliffe@gmail.com> <50765275+npc203@users.noreply.github.com>
Savyasachee Jha <hi@savyasacheejha.com> <savyajha@users.noreply.github.com>
Scott Colby <scott@scolby.com> <scolby33@users.noreply.github.com>
Sean Eric Fagan <kithrup@mac.com> <kithrup@users.noreply.github.com>
Spencer Kinny <spencerkinny1995@gmail.com> <30333052+Spencer-Kinny@users.noreply.github.com>
Srikanth N S <srikanth.nagasubbaraoseetharaman@hpe.com> <75025422+nssrikanth@users.noreply.github.com>
Stefan Lendl <s.lendl@proxmox.com> <1321542+stfl@users.noreply.github.com>
Thomas Bertschinger <bertschinger@lanl.gov> <101425190+bertschinger@users.noreply.github.com>
Thomas Geppert <geppi@digitx.de> <geppi@users.noreply.github.com>
Tim Crawford <tcrawford@datto.com> <crawfxrd@users.noreply.github.com>
Todd Seidelmann <18294602+seidelma@users.noreply.github.com>
Tom Matthews <tom@axiom-partners.com> <tomtastic@users.noreply.github.com>
Tony Perkins <tperkins@datto.com> <62951051+tony-zfs@users.noreply.github.com>
Torsten Wörtwein <twoertwein@gmail.com> <twoertwein@users.noreply.github.com>
Tulsi Jain <tulsi.jain@delphix.com> <TulsiJain@users.noreply.github.com>
Václav Skála <skala@vshosting.cz> <33496485+vaclavskala@users.noreply.github.com>
Vaibhav Bhanawat <vaibhav.bhanawat@delphix.com> <88050553+vaibhav-delphix@users.noreply.github.com>
Violet Purcell <vimproved@inventati.org> <66446404+vimproved@users.noreply.github.com>
Vipin Kumar Verma <vipin.verma@hpe.com> <75025470+vermavipinkumar@users.noreply.github.com>
Wolfgang Bumiller <w.bumiller@proxmox.com> <Blub@users.noreply.github.com>
XDTG <click1799@163.com> <35128600+XDTG@users.noreply.github.com>
xtouqh <xtouqh@hotmail.com> <72357159+xtouqh@users.noreply.github.com>
Yuri Pankov <yuripv@FreeBSD.org> <113725409+yuripv@users.noreply.github.com>
Yuri Pankov <yuripv@FreeBSD.org> <82001006+yuripv@users.noreply.github.com>
+398 -26
View File
@@ -10,299 +10,671 @@ PAST MAINTAINERS:
CONTRIBUTORS:
Aaron Fineman <abyxcos@gmail.com>
Adam D. Moss <c@yotes.com>
Adam Leventhal <ahl@delphix.com>
Adam Stevko <adam.stevko@gmail.com>
adisbladis <adis@blad.is>
Adrian Chadd <adrian@freebsd.org>
Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Ahmed G <ahmedg@delphix.com>
Aidan Harris <me@aidanharr.is>
AJ Jordan <alex@strugee.net>
ajs124 <git@ajs124.de>
Akash Ayare <aayare@delphix.com>
Akash B <akash-b@hpe.com>
Alan Somers <asomers@gmail.com>
Alar Aun <spamtoaun@gmail.com>
Albert Lee <trisk@nexenta.com>
Alec Salazar <alec.j.salazar@gmail.com>
Alejandro Colomar <Colomar.6.4.3@GMail.com>
Alejandro R. Sedeño <asedeno@mit.edu>
Alek Pinchuk <alek@nexenta.com>
Aleksa Sarai <cyphar@cyphar.com>
Alexander Eremin <a.eremin@nexenta.com>
Alexander Lobakin <alobakin@pm.me>
Alexander Motin <mav@freebsd.org>
Alexander Pyhalov <apyhalov@gmail.com>
Alexander Richardson <Alexander.Richardson@cl.cam.ac.uk>
Alexander Stetsenko <ams@nexenta.com>
Alex Braunegg <alex.braunegg@gmail.com>
Alexey Shvetsov <alexxy@gentoo.org>
Alexey Smirnoff <fling@member.fsf.org>
Alex John <alex@stty.io>
Alex McWhirter <alexmcwhirter@triadic.us>
Alex Reece <alex@delphix.com>
Alex Wilson <alex.wilson@joyent.com>
Alex Zhuravlev <alexey.zhuravlev@intel.com>
Alexander Eremin <a.eremin@nexenta.com>
Alexander Motin <mav@freebsd.org>
Alexander Pyhalov <apyhalov@gmail.com>
Alexander Stetsenko <ams@nexenta.com>
Alexey Shvetsov <alexxy@gentoo.org>
Alexey Smirnoff <fling@member.fsf.org>
Allan Jude <allanjude@freebsd.org>
Allen Holl <allen.m.holl@gmail.com>
Alphan Yılmaz <alphanyilmaz@gmail.com>
alteriks <alteriks@gmail.com>
Alyssa Ross <hi@alyssa.is>
Ameer Hamza <ahamza@ixsystems.com>
Anatoly Borodin <anatoly.borodin@gmail.com>
AndCycle <andcycle@andcycle.idv.tw>
Andrea Gelmini <andrea.gelmini@gelma.net>
Andrea Righi <andrea.righi@canonical.com>
Andreas Buschmann <andreas.buschmann@tech.net.de>
Andreas Dilger <adilger@intel.com>
Andreas Vögele <andreas@andreasvoegele.com>
Andrew Barnes <barnes333@gmail.com>
Andrew Hamilton <ahamilto@tjhsst.edu>
Andrew Innes <andrew.c12@gmail.com>
Andrew J. Hesford <ajh@sideband.org>
Andrew Reid <ColdCanuck@nailedtotheperch.com>
Andrew Stormont <andrew.stormont@nexenta.com>
Andrew Sun <me@andrewsun.com>
Andrew Tselischev <andrewtselischev@gmail.com>
Andrew Turner <andrew@fubar.geek.nz>
Andrew Walker <awalker@ixsystems.com>
Andrey Prokopenko <job@terem.fr>
Andrey Vesnovaty <andrey.vesnovaty@gmail.com>
Andriy Gapon <avg@freebsd.org>
Andy Bakun <github@thwartedefforts.org>
Andy Fiddaman <omnios@citrus-it.co.uk>
Aniruddha Shankar <k@191a.net>
Anton Gubarkov <anton.gubarkov@gmail.com>
Antonio Russo <antonio.e.russo@gmail.com>
Arkadiusz Bubała <arkadiusz.bubala@open-e.com>
Armin Wehrfritz <dkxls23@gmail.com>
Arne Jansen <arne@die-jansens.de>
Aron Xu <happyaron.xu@gmail.com>
Arshad Hussain <arshad.hussain@aeoncomputing.com>
Arun KV <arun.kv@datacore.com>
Arvind Sankar <nivedita@alum.mit.edu>
Attila Fülöp <attila@fueloep.org>
Avatat <kontakt@avatat.pl>
Bart Coddens <bart.coddens@gmail.com>
Basil Crow <basil.crow@delphix.com>
Huang Liu <liu.huang@zte.com.cn>
Bassu <bassu@phi9.com>
Ben Allen <bsallen@alcf.anl.gov>
Ben Rubson <ben.rubson@gmail.com>
Ben Cordero <bencord0@condi.me>
Benda Xu <orv@debian.org>
Benedikt Neuffer <github@itfriend.de>
Benjamin Albrecht <git@albrecht.io>
Benjamin Gentil <benjgentil.pro@gmail.com>
Benjamin Sherman <benjamin@holyarmy.org>
Ben McGough <bmcgough@fredhutch.org>
Ben Rubson <ben.rubson@gmail.com>
Ben Wolsieffer <benwolsieffer@gmail.com>
bernie1995 <bernie.pikes@gmail.com>
Bill McGonigle <bill-github.com-public1@bfccomputing.com>
Bill Pijewski <wdp@joyent.com>
Bojan Novković <bnovkov@FreeBSD.org>
Boris Protopopov <boris.protopopov@nexenta.com>
Brad Forschinger <github@bnjf.id.au>
Brad Lewis <brad.lewis@delphix.com>
Brandon Thetford <brandon@dodecatec.com>
Brian Atkinson <bwa@g.clemson.edu>
Brian Behlendorf <behlendorf1@llnl.gov>
Brian J. Murrell <brian@sun.com>
Brooks Davis <brooks@one-eyed-alien.net>
BtbN <btbn@btbn.de>
bunder2015 <omfgbunder@gmail.com>
buzzingwires <buzzingwires@outlook.com>
bzzz77 <bzzz.tomas@gmail.com>
cable2999 <cable2999@users.noreply.github.com>
Caleb James DeLisle <calebdelisle@lavabit.com>
Cameron Harr <harr1@llnl.gov>
Cao Xuewen <cao.xuewen@zte.com.cn>
Carlo Landmeter <clandmeter@gmail.com>
Carlos Alberto Lopez Perez <clopez@igalia.com>
Cedric Maunoury <cedric.maunoury@gmail.com>
Chaoyu Zhang <zhang.chaoyu@zte.com.cn>
Charles Suh <charles.suh@gmail.com>
Chen Can <chen.can2@zte.com.cn>
Chengfei Zhu <chengfeix.zhu@intel.com>
Chen Haiquan <oc@yunify.com>
ChenHao Lu <18302010006@fudan.edu.cn>
Chip Parker <aparker@enthought.com>
Chris Burroughs <chris.burroughs@gmail.com>
Chris Davidson <christopher.davidson@gmail.com>
Chris Dunlap <cdunlap@llnl.gov>
Chris Dunlop <chris@onthe.net.au>
Chris Lindee <chris.lindee+github@gmail.com>
Chris McDonough <chrism@plope.com>
Chris Peredun <chris.peredun@ixsystems.com>
Chris Siden <chris.siden@delphix.com>
Chris Wedgwood <cw@f00f.org>
Chris Williamson <chris.williamson@delphix.com>
Chris Zubrzycki <github@mid-earth.net>
Christ Schlacta <aarcane@aarcane.info>
Chris Siebenmann <cks.github@cs.toronto.edu>
Christer Ekholm <che@chrekh.se>
Christian Kohlschütter <christian@kohlschutter.com>
Christian Neukirchen <chneukirchen@gmail.com>
Christian Schwarz <me@cschwarz.com>
Christopher Voltz <cjunk@voltz.ws>
Christ Schlacta <aarcane@aarcane.info>
Chris Wedgwood <cw@f00f.org>
Chris Williamson <chris.williamson@delphix.com>
Chris Zubrzycki <github@mid-earth.net>
Chuck Tuffli <ctuffli@gmail.com>
Chunwei Chen <david.chen@nutanix.com>
Clemens Fruhwirth <clemens@endorphin.org>
Clemens Lang <cl@clang.name>
Clint Armstrong <clint@clintarmstrong.net>
Coleman Kane <ckane@colemankane.org>
Colin Ian King <colin.king@canonical.com>
Colin Percival <cperciva@tarsnap.com>
Colm Buckley <colm@tuatha.org>
Crag Wang <crag0715@gmail.com>
Craig Loomis <cloomis@astro.princeton.edu>
Craig Sanders <github@taz.net.au>
Cyril Plisko <cyril.plisko@infinidat.com>
DHE <git@dehacked.net>
Cy Schubert <cy@FreeBSD.org>
Cédric Berger <cedric@precidata.com>
Dacian Reece-Stremtan <dacianstremtan@gmail.com>
Dag-Erling Smørgrav <des@FreeBSD.org>
Damiano Albani <damiano.albani@gmail.com>
Damian Szuberski <szuberskidamian@gmail.com>
Damian Wojsław <damian@wojslaw.pl>
Daniel Berlin <dberlin@dberlin.org>
Daniel Hiepler <d-git@coderdu.de>
Daniel Hoffman <dj.hoffman@delphix.com>
Daniel Kobras <d.kobras@science-computing.de>
Daniel Kolesa <daniel@octaforge.org>
Daniel Perry <dtperry@amazon.com>
Daniel Reichelt <hacking@nachtgeist.net>
Daniel Stevenson <bot@dstev.net>
Daniel Verite <daniel@verite.pro>
Daniil Lunev <d.lunev.mail@gmail.com>
Dan Kimmel <dan.kimmel@delphix.com>
Dan McDonald <danmcd@nexenta.com>
Dan Swartzendruber <dswartz@druber.com>
Dan Vatca <dan.vatca@gmail.com>
Daniel Hoffman <dj.hoffman@delphix.com>
Daniel Verite <daniel@verite.pro>
Daniil Lunev <d.lunev.mail@gmail.com>
Darik Horn <dajhorn@vanadac.com>
Dave Eddy <dave@daveeddy.com>
David Hedberg <david@qzx.se>
David Lamparter <equinox@diac24.net>
David Qian <david.qian@intel.com>
David Quigley <david.quigley@intel.com>
Debabrata Banerjee <dbanerje@akamai.com>
D. Ebdrup <debdrup@freebsd.org>
Dennis R. Friedrichsen <dennis.r.friedrichsen@gmail.com>
Denys Rtveliashvili <denys@rtveliashvili.name>
Derek Dai <daiderek@gmail.com>
Derek Schrock <dereks@lifeofadishwasher.com>
Dex Wood <slash2314@gmail.com>
DHE <git@dehacked.net>
Didier Roche <didrocks@ubuntu.com>
Dimitri John Ledkov <xnox@ubuntu.com>
Dimitry Andric <dimitry@andric.com>
Dirkjan Bussink <d.bussink@gmail.com>
Dmitry Khasanov <pik4ez@gmail.com>
Dominic Pearson <dsp@technoanimal.net>
Dominik Hassler <hadfl@omniosce.org>
Dominik Honnef <dominikh@fork-bomb.org>
Don Brady <don.brady@delphix.com>
Doug Rabson <dfr@rabson.org>
Dr. András Korn <korn-github.com@elan.rulez.org>
Dries Michiels <driesm.michiels@gmail.com>
Edmund Nadolski <edmund.nadolski@ixsystems.com>
Eitan Adler <lists@eitanadler.com>
Eli Rosenthal <eli.rosenthal@delphix.com>
Eli Schwartz <eschwartz93@gmail.com>
Eric Desrochers <eric.desrochers@canonical.com>
Eric Dillmann <eric@jave.fr>
Eric Schrock <Eric.Schrock@delphix.com>
Ethan Coe-Renner <coerenner1@llnl.gov>
Etienne Dechamps <etienne@edechamps.fr>
Evan Allrich <eallrich@gmail.com>
Evan Harris <eharris@puremagic.com>
Evan Susarret <evansus@gmail.com>
Fabian Grünbichler <f.gruenbichler@proxmox.com>
Fabio Buso <dev.siroibaf@gmail.com>
Fabio Scaccabarozzi <fsvm88@gmail.com>
Fajar A. Nugraha <github@fajar.net>
Fan Yong <fan.yong@intel.com>
fbynite <fbynite@users.noreply.github.com>
Fedor Uporov <fuporov.vstack@gmail.com>
Felix Dörre <felix@dogcraft.de>
Felix Neumärker <xdch47@posteo.de>
Feng Sun <loyou85@gmail.com>
Finix Yan <yancw@info2soft.com>
Francesco Mazzoli <f@mazzo.li>
Frederik Wessels <wessels147@gmail.com>
Frédéric Vanniere <f.vanniere@planet-work.com>
Gabriel A. Devenyi <gdevenyi@gmail.com>
Garrett D'Amore <garrett@nexenta.com>
Garrett Fields <ghfields@gmail.com>
Garrison Jensen <garrison.jensen@gmail.com>
Gary Mills <gary_mills@fastmail.fm>
Gaurav Kumar <gauravk.18@gmail.com>
GeLiXin <ge.lixin@zte.com.cn>
George Amanakis <g_amanakis@yahoo.com>
George Diamantopoulos <georgediam@gmail.com>
George Gaydarov <git@gg7.io>
George Melikov <mail@gmelikov.ru>
George Wilson <gwilson@delphix.com>
Georgy Yakovlev <ya@sysdump.net>
Gerardwx <gerardw@alum.mit.edu>
Gian-Carlo DeFazio <defazio1@llnl.gov>
Gionatan Danti <g.danti@assyoma.it>
Giuseppe Di Natale <guss80@gmail.com>
Gleb Smirnoff <glebius@FreeBSD.org>
Glenn Washburn <development@efficientek.com>
glibg10b <glibg10b@users.noreply.github.com>
gofaster <felix.gofaster@gmail.com>
Gordan Bobic <gordan@redsleeve.org>
Gordon Bergling <gbergling@googlemail.com>
Gordon Ross <gwr@nexenta.com>
Gordon Tetlow <gordon@freebsd.org>
Graham Christensen <graham@grahamc.com>
Graham Perrin <grahamperrin@gmail.com>
Gregor Kopka <gregor@kopka.net>
Gregory Bartholomew <gregory.lee.bartholomew@gmail.com>
grembo <freebsd@grem.de>
Grischa Zengel <github.zfsonlinux@zengel.info>
grodik <pat@litke.dev>
Gunnar Beutner <gunnar@beutner.name>
Gvozden Neskovic <neskovic@gmail.com>
Hajo Möller <dasjoe@gmail.com>
Han Gao <rabenda.cn@gmail.com>
Hans Rosenfeld <hans.rosenfeld@nexenta.com>
Harald van Dijk <harald@gigawatt.nl>
Harry Mallon <hjmallon@gmail.com>
Harry Sintonen <github-piru@kyber.fi>
HC <mmttdebbcc@yahoo.com>
hedong zhang <h_d_zhang@163.com>
Heitor Alves de Siqueira <halves@canonical.com>
Henrik Riomar <henrik.riomar@gmail.com>
Herb Wartens <wartens2@llnl.gov>
Hiếu Lê <leorize+oss@disroot.org>
Huang Liu <liu.huang@zte.com.cn>
Håkan Johansson <f96hajo@chalmers.se>
Igor K <igor@dilos.org>
Igor Kozhukhov <ikozhukhov@gmail.com>
Igor Lvovsky <ilvovsky@gmail.com>
ilbsmart <wgqimut@gmail.com>
Ilkka Sovanto <github@ilkka.kapsi.fi>
illiliti <illiliti@protonmail.com>
ilovezfs <ilovezfs@icloud.com>
InsanePrawn <Insane.Prawny@gmail.com>
Isaac Huang <he.huang@intel.com>
JK Dingwall <james@dingwall.me.uk>
Jacek Fefliński <feflik@gmail.com>
Jacob Adams <tookmund@gmail.com>
Jake Howard <git@theorangeone.net>
James Cowgill <james.cowgill@mips.com>
James H <james@kagisoft.co.uk>
James Lee <jlee@thestaticvoid.com>
James Pan <jiaming.pan@yahoo.com>
James Wah <james@laird-wah.net>
Jan Engelhardt <jengelh@inai.de>
Jan Kryl <jan.kryl@nexenta.com>
Jan Sanislo <oystr@cs.washington.edu>
Jaron Kent-Dobias <jaron@kent-dobias.com>
Jason Cohen <jwittlincohen@gmail.com>
Jason Harmening <jason.harmening@gmail.com>
Jason King <jason.brian.king@gmail.com>
Jason Lee <jasonlee@lanl.gov>
Jason Zaman <jasonzaman@gmail.com>
Javen Wu <wu.javen@gmail.com>
Jean-Baptiste Lallement <jean-baptiste@ubuntu.com>
Jeff Dike <jdike@akamai.com>
Jeremy Faulkner <gldisater@gmail.com>
Jeremy Gill <jgill@parallax-innovations.com>
Jeremy Jones <jeremy@delphix.com>
Jeremy Visser <jeremy.visser@gmail.com>
Jerry Jelinek <jerry.jelinek@joyent.com>
Jessica Clarke <jrtc27@jrtc27.com>
Jinshan Xiong <jinshan.xiong@intel.com>
Jitendra Patidar <jitendra.patidar@nutanix.com>
JK Dingwall <james@dingwall.me.uk>
Joe Stein <joe.stein@delphix.com>
John-Mark Gurney <jmg@funkthat.com>
John Albietz <inthecloud247@gmail.com>
John Eismeier <john.eismeier@gmail.com>
John L. Hammond <john.hammond@intel.com>
John Gallagher <john.gallagher@delphix.com>
John Layman <jlayman@sagecloud.com>
John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
John Wren Kennedy <john.kennedy@delphix.com>
John L. Hammond <john.hammond@intel.com>
John M. Layman <jml@frijid.net>
Johnny Stenback <github@jstenback.com>
John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
John Poduska <jpoduska@datto.com>
John Ramsden <johnramsden@riseup.net>
John Wren Kennedy <john.kennedy@delphix.com>
jokersus <lolivampireslave@gmail.com>
Jonathon Fernyhough <jonathon@m2x.dev>
Jorgen Lundman <lundman@lundman.net>
Josef 'Jeff' Sipek <josef.sipek@nexenta.com>
Jose Luis Duran <jlduran@gmail.com>
Josh Soref <jsoref@users.noreply.github.com>
Joshua M. Clulow <josh@sysmgr.org>
José Luis Salvador Rufo <salvador.joseluis@gmail.com>
João Carlos Mendes Luís <jonny@jonny.eng.br>
Julian Brunner <julian.brunner@gmail.com>
Julian Heuking <JulianH@beckhoff.com>
jumbi77 <jumbi77@users.noreply.github.com>
Justin Bedő <cu@cua0.org>
Justin Gottula <justin@jgottula.com>
Justin Hibbits <chmeeedalf@gmail.com>
Justin Keogh <github.com@v6y.net>
Justin Lecher <jlec@gentoo.org>
Justin Scholz <git@justinscholz.de>
Justin T. Gibbs <gibbs@FreeBSD.org>
jyxent <jordanp@gmail.com>
Jörg Thalheim <joerg@higgsboson.tk>
KORN Andras <korn@elan.rulez.org>
ka7 <ka7@la-evento.com>
Ka Ho Ng <khng@FreeBSD.org>
Kamil Domański <kamil@domanski.co>
Karsten Kretschmer <kkretschmer@gmail.com>
Kash Pande <kash@tripleback.net>
Kay Pedersen <christianpe96@gmail.com>
Keith M Wesolowski <wesolows@foobazco.org>
Kent Ross <k@mad.cash>
KernelOfTruth <kerneloftruth@gmail.com>
Kevin Bowling <kevin.bowling@kev009.com>
Kevin Greene <kevin.greene@delphix.com>
Kevin Jin <lostking2008@hotmail.com>
Kevin P. Fleming <kevin@km6g.us>
Kevin Tanguy <kevin.tanguy@ovh.net>
KireinaHoro <i@jsteward.moe>
Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl>
Kleber Tarcísio <klebertarcisio@yahoo.com.br>
Kody A Kantor <kody.kantor@gmail.com>
Kohsuke Kawaguchi <kk@kohsuke.org>
Konstantin Khorenko <khorenko@virtuozzo.com>
KORN Andras <korn@elan.rulez.org>
Kristof Provost <github@sigsegv.be>
Krzysztof Piecuch <piecuch@kpiecuch.pl>
Kyle Blatter <kyleblatter@llnl.gov>
Kyle Evans <kevans@FreeBSD.org>
Kyle Fuller <inbox@kylefuller.co.uk>
Loli <ezomori.nozomu@gmail.com>
Laevos <Laevos@users.noreply.github.com>
Lalufu <Lalufu@users.noreply.github.com>
Lars Johannsen <laj@it.dk>
Laura Hild <lsh@jlab.org>
Laurențiu Nicola <lnicola@dend.ro>
Lauri Tirkkonen <lauri@hacktheplanet.fi>
liaoyuxiangqin <guo.yong33@zte.com.cn>
Li Dongyang <dongyang.li@anu.edu.au>
Liu Hua <liu.hua130@zte.com.cn>
Liu Qing <winglq@gmail.com>
Li Wei <W.Li@Sun.COM>
Loli <ezomori.nozomu@gmail.com>
lorddoskias <lorddoskias@gmail.com>
Lorenz Brun <lorenz@dolansoft.org>
Lorenz Hüdepohl <dev@stellardeath.org>
louwrentius <louwrentius@gmail.com>
Lukas Wunner <lukas@wunner.de>
luozhengzheng <luo.zhengzheng@zte.com.cn>
Luís Henriques <henrix@camandro.org>
Madhav Suresh <madhav.suresh@delphix.com>
manfromafar <jonsonb10@gmail.com>
Manoj Joseph <manoj.joseph@delphix.com>
Manuel Amador (Rudd-O) <rudd-o@rudd-o.com>
Marcel Huber <marcelhuberfoo@gmail.com>
Marcel Menzel <mail@mcl.gg>
Marcel Schilling <marcel.schilling@uni-luebeck.de>
Marcel Telka <marcel.telka@nexenta.com>
Marcel Wysocki <maci.stgn@gmail.com>
Marcin Skarbek <git@skarbek.name>
Mariusz Zaborski <mariusz.zaborski@klarasystems.com>
Mark Johnston <markj@FreeBSD.org>
Mark Maybee <mark.maybee@delphix.com>
Mark Roper <markroper@gmail.com>
Mark Shellenbaum <Mark.Shellenbaum@Oracle.COM>
marku89 <mar42@kola.li>
Mark Wright <markwright@internode.on.net>
Mart Frauenlob <allkind@fastest.cc>
Martin Matuska <mm@FreeBSD.org>
Martin Rüegg <martin.rueegg@metaworx.ch>
Martin Wagner <martin.wagner.dev@gmail.com>
Massimo Maggi <me@massimo-maggi.eu>
Mateusz Guzik <mjguzik@gmail.com>
Mateusz Piotrowski <0mp@FreeBSD.org>
Mathieu Velten <matmaul@gmail.com>
Matt Fiddaman <github@m.fiddaman.uk>
Matthew Ahrens <matt@delphix.com>
Matthew Heller <matthew.f.heller@gmail.com>
Matthew Thode <mthode@mthode.org>
Matthias Blankertz <matthias@blankertz.org>
Matt Johnston <matt@fugro-fsi.com.au>
Matt Kemp <matt@mattikus.com>
Matthew Ahrens <matt@delphix.com>
Matthew Thode <mthode@mthode.org>
Matt Macy <mmacy@freebsd.org>
Matus Kral <matuskral@me.com>
Mauricio Faria de Oliveira <mfo@canonical.com>
Max Grossman <max.grossman@delphix.com>
Maxim Filimonov <che@bein.link>
Maximilian Mehnert <maximilian.mehnert@gmx.de>
Max Zettlmeißl <max@zettlmeissl.de>
Md Islam <mdnahian@outlook.com>
megari <megari@iki.fi>
Michael D Labriola <michael.d.labriola@gmail.com>
Michael Franzl <michael@franzl.name>
Michael Gebetsroither <michael@mgeb.org>
Michael Kjorling <michael@kjorling.se>
Michael Martin <mgmartin.mgm@gmail.com>
Michael Niewöhner <foss@mniewoehner.de>
Michael Zhivich <mzhivich@akamai.com>
Michal Vasilek <michal@vasilek.cz>
MigeljanImeri <ImeriMigel@gmail.com>
Mike Gerdts <mike.gerdts@joyent.com>
Mike Harsch <mike@harschsystems.com>
Mike Leddy <mike.leddy@gmail.com>
Mike Swanson <mikeonthecomputer@gmail.com>
Milan Jurik <milan.jurik@xylab.cz>
Minsoo Choo <minsoochoo0122@proton.me>
Mohamed Tawfik <m_tawfik@aucegypt.edu>
Morgan Jones <mjones@rice.edu>
Moritz Maxeiner <moritz@ucworks.org>
Mo Zhou <cdluminate@gmail.com>
naivekun <naivekun@outlook.com>
nathancheek <myself@nathancheek.com>
Nathaniel Clark <Nathaniel.Clark@misrule.us>
Nathaniel Wesley Filardo <nwf@cs.jhu.edu>
Nathan Lewis <linux.robotdude@gmail.com>
Nav Ravindranath <nav@delphix.com>
Neal Gompa (ニール・ゴンパ) <ngompa13@gmail.com>
Ned Bass <bass6@llnl.gov>
Neependra Khare <neependra@kqinfotech.com>
Neil Stockbridge <neil@dist.ro>
Nick Black <dank@qemfd.net>
Nick Garvey <garvey.nick@gmail.com>
Nick Mattis <nickm970@gmail.com>
Nick Terrell <terrelln@fb.com>
Niklas Haas <github-c6e1c8@haasn.xyz>
Nikolay Borisov <n.borisov.lkml@gmail.com>
nordaux <nordaux@gmail.com>
ofthesun9 <olivier@ofthesun.net>
Olaf Faaland <faaland1@llnl.gov>
Oleg Drokin <green@linuxhacker.ru>
Oleg Stepura <oleg@stepura.com>
Olivier Certner <olce.freebsd@certner.fr>
Olivier Mazouffre <olivier.mazouffre@ims-bordeaux.fr>
omni <omni+vagant@hack.org>
Orivej Desh <orivej@gmx.fr>
Pablo Correa Gómez <ablocorrea@hotmail.com>
Palash Gandhi <pbg4930@rit.edu>
Patrick Mooney <pmooney@pfmooney.com>
Patrik Greco <sikevux@sikevux.se>
Paul B. Henson <henson@acm.org>
Paul Dagnelie <pcd@delphix.com>
Paul Zuchowski <pzuchowski@datto.com>
Pavel Boldin <boldin.pavel@gmail.com>
Pavel Snajdr <snajpa@snajpa.net>
Pavel Zakharov <pavel.zakharov@delphix.com>
Pawel Jakub Dawidek <pjd@FreeBSD.org>
Pedro Giffuni <pfg@freebsd.org>
Peng <peng.hse@xtaotech.com>
Peter Ashford <ashford@accs.com>
Peter Dave Hello <hsu@peterdavehello.org>
Peter Doherty <peterd@acranox.org>
Peter Levine <plevine457@gmail.com>
Peter Wirdemo <peter.wirdemo@gmail.com>
Petros Koutoupis <petros@petroskoutoupis.com>
Philip Pokorny <ppokorny@penguincomputing.com>
Philipp Riederer <pt@philipptoelke.de>
Phil Kauffman <philip@kauffman.me>
Ping Huang <huangping@smartx.com>
Piotr Kubaj <pkubaj@anongoth.pl>
Piotr P. Stefaniak <pstef@freebsd.org>
Prakash Surya <prakash.surya@delphix.com>
Prasad Joshi <prasadjoshi124@gmail.com>
privb0x23 <privb0x23@users.noreply.github.com>
P.SCH <p88@yahoo.com>
Qiuhao Chen <chenqiuhao1997@gmail.com>
Quartz <yyhran@163.com>
Quentin Zdanis <zdanisq@gmail.com>
Rafael Kitover <rkitover@gmail.com>
RageLtMan <sempervictus@users.noreply.github.com>
Ralf Ertzinger <ralf@skytale.net>
Randall Mason <ClashTheBunny@gmail.com>
Remy Blank <remy.blank@pobox.com>
renelson <bnelson@nelsonbe.com>
Reno Reckling <e-github@wthack.de>
Ricardo M. Correia <ricardo.correia@oracle.com>
Rich Ercolani <rincebrain@gmail.com>
Riccardo Schirone <rschirone91@gmail.com>
Richard Allen <belperite@gmail.com>
Richard Elling <Richard.Elling@RichardElling.com>
Richard Kojedzinszky <richard@kojedz.in>
Richard Laager <rlaager@wiktel.com>
Richard Lowe <richlowe@richlowe.net>
Richard Sharpe <rsharpe@samba.org>
Richard Yao <ryao@gentoo.org>
Rich Ercolani <rincebrain@gmail.com>
Rick Macklem <rmacklem@uoguelph.ca>
rilysh <nightquick@proton.me>
Robert Evans <evansr@google.com>
Robert Novak <sailnfool@gmail.com>
Roberto Ricci <ricci@disroot.org>
Rob Norris <robn@despairlabs.com>
Rob Wing <rew@FreeBSD.org>
Rohan Puri <rohan.puri15@gmail.com>
Romain Dolbeau <romain.dolbeau@atos.net>
Roman Strashkin <roman.strashkin@nexenta.com>
Ross Williams <ross@ross-williams.net>
Ruben Kerkhof <ruben@rubenkerkhof.com>
Ryan <errornointernet@envs.net>
Ryan Hirasaki <ryanhirasaki@gmail.com>
Ryan Lahfa <masterancpp@gmail.com>
Ryan Libby <rlibby@FreeBSD.org>
Ryan Moeller <freqlabs@FreeBSD.org>
Sam Atkinson <samatk@amazon.com>
Sam Hathaway <github.com@munkynet.org>
Sam James <sam@gentoo.org>
Sam Lunt <samuel.j.lunt@gmail.com>
Samuel VERSCHELDE <stormi-github@ylix.fr>
Samuel Wycliffe <samuelwycliffe@gmail.com>
Samuel Wycliffe J <samwyc@hpe.com>
Sanjeev Bagewadi <sanjeev.bagewadi@gmail.com>
Sara Hartse <sara.hartse@delphix.com>
Saso Kiselkov <saso.kiselkov@nexenta.com>
Satadru Pramanik <satadru@gmail.com>
Savyasachee Jha <genghizkhan91@hawkradius.com>
Scott Colby <scott@scolby.com>
Scot W. Stevenson <scot.stevenson@gmail.com>
Sean Eric Fagan <sef@ixsystems.com>
Sebastian Gottschall <s.gottschall@dd-wrt.com>
Sebastian Wuerl <s.wuerl@mailbox.org>
Sebastien Roy <seb@delphix.com>
Sen Haerens <sen@senhaerens.be>
Serapheim Dimitropoulos <serapheim@delphix.com>
Seth Forshee <seth.forshee@canonical.com>
Seth Hoffert <Seth.Hoffert@gmail.com>
Seth Troisi <sethtroisi@google.com>
Shaan Nobee <sniper111@gmail.com>
Shampavman <sham.pavman@nexenta.com>
Shaun Tancheff <shaun@aeonazure.com>
Shawn Bayern <sbayern@law.fsu.edu>
Shengqi Chen <harry-chen@outlook.com>
Shen Yan <shenyanxxxy@qq.com>
Sietse <sietse@wizdom.nu>
Simon Guest <simon.guest@tesujimath.org>
Simon Klinkert <simon.klinkert@gmail.com>
Sowrabha Gopal <sowrabha.gopal@delphix.com>
Spencer Kinny <spencerkinny1995@gmail.com>
Srikanth N S <srikanth.nagasubbaraoseetharaman@hpe.com>
Stanislav Seletskiy <s.seletskiy@gmail.com>
Stefan Lendl <s.lendl@proxmox.com>
Steffen Müthing <steffen.muething@iwr.uni-heidelberg.de>
Stephen Blinick <stephen.blinick@delphix.com>
sterlingjensen <sterlingjensen@users.noreply.github.com>
Steve Dougherty <sdougherty@barracuda.com>
Steve Mokris <smokris@softpixel.com>
Steven Burgess <sburgess@dattobackup.com>
Steven Hartland <smh@freebsd.org>
Steven Johnson <sjohnson@sakuraindustries.com>
Steven Noonan <steven@uplinklabs.net>
stf <s@ctrlc.hu>
Stian Ellingsen <stian@plaimi.net>
Stoiko Ivanov <github@nomore.at>
Stéphane Lesimple <speed47_github@speed47.net>
Suman Chakravartula <schakrava@gmail.com>
Sydney Vanda <sydney.m.vanda@intel.com>
Sören Tempel <soeren+git@soeren-tempel.net>
Tamas TEVESZ <ice@extreme.hu>
Teodor Spæren <teodor_spaeren@riseup.net>
TerraTech <TerraTech@users.noreply.github.com>
Theera K. <tkittich@hotmail.com>
Thijs Cramer <thijs.cramer@gmail.com>
Thomas Bertschinger <bertschinger@lanl.gov>
Thomas Geppert <geppi@digitx.de>
Thomas Lamprecht <guggentom@hotmail.de>
Till Maas <opensource@till.name>
Tim Chase <tim@chase2k.com>
Tim Connors <tconnors@rather.puzzling.org>
Tim Crawford <tcrawford@datto.com>
Tim Haley <Tim.Haley@Sun.COM>
timor <timor.dd@googlemail.com>
Timothy Day <tday141@gmail.com>
Tim Schumacher <timschumi@gmx.de>
Tino Reichardt <milky-zfs@mcmilk.de>
Tobin Harding <me@tobin.cc>
Todd Seidelmann <seidelma@users.noreply.github.com>
Tom Caputi <tcaputi@datto.com>
Tom Matthews <tom@axiom-partners.com>
Tom Prince <tom.prince@ualberta.net>
Tomohiro Kusumi <kusumi.tomohiro@gmail.com>
Tom Prince <tom.prince@ualberta.net>
Tony Hutter <hutter2@llnl.gov>
Tony Nguyen <tony.nguyen@delphix.com>
Tony Perkins <tperkins@datto.com>
Toomas Soome <tsoome@me.com>
Torsten Wörtwein <twoertwein@gmail.com>
Toyam Cox <aviator45003@gmail.com>
Trevor Bautista <trevrb@trevrb.net>
Trey Dockendorf <treydock@gmail.com>
Troels Nørgaard <tnn@tradeshift.com>
tstabrawa <tstabrawa@users.noreply.github.com>
Tulsi Jain <tulsi.jain@delphix.com>
Turbo Fredriksson <turbo@bayour.com>
Tyler J. Stachecki <stachecki.tyler@gmail.com>
Umer Saleem <usaleem@ixsystems.com>
Vaibhav Bhanawat <vaibhav.bhanawat@delphix.com>
Valmiky Arquissandas <kayvlim@gmail.com>
Val Packett <val@packett.cool>
Vince van Oosten <techhazard@codeforyouand.me>
Violet Purcell <vimproved@inventati.org>
Vipin Kumar Verma <vipin.verma@hpe.com>
Vitaut Bajaryn <vitaut.bayaryn@gmail.com>
Volker Mauel <volkermauel@gmail.com>
Václav Skála <skala@vshosting.cz>
Walter Huf <hufman@gmail.com>
Warner Losh <imp@bsdimp.com>
Weigang Li <weigang.li@intel.com>
WHR <msl0000023508@gmail.com>
Will Andrews <will@freebsd.org>
Will Rouesnel <w.rouesnel@gmail.com>
Windel Bouwman <windel@windel.nl>
Wojciech Małota-Wójcik <outofforest@users.noreply.github.com>
Wolfgang Bumiller <w.bumiller@proxmox.com>
XDTG <click1799@163.com>
Xin Li <delphij@FreeBSD.org>
Xinliang Liu <xinliang.liu@linaro.org>
xtouqh <xtouqh@hotmail.com>
Yann Collet <cyan@fb.com>
Yanping Gao <yanping.gao@xtaotech.com>
Ying Zhu <casualfisher@gmail.com>
Youzhong Yang <youzhong@gmail.com>
yparitcher <y@paritcher.com>
yuina822 <ayuichi@club.kyutech.ac.jp>
YunQiang Su <syq@debian.org>
Yuri Pankov <yuri.pankov@gmail.com>
Yuxin Wang <yuxinwang9999@gmail.com>
Yuxuan Shui <yshuiv7@gmail.com>
Zachary Bedell <zac@thebedells.org>
Zach Dykstra <dykstra.zachary@gmail.com>
zgock <zgock@nuc.base.zgock-lab.net>
Zhao Yongming <zym@apache.org>
Zhenlei Huang <zlei@FreeBSD.org>
Zhu Chuang <chuang@melty.land>
Érico Nogueira <erico.erc@gmail.com>
Đoàn Trần Công Danh <congdanhqx@gmail.com>
韩朴宇 <w12101111@gmail.com>
+2 -2
View File
@@ -1,2 +1,2 @@
The [OpenZFS Code of Conduct](http://www.open-zfs.org/wiki/Code_of_Conduct)
applies to spaces associated with the ZFS on Linux project, including GitHub.
The [OpenZFS Code of Conduct](https://openzfs.org/wiki/Code_of_Conduct)
applies to spaces associated with the OpenZFS project, including GitHub.
+4 -4
View File
@@ -1,10 +1,10 @@
Meta: 1
Name: zfs
Branch: 1.0
Version: 2.0.0
Release: rc1
Version: 2.3.0
Release: 1
Release-Tags: relext
License: CDDL
Author: OpenZFS
Linux-Maximum: 5.8
Linux-Minimum: 3.10
Linux-Maximum: 6.12
Linux-Minimum: 4.18
+97 -146
View File
@@ -1,67 +1,82 @@
CLEANFILES =
dist_noinst_DATA =
INSTALL_DATA_HOOKS =
ALL_LOCAL =
CLEAN_LOCAL =
CHECKS = shellcheck checkbashisms
include $(top_srcdir)/config/Rules.am
include $(top_srcdir)/config/CppCheck.am
include $(top_srcdir)/config/Shellcheck.am
include $(top_srcdir)/config/Substfiles.am
include $(top_srcdir)/scripts/Makefile.am
ACLOCAL_AMFLAGS = -I config
SUBDIRS = include
if BUILD_LINUX
SUBDIRS += rpm
include $(srcdir)/%D%/rpm/Makefile.am
endif
if CONFIG_USER
SUBDIRS += etc man scripts lib tests cmd contrib
include $(srcdir)/%D%/cmd/Makefile.am
include $(srcdir)/%D%/contrib/Makefile.am
include $(srcdir)/%D%/etc/Makefile.am
include $(srcdir)/%D%/lib/Makefile.am
include $(srcdir)/%D%/man/Makefile.am
include $(srcdir)/%D%/tests/Makefile.am
if BUILD_LINUX
SUBDIRS += udev
include $(srcdir)/%D%/udev/Makefile.am
endif
endif
CPPCHECKDIRS += module
if CONFIG_KERNEL
SUBDIRS += module
extradir = $(prefix)/src/zfs-$(VERSION)
extra_HEADERS = zfs.release.in zfs_config.h.in
if BUILD_LINUX
kerneldir = $(prefix)/src/zfs-$(VERSION)/$(LINUX_VERSION)
nodist_kernel_HEADERS = zfs.release zfs_config.h module/$(LINUX_SYMBOLS)
endif
endif
AUTOMAKE_OPTIONS = foreign
EXTRA_DIST = autogen.sh copy-builtin
EXTRA_DIST += cppcheck-suppressions.txt
EXTRA_DIST += config/config.awk config/rpm.am config/deb.am config/tgz.am
EXTRA_DIST += META AUTHORS COPYRIGHT LICENSE NEWS NOTICE README.md
EXTRA_DIST += CODE_OF_CONDUCT.md
EXTRA_DIST += module/lua/README.zfs module/os/linux/spl/README.md
dist_noinst_DATA += autogen.sh copy-builtin
dist_noinst_DATA += AUTHORS CODE_OF_CONDUCT.md COPYRIGHT LICENSE META NEWS NOTICE
dist_noinst_DATA += README.md RELEASES.md
dist_noinst_DATA += module/lua/README.zfs module/os/linux/spl/README.md
# Include all the extra licensing information for modules
EXTRA_DIST += module/icp/algs/skein/THIRDPARTYLICENSE
EXTRA_DIST += module/icp/algs/skein/THIRDPARTYLICENSE.descrip
EXTRA_DIST += module/icp/asm-x86_64/aes/THIRDPARTYLICENSE.gladman
EXTRA_DIST += module/icp/asm-x86_64/aes/THIRDPARTYLICENSE.gladman.descrip
EXTRA_DIST += module/icp/asm-x86_64/aes/THIRDPARTYLICENSE.openssl
EXTRA_DIST += module/icp/asm-x86_64/aes/THIRDPARTYLICENSE.openssl.descrip
EXTRA_DIST += module/icp/asm-x86_64/modes/THIRDPARTYLICENSE.cryptogams
EXTRA_DIST += module/icp/asm-x86_64/modes/THIRDPARTYLICENSE.cryptogams.descrip
EXTRA_DIST += module/icp/asm-x86_64/modes/THIRDPARTYLICENSE.openssl
EXTRA_DIST += module/icp/asm-x86_64/modes/THIRDPARTYLICENSE.openssl.descrip
EXTRA_DIST += module/os/linux/spl/THIRDPARTYLICENSE.gplv2
EXTRA_DIST += module/os/linux/spl/THIRDPARTYLICENSE.gplv2.descrip
EXTRA_DIST += module/zfs/THIRDPARTYLICENSE.cityhash
EXTRA_DIST += module/zfs/THIRDPARTYLICENSE.cityhash.descrip
dist_noinst_DATA += module/icp/algs/skein/THIRDPARTYLICENSE
dist_noinst_DATA += module/icp/algs/skein/THIRDPARTYLICENSE.descrip
dist_noinst_DATA += module/icp/asm-x86_64/aes/THIRDPARTYLICENSE.gladman
dist_noinst_DATA += module/icp/asm-x86_64/aes/THIRDPARTYLICENSE.gladman.descrip
dist_noinst_DATA += module/icp/asm-x86_64/aes/THIRDPARTYLICENSE.openssl
dist_noinst_DATA += module/icp/asm-x86_64/aes/THIRDPARTYLICENSE.openssl.descrip
dist_noinst_DATA += module/icp/asm-x86_64/modes/THIRDPARTYLICENSE.cryptogams
dist_noinst_DATA += module/icp/asm-x86_64/modes/THIRDPARTYLICENSE.cryptogams.descrip
dist_noinst_DATA += module/icp/asm-x86_64/modes/THIRDPARTYLICENSE.openssl
dist_noinst_DATA += module/icp/asm-x86_64/modes/THIRDPARTYLICENSE.openssl.descrip
dist_noinst_DATA += module/os/linux/spl/THIRDPARTYLICENSE.gplv2
dist_noinst_DATA += module/os/linux/spl/THIRDPARTYLICENSE.gplv2.descrip
dist_noinst_DATA += module/zfs/THIRDPARTYLICENSE.cityhash
dist_noinst_DATA += module/zfs/THIRDPARTYLICENSE.cityhash.descrip
@CODE_COVERAGE_RULES@
GITREV = include/zfs_gitrev.h
PHONY = gitrev
CLEANFILES += $(GITREV)
PHONY += gitrev
gitrev:
$(AM_V_GEN)$(top_srcdir)/scripts/make_gitrev.sh $(GITREV)
all: gitrev
# Double-colon rules are allowed; there are multiple independent definitions.
maintainer-clean-local::
PHONY += install-data-hook $(INSTALL_DATA_HOOKS)
install-data-hook: $(INSTALL_DATA_HOOKS)
PHONY += maintainer-clean-local
maintainer-clean-local:
-$(RM) $(GITREV)
distclean-local::
PHONY += distclean-local
distclean-local:
-$(RM) -R autom4te*.cache build
-find . \( -name SCCS -o -name BitKeeper -o -name .svn -o -name CVS \
-o -name .pc -o -name .hg -o -name .git \) -prune -o \
@@ -71,169 +86,105 @@ distclean-local::
-o -name 'core' -o -name 'Makefile' -o -name 'Module.symvers' \
-o -name '*.order' -o -name '*.markers' -o -name '*.gcda' \
-o -name '*.gcno' \) \
-type f -print | xargs $(RM)
-type f -delete
all-local:
-[ -x ${top_builddir}/scripts/zfs-tests.sh ] && \
${top_builddir}/scripts/zfs-tests.sh -c
PHONY += $(CLEAN_LOCAL)
clean-local: $(CLEAN_LOCAL)
PHONY += $(ALL_LOCAL)
all-local: $(ALL_LOCAL)
dist-hook:
$(AM_V_GEN)$(top_srcdir)/scripts/make_gitrev.sh -D $(distdir) $(GITREV)
$(SED) ${ac_inplace} -e 's/Release:[[:print:]]*/Release: $(RELEASE)/' \
$(distdir)/META
$(top_srcdir)/scripts/make_gitrev.sh -D $(distdir) $(GITREV)
$(SED) $(ac_inplace) 's/\(Release:[[:space:]]*\).*/\1$(RELEASE)/' $(distdir)/META
if BUILD_LINUX
# For compatibility, create a matching spl-x.y.z directly which contains
# symlinks to the updated header and object file locations. These
# compatibility links will be removed in the next major release.
if CONFIG_KERNEL
install-data-hook:
rm -rf $(DESTDIR)$(prefix)/src/spl-$(VERSION) && \
mkdir $(DESTDIR)$(prefix)/src/spl-$(VERSION) && \
cd $(DESTDIR)$(prefix)/src/spl-$(VERSION) && \
ln -s ../zfs-$(VERSION)/include/spl include && \
ln -s ../zfs-$(VERSION)/$(LINUX_VERSION) $(LINUX_VERSION) && \
ln -s ../zfs-$(VERSION)/zfs_config.h.in spl_config.h.in && \
ln -s ../zfs-$(VERSION)/zfs.release.in spl.release.in && \
cd $(DESTDIR)$(prefix)/src/zfs-$(VERSION)/$(LINUX_VERSION) && \
ln -fs zfs_config.h spl_config.h && \
ln -fs zfs.release spl.release
endif
endif
PHONY += codecheck $(CHECKS)
codecheck: $(CHECKS)
PHONY += codecheck
codecheck: cstyle shellcheck checkbashisms flake8 mancheck testscheck vcscheck
SHELLCHECKSCRIPTS += autogen.sh
PHONY += checkstyle
checkstyle: codecheck commitcheck
PHONY += commitcheck
commitcheck:
@if git rev-parse --git-dir > /dev/null 2>&1; then \
$(AM_V_at)if git rev-parse --git-dir > /dev/null 2>&1; then \
${top_srcdir}/scripts/commitcheck.sh; \
fi
PHONY += cstyle
if HAVE_PARALLEL
cstyle_line = -print0 | parallel -X0 ${top_srcdir}/scripts/cstyle.pl -cpP {}
else
cstyle_line = -exec ${top_srcdir}/scripts/cstyle.pl -cpP {} +
endif
CHECKS += cstyle
cstyle:
@find ${top_srcdir} -name build -prune \
$(AM_V_at)find $(top_srcdir) -name build -prune \
-o -type f -name '*.[hc]' \
! -name 'zfs_config.*' ! -name '*.mod.c' \
! -name 'opt_global.h' ! -name '*_if*.h' \
! -name 'zstd_compat_wrapper.h' \
! -path './module/zstd/lib/*' \
-exec ${top_srcdir}/scripts/cstyle.pl -cpP {} \+
! -path './include/sys/lua/*' \
! -path './module/lua/l*.[ch]' \
! -path './module/zfs/lz4.c' \
$(cstyle_line)
filter_executable = -exec test -x '{}' \; -print
PHONY += shellcheck
shellcheck:
@if type shellcheck > /dev/null 2>&1; then \
shellcheck --exclude=SC1090 --exclude=SC1117 --format=gcc \
$$(find ${top_srcdir}/scripts/*.sh -type f) \
$$(find ${top_srcdir}/cmd/zed/zed.d/*.sh -type f) \
$$(find ${top_srcdir}/cmd/zpool/zpool.d/* \
-type f ${filter_executable}); \
else \
echo "skipping shellcheck because shellcheck is not installed"; \
fi
PHONY += checkbashisms
checkbashisms:
@if type checkbashisms > /dev/null 2>&1; then \
checkbashisms -n -p -x \
$$(find ${top_srcdir} \
-name '.git' -prune \
-o -name 'build' -prune \
-o -name 'tests' -prune \
-o -name 'config' -prune \
-o -name 'zed-functions.sh*' -prune \
-o -name 'zfs-import*' -prune \
-o -name 'zfs-mount*' -prune \
-o -name 'zfs-zed*' -prune \
-o -name 'smart' -prune \
-o -name 'paxcheck.sh' -prune \
-o -name 'make_gitrev.sh' -prune \
-o -type f ! -name 'config*' \
! -name 'libtool' \
-exec bash -c 'awk "NR==1 && /\#\!.*bin\/sh.*/ {print FILENAME;}" "{}"' \;); \
else \
echo "skipping checkbashisms because checkbashisms is not installed"; \
fi
PHONY += mancheck
mancheck:
@if type mandoc > /dev/null 2>&1; then \
find ${top_srcdir}/man/man8 -type f -name 'zfs.8' \
-o -name 'zpool.8' -o -name 'zdb.8' \
-o -name 'zgenhostid.8' | \
xargs mandoc -Tlint -Werror; \
else \
echo "skipping mancheck because mandoc is not installed"; \
fi
if BUILD_LINUX
stat_fmt = -c '%A %n'
else
stat_fmt = -f '%Sp %N'
endif
PHONY += testscheck
CHECKS += testscheck
testscheck:
@find ${top_srcdir}/tests/zfs-tests -type f \
\( -name '*.ksh' -not ${filter_executable} \) -o \
\( -name '*.kshlib' ${filter_executable} \) -o \
\( -name '*.shlib' ${filter_executable} \) -o \
\( -name '*.cfg' ${filter_executable} \) | \
xargs -r stat ${stat_fmt} | \
awk '{c++; print} END {if(c>0) exit 1}'
$(AM_V_at)[ $$(find $(top_srcdir)/tests/zfs-tests -type f \
\( -name '*.ksh' -not $(filter_executable) \) -o \
\( -name '*.kshlib' $(filter_executable) \) -o \
\( -name '*.shlib' $(filter_executable) \) -o \
\( -name '*.cfg' $(filter_executable) \) | \
tee /dev/stderr | wc -l) -eq 0 ]
PHONY += vcscheck
CHECKS += vcscheck
vcscheck:
@if git rev-parse --git-dir > /dev/null 2>&1; then \
$(AM_V_at)if git rev-parse --git-dir > /dev/null 2>&1; then \
git ls-files . --exclude-standard --others | \
awk '{c++; print} END {if(c>0) exit 1}' ; \
fi
CHECKS += zstdcheck
zstdcheck:
@$(MAKE) -C module check-zstd-symbols
PHONY += lint
lint: cppcheck paxcheck
PHONY += cppcheck
cppcheck:
@if type cppcheck > /dev/null 2>&1; then \
cppcheck --quiet --force --error-exitcode=2 --inline-suppr \
--suppressions-list=${top_srcdir}/cppcheck-suppressions.txt \
-UHAVE_SSE2 -UHAVE_AVX512F -UHAVE_UIO_ZEROCOPY \
${top_srcdir}; \
else \
echo "skipping cppcheck because cppcheck is not installed"; \
fi
PHONY += paxcheck
paxcheck:
@if type scanelf > /dev/null 2>&1; then \
${top_srcdir}/scripts/paxcheck.sh ${top_builddir}; \
$(AM_V_at)if type scanelf > /dev/null 2>&1; then \
$(top_srcdir)/scripts/paxcheck.sh $(top_builddir); \
else \
echo "skipping paxcheck because scanelf is not installed"; \
fi
PHONY += flake8
CHECKS += flake8
flake8:
@if type flake8 > /dev/null 2>&1; then \
flake8 ${top_srcdir}; \
$(AM_V_at)if type flake8 > /dev/null 2>&1; then \
flake8 $(top_srcdir); \
else \
echo "skipping flake8 because flake8 is not installed"; \
fi
PHONY += regen-tests
regen-tests:
@$(MAKE) -C tests/zfs-tests/tests regen
PHONY += ctags
ctags:
$(RM) tags
find $(top_srcdir) -name '.?*' -prune \
-o -type f -name '*.[hcS]' -print | xargs ctags -a
-o -type f -name '*.[hcS]' -exec ctags -a {} +
PHONY += etags
etags:
$(RM) TAGS
find $(top_srcdir) -name '.?*' -prune \
-o -type f -name '*.[hcS]' -print | xargs etags -a
-o -type f -name '*.[hcS]' -exec etags -a {} +
PHONY += cscopelist
cscopelist:
+1 -1
View File
@@ -1,3 +1,3 @@
Descriptions of all releases can be found on github:
https://github.com/zfsonlinux/zfs/releases
https://github.com/openzfs/zfs/releases
+4 -4
View File
@@ -12,12 +12,12 @@ This repository contains the code for running OpenZFS on Linux and FreeBSD.
* [Documentation](https://openzfs.github.io/openzfs-docs/) - for using and developing this repo
* [ZoL Site](https://zfsonlinux.org) - Linux release info & links
* [Mailing lists](https://openzfs.github.io/openzfs-docs/Project%20and%20Community/Mailing%20Lists.html)
* [OpenZFS site](http://open-zfs.org/) - for conference videos and info on other platforms (illumos, OSX, Windows, etc)
* [OpenZFS site](https://openzfs.org/) - for conference videos and info on other platforms (illumos, OSX, Windows, etc)
# Installation
Full documentation for installing OpenZFS on your favorite Linux distribution can
be found at the [ZoL Site](https://zfsonlinux.org/).
Full documentation for installing OpenZFS on your favorite operating system can
be found at the [Getting Started Page](https://openzfs.github.io/openzfs-docs/Getting%20Started/index.html).
# Contribute & Develop
@@ -32,4 +32,4 @@ For more details see the NOTICE, LICENSE and COPYRIGHT files; `UCRL-CODE-235197`
# Supported Kernels
* The `META` file contains the officially recognized supported Linux kernel versions.
* Supported FreeBSD versions are 12-STABLE and 13-CURRENT.
* Supported FreeBSD versions are any supported branches and releases starting from 13.0-RELEASE.
+37
View File
@@ -0,0 +1,37 @@
OpenZFS uses the MAJOR.MINOR.PATCH versioning scheme described here:
* MAJOR - Incremented at the discretion of the OpenZFS developers to indicate
a particularly noteworthy feature or change. An increase in MAJOR number
does not indicate any incompatible on-disk format change. The ability
to import a ZFS pool is controlled by the feature flags enabled on the
pool and the feature flags supported by the installed OpenZFS version.
Increasing the MAJOR version is expected to be an infrequent occurrence.
* MINOR - Incremented to indicate new functionality such as a new feature
flag, pool/dataset property, zfs/zpool sub-command, new user/kernel
interface, etc. MINOR releases may introduce incompatible changes to the
user space library APIs (libzfs.so). Existing user/kernel interfaces are
considered to be stable to maximize compatibility between OpenZFS releases.
Additions to the user/kernel interface are backwards compatible.
* PATCH - Incremented when applying documentation updates, important bug
fixes, minor performance improvements, and kernel compatibility patches.
The user space library APIs and user/kernel interface are considered to
be stable. PATCH releases for a MAJOR.MINOR are published as needed.
Two release branches are maintained for OpenZFS, they are:
* OpenZFS LTS - A designated MAJOR.MINOR release with periodic PATCH
releases that incorporate important changes backported from newer OpenZFS
releases. This branch is intended for use in environments using an
LTS, enterprise, or similarly managed kernel (RHEL, Ubuntu LTS, Debian).
Minor changes to support these distribution kernels will be applied as
needed. New kernel versions released after the OpenZFS LTS release are
not supported. LTS releases will receive patches for at least 2 years.
The current LTS release is OpenZFS 2.1.
* OpenZFS current - Tracks the newest MAJOR.MINOR release. This branch
includes support for the latest OpenZFS features and recently releases
kernels. When a new MINOR release is tagged the previous MINOR release
will no longer be maintained (unless it is an LTS release). New MINOR
releases are planned to occur roughly annually.
+60 -2
View File
@@ -1,4 +1,62 @@
#!/bin/sh
[ "${0%/*}" = "$0" ] || cd "${0%/*}" || exit
autoreconf -fiv || exit 1
rm -Rf autom4te.cache
# %reldir%/%canon_reldir% (%D%/%C%) only appeared in automake 1.14, but RHEL/CentOS 7 has 1.13.4
# This is an (overly) simplistic preprocessor that papers around this for the duration of the generation step,
# and can be removed once support for CentOS 7 is dropped
automake --version | awk '{print $NF; exit}' | (
IFS=. read -r AM_MAJ AM_MIN _
[ "$AM_MAJ" -gt 1 ] || [ "$AM_MIN" -ge 14 ]
) || {
process_root() {
root="$1"; shift
grep -q '%[CD]%' "$root/Makefile.am" || return
find "$root" -name Makefile.am "$@" | while read -r dir; do
dir="${dir%/Makefile.am}"
grep -q '%[CD]%' "$dir/Makefile.am" || continue
reldir="${dir#"$root"}"
reldir="${reldir#/}"
canon_reldir="$(printf '%s' "$reldir" | tr -C 'a-zA-Z0-9@_' '_')"
reldir_slash="$reldir/"
canon_reldir_slash="${canon_reldir}_"
[ -z "$reldir" ] && reldir_slash=
[ -z "$reldir" ] && canon_reldir_slash=
echo "$dir/Makefile.am" >&3
sed -i~ -e "s:%D%/:$reldir_slash:g" -e "s:%D%:$reldir:g" \
-e "s:%C%_:$canon_reldir_slash:g" -e "s:%C%:$canon_reldir:g" "$dir/Makefile.am"
done 3>>"$substituted_files"
}
rollback() {
while read -r f; do
mv "$f~" "$f"
done < "$substituted_files"
rm -f "$substituted_files"
}
echo "Automake <1.14; papering over missing %reldir%/%canon_reldir% support" >&2
substituted_files="$(mktemp)"
trap rollback EXIT
roots="$(sed '/Makefile$/!d;/module/d;s:^\s*:./:;s:/Makefile::;/^\.$/d' configure.ac)"
IFS="
"
for root in $roots; do
root="${root#./}"
process_root "$root"
done
set -f
# shellcheck disable=SC2086,SC2046
process_root . $(printf '!\n-path\n%s/*\n' $roots)
}
autoreconf -fiv && rm -rf autom4te.cache
+110 -6
View File
@@ -1,10 +1,114 @@
SUBDIRS = zfs zpool zdb zhack zinject zstream zstreamdump ztest
SUBDIRS += fsck_zfs vdev_id raidz_test zfs_ids_to_path
bin_SCRIPTS =
bin_PROGRAMS =
sbin_SCRIPTS =
sbin_PROGRAMS =
dist_bin_SCRIPTS =
zfsexec_PROGRAMS =
mounthelper_PROGRAMS =
sbin_SCRIPTS += fsck.zfs
SHELLCHECKSCRIPTS += fsck.zfs
CLEANFILES += fsck.zfs
dist_noinst_DATA += %D%/fsck.zfs.in
$(call SUBST,fsck.zfs,%D%/)
sbin_PROGRAMS += zfs_ids_to_path
CPPCHECKTARGETS += zfs_ids_to_path
zfs_ids_to_path_SOURCES = \
%D%/zfs_ids_to_path.c
zfs_ids_to_path_LDADD = \
libzfs.la
zhack_CPPFLAGS = $(AM_CPPFLAGS) $(LIBZPOOL_CPPFLAGS)
sbin_PROGRAMS += zhack
CPPCHECKTARGETS += zhack
zhack_SOURCES = \
%D%/zhack.c
zhack_LDADD = \
libzpool.la \
libzfs_core.la \
libnvpair.la
ztest_CFLAGS = $(AM_CFLAGS) $(KERNEL_CFLAGS)
ztest_CPPFLAGS = $(AM_CPPFLAGS) $(LIBZPOOL_CPPFLAGS)
sbin_PROGRAMS += ztest
CPPCHECKTARGETS += ztest
ztest_SOURCES = \
%D%/ztest.c
ztest_LDADD = \
libzpool.la \
libzfs_core.la \
libnvpair.la
ztest_LDADD += -lm
ztest_LDFLAGS = -pthread
include $(srcdir)/%D%/raidz_test/Makefile.am
include $(srcdir)/%D%/zdb/Makefile.am
include $(srcdir)/%D%/zfs/Makefile.am
include $(srcdir)/%D%/zinject/Makefile.am
include $(srcdir)/%D%/zpool/Makefile.am
include $(srcdir)/%D%/zpool_influxdb/Makefile.am
include $(srcdir)/%D%/zstream/Makefile.am
if USING_PYTHON
SUBDIRS += arcstat arc_summary dbufstat
endif
if BUILD_LINUX
SUBDIRS += mount_zfs zed zgenhostid zvol_id zvol_wait
mounthelper_PROGRAMS += mount.zfs
CPPCHECKTARGETS += mount.zfs
mount_zfs_SOURCES = \
%D%/mount_zfs.c
mount_zfs_LDADD = \
libzfs.la \
libzfs_core.la \
libnvpair.la
mount_zfs_LDADD += $(LTLIBINTL)
CPPCHECKTARGETS += raidz_test
sbin_PROGRAMS += zgenhostid
CPPCHECKTARGETS += zgenhostid
zgenhostid_SOURCES = \
%D%/zgenhostid.c
dist_bin_SCRIPTS += %D%/zvol_wait
SHELLCHECKSCRIPTS += %D%/zvol_wait
include $(srcdir)/%D%/zed/Makefile.am
endif
if USING_PYTHON
bin_SCRIPTS += arc_summary arcstat dbufstat zilstat
CLEANFILES += arc_summary arcstat dbufstat zilstat
dist_noinst_DATA += %D%/arc_summary %D%/arcstat.in %D%/dbufstat.in %D%/zilstat.in
$(call SUBST,arcstat,%D%/)
$(call SUBST,dbufstat,%D%/)
$(call SUBST,zilstat,%D%/)
arc_summary: %D%/arc_summary
$(AM_V_at)cp $< $@
endif
PHONY += cmd
cmd: $(bin_SCRIPTS) $(bin_PROGRAMS) $(sbin_SCRIPTS) $(sbin_PROGRAMS) $(dist_bin_SCRIPTS) $(zfsexec_PROGRAMS) $(mounthelper_PROGRAMS)
+357 -231
View File
@@ -32,7 +32,7 @@
Provides basic information on the ARC, its efficiency, the L2ARC (if present),
the Data Management Unit (DMU), Virtual Devices (VDEVs), and tunables. See
the in-source documentation and code at
https://github.com/zfsonlinux/zfs/blob/master/module/zfs/arc.c for details.
https://github.com/openzfs/zfs/blob/master/module/zfs/arc.c for details.
The original introduction to arc_summary can be found at
http://cuddletech.com/?p=454
"""
@@ -42,8 +42,15 @@ import os
import subprocess
import sys
import time
import errno
DESCRIPTION = 'Print ARC and other statistics for ZFS on Linux'
# We can't use env -S portably, and we need python3 -u to handle pipes in
# the shell abruptly closing the way we want to, so...
import io
if isinstance(sys.__stderr__.buffer, io.BufferedWriter):
os.execv(sys.executable, [sys.executable, "-u"] + sys.argv)
DESCRIPTION = 'Print ARC and other statistics for OpenZFS'
INDENT = ' '*8
LINE_LENGTH = 72
DATE_FORMAT = '%a %b %d %H:%M:%S %Y'
@@ -57,8 +64,6 @@ SECTION_HELP = 'print info from one section ('+' '.join(SECTIONS)+')'
SECTION_PATHS = {'arc': 'arcstats',
'dmu': 'dmu_tx',
'l2arc': 'arcstats', # L2ARC stuff lives in arcstats
'vdev': 'vdev_cache_stats',
'xuio': 'xuio_stats',
'zfetch': 'zfetchstats',
'zil': 'zil'}
@@ -84,18 +89,24 @@ if sys.platform.startswith('freebsd'):
# Requires py36-sysctl on FreeBSD
import sysctl
VDEV_CACHE_SIZE = 'vdev.cache_size'
def is_value(ctl):
return ctl.type != sysctl.CTLTYPE_NODE
def namefmt(ctl, base='vfs.zfs.'):
# base is removed from the name
cut = len(base)
return ctl.name[cut:]
def load_kstats(section):
base = 'kstat.zfs.misc.{section}.'.format(section=section)
# base is removed from the name
fmt = lambda kstat: '{name} : {value}'.format(name=kstat.name[len(base):],
fmt = lambda kstat: '{name} : {value}'.format(name=namefmt(kstat, base),
value=kstat.value)
return [fmt(kstat) for kstat in sysctl.filter(base)]
kstats = sysctl.filter(base)
return [fmt(kstat) for kstat in kstats if is_value(kstat)]
def get_params(base):
cut = 8 # = len('vfs.zfs.')
return {ctl.name[cut:]: str(ctl.value) for ctl in sysctl.filter(base)}
ctls = sysctl.filter(base)
return {namefmt(ctl): str(ctl.value) for ctl in ctls if is_value(ctl)}
def get_tunable_params():
return get_params('vfs.zfs')
@@ -112,25 +123,8 @@ if sys.platform.startswith('freebsd'):
return '{} version {}'.format(name, version)
def get_descriptions(_request):
# py-sysctl doesn't give descriptions, so we have to shell out.
command = ['sysctl', '-d', 'vfs.zfs']
# The recommended way to do this is with subprocess.run(). However,
# some installed versions of Python are < 3.5, so we offer them
# the option of doing it the old way (for now)
if 'run' in dir(subprocess):
info = subprocess.run(command, stdout=subprocess.PIPE,
universal_newlines=True)
lines = info.stdout.split('\n')
else:
info = subprocess.check_output(command, universal_newlines=True)
lines = info.split('\n')
def fmt(line):
name, desc = line.split(':', 1)
return (name.strip(), desc.strip())
return dict([fmt(line) for line in lines if len(line) > 0])
ctls = sysctl.filter('vfs.zfs')
return {namefmt(ctl): ctl.description for ctl in ctls if is_value(ctl)}
elif sys.platform.startswith('linux'):
@@ -138,8 +132,6 @@ elif sys.platform.startswith('linux'):
SPL_PATH = '/sys/module/spl/parameters'
TUNABLES_PATH = '/sys/module/zfs/parameters'
VDEV_CACHE_SIZE = 'zfs_vdev_cache_size'
def load_kstats(section):
path = os.path.join(KSTAT_PATH, section)
with open(path) as f:
@@ -171,21 +163,11 @@ elif sys.platform.startswith('linux'):
# The original arc_summary called /sbin/modinfo/{spl,zfs} to get
# the version information. We switch to /sys/module/{spl,zfs}/version
# to make sure we get what is really loaded in the kernel
command = ["cat", "/sys/module/{0}/version".format(request)]
req = request.upper()
# The recommended way to do this is with subprocess.run(). However,
# some installed versions of Python are < 3.5, so we offer them
# the option of doing it the old way (for now)
if 'run' in dir(subprocess):
info = subprocess.run(command, stdout=subprocess.PIPE,
universal_newlines=True)
version = info.stdout.strip()
else:
info = subprocess.check_output(command, universal_newlines=True)
version = info.strip()
return version
try:
with open("/sys/module/{}/version".format(request)) as f:
return f.read().strip()
except:
return "(unknown)"
def get_descriptions(request):
"""Get the descriptions of the Solaris Porting Layer (SPL) or the
@@ -204,21 +186,13 @@ elif sys.platform.startswith('linux'):
# there, so we fall back on modinfo
command = ["/sbin/modinfo", request, "-0"]
# The recommended way to do this is with subprocess.run(). However,
# some installed versions of Python are < 3.5, so we offer them
# the option of doing it the old way (for now)
info = ''
try:
if 'run' in dir(subprocess):
info = subprocess.run(command, stdout=subprocess.PIPE,
universal_newlines=True)
raw_output = info.stdout.split('\0')
else:
info = subprocess.check_output(command,
universal_newlines=True)
raw_output = info.split('\0')
info = subprocess.run(command, stdout=subprocess.PIPE,
check=True, universal_newlines=True)
raw_output = info.stdout.split('\0')
except subprocess.CalledProcessError:
print("Error: Descriptions not available",
@@ -241,6 +215,29 @@ elif sys.platform.startswith('linux'):
return descs
def handle_unraisableException(exc_type, exc_value=None, exc_traceback=None,
err_msg=None, object=None):
handle_Exception(exc_type, object, exc_traceback)
def handle_Exception(ex_cls, ex, tb):
if ex_cls is KeyboardInterrupt:
sys.exit()
if ex_cls is BrokenPipeError:
# It turns out that while sys.exit() triggers an exception
# not handled message on Python 3.8+, os._exit() does not.
os._exit(0)
if ex_cls is OSError:
if ex.errno == errno.ENOTCONN:
sys.exit()
raise ex
if hasattr(sys,'unraisablehook'): # Python 3.8+
sys.unraisablehook = handle_unraisableException
sys.excepthook = handle_Exception
def cleanup_line(single_line):
"""Format a raw line of data from /proc and isolate the name value
@@ -263,35 +260,34 @@ def draw_graph(kstats_dict):
arc_stats = isolate_section('arcstats', kstats_dict)
GRAPH_INDENT = ' '*4
GRAPH_WIDTH = 60
GRAPH_WIDTH = 70
arc_max = int(arc_stats['c_max'])
arc_size = f_bytes(arc_stats['size'])
arc_perc = f_perc(arc_stats['size'], arc_stats['c_max'])
mfu_size = f_bytes(arc_stats['mfu_size'])
mru_size = f_bytes(arc_stats['mru_size'])
meta_limit = f_bytes(arc_stats['arc_meta_limit'])
meta_size = f_bytes(arc_stats['arc_meta_used'])
dnode_limit = f_bytes(arc_stats['arc_dnode_limit'])
arc_perc = f_perc(arc_stats['size'], arc_max)
data_size = f_bytes(arc_stats['data_size'])
meta_size = f_bytes(arc_stats['metadata_size'])
dnode_size = f_bytes(arc_stats['dnode_size'])
info_form = ('ARC: {0} ({1}) MFU: {2} MRU: {3} META: {4} ({5}) '
'DNODE {6} ({7})')
info_line = info_form.format(arc_size, arc_perc, mfu_size, mru_size,
meta_size, meta_limit, dnode_size,
dnode_limit)
info_form = ('ARC: {0} ({1}) Data: {2} Meta: {3} Dnode: {4}')
info_line = info_form.format(arc_size, arc_perc, data_size, meta_size,
dnode_size)
info_spc = ' '*int((GRAPH_WIDTH-len(info_line))/2)
info_line = GRAPH_INDENT+info_spc+info_line
graph_line = GRAPH_INDENT+'+'+('-'*(GRAPH_WIDTH-2))+'+'
mfu_perc = float(int(arc_stats['mfu_size'])/int(arc_stats['c_max']))
mru_perc = float(int(arc_stats['mru_size'])/int(arc_stats['c_max']))
arc_perc = float(int(arc_stats['size'])/int(arc_stats['c_max']))
arc_perc = float(int(arc_stats['size'])/arc_max)
data_perc = float(int(arc_stats['data_size'])/arc_max)
meta_perc = float(int(arc_stats['metadata_size'])/arc_max)
dnode_perc = float(int(arc_stats['dnode_size'])/arc_max)
total_ticks = float(arc_perc)*GRAPH_WIDTH
mfu_ticks = mfu_perc*GRAPH_WIDTH
mru_ticks = mru_perc*GRAPH_WIDTH
other_ticks = total_ticks-(mfu_ticks+mru_ticks)
data_ticks = data_perc*GRAPH_WIDTH
meta_ticks = meta_perc*GRAPH_WIDTH
dnode_ticks = dnode_perc*GRAPH_WIDTH
other_ticks = total_ticks-(data_ticks+meta_ticks+dnode_ticks)
core_form = 'F'*int(mfu_ticks)+'R'*int(mru_ticks)+'O'*int(other_ticks)
core_form = 'D'*int(data_ticks)+'M'*int(meta_ticks)+'N'*int(dnode_ticks)+\
'O'*int(other_ticks)
core_spc = ' '*(GRAPH_WIDTH-(2+len(core_form)))
core_line = GRAPH_INDENT+'|'+core_form+core_spc+'|'
@@ -397,8 +393,12 @@ def format_raw_line(name, value):
if ARGS.alt:
result = '{0}{1}={2}'.format(INDENT, name, value)
else:
spc = LINE_LENGTH-(len(INDENT)+len(value))
result = '{0}{1:<{spc}}{2}'.format(INDENT, name, value, spc=spc)
# Right-align the value within the line length if it fits,
# otherwise just separate it from the name by a single space.
fit = LINE_LENGTH - len(INDENT) - len(name)
overflow = len(value) + 1
w = max(fit, overflow)
result = '{0}{1}{2:>{w}}'.format(INDENT, name, value, w=w)
return result
@@ -537,57 +537,132 @@ def section_arc(kstats_dict):
arc_stats = isolate_section('arcstats', kstats_dict)
throttle = arc_stats['memory_throttle_count']
if throttle == '0':
health = 'HEALTHY'
else:
health = 'THROTTLED'
prt_1('ARC status:', health)
prt_i1('Memory throttle count:', throttle)
print()
memory_all = arc_stats['memory_all_bytes']
memory_free = arc_stats['memory_free_bytes']
memory_avail = arc_stats['memory_available_bytes']
arc_size = arc_stats['size']
arc_target_size = arc_stats['c']
arc_max = arc_stats['c_max']
arc_min = arc_stats['c_min']
mfu_size = arc_stats['mfu_size']
mru_size = arc_stats['mru_size']
meta_limit = arc_stats['arc_meta_limit']
meta_size = arc_stats['arc_meta_used']
dnode_limit = arc_stats['arc_dnode_limit']
dnode_size = arc_stats['dnode_size']
target_size_ratio = '{0}:1'.format(int(arc_max) // int(arc_min))
prt_2('ARC size (current):',
f_perc(arc_size, arc_max), f_bytes(arc_size))
print('ARC status:')
prt_i1('Total memory size:', f_bytes(memory_all))
prt_i2('Min target size:', f_perc(arc_min, memory_all), f_bytes(arc_min))
prt_i2('Max target size:', f_perc(arc_max, memory_all), f_bytes(arc_max))
prt_i2('Target size (adaptive):',
f_perc(arc_target_size, arc_max), f_bytes(arc_target_size))
prt_i2('Min size (hard limit):',
f_perc(arc_min, arc_max), f_bytes(arc_min))
prt_i2('Max size (high water):',
target_size_ratio, f_bytes(arc_max))
caches_size = int(mfu_size)+int(mru_size)
prt_i2('Most Frequently Used (MFU) cache size:',
f_perc(mfu_size, caches_size), f_bytes(mfu_size))
prt_i2('Most Recently Used (MRU) cache size:',
f_perc(mru_size, caches_size), f_bytes(mru_size))
prt_i2('Metadata cache size (hard limit):',
f_perc(meta_limit, arc_max), f_bytes(meta_limit))
prt_i2('Metadata cache size (current):',
f_perc(meta_size, meta_limit), f_bytes(meta_size))
prt_i2('Dnode cache size (hard limit):',
f_perc(dnode_limit, meta_limit), f_bytes(dnode_limit))
prt_i2('Dnode cache size (current):',
f_perc(dnode_size, dnode_limit), f_bytes(dnode_size))
f_perc(arc_size, arc_max), f_bytes(arc_target_size))
prt_i2('Current size:', f_perc(arc_size, arc_max), f_bytes(arc_size))
prt_i1('Free memory size:', f_bytes(memory_free))
prt_i1('Available memory size:', f_bytes(memory_avail))
print()
compressed_size = arc_stats['compressed_size']
overhead_size = arc_stats['overhead_size']
bonus_size = arc_stats['bonus_size']
dnode_size = arc_stats['dnode_size']
dbuf_size = arc_stats['dbuf_size']
hdr_size = arc_stats['hdr_size']
l2_hdr_size = arc_stats['l2_hdr_size']
abd_chunk_waste_size = arc_stats['abd_chunk_waste_size']
prt_1('ARC structural breakdown (current size):', f_bytes(arc_size))
prt_i2('Compressed size:',
f_perc(compressed_size, arc_size), f_bytes(compressed_size))
prt_i2('Overhead size:',
f_perc(overhead_size, arc_size), f_bytes(overhead_size))
prt_i2('Bonus size:',
f_perc(bonus_size, arc_size), f_bytes(bonus_size))
prt_i2('Dnode size:',
f_perc(dnode_size, arc_size), f_bytes(dnode_size))
prt_i2('Dbuf size:',
f_perc(dbuf_size, arc_size), f_bytes(dbuf_size))
prt_i2('Header size:',
f_perc(hdr_size, arc_size), f_bytes(hdr_size))
prt_i2('L2 header size:',
f_perc(l2_hdr_size, arc_size), f_bytes(l2_hdr_size))
prt_i2('ABD chunk waste size:',
f_perc(abd_chunk_waste_size, arc_size), f_bytes(abd_chunk_waste_size))
print()
meta = arc_stats['meta']
pd = arc_stats['pd']
pm = arc_stats['pm']
data_size = arc_stats['data_size']
metadata_size = arc_stats['metadata_size']
anon_data = arc_stats['anon_data']
anon_metadata = arc_stats['anon_metadata']
mfu_data = arc_stats['mfu_data']
mfu_metadata = arc_stats['mfu_metadata']
mfu_edata = arc_stats['mfu_evictable_data']
mfu_emetadata = arc_stats['mfu_evictable_metadata']
mru_data = arc_stats['mru_data']
mru_metadata = arc_stats['mru_metadata']
mru_edata = arc_stats['mru_evictable_data']
mru_emetadata = arc_stats['mru_evictable_metadata']
mfug_data = arc_stats['mfu_ghost_data']
mfug_metadata = arc_stats['mfu_ghost_metadata']
mrug_data = arc_stats['mru_ghost_data']
mrug_metadata = arc_stats['mru_ghost_metadata']
unc_data = arc_stats['uncached_data']
unc_metadata = arc_stats['uncached_metadata']
caches_size = int(anon_data)+int(anon_metadata)+\
int(mfu_data)+int(mfu_metadata)+int(mru_data)+int(mru_metadata)+\
int(unc_data)+int(unc_metadata)
prt_1('ARC types breakdown (compressed + overhead):', f_bytes(caches_size))
prt_i2('Data size:',
f_perc(data_size, caches_size), f_bytes(data_size))
prt_i2('Metadata size:',
f_perc(metadata_size, caches_size), f_bytes(metadata_size))
print()
prt_1('ARC states breakdown (compressed + overhead):', f_bytes(caches_size))
prt_i2('Anonymous data size:',
f_perc(anon_data, caches_size), f_bytes(anon_data))
prt_i2('Anonymous metadata size:',
f_perc(anon_metadata, caches_size), f_bytes(anon_metadata))
s = 4294967296
v = (s-int(pd))*(s-int(meta))/s
prt_i2('MFU data target:', f_perc(v, s),
f_bytes(v / 65536 * caches_size / 65536))
prt_i2('MFU data size:',
f_perc(mfu_data, caches_size), f_bytes(mfu_data))
prt_i2('MFU evictable data size:',
f_perc(mfu_edata, caches_size), f_bytes(mfu_edata))
prt_i1('MFU ghost data size:', f_bytes(mfug_data))
v = (s-int(pm))*int(meta)/s
prt_i2('MFU metadata target:', f_perc(v, s),
f_bytes(v / 65536 * caches_size / 65536))
prt_i2('MFU metadata size:',
f_perc(mfu_metadata, caches_size), f_bytes(mfu_metadata))
prt_i2('MFU evictable metadata size:',
f_perc(mfu_emetadata, caches_size), f_bytes(mfu_emetadata))
prt_i1('MFU ghost metadata size:', f_bytes(mfug_metadata))
v = int(pd)*(s-int(meta))/s
prt_i2('MRU data target:', f_perc(v, s),
f_bytes(v / 65536 * caches_size / 65536))
prt_i2('MRU data size:',
f_perc(mru_data, caches_size), f_bytes(mru_data))
prt_i2('MRU evictable data size:',
f_perc(mru_edata, caches_size), f_bytes(mru_edata))
prt_i1('MRU ghost data size:', f_bytes(mrug_data))
v = int(pm)*int(meta)/s
prt_i2('MRU metadata target:', f_perc(v, s),
f_bytes(v / 65536 * caches_size / 65536))
prt_i2('MRU metadata size:',
f_perc(mru_metadata, caches_size), f_bytes(mru_metadata))
prt_i2('MRU evictable metadata size:',
f_perc(mru_emetadata, caches_size), f_bytes(mru_emetadata))
prt_i1('MRU ghost metadata size:', f_bytes(mrug_metadata))
prt_i2('Uncached data size:',
f_perc(unc_data, caches_size), f_bytes(unc_data))
prt_i2('Uncached metadata size:',
f_perc(unc_metadata, caches_size), f_bytes(unc_metadata))
print()
print('ARC hash breakdown:')
prt_i1('Elements max:', f_hits(arc_stats['hash_elements_max']))
prt_i2('Elements current:',
f_perc(arc_stats['hash_elements'], arc_stats['hash_elements_max']),
f_hits(arc_stats['hash_elements']))
prt_i1('Elements:', f_hits(arc_stats['hash_elements']))
prt_i1('Collisions:', f_hits(arc_stats['hash_collisions']))
prt_i1('Chain max:', f_hits(arc_stats['hash_chain_max']))
@@ -595,9 +670,26 @@ def section_arc(kstats_dict):
print()
print('ARC misc:')
prt_i1('Memory throttles:', arc_stats['memory_throttle_count'])
prt_i1('Memory direct reclaims:', arc_stats['memory_direct_count'])
prt_i1('Memory indirect reclaims:', arc_stats['memory_indirect_count'])
prt_i1('Deleted:', f_hits(arc_stats['deleted']))
prt_i1('Mutex misses:', f_hits(arc_stats['mutex_miss']))
prt_i1('Eviction skips:', f_hits(arc_stats['evict_skip']))
prt_i1('Eviction skips due to L2 writes:',
f_hits(arc_stats['evict_l2_skip']))
prt_i1('L2 cached evictions:', f_bytes(arc_stats['evict_l2_cached']))
prt_i1('L2 eligible evictions:', f_bytes(arc_stats['evict_l2_eligible']))
prt_i2('L2 eligible MFU evictions:',
f_perc(arc_stats['evict_l2_eligible_mfu'],
arc_stats['evict_l2_eligible']),
f_bytes(arc_stats['evict_l2_eligible_mfu']))
prt_i2('L2 eligible MRU evictions:',
f_perc(arc_stats['evict_l2_eligible_mru'],
arc_stats['evict_l2_eligible']),
f_bytes(arc_stats['evict_l2_eligible_mru']))
prt_i1('L2 ineligible evictions:',
f_bytes(arc_stats['evict_l2_ineligible']))
print()
@@ -606,78 +698,119 @@ def section_archits(kstats_dict):
"""
arc_stats = isolate_section('arcstats', kstats_dict)
all_accesses = int(arc_stats['hits'])+int(arc_stats['misses'])
actual_hits = int(arc_stats['mfu_hits'])+int(arc_stats['mru_hits'])
prt_1('ARC total accesses (hits + misses):', f_hits(all_accesses))
ta_todo = (('Cache hit ratio:', arc_stats['hits']),
('Cache miss ratio:', arc_stats['misses']),
('Actual hit ratio (MFU + MRU hits):', actual_hits))
all_accesses = int(arc_stats['hits'])+int(arc_stats['iohits'])+\
int(arc_stats['misses'])
prt_1('ARC total accesses:', f_hits(all_accesses))
ta_todo = (('Total hits:', arc_stats['hits']),
('Total I/O hits:', arc_stats['iohits']),
('Total misses:', arc_stats['misses']))
for title, value in ta_todo:
prt_i2(title, f_perc(value, all_accesses), f_hits(value))
print()
dd_total = int(arc_stats['demand_data_hits']) +\
int(arc_stats['demand_data_iohits']) +\
int(arc_stats['demand_data_misses'])
prt_i2('Data demand efficiency:',
f_perc(arc_stats['demand_data_hits'], dd_total),
f_hits(dd_total))
dp_total = int(arc_stats['prefetch_data_hits']) +\
int(arc_stats['prefetch_data_misses'])
prt_i2('Data prefetch efficiency:',
f_perc(arc_stats['prefetch_data_hits'], dp_total),
f_hits(dp_total))
known_hits = int(arc_stats['mfu_hits']) +\
int(arc_stats['mru_hits']) +\
int(arc_stats['mfu_ghost_hits']) +\
int(arc_stats['mru_ghost_hits'])
anon_hits = int(arc_stats['hits'])-known_hits
prt_2('ARC demand data accesses:', f_perc(dd_total, all_accesses),
f_hits(dd_total))
dd_todo = (('Demand data hits:', arc_stats['demand_data_hits']),
('Demand data I/O hits:', arc_stats['demand_data_iohits']),
('Demand data misses:', arc_stats['demand_data_misses']))
for title, value in dd_todo:
prt_i2(title, f_perc(value, dd_total), f_hits(value))
print()
print('Cache hits by cache type:')
dm_total = int(arc_stats['demand_metadata_hits']) +\
int(arc_stats['demand_metadata_iohits']) +\
int(arc_stats['demand_metadata_misses'])
prt_2('ARC demand metadata accesses:', f_perc(dm_total, all_accesses),
f_hits(dm_total))
dm_todo = (('Demand metadata hits:', arc_stats['demand_metadata_hits']),
('Demand metadata I/O hits:',
arc_stats['demand_metadata_iohits']),
('Demand metadata misses:', arc_stats['demand_metadata_misses']))
for title, value in dm_todo:
prt_i2(title, f_perc(value, dm_total), f_hits(value))
print()
pd_total = int(arc_stats['prefetch_data_hits']) +\
int(arc_stats['prefetch_data_iohits']) +\
int(arc_stats['prefetch_data_misses'])
prt_2('ARC prefetch data accesses:', f_perc(pd_total, all_accesses),
f_hits(pd_total))
pd_todo = (('Prefetch data hits:', arc_stats['prefetch_data_hits']),
('Prefetch data I/O hits:', arc_stats['prefetch_data_iohits']),
('Prefetch data misses:', arc_stats['prefetch_data_misses']))
for title, value in pd_todo:
prt_i2(title, f_perc(value, pd_total), f_hits(value))
print()
pm_total = int(arc_stats['prefetch_metadata_hits']) +\
int(arc_stats['prefetch_metadata_iohits']) +\
int(arc_stats['prefetch_metadata_misses'])
prt_2('ARC prefetch metadata accesses:', f_perc(pm_total, all_accesses),
f_hits(pm_total))
pm_todo = (('Prefetch metadata hits:',
arc_stats['prefetch_metadata_hits']),
('Prefetch metadata I/O hits:',
arc_stats['prefetch_metadata_iohits']),
('Prefetch metadata misses:',
arc_stats['prefetch_metadata_misses']))
for title, value in pm_todo:
prt_i2(title, f_perc(value, pm_total), f_hits(value))
print()
all_prefetches = int(arc_stats['predictive_prefetch'])+\
int(arc_stats['prescient_prefetch'])
prt_2('ARC predictive prefetches:',
f_perc(arc_stats['predictive_prefetch'], all_prefetches),
f_hits(arc_stats['predictive_prefetch']))
prt_i2('Demand hits after predictive:',
f_perc(arc_stats['demand_hit_predictive_prefetch'],
arc_stats['predictive_prefetch']),
f_hits(arc_stats['demand_hit_predictive_prefetch']))
prt_i2('Demand I/O hits after predictive:',
f_perc(arc_stats['demand_iohit_predictive_prefetch'],
arc_stats['predictive_prefetch']),
f_hits(arc_stats['demand_iohit_predictive_prefetch']))
never = int(arc_stats['predictive_prefetch']) -\
int(arc_stats['demand_hit_predictive_prefetch']) -\
int(arc_stats['demand_iohit_predictive_prefetch'])
prt_i2('Never demanded after predictive:',
f_perc(never, arc_stats['predictive_prefetch']),
f_hits(never))
print()
prt_2('ARC prescient prefetches:',
f_perc(arc_stats['prescient_prefetch'], all_prefetches),
f_hits(arc_stats['prescient_prefetch']))
prt_i2('Demand hits after prescient:',
f_perc(arc_stats['demand_hit_prescient_prefetch'],
arc_stats['prescient_prefetch']),
f_hits(arc_stats['demand_hit_prescient_prefetch']))
prt_i2('Demand I/O hits after prescient:',
f_perc(arc_stats['demand_iohit_prescient_prefetch'],
arc_stats['prescient_prefetch']),
f_hits(arc_stats['demand_iohit_prescient_prefetch']))
never = int(arc_stats['prescient_prefetch'])-\
int(arc_stats['demand_hit_prescient_prefetch'])-\
int(arc_stats['demand_iohit_prescient_prefetch'])
prt_i2('Never demanded after prescient:',
f_perc(never, arc_stats['prescient_prefetch']),
f_hits(never))
print()
print('ARC states hits of all accesses:')
cl_todo = (('Most frequently used (MFU):', arc_stats['mfu_hits']),
('Most recently used (MRU):', arc_stats['mru_hits']),
('Most frequently used (MFU) ghost:',
arc_stats['mfu_ghost_hits']),
('Most recently used (MRU) ghost:',
arc_stats['mru_ghost_hits']))
arc_stats['mru_ghost_hits']),
('Uncached:', arc_stats['uncached_hits']))
for title, value in cl_todo:
prt_i2(title, f_perc(value, arc_stats['hits']), f_hits(value))
# For some reason, anon_hits can turn negative, which is weird. Until we
# have figured out why this happens, we just hide the problem, following
# the behavior of the original arc_summary.
if anon_hits >= 0:
prt_i2('Anonymously used:',
f_perc(anon_hits, arc_stats['hits']), f_hits(anon_hits))
print()
print('Cache hits by data type:')
dt_todo = (('Demand data:', arc_stats['demand_data_hits']),
('Demand prefetch data:', arc_stats['prefetch_data_hits']),
('Demand metadata:', arc_stats['demand_metadata_hits']),
('Demand prefetch metadata:',
arc_stats['prefetch_metadata_hits']))
for title, value in dt_todo:
prt_i2(title, f_perc(value, arc_stats['hits']), f_hits(value))
print()
print('Cache misses by data type:')
dm_todo = (('Demand data:', arc_stats['demand_data_misses']),
('Demand prefetch data:',
arc_stats['prefetch_data_misses']),
('Demand metadata:', arc_stats['demand_metadata_misses']),
('Demand prefetch metadata:',
arc_stats['prefetch_metadata_misses']))
for title, value in dm_todo:
prt_i2(title, f_perc(value, arc_stats['misses']), f_hits(value))
prt_i2(title, f_perc(value, all_accesses), f_hits(value))
print()
@@ -686,13 +819,28 @@ def section_dmu(kstats_dict):
zfetch_stats = isolate_section('zfetchstats', kstats_dict)
zfetch_access_total = int(zfetch_stats['hits'])+int(zfetch_stats['misses'])
zfetch_access_total = int(zfetch_stats['hits']) +\
int(zfetch_stats['future']) + int(zfetch_stats['stride']) +\
int(zfetch_stats['past']) + int(zfetch_stats['misses'])
prt_1('DMU prefetch efficiency:', f_hits(zfetch_access_total))
prt_i2('Hit ratio:', f_perc(zfetch_stats['hits'], zfetch_access_total),
prt_1('DMU predictive prefetcher calls:', f_hits(zfetch_access_total))
prt_i2('Stream hits:',
f_perc(zfetch_stats['hits'], zfetch_access_total),
f_hits(zfetch_stats['hits']))
prt_i2('Miss ratio:', f_perc(zfetch_stats['misses'], zfetch_access_total),
future = int(zfetch_stats['future']) + int(zfetch_stats['stride'])
prt_i2('Hits ahead of stream:', f_perc(future, zfetch_access_total),
f_hits(future))
prt_i2('Hits behind stream:',
f_perc(zfetch_stats['past'], zfetch_access_total),
f_hits(zfetch_stats['past']))
prt_i2('Stream misses:',
f_perc(zfetch_stats['misses'], zfetch_access_total),
f_hits(zfetch_stats['misses']))
prt_i2('Streams limit reached:',
f_perc(zfetch_stats['max_streams'], zfetch_stats['misses']),
f_hits(zfetch_stats['max_streams']))
prt_i1('Stream strides:', f_hits(zfetch_stats['stride']))
prt_i1('Prefetches issued', f_hits(zfetch_stats['io_issued']))
print()
@@ -724,7 +872,8 @@ def section_l2arc(kstats_dict):
('Free on write:', 'l2_free_on_write'),
('R/W clashes:', 'l2_rw_clash'),
('Bad checksums:', 'l2_cksum_bad'),
('I/O errors:', 'l2_io_error'))
('Read errors:', 'l2_io_error'),
('Write errors:', 'l2_writes_error'))
for title, value in l2_todo:
prt_i1(title, f_hits(arc_stats[value]))
@@ -736,6 +885,21 @@ def section_l2arc(kstats_dict):
prt_i2('Header size:',
f_perc(arc_stats['l2_hdr_size'], arc_stats['l2_size']),
f_bytes(arc_stats['l2_hdr_size']))
prt_i2('MFU allocated size:',
f_perc(arc_stats['l2_mfu_asize'], arc_stats['l2_asize']),
f_bytes(arc_stats['l2_mfu_asize']))
prt_i2('MRU allocated size:',
f_perc(arc_stats['l2_mru_asize'], arc_stats['l2_asize']),
f_bytes(arc_stats['l2_mru_asize']))
prt_i2('Prefetch allocated size:',
f_perc(arc_stats['l2_prefetch_asize'], arc_stats['l2_asize']),
f_bytes(arc_stats['l2_prefetch_asize']))
prt_i2('Data (buffer content) allocated size:',
f_perc(arc_stats['l2_bufc_data_asize'], arc_stats['l2_asize']),
f_bytes(arc_stats['l2_bufc_data_asize']))
prt_i2('Metadata (buffer content) allocated size:',
f_perc(arc_stats['l2_bufc_metadata_asize'], arc_stats['l2_asize']),
f_bytes(arc_stats['l2_bufc_metadata_asize']))
print()
prt_1('L2ARC breakdown:', f_hits(l2_access_total))
@@ -745,28 +909,20 @@ def section_l2arc(kstats_dict):
prt_i2('Miss ratio:',
f_perc(arc_stats['l2_misses'], l2_access_total),
f_hits(arc_stats['l2_misses']))
prt_i1('Feeds:', f_hits(arc_stats['l2_feeds']))
print()
print('L2ARC writes:')
if arc_stats['l2_writes_done'] != arc_stats['l2_writes_sent']:
prt_i2('Writes sent:', 'FAULTED', f_hits(arc_stats['l2_writes_sent']))
prt_i2('Done ratio:',
f_perc(arc_stats['l2_writes_done'],
arc_stats['l2_writes_sent']),
f_hits(arc_stats['l2_writes_done']))
prt_i2('Error ratio:',
f_perc(arc_stats['l2_writes_error'],
arc_stats['l2_writes_sent']),
f_hits(arc_stats['l2_writes_error']))
else:
prt_i2('Writes sent:', '100 %', f_hits(arc_stats['l2_writes_sent']))
print('L2ARC I/O:')
prt_i2('Reads:',
f_bytes(arc_stats['l2_read_bytes']),
f_hits(arc_stats['l2_hits']))
prt_i2('Writes:',
f_bytes(arc_stats['l2_write_bytes']),
f_hits(arc_stats['l2_writes_sent']))
print()
print('L2ARC evicts:')
prt_i1('Lock retries:', f_hits(arc_stats['l2_evict_lock_retry']))
prt_i1('Upon reading:', f_hits(arc_stats['l2_evict_reading']))
prt_i1('L1 cached:', f_hits(arc_stats['l2_evict_l1cached']))
prt_i1('While reading:', f_hits(arc_stats['l2_evict_reading']))
print()
@@ -826,38 +982,9 @@ def section_tunables(*_):
print()
def section_vdev(kstats_dict):
"""Collect information on VDEV caches"""
# Currently [Nov 2017] the VDEV cache is disabled, because it is actually
# harmful. When this is the case, we just skip the whole entry. See
# https://github.com/zfsonlinux/zfs/blob/master/module/zfs/vdev_cache.c
# for details
tunables = get_vdev_params()
if tunables[VDEV_CACHE_SIZE] == '0':
print('VDEV cache disabled, skipping section\n')
return
vdev_stats = isolate_section('vdev_cache_stats', kstats_dict)
vdev_cache_total = int(vdev_stats['hits']) +\
int(vdev_stats['misses']) +\
int(vdev_stats['delegations'])
prt_1('VDEV cache summary:', f_hits(vdev_cache_total))
prt_i2('Hit ratio:', f_perc(vdev_stats['hits'], vdev_cache_total),
f_hits(vdev_stats['hits']))
prt_i2('Miss ratio:', f_perc(vdev_stats['misses'], vdev_cache_total),
f_hits(vdev_stats['misses']))
prt_i2('Delegations:', f_perc(vdev_stats['delegations'], vdev_cache_total),
f_hits(vdev_stats['delegations']))
print()
def section_zil(kstats_dict):
"""Collect information on the ZFS Intent Log. Some of the information
taken from https://github.com/zfsonlinux/zfs/blob/master/include/sys/zil.h
taken from https://github.com/openzfs/zfs/blob/master/include/sys/zil.h
"""
zil_stats = isolate_section('zil', kstats_dict)
@@ -882,7 +1009,6 @@ section_calls = {'arc': section_arc,
'l2arc': section_l2arc,
'spl': section_spl,
'tunables': section_tunables,
'vdev': section_vdev,
'zil': section_zil}
-1
View File
@@ -1 +0,0 @@
arc_summary
-13
View File
@@ -1,13 +0,0 @@
bin_SCRIPTS = arc_summary
CLEANFILES = arc_summary
EXTRA_DIST = arc_summary2 arc_summary3
if USING_PYTHON_2
SCRIPT = arc_summary2
else
SCRIPT = arc_summary3
endif
arc_summary: $(SCRIPT)
cp $< $@
File diff suppressed because it is too large Load Diff
Executable
+801
View File
@@ -0,0 +1,801 @@
#!/usr/bin/env @PYTHON_SHEBANG@
#
# Print out ZFS ARC Statistics exported via kstat(1)
# For a definition of fields, or usage, use arcstat -v
#
# This script was originally a fork of the original arcstat.pl (0.1)
# by Neelakanth Nadgir, originally published on his Sun blog on
# 09/18/2007
# http://blogs.sun.com/realneel/entry/zfs_arc_statistics
#
# A new version aimed to improve upon the original by adding features
# and fixing bugs as needed. This version was maintained by Mike
# Harsch and was hosted in a public open source repository:
# http://github.com/mharsch/arcstat
#
# but has since moved to the illumos-gate repository.
#
# This Python port was written by John Hixson for FreeNAS, introduced
# in commit e2c29f:
# https://github.com/freenas/freenas
#
# and has been improved by many people since.
#
# CDDL HEADER START
#
# The contents of this file are subject to the terms of the
# Common Development and Distribution License, Version 1.0 only
# (the "License"). You may not use this file except in compliance
# with the License.
#
# You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
# or https://opensource.org/licenses/CDDL-1.0.
# See the License for the specific language governing permissions
# and limitations under the License.
#
# When distributing Covered Code, include this CDDL HEADER in each
# file and include the License file at usr/src/OPENSOLARIS.LICENSE.
# If applicable, add the following below this CDDL HEADER, with the
# fields enclosed by brackets "[]" replaced with your own identifying
# information: Portions Copyright [yyyy] [name of copyright owner]
#
# CDDL HEADER END
#
#
# Fields have a fixed width. Every interval, we fill the "v"
# hash with its corresponding value (v[field]=value) using calculate().
# @hdr is the array of fields that needs to be printed, so we
# just iterate over this array and print the values using our pretty printer.
#
# This script must remain compatible with Python 3.6+.
#
import sys
import time
import getopt
import re
import copy
from signal import signal, SIGINT, SIGWINCH, SIG_DFL
cols = {
# HDR: [Size, Scale, Description]
"time": [8, -1, "Time"],
"hits": [4, 1000, "ARC hits per second"],
"iohs": [4, 1000, "ARC I/O hits per second"],
"miss": [4, 1000, "ARC misses per second"],
"read": [4, 1000, "Total ARC accesses per second"],
"hit%": [4, 100, "ARC hit percentage"],
"ioh%": [4, 100, "ARC I/O hit percentage"],
"miss%": [5, 100, "ARC miss percentage"],
"dhit": [4, 1000, "Demand hits per second"],
"dioh": [4, 1000, "Demand I/O hits per second"],
"dmis": [4, 1000, "Demand misses per second"],
"dh%": [3, 100, "Demand hit percentage"],
"di%": [3, 100, "Demand I/O hit percentage"],
"dm%": [3, 100, "Demand miss percentage"],
"ddhit": [5, 1000, "Demand data hits per second"],
"ddioh": [5, 1000, "Demand data I/O hits per second"],
"ddmis": [5, 1000, "Demand data misses per second"],
"ddh%": [4, 100, "Demand data hit percentage"],
"ddi%": [4, 100, "Demand data I/O hit percentage"],
"ddm%": [4, 100, "Demand data miss percentage"],
"dmhit": [5, 1000, "Demand metadata hits per second"],
"dmioh": [5, 1000, "Demand metadata I/O hits per second"],
"dmmis": [5, 1000, "Demand metadata misses per second"],
"dmh%": [4, 100, "Demand metadata hit percentage"],
"dmi%": [4, 100, "Demand metadata I/O hit percentage"],
"dmm%": [4, 100, "Demand metadata miss percentage"],
"phit": [4, 1000, "Prefetch hits per second"],
"pioh": [4, 1000, "Prefetch I/O hits per second"],
"pmis": [4, 1000, "Prefetch misses per second"],
"ph%": [3, 100, "Prefetch hits percentage"],
"pi%": [3, 100, "Prefetch I/O hits percentage"],
"pm%": [3, 100, "Prefetch miss percentage"],
"pdhit": [5, 1000, "Prefetch data hits per second"],
"pdioh": [5, 1000, "Prefetch data I/O hits per second"],
"pdmis": [5, 1000, "Prefetch data misses per second"],
"pdh%": [4, 100, "Prefetch data hits percentage"],
"pdi%": [4, 100, "Prefetch data I/O hits percentage"],
"pdm%": [4, 100, "Prefetch data miss percentage"],
"pmhit": [5, 1000, "Prefetch metadata hits per second"],
"pmioh": [5, 1000, "Prefetch metadata I/O hits per second"],
"pmmis": [5, 1000, "Prefetch metadata misses per second"],
"pmh%": [4, 100, "Prefetch metadata hits percentage"],
"pmi%": [4, 100, "Prefetch metadata I/O hits percentage"],
"pmm%": [4, 100, "Prefetch metadata miss percentage"],
"mhit": [4, 1000, "Metadata hits per second"],
"mioh": [4, 1000, "Metadata I/O hits per second"],
"mmis": [4, 1000, "Metadata misses per second"],
"mread": [5, 1000, "Metadata accesses per second"],
"mh%": [3, 100, "Metadata hit percentage"],
"mi%": [3, 100, "Metadata I/O hit percentage"],
"mm%": [3, 100, "Metadata miss percentage"],
"arcsz": [5, 1024, "ARC size"],
"size": [5, 1024, "ARC size"],
"c": [5, 1024, "ARC target size"],
"mfu": [4, 1000, "MFU list hits per second"],
"mru": [4, 1000, "MRU list hits per second"],
"mfug": [4, 1000, "MFU ghost list hits per second"],
"mrug": [4, 1000, "MRU ghost list hits per second"],
"unc": [4, 1000, "Uncached list hits per second"],
"eskip": [5, 1000, "evict_skip per second"],
"el2skip": [7, 1000, "evict skip, due to l2 writes, per second"],
"el2cach": [7, 1024, "Size of L2 cached evictions per second"],
"el2el": [5, 1024, "Size of L2 eligible evictions per second"],
"el2mfu": [6, 1024, "Size of L2 eligible MFU evictions per second"],
"el2mru": [6, 1024, "Size of L2 eligible MRU evictions per second"],
"el2inel": [7, 1024, "Size of L2 ineligible evictions per second"],
"mtxmis": [6, 1000, "mutex_miss per second"],
"dread": [5, 1000, "Demand accesses per second"],
"ddread": [6, 1000, "Demand data accesses per second"],
"dmread": [6, 1000, "Demand metadata accesses per second"],
"pread": [5, 1000, "Prefetch accesses per second"],
"pdread": [6, 1000, "Prefetch data accesses per second"],
"pmread": [6, 1000, "Prefetch metadata accesses per second"],
"l2hits": [6, 1000, "L2ARC hits per second"],
"l2miss": [6, 1000, "L2ARC misses per second"],
"l2read": [6, 1000, "Total L2ARC accesses per second"],
"l2hit%": [6, 100, "L2ARC access hit percentage"],
"l2miss%": [7, 100, "L2ARC access miss percentage"],
"l2pref": [6, 1024, "L2ARC prefetch allocated size"],
"l2mfu": [5, 1024, "L2ARC MFU allocated size"],
"l2mru": [5, 1024, "L2ARC MRU allocated size"],
"l2data": [6, 1024, "L2ARC data allocated size"],
"l2meta": [6, 1024, "L2ARC metadata allocated size"],
"l2pref%": [7, 100, "L2ARC prefetch percentage"],
"l2mfu%": [6, 100, "L2ARC MFU percentage"],
"l2mru%": [6, 100, "L2ARC MRU percentage"],
"l2data%": [7, 100, "L2ARC data percentage"],
"l2meta%": [7, 100, "L2ARC metadata percentage"],
"l2asize": [7, 1024, "Actual (compressed) size of the L2ARC"],
"l2size": [6, 1024, "Size of the L2ARC"],
"l2bytes": [7, 1024, "Bytes read per second from the L2ARC"],
"l2wbytes": [8, 1024, "Bytes written per second to the L2ARC"],
"grow": [4, 1000, "ARC grow disabled"],
"need": [5, 1024, "ARC reclaim need"],
"free": [5, 1024, "ARC free memory"],
"avail": [5, 1024, "ARC available memory"],
"waste": [5, 1024, "Wasted memory due to round up to pagesize"],
"ztotal": [6, 1000, "zfetch total prefetcher calls per second"],
"zhits": [5, 1000, "zfetch stream hits per second"],
"zahead": [6, 1000, "zfetch hits ahead of streams per second"],
"zpast": [5, 1000, "zfetch hits behind streams per second"],
"zmisses": [7, 1000, "zfetch stream misses per second"],
"zmax": [4, 1000, "zfetch limit reached per second"],
"zfuture": [7, 1000, "zfetch stream future per second"],
"zstride": [7, 1000, "zfetch stream strides per second"],
"zissued": [7, 1000, "zfetch prefetches issued per second"],
"zactive": [7, 1000, "zfetch prefetches active per second"],
}
# ARC structural breakdown from arc_summary
structfields = {
"cmp": ["compressed", "Compressed"],
"ovh": ["overhead", "Overhead"],
"bon": ["bonus", "Bonus"],
"dno": ["dnode", "Dnode"],
"dbu": ["dbuf", "Dbuf"],
"hdr": ["hdr", "Header"],
"l2h": ["l2_hdr", "L2 header"],
"abd": ["abd_chunk_waste", "ABD chunk waste"],
}
structstats = { # size stats
"percent": "size", # percentage of this value
"sz": ["_size", "size"],
}
# ARC types breakdown from arc_summary
typefields = {
"data": ["data", "ARC data"],
"meta": ["metadata", "ARC metadata"],
}
typestats = { # size stats
"percent": "cachessz", # percentage of this value
"tg": ["_target", "target"],
"sz": ["_size", "size"],
}
# ARC states breakdown from arc_summary
statefields = {
"ano": ["anon", "Anonymous"],
"mfu": ["mfu", "MFU"],
"mru": ["mru", "MRU"],
"unc": ["uncached", "Uncached"],
}
targetstats = {
"percent": "cachessz", # percentage of this value
"fields": ["mfu", "mru"], # only applicable to these fields
"tg": ["_target", "target"],
"dt": ["_data_target", "data target"],
"mt": ["_metadata_target", "metadata target"],
}
statestats = { # size stats
"percent": "cachessz", # percentage of this value
"sz": ["_size", "size"],
"da": ["_data", "data size"],
"me": ["_metadata", "metadata size"],
"ed": ["_evictable_data", "evictable data size"],
"em": ["_evictable_metadata", "evictable metadata size"],
}
ghoststats = {
"fields": ["mfu", "mru"], # only applicable to these fields
"gsz": ["_ghost_size", "ghost size"],
"gd": ["_ghost_data", "ghost data size"],
"gm": ["_ghost_metadata", "ghost metadata size"],
}
# fields and stats
fieldstats = [
[structfields, structstats],
[typefields, typestats],
[statefields, targetstats, statestats, ghoststats],
]
for fs in fieldstats:
fields, stats = fs[0], fs[1:]
for field, fieldval in fields.items():
for group in stats:
for stat, statval in group.items():
if stat in ["fields", "percent"] or \
("fields" in group and field not in group["fields"]):
continue
colname = field + stat
coldesc = fieldval[1] + " " + statval[1]
cols[colname] = [len(colname), 1024, coldesc]
if "percent" in group:
cols[colname + "%"] = [len(colname) + 1, 100, \
coldesc + " percentage"]
v = {}
hdr = ["time", "read", "ddread", "ddh%", "dmread", "dmh%", "pread", "ph%",
"size", "c", "avail"]
xhdr = ["time", "mfu", "mru", "mfug", "mrug", "unc", "eskip", "mtxmis",
"dread", "pread", "read"]
zhdr = ["time", "ztotal", "zhits", "zahead", "zpast", "zmisses", "zmax",
"zfuture", "zstride", "zissued", "zactive"]
sint = 1 # Default interval is 1 second
count = 1 # Default count is 1
hdr_intr = 20 # Print header every 20 lines of output
opfile = None
sep = " " # Default separator is 2 spaces
l2exist = False
cmd = ("Usage: arcstat [-havxp] [-f fields] [-o file] [-s string] [interval "
"[count]]\n")
cur = {}
d = {}
out = None
kstat = None
pretty_print = True
if sys.platform.startswith('freebsd'):
# Requires py-sysctl on FreeBSD
import sysctl
def kstat_update():
global kstat
k = [ctl for ctl in sysctl.filter('kstat.zfs.misc.arcstats')
if ctl.type != sysctl.CTLTYPE_NODE]
k += [ctl for ctl in sysctl.filter('kstat.zfs.misc.zfetchstats')
if ctl.type != sysctl.CTLTYPE_NODE]
if not k:
sys.exit(1)
kstat = {}
for s in k:
if not s:
continue
name, value = s.name, s.value
if "arcstats" in name:
# Trims 'kstat.zfs.misc.arcstats' from the name
kstat[name[24:]] = int(value)
else:
kstat["zfetch_" + name[27:]] = int(value)
elif sys.platform.startswith('linux'):
def kstat_update():
global kstat
k1 = [line.strip() for line in open('/proc/spl/kstat/zfs/arcstats')]
k2 = ["zfetch_" + line.strip() for line in
open('/proc/spl/kstat/zfs/zfetchstats')]
if k1 is None or k2 is None:
sys.exit(1)
del k1[0:2]
del k2[0:2]
k = k1 + k2
kstat = {}
for s in k:
if not s:
continue
name, unused, value = s.split()
kstat[name] = int(value)
def detailed_usage():
sys.stderr.write("%s\n" % cmd)
sys.stderr.write("Field definitions are as follows:\n")
for key in cols:
sys.stderr.write("%11s : %s\n" % (key, cols[key][2]))
sys.stderr.write("\n")
sys.exit(0)
def usage():
sys.stderr.write("%s\n" % cmd)
sys.stderr.write("\t -h : Print this help message\n")
sys.stderr.write("\t -a : Print all possible stats\n")
sys.stderr.write("\t -v : List all possible field headers and definitions"
"\n")
sys.stderr.write("\t -x : Print extended stats\n")
sys.stderr.write("\t -z : Print zfetch stats\n")
sys.stderr.write("\t -f : Specify specific fields to print (see -v)\n")
sys.stderr.write("\t -o : Redirect output to the specified file\n")
sys.stderr.write("\t -s : Override default field separator with custom "
"character or string\n")
sys.stderr.write("\t -p : Disable auto-scaling of numerical fields\n")
sys.stderr.write("\nExamples:\n")
sys.stderr.write("\tarcstat -o /tmp/a.log 2 10\n")
sys.stderr.write("\tarcstat -s \",\" -o /tmp/a.log 2 10\n")
sys.stderr.write("\tarcstat -v\n")
sys.stderr.write("\tarcstat -f time,hit%,dh%,ph%,mh% 1\n")
sys.stderr.write("\n")
sys.exit(1)
def snap_stats():
global cur
global kstat
prev = copy.deepcopy(cur)
kstat_update()
cur = kstat
# fill in additional values from arc_summary
cur["caches_size"] = caches_size = cur["anon_data"]+cur["anon_metadata"]+\
cur["mfu_data"]+cur["mfu_metadata"]+cur["mru_data"]+cur["mru_metadata"]+\
cur["uncached_data"]+cur["uncached_metadata"]
s = 4294967296
pd = cur["pd"]
pm = cur["pm"]
meta = cur["meta"]
v = (s-int(pd))*(s-int(meta))/s
cur["mfu_data_target"] = v / 65536 * caches_size / 65536
v = (s-int(pm))*int(meta)/s
cur["mfu_metadata_target"] = v / 65536 * caches_size / 65536
v = int(pd)*(s-int(meta))/s
cur["mru_data_target"] = v / 65536 * caches_size / 65536
v = int(pm)*int(meta)/s
cur["mru_metadata_target"] = v / 65536 * caches_size / 65536
cur["data_target"] = cur["mfu_data_target"] + cur["mru_data_target"]
cur["metadata_target"] = cur["mfu_metadata_target"] + cur["mru_metadata_target"]
cur["mfu_target"] = cur["mfu_data_target"] + cur["mfu_metadata_target"]
cur["mru_target"] = cur["mru_data_target"] + cur["mru_metadata_target"]
for key in cur:
if re.match(key, "class"):
continue
if key in prev:
d[key] = cur[key] - prev[key]
else:
d[key] = cur[key]
def isint(num):
if isinstance(num, float):
return num.is_integer()
if isinstance(num, int):
return True
return False
def prettynum(sz, scale, num=0):
suffix = ['', 'K', 'M', 'G', 'T', 'P', 'E', 'Z']
index = 0
# Special case for date field
if scale == -1:
return "%s" % num
if scale != 100:
while abs(num) > scale and index < 5:
num = num / scale
index += 1
width = sz - (0 if index == 0 else 1)
intlen = len("%.0f" % num) # %.0f rounds to nearest int
if sint == 1 and isint(num) or width < intlen + 2:
decimal = 0
else:
decimal = 1
return "%*.*f%s" % (width, decimal, num, suffix[index])
def print_values():
global hdr
global sep
global v
global pretty_print
if pretty_print:
fmt = lambda col: prettynum(cols[col][0], cols[col][1], v[col])
else:
fmt = lambda col: str(v[col])
sys.stdout.write(sep.join(fmt(col) for col in hdr))
sys.stdout.write("\n")
sys.stdout.flush()
def print_header():
global hdr
global sep
global pretty_print
if pretty_print:
fmt = lambda col: "%*s" % (cols[col][0], col)
else:
fmt = lambda col: col
sys.stdout.write(sep.join(fmt(col) for col in hdr))
sys.stdout.write("\n")
def get_terminal_lines():
try:
import fcntl
import termios
import struct
data = fcntl.ioctl(sys.stdout.fileno(), termios.TIOCGWINSZ, '1234')
sz = struct.unpack('hh', data)
return sz[0]
except Exception:
pass
def update_hdr_intr():
global hdr_intr
lines = get_terminal_lines()
if lines and lines > 3:
hdr_intr = lines - 3
def resize_handler(signum, frame):
update_hdr_intr()
def init():
global sint
global count
global hdr
global xhdr
global zhdr
global opfile
global sep
global out
global l2exist
global pretty_print
desired_cols = None
aflag = False
xflag = False
hflag = False
vflag = False
zflag = False
i = 1
try:
opts, args = getopt.getopt(
sys.argv[1:],
"axzo:hvs:f:p",
[
"all",
"extended",
"zfetch",
"outfile",
"help",
"verbose",
"separator",
"columns",
"parsable"
]
)
except getopt.error as msg:
sys.stderr.write("Error: %s\n" % str(msg))
usage()
opts = None
for opt, arg in opts:
if opt in ('-a', '--all'):
aflag = True
if opt in ('-x', '--extended'):
xflag = True
if opt in ('-o', '--outfile'):
opfile = arg
i += 1
if opt in ('-h', '--help'):
hflag = True
if opt in ('-v', '--verbose'):
vflag = True
if opt in ('-s', '--separator'):
sep = arg
i += 1
if opt in ('-f', '--columns'):
desired_cols = arg
i += 1
if opt in ('-p', '--parsable'):
pretty_print = False
if opt in ('-z', '--zfetch'):
zflag = True
i += 1
argv = sys.argv[i:]
sint = int(argv[0]) if argv else sint
count = int(argv[1]) if len(argv) > 1 else (0 if len(argv) > 0 else 1)
if hflag or (xflag and zflag) or ((zflag or xflag) and desired_cols):
usage()
if vflag:
detailed_usage()
if xflag:
hdr = xhdr
if zflag:
hdr = zhdr
update_hdr_intr()
# check if L2ARC exists
snap_stats()
l2_size = cur.get("l2_size")
if l2_size:
l2exist = True
if desired_cols:
hdr = desired_cols.split(",")
invalid = []
incompat = []
for ele in hdr:
if ele not in cols:
invalid.append(ele)
elif not l2exist and ele.startswith("l2"):
sys.stdout.write("No L2ARC Here\n%s\n" % ele)
incompat.append(ele)
if len(invalid) > 0:
sys.stderr.write("Invalid column definition! -- %s\n" % invalid)
usage()
if len(incompat) > 0:
sys.stderr.write("Incompatible field specified! -- %s\n" %
incompat)
usage()
if aflag:
if l2exist:
hdr = cols.keys()
else:
hdr = [col for col in cols.keys() if not col.startswith("l2")]
if opfile:
try:
out = open(opfile, "w")
sys.stdout = out
except IOError:
sys.stderr.write("Cannot open %s for writing\n" % opfile)
sys.exit(1)
def calculate():
global d
global v
global l2exist
v = dict()
v["time"] = time.strftime("%H:%M:%S", time.localtime())
v["hits"] = d["hits"] / sint
v["iohs"] = d["iohits"] / sint
v["miss"] = d["misses"] / sint
v["read"] = v["hits"] + v["iohs"] + v["miss"]
v["hit%"] = 100 * v["hits"] / v["read"] if v["read"] > 0 else 0
v["ioh%"] = 100 * v["iohs"] / v["read"] if v["read"] > 0 else 0
v["miss%"] = 100 - v["hit%"] - v["ioh%"] if v["read"] > 0 else 0
v["dhit"] = (d["demand_data_hits"] + d["demand_metadata_hits"]) / sint
v["dioh"] = (d["demand_data_iohits"] + d["demand_metadata_iohits"]) / sint
v["dmis"] = (d["demand_data_misses"] + d["demand_metadata_misses"]) / sint
v["dread"] = v["dhit"] + v["dioh"] + v["dmis"]
v["dh%"] = 100 * v["dhit"] / v["dread"] if v["dread"] > 0 else 0
v["di%"] = 100 * v["dioh"] / v["dread"] if v["dread"] > 0 else 0
v["dm%"] = 100 - v["dh%"] - v["di%"] if v["dread"] > 0 else 0
v["ddhit"] = d["demand_data_hits"] / sint
v["ddioh"] = d["demand_data_iohits"] / sint
v["ddmis"] = d["demand_data_misses"] / sint
v["ddread"] = v["ddhit"] + v["ddioh"] + v["ddmis"]
v["ddh%"] = 100 * v["ddhit"] / v["ddread"] if v["ddread"] > 0 else 0
v["ddi%"] = 100 * v["ddioh"] / v["ddread"] if v["ddread"] > 0 else 0
v["ddm%"] = 100 - v["ddh%"] - v["ddi%"] if v["ddread"] > 0 else 0
v["dmhit"] = d["demand_metadata_hits"] / sint
v["dmioh"] = d["demand_metadata_iohits"] / sint
v["dmmis"] = d["demand_metadata_misses"] / sint
v["dmread"] = v["dmhit"] + v["dmioh"] + v["dmmis"]
v["dmh%"] = 100 * v["dmhit"] / v["dmread"] if v["dmread"] > 0 else 0
v["dmi%"] = 100 * v["dmioh"] / v["dmread"] if v["dmread"] > 0 else 0
v["dmm%"] = 100 - v["dmh%"] - v["dmi%"] if v["dmread"] > 0 else 0
v["phit"] = (d["prefetch_data_hits"] + d["prefetch_metadata_hits"]) / sint
v["pioh"] = (d["prefetch_data_iohits"] +
d["prefetch_metadata_iohits"]) / sint
v["pmis"] = (d["prefetch_data_misses"] +
d["prefetch_metadata_misses"]) / sint
v["pread"] = v["phit"] + v["pioh"] + v["pmis"]
v["ph%"] = 100 * v["phit"] / v["pread"] if v["pread"] > 0 else 0
v["pi%"] = 100 * v["pioh"] / v["pread"] if v["pread"] > 0 else 0
v["pm%"] = 100 - v["ph%"] - v["pi%"] if v["pread"] > 0 else 0
v["pdhit"] = d["prefetch_data_hits"] / sint
v["pdioh"] = d["prefetch_data_iohits"] / sint
v["pdmis"] = d["prefetch_data_misses"] / sint
v["pdread"] = v["pdhit"] + v["pdioh"] + v["pdmis"]
v["pdh%"] = 100 * v["pdhit"] / v["pdread"] if v["pdread"] > 0 else 0
v["pdi%"] = 100 * v["pdioh"] / v["pdread"] if v["pdread"] > 0 else 0
v["pdm%"] = 100 - v["pdh%"] - v["pdi%"] if v["pdread"] > 0 else 0
v["pmhit"] = d["prefetch_metadata_hits"] / sint
v["pmioh"] = d["prefetch_metadata_iohits"] / sint
v["pmmis"] = d["prefetch_metadata_misses"] / sint
v["pmread"] = v["pmhit"] + v["pmioh"] + v["pmmis"]
v["pmh%"] = 100 * v["pmhit"] / v["pmread"] if v["pmread"] > 0 else 0
v["pmi%"] = 100 * v["pmioh"] / v["pmread"] if v["pmread"] > 0 else 0
v["pmm%"] = 100 - v["pmh%"] - v["pmi%"] if v["pmread"] > 0 else 0
v["mhit"] = (d["prefetch_metadata_hits"] +
d["demand_metadata_hits"]) / sint
v["mioh"] = (d["prefetch_metadata_iohits"] +
d["demand_metadata_iohits"]) / sint
v["mmis"] = (d["prefetch_metadata_misses"] +
d["demand_metadata_misses"]) / sint
v["mread"] = v["mhit"] + v["mioh"] + v["mmis"]
v["mh%"] = 100 * v["mhit"] / v["mread"] if v["mread"] > 0 else 0
v["mi%"] = 100 * v["mioh"] / v["mread"] if v["mread"] > 0 else 0
v["mm%"] = 100 - v["mh%"] - v["mi%"] if v["mread"] > 0 else 0
v["arcsz"] = cur["size"]
v["size"] = cur["size"]
v["c"] = cur["c"]
v["mfu"] = d["mfu_hits"] / sint
v["mru"] = d["mru_hits"] / sint
v["mrug"] = d["mru_ghost_hits"] / sint
v["mfug"] = d["mfu_ghost_hits"] / sint
v["unc"] = d["uncached_hits"] / sint
v["eskip"] = d["evict_skip"] / sint
v["el2skip"] = d["evict_l2_skip"] / sint
v["el2cach"] = d["evict_l2_cached"] / sint
v["el2el"] = d["evict_l2_eligible"] / sint
v["el2mfu"] = d["evict_l2_eligible_mfu"] / sint
v["el2mru"] = d["evict_l2_eligible_mru"] / sint
v["el2inel"] = d["evict_l2_ineligible"] / sint
v["mtxmis"] = d["mutex_miss"] / sint
v["ztotal"] = (d["zfetch_hits"] + d["zfetch_future"] + d["zfetch_stride"] +
d["zfetch_past"] + d["zfetch_misses"]) / sint
v["zhits"] = d["zfetch_hits"] / sint
v["zahead"] = (d["zfetch_future"] + d["zfetch_stride"]) / sint
v["zpast"] = d["zfetch_past"] / sint
v["zmisses"] = d["zfetch_misses"] / sint
v["zmax"] = d["zfetch_max_streams"] / sint
v["zfuture"] = d["zfetch_future"] / sint
v["zstride"] = d["zfetch_stride"] / sint
v["zissued"] = d["zfetch_io_issued"] / sint
v["zactive"] = d["zfetch_io_active"] / sint
# ARC structural breakdown, ARC types breakdown, ARC states breakdown
v["cachessz"] = cur["caches_size"]
for fs in fieldstats:
fields, stats = fs[0], fs[1:]
for field, fieldval in fields.items():
for group in stats:
for stat, statval in group.items():
if stat in ["fields", "percent"] or \
("fields" in group and field not in group["fields"]):
continue
colname = field + stat
v[colname] = cur[fieldval[0] + statval[0]]
if "percent" in group:
v[colname + "%"] = 100 * v[colname] / \
v[group["percent"]] if v[group["percent"]] > 0 else 0
if l2exist:
v["l2hits"] = d["l2_hits"] / sint
v["l2miss"] = d["l2_misses"] / sint
v["l2read"] = v["l2hits"] + v["l2miss"]
v["l2hit%"] = 100 * v["l2hits"] / v["l2read"] if v["l2read"] > 0 else 0
v["l2miss%"] = 100 - v["l2hit%"] if v["l2read"] > 0 else 0
v["l2asize"] = cur["l2_asize"]
v["l2size"] = cur["l2_size"]
v["l2bytes"] = d["l2_read_bytes"] / sint
v["l2wbytes"] = d["l2_write_bytes"] / sint
v["l2pref"] = cur["l2_prefetch_asize"]
v["l2mfu"] = cur["l2_mfu_asize"]
v["l2mru"] = cur["l2_mru_asize"]
v["l2data"] = cur["l2_bufc_data_asize"]
v["l2meta"] = cur["l2_bufc_metadata_asize"]
v["l2pref%"] = 100 * v["l2pref"] / v["l2asize"]
v["l2mfu%"] = 100 * v["l2mfu"] / v["l2asize"]
v["l2mru%"] = 100 * v["l2mru"] / v["l2asize"]
v["l2data%"] = 100 * v["l2data"] / v["l2asize"]
v["l2meta%"] = 100 * v["l2meta"] / v["l2asize"]
v["grow"] = 0 if cur["arc_no_grow"] else 1
v["need"] = cur["arc_need_free"]
v["free"] = cur["memory_free_bytes"]
v["avail"] = cur["memory_available_bytes"]
v["waste"] = cur["abd_chunk_waste_size"]
def main():
global sint
global count
global hdr_intr
i = 0
count_flag = 0
init()
if count > 0:
count_flag = 1
signal(SIGINT, SIG_DFL)
signal(SIGWINCH, resize_handler)
while True:
if i == 0:
print_header()
snap_stats()
calculate()
print_values()
if count_flag == 1:
if count <= 1:
break
count -= 1
i = 0 if i >= hdr_intr else i + 1
time.sleep(sint)
if out:
out.close()
if __name__ == '__main__':
main()
-1
View File
@@ -1 +0,0 @@
arcstat
-5
View File
@@ -1,5 +0,0 @@
include $(top_srcdir)/config/Substfiles.am
bin_SCRIPTS = arcstat
SUBSTFILES += $(bin_SCRIPTS)
-494
View File
@@ -1,494 +0,0 @@
#!/usr/bin/env @PYTHON_SHEBANG@
#
# Print out ZFS ARC Statistics exported via kstat(1)
# For a definition of fields, or usage, use arcstat -v
#
# This script was originally a fork of the original arcstat.pl (0.1)
# by Neelakanth Nadgir, originally published on his Sun blog on
# 09/18/2007
# http://blogs.sun.com/realneel/entry/zfs_arc_statistics
#
# A new version aimed to improve upon the original by adding features
# and fixing bugs as needed. This version was maintained by Mike
# Harsch and was hosted in a public open source repository:
# http://github.com/mharsch/arcstat
#
# but has since moved to the illumos-gate repository.
#
# This Python port was written by John Hixson for FreeNAS, introduced
# in commit e2c29f:
# https://github.com/freenas/freenas
#
# and has been improved by many people since.
#
# CDDL HEADER START
#
# The contents of this file are subject to the terms of the
# Common Development and Distribution License, Version 1.0 only
# (the "License"). You may not use this file except in compliance
# with the License.
#
# You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
# or http://www.opensolaris.org/os/licensing.
# See the License for the specific language governing permissions
# and limitations under the License.
#
# When distributing Covered Code, include this CDDL HEADER in each
# file and include the License file at usr/src/OPENSOLARIS.LICENSE.
# If applicable, add the following below this CDDL HEADER, with the
# fields enclosed by brackets "[]" replaced with your own identifying
# information: Portions Copyright [yyyy] [name of copyright owner]
#
# CDDL HEADER END
#
#
# Fields have a fixed width. Every interval, we fill the "v"
# hash with its corresponding value (v[field]=value) using calculate().
# @hdr is the array of fields that needs to be printed, so we
# just iterate over this array and print the values using our pretty printer.
#
# This script must remain compatible with Python 2.6+ and Python 3.4+.
#
import sys
import time
import getopt
import re
import copy
from signal import signal, SIGINT, SIGWINCH, SIG_DFL
cols = {
# HDR: [Size, Scale, Description]
"time": [8, -1, "Time"],
"hits": [4, 1000, "ARC reads per second"],
"miss": [4, 1000, "ARC misses per second"],
"read": [4, 1000, "Total ARC accesses per second"],
"hit%": [4, 100, "ARC hit percentage"],
"miss%": [5, 100, "ARC miss percentage"],
"dhit": [4, 1000, "Demand hits per second"],
"dmis": [4, 1000, "Demand misses per second"],
"dh%": [3, 100, "Demand hit percentage"],
"dm%": [3, 100, "Demand miss percentage"],
"phit": [4, 1000, "Prefetch hits per second"],
"pmis": [4, 1000, "Prefetch misses per second"],
"ph%": [3, 100, "Prefetch hits percentage"],
"pm%": [3, 100, "Prefetch miss percentage"],
"mhit": [4, 1000, "Metadata hits per second"],
"mmis": [4, 1000, "Metadata misses per second"],
"mread": [5, 1000, "Metadata accesses per second"],
"mh%": [3, 100, "Metadata hit percentage"],
"mm%": [3, 100, "Metadata miss percentage"],
"arcsz": [5, 1024, "ARC size"],
"size": [4, 1024, "ARC size"],
"c": [4, 1024, "ARC target size"],
"mfu": [4, 1000, "MFU list hits per second"],
"mru": [4, 1000, "MRU list hits per second"],
"mfug": [4, 1000, "MFU ghost list hits per second"],
"mrug": [4, 1000, "MRU ghost list hits per second"],
"eskip": [5, 1000, "evict_skip per second"],
"mtxmis": [6, 1000, "mutex_miss per second"],
"dread": [5, 1000, "Demand accesses per second"],
"pread": [5, 1000, "Prefetch accesses per second"],
"l2hits": [6, 1000, "L2ARC hits per second"],
"l2miss": [6, 1000, "L2ARC misses per second"],
"l2read": [6, 1000, "Total L2ARC accesses per second"],
"l2hit%": [6, 100, "L2ARC access hit percentage"],
"l2miss%": [7, 100, "L2ARC access miss percentage"],
"l2asize": [7, 1024, "Actual (compressed) size of the L2ARC"],
"l2size": [6, 1024, "Size of the L2ARC"],
"l2bytes": [7, 1024, "Bytes read per second from the L2ARC"],
"grow": [4, 1000, "ARC grow disabled"],
"need": [4, 1024, "ARC reclaim need"],
"free": [4, 1024, "ARC free memory"],
"avail": [5, 1024, "ARC available memory"],
"waste": [5, 1024, "Wasted memory due to round up to pagesize"],
}
v = {}
hdr = ["time", "read", "miss", "miss%", "dmis", "dm%", "pmis", "pm%", "mmis",
"mm%", "size", "c", "avail"]
xhdr = ["time", "mfu", "mru", "mfug", "mrug", "eskip", "mtxmis", "dread",
"pread", "read"]
sint = 1 # Default interval is 1 second
count = 1 # Default count is 1
hdr_intr = 20 # Print header every 20 lines of output
opfile = None
sep = " " # Default separator is 2 spaces
version = "0.4"
l2exist = False
cmd = ("Usage: arcstat [-hvx] [-f fields] [-o file] [-s string] [interval "
"[count]]\n")
cur = {}
d = {}
out = None
kstat = None
if sys.platform.startswith('freebsd'):
# Requires py27-sysctl on FreeBSD
import sysctl
def kstat_update():
global kstat
k = sysctl.filter('kstat.zfs.misc.arcstats')
if not k:
sys.exit(1)
kstat = {}
for s in k:
if not s:
continue
name, value = s.name, s.value
# Trims 'kstat.zfs.misc.arcstats' from the name
kstat[name[24:]] = int(value)
elif sys.platform.startswith('linux'):
def kstat_update():
global kstat
k = [line.strip() for line in open('/proc/spl/kstat/zfs/arcstats')]
if not k:
sys.exit(1)
del k[0:2]
kstat = {}
for s in k:
if not s:
continue
name, unused, value = s.split()
kstat[name] = int(value)
def detailed_usage():
sys.stderr.write("%s\n" % cmd)
sys.stderr.write("Field definitions are as follows:\n")
for key in cols:
sys.stderr.write("%11s : %s\n" % (key, cols[key][2]))
sys.stderr.write("\n")
sys.exit(0)
def usage():
sys.stderr.write("%s\n" % cmd)
sys.stderr.write("\t -h : Print this help message\n")
sys.stderr.write("\t -v : List all possible field headers and definitions"
"\n")
sys.stderr.write("\t -x : Print extended stats\n")
sys.stderr.write("\t -f : Specify specific fields to print (see -v)\n")
sys.stderr.write("\t -o : Redirect output to the specified file\n")
sys.stderr.write("\t -s : Override default field separator with custom "
"character or string\n")
sys.stderr.write("\nExamples:\n")
sys.stderr.write("\tarcstat -o /tmp/a.log 2 10\n")
sys.stderr.write("\tarcstat -s \",\" -o /tmp/a.log 2 10\n")
sys.stderr.write("\tarcstat -v\n")
sys.stderr.write("\tarcstat -f time,hit%,dh%,ph%,mh% 1\n")
sys.stderr.write("\n")
sys.exit(1)
def snap_stats():
global cur
global kstat
prev = copy.deepcopy(cur)
kstat_update()
cur = kstat
for key in cur:
if re.match(key, "class"):
continue
if key in prev:
d[key] = cur[key] - prev[key]
else:
d[key] = cur[key]
def prettynum(sz, scale, num=0):
suffix = [' ', 'K', 'M', 'G', 'T', 'P', 'E', 'Z']
index = 0
save = 0
# Special case for date field
if scale == -1:
return "%s" % num
# Rounding error, return 0
elif 0 < num < 1:
num = 0
while abs(num) > scale and index < 5:
save = num
num = num / scale
index += 1
if index == 0:
return "%*d" % (sz, num)
if abs(save / scale) < 10:
return "%*.1f%s" % (sz - 1, num, suffix[index])
else:
return "%*d%s" % (sz - 1, num, suffix[index])
def print_values():
global hdr
global sep
global v
sys.stdout.write(sep.join(
prettynum(cols[col][0], cols[col][1], v[col]) for col in hdr))
sys.stdout.write("\n")
sys.stdout.flush()
def print_header():
global hdr
global sep
sys.stdout.write(sep.join("%*s" % (cols[col][0], col) for col in hdr))
sys.stdout.write("\n")
def get_terminal_lines():
try:
import fcntl
import termios
import struct
data = fcntl.ioctl(sys.stdout.fileno(), termios.TIOCGWINSZ, '1234')
sz = struct.unpack('hh', data)
return sz[0]
except Exception:
pass
def update_hdr_intr():
global hdr_intr
lines = get_terminal_lines()
if lines and lines > 3:
hdr_intr = lines - 3
def resize_handler(signum, frame):
update_hdr_intr()
def init():
global sint
global count
global hdr
global xhdr
global opfile
global sep
global out
global l2exist
desired_cols = None
xflag = False
hflag = False
vflag = False
i = 1
try:
opts, args = getopt.getopt(
sys.argv[1:],
"xo:hvs:f:",
[
"extended",
"outfile",
"help",
"verbose",
"separator",
"columns"
]
)
except getopt.error as msg:
sys.stderr.write("Error: %s\n" % str(msg))
usage()
opts = None
for opt, arg in opts:
if opt in ('-x', '--extended'):
xflag = True
if opt in ('-o', '--outfile'):
opfile = arg
i += 1
if opt in ('-h', '--help'):
hflag = True
if opt in ('-v', '--verbose'):
vflag = True
if opt in ('-s', '--separator'):
sep = arg
i += 1
if opt in ('-f', '--columns'):
desired_cols = arg
i += 1
i += 1
argv = sys.argv[i:]
sint = int(argv[0]) if argv else sint
count = int(argv[1]) if len(argv) > 1 else (0 if len(argv) > 0 else 1)
if hflag or (xflag and desired_cols):
usage()
if vflag:
detailed_usage()
if xflag:
hdr = xhdr
update_hdr_intr()
# check if L2ARC exists
snap_stats()
l2_size = cur.get("l2_size")
if l2_size:
l2exist = True
if desired_cols:
hdr = desired_cols.split(",")
invalid = []
incompat = []
for ele in hdr:
if ele not in cols:
invalid.append(ele)
elif not l2exist and ele.startswith("l2"):
sys.stdout.write("No L2ARC Here\n%s\n" % ele)
incompat.append(ele)
if len(invalid) > 0:
sys.stderr.write("Invalid column definition! -- %s\n" % invalid)
usage()
if len(incompat) > 0:
sys.stderr.write("Incompatible field specified! -- %s\n" %
incompat)
usage()
if opfile:
try:
out = open(opfile, "w")
sys.stdout = out
except IOError:
sys.stderr.write("Cannot open %s for writing\n" % opfile)
sys.exit(1)
def calculate():
global d
global v
global l2exist
v = dict()
v["time"] = time.strftime("%H:%M:%S", time.localtime())
v["hits"] = d["hits"] / sint
v["miss"] = d["misses"] / sint
v["read"] = v["hits"] + v["miss"]
v["hit%"] = 100 * v["hits"] / v["read"] if v["read"] > 0 else 0
v["miss%"] = 100 - v["hit%"] if v["read"] > 0 else 0
v["dhit"] = (d["demand_data_hits"] + d["demand_metadata_hits"]) / sint
v["dmis"] = (d["demand_data_misses"] + d["demand_metadata_misses"]) / sint
v["dread"] = v["dhit"] + v["dmis"]
v["dh%"] = 100 * v["dhit"] / v["dread"] if v["dread"] > 0 else 0
v["dm%"] = 100 - v["dh%"] if v["dread"] > 0 else 0
v["phit"] = (d["prefetch_data_hits"] + d["prefetch_metadata_hits"]) / sint
v["pmis"] = (d["prefetch_data_misses"] +
d["prefetch_metadata_misses"]) / sint
v["pread"] = v["phit"] + v["pmis"]
v["ph%"] = 100 * v["phit"] / v["pread"] if v["pread"] > 0 else 0
v["pm%"] = 100 - v["ph%"] if v["pread"] > 0 else 0
v["mhit"] = (d["prefetch_metadata_hits"] +
d["demand_metadata_hits"]) / sint
v["mmis"] = (d["prefetch_metadata_misses"] +
d["demand_metadata_misses"]) / sint
v["mread"] = v["mhit"] + v["mmis"]
v["mh%"] = 100 * v["mhit"] / v["mread"] if v["mread"] > 0 else 0
v["mm%"] = 100 - v["mh%"] if v["mread"] > 0 else 0
v["arcsz"] = cur["size"]
v["size"] = cur["size"]
v["c"] = cur["c"]
v["mfu"] = d["mfu_hits"] / sint
v["mru"] = d["mru_hits"] / sint
v["mrug"] = d["mru_ghost_hits"] / sint
v["mfug"] = d["mfu_ghost_hits"] / sint
v["eskip"] = d["evict_skip"] / sint
v["mtxmis"] = d["mutex_miss"] / sint
if l2exist:
v["l2hits"] = d["l2_hits"] / sint
v["l2miss"] = d["l2_misses"] / sint
v["l2read"] = v["l2hits"] + v["l2miss"]
v["l2hit%"] = 100 * v["l2hits"] / v["l2read"] if v["l2read"] > 0 else 0
v["l2miss%"] = 100 - v["l2hit%"] if v["l2read"] > 0 else 0
v["l2asize"] = cur["l2_asize"]
v["l2size"] = cur["l2_size"]
v["l2bytes"] = d["l2_read_bytes"] / sint
v["grow"] = 0 if cur["arc_no_grow"] else 1
v["need"] = cur["arc_need_free"]
v["free"] = cur["memory_free_bytes"]
v["avail"] = cur["memory_available_bytes"]
v["waste"] = cur["abd_chunk_waste_size"]
def main():
global sint
global count
global hdr_intr
i = 0
count_flag = 0
init()
if count > 0:
count_flag = 1
signal(SIGINT, SIG_DFL)
signal(SIGWINCH, resize_handler)
while True:
if i == 0:
print_header()
snap_stats()
calculate()
print_values()
if count_flag == 1:
if count <= 1:
break
count -= 1
i = 0 if i >= hdr_intr else i + 1
time.sleep(sint)
if out:
out.close()
if __name__ == '__main__':
main()
+31 -14
View File
@@ -12,7 +12,7 @@
# with the License.
#
# You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
# or http://www.opensolaris.org/os/licensing.
# or https://opensource.org/licenses/CDDL-1.0.
# See the License for the specific language governing permissions
# and limitations under the License.
#
@@ -27,7 +27,7 @@
# Copyright (C) 2013 Lawrence Livermore National Security, LLC.
# Produced at Lawrence Livermore National Laboratory (cf, DISCLAIMER).
#
# This script must remain compatible with Python 2.6+ and Python 3.4+.
# This script must remain compatible with and Python 3.6+.
#
import sys
@@ -37,7 +37,7 @@ import re
bhdr = ["pool", "objset", "object", "level", "blkid", "offset", "dbsize"]
bxhdr = ["pool", "objset", "object", "level", "blkid", "offset", "dbsize",
"meta", "state", "dbholds", "dbc", "list", "atype", "flags",
"usize", "meta", "state", "dbholds", "dbc", "list", "atype", "flags",
"count", "asize", "access", "mru", "gmru", "mfu", "gmfu", "l2",
"l2_dattr", "l2_asize", "l2_comp", "aholds", "dtype", "btype",
"data_bs", "meta_bs", "bsize", "lvls", "dholds", "blocks", "dsize"]
@@ -47,17 +47,17 @@ dhdr = ["pool", "objset", "object", "dtype", "cached"]
dxhdr = ["pool", "objset", "object", "dtype", "btype", "data_bs", "meta_bs",
"bsize", "lvls", "dholds", "blocks", "dsize", "cached", "direct",
"indirect", "bonus", "spill"]
dincompat = ["level", "blkid", "offset", "dbsize", "meta", "state", "dbholds",
"dbc", "list", "atype", "flags", "count", "asize", "access",
"mru", "gmru", "mfu", "gmfu", "l2", "l2_dattr", "l2_asize",
"l2_comp", "aholds"]
dincompat = ["level", "blkid", "offset", "dbsize", "usize", "meta", "state",
"dbholds", "dbc", "list", "atype", "flags", "count", "asize",
"access", "mru", "gmru", "mfu", "gmfu", "l2", "l2_dattr",
"l2_asize", "l2_comp", "aholds"]
thdr = ["pool", "objset", "dtype", "cached"]
txhdr = ["pool", "objset", "dtype", "cached", "direct", "indirect",
"bonus", "spill"]
tincompat = ["object", "level", "blkid", "offset", "dbsize", "meta", "state",
"dbc", "dbholds", "list", "atype", "flags", "count", "asize",
"access", "mru", "gmru", "mfu", "gmfu", "l2", "l2_dattr",
tincompat = ["object", "level", "blkid", "offset", "dbsize", "usize", "meta",
"state", "dbc", "dbholds", "list", "atype", "flags", "count",
"asize", "access", "mru", "gmru", "mfu", "gmfu", "l2", "l2_dattr",
"l2_asize", "l2_comp", "aholds", "btype", "data_bs", "meta_bs",
"bsize", "lvls", "dholds", "blocks", "dsize"]
@@ -70,6 +70,7 @@ cols = {
"blkid": [8, -1, "block number of buffer"],
"offset": [12, 1024, "offset in object of buffer"],
"dbsize": [7, 1024, "size of buffer"],
"usize": [7, 1024, "size of attached user data"],
"meta": [4, -1, "is this buffer metadata?"],
"state": [5, -1, "state of buffer (read, cached, etc)"],
"dbholds": [7, 1000, "number of holds on buffer"],
@@ -113,10 +114,25 @@ cmd = ("Usage: dbufstat [-bdhnrtvx] [-i file] [-f fields] [-o file] "
raw = 0
if sys.platform.startswith("freebsd"):
import io
# Requires py-sysctl on FreeBSD
import sysctl
def default_ifile():
dbufs = sysctl.filter("kstat.zfs.misc.dbufs")[0].value
sys.stdin = io.StringIO(dbufs)
return "-"
elif sys.platform.startswith("linux"):
def default_ifile():
return "/proc/spl/kstat/zfs/dbufs"
def print_incompat_helper(incompat):
cnt = 0
for key in sorted(incompat):
if cnt is 0:
if cnt == 0:
sys.stderr.write("\t")
elif cnt > 8:
sys.stderr.write(",\n\t")
@@ -384,6 +400,7 @@ def update_dict(d, k, line, labels):
key = line[labels[k]]
dbsize = int(line[labels['dbsize']])
usize = int(line[labels['usize']])
blkid = int(line[labels['blkid']])
level = int(line[labels['level']])
@@ -401,7 +418,7 @@ def update_dict(d, k, line, labels):
d[pool][objset][key]['indirect'] = 0
d[pool][objset][key]['spill'] = 0
d[pool][objset][key]['cached'] += dbsize
d[pool][objset][key]['cached'] += dbsize + usize
if blkid == -1:
d[pool][objset][key]['bonus'] += dbsize
@@ -645,9 +662,9 @@ def main():
sys.exit(1)
if not ifile:
ifile = '/proc/spl/kstat/zfs/dbufs'
ifile = default_ifile()
if ifile is not "-":
if ifile != "-":
try:
tmp = open(ifile, "r")
sys.stdin = tmp
-1
View File
@@ -1 +0,0 @@
dbufstat
-5
View File
@@ -1,5 +0,0 @@
include $(top_srcdir)/config/Substfiles.am
bin_SCRIPTS = dbufstat
SUBSTFILES += $(bin_SCRIPTS)
+44
View File
@@ -0,0 +1,44 @@
#!/bin/sh
#
# fsck.zfs: A fsck helper to accommodate distributions that expect
# to be able to execute a fsck on all filesystem types.
#
# This script simply bubbles up some already-known-about errors,
# see fsck.zfs(8)
#
if [ $# -eq 0 ]; then
echo "Usage: $0 [options] dataset…" >&2
exit 16
fi
ret=0
for dataset; do
case "$dataset" in
-*)
continue
;;
*)
;;
esac
pool="${dataset%%/*}"
case "$(@sbindir@/zpool list -Ho health "$pool")" in
DEGRADED)
ret=$(( ret | 4 ))
;;
FAULTED)
awk '!/^([[:space:]]*#.*)?$/ && $1 == "'"$dataset"'" && $3 == "zfs" {exit 1}' /etc/fstab || \
ret=$(( ret | 8 ))
;;
"")
# Pool not found, error printed by zpool(8)
ret=$(( ret | 8 ))
;;
*)
;;
esac
done
exit "$ret"
-1
View File
@@ -1 +0,0 @@
dist_sbin_SCRIPTS = fsck.zfs
-9
View File
@@ -1,9 +0,0 @@
#!/bin/sh
#
# fsck.zfs: A fsck helper to accommodate distributions that expect
# to be able to execute a fsck on all filesystem types. Currently
# this script does nothing but it could be extended to act as a
# compatibility wrapper for 'zpool scrub'.
#
exit 0
+87 -86
View File
@@ -6,7 +6,7 @@
* You may not use this file except in compliance with the License.
*
* You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
* or http://www.opensolaris.org/os/licensing.
* or https://opensource.org/licenses/CDDL-1.0.
* See the License for the specific language governing permissions
* and limitations under the License.
*
@@ -43,67 +43,45 @@
libzfs_handle_t *g_zfs;
/*
* Return the pool/dataset to mount given the name passed to mount. This
* is expected to be of the form pool/dataset, however may also refer to
* a block device if that device contains a valid zfs label.
* Opportunistically convert a target string into a pool name. If the
* string does not represent a block device with a valid zfs label
* then it is passed through without modification.
*/
static char *
parse_dataset(char *dataset)
static void
parse_dataset(const char *target, char **dataset)
{
/*
* Prior to util-linux 2.36.2, if a file or directory in the
* current working directory was named 'dataset' then mount(8)
* would prepend the current working directory to the dataset.
* Check for it and strip the prepended path when it is added.
*/
char cwd[PATH_MAX];
struct stat64 statbuf;
int error;
int len;
/*
* We expect a pool/dataset to be provided, however if we're
* given a device which is a member of a zpool we attempt to
* extract the pool name stored in the label. Given the pool
* name we can mount the root dataset.
*/
error = stat64(dataset, &statbuf);
if (error == 0) {
nvlist_t *config;
char *name;
int fd;
fd = open(dataset, O_RDONLY);
if (fd < 0)
goto out;
error = zpool_read_label(fd, &config, NULL);
(void) close(fd);
if (error)
goto out;
error = nvlist_lookup_string(config,
ZPOOL_CONFIG_POOL_NAME, &name);
if (error) {
nvlist_free(config);
} else {
dataset = strdup(name);
nvlist_free(config);
return (dataset);
}
if (getcwd(cwd, PATH_MAX) == NULL) {
perror("getcwd");
return;
}
out:
/*
* If a file or directory in your current working directory is
* named 'dataset' then mount(8) will prepend your current working
* directory to the dataset. There is no way to prevent this
* behavior so we simply check for it and strip the prepended
* patch when it is added.
*/
if (getcwd(cwd, PATH_MAX) == NULL)
return (dataset);
int len = strlen(cwd);
if (strncmp(cwd, target, len) == 0)
target += len;
len = strlen(cwd);
/* Assume pool/dataset is more likely */
strlcpy(*dataset, target, PATH_MAX);
/* Do not add one when cwd already ends in a trailing '/' */
if (strncmp(cwd, dataset, len) == 0)
return (dataset + len + (cwd[len-1] != '/'));
int fd = open(target, O_RDONLY | O_CLOEXEC);
if (fd < 0)
return;
return (dataset);
nvlist_t *cfg = NULL;
if (zpool_read_label(fd, &cfg, NULL) == 0) {
const char *nm = NULL;
if (!nvlist_lookup_string(cfg, ZPOOL_CONFIG_POOL_NAME, &nm))
strlcpy(*dataset, nm, PATH_MAX);
nvlist_free(cfg);
}
if (close(fd))
perror("close");
}
/*
@@ -130,25 +108,26 @@ mtab_is_writeable(void)
}
static int
mtab_update(char *dataset, char *mntpoint, char *type, char *mntopts)
mtab_update(const char *dataset, const char *mntpoint, const char *type,
const char *mntopts)
{
struct mntent mnt;
FILE *fp;
int error;
mnt.mnt_fsname = dataset;
mnt.mnt_dir = mntpoint;
mnt.mnt_type = type;
mnt.mnt_opts = mntopts ? mntopts : "";
mnt.mnt_fsname = (char *)dataset;
mnt.mnt_dir = (char *)mntpoint;
mnt.mnt_type = (char *)type;
mnt.mnt_opts = (char *)(mntopts ?: "");
mnt.mnt_freq = 0;
mnt.mnt_passno = 0;
fp = setmntent("/etc/mtab", "a+");
fp = setmntent("/etc/mtab", "a+e");
if (!fp) {
(void) fprintf(stderr, gettext(
"filesystem '%s' was mounted, but /etc/mtab "
"could not be opened due to error %d\n"),
dataset, errno);
"could not be opened due to error: %s\n"),
dataset, strerror(errno));
return (MOUNT_FILEIO);
}
@@ -156,8 +135,8 @@ mtab_update(char *dataset, char *mntpoint, char *type, char *mntopts)
if (error) {
(void) fprintf(stderr, gettext(
"filesystem '%s' was mounted, but /etc/mtab "
"could not be updated due to error %d\n"),
dataset, errno);
"could not be updated due to error: %s\n"),
dataset, strerror(errno));
return (MOUNT_FILEIO);
}
@@ -176,12 +155,13 @@ main(int argc, char **argv)
char badopt[MNT_LINE_MAX] = { '\0' };
char mtabopt[MNT_LINE_MAX] = { '\0' };
char mntpoint[PATH_MAX];
char *dataset;
char dataset[PATH_MAX], *pdataset = dataset;
unsigned long mntflags = 0, zfsflags = 0, remount = 0;
int sloppy = 0, fake = 0, verbose = 0, nomtab = 0, zfsutil = 0;
int error, c;
(void) setlocale(LC_ALL, "");
(void) setlocale(LC_NUMERIC, "C");
(void) textdomain(TEXT_DOMAIN);
opterr = 0;
@@ -206,10 +186,11 @@ main(int argc, char **argv)
break;
case 'h':
case '?':
(void) fprintf(stderr, gettext("Invalid option '%c'\n"),
optopt);
if (optopt)
(void) fprintf(stderr,
gettext("Invalid option '%c'\n"), optopt);
(void) fprintf(stderr, gettext("Usage: mount.zfs "
"[-sfnv] [-o options] <dataset> <mountpoint>\n"));
"[-sfnvh] [-o options] <dataset> <mountpoint>\n"));
return (MOUNT_USAGE);
}
}
@@ -231,13 +212,13 @@ main(int argc, char **argv)
return (MOUNT_USAGE);
}
dataset = parse_dataset(argv[0]);
parse_dataset(argv[0], &pdataset);
/* canonicalize the mount point */
if (realpath(argv[1], mntpoint) == NULL) {
(void) fprintf(stderr, gettext("filesystem '%s' cannot be "
"mounted at '%s' due to canonicalization error %d.\n"),
dataset, argv[1], errno);
"mounted at '%s' due to canonicalization error: %s\n"),
dataset, argv[1], strerror(errno));
return (MOUNT_SYSERR);
}
@@ -266,13 +247,6 @@ main(int argc, char **argv)
}
}
if (verbose)
(void) fprintf(stdout, gettext("mount.zfs:\n"
" dataset: \"%s\"\n mountpoint: \"%s\"\n"
" mountflags: 0x%lx\n zfsflags: 0x%lx\n"
" mountopts: \"%s\"\n mtabopts: \"%s\"\n"),
dataset, mntpoint, mntflags, zfsflags, mntopts, mtabopt);
if (mntflags & MS_REMOUNT) {
nomtab = 1;
remount = 1;
@@ -295,7 +269,9 @@ main(int argc, char **argv)
return (MOUNT_USAGE);
}
zfs_adjust_mount_options(zhp, mntpoint, mntopts, mtabopt);
if (sloppy || libzfs_envvar_is_set("ZFS_MOUNT_HELPER")) {
zfs_adjust_mount_options(zhp, mntpoint, mntopts, mtabopt);
}
/* treat all snapshots as legacy mount points */
if (zfs_get_type(zhp) == ZFS_TYPE_SNAPSHOT)
@@ -313,12 +289,11 @@ main(int argc, char **argv)
if (zfs_version == 0) {
fprintf(stderr, gettext("unable to fetch "
"ZFS version for filesystem '%s'\n"), dataset);
zfs_close(zhp);
libzfs_fini(g_zfs);
return (MOUNT_SYSERR);
}
zfs_close(zhp);
libzfs_fini(g_zfs);
/*
* Legacy mount points may only be mounted using 'mount', never using
* 'zfs mount'. However, since 'zfs mount' actually invokes 'mount'
@@ -336,6 +311,8 @@ main(int argc, char **argv)
"Use 'zfs set mountpoint=%s' or 'mount -t zfs %s %s'.\n"
"See zfs(8) for more information.\n"),
dataset, mntpoint, dataset, mntpoint);
zfs_close(zhp);
libzfs_fini(g_zfs);
return (MOUNT_USAGE);
}
@@ -346,14 +323,38 @@ main(int argc, char **argv)
"Use 'zfs set mountpoint=%s' or 'zfs mount %s'.\n"
"See zfs(8) for more information.\n"),
dataset, "legacy", dataset);
zfs_close(zhp);
libzfs_fini(g_zfs);
return (MOUNT_USAGE);
}
if (verbose)
(void) fprintf(stdout, gettext("mount.zfs:\n"
" dataset: \"%s\"\n mountpoint: \"%s\"\n"
" mountflags: 0x%lx\n zfsflags: 0x%lx\n"
" mountopts: \"%s\"\n mtabopts: \"%s\"\n"),
dataset, mntpoint, mntflags, zfsflags, mntopts, mtabopt);
if (!fake) {
error = mount(dataset, mntpoint, MNTTYPE_ZFS,
mntflags, mntopts);
if (!remount && !sloppy &&
!libzfs_envvar_is_set("ZFS_MOUNT_HELPER")) {
error = zfs_mount_at(zhp, mntopts, mntflags, mntpoint);
if (error) {
(void) fprintf(stderr, "zfs_mount_at() failed: "
"%s", libzfs_error_description(g_zfs));
zfs_close(zhp);
libzfs_fini(g_zfs);
return (MOUNT_SYSERR);
}
} else {
error = mount(dataset, mntpoint, MNTTYPE_ZFS,
mntflags, mntopts);
}
}
zfs_close(zhp);
libzfs_fini(g_zfs);
if (error) {
switch (errno) {
case ENOENT:
@@ -388,8 +389,8 @@ main(int argc, char **argv)
"mount the filesystem again.\n"), dataset);
return (MOUNT_SYSERR);
}
/* fallthru */
#endif
zfs_fallthrough;
default:
(void) fprintf(stderr, gettext("filesystem "
"'%s' can not be mounted: %s\n"), dataset,
-1
View File
@@ -1 +0,0 @@
mount.zfs
-20
View File
@@ -1,20 +0,0 @@
include $(top_srcdir)/config/Rules.am
#
# Ignore the prefix for the mount helper. It must be installed in /sbin/
# because this path is hardcoded in the mount(8) for security reasons.
# However, if needed, the configure option --with-mounthelperdir= can be used
# to override the default install location.
#
sbindir=$(mounthelperdir)
sbin_PROGRAMS = mount.zfs
mount_zfs_SOURCES = \
mount_zfs.c
mount_zfs_LDADD = \
$(abs_top_builddir)/lib/libzfs/libzfs.la \
$(abs_top_builddir)/lib/libzfs_core/libzfs_core.la \
$(abs_top_builddir)/lib/libnvpair/libnvpair.la
mount_zfs_LDADD += $(LTLIBINTL)
-1
View File
@@ -1 +0,0 @@
/raidz_test
+9 -13
View File
@@ -1,20 +1,16 @@
include $(top_srcdir)/config/Rules.am
raidz_test_CFLAGS = $(AM_CFLAGS) $(KERNEL_CFLAGS)
raidz_test_CPPFLAGS = $(AM_CPPFLAGS) $(LIBZPOOL_CPPFLAGS)
# Includes kernel code, generate warnings for large stack frames
AM_CFLAGS += $(FRAME_LARGER_THAN)
# Unconditionally enable ASSERTs
AM_CPPFLAGS += -DDEBUG -UNDEBUG -DZFS_DEBUG
bin_PROGRAMS = raidz_test
bin_PROGRAMS += raidz_test
CPPCHECKTARGETS += raidz_test
raidz_test_SOURCES = \
raidz_test.h \
raidz_test.c \
raidz_bench.c
%D%/raidz_bench.c \
%D%/raidz_test.c \
%D%/raidz_test.h
raidz_test_LDADD = \
$(abs_top_builddir)/lib/libzpool/libzpool.la \
$(abs_top_builddir)/lib/libzfs_core/libzfs_core.la
libzpool.la \
libzfs_core.la
raidz_test_LDADD += -lm
+23 -8
View File
@@ -6,7 +6,7 @@
* You may not use this file except in compliance with the License.
*
* You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
* or http://www.opensolaris.org/os/licensing.
* or https://opensource.org/licenses/CDDL-1.0.
* See the License for the specific language governing permissions
* and limitations under the License.
*
@@ -31,8 +31,6 @@
#include <sys/vdev_raidz_impl.h>
#include <stdio.h>
#include <sys/time.h>
#include "raidz_test.h"
#define GEN_BENCH_MEMORY (((uint64_t)1ULL)<<32)
@@ -65,7 +63,7 @@ bench_fini_raidz_maps(void)
{
/* tear down golden zio */
raidz_free(zio_bench.io_abd, max_data_size);
bzero(&zio_bench, sizeof (zio_t));
memset(&zio_bench, 0, sizeof (zio_t));
}
static inline void
@@ -83,8 +81,17 @@ run_gen_bench_impl(const char *impl)
/* create suitable raidz_map */
ncols = rto_opts.rto_dcols + fn + 1;
zio_bench.io_size = 1ULL << ds;
rm_bench = vdev_raidz_map_alloc(&zio_bench,
BENCH_ASHIFT, ncols, fn+1);
if (rto_opts.rto_expand) {
rm_bench = vdev_raidz_map_alloc_expanded(
&zio_bench,
rto_opts.rto_ashift, ncols+1, ncols,
fn+1, rto_opts.rto_expand_offset,
0, B_FALSE);
} else {
rm_bench = vdev_raidz_map_alloc(&zio_bench,
BENCH_ASHIFT, ncols, fn+1);
}
/* estimate iteration count */
iter_cnt = GEN_BENCH_MEMORY;
@@ -163,8 +170,16 @@ run_rec_bench_impl(const char *impl)
(1ULL << BENCH_ASHIFT))
continue;
rm_bench = vdev_raidz_map_alloc(&zio_bench,
BENCH_ASHIFT, ncols, PARITY_PQR);
if (rto_opts.rto_expand) {
rm_bench = vdev_raidz_map_alloc_expanded(
&zio_bench,
BENCH_ASHIFT, ncols+1, ncols,
PARITY_PQR,
rto_opts.rto_expand_offset, 0, B_FALSE);
} else {
rm_bench = vdev_raidz_map_alloc(&zio_bench,
BENCH_ASHIFT, ncols, PARITY_PQR);
}
/* estimate iteration count */
iter_cnt = (REC_BENCH_MEMORY);
+127 -69
View File
@@ -6,7 +6,7 @@
* You may not use this file except in compliance with the License.
*
* You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
* or http://www.opensolaris.org/os/licensing.
* or https://opensource.org/licenses/CDDL-1.0.
* See the License for the specific language governing permissions
* and limitations under the License.
*
@@ -37,11 +37,11 @@
static int *rand_data;
raidz_test_opts_t rto_opts;
static char gdb[256];
static const char gdb_tmpl[] = "gdb -ex \"set pagination 0\" -p %d";
static char pid_s[16];
static void sig_handler(int signo)
{
int old_errno = errno;
struct sigaction action;
/*
* Restore default action and re-raise signal so SIGSEGV and
@@ -52,22 +52,32 @@ static void sig_handler(int signo)
action.sa_flags = 0;
(void) sigaction(signo, &action, NULL);
if (rto_opts.rto_gdb)
if (system(gdb)) { }
if (rto_opts.rto_gdb) {
pid_t pid = fork();
if (pid == 0) {
execlp("gdb", "gdb", "-ex", "set pagination 0",
"-p", pid_s, NULL);
_exit(-1);
} else if (pid > 0)
while (waitpid(pid, NULL, 0) == -1 && errno == EINTR)
;
}
raise(signo);
errno = old_errno;
}
static void print_opts(raidz_test_opts_t *opts, boolean_t force)
{
char *verbose;
const char *verbose;
switch (opts->rto_v) {
case 0:
case D_ALL:
verbose = "no";
break;
case 1:
case D_INFO:
verbose = "info";
break;
case D_DEBUG:
default:
verbose = "debug";
break;
@@ -77,16 +87,20 @@ static void print_opts(raidz_test_opts_t *opts, boolean_t force)
(void) fprintf(stdout, DBLSEP "Running with options:\n"
" (-a) zio ashift : %zu\n"
" (-o) zio offset : 1 << %zu\n"
" (-e) expanded map : %s\n"
" (-r) reflow offset : %llx\n"
" (-d) number of raidz data columns : %zu\n"
" (-s) size of DATA : 1 << %zu\n"
" (-S) sweep parameters : %s \n"
" (-v) verbose : %s \n\n",
opts->rto_ashift, /* -a */
ilog2(opts->rto_offset), /* -o */
opts->rto_dcols, /* -d */
ilog2(opts->rto_dsize), /* -s */
opts->rto_sweep ? "yes" : "no", /* -S */
verbose); /* -v */
opts->rto_ashift, /* -a */
ilog2(opts->rto_offset), /* -o */
opts->rto_expand ? "yes" : "no", /* -e */
(u_longlong_t)opts->rto_expand_offset, /* -r */
opts->rto_dcols, /* -d */
ilog2(opts->rto_dsize), /* -s */
opts->rto_sweep ? "yes" : "no", /* -S */
verbose); /* -v */
}
}
@@ -104,7 +118,9 @@ static void usage(boolean_t requested)
"\t[-S parameter sweep (default: %s)]\n"
"\t[-t timeout for parameter sweep test]\n"
"\t[-B benchmark all raidz implementations]\n"
"\t[-v increase verbosity (default: %zu)]\n"
"\t[-e use expanded raidz map (default: %s)]\n"
"\t[-r expanded raidz map reflow offset (default: %llx)]\n"
"\t[-v increase verbosity (default: %d)]\n"
"\t[-h (print help)]\n"
"\t[-T test the test, see if failure would be detected]\n"
"\t[-D debug (attach gdb on SIGSEGV)]\n"
@@ -114,7 +130,9 @@ static void usage(boolean_t requested)
o->rto_dcols, /* -d */
ilog2(o->rto_dsize), /* -s */
rto_opts.rto_sweep ? "yes" : "no", /* -S */
o->rto_v); /* -d */
rto_opts.rto_expand ? "yes" : "no", /* -e */
(u_longlong_t)o->rto_expand_offset, /* -r */
o->rto_v); /* -v */
exit(requested ? 0 : 1);
}
@@ -123,19 +141,22 @@ static void process_options(int argc, char **argv)
{
size_t value;
int opt;
raidz_test_opts_t *o = &rto_opts;
bcopy(&rto_opts_defaults, o, sizeof (*o));
while ((opt = getopt(argc, argv, "TDBSvha:o:d:s:t:")) != -1) {
value = 0;
memcpy(o, &rto_opts_defaults, sizeof (*o));
while ((opt = getopt(argc, argv, "TDBSvha:er:o:d:s:t:")) != -1) {
switch (opt) {
case 'a':
value = strtoull(optarg, NULL, 0);
o->rto_ashift = MIN(13, MAX(9, value));
break;
case 'e':
o->rto_expand = 1;
break;
case 'r':
o->rto_expand_offset = strtoull(optarg, NULL, 0);
break;
case 'o':
value = strtoull(optarg, NULL, 0);
o->rto_offset = ((1ULL << MIN(12, value)) >> 9) << 9;
@@ -179,25 +200,34 @@ static void process_options(int argc, char **argv)
}
}
#define DATA_COL(rm, i) ((rm)->rm_col[raidz_parity(rm) + (i)].rc_abd)
#define DATA_COL_SIZE(rm, i) ((rm)->rm_col[raidz_parity(rm) + (i)].rc_size)
#define DATA_COL(rr, i) ((rr)->rr_col[rr->rr_firstdatacol + (i)].rc_abd)
#define DATA_COL_SIZE(rr, i) ((rr)->rr_col[rr->rr_firstdatacol + (i)].rc_size)
#define CODE_COL(rm, i) ((rm)->rm_col[(i)].rc_abd)
#define CODE_COL_SIZE(rm, i) ((rm)->rm_col[(i)].rc_size)
#define CODE_COL(rr, i) ((rr)->rr_col[(i)].rc_abd)
#define CODE_COL_SIZE(rr, i) ((rr)->rr_col[(i)].rc_size)
static int
cmp_code(raidz_test_opts_t *opts, const raidz_map_t *rm, const int parity)
{
int i, ret = 0;
int r, i, ret = 0;
VERIFY(parity >= 1 && parity <= 3);
for (i = 0; i < parity; i++) {
if (abd_cmp(CODE_COL(rm, i), CODE_COL(opts->rm_golden, i))
!= 0) {
ret++;
LOG_OPT(D_DEBUG, opts,
"\nParity block [%d] different!\n", i);
for (r = 0; r < rm->rm_nrows; r++) {
raidz_row_t * const rr = rm->rm_row[r];
raidz_row_t * const rrg = opts->rm_golden->rm_row[r];
for (i = 0; i < parity; i++) {
if (CODE_COL_SIZE(rrg, i) == 0) {
VERIFY0(CODE_COL_SIZE(rr, i));
continue;
}
if (abd_cmp(CODE_COL(rr, i),
CODE_COL(rrg, i)) != 0) {
ret++;
LOG_OPT(D_DEBUG, opts,
"\nParity block [%d] different!\n", i);
}
}
}
return (ret);
@@ -206,16 +236,26 @@ cmp_code(raidz_test_opts_t *opts, const raidz_map_t *rm, const int parity)
static int
cmp_data(raidz_test_opts_t *opts, raidz_map_t *rm)
{
int i, ret = 0;
int dcols = opts->rm_golden->rm_cols - raidz_parity(opts->rm_golden);
int r, i, dcols, ret = 0;
for (i = 0; i < dcols; i++) {
if (abd_cmp(DATA_COL(opts->rm_golden, i), DATA_COL(rm, i))
!= 0) {
ret++;
for (r = 0; r < rm->rm_nrows; r++) {
raidz_row_t *rr = rm->rm_row[r];
raidz_row_t *rrg = opts->rm_golden->rm_row[r];
dcols = opts->rm_golden->rm_row[0]->rr_cols -
raidz_parity(opts->rm_golden);
for (i = 0; i < dcols; i++) {
if (DATA_COL_SIZE(rrg, i) == 0) {
VERIFY0(DATA_COL_SIZE(rr, i));
continue;
}
LOG_OPT(D_DEBUG, opts,
"\nData block [%d] different!\n", i);
if (abd_cmp(DATA_COL(rrg, i),
DATA_COL(rr, i)) != 0) {
ret++;
LOG_OPT(D_DEBUG, opts,
"\nData block [%d] different!\n", i);
}
}
}
return (ret);
@@ -224,24 +264,21 @@ cmp_data(raidz_test_opts_t *opts, raidz_map_t *rm)
static int
init_rand(void *data, size_t size, void *private)
{
int i;
int *dst = (int *)data;
for (i = 0; i < size / sizeof (int); i++)
dst[i] = rand_data[i];
(void) private;
memcpy(data, rand_data, size);
return (0);
}
static void
corrupt_colums(raidz_map_t *rm, const int *tgts, const int cnt)
{
int i;
raidz_col_t *col;
for (i = 0; i < cnt; i++) {
col = &rm->rm_col[tgts[i]];
abd_iterate_func(col->rc_abd, 0, col->rc_size, init_rand, NULL);
for (int r = 0; r < rm->rm_nrows; r++) {
raidz_row_t *rr = rm->rm_row[r];
for (int i = 0; i < cnt; i++) {
raidz_col_t *col = &rr->rr_col[tgts[i]];
abd_iterate_func(col->rc_abd, 0, col->rc_size,
init_rand, NULL);
}
}
}
@@ -288,10 +325,20 @@ init_raidz_golden_map(raidz_test_opts_t *opts, const int parity)
VERIFY0(vdev_raidz_impl_set("original"));
opts->rm_golden = vdev_raidz_map_alloc(opts->zio_golden,
opts->rto_ashift, total_ncols, parity);
rm_test = vdev_raidz_map_alloc(zio_test,
opts->rto_ashift, total_ncols, parity);
if (opts->rto_expand) {
opts->rm_golden =
vdev_raidz_map_alloc_expanded(opts->zio_golden,
opts->rto_ashift, total_ncols+1, total_ncols,
parity, opts->rto_expand_offset, 0, B_FALSE);
rm_test = vdev_raidz_map_alloc_expanded(zio_test,
opts->rto_ashift, total_ncols+1, total_ncols,
parity, opts->rto_expand_offset, 0, B_FALSE);
} else {
opts->rm_golden = vdev_raidz_map_alloc(opts->zio_golden,
opts->rto_ashift, total_ncols, parity);
rm_test = vdev_raidz_map_alloc(zio_test,
opts->rto_ashift, total_ncols, parity);
}
VERIFY(opts->zio_golden);
VERIFY(opts->rm_golden);
@@ -330,8 +377,14 @@ init_raidz_map(raidz_test_opts_t *opts, zio_t **zio, const int parity)
(*zio)->io_abd = raidz_alloc(alloc_dsize);
init_zio_abd(*zio);
rm = vdev_raidz_map_alloc(*zio, opts->rto_ashift,
total_ncols, parity);
if (opts->rto_expand) {
rm = vdev_raidz_map_alloc_expanded(*zio,
opts->rto_ashift, total_ncols+1, total_ncols,
parity, opts->rto_expand_offset, 0, B_FALSE);
} else {
rm = vdev_raidz_map_alloc(*zio, opts->rto_ashift,
total_ncols, parity);
}
VERIFY(rm);
/* Make sure code columns are destroyed */
@@ -420,7 +473,7 @@ run_rec_check_impl(raidz_test_opts_t *opts, raidz_map_t *rm, const int fn)
if (fn < RAIDZ_REC_PQ) {
/* can reconstruct 1 failed data disk */
for (x0 = 0; x0 < opts->rto_dcols; x0++) {
if (x0 >= rm->rm_cols - raidz_parity(rm))
if (x0 >= rm->rm_row[0]->rr_cols - raidz_parity(rm))
continue;
/* Check if should stop */
@@ -445,10 +498,11 @@ run_rec_check_impl(raidz_test_opts_t *opts, raidz_map_t *rm, const int fn)
} else if (fn < RAIDZ_REC_PQR) {
/* can reconstruct 2 failed data disk */
for (x0 = 0; x0 < opts->rto_dcols; x0++) {
if (x0 >= rm->rm_cols - raidz_parity(rm))
if (x0 >= rm->rm_row[0]->rr_cols - raidz_parity(rm))
continue;
for (x1 = x0 + 1; x1 < opts->rto_dcols; x1++) {
if (x1 >= rm->rm_cols - raidz_parity(rm))
if (x1 >= rm->rm_row[0]->rr_cols -
raidz_parity(rm))
continue;
/* Check if should stop */
@@ -475,14 +529,15 @@ run_rec_check_impl(raidz_test_opts_t *opts, raidz_map_t *rm, const int fn)
} else {
/* can reconstruct 3 failed data disk */
for (x0 = 0; x0 < opts->rto_dcols; x0++) {
if (x0 >= rm->rm_cols - raidz_parity(rm))
if (x0 >= rm->rm_row[0]->rr_cols - raidz_parity(rm))
continue;
for (x1 = x0 + 1; x1 < opts->rto_dcols; x1++) {
if (x1 >= rm->rm_cols - raidz_parity(rm))
if (x1 >= rm->rm_row[0]->rr_cols -
raidz_parity(rm))
continue;
for (x2 = x1 + 1; x2 < opts->rto_dcols; x2++) {
if (x2 >=
rm->rm_cols - raidz_parity(rm))
if (x2 >= rm->rm_row[0]->rr_cols -
raidz_parity(rm))
continue;
/* Check if should stop */
@@ -598,7 +653,7 @@ static kcondvar_t sem_cv;
static int max_free_slots;
static int free_slots;
static void
static __attribute__((noreturn)) void
sweep_thread(void *arg)
{
int err = 0;
@@ -698,8 +753,10 @@ run_sweep(void)
opts = umem_zalloc(sizeof (raidz_test_opts_t), UMEM_NOFAIL);
opts->rto_ashift = ashift_v[a];
opts->rto_dcols = dcols_v[d];
opts->rto_offset = (1 << ashift_v[a]) * rand();
opts->rto_offset = (1ULL << ashift_v[a]) * rand();
opts->rto_dsize = size_v[s];
opts->rto_expand = rto_opts.rto_expand;
opts->rto_expand_offset = rto_opts.rto_expand_offset;
opts->rto_v = 0; /* be quiet */
VERIFY3P(thread_create(NULL, 0, sweep_thread, (void *) opts,
@@ -732,6 +789,7 @@ exit:
return (sweep_state == SWEEP_ERROR ? SWEEP_ERROR : 0);
}
int
main(int argc, char **argv)
{
@@ -739,8 +797,8 @@ main(int argc, char **argv)
struct sigaction action;
int err = 0;
/* init gdb string early */
(void) sprintf(gdb, gdb_tmpl, getpid());
/* init gdb pid string early */
(void) sprintf(pid_s, "%d", getpid());
action.sa_handler = sig_handler;
sigemptyset(&action.sa_mask);
+20 -14
View File
@@ -6,7 +6,7 @@
* You may not use this file except in compliance with the License.
*
* You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
* or http://www.opensolaris.org/os/licensing.
* or https://opensource.org/licenses/CDDL-1.0.
* See the License for the specific language governing permissions
* and limitations under the License.
*
@@ -28,7 +28,7 @@
#include <sys/spa.h>
static const char *raidz_impl_names[] = {
static const char *const raidz_impl_names[] = {
"original",
"scalar",
"sse2",
@@ -42,15 +42,23 @@ static const char *raidz_impl_names[] = {
NULL
};
enum raidz_verbosity {
D_ALL,
D_INFO,
D_DEBUG,
};
typedef struct raidz_test_opts {
size_t rto_ashift;
size_t rto_offset;
uint64_t rto_offset;
size_t rto_dcols;
size_t rto_dsize;
size_t rto_v;
enum raidz_verbosity rto_v;
size_t rto_sweep;
size_t rto_sweep_timeout;
size_t rto_benchmark;
size_t rto_expand;
uint64_t rto_expand_offset;
size_t rto_sanity;
size_t rto_gdb;
@@ -66,9 +74,11 @@ static const raidz_test_opts_t rto_opts_defaults = {
.rto_offset = 1ULL << 0,
.rto_dcols = 8,
.rto_dsize = 1<<19,
.rto_v = 0,
.rto_v = D_ALL,
.rto_sweep = 0,
.rto_benchmark = 0,
.rto_expand = 0,
.rto_expand_offset = -1ULL,
.rto_sanity = 0,
.rto_gdb = 0,
.rto_should_stop = B_FALSE
@@ -82,23 +92,19 @@ static inline size_t ilog2(size_t a)
}
#define D_ALL 0
#define D_INFO 1
#define D_DEBUG 2
#define LOG(lvl, a...) \
#define LOG(lvl, ...) \
{ \
if (rto_opts.rto_v >= lvl) \
(void) fprintf(stdout, a); \
(void) fprintf(stdout, __VA_ARGS__); \
} \
#define LOG_OPT(lvl, opt, a...) \
#define LOG_OPT(lvl, opt, ...) \
{ \
if (opt->rto_v >= lvl) \
(void) fprintf(stdout, a); \
(void) fprintf(stdout, __VA_ARGS__); \
} \
#define ERR(a...) (void) fprintf(stderr, a)
#define ERR(...) (void) fprintf(stderr, __VA_ARGS__)
#define DBLSEP "================\n"
-1
View File
@@ -1 +0,0 @@
dist_udev_SCRIPTS = vdev_id
-1
View File
@@ -1 +0,0 @@
/zdb
+13 -11
View File
@@ -1,16 +1,18 @@
include $(top_srcdir)/config/Rules.am
zdb_CPPFLAGS = $(AM_CPPFLAGS) $(LIBZPOOL_CPPFLAGS)
zdb_CFLAGS = $(AM_CFLAGS) $(LIBCRYPTO_CFLAGS)
# Unconditionally enable debugging for zdb
AM_CPPFLAGS += -DDEBUG -UNDEBUG -DZFS_DEBUG
sbin_PROGRAMS = zdb
sbin_PROGRAMS += zdb
CPPCHECKTARGETS += zdb
zdb_SOURCES = \
zdb.c \
zdb_il.c \
zdb.h
%D%/zdb.c \
%D%/zdb.h \
%D%/zdb_il.c
zdb_LDADD = \
$(abs_top_builddir)/lib/libzpool/libzpool.la \
$(abs_top_builddir)/lib/libzfs_core/libzfs_core.la \
$(abs_top_builddir)/lib/libnvpair/libnvpair.la
libzdb.la \
libzpool.la \
libzfs_core.la \
libnvpair.la
zdb_LDADD += $(LIBCRYPTO_LIBS)
+2090 -795
View File
File diff suppressed because it is too large Load Diff
+1 -1
View File
@@ -6,7 +6,7 @@
* You may not use this file except in compliance with the License.
*
* You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
* or http://www.opensolaris.org/os/licensing.
* or https://opensource.org/licenses/CDDL-1.0.
* See the License for the specific language governing permissions
* and limitations under the License.
*
+160 -48
View File
@@ -6,7 +6,7 @@
* You may not use this file except in compliance with the License.
*
* You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
* or http://www.opensolaris.org/os/licensing.
* or https://opensource.org/licenses/CDDL-1.0.
* See the License for the specific language governing permissions
* and limitations under the License.
*
@@ -60,25 +60,26 @@ print_log_bp(const blkptr_t *bp, const char *prefix)
(void) printf("%s%s\n", prefix, blkbuf);
}
/* ARGSUSED */
static void
zil_prt_rec_create(zilog_t *zilog, int txtype, void *arg)
zil_prt_rec_create(zilog_t *zilog, int txtype, const void *arg)
{
lr_create_t *lr = arg;
(void) zilog;
const lr_create_t *lrc = arg;
const _lr_create_t *lr = &lrc->lr_create;
time_t crtime = lr->lr_crtime[0];
char *name, *link;
const char *name, *link;
lr_attr_t *lrattr;
name = (char *)(lr + 1);
name = (const char *)&lrc->lr_data[0];
if (lr->lr_common.lrc_txtype == TX_CREATE_ATTR ||
lr->lr_common.lrc_txtype == TX_MKDIR_ATTR) {
lrattr = (lr_attr_t *)(lr + 1);
lrattr = (lr_attr_t *)&lrc->lr_data[0];
name += ZIL_XVAT_SIZE(lrattr->lr_attr_masksize);
}
if (txtype == TX_SYMLINK) {
link = name + strlen(name) + 1;
link = (const char *)&lrc->lr_data[strlen(name) + 1];
(void) printf("%s%s -> %s\n", tab_prefix, name, link);
} else if (txtype != TX_MKXATTR) {
(void) printf("%s%s\n", tab_prefix, name);
@@ -96,44 +97,53 @@ zil_prt_rec_create(zilog_t *zilog, int txtype, void *arg)
(u_longlong_t)lr->lr_gen, (u_longlong_t)lr->lr_rdev);
}
/* ARGSUSED */
static void
zil_prt_rec_remove(zilog_t *zilog, int txtype, void *arg)
zil_prt_rec_remove(zilog_t *zilog, int txtype, const void *arg)
{
lr_remove_t *lr = arg;
(void) zilog, (void) txtype;
const lr_remove_t *lr = arg;
(void) printf("%sdoid %llu, name %s\n", tab_prefix,
(u_longlong_t)lr->lr_doid, (char *)(lr + 1));
(u_longlong_t)lr->lr_doid, (const char *)&lr->lr_data[0]);
}
/* ARGSUSED */
static void
zil_prt_rec_link(zilog_t *zilog, int txtype, void *arg)
zil_prt_rec_link(zilog_t *zilog, int txtype, const void *arg)
{
lr_link_t *lr = arg;
(void) zilog, (void) txtype;
const lr_link_t *lr = arg;
(void) printf("%sdoid %llu, link_obj %llu, name %s\n", tab_prefix,
(u_longlong_t)lr->lr_doid, (u_longlong_t)lr->lr_link_obj,
(char *)(lr + 1));
(const char *)&lr->lr_data[0]);
}
/* ARGSUSED */
static void
zil_prt_rec_rename(zilog_t *zilog, int txtype, void *arg)
zil_prt_rec_rename(zilog_t *zilog, int txtype, const void *arg)
{
lr_rename_t *lr = arg;
char *snm = (char *)(lr + 1);
char *tnm = snm + strlen(snm) + 1;
(void) zilog, (void) txtype;
const lr_rename_t *lrr = arg;
const _lr_rename_t *lr = &lrr->lr_rename;
const char *snm = (const char *)&lrr->lr_data[0];
const char *tnm = (const char *)&lrr->lr_data[strlen(snm) + 1];
(void) printf("%ssdoid %llu, tdoid %llu\n", tab_prefix,
(u_longlong_t)lr->lr_sdoid, (u_longlong_t)lr->lr_tdoid);
(void) printf("%ssrc %s tgt %s\n", tab_prefix, snm, tnm);
switch (txtype) {
case TX_RENAME_EXCHANGE:
(void) printf("%sflags RENAME_EXCHANGE\n", tab_prefix);
break;
case TX_RENAME_WHITEOUT:
(void) printf("%sflags RENAME_WHITEOUT\n", tab_prefix);
break;
}
}
/* ARGSUSED */
static int
zil_prt_rec_write_cb(void *data, size_t len, void *unused)
{
(void) unused;
char *cdata = data;
for (size_t i = 0; i < len; i++) {
@@ -146,13 +156,12 @@ zil_prt_rec_write_cb(void *data, size_t len, void *unused)
return (0);
}
/* ARGSUSED */
static void
zil_prt_rec_write(zilog_t *zilog, int txtype, void *arg)
zil_prt_rec_write(zilog_t *zilog, int txtype, const void *arg)
{
lr_write_t *lr = arg;
const lr_write_t *lr = arg;
abd_t *data;
blkptr_t *bp = &lr->lr_blkptr;
const blkptr_t *bp = &lr->lr_blkptr;
zbookmark_phys_t zb;
int verbose = MAX(dump_opt['d'], dump_opt['i']);
int error;
@@ -161,28 +170,31 @@ zil_prt_rec_write(zilog_t *zilog, int txtype, void *arg)
(u_longlong_t)lr->lr_foid, (u_longlong_t)lr->lr_offset,
(u_longlong_t)lr->lr_length);
if (txtype == TX_WRITE2 || verbose < 5)
if (txtype == TX_WRITE2 || verbose < 4)
return;
if (lr->lr_common.lrc_reclen == sizeof (lr_write_t)) {
(void) printf("%shas blkptr, %s\n", tab_prefix,
!BP_IS_HOLE(bp) &&
bp->blk_birth >= spa_min_claim_txg(zilog->zl_spa) ?
!BP_IS_HOLE(bp) && BP_GET_LOGICAL_BIRTH(bp) >=
spa_min_claim_txg(zilog->zl_spa) ?
"will claim" : "won't claim");
print_log_bp(bp, tab_prefix);
if (verbose < 5)
return;
if (BP_IS_HOLE(bp)) {
(void) printf("\t\t\tLSIZE 0x%llx\n",
(u_longlong_t)BP_GET_LSIZE(bp));
(void) printf("%s<hole>\n", tab_prefix);
return;
}
if (bp->blk_birth < zilog->zl_header->zh_claim_txg) {
if (BP_GET_LOGICAL_BIRTH(bp) < zilog->zl_header->zh_claim_txg) {
(void) printf("%s<block already committed>\n",
tab_prefix);
return;
}
ASSERT3U(BP_GET_LSIZE(bp), !=, 0);
SET_BOOKMARK(&zb, dmu_objset_id(zilog->zl_os),
lr->lr_foid, ZB_ZIL_LEVEL,
lr->lr_offset / BP_GET_LSIZE(bp));
@@ -194,9 +206,12 @@ zil_prt_rec_write(zilog_t *zilog, int txtype, void *arg)
if (error)
goto out;
} else {
if (verbose < 5)
return;
/* data is stored after the end of the lr_write record */
data = abd_alloc(lr->lr_length, B_FALSE);
abd_copy_from_buf(data, lr + 1, lr->lr_length);
abd_copy_from_buf(data, &lr->lr_data[0], lr->lr_length);
}
(void) printf("%s", tab_prefix);
@@ -209,22 +224,44 @@ out:
abd_free(data);
}
/* ARGSUSED */
static void
zil_prt_rec_truncate(zilog_t *zilog, int txtype, void *arg)
zil_prt_rec_write_enc(zilog_t *zilog, int txtype, const void *arg)
{
lr_truncate_t *lr = arg;
(void) txtype;
const lr_write_t *lr = arg;
const blkptr_t *bp = &lr->lr_blkptr;
int verbose = MAX(dump_opt['d'], dump_opt['i']);
(void) printf("%s(encrypted)\n", tab_prefix);
if (verbose < 4)
return;
if (lr->lr_common.lrc_reclen == sizeof (lr_write_t)) {
(void) printf("%shas blkptr, %s\n", tab_prefix,
!BP_IS_HOLE(bp) && BP_GET_LOGICAL_BIRTH(bp) >=
spa_min_claim_txg(zilog->zl_spa) ?
"will claim" : "won't claim");
print_log_bp(bp, tab_prefix);
}
}
static void
zil_prt_rec_truncate(zilog_t *zilog, int txtype, const void *arg)
{
(void) zilog, (void) txtype;
const lr_truncate_t *lr = arg;
(void) printf("%sfoid %llu, offset 0x%llx, length 0x%llx\n", tab_prefix,
(u_longlong_t)lr->lr_foid, (longlong_t)lr->lr_offset,
(u_longlong_t)lr->lr_length);
}
/* ARGSUSED */
static void
zil_prt_rec_setattr(zilog_t *zilog, int txtype, void *arg)
zil_prt_rec_setattr(zilog_t *zilog, int txtype, const void *arg)
{
lr_setattr_t *lr = arg;
(void) zilog, (void) txtype;
const lr_setattr_t *lr = arg;
time_t atime = (time_t)lr->lr_atime[0];
time_t mtime = (time_t)lr->lr_mtime[0];
@@ -266,19 +303,83 @@ zil_prt_rec_setattr(zilog_t *zilog, int txtype, void *arg)
}
}
/* ARGSUSED */
static void
zil_prt_rec_acl(zilog_t *zilog, int txtype, void *arg)
zil_prt_rec_setsaxattr(zilog_t *zilog, int txtype, const void *arg)
{
lr_acl_t *lr = arg;
(void) zilog, (void) txtype;
const lr_setsaxattr_t *lr = arg;
const char *name = (const char *)&lr->lr_data[0];
(void) printf("%sfoid %llu\n", tab_prefix,
(u_longlong_t)lr->lr_foid);
(void) printf("%sXAT_NAME %s\n", tab_prefix, name);
if (lr->lr_size == 0) {
(void) printf("%sXAT_VALUE NULL\n", tab_prefix);
} else {
(void) printf("%sXAT_VALUE ", tab_prefix);
const char *val = (const char *)&lr->lr_data[strlen(name) + 1];
for (int i = 0; i < lr->lr_size; i++) {
(void) printf("%c", *val);
val++;
}
}
}
static void
zil_prt_rec_acl(zilog_t *zilog, int txtype, const void *arg)
{
(void) zilog, (void) txtype;
const lr_acl_t *lr = arg;
(void) printf("%sfoid %llu, aclcnt %llu\n", tab_prefix,
(u_longlong_t)lr->lr_foid, (u_longlong_t)lr->lr_aclcnt);
}
typedef void (*zil_prt_rec_func_t)(zilog_t *, int, void *);
static void
zil_prt_rec_clone_range(zilog_t *zilog, int txtype, const void *arg)
{
(void) zilog, (void) txtype;
const lr_clone_range_t *lr = arg;
int verbose = MAX(dump_opt['d'], dump_opt['i']);
(void) printf("%sfoid %llu, offset %llx, length %llx, blksize %llx\n",
tab_prefix, (u_longlong_t)lr->lr_foid, (u_longlong_t)lr->lr_offset,
(u_longlong_t)lr->lr_length, (u_longlong_t)lr->lr_blksz);
if (verbose < 4)
return;
for (unsigned int i = 0; i < lr->lr_nbps; i++) {
(void) printf("%s[%u/%llu] ", tab_prefix, i + 1,
(u_longlong_t)lr->lr_nbps);
print_log_bp(&lr->lr_bps[i], "");
}
}
static void
zil_prt_rec_clone_range_enc(zilog_t *zilog, int txtype, const void *arg)
{
(void) zilog, (void) txtype;
const lr_clone_range_t *lr = arg;
int verbose = MAX(dump_opt['d'], dump_opt['i']);
(void) printf("%s(encrypted)\n", tab_prefix);
if (verbose < 4)
return;
for (unsigned int i = 0; i < lr->lr_nbps; i++) {
(void) printf("%s[%u/%llu] ", tab_prefix, i + 1,
(u_longlong_t)lr->lr_nbps);
print_log_bp(&lr->lr_bps[i], "");
}
}
typedef void (*zil_prt_rec_func_t)(zilog_t *, int, const void *);
typedef struct zil_rec_info {
zil_prt_rec_func_t zri_print;
zil_prt_rec_func_t zri_print_enc;
const char *zri_name;
uint64_t zri_count;
} zil_rec_info_t;
@@ -293,7 +394,9 @@ static zil_rec_info_t zil_rec_info[TX_MAX_TYPE] = {
{.zri_print = zil_prt_rec_remove, .zri_name = "TX_RMDIR "},
{.zri_print = zil_prt_rec_link, .zri_name = "TX_LINK "},
{.zri_print = zil_prt_rec_rename, .zri_name = "TX_RENAME "},
{.zri_print = zil_prt_rec_write, .zri_name = "TX_WRITE "},
{.zri_print = zil_prt_rec_write,
.zri_print_enc = zil_prt_rec_write_enc,
.zri_name = "TX_WRITE "},
{.zri_print = zil_prt_rec_truncate, .zri_name = "TX_TRUNCATE "},
{.zri_print = zil_prt_rec_setattr, .zri_name = "TX_SETATTR "},
{.zri_print = zil_prt_rec_acl, .zri_name = "TX_ACL_V0 "},
@@ -305,12 +408,19 @@ static zil_rec_info_t zil_rec_info[TX_MAX_TYPE] = {
{.zri_print = zil_prt_rec_create, .zri_name = "TX_MKDIR_ATTR "},
{.zri_print = zil_prt_rec_create, .zri_name = "TX_MKDIR_ACL_ATTR "},
{.zri_print = zil_prt_rec_write, .zri_name = "TX_WRITE2 "},
{.zri_print = zil_prt_rec_setsaxattr,
.zri_name = "TX_SETSAXATTR "},
{.zri_print = zil_prt_rec_rename, .zri_name = "TX_RENAME_EXCHANGE "},
{.zri_print = zil_prt_rec_rename, .zri_name = "TX_RENAME_WHITEOUT "},
{.zri_print = zil_prt_rec_clone_range,
.zri_print_enc = zil_prt_rec_clone_range_enc,
.zri_name = "TX_CLONE_RANGE "},
};
/* ARGSUSED */
static int
print_log_record(zilog_t *zilog, lr_t *lr, void *arg, uint64_t claim_txg)
print_log_record(zilog_t *zilog, const lr_t *lr, void *arg, uint64_t claim_txg)
{
(void) arg, (void) claim_txg;
int txtype;
int verbose = MAX(dump_opt['d'], dump_opt['i']);
@@ -330,6 +440,8 @@ print_log_record(zilog_t *zilog, lr_t *lr, void *arg, uint64_t claim_txg)
if (txtype && verbose >= 3) {
if (!zilog->zl_os->os_encrypted) {
zil_rec_info[txtype].zri_print(zilog, txtype, lr);
} else if (zil_rec_info[txtype].zri_print_enc) {
zil_rec_info[txtype].zri_print_enc(zilog, txtype, lr);
} else {
(void) printf("%s(encrypted)\n", tab_prefix);
}
@@ -341,10 +453,11 @@ print_log_record(zilog_t *zilog, lr_t *lr, void *arg, uint64_t claim_txg)
return (0);
}
/* ARGSUSED */
static int
print_log_block(zilog_t *zilog, blkptr_t *bp, void *arg, uint64_t claim_txg)
print_log_block(zilog_t *zilog, const blkptr_t *bp, void *arg,
uint64_t claim_txg)
{
(void) arg;
char blkbuf[BP_SPRINTF_LEN + 10];
int verbose = MAX(dump_opt['d'], dump_opt['i']);
const char *claim;
@@ -362,7 +475,7 @@ print_log_block(zilog_t *zilog, blkptr_t *bp, void *arg, uint64_t claim_txg)
if (claim_txg != 0)
claim = "already claimed";
else if (bp->blk_birth >= spa_min_claim_txg(zilog->zl_spa))
else if (BP_GET_LOGICAL_BIRTH(bp) >= spa_min_claim_txg(zilog->zl_spa))
claim = "will claim";
else
claim = "won't claim";
@@ -395,7 +508,6 @@ print_log_stats(int verbose)
(void) printf("\n");
}
/* ARGSUSED */
void
dump_intent_log(zilog_t *zilog)
{
+38 -41
View File
@@ -1,49 +1,46 @@
include $(top_srcdir)/config/Rules.am
include $(srcdir)/%D%/zed.d/Makefile.am
AM_CFLAGS += $(LIBUDEV_CFLAGS) $(LIBUUID_CFLAGS)
zed_CFLAGS = $(AM_CFLAGS)
zed_CFLAGS += $(LIBUDEV_CFLAGS) $(LIBUUID_CFLAGS)
SUBDIRS = zed.d
sbin_PROGRAMS += zed
CPPCHECKTARGETS += zed
sbin_PROGRAMS = zed
ZED_SRC = \
zed.c \
zed.h \
zed_conf.c \
zed_conf.h \
zed_disk_event.c \
zed_disk_event.h \
zed_event.c \
zed_event.h \
zed_exec.c \
zed_exec.h \
zed_file.c \
zed_file.h \
zed_log.c \
zed_log.h \
zed_strings.c \
zed_strings.h
FMA_SRC = \
agents/zfs_agents.c \
agents/zfs_agents.h \
agents/zfs_diagnosis.c \
agents/zfs_mod.c \
agents/zfs_retire.c \
agents/fmd_api.c \
agents/fmd_api.h \
agents/fmd_serd.c \
agents/fmd_serd.h
zed_SOURCES = $(ZED_SRC) $(FMA_SRC)
zed_SOURCES = \
%D%/zed.c \
%D%/zed.h \
%D%/zed_conf.c \
%D%/zed_conf.h \
%D%/zed_disk_event.c \
%D%/zed_disk_event.h \
%D%/zed_event.c \
%D%/zed_event.h \
%D%/zed_exec.c \
%D%/zed_exec.h \
%D%/zed_file.c \
%D%/zed_file.h \
%D%/zed_log.c \
%D%/zed_log.h \
%D%/zed_strings.c \
%D%/zed_strings.h \
\
%D%/agents/fmd_api.c \
%D%/agents/fmd_api.h \
%D%/agents/fmd_serd.c \
%D%/agents/fmd_serd.h \
%D%/agents/zfs_agents.c \
%D%/agents/zfs_agents.h \
%D%/agents/zfs_diagnosis.c \
%D%/agents/zfs_mod.c \
%D%/agents/zfs_retire.c
zed_LDADD = \
$(abs_top_builddir)/lib/libzfs/libzfs.la \
$(abs_top_builddir)/lib/libzfs_core/libzfs_core.la \
$(abs_top_builddir)/lib/libnvpair/libnvpair.la \
$(abs_top_builddir)/lib/libuutil/libuutil.la
libzfs.la \
libzfs_core.la \
libnvpair.la \
libuutil.la
zed_LDADD += -lrt $(LIBUDEV_LIBS) $(LIBUUID_LIBS)
zed_LDADD += -lrt $(LIBATOMIC_LIBS) $(LIBUDEV_LIBS) $(LIBUUID_LIBS)
zed_LDFLAGS = -pthread
EXTRA_DIST = agents/README.md
dist_noinst_DATA += %D%/agents/README.md
+61 -45
View File
@@ -6,7 +6,7 @@
* You may not use this file except in compliance with the License.
*
* You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
* or http://www.opensolaris.org/os/licensing.
* or https://opensource.org/licenses/CDDL-1.0.
* See the License for the specific language governing permissions
* and limitations under the License.
*
@@ -22,6 +22,7 @@
* Copyright (c) 2004, 2010, Oracle and/or its affiliates. All rights reserved.
*
* Copyright (c) 2016, Intel Corporation.
* Copyright (c) 2023, Klara Inc.
*/
/*
@@ -38,7 +39,7 @@
#include <sys/fm/protocol.h>
#include <uuid/uuid.h>
#include <signal.h>
#include <strings.h>
#include <string.h>
#include <time.h>
#include "fmd_api.h"
@@ -97,6 +98,7 @@ _umem_logging_init(void)
int
fmd_hdl_register(fmd_hdl_t *hdl, int version, const fmd_hdl_info_t *mip)
{
(void) version;
fmd_module_t *mp = (fmd_module_t *)hdl;
mp->mod_info = mip;
@@ -179,18 +181,21 @@ fmd_hdl_getspecific(fmd_hdl_t *hdl)
void *
fmd_hdl_alloc(fmd_hdl_t *hdl, size_t size, int flags)
{
(void) hdl;
return (umem_alloc(size, flags));
}
void *
fmd_hdl_zalloc(fmd_hdl_t *hdl, size_t size, int flags)
{
(void) hdl;
return (umem_zalloc(size, flags));
}
void
fmd_hdl_free(fmd_hdl_t *hdl, void *data, size_t size)
{
(void) hdl;
umem_free(data, size);
}
@@ -217,6 +222,8 @@ fmd_hdl_debug(fmd_hdl_t *hdl, const char *format, ...)
int32_t
fmd_prop_get_int32(fmd_hdl_t *hdl, const char *name)
{
(void) hdl;
/*
* These can be looked up in mp->modinfo->fmdi_props
* For now we just hard code for phase 2. In the
@@ -225,26 +232,6 @@ fmd_prop_get_int32(fmd_hdl_t *hdl, const char *name)
if (strcmp(name, "spare_on_remove") == 0)
return (1);
if (strcmp(name, "io_N") == 0 || strcmp(name, "checksum_N") == 0)
return (10); /* N = 10 events */
return (0);
}
int64_t
fmd_prop_get_int64(fmd_hdl_t *hdl, const char *name)
{
/*
* These can be looked up in mp->modinfo->fmdi_props
* For now we just hard code for phase 2. In the
* future, there can be a ZED based override.
*/
if (strcmp(name, "remove_timeout") == 0)
return (15ULL * 1000ULL * 1000ULL * 1000ULL); /* 15 sec */
if (strcmp(name, "io_T") == 0 || strcmp(name, "checksum_T") == 0)
return (1000ULL * 1000ULL * 1000ULL * 600ULL); /* 10 min */
return (0);
}
@@ -334,22 +321,24 @@ fmd_case_uuresolved(fmd_hdl_t *hdl, const char *uuid)
fmd_hdl_debug(hdl, "case resolved by uuid (%s)", uuid);
}
int
boolean_t
fmd_case_solved(fmd_hdl_t *hdl, fmd_case_t *cp)
{
return ((cp->ci_state >= FMD_CASE_SOLVED) ? FMD_B_TRUE : FMD_B_FALSE);
(void) hdl;
return (cp->ci_state >= FMD_CASE_SOLVED);
}
void
fmd_case_add_ereport(fmd_hdl_t *hdl, fmd_case_t *cp, fmd_event_t *ep)
{
(void) hdl, (void) cp, (void) ep;
}
static void
zed_log_fault(nvlist_t *nvl, const char *uuid, const char *code)
{
nvlist_t *rsrc;
char *strval;
const char *strval;
uint64_t guid;
uint8_t byte;
@@ -362,7 +351,7 @@ zed_log_fault(nvlist_t *nvl, const char *uuid, const char *code)
if (code != NULL)
zed_log_msg(LOG_INFO, "\t%s: %s", FM_SUSPECT_DIAG_CODE, code);
if (nvlist_lookup_uint8(nvl, FM_FAULT_CERTAINTY, &byte) == 0)
zed_log_msg(LOG_INFO, "\t%s: %llu", FM_FAULT_CERTAINTY, byte);
zed_log_msg(LOG_INFO, "\t%s: %hhu", FM_FAULT_CERTAINTY, byte);
if (nvlist_lookup_nvlist(nvl, FM_FAULT_RESOURCE, &rsrc) == 0) {
if (nvlist_lookup_string(rsrc, FM_FMRI_SCHEME, &strval) == 0)
zed_log_msg(LOG_INFO, "\t%s: %s", FM_FMRI_SCHEME,
@@ -379,7 +368,8 @@ zed_log_fault(nvlist_t *nvl, const char *uuid, const char *code)
static const char *
fmd_fault_mkcode(nvlist_t *fault)
{
char *class, *code = "-";
const char *class;
const char *code = "-";
/*
* Note: message codes come from: openzfs/usr/src/cmd/fm/dicts/ZFS.po
@@ -428,7 +418,8 @@ fmd_case_add_suspect(fmd_hdl_t *hdl, fmd_case_t *cp, nvlist_t *fault)
err |= nvlist_add_string(nvl, FM_SUSPECT_DIAG_CODE, code);
err |= nvlist_add_int64_array(nvl, FM_SUSPECT_DIAG_TIME, tod, 2);
err |= nvlist_add_uint32(nvl, FM_SUSPECT_FAULT_SZ, 1);
err |= nvlist_add_nvlist_array(nvl, FM_SUSPECT_FAULT_LIST, &fault, 1);
err |= nvlist_add_nvlist_array(nvl, FM_SUSPECT_FAULT_LIST,
(const nvlist_t **)&fault, 1);
if (err)
zed_log_die("failed to populate nvlist");
@@ -443,19 +434,21 @@ fmd_case_add_suspect(fmd_hdl_t *hdl, fmd_case_t *cp, nvlist_t *fault)
void
fmd_case_setspecific(fmd_hdl_t *hdl, fmd_case_t *cp, void *data)
{
(void) hdl;
cp->ci_data = data;
}
void *
fmd_case_getspecific(fmd_hdl_t *hdl, fmd_case_t *cp)
{
(void) hdl;
return (cp->ci_data);
}
void
fmd_buf_create(fmd_hdl_t *hdl, fmd_case_t *cp, const char *name, size_t size)
{
assert(strcmp(name, "data") == 0);
assert(strcmp(name, "data") == 0), (void) name;
assert(cp->ci_bufptr == NULL);
assert(size < (1024 * 1024));
@@ -467,22 +460,24 @@ void
fmd_buf_read(fmd_hdl_t *hdl, fmd_case_t *cp,
const char *name, void *buf, size_t size)
{
assert(strcmp(name, "data") == 0);
(void) hdl;
assert(strcmp(name, "data") == 0), (void) name;
assert(cp->ci_bufptr != NULL);
assert(size <= cp->ci_bufsiz);
bcopy(cp->ci_bufptr, buf, size);
memcpy(buf, cp->ci_bufptr, size);
}
void
fmd_buf_write(fmd_hdl_t *hdl, fmd_case_t *cp,
const char *name, const void *buf, size_t size)
{
assert(strcmp(name, "data") == 0);
(void) hdl;
assert(strcmp(name, "data") == 0), (void) name;
assert(cp->ci_bufptr != NULL);
assert(cp->ci_bufsiz >= size);
bcopy(buf, cp->ci_bufptr, size);
memcpy(cp->ci_bufptr, buf, size);
}
/* SERD Engines */
@@ -519,6 +514,19 @@ fmd_serd_exists(fmd_hdl_t *hdl, const char *name)
return (fmd_serd_eng_lookup(&mp->mod_serds, name) != NULL);
}
int
fmd_serd_active(fmd_hdl_t *hdl, const char *name)
{
fmd_module_t *mp = (fmd_module_t *)hdl;
fmd_serd_eng_t *sgp;
if ((sgp = fmd_serd_eng_lookup(&mp->mod_serds, name)) == NULL) {
zed_log_msg(LOG_ERR, "serd engine '%s' does not exist", name);
return (0);
}
return (fmd_serd_eng_fired(sgp) || !fmd_serd_eng_empty(sgp));
}
void
fmd_serd_reset(fmd_hdl_t *hdl, const char *name)
{
@@ -527,12 +535,10 @@ fmd_serd_reset(fmd_hdl_t *hdl, const char *name)
if ((sgp = fmd_serd_eng_lookup(&mp->mod_serds, name)) == NULL) {
zed_log_msg(LOG_ERR, "serd engine '%s' does not exist", name);
return;
} else {
fmd_serd_eng_reset(sgp);
fmd_hdl_debug(hdl, "serd_reset %s", name);
}
fmd_serd_eng_reset(sgp);
fmd_hdl_debug(hdl, "serd_reset %s", name);
}
int
@@ -540,16 +546,21 @@ fmd_serd_record(fmd_hdl_t *hdl, const char *name, fmd_event_t *ep)
{
fmd_module_t *mp = (fmd_module_t *)hdl;
fmd_serd_eng_t *sgp;
int err;
if ((sgp = fmd_serd_eng_lookup(&mp->mod_serds, name)) == NULL) {
zed_log_msg(LOG_ERR, "failed to add record to SERD engine '%s'",
name);
return (FMD_B_FALSE);
return (0);
}
err = fmd_serd_eng_record(sgp, ep->ev_hrt);
return (fmd_serd_eng_record(sgp, ep->ev_hrt));
}
return (err);
void
fmd_serd_gc(fmd_hdl_t *hdl)
{
fmd_module_t *mp = (fmd_module_t *)hdl;
fmd_serd_hash_apply(&mp->mod_serds, fmd_serd_eng_gc, NULL);
}
/* FMD Timers */
@@ -563,10 +574,10 @@ _timer_notify(union sigval sv)
const fmd_hdl_ops_t *ops = mp->mod_info->fmdi_ops;
struct itimerspec its;
fmd_hdl_debug(hdl, "timer fired (%p)", ftp->ft_tid);
fmd_hdl_debug(hdl, "%s timer fired (%p)", mp->mod_name, ftp->ft_tid);
/* disarm the timer */
bzero(&its, sizeof (struct itimerspec));
memset(&its, 0, sizeof (struct itimerspec));
timer_settime(ftp->ft_tid, 0, &its, NULL);
/* Note that the fmdo_timeout can remove this timer */
@@ -582,6 +593,7 @@ _timer_notify(union sigval sv)
fmd_timer_t *
fmd_timer_install(fmd_hdl_t *hdl, void *arg, fmd_event_t *ep, hrtime_t delta)
{
(void) ep;
struct sigevent sev;
struct itimerspec its;
fmd_timer_t *ftp;
@@ -599,6 +611,7 @@ fmd_timer_install(fmd_hdl_t *hdl, void *arg, fmd_event_t *ep, hrtime_t delta)
sev.sigev_notify_function = _timer_notify;
sev.sigev_notify_attributes = NULL;
sev.sigev_value.sival_ptr = ftp;
sev.sigev_signo = 0;
timer_create(CLOCK_REALTIME, &sev, &ftp->ft_tid);
timer_settime(ftp->ft_tid, 0, &its, NULL);
@@ -625,6 +638,7 @@ nvlist_t *
fmd_nvl_create_fault(fmd_hdl_t *hdl, const char *class, uint8_t certainty,
nvlist_t *asru, nvlist_t *fru, nvlist_t *resource)
{
(void) hdl;
nvlist_t *nvl;
int err = 0;
@@ -688,7 +702,8 @@ fmd_strmatch(const char *s, const char *p)
int
fmd_nvl_class_match(fmd_hdl_t *hdl, nvlist_t *nvl, const char *pattern)
{
char *class;
(void) hdl;
const char *class;
return (nvl != NULL &&
nvlist_lookup_string(nvl, FM_CLASS, &class) == 0 &&
@@ -698,6 +713,7 @@ fmd_nvl_class_match(fmd_hdl_t *hdl, nvlist_t *nvl, const char *pattern)
nvlist_t *
fmd_nvl_alloc(fmd_hdl_t *hdl, int flags)
{
(void) hdl, (void) flags;
nvlist_t *nvl = NULL;
if (nvlist_alloc(&nvl, NV_UNIQUE_NAME, 0) != 0)
+4 -8
View File
@@ -6,7 +6,7 @@
* You may not use this file except in compliance with the License.
*
* You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
* or http://www.opensolaris.org/os/licensing.
* or https://opensource.org/licenses/CDDL-1.0.
* See the License for the specific language governing permissions
* and limitations under the License.
*
@@ -72,10 +72,6 @@ typedef struct fmd_case {
} fmd_case_t;
#define FMD_B_FALSE 0 /* false value for booleans as int */
#define FMD_B_TRUE 1 /* true value for booleans as int */
#define FMD_CASE_UNSOLVED 0 /* case is not yet solved (waiting) */
#define FMD_CASE_SOLVED 1 /* case is solved (suspects added) */
#define FMD_CASE_CLOSE_WAIT 2 /* case is executing fmdo_close() */
@@ -155,7 +151,6 @@ extern void fmd_hdl_vdebug(fmd_hdl_t *, const char *, va_list);
extern void fmd_hdl_debug(fmd_hdl_t *, const char *, ...);
extern int32_t fmd_prop_get_int32(fmd_hdl_t *, const char *);
extern int64_t fmd_prop_get_int64(fmd_hdl_t *, const char *);
#define FMD_STAT_NOALLOC 0x0 /* fmd should use caller's memory */
#define FMD_STAT_ALLOC 0x1 /* fmd should allocate stats memory */
@@ -176,8 +171,7 @@ extern int fmd_case_uuclosed(fmd_hdl_t *, const char *);
extern int fmd_case_uuisresolved(fmd_hdl_t *, const char *);
extern void fmd_case_uuresolved(fmd_hdl_t *, const char *);
extern int fmd_case_solved(fmd_hdl_t *, fmd_case_t *);
extern int fmd_case_closed(fmd_hdl_t *, fmd_case_t *);
extern boolean_t fmd_case_solved(fmd_hdl_t *, fmd_case_t *);
extern void fmd_case_add_ereport(fmd_hdl_t *, fmd_case_t *, fmd_event_t *);
extern void fmd_case_add_serd(fmd_hdl_t *, fmd_case_t *, const char *);
@@ -200,10 +194,12 @@ extern size_t fmd_buf_size(fmd_hdl_t *, fmd_case_t *, const char *);
extern void fmd_serd_create(fmd_hdl_t *, const char *, uint_t, hrtime_t);
extern void fmd_serd_destroy(fmd_hdl_t *, const char *);
extern int fmd_serd_exists(fmd_hdl_t *, const char *);
extern int fmd_serd_active(fmd_hdl_t *, const char *);
extern void fmd_serd_reset(fmd_hdl_t *, const char *);
extern int fmd_serd_record(fmd_hdl_t *, const char *, fmd_event_t *);
extern int fmd_serd_fired(fmd_hdl_t *, const char *);
extern int fmd_serd_empty(fmd_hdl_t *, const char *);
extern void fmd_serd_gc(fmd_hdl_t *);
extern id_t fmd_timer_install(fmd_hdl_t *, void *, fmd_event_t *, hrtime_t);
extern void fmd_timer_remove(fmd_hdl_t *, id_t);
+28 -8
View File
@@ -7,7 +7,7 @@
* with the License.
*
* You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
* or http://www.opensolaris.org/os/licensing.
* or https://opensource.org/licenses/CDDL-1.0.
* See the License for the specific language governing permissions
* and limitations under the License.
*
@@ -29,7 +29,7 @@
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <strings.h>
#include <string.h>
#include <sys/list.h>
#include <sys/time.h>
@@ -74,9 +74,18 @@ fmd_serd_eng_alloc(const char *name, uint64_t n, hrtime_t t)
fmd_serd_eng_t *sgp;
sgp = malloc(sizeof (fmd_serd_eng_t));
bzero(sgp, sizeof (fmd_serd_eng_t));
if (sgp == NULL) {
perror("malloc");
exit(EXIT_FAILURE);
}
memset(sgp, 0, sizeof (fmd_serd_eng_t));
sgp->sg_name = strdup(name);
if (sgp->sg_name == NULL) {
perror("strdup");
exit(EXIT_FAILURE);
}
sgp->sg_flags = FMD_SERD_DIRTY;
sgp->sg_n = n;
sgp->sg_t = t;
@@ -123,6 +132,12 @@ fmd_serd_hash_create(fmd_serd_hash_t *shp)
shp->sh_hashlen = FMD_STR_BUCKETS;
shp->sh_hash = calloc(shp->sh_hashlen, sizeof (void *));
shp->sh_count = 0;
if (shp->sh_hash == NULL) {
perror("calloc");
exit(EXIT_FAILURE);
}
}
void
@@ -139,7 +154,7 @@ fmd_serd_hash_destroy(fmd_serd_hash_t *shp)
}
free(shp->sh_hash);
bzero(shp, sizeof (fmd_serd_hash_t));
memset(shp, 0, sizeof (fmd_serd_hash_t));
}
void
@@ -234,13 +249,17 @@ fmd_serd_eng_record(fmd_serd_eng_t *sgp, hrtime_t hrt)
if (sgp->sg_flags & FMD_SERD_FIRED) {
serd_log_msg(" SERD Engine: record %s already fired!",
sgp->sg_name);
return (FMD_B_FALSE);
return (B_FALSE);
}
while (sgp->sg_count >= sgp->sg_n)
fmd_serd_eng_discard(sgp, list_tail(&sgp->sg_list));
sep = malloc(sizeof (fmd_serd_elem_t));
if (sep == NULL) {
perror("malloc");
exit(EXIT_FAILURE);
}
sep->se_hrt = hrt;
list_insert_head(&sgp->sg_list, sep);
@@ -259,11 +278,11 @@ fmd_serd_eng_record(fmd_serd_eng_t *sgp, hrtime_t hrt)
fmd_event_delta(oep->se_hrt, sep->se_hrt) <= sgp->sg_t) {
sgp->sg_flags |= FMD_SERD_FIRED | FMD_SERD_DIRTY;
serd_log_msg(" SERD Engine: fired %s", sgp->sg_name);
return (FMD_B_TRUE);
return (B_TRUE);
}
sgp->sg_flags |= FMD_SERD_DIRTY;
return (FMD_B_FALSE);
return (B_FALSE);
}
int
@@ -291,8 +310,9 @@ fmd_serd_eng_reset(fmd_serd_eng_t *sgp)
}
void
fmd_serd_eng_gc(fmd_serd_eng_t *sgp)
fmd_serd_eng_gc(fmd_serd_eng_t *sgp, void *arg)
{
(void) arg;
fmd_serd_elem_t *sep, *nep;
hrtime_t hrt;
+2 -2
View File
@@ -7,7 +7,7 @@
* with the License.
*
* You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
* or http://www.opensolaris.org/os/licensing.
* or https://opensource.org/licenses/CDDL-1.0.
* See the License for the specific language governing permissions
* and limitations under the License.
*
@@ -77,7 +77,7 @@ extern int fmd_serd_eng_fired(fmd_serd_eng_t *);
extern int fmd_serd_eng_empty(fmd_serd_eng_t *);
extern void fmd_serd_eng_reset(fmd_serd_eng_t *);
extern void fmd_serd_eng_gc(fmd_serd_eng_t *);
extern void fmd_serd_eng_gc(fmd_serd_eng_t *, void *);
#ifdef __cplusplus
}
+54 -21
View File
@@ -13,6 +13,7 @@
/*
* Copyright (c) 2016, Intel Corporation.
* Copyright (c) 2018, loli10K <ezomori.nozomu@gmail.com>
* Copyright (c) 2021 Hewlett Packard Enterprise Development LP
*/
#include <libnvpair.h>
@@ -63,7 +64,7 @@ typedef enum device_type {
typedef struct guid_search {
uint64_t gs_pool_guid;
uint64_t gs_vdev_guid;
char *gs_devid;
const char *gs_devid;
device_type_t gs_vdev_type;
uint64_t gs_vdev_expandtime; /* vdev expansion time */
} guid_search_t;
@@ -76,9 +77,10 @@ static boolean_t
zfs_agent_iter_vdev(zpool_handle_t *zhp, nvlist_t *nvl, void *arg)
{
guid_search_t *gsp = arg;
char *path = NULL;
const char *path = NULL;
uint_t c, children;
nvlist_t **child;
uint64_t vdev_guid;
/*
* First iterate over any children.
@@ -99,7 +101,7 @@ zfs_agent_iter_vdev(zpool_handle_t *zhp, nvlist_t *nvl, void *arg)
&child, &children) == 0) {
for (c = 0; c < children; c++) {
if (zfs_agent_iter_vdev(zhp, child[c], gsp)) {
gsp->gs_vdev_type = DEVICE_TYPE_L2ARC;
gsp->gs_vdev_type = DEVICE_TYPE_SPARE;
return (B_TRUE);
}
}
@@ -108,7 +110,7 @@ zfs_agent_iter_vdev(zpool_handle_t *zhp, nvlist_t *nvl, void *arg)
&child, &children) == 0) {
for (c = 0; c < children; c++) {
if (zfs_agent_iter_vdev(zhp, child[c], gsp)) {
gsp->gs_vdev_type = DEVICE_TYPE_SPARE;
gsp->gs_vdev_type = DEVICE_TYPE_L2ARC;
return (B_TRUE);
}
}
@@ -125,6 +127,21 @@ zfs_agent_iter_vdev(zpool_handle_t *zhp, nvlist_t *nvl, void *arg)
&gsp->gs_vdev_expandtime);
return (B_TRUE);
}
/*
* Otherwise, on a vdev guid match, grab the devid and expansion
* time. The devid might be missing on removal since its not part
* of blkid cache and L2ARC VDEV does not contain pool guid in its
* blkid, so this is a special case for L2ARC VDEV.
*/
else if (gsp->gs_vdev_guid != 0 && gsp->gs_devid == NULL &&
nvlist_lookup_uint64(nvl, ZPOOL_CONFIG_GUID, &vdev_guid) == 0 &&
gsp->gs_vdev_guid == vdev_guid) {
(void) nvlist_lookup_string(nvl, ZPOOL_CONFIG_DEVID,
&gsp->gs_devid);
(void) nvlist_lookup_uint64(nvl, ZPOOL_CONFIG_EXPANSION_TIME,
&gsp->gs_vdev_expandtime);
return (B_TRUE);
}
return (B_FALSE);
}
@@ -147,13 +164,13 @@ zfs_agent_iter_pool(zpool_handle_t *zhp, void *arg)
/*
* if a match was found then grab the pool guid
*/
if (gsp->gs_vdev_guid) {
if (gsp->gs_vdev_guid && gsp->gs_devid) {
(void) nvlist_lookup_uint64(config, ZPOOL_CONFIG_POOL_GUID,
&gsp->gs_pool_guid);
}
zpool_close(zhp);
return (gsp->gs_vdev_guid != 0);
return (gsp->gs_devid != NULL && gsp->gs_vdev_guid != 0);
}
void
@@ -177,10 +194,12 @@ zfs_agent_post_event(const char *class, const char *subclass, nvlist_t *nvl)
}
/*
* On ZFS on Linux, we don't get the expected FM_RESOURCE_REMOVED
* ereport from vdev_disk layer after a hot unplug. Fortunately we
* get a EC_DEV_REMOVE from our disk monitor and it is a suitable
* On Linux, we don't get the expected FM_RESOURCE_REMOVED ereport
* from the vdev_disk layer after a hot unplug. Fortunately we do
* get an EC_DEV_REMOVE from our disk monitor and it is a suitable
* proxy so we remap it here for the benefit of the diagnosis engine.
* Starting in OpenZFS 2.0, we do get FM_RESOURCE_REMOVED from the spa
* layer. Processing multiple FM_RESOURCE_REMOVED events is not harmful.
*/
if ((strcmp(class, EC_DEV_REMOVE) == 0) &&
(strcmp(subclass, ESC_DISK) == 0) &&
@@ -192,11 +211,13 @@ zfs_agent_post_event(const char *class, const char *subclass, nvlist_t *nvl)
uint64_t pool_guid = 0, vdev_guid = 0;
guid_search_t search = { 0 };
device_type_t devtype = DEVICE_TYPE_PRIMARY;
const char *devid = NULL;
class = "resource.fs.zfs.removed";
subclass = "";
(void) nvlist_add_string(payload, FM_CLASS, class);
(void) nvlist_lookup_string(nvl, DEV_IDENTIFIER, &devid);
(void) nvlist_lookup_uint64(nvl, ZFS_EV_POOL_GUID, &pool_guid);
(void) nvlist_lookup_uint64(nvl, ZFS_EV_VDEV_GUID, &vdev_guid);
@@ -206,15 +227,25 @@ zfs_agent_post_event(const char *class, const char *subclass, nvlist_t *nvl)
(void) nvlist_add_int64_array(payload, FM_EREPORT_TIME, tod, 2);
/*
* If devid is missing but vdev_guid is available, find devid
* and pool_guid from vdev_guid.
* For multipath, spare and l2arc devices ZFS_EV_VDEV_GUID or
* ZFS_EV_POOL_GUID may be missing so find them.
*/
(void) nvlist_lookup_string(nvl, DEV_IDENTIFIER,
&search.gs_devid);
(void) zpool_iter(g_zfs_hdl, zfs_agent_iter_pool, &search);
pool_guid = search.gs_pool_guid;
vdev_guid = search.gs_vdev_guid;
devtype = search.gs_vdev_type;
if (devid == NULL || pool_guid == 0 || vdev_guid == 0) {
if (devid == NULL)
search.gs_vdev_guid = vdev_guid;
else
search.gs_devid = devid;
zpool_iter(g_zfs_hdl, zfs_agent_iter_pool, &search);
if (devid == NULL)
devid = search.gs_devid;
if (pool_guid == 0)
pool_guid = search.gs_pool_guid;
if (vdev_guid == 0)
vdev_guid = search.gs_vdev_guid;
devtype = search.gs_vdev_type;
}
/*
* We want to avoid reporting "remove" events coming from
@@ -226,7 +257,9 @@ zfs_agent_post_event(const char *class, const char *subclass, nvlist_t *nvl)
search.gs_vdev_expandtime + 10 > tv.tv_sec) {
zed_log_msg(LOG_INFO, "agent post event: ignoring '%s' "
"for recently expanded device '%s'", EC_DEV_REMOVE,
search.gs_devid);
devid);
fnvlist_free(payload);
free(event);
goto out;
}
@@ -318,6 +351,8 @@ zfs_agent_dispatch(const char *class, const char *subclass, nvlist_t *nvl)
static void *
zfs_agent_consumer_thread(void *arg)
{
(void) arg;
for (;;) {
agent_event_t *event;
@@ -334,9 +369,7 @@ zfs_agent_consumer_thread(void *arg)
return (NULL);
}
if ((event = (list_head(&agent_events))) != NULL) {
list_remove(&agent_events, event);
if ((event = list_remove_head(&agent_events)) != NULL) {
(void) pthread_mutex_unlock(&agent_lock);
/* dispatch to all event subscribers */
@@ -383,6 +416,7 @@ zfs_agent_init(libzfs_handle_t *zfs_hdl)
list_destroy(&agent_events);
zed_log_die("Failed to initialize agents");
}
pthread_setname_np(g_agents_tid, "agents");
}
void
@@ -398,8 +432,7 @@ zfs_agent_fini(void)
(void) pthread_join(g_agents_tid, NULL);
/* drain any pending events */
while ((event = (list_head(&agent_events))) != NULL) {
list_remove(&agent_events, event);
while ((event = list_remove_head(&agent_events)) != NULL) {
nvlist_free(event->ae_nvl);
free(event);
}
+248 -58
View File
@@ -6,7 +6,7 @@
* You may not use this file except in compliance with the License.
*
* You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
* or http://www.opensolaris.org/os/licensing.
* or https://opensource.org/licenses/CDDL-1.0.
* See the License for the specific language governing permissions
* and limitations under the License.
*
@@ -23,11 +23,11 @@
* Copyright (c) 2010, Oracle and/or its affiliates. All rights reserved.
* Copyright 2015 Nexenta Systems, Inc. All rights reserved.
* Copyright (c) 2016, Intel Corporation.
* Copyright (c) 2023, Klara Inc.
*/
#include <stddef.h>
#include <string.h>
#include <strings.h>
#include <libuutil.h>
#include <libzfs.h>
#include <sys/types.h>
@@ -35,14 +35,29 @@
#include <sys/fs/zfs.h>
#include <sys/fm/protocol.h>
#include <sys/fm/fs/zfs.h>
#include <sys/zio.h>
#include "zfs_agents.h"
#include "fmd_api.h"
/*
* Our serd engines are named 'zfs_<pool_guid>_<vdev_guid>_{checksum,io}'. This
* #define reserves enough space for two 64-bit hex values plus the length of
* the longest string.
* Default values for the serd engine when processing checksum or io errors. The
* semantics are N <events> in T <seconds>.
*/
#define DEFAULT_CHECKSUM_N 10 /* events */
#define DEFAULT_CHECKSUM_T 600 /* seconds */
#define DEFAULT_IO_N 10 /* events */
#define DEFAULT_IO_T 600 /* seconds */
#define DEFAULT_SLOW_IO_N 10 /* events */
#define DEFAULT_SLOW_IO_T 30 /* seconds */
#define CASE_GC_TIMEOUT_SECS 43200 /* 12 hours */
/*
* Our serd engines are named in the following format:
* 'zfs_<pool_guid>_<vdev_guid>_{checksum,io,slow_io}'
* This #define reserves enough space for two 64-bit hex values plus the
* length of the longest string.
*/
#define MAX_SERDLEN (16 * 2 + sizeof ("zfs___checksum"))
@@ -56,9 +71,11 @@ typedef struct zfs_case_data {
uint64_t zc_ena;
uint64_t zc_pool_guid;
uint64_t zc_vdev_guid;
uint64_t zc_parent_guid;
int zc_pool_state;
char zc_serd_checksum[MAX_SERDLEN];
char zc_serd_io[MAX_SERDLEN];
char zc_serd_slow_io[MAX_SERDLEN];
int zc_has_remove_timer;
} zfs_case_data_t;
@@ -105,7 +122,8 @@ zfs_de_stats_t zfs_stats = {
{ "resource_drops", FMD_TYPE_UINT64, "resource related ereports" }
};
static hrtime_t zfs_remove_timeout;
/* wait 15 seconds after a removal */
static hrtime_t zfs_remove_timeout = SEC2NSEC(15);
uu_list_pool_t *zfs_case_pool;
uu_list_t *zfs_cases;
@@ -115,11 +133,13 @@ uu_list_t *zfs_cases;
#define ZFS_MAKE_EREPORT(type) \
FM_EREPORT_CLASS "." ZFS_ERROR_CLASS "." type
static void zfs_purge_cases(fmd_hdl_t *hdl);
/*
* Write out the persistent representation of an active case.
*/
static void
zfs_case_serialize(fmd_hdl_t *hdl, zfs_case_t *zcp)
zfs_case_serialize(zfs_case_t *zcp)
{
zcp->zc_data.zc_version = CASE_DATA_VERSION_SERD;
}
@@ -161,6 +181,64 @@ zfs_case_unserialize(fmd_hdl_t *hdl, fmd_case_t *cp)
return (zcp);
}
/*
* Return count of other unique SERD cases under same vdev parent
*/
static uint_t
zfs_other_serd_cases(fmd_hdl_t *hdl, const zfs_case_data_t *zfs_case)
{
zfs_case_t *zcp;
uint_t cases = 0;
static hrtime_t next_check = 0;
/*
* Note that plumbing in some external GC would require adding locking,
* since most of this module code is not thread safe and assumes there
* is only one thread running against the module. So we perform GC here
* inline periodically so that future delay induced faults will be
* possible once the issue causing multiple vdev delays is resolved.
*/
if (gethrestime_sec() > next_check) {
/* Periodically purge old SERD entries and stale cases */
fmd_serd_gc(hdl);
zfs_purge_cases(hdl);
next_check = gethrestime_sec() + CASE_GC_TIMEOUT_SECS;
}
for (zcp = uu_list_first(zfs_cases); zcp != NULL;
zcp = uu_list_next(zfs_cases, zcp)) {
zfs_case_data_t *zcd = &zcp->zc_data;
/*
* must be same pool and parent vdev but different leaf vdev
*/
if (zcd->zc_pool_guid != zfs_case->zc_pool_guid ||
zcd->zc_parent_guid != zfs_case->zc_parent_guid ||
zcd->zc_vdev_guid == zfs_case->zc_vdev_guid) {
continue;
}
/*
* Check if there is another active serd case besides zfs_case
*
* Only one serd engine will be assigned to the case
*/
if (zcd->zc_serd_checksum[0] == zfs_case->zc_serd_checksum[0] &&
fmd_serd_active(hdl, zcd->zc_serd_checksum)) {
cases++;
}
if (zcd->zc_serd_io[0] == zfs_case->zc_serd_io[0] &&
fmd_serd_active(hdl, zcd->zc_serd_io)) {
cases++;
}
if (zcd->zc_serd_slow_io[0] == zfs_case->zc_serd_slow_io[0] &&
fmd_serd_active(hdl, zcd->zc_serd_slow_io)) {
cases++;
}
}
return (cases);
}
/*
* Iterate over any active cases. If any cases are associated with a pool or
* vdev which is no longer present on the system, close the associated case.
@@ -209,10 +287,10 @@ zfs_mark_vdev(uint64_t pool_guid, nvlist_t *vd, er_timeval_t *loaded)
}
}
/*ARGSUSED*/
static int
zfs_mark_pool(zpool_handle_t *zhp, void *unused)
{
(void) unused;
zfs_case_t *zcp;
uint64_t pool_guid;
uint64_t *tod;
@@ -367,14 +445,21 @@ zfs_serd_name(char *buf, uint64_t pool_guid, uint64_t vdev_guid,
(long long unsigned int)vdev_guid, type);
}
static void
zfs_case_retire(fmd_hdl_t *hdl, zfs_case_t *zcp)
{
fmd_hdl_debug(hdl, "retiring case");
fmd_case_close(hdl, zcp->zc_case);
}
/*
* Solve a given ZFS case. This first checks to make sure the diagnosis is
* still valid, as well as cleaning up any pending timer associated with the
* case.
*/
static void
zfs_case_solve(fmd_hdl_t *hdl, zfs_case_t *zcp, const char *faultname,
boolean_t checkunusable)
zfs_case_solve(fmd_hdl_t *hdl, zfs_case_t *zcp, const char *faultname)
{
nvlist_t *detector, *fault;
boolean_t serialize;
@@ -412,7 +497,7 @@ zfs_case_solve(fmd_hdl_t *hdl, zfs_case_t *zcp, const char *faultname,
serialize = B_TRUE;
}
if (serialize)
zfs_case_serialize(hdl, zcp);
zfs_case_serialize(zcp);
nvlist_free(detector);
}
@@ -424,10 +509,10 @@ timeval_earlier(er_timeval_t *a, er_timeval_t *b)
(a->ertv_sec == b->ertv_sec && a->ertv_nsec < b->ertv_nsec));
}
/*ARGSUSED*/
static void
zfs_ereport_when(fmd_hdl_t *hdl, nvlist_t *nvl, er_timeval_t *when)
{
(void) hdl;
int64_t *tod;
uint_t nelem;
@@ -440,22 +525,51 @@ zfs_ereport_when(fmd_hdl_t *hdl, nvlist_t *nvl, er_timeval_t *when)
}
}
/*
* Record the specified event in the SERD engine and return a
* boolean value indicating whether or not the engine fired as
* the result of inserting this event.
*
* When the pool has similar active cases on other vdevs, then
* the fired state is disregarded and the case is retired.
*/
static int
zfs_fm_serd_record(fmd_hdl_t *hdl, const char *name, fmd_event_t *ep,
zfs_case_t *zcp, const char *err_type)
{
int fired = fmd_serd_record(hdl, name, ep);
int peers = 0;
if (fired && (peers = zfs_other_serd_cases(hdl, &zcp->zc_data)) > 0) {
fmd_hdl_debug(hdl, "pool %llu is tracking %d other %s cases "
"-- skip faulting the vdev %llu",
(u_longlong_t)zcp->zc_data.zc_pool_guid,
peers, err_type,
(u_longlong_t)zcp->zc_data.zc_vdev_guid);
zfs_case_retire(hdl, zcp);
fired = 0;
}
return (fired);
}
/*
* Main fmd entry point.
*/
/*ARGSUSED*/
static void
zfs_fm_recv(fmd_hdl_t *hdl, fmd_event_t *ep, nvlist_t *nvl, const char *class)
{
zfs_case_t *zcp, *dcp;
int32_t pool_state;
uint64_t ena, pool_guid, vdev_guid;
uint64_t ena, pool_guid, vdev_guid, parent_guid;
uint64_t checksum_n, checksum_t;
uint64_t io_n, io_t;
er_timeval_t pool_load;
er_timeval_t er_when;
nvlist_t *detector;
boolean_t pool_found = B_FALSE;
boolean_t isresource;
char *type;
const char *type;
/*
* We subscribe to notifications for vdev or pool removal. In these
@@ -537,6 +651,9 @@ zfs_fm_recv(fmd_hdl_t *hdl, fmd_event_t *ep, nvlist_t *nvl, const char *class)
if (nvlist_lookup_uint64(nvl,
FM_EREPORT_PAYLOAD_ZFS_VDEV_GUID, &vdev_guid) != 0)
vdev_guid = 0;
if (nvlist_lookup_uint64(nvl,
FM_EREPORT_PAYLOAD_ZFS_PARENT_GUID, &parent_guid) != 0)
parent_guid = 0;
if (nvlist_lookup_uint64(nvl, FM_EREPORT_ENA, &ena) != 0)
ena = 0;
@@ -623,9 +740,7 @@ zfs_fm_recv(fmd_hdl_t *hdl, fmd_event_t *ep, nvlist_t *nvl, const char *class)
if (strcmp(class,
ZFS_MAKE_EREPORT(FM_EREPORT_ZFS_DATA)) == 0 ||
strcmp(class,
ZFS_MAKE_EREPORT(FM_EREPORT_ZFS_CONFIG_CACHE_WRITE)) == 0 ||
strcmp(class,
ZFS_MAKE_EREPORT(FM_EREPORT_ZFS_DELAY)) == 0) {
ZFS_MAKE_EREPORT(FM_EREPORT_ZFS_CONFIG_CACHE_WRITE)) == 0) {
zfs_stats.resource_drops.fmds_value.ui64++;
return;
}
@@ -649,6 +764,7 @@ zfs_fm_recv(fmd_hdl_t *hdl, fmd_event_t *ep, nvlist_t *nvl, const char *class)
data.zc_ena = ena;
data.zc_pool_guid = pool_guid;
data.zc_vdev_guid = vdev_guid;
data.zc_parent_guid = parent_guid;
data.zc_pool_state = (int)pool_state;
fmd_buf_write(hdl, cs, CASE_DATA, &data, sizeof (data));
@@ -686,13 +802,16 @@ zfs_fm_recv(fmd_hdl_t *hdl, fmd_event_t *ep, nvlist_t *nvl, const char *class)
if (zcp->zc_data.zc_has_remove_timer) {
fmd_timer_remove(hdl, zcp->zc_remove_timer);
zcp->zc_data.zc_has_remove_timer = 0;
zfs_case_serialize(hdl, zcp);
zfs_case_serialize(zcp);
}
if (zcp->zc_data.zc_serd_io[0] != '\0')
fmd_serd_reset(hdl, zcp->zc_data.zc_serd_io);
if (zcp->zc_data.zc_serd_checksum[0] != '\0')
fmd_serd_reset(hdl,
zcp->zc_data.zc_serd_checksum);
if (zcp->zc_data.zc_serd_slow_io[0] != '\0')
fmd_serd_reset(hdl,
zcp->zc_data.zc_serd_slow_io);
} else if (fmd_nvl_class_match(hdl, nvl,
ZFS_MAKE_RSRC(FM_RESOURCE_STATECHANGE))) {
uint64_t state = 0;
@@ -721,7 +840,11 @@ zfs_fm_recv(fmd_hdl_t *hdl, fmd_event_t *ep, nvlist_t *nvl, const char *class)
if (fmd_case_solved(hdl, zcp->zc_case))
return;
fmd_hdl_debug(hdl, "error event '%s'", class);
if (vdev_guid)
fmd_hdl_debug(hdl, "error event '%s', vdev %llu", class,
vdev_guid);
else
fmd_hdl_debug(hdl, "error event '%s'", class);
/*
* Determine if we should solve the case and generate a fault. We solve
@@ -751,18 +874,18 @@ zfs_fm_recv(fmd_hdl_t *hdl, fmd_event_t *ep, nvlist_t *nvl, const char *class)
fmd_case_close(hdl, dcp->zc_case);
}
zfs_case_solve(hdl, zcp, "fault.fs.zfs.pool", B_TRUE);
zfs_case_solve(hdl, zcp, "fault.fs.zfs.pool");
} else if (fmd_nvl_class_match(hdl, nvl,
ZFS_MAKE_EREPORT(FM_EREPORT_ZFS_LOG_REPLAY))) {
/*
* Pool level fault for reading the intent logs.
*/
zfs_case_solve(hdl, zcp, "fault.fs.zfs.log_replay", B_TRUE);
zfs_case_solve(hdl, zcp, "fault.fs.zfs.log_replay");
} else if (fmd_nvl_class_match(hdl, nvl, "ereport.fs.zfs.vdev.*")) {
/*
* Device fault.
*/
zfs_case_solve(hdl, zcp, "fault.fs.zfs.device", B_TRUE);
zfs_case_solve(hdl, zcp, "fault.fs.zfs.device");
} else if (fmd_nvl_class_match(hdl, nvl,
ZFS_MAKE_EREPORT(FM_EREPORT_ZFS_IO)) ||
fmd_nvl_class_match(hdl, nvl,
@@ -770,9 +893,12 @@ zfs_fm_recv(fmd_hdl_t *hdl, fmd_event_t *ep, nvlist_t *nvl, const char *class)
fmd_nvl_class_match(hdl, nvl,
ZFS_MAKE_EREPORT(FM_EREPORT_ZFS_IO_FAILURE)) ||
fmd_nvl_class_match(hdl, nvl,
ZFS_MAKE_EREPORT(FM_EREPORT_ZFS_DELAY)) ||
fmd_nvl_class_match(hdl, nvl,
ZFS_MAKE_EREPORT(FM_EREPORT_ZFS_PROBE_FAILURE))) {
char *failmode = NULL;
const char *failmode = NULL;
boolean_t checkremove = B_FALSE;
uint32_t pri = 0;
/*
* If this is a checksum or I/O error, then toss it into the
@@ -784,30 +910,111 @@ zfs_fm_recv(fmd_hdl_t *hdl, fmd_event_t *ep, nvlist_t *nvl, const char *class)
if (fmd_nvl_class_match(hdl, nvl,
ZFS_MAKE_EREPORT(FM_EREPORT_ZFS_IO))) {
if (zcp->zc_data.zc_serd_io[0] == '\0') {
if (nvlist_lookup_uint64(nvl,
FM_EREPORT_PAYLOAD_ZFS_VDEV_IO_N,
&io_n) != 0) {
io_n = DEFAULT_IO_N;
}
if (nvlist_lookup_uint64(nvl,
FM_EREPORT_PAYLOAD_ZFS_VDEV_IO_T,
&io_t) != 0) {
io_t = DEFAULT_IO_T;
}
zfs_serd_name(zcp->zc_data.zc_serd_io,
pool_guid, vdev_guid, "io");
fmd_serd_create(hdl, zcp->zc_data.zc_serd_io,
fmd_prop_get_int32(hdl, "io_N"),
fmd_prop_get_int64(hdl, "io_T"));
zfs_case_serialize(hdl, zcp);
io_n,
SEC2NSEC(io_t));
zfs_case_serialize(zcp);
}
if (fmd_serd_record(hdl, zcp->zc_data.zc_serd_io, ep))
if (zfs_fm_serd_record(hdl, zcp->zc_data.zc_serd_io,
ep, zcp, "io error")) {
checkremove = B_TRUE;
}
} else if (fmd_nvl_class_match(hdl, nvl,
ZFS_MAKE_EREPORT(FM_EREPORT_ZFS_DELAY))) {
uint64_t slow_io_n, slow_io_t;
/*
* Create a slow io SERD engine when the VDEV has the
* 'vdev_slow_io_n' and 'vdev_slow_io_n' properties.
*/
if (zcp->zc_data.zc_serd_slow_io[0] == '\0' &&
nvlist_lookup_uint64(nvl,
FM_EREPORT_PAYLOAD_ZFS_VDEV_SLOW_IO_N,
&slow_io_n) == 0 &&
nvlist_lookup_uint64(nvl,
FM_EREPORT_PAYLOAD_ZFS_VDEV_SLOW_IO_T,
&slow_io_t) == 0) {
zfs_serd_name(zcp->zc_data.zc_serd_slow_io,
pool_guid, vdev_guid, "slow_io");
fmd_serd_create(hdl,
zcp->zc_data.zc_serd_slow_io,
slow_io_n,
SEC2NSEC(slow_io_t));
zfs_case_serialize(zcp);
}
/* Pass event to SERD engine and see if this triggers */
if (zcp->zc_data.zc_serd_slow_io[0] != '\0' &&
zfs_fm_serd_record(hdl,
zcp->zc_data.zc_serd_slow_io, ep, zcp, "slow io")) {
zfs_case_solve(hdl, zcp,
"fault.fs.zfs.vdev.slow_io");
}
} else if (fmd_nvl_class_match(hdl, nvl,
ZFS_MAKE_EREPORT(FM_EREPORT_ZFS_CHECKSUM))) {
uint64_t flags = 0;
int32_t flags32 = 0;
/*
* We ignore ereports for checksum errors generated by
* scrub/resilver I/O to avoid potentially further
* degrading the pool while it's being repaired.
*
* Note that FM_EREPORT_PAYLOAD_ZFS_ZIO_FLAGS used to
* be int32. To allow newer zed to work on older
* kernels, if we don't find the flags, we look for
* the older ones too.
*/
if (((nvlist_lookup_uint32(nvl,
FM_EREPORT_PAYLOAD_ZFS_ZIO_PRIORITY, &pri) == 0) &&
(pri == ZIO_PRIORITY_SCRUB ||
pri == ZIO_PRIORITY_REBUILD)) ||
((nvlist_lookup_uint64(nvl,
FM_EREPORT_PAYLOAD_ZFS_ZIO_FLAGS, &flags) == 0) &&
(flags & (ZIO_FLAG_SCRUB | ZIO_FLAG_RESILVER))) ||
((nvlist_lookup_int32(nvl,
FM_EREPORT_PAYLOAD_ZFS_ZIO_FLAGS, &flags32) == 0) &&
(flags32 & (ZIO_FLAG_SCRUB | ZIO_FLAG_RESILVER)))) {
fmd_hdl_debug(hdl, "ignoring '%s' for "
"scrub/resilver I/O", class);
return;
}
if (zcp->zc_data.zc_serd_checksum[0] == '\0') {
if (nvlist_lookup_uint64(nvl,
FM_EREPORT_PAYLOAD_ZFS_VDEV_CKSUM_N,
&checksum_n) != 0) {
checksum_n = DEFAULT_CHECKSUM_N;
}
if (nvlist_lookup_uint64(nvl,
FM_EREPORT_PAYLOAD_ZFS_VDEV_CKSUM_T,
&checksum_t) != 0) {
checksum_t = DEFAULT_CHECKSUM_T;
}
zfs_serd_name(zcp->zc_data.zc_serd_checksum,
pool_guid, vdev_guid, "checksum");
fmd_serd_create(hdl,
zcp->zc_data.zc_serd_checksum,
fmd_prop_get_int32(hdl, "checksum_N"),
fmd_prop_get_int64(hdl, "checksum_T"));
zfs_case_serialize(hdl, zcp);
checksum_n,
SEC2NSEC(checksum_t));
zfs_case_serialize(zcp);
}
if (fmd_serd_record(hdl,
zcp->zc_data.zc_serd_checksum, ep)) {
if (zfs_fm_serd_record(hdl,
zcp->zc_data.zc_serd_checksum, ep, zcp,
"checksum")) {
zfs_case_solve(hdl, zcp,
"fault.fs.zfs.vdev.checksum", B_FALSE);
"fault.fs.zfs.vdev.checksum");
}
} else if (fmd_nvl_class_match(hdl, nvl,
ZFS_MAKE_EREPORT(FM_EREPORT_ZFS_IO_FAILURE)) &&
@@ -817,12 +1024,11 @@ zfs_fm_recv(fmd_hdl_t *hdl, fmd_event_t *ep, nvlist_t *nvl, const char *class)
if (strncmp(failmode, FM_EREPORT_FAILMODE_CONTINUE,
strlen(FM_EREPORT_FAILMODE_CONTINUE)) == 0) {
zfs_case_solve(hdl, zcp,
"fault.fs.zfs.io_failure_continue",
B_FALSE);
"fault.fs.zfs.io_failure_continue");
} else if (strncmp(failmode, FM_EREPORT_FAILMODE_WAIT,
strlen(FM_EREPORT_FAILMODE_WAIT)) == 0) {
zfs_case_solve(hdl, zcp,
"fault.fs.zfs.io_failure_wait", B_FALSE);
"fault.fs.zfs.io_failure_wait");
}
} else if (fmd_nvl_class_match(hdl, nvl,
ZFS_MAKE_EREPORT(FM_EREPORT_ZFS_PROBE_FAILURE))) {
@@ -844,7 +1050,7 @@ zfs_fm_recv(fmd_hdl_t *hdl, fmd_event_t *ep, nvlist_t *nvl, const char *class)
zfs_remove_timeout);
if (!zcp->zc_data.zc_has_remove_timer) {
zcp->zc_data.zc_has_remove_timer = 1;
zfs_case_serialize(hdl, zcp);
zfs_case_serialize(zcp);
}
}
}
@@ -854,14 +1060,13 @@ zfs_fm_recv(fmd_hdl_t *hdl, fmd_event_t *ep, nvlist_t *nvl, const char *class)
* The timeout is fired when we diagnosed an I/O error, and it was not due to
* device removal (which would cause the timeout to be cancelled).
*/
/* ARGSUSED */
static void
zfs_fm_timeout(fmd_hdl_t *hdl, id_t id, void *data)
{
zfs_case_t *zcp = data;
if (id == zcp->zc_remove_timer)
zfs_case_solve(hdl, zcp, "fault.fs.zfs.vdev.io", B_FALSE);
zfs_case_solve(hdl, zcp, "fault.fs.zfs.vdev.io");
}
/*
@@ -877,6 +1082,8 @@ zfs_fm_close(fmd_hdl_t *hdl, fmd_case_t *cs)
fmd_serd_destroy(hdl, zcp->zc_data.zc_serd_checksum);
if (zcp->zc_data.zc_serd_io[0] != '\0')
fmd_serd_destroy(hdl, zcp->zc_data.zc_serd_io);
if (zcp->zc_data.zc_serd_slow_io[0] != '\0')
fmd_serd_destroy(hdl, zcp->zc_data.zc_serd_slow_io);
if (zcp->zc_data.zc_has_remove_timer)
fmd_timer_remove(hdl, zcp->zc_remove_timer);
@@ -885,30 +1092,15 @@ zfs_fm_close(fmd_hdl_t *hdl, fmd_case_t *cs)
fmd_hdl_free(hdl, zcp, sizeof (zfs_case_t));
}
/*
* We use the fmd gc entry point to look for old cases that no longer apply.
* This allows us to keep our set of case data small in a long running system.
*/
static void
zfs_fm_gc(fmd_hdl_t *hdl)
{
zfs_purge_cases(hdl);
}
static const fmd_hdl_ops_t fmd_ops = {
zfs_fm_recv, /* fmdo_recv */
zfs_fm_timeout, /* fmdo_timeout */
zfs_fm_close, /* fmdo_close */
NULL, /* fmdo_stats */
zfs_fm_gc, /* fmdo_gc */
NULL, /* fmdo_gc */
};
static const fmd_prop_t fmd_props[] = {
{ "checksum_N", FMD_TYPE_UINT32, "10" },
{ "checksum_T", FMD_TYPE_TIME, "10min" },
{ "io_N", FMD_TYPE_UINT32, "10" },
{ "io_T", FMD_TYPE_TIME, "10min" },
{ "remove_timeout", FMD_TYPE_TIME, "15sec" },
{ NULL, 0, NULL }
};
@@ -949,8 +1141,6 @@ _zfs_diagnosis_init(fmd_hdl_t *hdl)
(void) fmd_stat_create(hdl, FMD_STAT_NOALLOC, sizeof (zfs_stats) /
sizeof (fmd_stat_t), (fmd_stat_t *)&zfs_stats);
zfs_remove_timeout = fmd_prop_get_int64(hdl, "remove_timeout");
}
void
+509 -95
View File
@@ -6,7 +6,7 @@
* You may not use this file except in compliance with the License.
*
* You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
* or http://www.opensolaris.org/os/licensing.
* or https://opensource.org/licenses/CDDL-1.0.
* See the License for the specific language governing permissions
* and limitations under the License.
*
@@ -24,6 +24,7 @@
* Copyright 2014 Nexenta Systems, Inc. All rights reserved.
* Copyright (c) 2016, 2017, Intel Corporation.
* Copyright (c) 2017 Open-E, Inc. All Rights Reserved.
* Copyright (c) 2023, Klara Inc.
*/
/*
@@ -63,9 +64,7 @@
* If the device could not be replaced, then the second online attempt will
* trigger the FMA fault that we skipped earlier.
*
* ZFS on Linux porting notes:
* Linux udev provides a disk insert for both the disk and the partition
*
* On Linux udev provides a disk insert for both the disk and the partition.
*/
#include <ctype.h>
@@ -135,6 +134,11 @@ zfs_unavail_pool(zpool_handle_t *zhp, void *data)
if (zfs_toplevel_state(zhp) < VDEV_STATE_DEGRADED) {
unavailpool_t *uap;
uap = malloc(sizeof (unavailpool_t));
if (uap == NULL) {
perror("malloc");
exit(EXIT_FAILURE);
}
uap->uap_zhp = zhp;
list_insert_tail((list_t *)data, uap);
} else {
@@ -143,6 +147,17 @@ zfs_unavail_pool(zpool_handle_t *zhp, void *data)
return (0);
}
/*
* Write an array of strings to the zed log
*/
static void lines_to_zed_log_msg(char **lines, int lines_cnt)
{
int i;
for (i = 0; i < lines_cnt; i++) {
zed_log_msg(LOG_INFO, "%s", lines[i]);
}
}
/*
* Two stage replace on Linux
* since we get disk notifications
@@ -180,22 +195,31 @@ zfs_unavail_pool(zpool_handle_t *zhp, void *data)
static void
zfs_process_add(zpool_handle_t *zhp, nvlist_t *vdev, boolean_t labeled)
{
char *path;
const char *path;
vdev_state_t newstate;
nvlist_t *nvroot, *newvd;
pendingdev_t *device;
uint64_t wholedisk = 0ULL;
uint64_t offline = 0ULL;
uint64_t offline = 0ULL, faulted = 0ULL;
uint64_t guid = 0ULL;
char *physpath = NULL, *new_devid = NULL, *enc_sysfs_path = NULL;
uint64_t is_spare = 0;
const char *physpath = NULL, *new_devid = NULL, *enc_sysfs_path = NULL;
char rawpath[PATH_MAX], fullpath[PATH_MAX];
char devpath[PATH_MAX];
char pathbuf[PATH_MAX];
int ret;
boolean_t is_dm = B_FALSE;
int online_flag = ZFS_ONLINE_CHECKREMOVE | ZFS_ONLINE_UNSPARE;
boolean_t is_sd = B_FALSE;
boolean_t is_mpath_wholedisk = B_FALSE;
uint_t c;
vdev_stat_t *vs;
char **lines = NULL;
int lines_cnt = 0;
/*
* Get the persistent path, typically under the '/dev/disk/by-id' or
* '/dev/disk/by-vdev' directories. Note that this path can change
* when a vdev is replaced with a new disk.
*/
if (nvlist_lookup_string(vdev, ZPOOL_CONFIG_PATH, &path) != 0)
return;
@@ -209,19 +233,82 @@ zfs_process_add(zpool_handle_t *zhp, nvlist_t *vdev, boolean_t labeled)
}
(void) nvlist_lookup_string(vdev, ZPOOL_CONFIG_PHYS_PATH, &physpath);
update_vdev_config_dev_sysfs_path(vdev, path,
ZPOOL_CONFIG_VDEV_ENC_SYSFS_PATH);
(void) nvlist_lookup_string(vdev, ZPOOL_CONFIG_VDEV_ENC_SYSFS_PATH,
&enc_sysfs_path);
(void) nvlist_lookup_uint64(vdev, ZPOOL_CONFIG_WHOLE_DISK, &wholedisk);
(void) nvlist_lookup_uint64(vdev, ZPOOL_CONFIG_OFFLINE, &offline);
(void) nvlist_lookup_uint64(vdev, ZPOOL_CONFIG_FAULTED, &faulted);
(void) nvlist_lookup_uint64(vdev, ZPOOL_CONFIG_GUID, &guid);
(void) nvlist_lookup_uint64(vdev, ZPOOL_CONFIG_IS_SPARE, &is_spare);
if (offline)
return; /* don't intervene if it was taken offline */
/*
* Special case:
*
* We've seen times where a disk won't have a ZPOOL_CONFIG_PHYS_PATH
* entry in their config. For example, on this force-faulted disk:
*
* children[0]:
* type: 'disk'
* id: 0
* guid: 14309659774640089719
* path: '/dev/disk/by-vdev/L28'
* whole_disk: 0
* DTL: 654
* create_txg: 4
* com.delphix:vdev_zap_leaf: 1161
* faulted: 1
* aux_state: 'external'
* children[1]:
* type: 'disk'
* id: 1
* guid: 16002508084177980912
* path: '/dev/disk/by-vdev/L29'
* devid: 'dm-uuid-mpath-35000c500a61d68a3'
* phys_path: 'L29'
* vdev_enc_sysfs_path: '/sys/class/enclosure/0:0:1:0/SLOT 30 32'
* whole_disk: 0
* DTL: 1028
* create_txg: 4
* com.delphix:vdev_zap_leaf: 131
*
* If the disk's path is a /dev/disk/by-vdev/ path, then we can infer
* the ZPOOL_CONFIG_PHYS_PATH from the by-vdev disk name.
*/
if (physpath == NULL && path != NULL) {
/* If path begins with "/dev/disk/by-vdev/" ... */
if (strncmp(path, DEV_BYVDEV_PATH,
strlen(DEV_BYVDEV_PATH)) == 0) {
/* Set physpath to the char after "/dev/disk/by-vdev" */
physpath = &path[strlen(DEV_BYVDEV_PATH)];
}
}
is_dm = zfs_dev_is_dm(path);
/*
* We don't want to autoreplace offlined disks. However, we do want to
* replace force-faulted disks (`zpool offline -f`). Force-faulted
* disks have both offline=1 and faulted=1 in the nvlist.
*/
if (offline && !faulted) {
zed_log_msg(LOG_INFO, "%s: %s is offline, skip autoreplace",
__func__, path);
return;
}
is_mpath_wholedisk = is_mpath_whole_disk(path);
zed_log_msg(LOG_INFO, "zfs_process_add: pool '%s' vdev '%s', phys '%s'"
" wholedisk %d, %s dm (guid %llu)", zpool_get_name(zhp), path,
physpath ? physpath : "NULL", wholedisk, is_dm ? "is" : "not",
" %s blank disk, %s mpath blank disk, %s labeled, enc sysfs '%s', "
"(guid %llu)",
zpool_get_name(zhp), path,
physpath ? physpath : "NULL",
wholedisk ? "is" : "not",
is_mpath_wholedisk? "is" : "not",
labeled ? "is" : "not",
enc_sysfs_path,
(long long unsigned int)guid);
/*
@@ -248,15 +335,18 @@ zfs_process_add(zpool_handle_t *zhp, nvlist_t *vdev, boolean_t labeled)
}
}
if (is_spare)
online_flag |= ZFS_ONLINE_SPARE;
/*
* Attempt to online the device.
*/
if (zpool_vdev_online(zhp, fullpath,
ZFS_ONLINE_CHECKREMOVE | ZFS_ONLINE_UNSPARE, &newstate) == 0 &&
if (zpool_vdev_online(zhp, fullpath, online_flag, &newstate) == 0 &&
(newstate == VDEV_STATE_HEALTHY ||
newstate == VDEV_STATE_DEGRADED)) {
zed_log_msg(LOG_INFO, " zpool_vdev_online: vdev %s is %s",
fullpath, (newstate == VDEV_STATE_HEALTHY) ?
zed_log_msg(LOG_INFO,
" zpool_vdev_online: vdev '%s' ('%s') is "
"%s", fullpath, physpath, (newstate == VDEV_STATE_HEALTHY) ?
"HEALTHY" : "DEGRADED");
return;
}
@@ -273,11 +363,12 @@ zfs_process_add(zpool_handle_t *zhp, nvlist_t *vdev, boolean_t labeled)
* vdev online to trigger a FMA fault by posting an ereport.
*/
if (!zpool_get_prop_int(zhp, ZPOOL_PROP_AUTOREPLACE, NULL) ||
!(wholedisk || is_dm) || (physpath == NULL)) {
!(wholedisk || is_mpath_wholedisk) || (physpath == NULL)) {
(void) zpool_vdev_online(zhp, fullpath, ZFS_ONLINE_FORCEFAULT,
&newstate);
zed_log_msg(LOG_INFO, "Pool's autoreplace is not enabled or "
"not a whole disk for '%s'", fullpath);
"not a blank disk for '%s' ('%s')", fullpath,
physpath);
return;
}
@@ -289,29 +380,50 @@ zfs_process_add(zpool_handle_t *zhp, nvlist_t *vdev, boolean_t labeled)
(void) snprintf(rawpath, sizeof (rawpath), "%s%s",
is_sd ? DEV_BYVDEV_PATH : DEV_BYPATH_PATH, physpath);
if (realpath(rawpath, devpath) == NULL && !is_dm) {
if (realpath(rawpath, pathbuf) == NULL && !is_mpath_wholedisk) {
zed_log_msg(LOG_INFO, " realpath: %s failed (%s)",
rawpath, strerror(errno));
(void) zpool_vdev_online(zhp, fullpath, ZFS_ONLINE_FORCEFAULT,
&newstate);
int err = zpool_vdev_online(zhp, fullpath,
ZFS_ONLINE_FORCEFAULT, &newstate);
zed_log_msg(LOG_INFO, " zpool_vdev_online: %s FORCEFAULT (%s)",
fullpath, libzfs_error_description(g_zfshdl));
zed_log_msg(LOG_INFO, " zpool_vdev_online: %s FORCEFAULT (%s) "
"err %d, new state %d",
fullpath, libzfs_error_description(g_zfshdl), err,
err ? (int)newstate : 0);
return;
}
/* Only autoreplace bad disks */
if ((vs->vs_state != VDEV_STATE_DEGRADED) &&
(vs->vs_state != VDEV_STATE_FAULTED) &&
(vs->vs_state != VDEV_STATE_REMOVED) &&
(vs->vs_state != VDEV_STATE_CANT_OPEN)) {
zed_log_msg(LOG_INFO, " not autoreplacing since disk isn't in "
"a bad state (currently %llu)", vs->vs_state);
return;
}
nvlist_lookup_string(vdev, "new_devid", &new_devid);
if (is_dm) {
if (is_mpath_wholedisk) {
/* Don't label device mapper or multipath disks. */
zed_log_msg(LOG_INFO,
" it's a multipath wholedisk, don't label");
if (zpool_prepare_disk(zhp, vdev, "autoreplace", &lines,
&lines_cnt) != 0) {
zed_log_msg(LOG_INFO,
" zpool_prepare_disk: could not "
"prepare '%s' (%s)", fullpath,
libzfs_error_description(g_zfshdl));
if (lines_cnt > 0) {
zed_log_msg(LOG_INFO,
" zfs_prepare_disk output:");
lines_to_zed_log_msg(lines, lines_cnt);
}
libzfs_free_str_array(lines, lines_cnt);
return;
}
} else if (!labeled) {
/*
* we're auto-replacing a raw disk, so label it first
@@ -328,16 +440,24 @@ zfs_process_add(zpool_handle_t *zhp, nvlist_t *vdev, boolean_t labeled)
* to trigger a ZFS fault for the device (and any hot spare
* replacement).
*/
leafname = strrchr(devpath, '/') + 1;
leafname = strrchr(pathbuf, '/') + 1;
/*
* If this is a request to label a whole disk, then attempt to
* write out the label.
*/
if (zpool_label_disk(g_zfshdl, zhp, leafname) != 0) {
zed_log_msg(LOG_INFO, " zpool_label_disk: could not "
if (zpool_prepare_and_label_disk(g_zfshdl, zhp, leafname,
vdev, "autoreplace", &lines, &lines_cnt) != 0) {
zed_log_msg(LOG_WARNING,
" zpool_prepare_and_label_disk: could not "
"label '%s' (%s)", leafname,
libzfs_error_description(g_zfshdl));
if (lines_cnt > 0) {
zed_log_msg(LOG_INFO,
" zfs_prepare_disk output:");
lines_to_zed_log_msg(lines, lines_cnt);
}
libzfs_free_str_array(lines, lines_cnt);
(void) zpool_vdev_online(zhp, fullpath,
ZFS_ONLINE_FORCEFAULT, &newstate);
@@ -351,11 +471,16 @@ zfs_process_add(zpool_handle_t *zhp, nvlist_t *vdev, boolean_t labeled)
* completed.
*/
device = malloc(sizeof (pendingdev_t));
if (device == NULL) {
perror("malloc");
exit(EXIT_FAILURE);
}
(void) strlcpy(device->pd_physpath, physpath,
sizeof (device->pd_physpath));
list_insert_tail(&g_device_list, device);
zed_log_msg(LOG_INFO, " zpool_label_disk: async '%s' (%llu)",
zed_log_msg(LOG_NOTICE, " zpool_label_disk: async '%s' (%llu)",
leafname, (u_longlong_t)guid);
return; /* resumes at EC_DEV_ADD.ESC_DISK for partition */
@@ -378,8 +503,8 @@ zfs_process_add(zpool_handle_t *zhp, nvlist_t *vdev, boolean_t labeled)
}
if (!found) {
/* unexpected partition slice encountered */
zed_log_msg(LOG_INFO, "labeled disk %s unexpected here",
fullpath);
zed_log_msg(LOG_WARNING, "labeled disk %s was "
"unexpected here", fullpath);
(void) zpool_vdev_online(zhp, fullpath,
ZFS_ONLINE_FORCEFAULT, &newstate);
return;
@@ -388,10 +513,21 @@ zfs_process_add(zpool_handle_t *zhp, nvlist_t *vdev, boolean_t labeled)
zed_log_msg(LOG_INFO, " zpool_label_disk: resume '%s' (%llu)",
physpath, (u_longlong_t)guid);
(void) snprintf(devpath, sizeof (devpath), "%s%s",
DEV_BYID_PATH, new_devid);
/*
* Paths that begin with '/dev/disk/by-id/' will change and so
* they must be updated before calling zpool_vdev_attach().
*/
if (strncmp(path, DEV_BYID_PATH, strlen(DEV_BYID_PATH)) == 0) {
(void) snprintf(pathbuf, sizeof (pathbuf), "%s%s",
DEV_BYID_PATH, new_devid);
zed_log_msg(LOG_INFO, " zpool_label_disk: path '%s' "
"replaced by '%s'", path, pathbuf);
path = pathbuf;
}
}
libzfs_free_str_array(lines, lines_cnt);
/*
* Construct the root vdev to pass to zpool_vdev_attach(). While adding
* the entire vdev structure is harmless, we construct a reduced set of
@@ -416,8 +552,8 @@ zfs_process_add(zpool_handle_t *zhp, nvlist_t *vdev, boolean_t labeled)
ZPOOL_CONFIG_VDEV_ENC_SYSFS_PATH, enc_sysfs_path) != 0) ||
nvlist_add_uint64(newvd, ZPOOL_CONFIG_WHOLE_DISK, wholedisk) != 0 ||
nvlist_add_string(nvroot, ZPOOL_CONFIG_TYPE, VDEV_TYPE_ROOT) != 0 ||
nvlist_add_nvlist_array(nvroot, ZPOOL_CONFIG_CHILDREN, &newvd,
1) != 0) {
nvlist_add_nvlist_array(nvroot, ZPOOL_CONFIG_CHILDREN,
(const nvlist_t **)&newvd, 1) != 0) {
zed_log_msg(LOG_WARNING, "zfs_mod: unable to add nvlist pairs");
nvlist_free(newvd);
nvlist_free(nvroot);
@@ -430,16 +566,26 @@ zfs_process_add(zpool_handle_t *zhp, nvlist_t *vdev, boolean_t labeled)
* Wait for udev to verify the links exist, then auto-replace
* the leaf disk at same physical location.
*/
if (zpool_label_disk_wait(path, 3000) != 0) {
zed_log_msg(LOG_WARNING, "zfs_mod: expected replacement "
"disk %s is missing", path);
if (zpool_label_disk_wait(path, DISK_LABEL_WAIT) != 0) {
zed_log_msg(LOG_WARNING, "zfs_mod: pool '%s', after labeling "
"replacement disk, the expected disk partition link '%s' "
"is missing after waiting %u ms",
zpool_get_name(zhp), path, DISK_LABEL_WAIT);
nvlist_free(nvroot);
return;
}
ret = zpool_vdev_attach(zhp, fullpath, path, nvroot, B_TRUE, B_FALSE);
/*
* Prefer sequential resilvering when supported (mirrors and dRAID),
* otherwise fallback to a traditional healing resilver.
*/
ret = zpool_vdev_attach(zhp, fullpath, path, nvroot, B_TRUE, B_TRUE);
if (ret != 0) {
ret = zpool_vdev_attach(zhp, fullpath, path, nvroot,
B_TRUE, B_FALSE);
}
zed_log_msg(LOG_INFO, " zpool_vdev_replace: %s with %s (%s)",
zed_log_msg(LOG_WARNING, " zpool_vdev_replace: %s with %s (%s)",
fullpath, path, (ret == 0) ? "no errors" :
libzfs_error_description(g_zfshdl));
@@ -457,16 +603,20 @@ typedef struct dev_data {
boolean_t dd_islabeled;
uint64_t dd_pool_guid;
uint64_t dd_vdev_guid;
uint64_t dd_new_vdev_guid;
const char *dd_new_devid;
uint64_t dd_num_spares;
} dev_data_t;
static void
zfs_iter_vdev(zpool_handle_t *zhp, nvlist_t *nvl, void *data)
{
dev_data_t *dp = data;
char *path = NULL;
const char *path = NULL;
uint_t c, children;
nvlist_t **child;
uint64_t guid = 0;
uint64_t isspare = 0;
/*
* First iterate over any children.
@@ -492,19 +642,16 @@ zfs_iter_vdev(zpool_handle_t *zhp, nvlist_t *nvl, void *data)
}
/* once a vdev was matched and processed there is nothing left to do */
if (dp->dd_found)
if (dp->dd_found && dp->dd_num_spares == 0)
return;
(void) nvlist_lookup_uint64(nvl, ZPOOL_CONFIG_GUID, &guid);
/*
* Match by GUID if available otherwise fallback to devid or physical
*/
if (dp->dd_vdev_guid != 0) {
uint64_t guid;
if (nvlist_lookup_uint64(nvl, ZPOOL_CONFIG_GUID,
&guid) != 0 || guid != dp->dd_vdev_guid) {
if (guid != dp->dd_vdev_guid)
return;
}
zed_log_msg(LOG_INFO, " zfs_iter_vdev: matched on %llu", guid);
dp->dd_found = B_TRUE;
@@ -514,22 +661,39 @@ zfs_iter_vdev(zpool_handle_t *zhp, nvlist_t *nvl, void *data)
* illumos, substring matching is not required to accommodate
* the partition suffix. An exact match will be present in
* the dp->dd_compare value.
* If the attached disk already contains a vdev GUID, it means
* the disk is not clean. In such a scenario, the physical path
* would be a match that makes the disk faulted when trying to
* online it. So, we would only want to proceed if either GUID
* matches with the last attached disk or the disk is in clean
* state.
*/
if (nvlist_lookup_string(nvl, dp->dd_prop, &path) != 0 ||
strcmp(dp->dd_compare, path) != 0)
strcmp(dp->dd_compare, path) != 0) {
return;
}
if (dp->dd_new_vdev_guid != 0 && dp->dd_new_vdev_guid != guid) {
zed_log_msg(LOG_INFO, " %s: no match (GUID:%llu"
" != vdev GUID:%llu)", __func__,
dp->dd_new_vdev_guid, guid);
return;
}
zed_log_msg(LOG_INFO, " zfs_iter_vdev: matched %s on %s",
dp->dd_prop, path);
dp->dd_found = B_TRUE;
/* pass the new devid for use by replacing code */
/* pass the new devid for use by auto-replacing code */
if (dp->dd_new_devid != NULL) {
(void) nvlist_add_string(nvl, "new_devid",
dp->dd_new_devid);
}
}
if (dp->dd_found == B_TRUE && nvlist_lookup_uint64(nvl,
ZPOOL_CONFIG_IS_SPARE, &isspare) == 0 && isspare)
dp->dd_num_spares++;
(dp->dd_func)(zhp, nvl, dp->dd_islabeled);
}
@@ -538,7 +702,7 @@ zfs_enable_ds(void *arg)
{
unavailpool_t *pool = (unavailpool_t *)arg;
(void) zpool_enable_datasets(pool->uap_zhp, NULL, 0);
(void) zpool_enable_datasets(pool->uap_zhp, NULL, 0, 512);
zpool_close(pool->uap_zhp);
free(pool);
}
@@ -565,6 +729,8 @@ zfs_iter_pool(zpool_handle_t *zhp, void *data)
ZPOOL_CONFIG_VDEV_TREE, &nvl);
zfs_iter_vdev(zhp, nvl, data);
}
} else {
zed_log_msg(LOG_INFO, "%s: no config\n", __func__);
}
/*
@@ -588,7 +754,9 @@ zfs_iter_pool(zpool_handle_t *zhp, void *data)
}
zpool_close(zhp);
return (dp->dd_found); /* cease iteration after a match */
/* cease iteration after a match */
return (dp->dd_found && dp->dd_num_spares == 0);
}
/*
@@ -597,7 +765,7 @@ zfs_iter_pool(zpool_handle_t *zhp, void *data)
*/
static boolean_t
devphys_iter(const char *physical, const char *devid, zfs_process_func_t func,
boolean_t is_slice)
boolean_t is_slice, uint64_t new_vdev_guid)
{
dev_data_t data = { 0 };
@@ -607,6 +775,73 @@ devphys_iter(const char *physical, const char *devid, zfs_process_func_t func,
data.dd_found = B_FALSE;
data.dd_islabeled = is_slice;
data.dd_new_devid = devid; /* used by auto replace code */
data.dd_new_vdev_guid = new_vdev_guid;
(void) zpool_iter(g_zfshdl, zfs_iter_pool, &data);
return (data.dd_found);
}
/*
* Given a device identifier, find any vdevs with a matching by-vdev
* path. Normally we shouldn't need this as the comparison would be
* made earlier in the devphys_iter(). For example, if we were replacing
* /dev/disk/by-vdev/L28, normally devphys_iter() would match the
* ZPOOL_CONFIG_PHYS_PATH of "L28" from the old disk config to "L28"
* of the new disk config. However, we've seen cases where
* ZPOOL_CONFIG_PHYS_PATH was not in the config for the old disk. Here's
* an example of a real 2-disk mirror pool where one disk was force
* faulted:
*
* com.delphix:vdev_zap_top: 129
* children[0]:
* type: 'disk'
* id: 0
* guid: 14309659774640089719
* path: '/dev/disk/by-vdev/L28'
* whole_disk: 0
* DTL: 654
* create_txg: 4
* com.delphix:vdev_zap_leaf: 1161
* faulted: 1
* aux_state: 'external'
* children[1]:
* type: 'disk'
* id: 1
* guid: 16002508084177980912
* path: '/dev/disk/by-vdev/L29'
* devid: 'dm-uuid-mpath-35000c500a61d68a3'
* phys_path: 'L29'
* vdev_enc_sysfs_path: '/sys/class/enclosure/0:0:1:0/SLOT 30 32'
* whole_disk: 0
* DTL: 1028
* create_txg: 4
* com.delphix:vdev_zap_leaf: 131
*
* So in the case above, the only thing we could compare is the path.
*
* We can do this because we assume by-vdev paths are authoritative as physical
* paths. We could not assume this for normal paths like /dev/sda since the
* physical location /dev/sda points to could change over time.
*/
static boolean_t
by_vdev_path_iter(const char *by_vdev_path, const char *devid,
zfs_process_func_t func, boolean_t is_slice)
{
dev_data_t data = { 0 };
data.dd_compare = by_vdev_path;
data.dd_func = func;
data.dd_prop = ZPOOL_CONFIG_PATH;
data.dd_found = B_FALSE;
data.dd_islabeled = is_slice;
data.dd_new_devid = devid;
if (strncmp(by_vdev_path, DEV_BYVDEV_PATH,
strlen(DEV_BYVDEV_PATH)) != 0) {
/* by_vdev_path doesn't start with "/dev/disk/by-vdev/" */
return (B_FALSE);
}
(void) zpool_iter(g_zfshdl, zfs_iter_pool, &data);
@@ -634,6 +869,27 @@ devid_iter(const char *devid, zfs_process_func_t func, boolean_t is_slice)
return (data.dd_found);
}
/*
* Given a device guid, find any vdevs with a matching guid.
*/
static boolean_t
guid_iter(uint64_t pool_guid, uint64_t vdev_guid, const char *devid,
zfs_process_func_t func, boolean_t is_slice)
{
dev_data_t data = { 0 };
data.dd_func = func;
data.dd_found = B_FALSE;
data.dd_pool_guid = pool_guid;
data.dd_vdev_guid = vdev_guid;
data.dd_islabeled = is_slice;
data.dd_new_devid = devid;
(void) zpool_iter(g_zfshdl, zfs_iter_pool, &data);
return (data.dd_found);
}
/*
* Handle a EC_DEV_ADD.ESC_DISK event.
*
@@ -654,18 +910,23 @@ devid_iter(const char *devid, zfs_process_func_t func, boolean_t is_slice)
* phys_path: 'pci-0000:04:00.0-sas-0x4433221106000000-lun-0'
*/
static int
zfs_deliver_add(nvlist_t *nvl, boolean_t is_lofi)
zfs_deliver_add(nvlist_t *nvl)
{
char *devpath = NULL, *devid;
const char *devpath = NULL, *devid = NULL;
uint64_t pool_guid = 0, vdev_guid = 0;
boolean_t is_slice;
/*
* Expecting a devid string and an optional physical location
* Expecting a devid string and an optional physical location and guid
*/
if (nvlist_lookup_string(nvl, DEV_IDENTIFIER, &devid) != 0)
if (nvlist_lookup_string(nvl, DEV_IDENTIFIER, &devid) != 0) {
zed_log_msg(LOG_INFO, "%s: no dev identifier\n", __func__);
return (-1);
}
(void) nvlist_lookup_string(nvl, DEV_PHYS_PATH, &devpath);
(void) nvlist_lookup_uint64(nvl, ZFS_EV_POOL_GUID, &pool_guid);
(void) nvlist_lookup_uint64(nvl, ZFS_EV_VDEV_GUID, &vdev_guid);
is_slice = (nvlist_lookup_boolean(nvl, DEV_IS_PART) == 0);
@@ -676,12 +937,28 @@ zfs_deliver_add(nvlist_t *nvl, boolean_t is_lofi)
* Iterate over all vdevs looking for a match in the following order:
* 1. ZPOOL_CONFIG_DEVID (identifies the unique disk)
* 2. ZPOOL_CONFIG_PHYS_PATH (identifies disk physical location).
*
* For disks, we only want to pay attention to vdevs marked as whole
* disks or are a multipath device.
* 3. ZPOOL_CONFIG_GUID (identifies unique vdev).
* 4. ZPOOL_CONFIG_PATH for /dev/disk/by-vdev devices only (since
* by-vdev paths represent physical paths).
*/
if (!devid_iter(devid, zfs_process_add, is_slice) && devpath != NULL)
(void) devphys_iter(devpath, devid, zfs_process_add, is_slice);
if (devid_iter(devid, zfs_process_add, is_slice))
return (0);
if (devpath != NULL && devphys_iter(devpath, devid, zfs_process_add,
is_slice, vdev_guid))
return (0);
if (vdev_guid != 0)
(void) guid_iter(pool_guid, vdev_guid, devid, zfs_process_add,
is_slice);
if (devpath != NULL) {
/* Can we match a /dev/disk/by-vdev/ path? */
char by_vdev_path[MAXPATHLEN];
snprintf(by_vdev_path, sizeof (by_vdev_path),
"/dev/disk/by-vdev/%s", devpath);
if (by_vdev_path_iter(by_vdev_path, devid, zfs_process_add,
is_slice))
return (0);
}
return (0);
}
@@ -713,21 +990,98 @@ zfs_deliver_check(nvlist_t *nvl)
return (0);
}
/*
* Given a path to a vdev, lookup the vdev's physical size from its
* config nvlist.
*
* Returns the vdev's physical size in bytes on success, 0 on error.
*/
static uint64_t
vdev_size_from_config(zpool_handle_t *zhp, const char *vdev_path)
{
nvlist_t *nvl = NULL;
boolean_t avail_spare, l2cache, log;
vdev_stat_t *vs = NULL;
uint_t c;
nvl = zpool_find_vdev(zhp, vdev_path, &avail_spare, &l2cache, &log);
if (!nvl)
return (0);
verify(nvlist_lookup_uint64_array(nvl, ZPOOL_CONFIG_VDEV_STATS,
(uint64_t **)&vs, &c) == 0);
if (!vs) {
zed_log_msg(LOG_INFO, "%s: no nvlist for '%s'", __func__,
vdev_path);
return (0);
}
return (vs->vs_pspace);
}
/*
* Given a path to a vdev, lookup if the vdev is a "whole disk" in the
* config nvlist. "whole disk" means that ZFS was passed a whole disk
* at pool creation time, which it partitioned up and has full control over.
* Thus a partition with wholedisk=1 set tells us that zfs created the
* partition at creation time. A partition without whole disk set would have
* been created by externally (like with fdisk) and passed to ZFS.
*
* Returns the whole disk value (either 0 or 1).
*/
static uint64_t
vdev_whole_disk_from_config(zpool_handle_t *zhp, const char *vdev_path)
{
nvlist_t *nvl = NULL;
boolean_t avail_spare, l2cache, log;
uint64_t wholedisk = 0;
nvl = zpool_find_vdev(zhp, vdev_path, &avail_spare, &l2cache, &log);
if (!nvl)
return (0);
(void) nvlist_lookup_uint64(nvl, ZPOOL_CONFIG_WHOLE_DISK, &wholedisk);
return (wholedisk);
}
/*
* If the device size grew more than 1% then return true.
*/
#define DEVICE_GREW(oldsize, newsize) \
((newsize > oldsize) && \
((newsize / (newsize - oldsize)) <= 100))
static int
zfsdle_vdev_online(zpool_handle_t *zhp, void *data)
{
char *devname = data;
boolean_t avail_spare, l2cache;
nvlist_t *udev_nvl = data;
nvlist_t *tgt;
int error;
const char *tmp_devname;
char devname[MAXPATHLEN] = "";
uint64_t guid;
if (nvlist_lookup_uint64(udev_nvl, ZFS_EV_VDEV_GUID, &guid) == 0) {
sprintf(devname, "%llu", (u_longlong_t)guid);
} else if (nvlist_lookup_string(udev_nvl, DEV_PHYS_PATH,
&tmp_devname) == 0) {
strlcpy(devname, tmp_devname, MAXPATHLEN);
zfs_append_partition(devname, MAXPATHLEN);
} else {
zed_log_msg(LOG_INFO, "%s: no guid or physpath", __func__);
}
zed_log_msg(LOG_INFO, "zfsdle_vdev_online: searching for '%s' in '%s'",
devname, zpool_get_name(zhp));
if ((tgt = zpool_find_vdev_by_physpath(zhp, devname,
&avail_spare, &l2cache, NULL)) != NULL) {
char *path, fullpath[MAXPATHLEN];
uint64_t wholedisk;
const char *path;
char fullpath[MAXPATHLEN];
uint64_t wholedisk = 0;
error = nvlist_lookup_string(tgt, ZPOOL_CONFIG_PATH, &path);
if (error) {
@@ -735,16 +1089,15 @@ zfsdle_vdev_online(zpool_handle_t *zhp, void *data)
return (0);
}
error = nvlist_lookup_uint64(tgt, ZPOOL_CONFIG_WHOLE_DISK,
(void) nvlist_lookup_uint64(tgt, ZPOOL_CONFIG_WHOLE_DISK,
&wholedisk);
if (error)
wholedisk = 0;
if (wholedisk) {
char *tmp;
path = strrchr(path, '/');
if (path != NULL) {
path = zfs_strip_partition(path + 1);
if (path == NULL) {
tmp = zfs_strip_partition(path + 1);
if (tmp == NULL) {
zpool_close(zhp);
return (0);
}
@@ -753,8 +1106,8 @@ zfsdle_vdev_online(zpool_handle_t *zhp, void *data)
return (0);
}
(void) strlcpy(fullpath, path, sizeof (fullpath));
free(path);
(void) strlcpy(fullpath, tmp, sizeof (fullpath));
free(tmp);
/*
* We need to reopen the pool associated with this
@@ -772,12 +1125,75 @@ zfsdle_vdev_online(zpool_handle_t *zhp, void *data)
vdev_state_t newstate;
if (zpool_get_state(zhp) != POOL_STATE_UNAVAIL) {
error = zpool_vdev_online(zhp, fullpath, 0,
&newstate);
zed_log_msg(LOG_INFO, "zfsdle_vdev_online: "
"setting device '%s' to ONLINE state "
"in pool '%s': %d", fullpath,
zpool_get_name(zhp), error);
/*
* If this disk size has not changed, then
* there's no need to do an autoexpand. To
* check we look at the disk's size in its
* config, and compare it to the disk size
* that udev is reporting.
*/
uint64_t udev_size = 0, conf_size = 0,
wholedisk = 0, udev_parent_size = 0;
/*
* Get the size of our disk that udev is
* reporting.
*/
if (nvlist_lookup_uint64(udev_nvl, DEV_SIZE,
&udev_size) != 0) {
udev_size = 0;
}
/*
* Get the size of our disk's parent device
* from udev (where sda1's parent is sda).
*/
if (nvlist_lookup_uint64(udev_nvl,
DEV_PARENT_SIZE, &udev_parent_size) != 0) {
udev_parent_size = 0;
}
conf_size = vdev_size_from_config(zhp,
fullpath);
wholedisk = vdev_whole_disk_from_config(zhp,
fullpath);
/*
* Only attempt an autoexpand if the vdev size
* changed. There are two different cases
* to consider.
*
* 1. wholedisk=1
* If you do a 'zpool create' on a whole disk
* (like /dev/sda), then zfs will create
* partitions on the disk (like /dev/sda1). In
* that case, wholedisk=1 will be set in the
* partition's nvlist config. So zed will need
* to see if your parent device (/dev/sda)
* expanded in size, and if so, then attempt
* the autoexpand.
*
* 2. wholedisk=0
* If you do a 'zpool create' on an existing
* partition, or a device that doesn't allow
* partitions, then wholedisk=0, and you will
* simply need to check if the device itself
* expanded in size.
*/
if (DEVICE_GREW(conf_size, udev_size) ||
(wholedisk && DEVICE_GREW(conf_size,
udev_parent_size))) {
error = zpool_vdev_online(zhp, fullpath,
0, &newstate);
zed_log_msg(LOG_INFO,
"%s: autoexpanding '%s' from %llu"
" to %llu bytes in pool '%s': %d",
__func__, fullpath, conf_size,
MAX(udev_size, udev_parent_size),
zpool_get_name(zhp), error);
}
}
}
zpool_close(zhp);
@@ -796,7 +1212,8 @@ zfsdle_vdev_online(zpool_handle_t *zhp, void *data)
static int
zfs_deliver_dle(nvlist_t *nvl)
{
char *devname, name[MAXPATHLEN];
const char *devname;
char name[MAXPATHLEN];
uint64_t guid;
if (nvlist_lookup_uint64(nvl, ZFS_EV_VDEV_GUID, &guid) == 0) {
@@ -805,10 +1222,11 @@ zfs_deliver_dle(nvlist_t *nvl)
strlcpy(name, devname, MAXPATHLEN);
zfs_append_partition(name, MAXPATHLEN);
} else {
sprintf(name, "unknown");
zed_log_msg(LOG_INFO, "zfs_deliver_dle: no guid or physpath");
}
if (zpool_iter(g_zfshdl, zfsdle_vdev_online, name) != 1) {
if (zpool_iter(g_zfshdl, zfsdle_vdev_online, nvl) != 1) {
zed_log_msg(LOG_INFO, "zfs_deliver_dle: device '%s' not "
"found", name);
return (1);
@@ -832,18 +1250,15 @@ static int
zfs_slm_deliver_event(const char *class, const char *subclass, nvlist_t *nvl)
{
int ret;
boolean_t is_lofi = B_FALSE, is_check = B_FALSE, is_dle = B_FALSE;
boolean_t is_check = B_FALSE, is_dle = B_FALSE;
if (strcmp(class, EC_DEV_ADD) == 0) {
/*
* We're mainly interested in disk additions, but we also listen
* for new loop devices, to allow for simplified testing.
*/
if (strcmp(subclass, ESC_DISK) == 0)
is_lofi = B_FALSE;
else if (strcmp(subclass, ESC_LOFI) == 0)
is_lofi = B_TRUE;
else
if (strcmp(subclass, ESC_DISK) != 0 &&
strcmp(subclass, ESC_LOFI) != 0)
return (0);
is_check = B_FALSE;
@@ -867,15 +1282,16 @@ zfs_slm_deliver_event(const char *class, const char *subclass, nvlist_t *nvl)
else if (is_check)
ret = zfs_deliver_check(nvl);
else
ret = zfs_deliver_add(nvl, is_lofi);
ret = zfs_deliver_add(nvl);
return (ret);
}
/*ARGSUSED*/
static void *
zfs_enum_pools(void *arg)
{
(void) arg;
(void) zpool_iter(g_zfshdl, zfs_unavail_pool, (void *)&g_pool_list);
/*
* Linux - instead of using a thread pool, each list entry
@@ -894,7 +1310,7 @@ zfs_enum_pools(void *arg)
* For now, each agent has its own libzfs instance
*/
int
zfs_slm_init()
zfs_slm_init(void)
{
if ((g_zfshdl = libzfs_init()) == NULL)
return (-1);
@@ -912,6 +1328,7 @@ zfs_slm_init()
return (-1);
}
pthread_setname_np(g_zfs_tid, "enum-pools");
list_create(&g_device_list, sizeof (struct pendingdev),
offsetof(struct pendingdev, pd_node));
@@ -919,7 +1336,7 @@ zfs_slm_init()
}
void
zfs_slm_fini()
zfs_slm_fini(void)
{
unavailpool_t *pool;
pendingdev_t *device;
@@ -932,17 +1349,14 @@ zfs_slm_fini()
tpool_destroy(g_tpool);
}
while ((pool = (list_head(&g_pool_list))) != NULL) {
list_remove(&g_pool_list, pool);
while ((pool = list_remove_head(&g_pool_list)) != NULL) {
zpool_close(pool->uap_zhp);
free(pool);
}
list_destroy(&g_pool_list);
while ((device = (list_head(&g_device_list))) != NULL) {
list_remove(&g_device_list, device);
while ((device = list_remove_head(&g_device_list)) != NULL)
free(device);
}
list_destroy(&g_device_list);
libzfs_fini(g_zfshdl);
+137 -23
View File
@@ -6,7 +6,7 @@
* You may not use this file except in compliance with the License.
*
* You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
* or http://www.opensolaris.org/os/licensing.
* or https://opensource.org/licenses/CDDL-1.0.
* See the License for the specific language governing permissions
* and limitations under the License.
*
@@ -38,8 +38,10 @@
#include <sys/fs/zfs.h>
#include <sys/fm/protocol.h>
#include <sys/fm/fs/zfs.h>
#include <libzutil.h>
#include <libzfs.h>
#include <string.h>
#include <libgen.h>
#include "zfs_agents.h"
#include "fmd_api.h"
@@ -74,6 +76,8 @@ typedef struct find_cbdata {
uint64_t cb_guid;
zpool_handle_t *cb_zhp;
nvlist_t *cb_vdev;
uint64_t cb_vdev_guid;
uint64_t cb_num_spares;
} find_cbdata_t;
static int
@@ -139,6 +143,64 @@ find_vdev(libzfs_handle_t *zhdl, nvlist_t *nv, uint64_t search_guid)
return (NULL);
}
static int
remove_spares(zpool_handle_t *zhp, void *data)
{
nvlist_t *config, *nvroot;
nvlist_t **spares;
uint_t nspares;
char *devname;
find_cbdata_t *cbp = data;
uint64_t spareguid = 0;
vdev_stat_t *vs;
unsigned int c;
config = zpool_get_config(zhp, NULL);
if (nvlist_lookup_nvlist(config,
ZPOOL_CONFIG_VDEV_TREE, &nvroot) != 0) {
zpool_close(zhp);
return (0);
}
if (nvlist_lookup_nvlist_array(nvroot, ZPOOL_CONFIG_SPARES,
&spares, &nspares) != 0) {
zpool_close(zhp);
return (0);
}
for (int i = 0; i < nspares; i++) {
if (nvlist_lookup_uint64(spares[i], ZPOOL_CONFIG_GUID,
&spareguid) == 0 && spareguid == cbp->cb_vdev_guid) {
devname = zpool_vdev_name(NULL, zhp, spares[i],
B_FALSE);
nvlist_lookup_uint64_array(spares[i],
ZPOOL_CONFIG_VDEV_STATS, (uint64_t **)&vs, &c);
if (vs->vs_state != VDEV_STATE_REMOVED &&
zpool_vdev_remove_wanted(zhp, devname) == 0)
cbp->cb_num_spares++;
break;
}
}
zpool_close(zhp);
return (0);
}
/*
* Given a vdev guid, find and remove all spares associated with it.
*/
static int
find_and_remove_spares(libzfs_handle_t *zhdl, uint64_t vdev_guid)
{
find_cbdata_t cb;
cb.cb_num_spares = 0;
cb.cb_vdev_guid = vdev_guid;
zpool_iter(zhdl, remove_spares, &cb);
return (cb.cb_num_spares);
}
/*
* Given a (pool, vdev) GUID pair, find the matching pool and vdev.
*/
@@ -219,25 +281,31 @@ replace_with_spare(fmd_hdl_t *hdl, zpool_handle_t *zhp, nvlist_t *vdev)
* replace it.
*/
for (s = 0; s < nspares; s++) {
char *spare_name;
boolean_t rebuild = B_FALSE;
const char *spare_name, *type;
if (nvlist_lookup_string(spares[s], ZPOOL_CONFIG_PATH,
&spare_name) != 0)
continue;
/* prefer sequential resilvering for distributed spares */
if ((nvlist_lookup_string(spares[s], ZPOOL_CONFIG_TYPE,
&type) == 0) && strcmp(type, VDEV_TYPE_DRAID_SPARE) == 0)
rebuild = B_TRUE;
/* if set, add the "ashift" pool property to the spare nvlist */
if (source != ZPROP_SRC_DEFAULT)
(void) nvlist_add_uint64(spares[s],
ZPOOL_CONFIG_ASHIFT, ashift);
(void) nvlist_add_nvlist_array(replacement,
ZPOOL_CONFIG_CHILDREN, &spares[s], 1);
ZPOOL_CONFIG_CHILDREN, (const nvlist_t **)&spares[s], 1);
fmd_hdl_debug(hdl, "zpool_vdev_replace '%s' with spare '%s'",
dev_name, basename(spare_name));
dev_name, zfs_basename(spare_name));
if (zpool_vdev_attach(zhp, dev_name, spare_name,
replacement, B_TRUE, B_FALSE) == 0) {
replacement, B_TRUE, rebuild) == 0) {
free(dev_name);
nvlist_free(replacement);
return (B_TRUE);
@@ -255,7 +323,6 @@ replace_with_spare(fmd_hdl_t *hdl, zpool_handle_t *zhp, nvlist_t *vdev)
* ASRU is now usable. ZFS has found the device to be present and
* functioning.
*/
/*ARGSUSED*/
static void
zfs_vdev_repair(fmd_hdl_t *hdl, nvlist_t *nvl)
{
@@ -294,11 +361,11 @@ zfs_vdev_repair(fmd_hdl_t *hdl, nvlist_t *nvl)
vdev_guid, pool_guid);
}
/*ARGSUSED*/
static void
zfs_retire_recv(fmd_hdl_t *hdl, fmd_event_t *ep, nvlist_t *nvl,
const char *class)
{
(void) ep;
uint64_t pool_guid, vdev_guid;
zpool_handle_t *zhp;
nvlist_t *resource, *fault;
@@ -308,18 +375,23 @@ zfs_retire_recv(fmd_hdl_t *hdl, fmd_event_t *ep, nvlist_t *nvl,
libzfs_handle_t *zhdl = zdp->zrd_hdl;
boolean_t fault_device, degrade_device;
boolean_t is_repair;
char *scheme;
boolean_t l2arc = B_FALSE;
boolean_t spare = B_FALSE;
const char *scheme;
nvlist_t *vdev = NULL;
char *uuid;
const char *uuid;
int repair_done = 0;
boolean_t retire;
boolean_t is_disk;
vdev_aux_t aux;
uint64_t state = 0;
vdev_stat_t *vs;
unsigned int c;
fmd_hdl_debug(hdl, "zfs_retire_recv: '%s'", class);
nvlist_lookup_uint64(nvl, FM_EREPORT_PAYLOAD_ZFS_VDEV_STATE, &state);
(void) nvlist_lookup_uint64(nvl, FM_EREPORT_PAYLOAD_ZFS_VDEV_STATE,
&state);
/*
* If this is a resource notifying us of device removal then simply
@@ -328,14 +400,35 @@ zfs_retire_recv(fmd_hdl_t *hdl, fmd_event_t *ep, nvlist_t *nvl,
*/
if (strcmp(class, "resource.fs.zfs.removed") == 0 ||
(strcmp(class, "resource.fs.zfs.statechange") == 0 &&
state == VDEV_STATE_REMOVED)) {
char *devtype;
(state == VDEV_STATE_REMOVED || state == VDEV_STATE_FAULTED))) {
const char *devtype;
char *devname;
if (nvlist_lookup_string(nvl, FM_EREPORT_PAYLOAD_ZFS_VDEV_TYPE,
&devtype) == 0) {
if (strcmp(devtype, VDEV_TYPE_SPARE) == 0)
spare = B_TRUE;
else if (strcmp(devtype, VDEV_TYPE_L2CACHE) == 0)
l2arc = B_TRUE;
}
if (nvlist_lookup_uint64(nvl,
FM_EREPORT_PAYLOAD_ZFS_VDEV_GUID, &vdev_guid) != 0)
return;
if (vdev_guid == 0) {
fmd_hdl_debug(hdl, "Got a zero GUID");
return;
}
if (spare) {
int nspares = find_and_remove_spares(zhdl, vdev_guid);
fmd_hdl_debug(hdl, "%d spares removed", nspares);
return;
}
if (nvlist_lookup_uint64(nvl, FM_EREPORT_PAYLOAD_ZFS_POOL_GUID,
&pool_guid) != 0 ||
nvlist_lookup_uint64(nvl, FM_EREPORT_PAYLOAD_ZFS_VDEV_GUID,
&vdev_guid) != 0)
&pool_guid) != 0)
return;
if ((zhp = find_by_guid(zhdl, pool_guid, vdev_guid,
@@ -344,13 +437,30 @@ zfs_retire_recv(fmd_hdl_t *hdl, fmd_event_t *ep, nvlist_t *nvl,
devname = zpool_vdev_name(NULL, zhp, vdev, B_FALSE);
/* Can't replace l2arc with a spare: offline the device */
if (nvlist_lookup_string(nvl, FM_EREPORT_PAYLOAD_ZFS_VDEV_TYPE,
&devtype) == 0 && strcmp(devtype, VDEV_TYPE_L2CACHE) == 0) {
fmd_hdl_debug(hdl, "zpool_vdev_offline '%s'", devname);
zpool_vdev_offline(zhp, devname, B_TRUE);
} else if (!fmd_prop_get_int32(hdl, "spare_on_remove") ||
replace_with_spare(hdl, zhp, vdev) == B_FALSE) {
nvlist_lookup_uint64_array(vdev, ZPOOL_CONFIG_VDEV_STATS,
(uint64_t **)&vs, &c);
/*
* If state removed is requested for already removed vdev,
* its a loopback event from spa_async_remove(). Just
* ignore it.
*/
if ((vs->vs_state == VDEV_STATE_REMOVED && state ==
VDEV_STATE_REMOVED) || vs->vs_state == VDEV_STATE_OFFLINE)
return;
/* Remove the vdev since device is unplugged */
int remove_status = 0;
if (l2arc || (strcmp(class, "resource.fs.zfs.removed") == 0)) {
remove_status = zpool_vdev_remove_wanted(zhp, devname);
fmd_hdl_debug(hdl, "zpool_vdev_remove_wanted '%s'"
", err:%d", devname, libzfs_errno(zhdl));
}
/* Replace the vdev with a spare if its not a l2arc */
if (!l2arc && !remove_status &&
(!fmd_prop_get_int32(hdl, "spare_on_remove") ||
replace_with_spare(hdl, zhp, vdev) == B_FALSE)) {
/* Could not handle with spare */
fmd_hdl_debug(hdl, "no spare for '%s'", devname);
}
@@ -364,7 +474,7 @@ zfs_retire_recv(fmd_hdl_t *hdl, fmd_event_t *ep, nvlist_t *nvl,
return;
/*
* Note: on zfsonlinux statechange events are more than just
* Note: on Linux statechange events are more than just
* healthy ones so we need to confirm the actual state value.
*/
if (strcmp(class, "resource.fs.zfs.statechange") == 0 &&
@@ -413,6 +523,9 @@ zfs_retire_recv(fmd_hdl_t *hdl, fmd_event_t *ep, nvlist_t *nvl,
} else if (fmd_nvl_class_match(hdl, fault,
"fault.fs.zfs.vdev.checksum")) {
degrade_device = B_TRUE;
} else if (fmd_nvl_class_match(hdl, fault,
"fault.fs.zfs.vdev.slow_io")) {
degrade_device = B_TRUE;
} else if (fmd_nvl_class_match(hdl, fault,
"fault.fs.zfs.device")) {
fault_device = B_FALSE;
@@ -499,6 +612,7 @@ zfs_retire_recv(fmd_hdl_t *hdl, fmd_event_t *ep, nvlist_t *nvl,
* Attempt to substitute a hot spare.
*/
(void) replace_with_spare(hdl, zhp, vdev);
zpool_close(zhp);
}
+30 -26
View File
@@ -1,9 +1,9 @@
/*
* This file is part of the ZFS Event Daemon (ZED)
* for ZFS on Linux (ZoL) <http://zfsonlinux.org/>.
* This file is part of the ZFS Event Daemon (ZED).
*
* Developed at Lawrence Livermore National Laboratory (LLNL-CODE-403049).
* Copyright (C) 2013-2014 Lawrence Livermore National Security, LLC.
* Refer to the ZoL git commit log for authoritative copyright attribution.
* Refer to the OpenZFS git commit log for authoritative copyright attribution.
*
* The contents of this file are subject to the terms of the
* Common Development and Distribution License Version 1.0 (CDDL-1.0).
@@ -36,6 +36,7 @@ static volatile sig_atomic_t _got_hup = 0;
static void
_exit_handler(int signum)
{
(void) signum;
_got_exit = 1;
}
@@ -45,6 +46,7 @@ _exit_handler(int signum)
static void
_hup_handler(int signum)
{
(void) signum;
_got_hup = 1;
}
@@ -60,8 +62,8 @@ _setup_sig_handlers(void)
zed_log_die("Failed to initialize sigset");
sa.sa_flags = SA_RESTART;
sa.sa_handler = SIG_IGN;
sa.sa_handler = SIG_IGN;
if (sigaction(SIGPIPE, &sa, NULL) < 0)
zed_log_die("Failed to ignore SIGPIPE");
@@ -75,6 +77,10 @@ _setup_sig_handlers(void)
sa.sa_handler = _hup_handler;
if (sigaction(SIGHUP, &sa, NULL) < 0)
zed_log_die("Failed to register SIGHUP handler");
(void) sigaddset(&sa.sa_mask, SIGCHLD);
if (pthread_sigmask(SIG_BLOCK, &sa.sa_mask, NULL) < 0)
zed_log_die("Failed to block SIGCHLD");
}
/*
@@ -212,22 +218,20 @@ _finish_daemonize(void)
int
main(int argc, char *argv[])
{
struct zed_conf *zcp;
struct zed_conf zcp;
uint64_t saved_eid;
int64_t saved_etime[2];
zed_log_init(argv[0]);
zed_log_stderr_open(LOG_NOTICE);
zcp = zed_conf_create();
zed_conf_parse_opts(zcp, argc, argv);
if (zcp->do_verbose)
zed_conf_init(&zcp);
zed_conf_parse_opts(&zcp, argc, argv);
if (zcp.do_verbose)
zed_log_stderr_open(LOG_INFO);
if (geteuid() != 0)
zed_log_die("Must be run as root");
zed_conf_parse_file(zcp);
zed_file_close_from(STDERR_FILENO + 1);
(void) umask(0);
@@ -235,32 +239,32 @@ main(int argc, char *argv[])
if (chdir("/") < 0)
zed_log_die("Failed to change to root directory");
if (zed_conf_scan_dir(zcp) < 0)
if (zed_conf_scan_dir(&zcp) < 0)
exit(EXIT_FAILURE);
if (!zcp->do_foreground) {
if (!zcp.do_foreground) {
_start_daemonize();
zed_log_syslog_open(LOG_DAEMON);
}
_setup_sig_handlers();
if (zcp->do_memlock)
if (zcp.do_memlock)
_lock_memory();
if ((zed_conf_write_pid(zcp) < 0) && (!zcp->do_force))
if ((zed_conf_write_pid(&zcp) < 0) && (!zcp.do_force))
exit(EXIT_FAILURE);
if (!zcp->do_foreground)
if (!zcp.do_foreground)
_finish_daemonize();
zed_log_msg(LOG_NOTICE,
"ZFS Event Daemon %s-%s (PID %d)",
ZFS_META_VERSION, ZFS_META_RELEASE, (int)getpid());
if (zed_conf_open_state(zcp) < 0)
if (zed_conf_open_state(&zcp) < 0)
exit(EXIT_FAILURE);
if (zed_conf_read_state(zcp, &saved_eid, saved_etime) < 0)
if (zed_conf_read_state(&zcp, &saved_eid, saved_etime) < 0)
exit(EXIT_FAILURE);
idle:
@@ -269,38 +273,38 @@ idle:
* successful.
*/
do {
if (!zed_event_init(zcp))
if (!zed_event_init(&zcp))
break;
/* Wait for some time and try again. tunable? */
sleep(30);
} while (!_got_exit && zcp->do_idle);
} while (!_got_exit && zcp.do_idle);
if (_got_exit)
goto out;
zed_event_seek(zcp, saved_eid, saved_etime);
zed_event_seek(&zcp, saved_eid, saved_etime);
while (!_got_exit) {
int rv;
if (_got_hup) {
_got_hup = 0;
(void) zed_conf_scan_dir(zcp);
(void) zed_conf_scan_dir(&zcp);
}
rv = zed_event_service(zcp);
rv = zed_event_service(&zcp);
/* ENODEV: When kernel module is unloaded (osx) */
if (rv == ENODEV)
if (rv != 0)
break;
}
zed_log_msg(LOG_NOTICE, "Exiting");
zed_event_fini(zcp);
zed_event_fini(&zcp);
if (zcp->do_idle && !_got_exit)
if (zcp.do_idle && !_got_exit)
goto idle;
out:
zed_conf_destroy(zcp);
zed_conf_destroy(&zcp);
zed_log_fini();
exit(EXIT_SUCCESS);
}
+37 -31
View File
@@ -1,53 +1,59 @@
include $(top_srcdir)/config/Rules.am
include $(top_srcdir)/config/Substfiles.am
EXTRA_DIST += README
zedconfdir = $(sysconfdir)/zfs/zed.d
dist_zedconf_DATA = \
zed-functions.sh \
zed.rc
%D%/zed-functions.sh \
%D%/zed.rc
zedexecdir = $(zfsexecdir)/zed.d
dist_zedexec_SCRIPTS = \
all-debug.sh \
all-syslog.sh \
data-notify.sh \
generic-notify.sh \
resilver_finish-notify.sh \
scrub_finish-notify.sh \
statechange-led.sh \
statechange-notify.sh \
vdev_clear-led.sh \
vdev_attach-led.sh \
pool_import-led.sh \
resilver_finish-start-scrub.sh \
trim_finish-notify.sh
%D%/all-debug.sh \
%D%/all-syslog.sh \
%D%/data-notify.sh \
%D%/deadman-slot_off.sh \
%D%/generic-notify.sh \
%D%/pool_import-led.sh \
%D%/resilver_finish-notify.sh \
%D%/resilver_finish-start-scrub.sh \
%D%/scrub_finish-notify.sh \
%D%/statechange-led.sh \
%D%/statechange-notify.sh \
%D%/statechange-slot_off.sh \
%D%/trim_finish-notify.sh \
%D%/vdev_attach-led.sh \
%D%/vdev_clear-led.sh
nodist_zedexec_SCRIPTS = history_event-zfs-list-cacher.sh
nodist_zedexec_SCRIPTS = \
%D%/history_event-zfs-list-cacher.sh
SUBSTFILES += $(nodist_zedexec_SCRIPTS)
zedconfdefaults = \
all-syslog.sh \
data-notify.sh \
deadman-slot_off.sh \
history_event-zfs-list-cacher.sh \
pool_import-led.sh \
resilver_finish-notify.sh \
resilver_finish-start-scrub.sh \
scrub_finish-notify.sh \
statechange-led.sh \
statechange-notify.sh \
vdev_clear-led.sh \
statechange-slot_off.sh \
vdev_attach-led.sh \
pool_import-led.sh \
resilver_finish-start-scrub.sh
vdev_clear-led.sh
install-data-hook:
dist_noinst_DATA += %D%/README
INSTALL_DATA_HOOKS += zed-install-data-hook
zed-install-data-hook:
$(MKDIR_P) "$(DESTDIR)$(zedconfdir)"
for f in $(zedconfdefaults); do \
test -f "$(DESTDIR)$(zedconfdir)/$${f}" -o \
-L "$(DESTDIR)$(zedconfdir)/$${f}" || \
ln -s "$(zedexecdir)/$${f}" "$(DESTDIR)$(zedconfdir)"; \
set -x; for f in $(zedconfdefaults); do \
[ -f "$(DESTDIR)$(zedconfdir)/$${f}" ] ||\
[ -L "$(DESTDIR)$(zedconfdir)/$${f}" ] || \
$(LN_S) "$(zedexecdir)/$${f}" "$(DESTDIR)$(zedconfdir)"; \
done
chmod 0600 "$(DESTDIR)$(zedconfdir)/zed.rc"
SHELLCHECKSCRIPTS += $(dist_zedconf_DATA) $(dist_zedexec_SCRIPTS) $(nodist_zedexec_SCRIPTS)
$(call SHELLCHECK_OPTS,$(dist_zedconf_DATA) $(dist_zedexec_SCRIPTS) $(nodist_zedexec_SCRIPTS)): SHELLCHECK_SHELL = sh
# False positive: 1>&"${ZED_FLOCK_FD}" looks suspiciously similar to a >&filename bash extension
$(call SHELLCHECK_OPTS,$(dist_zedconf_DATA) $(dist_zedexec_SCRIPTS) $(nodist_zedexec_SCRIPTS)): CHECKBASHISMS_IGNORE = -e 'should be >word 2>&1' -e '&"$${ZED_FLOCK_FD}"'
+7 -10
View File
@@ -1,4 +1,5 @@
#!/bin/sh
# shellcheck disable=SC2154
#
# Log all environment variables to ZED_DEBUG_LOG.
#
@@ -12,15 +13,11 @@
zed_exit_if_ignoring_this_event
lockfile="$(basename -- "${ZED_DEBUG_LOG}").lock"
zed_lock "${ZED_DEBUG_LOG}"
{
env | sort
echo
} 1>&"${ZED_FLOCK_FD}"
zed_unlock "${ZED_DEBUG_LOG}"
umask 077
zed_lock "${lockfile}"
exec >> "${ZED_DEBUG_LOG}"
printenv | sort
echo
exec >&-
zed_unlock "${lockfile}"
exit 0
+42 -4
View File
@@ -1,14 +1,52 @@
#!/bin/sh
# shellcheck disable=SC2154
#
# Copyright (C) 2013-2014 Lawrence Livermore National Security, LLC.
# Copyright (c) 2020 by Delphix. All rights reserved.
#
#
# Log the zevent via syslog.
#
[ -f "${ZED_ZEDLET_DIR}/zed.rc" ] && . "${ZED_ZEDLET_DIR}/zed.rc"
. "${ZED_ZEDLET_DIR}/zed-functions.sh"
zed_exit_if_ignoring_this_event
zed_log_msg "eid=${ZEVENT_EID}" "class=${ZEVENT_SUBCLASS}" \
"${ZEVENT_POOL_GUID:+"pool_guid=${ZEVENT_POOL_GUID}"}" \
"${ZEVENT_VDEV_PATH:+"vdev_path=${ZEVENT_VDEV_PATH}"}" \
"${ZEVENT_VDEV_STATE_STR:+"vdev_state=${ZEVENT_VDEV_STATE_STR}"}"
# build a string of name=value pairs for this event
msg="eid=${ZEVENT_EID} class=${ZEVENT_SUBCLASS}"
if [ "${ZED_SYSLOG_DISPLAY_GUIDS}" = "1" ]; then
[ -n "${ZEVENT_POOL_GUID}" ] && msg="${msg} pool_guid=${ZEVENT_POOL_GUID}"
[ -n "${ZEVENT_VDEV_GUID}" ] && msg="${msg} vdev_guid=${ZEVENT_VDEV_GUID}"
else
[ -n "${ZEVENT_POOL}" ] && msg="${msg} pool='${ZEVENT_POOL}'"
[ -n "${ZEVENT_VDEV_PATH}" ] && msg="${msg} vdev=${ZEVENT_VDEV_PATH##*/}"
fi
# log pool state if state is anything other than 'ACTIVE'
[ -n "${ZEVENT_POOL_STATE_STR}" ] && [ "$ZEVENT_POOL_STATE" -ne 0 ] && \
msg="${msg} pool_state=${ZEVENT_POOL_STATE_STR}"
# Log the following payload nvpairs if they are present
[ -n "${ZEVENT_VDEV_STATE_STR}" ] && msg="${msg} vdev_state=${ZEVENT_VDEV_STATE_STR}"
[ -n "${ZEVENT_CKSUM_ALGORITHM}" ] && msg="${msg} algorithm=${ZEVENT_CKSUM_ALGORITHM}"
[ -n "${ZEVENT_ZIO_SIZE}" ] && msg="${msg} size=${ZEVENT_ZIO_SIZE}"
[ -n "${ZEVENT_ZIO_OFFSET}" ] && msg="${msg} offset=${ZEVENT_ZIO_OFFSET}"
[ -n "${ZEVENT_ZIO_PRIORITY}" ] && msg="${msg} priority=${ZEVENT_ZIO_PRIORITY}"
[ -n "${ZEVENT_ZIO_ERR}" ] && msg="${msg} err=${ZEVENT_ZIO_ERR}"
[ -n "${ZEVENT_ZIO_FLAGS}" ] && msg="${msg} flags=$(printf '0x%x' "${ZEVENT_ZIO_FLAGS}")"
# log delays that are >= 10 milisec
[ -n "${ZEVENT_ZIO_DELAY}" ] && [ "$ZEVENT_ZIO_DELAY" -gt 10000000 ] && \
msg="${msg} delay=$((ZEVENT_ZIO_DELAY / 1000000))ms"
# list the bookmark data together
# shellcheck disable=SC2153
[ -n "${ZEVENT_ZIO_OBJSET}" ] && \
msg="${msg} bookmark=${ZEVENT_ZIO_OBJSET}:${ZEVENT_ZIO_OBJECT}:${ZEVENT_ZIO_LEVEL}:${ZEVENT_ZIO_BLKID}"
zed_log_msg "${msg}"
exit 0
+2 -1
View File
@@ -1,4 +1,5 @@
#!/bin/sh
# shellcheck disable=SC2154
#
# Send notification in response to a DATA error.
#
@@ -25,7 +26,7 @@ zed_rate_limit "${rate_limit_tag}" || exit 3
umask 077
note_subject="ZFS ${ZEVENT_SUBCLASS} error for ${ZEVENT_POOL} on $(hostname)"
note_pathname="${TMPDIR:="/tmp"}/$(basename -- "$0").${ZEVENT_EID}.$$"
note_pathname="$(mktemp)"
{
echo "ZFS has detected a data error:"
echo
+71
View File
@@ -0,0 +1,71 @@
#!/bin/sh
# shellcheck disable=SC3014,SC2154,SC2086,SC2034
#
# Turn off disk's enclosure slot if an I/O is hung triggering the deadman.
#
# It's possible for outstanding I/O to a misbehaving SCSI disk to neither
# promptly complete or return an error. This can occur due to retry and
# recovery actions taken by the SCSI layer, driver, or disk. When it occurs
# the pool will be unresponsive even though there may be sufficient redundancy
# configured to proceeded without this single disk.
#
# When a hung I/O is detected by the kmods it will be posted as a deadman
# event. By default an I/O is considered to be hung after 5 minutes. This
# value can be changed with the zfs_deadman_ziotime_ms module parameter.
# If ZED_POWER_OFF_ENCLOSURE_SLOT_ON_DEADMAN is set the disk's enclosure
# slot will be powered off causing the outstanding I/O to fail. The ZED
# will then handle this like a normal disk failure and FAULT the vdev.
#
# We assume the user will be responsible for turning the slot back on
# after replacing the disk.
#
# Note that this script requires that your enclosure be supported by the
# Linux SCSI Enclosure services (SES) driver. The script will do nothing
# if you have no enclosure, or if your enclosure isn't supported.
#
# Exit codes:
# 0: slot successfully powered off
# 1: enclosure not available
# 2: ZED_POWER_OFF_ENCLOSURE_SLOT_ON_DEADMAN disabled
# 3: System not configured to wait on deadman
# 4: The enclosure sysfs path passed from ZFS does not exist
# 5: Enclosure slot didn't actually turn off after we told it to
[ -f "${ZED_ZEDLET_DIR}/zed.rc" ] && . "${ZED_ZEDLET_DIR}/zed.rc"
. "${ZED_ZEDLET_DIR}/zed-functions.sh"
if [ ! -d /sys/class/enclosure ] ; then
# No JBOD enclosure or NVMe slots
exit 1
fi
if [ "${ZED_POWER_OFF_ENCLOSURE_SLOT_ON_DEADMAN}" != "1" ] ; then
exit 2
fi
if [ "$ZEVENT_POOL_FAILMODE" != "wait" ] ; then
exit 3
fi
if [ ! -f "$ZEVENT_VDEV_ENC_SYSFS_PATH/power_status" ] ; then
exit 4
fi
# Turn off the slot and wait for sysfs to report that the slot is off.
# It can take ~400ms on some enclosures and multiple retries may be needed.
for i in $(seq 1 20) ; do
echo "off" | tee "$ZEVENT_VDEV_ENC_SYSFS_PATH/power_status"
for j in $(seq 1 5) ; do
if [ "$(cat $ZEVENT_VDEV_ENC_SYSFS_PATH/power_status)" == "off" ] ; then
break 2
fi
sleep 0.1
done
done
if [ "$(cat $ZEVENT_VDEV_ENC_SYSFS_PATH/power_status)" != "off" ] ; then
exit 5
fi
zed_log_msg "powered down slot $ZEVENT_VDEV_ENC_SYSFS_PATH for $ZEVENT_VDEV_PATH"
+3 -2
View File
@@ -1,4 +1,5 @@
#!/bin/sh
# shellcheck disable=SC2154
#
# Send notification in response to a given zevent.
#
@@ -23,7 +24,7 @@
# Rate-limit the notification based in part on the filename.
#
rate_limit_tag="${ZEVENT_POOL};${ZEVENT_SUBCLASS};$(basename -- "$0")"
rate_limit_tag="${ZEVENT_POOL};${ZEVENT_SUBCLASS};${0##*/}"
rate_limit_interval="${ZED_NOTIFY_INTERVAL_SECS}"
zed_rate_limit "${rate_limit_tag}" "${rate_limit_interval}" || exit 3
@@ -31,7 +32,7 @@ umask 077
pool_str="${ZEVENT_POOL:+" for ${ZEVENT_POOL}"}"
host_str=" on $(hostname)"
note_subject="ZFS ${ZEVENT_SUBCLASS} event${pool_str}${host_str}"
note_pathname="${TMPDIR:="/tmp"}/$(basename -- "$0").${ZEVENT_EID}.$$"
note_pathname="$(mktemp)"
{
echo "ZFS has posted the following event:"
echo
@@ -1,11 +1,11 @@
#!/bin/sh
# shellcheck disable=SC2154
#
# Track changes to enumerated pools for use in early-boot
set -ef
FSLIST_DIR="@sysconfdir@/zfs/zfs-list.cache"
FSLIST_TMP="@runstatedir@/zfs-list.cache.new"
FSLIST="${FSLIST_DIR}/${ZEVENT_POOL}"
FSLIST="@sysconfdir@/zfs/zfs-list.cache/${ZEVENT_POOL}"
FSLIST_TMP="@runstatedir@/zfs-list.cache@${ZEVENT_POOL}"
# If the pool specific cache file is not writeable, abort
[ -w "${FSLIST}" ] || exit 0
@@ -13,21 +13,21 @@ FSLIST="${FSLIST_DIR}/${ZEVENT_POOL}"
[ -f "${ZED_ZEDLET_DIR}/zed.rc" ] && . "${ZED_ZEDLET_DIR}/zed.rc"
. "${ZED_ZEDLET_DIR}/zed-functions.sh"
zed_exit_if_ignoring_this_event
zed_check_cmd "${ZFS}" sort diff grep
[ "$ZEVENT_SUBCLASS" != "history_event" ] && exit 0
zed_check_cmd "${ZFS}" sort diff
# If we are acting on a snapshot, we have nothing to do
printf '%s' "${ZEVENT_HISTORY_DSNAME}" | grep '@' && exit 0
[ "${ZEVENT_HISTORY_DSNAME%@*}" = "${ZEVENT_HISTORY_DSNAME}" ] || exit 0
# We obtain a lock on zfs-list to avoid any simultaneous writes.
# We lock the output file to avoid simultaneous writes.
# If we run into trouble, log and drop the lock
abort_alter() {
zed_log_msg "Error updating zfs-list.cache!"
zed_unlock zfs-list
zed_log_msg "Error updating zfs-list.cache for ${ZEVENT_POOL}!"
zed_unlock "${FSLIST}"
}
finished() {
zed_unlock zfs-list
zed_unlock "${FSLIST}"
trap - EXIT
exit 0
}
@@ -37,7 +37,7 @@ case "${ZEVENT_HISTORY_INTERNAL_NAME}" in
;;
export)
zed_lock zfs-list
zed_lock "${FSLIST}"
trap abort_alter EXIT
echo > "${FSLIST}"
finished
@@ -63,7 +63,7 @@ case "${ZEVENT_HISTORY_INTERNAL_NAME}" in
;;
esac
zed_lock zfs-list
zed_lock "${FSLIST}"
trap abort_alter EXIT
PROPS="name,mountpoint,canmount,atime,relatime,devices,exec\
@@ -73,13 +73,13 @@ PROPS="name,mountpoint,canmount,atime,relatime,devices,exec\
,org.openzfs.systemd:wanted-by,org.openzfs.systemd:required-by\
,org.openzfs.systemd:nofail,org.openzfs.systemd:ignore"
"${ZFS}" list -H -t filesystem -o $PROPS -r "${ZEVENT_POOL}" > "${FSLIST_TMP}"
"${ZFS}" list -H -t filesystem -o "${PROPS}" -r "${ZEVENT_POOL}" > "${FSLIST_TMP}"
# Sort the output so that it is stable
sort "${FSLIST_TMP}" -o "${FSLIST_TMP}"
# Don't modify the file if it hasn't changed
diff -q "${FSLIST_TMP}" "${FSLIST}" || mv "${FSLIST_TMP}" "${FSLIST}"
diff -q "${FSLIST_TMP}" "${FSLIST}" || cat "${FSLIST_TMP}" > "${FSLIST}"
rm -f "${FSLIST_TMP}"
finished
@@ -1,4 +1,5 @@
#!/bin/sh
# shellcheck disable=SC2154
# resilver_finish-start-scrub.sh
# Run a scrub after a resilver
#
+2 -1
View File
@@ -1,4 +1,5 @@
#!/bin/sh
# shellcheck disable=SC2154
#
# Send notification in response to a RESILVER_FINISH or SCRUB_FINISH.
#
@@ -41,7 +42,7 @@ fi
umask 077
note_subject="ZFS ${ZEVENT_SUBCLASS} event for ${ZEVENT_POOL} on $(hostname)"
note_pathname="${TMPDIR:="/tmp"}/$(basename -- "$0").${ZEVENT_EID}.$$"
note_pathname="$(mktemp)"
{
echo "ZFS has finished a ${action}:"
echo
+115 -50
View File
@@ -1,21 +1,22 @@
#!/bin/sh
# shellcheck disable=SC2154
#
# Turn off/on the VDEV's enclosure fault LEDs when the pool's state changes.
# Turn off/on vdevs' enclosure fault LEDs when their pool's state changes.
#
# Turn the VDEV's fault LED on if it becomes FAULTED, DEGRADED or UNAVAIL.
# Turn the LED off when it's back ONLINE again.
# Turn a vdev's fault LED on if it becomes FAULTED, DEGRADED or UNAVAIL.
# Turn its LED off when it's back ONLINE again.
#
# This script run in two basic modes:
#
# 1. If $ZEVENT_VDEV_ENC_SYSFS_PATH and $ZEVENT_VDEV_STATE_STR are set, then
# only set the LED for that particular VDEV. This is the case for statechange
# only set the LED for that particular vdev. This is the case for statechange
# events and some vdev_* events.
#
# 2. If those vars are not set, then check the state of all VDEVs in the pool
# 2. If those vars are not set, then check the state of all vdevs in the pool
# and set the LEDs accordingly. This is the case for pool_import events.
#
# Note that this script requires that your enclosure be supported by the
# Linux SCSI enclosure services (ses) driver. The script will do nothing
# Linux SCSI Enclosure services (SES) driver. The script will do nothing
# if you have no enclosure, or if your enclosure isn't supported.
#
# Exit codes:
@@ -29,7 +30,8 @@
[ -f "${ZED_ZEDLET_DIR}/zed.rc" ] && . "${ZED_ZEDLET_DIR}/zed.rc"
. "${ZED_ZEDLET_DIR}/zed-functions.sh"
if [ ! -d /sys/class/enclosure ] ; then
if [ ! -d /sys/class/enclosure ] && [ ! -d /sys/bus/pci/slots ] ; then
# No JBOD enclosure or NVMe slots
exit 1
fi
@@ -59,6 +61,10 @@ check_and_set_led()
file="$1"
val="$2"
if [ -z "$val" ]; then
return 0
fi
if [ ! -e "$file" ] ; then
return 3
fi
@@ -66,11 +72,11 @@ check_and_set_led()
# If another process is accessing the LED when we attempt to update it,
# the update will be lost so retry until the LED actually changes or we
# timeout.
for _ in $(seq 1 5); do
for _ in 1 2 3 4 5; do
# We want to check the current state first, since writing to the
# 'fault' entry always causes a SES command, even if the
# current state is already what you want.
current=$(cat "${file}")
read -r current < "${file}"
# On some enclosures if you write 1 to fault, and read it back,
# it will return 2. Treat all non-zero values as 1 for
@@ -85,27 +91,85 @@ check_and_set_led()
else
break
fi
done
done
}
# Fault LEDs for JBODs and NVMe drives are handled a little differently.
#
# On JBODs the fault LED is called 'fault' and on a path like this:
#
# /sys/class/enclosure/0:0:1:0/SLOT 10/fault
#
# On NVMe it's called 'attention' and on a path like this:
#
# /sys/bus/pci/slot/0/attention
#
# This function returns the full path to the fault LED file for a given
# enclosure/slot directory.
#
path_to_led()
{
dir=$1
if [ -f "$dir/fault" ] ; then
echo "$dir/fault"
elif [ -f "$dir/attention" ] ; then
echo "$dir/attention"
fi
}
state_to_val()
{
state="$1"
if [ "$state" = "FAULTED" ] || [ "$state" = "DEGRADED" ] || \
[ "$state" = "UNAVAIL" ] ; then
echo 1
elif [ "$state" = "ONLINE" ] ; then
echo 0
fi
case "$state" in
FAULTED|DEGRADED|UNAVAIL|REMOVED)
echo 1
;;
ONLINE)
echo 0
;;
*)
echo "invalid state: $state"
;;
esac
}
# process_pool ([pool])
#
# Iterate through a pool (or pools) and set the VDEV's enclosure slot LEDs to
# the VDEV's state.
# Given a nvme name like 'nvme0n1', pass back its slot directory
# like "/sys/bus/pci/slots/0"
#
nvme_dev_to_slot()
{
dev="$1"
# Get the address "0000:01:00.0"
read -r address < "/sys/class/block/$dev/device/address"
find /sys/bus/pci/slots -regex '.*/[0-9]+/address$' | \
while read -r sys_addr; do
read -r this_address < "$sys_addr"
# The format of address is a little different between
# /sys/class/block/$dev/device/address and
# /sys/bus/pci/slots/
#
# address= "0000:01:00.0"
# this_address = "0000:01:00"
#
if echo "$address" | grep -Eq ^"$this_address" ; then
echo "${sys_addr%/*}"
break
fi
done
}
# process_pool (pool)
#
# Iterate through a pool and set the vdevs' enclosure slot LEDs to
# those vdevs' state.
#
# Arguments
# pool: Optional pool name. If not specified, iterate though all pools.
# pool: Pool name.
#
# Return
# 0 on success, 3 on missing sysfs path
@@ -113,19 +177,27 @@ state_to_val()
process_pool()
{
pool="$1"
# The output will be the vdevs only (from "grep '/dev/'"):
#
# U45 ONLINE 0 0 0 /dev/sdk 0
# U46 ONLINE 0 0 0 /dev/sdm 0
# U47 ONLINE 0 0 0 /dev/sdn 0
# U50 ONLINE 0 0 0 /dev/sdbn 0
#
ZPOOL_SCRIPTS_AS_ROOT=1 $ZPOOL status -c upath,fault_led "$pool" | grep '/dev/' | (
rc=0
# Lookup all the current LED values and paths in parallel
#shellcheck disable=SC2016
cmd='echo led_token=$(cat "$VDEV_ENC_SYSFS_PATH/fault"),"$VDEV_ENC_SYSFS_PATH",'
out=$($ZPOOL status -vc "$cmd" "$pool" | grep 'led_token=')
#shellcheck disable=SC2034
echo "$out" | while read -r vdev state read write chksum therest; do
while read -r vdev state _ _ _ therest; do
# Read out current LED value and path
tmp=$(echo "$therest" | sed 's/^.*led_token=//g')
vdev_enc_sysfs_path=$(echo "$tmp" | awk -F ',' '{print $2}')
current_val=$(echo "$tmp" | awk -F ',' '{print $1}')
# Get dev name (like 'sda')
dev=$(basename "$(echo "$therest" | awk '{print $(NF-1)}')")
vdev_enc_sysfs_path=$(realpath "/sys/class/block/$dev/device/enclosure_device"*)
if [ ! -d "$vdev_enc_sysfs_path" ] ; then
# This is not a JBOD disk, but it could be a PCI NVMe drive
vdev_enc_sysfs_path=$(nvme_dev_to_slot "$dev")
fi
current_val=$(echo "$therest" | awk '{print $NF}')
if [ "$current_val" != "0" ] ; then
current_val=1
@@ -136,40 +208,33 @@ process_pool()
continue
fi
if [ ! -e "$vdev_enc_sysfs_path/fault" ] ; then
#shellcheck disable=SC2030
rc=1
zed_log_msg "vdev $vdev '$file/fault' doesn't exist"
continue;
led_path=$(path_to_led "$vdev_enc_sysfs_path")
if [ ! -e "$led_path" ] ; then
rc=3
zed_log_msg "vdev $vdev '$led_path' doesn't exist"
continue
fi
val=$(state_to_val "$state")
if [ "$current_val" = "$val" ] ; then
# LED is already set correctly
continue;
continue
fi
if ! check_and_set_led "$vdev_enc_sysfs_path/fault" "$val"; then
rc=1
if ! check_and_set_led "$led_path" "$val"; then
rc=3
fi
done
#shellcheck disable=SC2031
if [ "$rc" = "0" ] ; then
return 0
else
# We didn't see a sysfs entry that we wanted to set
return 3
fi
exit "$rc"; )
}
if [ -n "$ZEVENT_VDEV_ENC_SYSFS_PATH" ] && [ -n "$ZEVENT_VDEV_STATE_STR" ] ; then
# Got a statechange for an individual VDEV
# Got a statechange for an individual vdev
val=$(state_to_val "$ZEVENT_VDEV_STATE_STR")
vdev=$(basename "$ZEVENT_VDEV_PATH")
check_and_set_led "$ZEVENT_VDEV_ENC_SYSFS_PATH/fault" "$val"
ledpath=$(path_to_led "$ZEVENT_VDEV_ENC_SYSFS_PATH")
check_and_set_led "$ledpath" "$val"
else
# Process the entire pool
poolname=$(zed_guid_to_pool "$ZEVENT_POOL_GUID")
+7 -5
View File
@@ -1,4 +1,5 @@
#!/bin/sh
# shellcheck disable=SC2154
#
# CDDL HEADER START
#
@@ -15,7 +16,7 @@
# Send notification in response to a fault induced statechange
#
# ZEVENT_SUBCLASS: 'statechange'
# ZEVENT_VDEV_STATE_STR: 'DEGRADED', 'FAULTED' or 'REMOVED'
# ZEVENT_VDEV_STATE_STR: 'DEGRADED', 'FAULTED', 'REMOVED', or 'UNAVAIL'
#
# Exit codes:
# 0: notification sent
@@ -31,13 +32,14 @@
if [ "${ZEVENT_VDEV_STATE_STR}" != "FAULTED" ] \
&& [ "${ZEVENT_VDEV_STATE_STR}" != "DEGRADED" ] \
&& [ "${ZEVENT_VDEV_STATE_STR}" != "REMOVED" ]; then
&& [ "${ZEVENT_VDEV_STATE_STR}" != "REMOVED" ] \
&& [ "${ZEVENT_VDEV_STATE_STR}" != "UNAVAIL" ]; then
exit 3
fi
umask 077
note_subject="ZFS device fault for pool ${ZEVENT_POOL_GUID} on $(hostname)"
note_pathname="${TMPDIR:="/tmp"}/$(basename -- "$0").${ZEVENT_EID}.$$"
note_subject="ZFS device fault for pool ${ZEVENT_POOL} on $(hostname)"
note_pathname="$(mktemp)"
{
if [ "${ZEVENT_VDEV_STATE_STR}" = "FAULTED" ] ; then
echo "The number of I/O errors associated with a ZFS device exceeded"
@@ -64,7 +66,7 @@ note_pathname="${TMPDIR:="/tmp"}/$(basename -- "$0").${ZEVENT_EID}.$$"
[ -n "${ZEVENT_VDEV_GUID}" ] && echo " vguid: ${ZEVENT_VDEV_GUID}"
[ -n "${ZEVENT_VDEV_DEVID}" ] && echo " devid: ${ZEVENT_VDEV_DEVID}"
echo " pool: ${ZEVENT_POOL_GUID}"
echo " pool: ${ZEVENT_POOL} (${ZEVENT_POOL_GUID})"
} > "${note_pathname}"
+64
View File
@@ -0,0 +1,64 @@
#!/bin/sh
# shellcheck disable=SC3014,SC2154,SC2086,SC2034
#
# Turn off disk's enclosure slot if it becomes FAULTED.
#
# Bad SCSI disks can often "disappear and reappear" causing all sorts of chaos
# as they flip between FAULTED and ONLINE. If
# ZED_POWER_OFF_ENCLOSURE_SLOT_ON_FAULT is set in zed.rc, and the disk gets
# FAULTED, then power down the slot via sysfs:
#
# /sys/class/enclosure/<enclosure>/<slot>/power_status
#
# We assume the user will be responsible for turning the slot back on again.
#
# Note that this script requires that your enclosure be supported by the
# Linux SCSI Enclosure services (SES) driver. The script will do nothing
# if you have no enclosure, or if your enclosure isn't supported.
#
# Exit codes:
# 0: slot successfully powered off
# 1: enclosure not available
# 2: ZED_POWER_OFF_ENCLOSURE_SLOT_ON_FAULT disabled
# 3: vdev was not FAULTED
# 4: The enclosure sysfs path passed from ZFS does not exist
# 5: Enclosure slot didn't actually turn off after we told it to
[ -f "${ZED_ZEDLET_DIR}/zed.rc" ] && . "${ZED_ZEDLET_DIR}/zed.rc"
. "${ZED_ZEDLET_DIR}/zed-functions.sh"
if [ ! -d /sys/class/enclosure ] ; then
# No JBOD enclosure or NVMe slots
exit 1
fi
if [ "${ZED_POWER_OFF_ENCLOSURE_SLOT_ON_FAULT}" != "1" ] ; then
exit 2
fi
if [ "$ZEVENT_VDEV_STATE_STR" != "FAULTED" ] ; then
exit 3
fi
if [ ! -f "$ZEVENT_VDEV_ENC_SYSFS_PATH/power_status" ] ; then
exit 4
fi
# Turn off the slot and wait for sysfs to report that the slot is off.
# It can take ~400ms on some enclosures and multiple retries may be needed.
for i in $(seq 1 20) ; do
echo "off" | tee "$ZEVENT_VDEV_ENC_SYSFS_PATH/power_status"
for j in $(seq 1 5) ; do
if [ "$(cat $ZEVENT_VDEV_ENC_SYSFS_PATH/power_status)" == "off" ] ; then
break 2
fi
sleep 0.1
done
done
if [ "$(cat $ZEVENT_VDEV_ENC_SYSFS_PATH/power_status)" != "off" ] ; then
exit 5
fi
zed_log_msg "powered down slot $ZEVENT_VDEV_ENC_SYSFS_PATH for $ZEVENT_VDEV_PATH"
+2 -1
View File
@@ -1,4 +1,5 @@
#!/bin/sh
# shellcheck disable=SC2154
#
# Send notification in response to a TRIM_FINISH. The event
# will be received for each vdev in the pool which was trimmed.
@@ -19,7 +20,7 @@ zed_check_cmd "${ZPOOL}" || exit 9
umask 077
note_subject="ZFS ${ZEVENT_SUBCLASS} event for ${ZEVENT_POOL} on $(hostname)"
note_pathname="${TMPDIR:="/tmp"}/$(basename -- "$0").${ZEVENT_EID}.$$"
note_pathname="$(mktemp)"
{
echo "ZFS has finished a trim:"
echo
+303 -22
View File
@@ -1,5 +1,5 @@
#!/bin/sh
# shellcheck disable=SC2039
# shellcheck disable=SC2154,SC3043
# zed-functions.sh
#
# ZED helper functions for use in ZEDLETs
@@ -76,8 +76,7 @@ zed_log_msg()
#
zed_log_err()
{
logger -p "${ZED_SYSLOG_PRIORITY}" -t "${ZED_SYSLOG_TAG}" -- "error:" \
"$(basename -- "$0"):""${ZEVENT_EID:+" eid=${ZEVENT_EID}:"}" "$@"
zed_log_msg "error: ${0##*/}:""${ZEVENT_EID:+" eid=${ZEVENT_EID}:"}" "$@"
}
@@ -126,10 +125,8 @@ zed_lock()
# Obtain a lock on the file bound to the given file descriptor.
#
eval "exec ${fd}> '${lockfile}'"
err="$(flock --exclusive "${fd}" 2>&1)"
# shellcheck disable=SC2181
if [ $? -ne 0 ]; then
eval "exec ${fd}>> '${lockfile}'"
if ! err="$(flock --exclusive "${fd}" 2>&1)"; then
zed_log_err "failed to lock \"${lockfile}\": ${err}"
fi
@@ -165,9 +162,7 @@ zed_unlock()
fi
# Release the lock and close the file descriptor.
err="$(flock --unlock "${fd}" 2>&1)"
# shellcheck disable=SC2181
if [ $? -ne 0 ]; then
if ! err="$(flock --unlock "${fd}" 2>&1)"; then
zed_log_err "failed to unlock \"${lockfile}\": ${err}"
fi
eval "exec ${fd}>&-"
@@ -206,6 +201,18 @@ zed_notify()
[ "${rv}" -eq 0 ] && num_success=$((num_success + 1))
[ "${rv}" -eq 1 ] && num_failure=$((num_failure + 1))
zed_notify_pushover "${subject}" "${pathname}"; rv=$?
[ "${rv}" -eq 0 ] && num_success=$((num_success + 1))
[ "${rv}" -eq 1 ] && num_failure=$((num_failure + 1))
zed_notify_ntfy "${subject}" "${pathname}"; rv=$?
[ "${rv}" -eq 0 ] && num_success=$((num_success + 1))
[ "${rv}" -eq 1 ] && num_failure=$((num_failure + 1))
zed_notify_gotify "${subject}" "${pathname}"; rv=$?
[ "${rv}" -eq 0 ] && num_success=$((num_success + 1))
[ "${rv}" -eq 1 ] && num_failure=$((num_failure + 1))
[ "${num_success}" -gt 0 ] && return 0
[ "${num_failure}" -gt 0 ] && return 1
return 2
@@ -224,6 +231,8 @@ zed_notify()
# ZED_EMAIL_OPTS. This undergoes the following keyword substitutions:
# - @ADDRESS@ is replaced with the space-delimited recipient email address(es)
# - @SUBJECT@ is replaced with the notification subject
# If @SUBJECT@ was omited here, a "Subject: ..." header will be added to notification
#
#
# Arguments
# subject: notification subject
@@ -241,7 +250,7 @@ zed_notify()
#
zed_notify_email()
{
local subject="$1"
local subject="${1:-"ZED notification"}"
local pathname="${2:-"/dev/null"}"
: "${ZED_EMAIL_PROG:="mail"}"
@@ -258,19 +267,30 @@ zed_notify_email()
[ -n "${subject}" ] || return 1
if [ ! -r "${pathname}" ]; then
zed_log_err \
"$(basename "${ZED_EMAIL_PROG}") cannot read \"${pathname}\""
"${ZED_EMAIL_PROG##*/} cannot read \"${pathname}\""
return 1
fi
ZED_EMAIL_OPTS="$(echo "${ZED_EMAIL_OPTS}" \
# construct cmdline options
ZED_EMAIL_OPTS_PARSED="$(echo "${ZED_EMAIL_OPTS}" \
| sed -e "s/@ADDRESS@/${ZED_EMAIL_ADDR}/g" \
-e "s/@SUBJECT@/${subject}/g")"
# shellcheck disable=SC2086
eval "${ZED_EMAIL_PROG}" ${ZED_EMAIL_OPTS} < "${pathname}" >/dev/null 2>&1
# pipe message to email prog
# shellcheck disable=SC2086,SC2248
{
# no subject passed as option?
if [ "${ZED_EMAIL_OPTS%@SUBJECT@*}" = "${ZED_EMAIL_OPTS}" ] ; then
# inject subject header
printf "Subject: %s\n" "${subject}"
fi
# output message
cat "${pathname}"
} |
eval ${ZED_EMAIL_PROG} ${ZED_EMAIL_OPTS_PARSED} >/dev/null 2>&1
rv=$?
if [ "${rv}" -ne 0 ]; then
zed_log_err "$(basename "${ZED_EMAIL_PROG}") exit=${rv}"
zed_log_err "${ZED_EMAIL_PROG##*/} exit=${rv}"
return 1
fi
return 0
@@ -367,7 +387,7 @@ zed_notify_pushbullet()
#
# Notification via Slack Webhook <https://api.slack.com/incoming-webhooks>.
# The Webhook URL (ZED_SLACK_WEBHOOK_URL) identifies this client to the
# Slack channel.
# Slack channel.
#
# Requires awk, curl, and sed executables to be installed in the standard PATH.
#
@@ -417,7 +437,7 @@ zed_notify_slack_webhook()
# Construct the JSON message for posting.
#
msg_json="$(printf '{"text": "*%s*\n%s"}' "${subject}" "${msg_body}" )"
msg_json="$(printf '{"text": "*%s*\\n%s"}' "${subject}" "${msg_body}" )"
# Send the POST request and check for errors.
#
@@ -437,6 +457,269 @@ zed_notify_slack_webhook()
return 0
}
# zed_notify_pushover (subject, pathname)
#
# Send a notification via Pushover <https://pushover.net/>.
# The access token (ZED_PUSHOVER_TOKEN) identifies this client to the
# Pushover server. The user token (ZED_PUSHOVER_USER) defines the user or
# group to which the notification will be sent.
#
# Requires curl and sed executables to be installed in the standard PATH.
#
# References
# https://pushover.net/api
#
# Arguments
# subject: notification subject
# pathname: pathname containing the notification message (OPTIONAL)
#
# Globals
# ZED_PUSHOVER_TOKEN
# ZED_PUSHOVER_USER
#
# Return
# 0: notification sent
# 1: notification failed
# 2: not configured
#
zed_notify_pushover()
{
local subject="$1"
local pathname="${2:-"/dev/null"}"
local msg_body
local msg_out
local msg_err
local url="https://api.pushover.net/1/messages.json"
[ -n "${ZED_PUSHOVER_TOKEN}" ] && [ -n "${ZED_PUSHOVER_USER}" ] || return 2
if [ ! -r "${pathname}" ]; then
zed_log_err "pushover cannot read \"${pathname}\""
return 1
fi
zed_check_cmd "curl" "sed" || return 1
# Read the message body in.
#
msg_body="$(cat "${pathname}")"
if [ -z "${msg_body}" ]
then
msg_body=$subject
subject=""
fi
# Send the POST request and check for errors.
#
msg_out="$( \
curl \
--form-string "token=${ZED_PUSHOVER_TOKEN}" \
--form-string "user=${ZED_PUSHOVER_USER}" \
--form-string "message=${msg_body}" \
--form-string "title=${subject}" \
"${url}" \
2>/dev/null \
)"; rv=$?
if [ "${rv}" -ne 0 ]; then
zed_log_err "curl exit=${rv}"
return 1
fi
msg_err="$(echo "${msg_out}" \
| sed -n -e 's/.*"errors" *:.*\[\(.*\)\].*/\1/p')"
if [ -n "${msg_err}" ]; then
zed_log_err "pushover \"${msg_err}"\"
return 1
fi
return 0
}
# zed_notify_ntfy (subject, pathname)
#
# Send a notification via Ntfy.sh <https://ntfy.sh/>.
# The ntfy topic (ZED_NTFY_TOPIC) identifies the topic that the notification
# will be sent to Ntfy.sh server. The ntfy url (ZED_NTFY_URL) defines the
# self-hosted or provided hosted ntfy service location. The ntfy access token
# <https://docs.ntfy.sh/publish/#access-tokens> (ZED_NTFY_ACCESS_TOKEN) reprsents an
# access token that could be used if a topic is read/write protected. If a
# topic can be written to publicaly, a ZED_NTFY_ACCESS_TOKEN is not required.
#
# Requires curl and sed executables to be installed in the standard PATH.
#
# References
# https://docs.ntfy.sh
#
# Arguments
# subject: notification subject
# pathname: pathname containing the notification message (OPTIONAL)
#
# Globals
# ZED_NTFY_TOPIC
# ZED_NTFY_ACCESS_TOKEN (OPTIONAL)
# ZED_NTFY_URL
#
# Return
# 0: notification sent
# 1: notification failed
# 2: not configured
#
zed_notify_ntfy()
{
local subject="$1"
local pathname="${2:-"/dev/null"}"
local msg_body
local msg_out
local msg_err
[ -n "${ZED_NTFY_TOPIC}" ] || return 2
local url="${ZED_NTFY_URL:-"https://ntfy.sh"}/${ZED_NTFY_TOPIC}"
if [ ! -r "${pathname}" ]; then
zed_log_err "ntfy cannot read \"${pathname}\""
return 1
fi
zed_check_cmd "curl" "sed" || return 1
# Read the message body in.
#
msg_body="$(cat "${pathname}")"
if [ -z "${msg_body}" ]
then
msg_body=$subject
subject=""
fi
# Send the POST request and check for errors.
#
if [ -n "${ZED_NTFY_ACCESS_TOKEN}" ]; then
msg_out="$( \
curl \
-u ":${ZED_NTFY_ACCESS_TOKEN}" \
-H "Title: ${subject}" \
-d "${msg_body}" \
-H "Priority: high" \
"${url}" \
2>/dev/null \
)"; rv=$?
else
msg_out="$( \
curl \
-H "Title: ${subject}" \
-d "${msg_body}" \
-H "Priority: high" \
"${url}" \
2>/dev/null \
)"; rv=$?
fi
if [ "${rv}" -ne 0 ]; then
zed_log_err "curl exit=${rv}"
return 1
fi
msg_err="$(echo "${msg_out}" \
| sed -n -e 's/.*"errors" *:.*\[\(.*\)\].*/\1/p')"
if [ -n "${msg_err}" ]; then
zed_log_err "ntfy \"${msg_err}"\"
return 1
fi
return 0
}
# zed_notify_gotify (subject, pathname)
#
# Send a notification via Gotify <https://gotify.net/>.
# The Gotify URL (ZED_GOTIFY_URL) defines a self-hosted Gotify location.
# The Gotify application token (ZED_GOTIFY_APPTOKEN) defines a
# Gotify application token which is associated with a message.
# The optional Gotify priority value (ZED_GOTIFY_PRIORITY) overrides the
# default or configured priority at the Gotify server for the application.
#
# Requires curl and sed executables to be installed in the standard PATH.
#
# References
# https://gotify.net/docs/index
#
# Arguments
# subject: notification subject
# pathname: pathname containing the notification message (OPTIONAL)
#
# Globals
# ZED_GOTIFY_URL
# ZED_GOTIFY_APPTOKEN
# ZED_GOTIFY_PRIORITY
#
# Return
# 0: notification sent
# 1: notification failed
# 2: not configured
#
zed_notify_gotify()
{
local subject="$1"
local pathname="${2:-"/dev/null"}"
local msg_body
local msg_out
local msg_err
[ -n "${ZED_GOTIFY_URL}" ] && [ -n "${ZED_GOTIFY_APPTOKEN}" ] || return 2
local url="${ZED_GOTIFY_URL}/message?token=${ZED_GOTIFY_APPTOKEN}"
if [ ! -r "${pathname}" ]; then
zed_log_err "gotify cannot read \"${pathname}\""
return 1
fi
zed_check_cmd "curl" "sed" || return 1
# Read the message body in.
#
msg_body="$(cat "${pathname}")"
if [ -z "${msg_body}" ]
then
msg_body=$subject
subject=""
fi
# Send the POST request and check for errors.
#
if [ -n "${ZED_GOTIFY_PRIORITY}" ]; then
msg_out="$( \
curl \
--form-string "title=${subject}" \
--form-string "message=${msg_body}" \
--form-string "priority=${ZED_GOTIFY_PRIORITY}" \
"${url}" \
2>/dev/null \
)"; rv=$?
else
msg_out="$( \
curl \
--form-string "title=${subject}" \
--form-string "message=${msg_body}" \
"${url}" \
2>/dev/null \
)"; rv=$?
fi
if [ "${rv}" -ne 0 ]; then
zed_log_err "curl exit=${rv}"
return 1
fi
msg_err="$(echo "${msg_out}" \
| sed -n -e 's/.*"errors" *:.*\[\(.*\)\].*/\1/p')"
if [ -n "${msg_err}" ]; then
zed_log_err "gotify \"${msg_err}"\"
return 1
fi
return 0
}
# zed_rate_limit (tag, [interval])
#
# Check whether an event of a given type [tag] has already occurred within the
@@ -511,10 +794,8 @@ zed_guid_to_pool()
return
fi
guid=$(printf "%llu" "$1")
if [ -n "$guid" ] ; then
$ZPOOL get -H -ovalue,name guid | awk '$1=='"$guid"' {print $2}'
fi
guid="$(printf "%u" "$1")"
$ZPOOL get -H -ovalue,name guid | awk '$1 == '"$guid"' {print $2; exit}'
}
# zed_exit_if_ignoring_this_event
+85 -8
View File
@@ -1,8 +1,7 @@
##
# zed.rc
#
# This file should be owned by root and permissioned 0600.
# zed.rc ZEDLET configuration.
##
# shellcheck disable=SC2034
##
# Absolute path to the debug output file.
@@ -13,9 +12,9 @@
# Email address of the zpool administrator for receipt of notifications;
# multiple addresses can be specified if they are delimited by whitespace.
# Email will only be sent if ZED_EMAIL_ADDR is defined.
# Disabled by default; uncomment to enable.
# Enabled by default; comment to disable.
#
#ZED_EMAIL_ADDR="root"
ZED_EMAIL_ADDR="root"
##
# Name or path of executable responsible for sending notifications via email;
@@ -30,6 +29,7 @@
# The string @SUBJECT@ will be replaced with the notification subject;
# this should be protected with quotes to prevent word-splitting.
# Email will only be sent if ZED_EMAIL_ADDR is defined.
# If @SUBJECT@ was omited here, a "Subject: ..." header will be added to notification
#
#ZED_EMAIL_OPTS="-s '@SUBJECT@' @ADDRESS@"
@@ -82,6 +82,23 @@
#
#ZED_SLACK_WEBHOOK_URL=""
##
# Pushover token.
# This defines the application from which the notification will be sent.
# <https://pushover.net/api#registration>
# Disabled by default; uncomment to enable.
# ZED_PUSHOVER_USER, below, must also be configured.
#
#ZED_PUSHOVER_TOKEN=""
##
# Pushover user key.
# This defines which user or group will receive Pushover notifications.
# <https://pushover.net/api#identifiers>
# Disabled by default; uncomment to enable.
# ZED_PUSHOVER_TOKEN, above, must also be configured.
#ZED_PUSHOVER_USER=""
##
# Default directory for zed state files.
#
@@ -89,8 +106,8 @@
##
# Turn on/off enclosure LEDs when drives get DEGRADED/FAULTED. This works for
# device mapper and multipath devices as well. Your enclosure must be
# supported by the Linux SES driver for this to work.
# device mapper and multipath devices as well. This works with JBOD enclosures
# and NVMe PCI drives (assuming they're supported by Linux in sysfs).
#
ZED_USE_ENCLOSURE_LEDS=1
@@ -118,5 +135,65 @@ ZED_USE_ENCLOSURE_LEDS=1
# Otherwise, if ZED_SYSLOG_SUBCLASS_EXCLUDE is set, the
# matching subclasses are excluded from logging.
#ZED_SYSLOG_SUBCLASS_INCLUDE="checksum|scrub_*|vdev.*"
#ZED_SYSLOG_SUBCLASS_EXCLUDE="statechange|config_*|history_event"
ZED_SYSLOG_SUBCLASS_EXCLUDE="history_event"
##
# Use GUIDs instead of names when logging pool and vdevs
# Disabled by default, 1 to enable and 0 to disable.
#ZED_SYSLOG_DISPLAY_GUIDS=1
##
# Power off the drive's slot in the enclosure if it becomes FAULTED. This can
# help silence misbehaving drives. This assumes your drive enclosure fully
# supports slot power control via sysfs.
#ZED_POWER_OFF_ENCLOSURE_SLOT_ON_FAULT=1
##
# Power off the drive's slot in the enclosure if there is a hung I/O which
# exceeds the deadman timeout. This can help prevent a single misbehaving
# drive from rendering a redundant pool unavailable. This assumes your drive
# enclosure fully supports slot power control via sysfs.
#ZED_POWER_OFF_ENCLOSURE_SLOT_ON_DEADMAN=1
##
# Ntfy topic
# This defines which topic will receive the ntfy notification.
# <https://docs.ntfy.sh/publish/>
# Disabled by default; uncomment to enable.
#ZED_NTFY_TOPIC=""
##
# Ntfy access token (optional for public topics)
# This defines an access token which can be used
# to allow you to authenticate when sending to topics
# <https://docs.ntfy.sh/publish/#access-tokens>
# Disabled by default; uncomment to enable.
#ZED_NTFY_ACCESS_TOKEN=""
##
# Ntfy Service URL
# This defines which service the ntfy call will be directed toward
# <https://docs.ntfy.sh/install/>
# https://ntfy.sh by default; uncomment to enable an alternative service url.
#ZED_NTFY_URL="https://ntfy.sh"
##
# Gotify server URL
# This defines a URL that the Gotify call will be directed toward.
# <https://gotify.net/docs/index>
# Disabled by default; uncomment to enable.
#ZED_GOTIFY_URL=""
##
# Gotify application token
# This defines a Gotify application token which a message is associated with.
# This token is generated when an application is created on the Gotify server.
# Disabled by default; uncomment to enable.
#ZED_GOTIFY_APPTOKEN=""
##
# Gotify priority (optional)
# If defined, this overrides the default priority of the
# Gotify application associated with ZED_GOTIFY_APPTOKEN.
# Value is an integer 0 and up.
#ZED_GOTIFY_PRIORITY=""
+3 -18
View File
@@ -1,9 +1,9 @@
/*
* This file is part of the ZFS Event Daemon (ZED)
* for ZFS on Linux (ZoL) <http://zfsonlinux.org/>.
* This file is part of the ZFS Event Daemon (ZED).
*
* Developed at Lawrence Livermore National Laboratory (LLNL-CODE-403049).
* Copyright (C) 2013-2014 Lawrence Livermore National Security, LLC.
* Refer to the ZoL git commit log for authoritative copyright attribution.
* Refer to the OpenZFS git commit log for authoritative copyright attribution.
*
* The contents of this file are subject to the terms of the
* Common Development and Distribution License Version 1.0 (CDDL-1.0).
@@ -15,11 +15,6 @@
#ifndef ZED_H
#define ZED_H
/*
* Absolute path for the default zed configuration file.
*/
#define ZED_CONF_FILE SYSCONFDIR "/zfs/zed.conf"
/*
* Absolute path for the default zed pid file.
*/
@@ -35,16 +30,6 @@
*/
#define ZED_ZEDLET_DIR SYSCONFDIR "/zfs/zed.d"
/*
* Reserved for future use.
*/
#define ZED_MAX_EVENTS 0
/*
* Reserved for future use.
*/
#define ZED_MIN_EVENTS 0
/*
* String prefix for ZED variables passed via environment variables.
*/
+113 -128
View File
@@ -1,9 +1,9 @@
/*
* This file is part of the ZFS Event Daemon (ZED)
* for ZFS on Linux (ZoL) <http://zfsonlinux.org/>.
* This file is part of the ZFS Event Daemon (ZED).
*
* Developed at Lawrence Livermore National Laboratory (LLNL-CODE-403049).
* Copyright (C) 2013-2014 Lawrence Livermore National Security, LLC.
* Refer to the ZoL git commit log for authoritative copyright attribution.
* Refer to the OpenZFS git commit log for authoritative copyright attribution.
*
* The contents of this file are subject to the terms of the
* Common Development and Distribution License Version 1.0 (CDDL-1.0).
@@ -22,6 +22,7 @@
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/uio.h>
#include <unistd.h>
@@ -32,43 +33,27 @@
#include "zed_strings.h"
/*
* Return a new configuration with default values.
* Initialise the configuration with default values.
*/
struct zed_conf *
zed_conf_create(void)
void
zed_conf_init(struct zed_conf *zcp)
{
struct zed_conf *zcp;
memset(zcp, 0, sizeof (*zcp));
zcp = calloc(1, sizeof (*zcp));
if (!zcp)
goto nomem;
/* zcp->zfs_hdl opened in zed_event_init() */
/* zcp->zedlets created in zed_conf_scan_dir() */
zcp->syslog_facility = LOG_DAEMON;
zcp->min_events = ZED_MIN_EVENTS;
zcp->max_events = ZED_MAX_EVENTS;
zcp->pid_fd = -1;
zcp->zedlets = NULL; /* created via zed_conf_scan_dir() */
zcp->state_fd = -1; /* opened via zed_conf_open_state() */
zcp->zfs_hdl = NULL; /* opened via zed_event_init() */
zcp->zevent_fd = -1; /* opened via zed_event_init() */
zcp->pid_fd = -1; /* opened in zed_conf_write_pid() */
zcp->state_fd = -1; /* opened in zed_conf_open_state() */
zcp->zevent_fd = -1; /* opened in zed_event_init() */
if (!(zcp->conf_file = strdup(ZED_CONF_FILE)))
goto nomem;
zcp->max_jobs = 16;
zcp->max_zevent_buf_len = 1 << 20;
if (!(zcp->pid_file = strdup(ZED_PID_FILE)))
goto nomem;
if (!(zcp->zedlet_dir = strdup(ZED_ZEDLET_DIR)))
goto nomem;
if (!(zcp->state_file = strdup(ZED_STATE_FILE)))
goto nomem;
return (zcp);
nomem:
zed_log_die("Failed to create conf: %s", strerror(errno));
return (NULL);
if (!(zcp->pid_file = strdup(ZED_PID_FILE)) ||
!(zcp->zedlet_dir = strdup(ZED_ZEDLET_DIR)) ||
!(zcp->state_file = strdup(ZED_STATE_FILE)))
zed_log_die("Failed to create conf: %s", strerror(errno));
}
/*
@@ -79,9 +64,6 @@ nomem:
void
zed_conf_destroy(struct zed_conf *zcp)
{
if (!zcp)
return;
if (zcp->state_fd >= 0) {
if (close(zcp->state_fd) < 0)
zed_log_msg(LOG_WARNING,
@@ -102,10 +84,6 @@ zed_conf_destroy(struct zed_conf *zcp)
zcp->pid_file, strerror(errno));
zcp->pid_fd = -1;
}
if (zcp->conf_file) {
free(zcp->conf_file);
zcp->conf_file = NULL;
}
if (zcp->pid_file) {
free(zcp->pid_file);
zcp->pid_file = NULL;
@@ -122,7 +100,6 @@ zed_conf_destroy(struct zed_conf *zcp)
zed_strings_destroy(zcp->zedlets);
zcp->zedlets = NULL;
}
free(zcp);
}
/*
@@ -132,46 +109,54 @@ zed_conf_destroy(struct zed_conf *zcp)
* otherwise, output to stderr and exit with a failure status.
*/
static void
_zed_conf_display_help(const char *prog, int got_err)
_zed_conf_display_help(const char *prog, boolean_t got_err)
{
struct opt { const char *o, *d, *v; };
FILE *fp = got_err ? stderr : stdout;
int w1 = 4; /* width of leading whitespace */
int w2 = 8; /* width of L-justified option field */
struct opt *oo;
struct opt iopts[] = {
{ .o = "-h", .d = "Display help" },
{ .o = "-L", .d = "Display license information" },
{ .o = "-V", .d = "Display version information" },
{},
};
struct opt nopts[] = {
{ .o = "-v", .d = "Be verbose" },
{ .o = "-f", .d = "Force daemon to run" },
{ .o = "-F", .d = "Run daemon in the foreground" },
{ .o = "-I",
.d = "Idle daemon until kernel module is (re)loaded" },
{ .o = "-M", .d = "Lock all pages in memory" },
{ .o = "-P", .d = "$PATH for ZED to use (only used by ZTS)" },
{ .o = "-Z", .d = "Zero state file" },
{},
};
struct opt vopts[] = {
{ .o = "-d DIR", .d = "Read enabled ZEDLETs from DIR.",
.v = ZED_ZEDLET_DIR },
{ .o = "-p FILE", .d = "Write daemon's PID to FILE.",
.v = ZED_PID_FILE },
{ .o = "-s FILE", .d = "Write daemon's state to FILE.",
.v = ZED_STATE_FILE },
{ .o = "-j JOBS", .d = "Start at most JOBS at once.",
.v = "16" },
{ .o = "-b LEN", .d = "Cap kernel event buffer at LEN entries.",
.v = "1048576" },
{},
};
fprintf(fp, "Usage: %s [OPTION]...\n", (prog ? prog : "zed"));
fprintf(fp, "\n");
fprintf(fp, "%*c%*s %s\n", w1, 0x20, -w2, "-h",
"Display help.");
fprintf(fp, "%*c%*s %s\n", w1, 0x20, -w2, "-L",
"Display license information.");
fprintf(fp, "%*c%*s %s\n", w1, 0x20, -w2, "-V",
"Display version information.");
for (oo = iopts; oo->o; ++oo)
fprintf(fp, " %*s %s\n", -8, oo->o, oo->d);
fprintf(fp, "\n");
fprintf(fp, "%*c%*s %s\n", w1, 0x20, -w2, "-v",
"Be verbose.");
fprintf(fp, "%*c%*s %s\n", w1, 0x20, -w2, "-f",
"Force daemon to run.");
fprintf(fp, "%*c%*s %s\n", w1, 0x20, -w2, "-F",
"Run daemon in the foreground.");
fprintf(fp, "%*c%*s %s\n", w1, 0x20, -w2, "-I",
"Idle daemon until kernel module is (re)loaded.");
fprintf(fp, "%*c%*s %s\n", w1, 0x20, -w2, "-M",
"Lock all pages in memory.");
fprintf(fp, "%*c%*s %s\n", w1, 0x20, -w2, "-P",
"$PATH for ZED to use (only used by ZTS).");
fprintf(fp, "%*c%*s %s\n", w1, 0x20, -w2, "-Z",
"Zero state file.");
for (oo = nopts; oo->o; ++oo)
fprintf(fp, " %*s %s\n", -8, oo->o, oo->d);
fprintf(fp, "\n");
#if 0
fprintf(fp, "%*c%*s %s [%s]\n", w1, 0x20, -w2, "-c FILE",
"Read configuration from FILE.", ZED_CONF_FILE);
#endif
fprintf(fp, "%*c%*s %s [%s]\n", w1, 0x20, -w2, "-d DIR",
"Read enabled ZEDLETs from DIR.", ZED_ZEDLET_DIR);
fprintf(fp, "%*c%*s %s [%s]\n", w1, 0x20, -w2, "-p FILE",
"Write daemon's PID to FILE.", ZED_PID_FILE);
fprintf(fp, "%*c%*s %s [%s]\n", w1, 0x20, -w2, "-s FILE",
"Write daemon's state to FILE.", ZED_STATE_FILE);
for (oo = vopts; oo->o; ++oo)
fprintf(fp, " %*s %s [%s]\n", -8, oo->o, oo->d, oo->v);
fprintf(fp, "\n");
exit(got_err ? EXIT_FAILURE : EXIT_SUCCESS);
@@ -183,20 +168,14 @@ _zed_conf_display_help(const char *prog, int got_err)
static void
_zed_conf_display_license(void)
{
const char **pp;
const char *text[] = {
"The ZFS Event Daemon (ZED) is distributed under the terms of the",
" Common Development and Distribution License (CDDL-1.0)",
" <http://opensource.org/licenses/CDDL-1.0>.",
"",
printf(
"The ZFS Event Daemon (ZED) is distributed under the terms of the\n"
" Common Development and Distribution License (CDDL-1.0)\n"
" <http://opensource.org/licenses/CDDL-1.0>.\n"
"\n"
"Developed at Lawrence Livermore National Laboratory"
" (LLNL-CODE-403049).",
"",
NULL
};
for (pp = text; *pp; pp++)
printf("%s\n", *pp);
" (LLNL-CODE-403049).\n"
"\n");
exit(EXIT_SUCCESS);
}
@@ -231,16 +210,19 @@ _zed_conf_parse_path(char **resultp, const char *path)
if (path[0] == '/') {
*resultp = strdup(path);
} else if (!getcwd(buf, sizeof (buf))) {
zed_log_die("Failed to get current working dir: %s",
strerror(errno));
} else if (strlcat(buf, "/", sizeof (buf)) >= sizeof (buf)) {
zed_log_die("Failed to copy path: %s", strerror(ENAMETOOLONG));
} else if (strlcat(buf, path, sizeof (buf)) >= sizeof (buf)) {
zed_log_die("Failed to copy path: %s", strerror(ENAMETOOLONG));
} else {
if (!getcwd(buf, sizeof (buf)))
zed_log_die("Failed to get current working dir: %s",
strerror(errno));
if (strlcat(buf, "/", sizeof (buf)) >= sizeof (buf) ||
strlcat(buf, path, sizeof (buf)) >= sizeof (buf))
zed_log_die("Failed to copy path: %s",
strerror(ENAMETOOLONG));
*resultp = strdup(buf);
}
if (!*resultp)
zed_log_die("Failed to copy path: %s", strerror(ENOMEM));
}
@@ -251,8 +233,9 @@ _zed_conf_parse_path(char **resultp, const char *path)
void
zed_conf_parse_opts(struct zed_conf *zcp, int argc, char **argv)
{
const char * const opts = ":hLVc:d:p:P:s:vfFMZI";
const char * const opts = ":hLVd:p:P:s:vfFMZIj:b:";
int opt;
unsigned long raw;
if (!zcp || !argv || !argv[0])
zed_log_die("Failed to parse options: Internal error");
@@ -262,7 +245,7 @@ zed_conf_parse_opts(struct zed_conf *zcp, int argc, char **argv)
while ((opt = getopt(argc, argv, opts)) != -1) {
switch (opt) {
case 'h':
_zed_conf_display_help(argv[0], EXIT_SUCCESS);
_zed_conf_display_help(argv[0], B_FALSE);
break;
case 'L':
_zed_conf_display_license();
@@ -270,9 +253,6 @@ zed_conf_parse_opts(struct zed_conf *zcp, int argc, char **argv)
case 'V':
_zed_conf_display_version();
break;
case 'c':
_zed_conf_parse_path(&zcp->conf_file, optarg);
break;
case 'd':
_zed_conf_parse_path(&zcp->zedlet_dir, optarg);
break;
@@ -303,31 +283,41 @@ zed_conf_parse_opts(struct zed_conf *zcp, int argc, char **argv)
case 'Z':
zcp->do_zero = 1;
break;
case 'j':
errno = 0;
raw = strtoul(optarg, NULL, 0);
if (errno == ERANGE || raw > INT16_MAX) {
zed_log_die("%lu is too many jobs", raw);
} if (raw == 0) {
zed_log_die("0 jobs makes no sense");
} else {
zcp->max_jobs = raw;
}
break;
case 'b':
errno = 0;
raw = strtoul(optarg, NULL, 0);
if (errno == ERANGE || raw > INT32_MAX) {
zed_log_die("%lu is too large", raw);
} if (raw == 0) {
zcp->max_zevent_buf_len = INT32_MAX;
} else {
zcp->max_zevent_buf_len = raw;
}
break;
case '?':
default:
if (optopt == '?')
_zed_conf_display_help(argv[0], EXIT_SUCCESS);
_zed_conf_display_help(argv[0], B_FALSE);
fprintf(stderr, "%s: %s '-%c'\n\n", argv[0],
"Invalid option", optopt);
_zed_conf_display_help(argv[0], EXIT_FAILURE);
fprintf(stderr, "%s: Invalid option '-%c'\n\n",
argv[0], optopt);
_zed_conf_display_help(argv[0], B_TRUE);
break;
}
}
}
/*
* Parse the configuration file into the configuration [zcp].
*
* FIXME: Not yet implemented.
*/
void
zed_conf_parse_file(struct zed_conf *zcp)
{
if (!zcp)
zed_log_die("Failed to parse config: %s", strerror(EINVAL));
}
/*
* Scan the [zcp] zedlet_dir for files to exec based on the event class.
* Files must be executable by user, but not writable by group or other.
@@ -335,8 +325,6 @@ zed_conf_parse_file(struct zed_conf *zcp)
*
* Return 0 on success with an updated set of zedlets,
* or -1 on error with errno set.
*
* FIXME: Check if zedlet_dir and all parent dirs are secure.
*/
int
zed_conf_scan_dir(struct zed_conf *zcp)
@@ -452,8 +440,6 @@ zed_conf_scan_dir(struct zed_conf *zcp)
int
zed_conf_write_pid(struct zed_conf *zcp)
{
const mode_t dirmode = S_IRWXU | S_IRGRP | S_IXGRP | S_IROTH | S_IXOTH;
const mode_t filemode = S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH;
char buf[PATH_MAX];
int n;
char *p;
@@ -481,7 +467,7 @@ zed_conf_write_pid(struct zed_conf *zcp)
if (p)
*p = '\0';
if ((mkdirp(buf, dirmode) < 0) && (errno != EEXIST)) {
if ((mkdirp(buf, 0755) < 0) && (errno != EEXIST)) {
zed_log_msg(LOG_ERR, "Failed to create directory \"%s\": %s",
buf, strerror(errno));
goto err;
@@ -491,7 +477,7 @@ zed_conf_write_pid(struct zed_conf *zcp)
*/
mask = umask(0);
umask(mask | 022);
zcp->pid_fd = open(zcp->pid_file, (O_RDWR | O_CREAT), filemode);
zcp->pid_fd = open(zcp->pid_file, O_RDWR | O_CREAT | O_CLOEXEC, 0644);
umask(mask);
if (zcp->pid_fd < 0) {
zed_log_msg(LOG_ERR, "Failed to open PID file \"%s\": %s",
@@ -528,7 +514,7 @@ zed_conf_write_pid(struct zed_conf *zcp)
errno = ERANGE;
zed_log_msg(LOG_ERR, "Failed to write PID file \"%s\": %s",
zcp->pid_file, strerror(errno));
} else if (zed_file_write_n(zcp->pid_fd, buf, n) != n) {
} else if (write(zcp->pid_fd, buf, n) != n) {
zed_log_msg(LOG_ERR, "Failed to write PID file \"%s\": %s",
zcp->pid_file, strerror(errno));
} else if (fdatasync(zcp->pid_fd) < 0) {
@@ -556,7 +542,6 @@ int
zed_conf_open_state(struct zed_conf *zcp)
{
char dirbuf[PATH_MAX];
mode_t dirmode = S_IRWXU | S_IRGRP | S_IXGRP | S_IROTH | S_IXOTH;
int n;
char *p;
int rv;
@@ -578,7 +563,7 @@ zed_conf_open_state(struct zed_conf *zcp)
if (p)
*p = '\0';
if ((mkdirp(dirbuf, dirmode) < 0) && (errno != EEXIST)) {
if ((mkdirp(dirbuf, 0755) < 0) && (errno != EEXIST)) {
zed_log_msg(LOG_WARNING,
"Failed to create directory \"%s\": %s",
dirbuf, strerror(errno));
@@ -596,7 +581,7 @@ zed_conf_open_state(struct zed_conf *zcp)
(void) unlink(zcp->state_file);
zcp->state_fd = open(zcp->state_file,
(O_RDWR | O_CREAT), (S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH));
O_RDWR | O_CREAT | O_CLOEXEC, 0644);
if (zcp->state_fd < 0) {
zed_log_msg(LOG_WARNING, "Failed to open state file \"%s\": %s",
zcp->state_file, strerror(errno));
@@ -672,7 +657,7 @@ zed_conf_read_state(struct zed_conf *zcp, uint64_t *eidp, int64_t etime[])
} else if (n != len) {
errno = EIO;
zed_log_msg(LOG_WARNING,
"Failed to read state file \"%s\": Read %d of %d bytes",
"Failed to read state file \"%s\": Read %zd of %zd bytes",
zcp->state_file, n, len);
return (-1);
}
@@ -721,7 +706,7 @@ zed_conf_write_state(struct zed_conf *zcp, uint64_t eid, int64_t etime[])
if (n != len) {
errno = EIO;
zed_log_msg(LOG_WARNING,
"Failed to write state file \"%s\": Wrote %d of %d bytes",
"Failed to write state file \"%s\": Wrote %zd of %zd bytes",
zcp->state_file, n, len);
return (-1);
}
+20 -23
View File
@@ -1,9 +1,9 @@
/*
* This file is part of the ZFS Event Daemon (ZED)
* for ZFS on Linux (ZoL) <http://zfsonlinux.org/>.
* This file is part of the ZFS Event Daemon (ZED).
*
* Developed at Lawrence Livermore National Laboratory (LLNL-CODE-403049).
* Copyright (C) 2013-2014 Lawrence Livermore National Security, LLC.
* Refer to the ZoL git commit log for authoritative copyright attribution.
* Refer to the OpenZFS git commit log for authoritative copyright attribution.
*
* The contents of this file are subject to the terms of the
* Common Development and Distribution License Version 1.0 (CDDL-1.0).
@@ -20,43 +20,40 @@
#include "zed_strings.h"
struct zed_conf {
unsigned do_force:1; /* true if force enabled */
unsigned do_foreground:1; /* true if run in foreground */
unsigned do_memlock:1; /* true if locking memory */
unsigned do_verbose:1; /* true if verbosity enabled */
unsigned do_zero:1; /* true if zeroing state */
unsigned do_idle:1; /* true if idle enabled */
int syslog_facility; /* syslog facility value */
int min_events; /* RESERVED FOR FUTURE USE */
int max_events; /* RESERVED FOR FUTURE USE */
char *conf_file; /* abs path to config file */
char *pid_file; /* abs path to pid file */
int pid_fd; /* fd to pid file for lock */
char *zedlet_dir; /* abs path to zedlet dir */
zed_strings_t *zedlets; /* names of enabled zedlets */
char *state_file; /* abs path to state file */
int state_fd; /* fd to state file */
libzfs_handle_t *zfs_hdl; /* handle to libzfs */
int zevent_fd; /* fd for access to zevents */
zed_strings_t *zedlets; /* names of enabled zedlets */
char *path; /* custom $PATH for zedlets to use */
int pid_fd; /* fd to pid file for lock */
int state_fd; /* fd to state file */
int zevent_fd; /* fd for access to zevents */
int16_t max_jobs; /* max zedlets to run at one time */
int32_t max_zevent_buf_len; /* max size of kernel event list */
boolean_t do_force:1; /* true if force enabled */
boolean_t do_foreground:1; /* true if run in foreground */
boolean_t do_memlock:1; /* true if locking memory */
boolean_t do_verbose:1; /* true if verbosity enabled */
boolean_t do_zero:1; /* true if zeroing state */
boolean_t do_idle:1; /* true if idle enabled */
};
struct zed_conf *zed_conf_create(void);
void zed_conf_init(struct zed_conf *zcp);
void zed_conf_destroy(struct zed_conf *zcp);
void zed_conf_parse_opts(struct zed_conf *zcp, int argc, char **argv);
void zed_conf_parse_file(struct zed_conf *zcp);
int zed_conf_scan_dir(struct zed_conf *zcp);
int zed_conf_write_pid(struct zed_conf *zcp);
int zed_conf_open_state(struct zed_conf *zcp);
int zed_conf_read_state(struct zed_conf *zcp, uint64_t *eidp, int64_t etime[]);
int zed_conf_write_state(struct zed_conf *zcp, uint64_t eid, int64_t etime[]);
#endif /* !ZED_CONF_H */
+69 -15
View File
@@ -49,7 +49,7 @@ struct udev_monitor *g_mon;
#define DEV_BYID_PATH "/dev/disk/by-id/"
/* 64MB is minimum usable disk for ZFS */
#define MINIMUM_SECTORS 131072
#define MINIMUM_SECTORS 131072ULL
/*
@@ -60,7 +60,7 @@ struct udev_monitor *g_mon;
static void
zed_udev_event(const char *class, const char *subclass, nvlist_t *nvl)
{
char *strval;
const char *strval;
uint64_t numval;
zed_log_msg(LOG_INFO, "zed_disk_event:");
@@ -72,10 +72,14 @@ zed_udev_event(const char *class, const char *subclass, nvlist_t *nvl)
zed_log_msg(LOG_INFO, "\t%s: %s", DEV_PATH, strval);
if (nvlist_lookup_string(nvl, DEV_IDENTIFIER, &strval) == 0)
zed_log_msg(LOG_INFO, "\t%s: %s", DEV_IDENTIFIER, strval);
if (nvlist_lookup_boolean(nvl, DEV_IS_PART) == B_TRUE)
zed_log_msg(LOG_INFO, "\t%s: B_TRUE", DEV_IS_PART);
if (nvlist_lookup_string(nvl, DEV_PHYS_PATH, &strval) == 0)
zed_log_msg(LOG_INFO, "\t%s: %s", DEV_PHYS_PATH, strval);
if (nvlist_lookup_uint64(nvl, DEV_SIZE, &numval) == 0)
zed_log_msg(LOG_INFO, "\t%s: %llu", DEV_SIZE, numval);
if (nvlist_lookup_uint64(nvl, DEV_PARENT_SIZE, &numval) == 0)
zed_log_msg(LOG_INFO, "\t%s: %llu", DEV_PARENT_SIZE, numval);
if (nvlist_lookup_uint64(nvl, ZFS_EV_POOL_GUID, &numval) == 0)
zed_log_msg(LOG_INFO, "\t%s: %llu", ZFS_EV_POOL_GUID, numval);
if (nvlist_lookup_uint64(nvl, ZFS_EV_VDEV_GUID, &numval) == 0)
@@ -128,6 +132,21 @@ dev_event_nvlist(struct udev_device *dev)
numval *= strtoull(value, NULL, 10);
(void) nvlist_add_uint64(nvl, DEV_SIZE, numval);
/*
* If the device has a parent, then get the parent block
* device's size as well. For example, /dev/sda1's parent
* is /dev/sda.
*/
struct udev_device *parent_dev = udev_device_get_parent(dev);
if (parent_dev != NULL &&
(value = udev_device_get_sysattr_value(parent_dev, "size"))
!= NULL) {
uint64_t numval = DEV_BSIZE;
numval *= strtoull(value, NULL, 10);
(void) nvlist_add_uint64(nvl, DEV_PARENT_SIZE, numval);
}
}
/*
@@ -160,14 +179,15 @@ static void *
zed_udev_monitor(void *arg)
{
struct udev_monitor *mon = arg;
char *tmp, *tmp2;
const char *tmp;
char *tmp2;
zed_log_msg(LOG_INFO, "Waiting for new udev disk events...");
while (1) {
struct udev_device *dev;
const char *action, *type, *part, *sectors;
const char *bus, *uuid;
const char *bus, *uuid, *devpath;
const char *class, *subclass;
nvlist_t *nvl;
boolean_t is_zfs = B_FALSE;
@@ -206,6 +226,12 @@ zed_udev_monitor(void *arg)
* if this is a disk and it is partitioned, then the
* zfs label will reside in a DEVTYPE=partition and
* we can skip passing this event
*
* Special case: Blank disks are sometimes reported with
* an erroneous 'atari' partition, and should not be
* excluded from being used as an autoreplace disk:
*
* https://github.com/openzfs/zfs/issues/13497
*/
type = udev_device_get_property_value(dev, "DEVTYPE");
part = udev_device_get_property_value(dev,
@@ -213,9 +239,23 @@ zed_udev_monitor(void *arg)
if (type != NULL && type[0] != '\0' &&
strcmp(type, "disk") == 0 &&
part != NULL && part[0] != '\0') {
/* skip and wait for partition event */
udev_device_unref(dev);
continue;
const char *devname =
udev_device_get_property_value(dev, "DEVNAME");
if (strcmp(part, "atari") == 0) {
zed_log_msg(LOG_INFO,
"%s: %s is reporting an atari partition, "
"but we're going to assume it's a false "
"positive and still use it (issue #13497)",
__func__, devname);
} else {
zed_log_msg(LOG_INFO,
"%s: skip %s since it has a %s partition "
"already", __func__, devname, part);
/* skip and wait for partition event */
udev_device_unref(dev);
continue;
}
}
/*
@@ -227,6 +267,11 @@ zed_udev_monitor(void *arg)
sectors = udev_device_get_sysattr_value(dev, "size");
if (sectors != NULL &&
strtoull(sectors, NULL, 10) < MINIMUM_SECTORS) {
zed_log_msg(LOG_INFO,
"%s: %s sectors %s < %llu (minimum)",
__func__,
udev_device_get_property_value(dev, "DEVNAME"),
sectors, MINIMUM_SECTORS);
udev_device_unref(dev);
continue;
}
@@ -236,10 +281,19 @@ zed_udev_monitor(void *arg)
* device id string is required in the message schema
* for matching with vdevs. Preflight here for expected
* udev information.
*
* Special case:
* NVMe devices don't have ID_BUS set (at least on RHEL 7-8),
* but they are valid for autoreplace. Add a special case for
* them by searching for "/nvme/" in the udev DEVPATH:
*
* DEVPATH=/devices/pci0000:00/0000:00:1e.0/nvme/nvme2/nvme2n1
*/
bus = udev_device_get_property_value(dev, "ID_BUS");
uuid = udev_device_get_property_value(dev, "DM_UUID");
if (!is_zfs && (bus == NULL && uuid == NULL)) {
devpath = udev_device_get_devpath(dev);
if (!is_zfs && (bus == NULL && uuid == NULL &&
strstr(devpath, "/nvme/") == NULL)) {
zed_log_msg(LOG_INFO, "zed_udev_monitor: %s no devid "
"source", udev_device_get_devnode(dev));
udev_device_unref(dev);
@@ -284,7 +338,7 @@ zed_udev_monitor(void *arg)
if (strcmp(class, EC_DEV_STATUS) == 0 &&
udev_device_get_property_value(dev, "DM_UUID") &&
udev_device_get_property_value(dev, "MPATH_SBIN_PATH")) {
tmp = (char *)udev_device_get_devnode(dev);
tmp = udev_device_get_devnode(dev);
tmp2 = zfs_get_underlying_path(tmp);
if (tmp && tmp2 && (strcmp(tmp, tmp2) != 0)) {
/*
@@ -301,8 +355,7 @@ zed_udev_monitor(void *arg)
class = EC_DEV_ADD;
subclass = ESC_DISK;
} else {
tmp = (char *)
udev_device_get_property_value(dev,
tmp = udev_device_get_property_value(dev,
"DM_NR_VALID_PATHS");
/* treat as a multipath remove */
if (tmp != NULL && strcmp(tmp, "0") == 0) {
@@ -350,7 +403,7 @@ zed_udev_monitor(void *arg)
}
int
zed_disk_event_init()
zed_disk_event_init(void)
{
int fd, fflags;
@@ -379,13 +432,14 @@ zed_disk_event_init()
return (-1);
}
pthread_setname_np(g_mon_tid, "udev monitor");
zed_log_msg(LOG_INFO, "zed_disk_event_init");
return (0);
}
void
zed_disk_event_fini()
zed_disk_event_fini(void)
{
/* cancel monitor thread at recvmsg() */
(void) pthread_cancel(g_mon_tid);
@@ -403,13 +457,13 @@ zed_disk_event_fini()
#include "zed_disk_event.h"
int
zed_disk_event_init()
zed_disk_event_init(void)
{
return (0);
}
void
zed_disk_event_fini()
zed_disk_event_fini(void)
{
}
+108 -40
View File
@@ -1,9 +1,9 @@
/*
* This file is part of the ZFS Event Daemon (ZED)
* for ZFS on Linux (ZoL) <http://zfsonlinux.org/>.
* This file is part of the ZFS Event Daemon (ZED).
*
* Developed at Lawrence Livermore National Laboratory (LLNL-CODE-403049).
* Copyright (C) 2013-2014 Lawrence Livermore National Security, LLC.
* Refer to the ZoL git commit log for authoritative copyright attribution.
* Refer to the OpenZFS git commit log for authoritative copyright attribution.
*
* The contents of this file are subject to the terms of the
* Common Development and Distribution License Version 1.0 (CDDL-1.0).
@@ -15,7 +15,7 @@
#include <ctype.h>
#include <errno.h>
#include <fcntl.h>
#include <libzfs.h> /* FIXME: Replace with libzfs_core. */
#include <libzfs_core.h>
#include <paths.h>
#include <stdarg.h>
#include <stdio.h>
@@ -35,9 +35,12 @@
#include "zed_strings.h"
#include "agents/zfs_agents.h"
#include <libzutil.h>
#define MAXBUF 4096
static int max_zevent_buf_len = 1 << 20;
/*
* Open the libzfs interface.
*/
@@ -54,7 +57,7 @@ zed_event_init(struct zed_conf *zcp)
zed_log_die("Failed to initialize libzfs");
}
zcp->zevent_fd = open(ZFS_DEV, O_RDWR);
zcp->zevent_fd = open(ZFS_DEV, O_RDWR | O_CLOEXEC);
if (zcp->zevent_fd < 0) {
if (zcp->do_idle)
return (-1);
@@ -70,6 +73,9 @@ zed_event_init(struct zed_conf *zcp)
zed_log_die("Failed to initialize disk events");
}
if (zcp->max_zevent_buf_len != 0)
max_zevent_buf_len = zcp->max_zevent_buf_len;
return (0);
}
@@ -96,6 +102,57 @@ zed_event_fini(struct zed_conf *zcp)
libzfs_fini(zcp->zfs_hdl);
zcp->zfs_hdl = NULL;
}
zed_exec_fini();
}
static void
_bump_event_queue_length(void)
{
int zzlm = -1, wr;
char qlen_buf[12] = {0}; /* parameter is int => max "-2147483647\n" */
long int qlen, orig_qlen;
zzlm = open("/sys/module/zfs/parameters/zfs_zevent_len_max", O_RDWR);
if (zzlm < 0)
goto done;
if (read(zzlm, qlen_buf, sizeof (qlen_buf)) < 0)
goto done;
qlen_buf[sizeof (qlen_buf) - 1] = '\0';
errno = 0;
orig_qlen = qlen = strtol(qlen_buf, NULL, 10);
if (errno == ERANGE)
goto done;
if (qlen <= 0)
qlen = 512; /* default zfs_zevent_len_max value */
else
qlen *= 2;
/*
* Don't consume all of kernel memory with event logs if something
* goes wrong.
*/
if (qlen > max_zevent_buf_len)
qlen = max_zevent_buf_len;
if (qlen == orig_qlen)
goto done;
wr = snprintf(qlen_buf, sizeof (qlen_buf), "%ld", qlen);
if (wr >= sizeof (qlen_buf)) {
wr = sizeof (qlen_buf) - 1;
zed_log_msg(LOG_WARNING, "Truncation in %s()", __func__);
}
if (pwrite(zzlm, qlen_buf, wr + 1, 0) < 0)
goto done;
zed_log_msg(LOG_WARNING, "Bumping queue length to %ld", qlen);
done:
if (zzlm > -1)
(void) close(zzlm);
}
/*
@@ -136,10 +193,7 @@ zed_event_seek(struct zed_conf *zcp, uint64_t saved_eid, int64_t saved_etime[])
if (n_dropped > 0) {
zed_log_msg(LOG_WARNING, "Missed %d events", n_dropped);
/*
* FIXME: Increase max size of event nvlist in
* /sys/module/zfs/parameters/zfs_zevent_len_max ?
*/
_bump_event_queue_length();
}
if (nvlist_lookup_uint64(nvl, "eid", &eid) != 0) {
zed_log_msg(LOG_WARNING, "Failed to lookup zevent eid");
@@ -211,7 +265,7 @@ _zed_event_value_is_hex(const char *name)
*
* All environment variables in [zsp] should be added through this function.
*/
static int
static __attribute__((format(printf, 5, 6))) int
_zed_event_add_var(uint64_t eid, zed_strings_t *zsp,
const char *prefix, const char *name, const char *fmt, ...)
{
@@ -559,7 +613,7 @@ _zed_event_add_string_array(uint64_t eid, zed_strings_t *zsp,
char buf[MAXBUF];
int buflen = sizeof (buf);
const char *name;
char **strp;
const char **strp;
uint_t nelem;
uint_t i;
char *p;
@@ -586,8 +640,6 @@ _zed_event_add_string_array(uint64_t eid, zed_strings_t *zsp,
* Convert the nvpair [nvp] to a string which is added to the environment
* of the child process.
* Return 0 on success, -1 on error.
*
* FIXME: Refactor with cmd/zpool/zpool_main.c:zpool_do_events_nvprint()?
*/
static void
_zed_event_add_nvpair(uint64_t eid, zed_strings_t *zsp, nvpair_t *nvp)
@@ -601,7 +653,7 @@ _zed_event_add_nvpair(uint64_t eid, zed_strings_t *zsp, nvpair_t *nvp)
uint16_t i16;
uint32_t i32;
uint64_t i64;
char *str;
const char *str;
assert(zsp != NULL);
assert(nvp != NULL);
@@ -686,23 +738,11 @@ _zed_event_add_nvpair(uint64_t eid, zed_strings_t *zsp, nvpair_t *nvp)
_zed_event_add_var(eid, zsp, prefix, name,
"%llu", (u_longlong_t)i64);
break;
case DATA_TYPE_NVLIST:
_zed_event_add_var(eid, zsp, prefix, name,
"%s", "_NOT_IMPLEMENTED_"); /* FIXME */
break;
case DATA_TYPE_STRING:
(void) nvpair_value_string(nvp, &str);
_zed_event_add_var(eid, zsp, prefix, name,
"%s", (str ? str : "<NULL>"));
break;
case DATA_TYPE_BOOLEAN_ARRAY:
_zed_event_add_var(eid, zsp, prefix, name,
"%s", "_NOT_IMPLEMENTED_"); /* FIXME */
break;
case DATA_TYPE_BYTE_ARRAY:
_zed_event_add_var(eid, zsp, prefix, name,
"%s", "_NOT_IMPLEMENTED_"); /* FIXME */
break;
case DATA_TYPE_INT8_ARRAY:
_zed_event_add_int8_array(eid, zsp, prefix, nvp);
break;
@@ -730,9 +770,11 @@ _zed_event_add_nvpair(uint64_t eid, zed_strings_t *zsp, nvpair_t *nvp)
case DATA_TYPE_STRING_ARRAY:
_zed_event_add_string_array(eid, zsp, prefix, nvp);
break;
case DATA_TYPE_NVLIST:
case DATA_TYPE_BOOLEAN_ARRAY:
case DATA_TYPE_BYTE_ARRAY:
case DATA_TYPE_NVLIST_ARRAY:
_zed_event_add_var(eid, zsp, prefix, name,
"%s", "_NOT_IMPLEMENTED_"); /* FIXME */
_zed_event_add_var(eid, zsp, prefix, name, "_NOT_IMPLEMENTED_");
break;
default:
errno = EINVAL;
@@ -858,21 +900,21 @@ _zed_event_get_subclass(const char *class)
static void
_zed_event_add_time_strings(uint64_t eid, zed_strings_t *zsp, int64_t etime[])
{
struct tm *stp;
struct tm stp;
char buf[32];
assert(zsp != NULL);
assert(etime != NULL);
_zed_event_add_var(eid, zsp, ZEVENT_VAR_PREFIX, "TIME_SECS",
"%lld", (long long int) etime[0]);
"%" PRId64, etime[0]);
_zed_event_add_var(eid, zsp, ZEVENT_VAR_PREFIX, "TIME_NSECS",
"%lld", (long long int) etime[1]);
"%" PRId64, etime[1]);
if (!(stp = localtime((const time_t *) &etime[0]))) {
if (!localtime_r((const time_t *) &etime[0], &stp)) {
zed_log_msg(LOG_WARNING, "Failed to add %s%s for eid=%llu: %s",
ZEVENT_VAR_PREFIX, "TIME_STRING", eid, "localtime error");
} else if (!strftime(buf, sizeof (buf), "%Y-%m-%d %H:%M:%S%z", stp)) {
} else if (!strftime(buf, sizeof (buf), "%Y-%m-%d %H:%M:%S%z", &stp)) {
zed_log_msg(LOG_WARNING, "Failed to add %s%s for eid=%llu: %s",
ZEVENT_VAR_PREFIX, "TIME_STRING", eid, "strftime error");
} else {
@@ -881,6 +923,25 @@ _zed_event_add_time_strings(uint64_t eid, zed_strings_t *zsp, int64_t etime[])
}
}
static void
_zed_event_update_enc_sysfs_path(nvlist_t *nvl)
{
const char *vdev_path;
if (nvlist_lookup_string(nvl, FM_EREPORT_PAYLOAD_ZFS_VDEV_PATH,
&vdev_path) != 0) {
return; /* some other kind of event, ignore it */
}
if (vdev_path == NULL) {
return;
}
update_vdev_config_dev_sysfs_path(nvl, vdev_path,
FM_EREPORT_PAYLOAD_ZFS_VDEV_ENC_SYSFS_PATH);
}
/*
* Service the next zevent, blocking until one is available.
*/
@@ -894,7 +955,7 @@ zed_event_service(struct zed_conf *zcp)
uint64_t eid;
int64_t *etime;
uint_t nelem;
char *class;
const char *class;
const char *subclass;
int rv;
@@ -912,10 +973,7 @@ zed_event_service(struct zed_conf *zcp)
if (n_dropped > 0) {
zed_log_msg(LOG_WARNING, "Missed %d events", n_dropped);
/*
* FIXME: Increase max size of event nvlist in
* /sys/module/zfs/parameters/zfs_zevent_len_max ?
*/
_bump_event_queue_length();
}
if (nvlist_lookup_uint64(nvl, "eid", &eid) != 0) {
zed_log_msg(LOG_WARNING, "Failed to lookup zevent eid");
@@ -931,6 +989,17 @@ zed_event_service(struct zed_conf *zcp)
zed_log_msg(LOG_WARNING,
"Failed to lookup zevent class (eid=%llu)", eid);
} else {
/*
* Special case: If we can dynamically detect an enclosure sysfs
* path, then use that value rather than the one stored in the
* vd->vdev_enc_sysfs_path. There have been rare cases where
* vd->vdev_enc_sysfs_path becomes outdated. However, there
* will be other times when we can not dynamically detect the
* sysfs path (like if a disk disappears) and have to rely on
* the old value for things like turning on the fault LED.
*/
_zed_event_update_enc_sysfs_path(nvl);
/* let internal modules see this event first */
zfs_agent_post_event(class, NULL, nvl);
@@ -953,8 +1022,7 @@ zed_event_service(struct zed_conf *zcp)
_zed_event_add_time_strings(eid, zsp, etime);
zed_exec_process(eid, class, subclass,
zcp->zedlet_dir, zcp->zedlets, zsp, zcp->zevent_fd);
zed_exec_process(eid, class, subclass, zcp, zsp);
zed_conf_write_state(zcp, eid, etime);
+3 -3
View File
@@ -1,9 +1,9 @@
/*
* This file is part of the ZFS Event Daemon (ZED)
* for ZFS on Linux (ZoL) <http://zfsonlinux.org/>.
* This file is part of the ZFS Event Daemon (ZED).
*
* Developed at Lawrence Livermore National Laboratory (LLNL-CODE-403049).
* Copyright (C) 2013-2014 Lawrence Livermore National Security, LLC.
* Refer to the ZoL git commit log for authoritative copyright attribution.
* Refer to the OpenZFS git commit log for authoritative copyright attribution.
*
* The contents of this file are subject to the terms of the
* Common Development and Distribution License Version 1.0 (CDDL-1.0).

Some files were not shown because too many files have changed in this diff Show More