Break out the range_tree, btree, and highbit64/lowbit64 code from kernel
space into shared kernel and userspace code. This is needed for the
updated `zpool status -vv` error byte range reporting that will be
coming in a future commit. That commit needs the range_tree code in
kernel and userspace.
Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes#18133
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes#18077
libzfs is the only user of libshare, and only internally, so there's no
particular reason to build it separately, nor to export its symbols. So,
pull it into libzfs proper, remove its "public" header, and hide its
symbols.
The bare minimum "public" API is just to count and enumerate the
supported share types. These are moved to libzfs.h with the other share
API.
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes#18072
Use 64-bit POSIX off_t in user space instead of the Linux kernel type
loff_t. This is enforced at configure time via AC_SYS_LARGEFILE and
AC_CHECK_SIZEOF([off_t]). loff_t remains in shared headers where they
mirror Linux VFS interfaces, and on FreeBSD we typedef loff_t to off_t
in those headers since libc does not provide it.
Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Moch <mail@alexmoch.com>
Closes#18020
Musl exposes loff_t only as a macro in <fcntl.h> when _GNU_SOURCE is
defined. Including <fcntl.h> ensures the type is available, and a
fallback typedef is provided when no macro is defined. This fixes build
failures on musl systems.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Alexander Moch <mail@alexmoch.com>
Closes#18002
A deadlock occurs when snapshot expiry tasks are cancelled while holding
locks. The snapshot expiry task (snapentry_expire) spawns an umount
process and waits for it to complete. Concurrently, ARC memory pressure
triggers arc_prune which calls zfs_exit_fs(), attempting to cancel the
expiry task while holding locks. The umount process spawned by the
expiry task blocks trying to acquire locks held by arc_prune, which is
blocked waiting for the expiry task to complete. This creates a circular
dependency: expiry task waits for umount, umount waits for arc_prune,
arc_prune waits for expiry task.
Fix by adding non-blocking cancellation support to taskq_cancel_id().
The zfs_exit_fs() path calls zfsctl_snapshot_unmount_delay() to
reschedule the unmount, which needs to cancel any existing expiry task.
It now uses non-blocking cancellation to avoid waiting while holding
locks, breaking the deadlock by returning immediately when the task is
already running.
The per-entry se_taskqid_lock has been removed, with all taskqid
operations now protected by the global zfs_snapshot_lock held as
WRITER. Additionally, an se_in_umount flag prevents recursive waits when
zfsctl_destroy() is called during unmount. The taskqid is now only
cleared by the caller on successful cancellation; running tasks clear
their own taskqid upon completion.
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes#17941
They're basically the same thing; lets just carry one.
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes#17948
ztest wants to force all kernel random calls to use the pseudo-random
generator (/dev/urandom), to avoid depleting the system entropy pool
just for testing.
Up until the previous commit, it did this by switching the path that the
libzpool (now libspl) random API would use to get random data from; that
is, it took advantage of an implementation detail.
Now that that hole is closed to it, we need another method. This commit
introduces that; a simple API call to enable/disable "force pseudo"
mode.
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes#17861
Currently libspl is a static archive that is linked into multiple shared
objects, which then re-export its symbols. We intend to fix this soon.
For the moment though, most programs shipped with OpenZFS depend on two
or more of these shared objects, and see the same symbols twice. For
functions this is not a problem, as they do not have any mutable state
and so the linker can simply select the first one and use that for all.
For global data objects however, each shared object will have direct
(non-relocatable) references to its own instance of the symbol, such
that changes on one will not necessarily be seen by the other. While
this shouldn't be a problem in practice as these reexported interfaces
are not supposed to be used, they are technically undefined behaviour in
C (C17 6.9.2) and are reported by ASAN as a violation of C++'s "One
Definition Rule".
To fix this, we hide these globals inside their compilation units, and
add access functions and macros as appropriate to preserve the existing
API (though not ABI).
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes#17861
Pull all of the internal debug infrastructure up in to the zfs
code to clean up the layering. Remove all the dodgy usage of
SET_ERROR and DTRACE_PROBE from the spl. Luckily it was
lightly used in the spl layer so we're not losing much.
Sponsored-by: https://despairlabs.com/sponsor/
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes#17861
These are kind-of compiler attribute placeholders, so go here with the
others for now.
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes#17861
sys/debug.h is not really the right place for them, but we already have
some there for libspl, so it is at least convenient.
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes#17861
The extra inclusion via xvattr.h appears to upset the linter in CI. I'm
not entirely sure what its complaint is, but removing sys/string.h
entirely is not quite possible yet, and include guards are rarely a bad
idea, so this will do.
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes#17861
Keeping the spl/zfs module split, libzpool is the zfs module for
userspace. Headers and functions specific to it belong there.
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes#17861
The zfs_tunable_* functions are a public interface which are
part of the internal libspl convenience library. They should
be hidden to prevent an unnecessary ABI change in installed
libraries which link against libspl (e.g. libzfs_core, libuutil).
We do already leak long standing libspl symbols. This commit is
solely intended to prevent leaking these new ones until this is
properly sorted out.
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#17911
We need to specifically use the FX_XFLAG_* macros in zpl_ioctl_*attr()
codepaths, and the FS_*_FL macros in the zpl_ioctl_*flags() codepaths.
The earlier code just assumes the FS_*_FL macros for both codepaths.
The 6.17 kernel add a bitmask check in copy_fsxattr_from_user() that
exposed this error via failing 'projectquota' ZTS tests.
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes#17884Closes#17869
Resolve a build failure for user applications that include <sys/uio.h>.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Shreshth Srivastava <shreshthsrivastava2@gmail.com>
Closes#17781Closes#17814
A single slow responding disk can affect the overall read
performance of a raidz group. When a raidz child disk is
determined to be a persistent slow outlier, then have it
sit out during reads for a period of time. The raidz group
can use parity to reconstruct the data that was skipped.
Each time a slow disk is placed into a sit out period, its
`vdev_stat.vs_slow_ios count` is incremented and a zevent
class `ereport.fs.zfs.delay` is posted.
The length of the sit out period can be changed using the
`raid_read_sit_out_secs` module parameter. Setting it to
zero disables slow outlier detection.
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Contributions-by: Don Brady <don.brady@klarasystems.com>
Contributions-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#17227
glibc includes linux/stat.h for statx, but musl defines its own statx
struct and associated constants, which does not include STATX_MNT_ID
yet. Thus, including linux/stat.h directly should be avoided for
maximum libc compatibility.
Tested on:
- glibc: x86_64, i686, aarch64, armv7l, armv6l
- musl: x86_64, aarch64, armv7l, armv6l
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Tested-By: Achill Gilgenast <achill@achill.org>
Signed-off-by: classabbyamp <dev@placeviolette.net>
Closes#17675
This uses the AVX2 versions of the AESENC and PCLMULQDQ instructions; on
Zen 3 this provides an up to 80% performance improvement.
Original source:
d5440dd2c2/gen/bcm/aes-gcm-avx2-x86_64-linux.S
See the original BoringSSL commit at
3b6e1be439.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Attila Fülöp <attila@fueloep.org>
Signed-off-by: Joel Low <joel@joelsplace.sg>
Closes#17058
This changes zil_commit() to have an int return, and updates all callers
to check it. There are no corresponding internal changes yet; it will
always return 0.
Since zil_commit() is an indication that the caller _really_ wants the
associated data to be durability stored, I've annotated it with the
__warn_unused_result__ compiler attribute (via __must_check), to emit a
warning if it's ever ussd without doing something with the return code.
I hope this will mean we never misuse it in the future.
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes#17398
These are all cases where we initialise or update a variable, and then
never use it. None of them particularly matter, as the compiler should
optimise them all away during dead store elimination, but some static
analysers complain about them and they are extra work for casual readers
to follow, so worth removing.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/Closes#17551