Changed zfs_k(un)map_atomic to zfs_k(un)map_local
Signed-off-by: Jason Lee <jasonlee@lanl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Currently, if a minor is in use when we try to remove it, we'll skip it
and never come back to it again. Since the zvol state is hung off the
minor in the kernel, this can get us into weird situations if something
tries to use it after the removal fails. It's even worse at pool export,
as there's now a vestigial zvol state with no pool under it. It's
weirder again if the pool is subsequently reimported, as the zvol code
(reasonably) assumes the zvol state has been properly setup, when it's
actually left over from the previous import of the pool.
This commit attempts to tackle that by setting a flag on the zvol if its
minor can't be removed, and then checking that flag when a request is
made and rejecting it, thus stopping new work coming in.
The flag also causes a condvar to be signaled when the last client
finishes. For the case where a single minor is being removed (eg
changing volmode), it will wait for this signal before proceeding.
Meanwhile, when removing all minors, a background task is created for
each minor that couldn't be removed on the spot, and those tasks then
wake and clean up.
Since any new tasks are queued on to the pool's spa_zvol_taskq,
spa_export_common() will continue to wait at export until all minors are
removed.
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes#14872Closes#16364
When process got SIGSTOP/SIGTSTP, issig() dequeue them and return 0.
But process could still have another signal pending after dequeue. So,
after dequeue, check and return 1, if signal_pending.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Jitendra Patidar <jitendra.patidar@nutanix.com>
Closes#16464
We always call it twice with JUSTLOOKING and then FORREAL.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Pawel Jakub Dawidek <pawel@dawidek.net>
Closes#16225
Linux provides SLAB_RECLAIM_ACCOUNT and __GFP_RECLAIMABLE flags to
mark memory allocations that can be freed via shinker calls. It
should allow kernel to tune and group such allocations for lower
memory fragmentation and better reclamation under pressure.
This patch marks as reclaimable most of ARC memory, directly
evictable via ZFS shrinker, plus also dnode/znode/sa memory,
indirectly evictable via kernel's superblock shrinker.
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Allan Jude <allan@klarasystems.com>
This commit replaces current usages of schedule_timeout() with
schedule_timeout_interruptible() in code paths that expect the running
task to sleep for a short period of time. When schedule_timeout() is
called without previously calling set_current_state(), the running
task never sleeps because the task state remains in TASK_RUNNING.
By calling schedule_timeout_interruptible() to set the task state to
TASK_INTERRUPTIBLE before calling schedule_timeout() we achieve the
intended/desired behavior of putting the task to sleep for the
specified timeout.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Daniel Perry <dtperry@amazon.com>
Closes#16150
Freeing an ABD can take sleeping locks to update various stats. We
aren't allowed to sleep on an interrupt handler. So, move the free off
to the io_done callback.
We should never have been freeing things in the interrupt handler, but
we got away with it because we were usually freeing a linear ABD, which
at most is returning two objects to a cache and never sleeping. Scatter
ABDs can be used now, and those have more complex locking.
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes#16687
It seems out our notion of "properly" aligned IO was incomplete. In
particular, dm-crypt does its own splitting, and assumes that a logical
block will never cross an order-0 page boundary (ie, the physical page
size, not compound size). This effectively means that it needs to be
possible to split a BIO at any page or block size boundary and have it
work correctly.
This updates the alignment check function to enforce these rules (to the
extent possible).
Our response to misaligned data is to make some new allocation that is
properly aligned, and copy the data into it. It turns out that
linearising (via abd_borrow_buf()) is not enough, because we allocate eg
4K blocks from a general purpose slab, and so may receive (or already
have) a 4K block that crosses pages.
So instead, we allocate a new ABD, which is guaranteed to be aligned
properly to block sizes, and then copy everything into it, and back out
on the way back.
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes#16687#16631#15646#15533#14533
(cherry picked from commit 63bafe60ec741c269d29e26b192a8a5c4f6acf92)
In FreeBSD's `zio_do_crypt_data()`, ensure that two `struct uio`
variables are cleared before copying data out of them. This avoids
accessing garbage data, and fixes gcc `-Wuninitialized` warnings.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Toomas Soome <tsoome@me.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Dimitry Andric <dimitry@andric.com>
Closes#16688
While mounting ZFS root during boot on Linux distributions from initrd,
mount from busybox is effectively used which executes mount system call
directly. This skips the ZFS helper mount.zfs, which checks and enables
the mount options as specified in dataset properties. As a result,
datasets mounted during boot from initrd do not have correct mount
options as specified in ZFS dataset properties.
There has been an attempt to use mount.zfs in zfs initrd script,
responsible for mounting the ZFS root filesystem (PR#13305). This was
later reverted (PR#14908) after discovering that using mount.zfs breaks
mounting of snapshots on root (/) and other child datasets of root have
the same issue (Issue#9461).
This happens because switching from busybox mount to mount.zfs correctly
parses the mount options but also adds 'mntpoint=/root' to the mount
options, which is then prepended to the snapshot mountpoint in
'.zfs/snapshot'. '/root' is the directory on Debian with initramfs-tools
where root filesystem is mounted before pivot_root. When Linux runtime
is reached, trying to access the snapshots on root results in
automounting the snapshot on '/root/.zfs/*', which fails.
This commit attempts to fix the automounting of snapshots on root, while
using mount.zfs in initrd script. Since the mountpoint of dataset is
stored in vfs_mntpoint field, we can check if current mountpoint of
dataset and vfs_mntpoint are same or not. If they are not same, reset
the vfs_mntpoint field with current mountpoint. This fixes the
mountpoints of root dataset and children in respective vfs_mntpoint
fields when we try to access the snapshots of root dataset or its
children. With correct mountpoint for root dataset and children stored
in vfs_mntpoint, all snapshots of root dataset are mounted correctly
and become accessible.
This fix will come into play only if current process, that is trying to
access the snapshots is not in chroot context. The Linux kernel API
that is used to convert struct path into char format (d_path), returns
the complete path for given struct path. It works in chroot environment
as well and returns the correct path from original filesystem root.
However d_path fails to return the complete path if any directory from
original root filesystem is mounted using --bind flag or --rbind flag
in chroot environment. In this case, if we try to access the snapshot
from outside the chroot environment, d_path returns the path correctly,
i.e. it returns the correct path to the directory that is mounted with
--bind flag. However inside the chroot environment, it only returns the
path inside chroot.
For now, there is not a better way in my understanding that gives the
complete path in char format and handles the case where directories from
root filesystem are mounted with --bind or --rbind on another path which
user will later chroot into. So this fix gets enabled if current
process trying to access the snapshot is not in chroot context.
With the snapshots issue fixed for root filesystem, using mount.zfs in
ZFS initrd script, mounts the datasets with correct mount options.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Closes#16646
Couple places in the code depend on 0 returned only if the task was
actually cancelled. Doing otherwise could lead to extra references
being dropped. The race could be small, but I believe CI hit it
from time to time.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes#16565
zfs_acl_node_alloc allocates an uninitialized data buffer, but upstack
zfs_acl_chmod only partially initializes it. KMSAN reported that this
memory remained uninitialized at the point when it was read by
lzjb_compress, which suggests a possible kernel memory disclosure bug.
The full KMSAN warning may be found in the PR.
https://github.com/openzfs/zfs/pull/16511
Signed-off-by: Alan Somers <asomers@gmail.com>
Sponsored by: Axcient
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>