A deadlock occurs when snapshot expiry tasks are cancelled while holding
locks. The snapshot expiry task (snapentry_expire) spawns an umount
process and waits for it to complete. Concurrently, ARC memory pressure
triggers arc_prune which calls zfs_exit_fs(), attempting to cancel the
expiry task while holding locks. The umount process spawned by the
expiry task blocks trying to acquire locks held by arc_prune, which is
blocked waiting for the expiry task to complete. This creates a circular
dependency: expiry task waits for umount, umount waits for arc_prune,
arc_prune waits for expiry task.
Fix by adding non-blocking cancellation support to taskq_cancel_id().
The zfs_exit_fs() path calls zfsctl_snapshot_unmount_delay() to
reschedule the unmount, which needs to cancel any existing expiry task.
It now uses non-blocking cancellation to avoid waiting while holding
locks, breaking the deadlock by returning immediately when the task is
already running.
The per-entry se_taskqid_lock has been removed, with all taskqid
operations now protected by the global zfs_snapshot_lock held as
WRITER. Additionally, an se_in_umount flag prevents recursive waits when
zfsctl_destroy() is called during unmount. The taskqid is now only
cleared by the caller on successful cancellation; running tasks clear
their own taskqid upon completion.
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes#17941
They're basically the same thing; lets just carry one.
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes#17948
Currently libspl is a static archive that is linked into multiple shared
objects, which then re-export its symbols. We intend to fix this soon.
For the moment though, most programs shipped with OpenZFS depend on two
or more of these shared objects, and see the same symbols twice. For
functions this is not a problem, as they do not have any mutable state
and so the linker can simply select the first one and use that for all.
For global data objects however, each shared object will have direct
(non-relocatable) references to its own instance of the symbol, such
that changes on one will not necessarily be seen by the other. While
this shouldn't be a problem in practice as these reexported interfaces
are not supposed to be used, they are technically undefined behaviour in
C (C17 6.9.2) and are reported by ASAN as a violation of C++'s "One
Definition Rule".
To fix this, we hide these globals inside their compilation units, and
add access functions and macros as appropriate to preserve the existing
API (though not ABI).
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes#17861