mirror_zfs/module/zfs
Rob Norris 8a0e5e8b54 zvol: stop using zvol_state_lock to protect OS-side private data
zvol_state_lock is intended to protect access to the global name->zvol
lists (zvol_find_by_name()), but has also been used to control access to
OS-side private data, accessed through whatever kernel object is used to
represent the volume (gendisk, geom, etc).

This appears to have been necessary to some degree because the OS-side
object is what's used to get a handle on zvol_state_t, so zv_state_lock
and zv_suspend_lock can't be used to manage access, but also, with the
private object and the zvol_state_t being shut down and destroyed at the
same time in zvol_os_free(), we must ensure that the private object
pointer only ever corresponds to a real zvol_state_t, not one in partial
destruction. Taking the global lock seems like a convenient way to
ensure this.

The problem with this is that zvol_state_lock does not actually protect
access to the zvol_state_t internals, so we need to take zv_state_lock
and/or zv_suspend_lock. If those are contended, this can then cause
OS-side operations (eg zvol_open()) to sleep waiting for them while holding
zvol_state_lock. This then blocks out all other OS-side operations which
want to get the private data, and any ZFS-side control operations that
would take the write half of the lock. It's even worse if ZFS-side
operations induce OS-side calls back into the zvol (eg creating a zvol
triggers a partition probe inside the kernel, and also a userspace
access from udev to set up device links). And it gets worse still
if anything decides to defer those ops to a task and wait on them, which
zvol_remove_minors_impl() will do under high load.
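To make the contention concrete, here is a minimal single-threaded C sketch of the old pattern, assuming POSIX threads. Every demo_ name is a hypothetical stand-in, not a real OpenZFS type or function, and for simplicity the sketch keeps both locks held until demo_old_open_exit(). It shows that while an OS-side open holds the global lock for reading and takes the per-volume lock, a ZFS-side control operation that wants the write half is shut out.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for zvol_state_t: only the per-volume lock. */
typedef struct demo_zvol_state {
	pthread_mutex_t zv_state_lock;
} demo_zvol_state_t;

/* Hypothetical stand-in for the OS-side object (gendisk, geom, ...). */
typedef struct demo_os_disk {
	demo_zvol_state_t *private_data;	/* guarded by demo_state_lock */
} demo_os_disk_t;

/* Hypothetical global lock: name lists AND (old scheme) private_data. */
static pthread_rwlock_t demo_state_lock = PTHREAD_RWLOCK_INITIALIZER;

/*
 * Old-style OS-side open: read-lock the global lock, fetch the private
 * pointer, then take the per-volume lock while STILL holding the global
 * one. If zv_state_lock is contended, the caller sleeps right here with
 * demo_state_lock held, blocking every writer.
 */
static demo_zvol_state_t *
demo_old_open_enter(demo_os_disk_t *disk)
{
	pthread_rwlock_rdlock(&demo_state_lock);
	demo_zvol_state_t *zv = disk->private_data;
	if (zv == NULL) {
		pthread_rwlock_unlock(&demo_state_lock);
		return (NULL);
	}
	pthread_mutex_lock(&zv->zv_state_lock);
	return (zv);
}

static void
demo_old_open_exit(demo_zvol_state_t *zv)
{
	pthread_mutex_unlock(&zv->zv_state_lock);
	pthread_rwlock_unlock(&demo_state_lock);
}

/* ZFS-side control op: needs the write half; returns EBUSY if shut out. */
static int
demo_control_op_try(void)
{
	int err = pthread_rwlock_trywrlock(&demo_state_lock);
	if (err == 0)
		pthread_rwlock_unlock(&demo_state_lock);
	return (err);
}
```

Under real contention the open would block inside pthread_mutex_lock() rather than return, so the window in which writers are excluded grows with however long the per-volume lock is held.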

However, since the previous commit, we have a guarantee that the private
data pointer will always be NULL'd out in zvol_os_remove_minor()
_before_ the zvol_state_t is made invalid, but it won't happen until all
users are ejected. So, if we make access to the private object pointer
atomic, we remove the need to take a global lock to access it, and so
we can remove all acquisitions of zvol_state_lock from the OS side.
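A minimal C11 sketch of the replacement idea, in which the OS-side object holds an atomic pointer to the volume state. The demo_ names are hypothetical, and the sketch models only the atomic publish and clear of the private pointer. It does not model the user ejection that zvol_os_remove_minor() performs before clearing the pointer, nor the zv_state_lock/zv_suspend_lock that callers still take afterwards.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for zvol_state_t. */
typedef struct demo_zvol_state {
	const char *zv_name;
} demo_zvol_state_t;

/* Hypothetical stand-in for the OS-side object (gendisk, geom, ...). */
typedef struct demo_os_disk {
	_Atomic(demo_zvol_state_t *) private_data;
} demo_os_disk_t;

static void
demo_disk_init(demo_os_disk_t *disk)
{
	atomic_init(&disk->private_data, NULL);
}

/* Publish the pointer only once the volume state is fully constructed. */
static void
demo_disk_attach(demo_os_disk_t *disk, demo_zvol_state_t *zv)
{
	atomic_store_explicit(&disk->private_data, zv, memory_order_release);
}

/*
 * OS-side lookup: a single atomic load, no global lock. A real caller
 * would go on to take the per-volume locks before using the state.
 */
static int
demo_disk_get(demo_os_disk_t *disk, demo_zvol_state_t **zvp)
{
	demo_zvol_state_t *zv =
	    atomic_load_explicit(&disk->private_data, memory_order_acquire);
	if (zv == NULL)
		return (ENXIO);	/* never attached, or already removed */
	*zvp = zv;
	return (0);
}

/*
 * Removal: users are assumed to be ejected already; clear the pointer
 * BEFORE the volume state is torn down, so no new lookup can find it.
 */
static void
demo_disk_detach(demo_os_disk_t *disk)
{
	atomic_store_explicit(&disk->private_data, NULL, memory_order_release);
}
```

The release/acquire pairing is an assumption of this sketch: it ensures a reader that observes a non-NULL pointer also observes the state initialised before demo_disk_attach().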

While here, I've rewritten much of the locking theory comment at the top
of zvol.c. It wasn't wrong, but it hadn't been followed exactly, so I've
tried to describe the purpose of each lock in a little more detail, and
in particular describe where it should and shouldn't be used.

Sponsored-by: Klara, Inc.
Sponsored-by: Railway Corporation
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Fedor Uporov <fuporov.vstack@gmail.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17625
2025-08-19 10:06:34 -07:00
abd.c Prefer VERIFY0(n) over VERIFY3U(n, ==, 0) 2025-08-07 11:41:25 -07:00
aggsum.c SPDX: license tags: CDDL-1.0 2025-03-13 17:56:27 -07:00
arc.c Prefer VERIFY0P(n) over VERIFY3P(n, ==, NULL) 2025-08-07 11:41:42 -07:00
blake3_zfs.c SPDX: license tags: CDDL-1.0 2025-03-13 17:56:27 -07:00
blkptr.c SPDX: license tags: CDDL-1.0 2025-03-13 17:56:27 -07:00
bplist.c SPDX: license tags: CDDL-1.0 2025-03-13 17:56:27 -07:00
bpobj.c Prefer VERIFY0P(n) over VERIFY3P(n, ==, NULL) 2025-08-07 11:41:42 -07:00
bptree.c SPDX: license tags: CDDL-1.0 2025-03-13 17:56:27 -07:00
bqueue.c SPDX: license tags: CDDL-1.0 2025-03-13 17:56:27 -07:00
brt.c BRT: Fix ZAP entry endianness 2025-07-30 09:42:47 -07:00
btree.c Prefer VERIFY0P(n) over VERIFY3P(n, ==, NULL) 2025-08-07 11:41:42 -07:00
dataset_kstats.c ZIL: "crash" the ZIL if the pool suspends during fallback 2025-08-08 16:43:26 -07:00
dbuf_stats.c SPDX: license tags: CDDL-1.0 2025-03-13 17:56:27 -07:00
dbuf.c Fix Assert in dbuf_undirty, which triggers during usage zap shrink 2025-08-12 14:19:05 -07:00
ddt_log.c Prefer VERIFY0P(n) over VERIFY3P(n, ==, NULL) 2025-08-07 11:41:42 -07:00
ddt_stats.c SPDX: license tags: CDDL-1.0 2025-03-13 17:56:27 -07:00
ddt_zap.c SPDX: license tags: CDDL-1.0 2025-03-13 17:56:27 -07:00
ddt.c Prefer VERIFY0P(n) over VERIFY3P(n, ==, NULL) 2025-08-07 11:41:42 -07:00
dmu_diff.c Allow physical rewrite without logical 2025-08-06 10:36:07 -07:00
dmu_direct.c Prefer VERIFY0P(n) over VERIFY3P(n, ==, NULL) 2025-08-07 11:41:42 -07:00
dmu_object.c Prefer VERIFY0P(n) over VERIFY3P(n, ==, NULL) 2025-08-07 11:41:42 -07:00
dmu_objset.c Prefer VERIFY0P(n) over VERIFY(n == NULL) 2025-08-07 11:41:37 -07:00
dmu_recv.c Prefer VERIFY0P(n) over VERIFY3P(n, ==, NULL) 2025-08-07 11:41:42 -07:00
dmu_redact.c Prefer VERIFY0P(n) over VERIFY3P(n, ==, NULL) 2025-08-07 11:41:42 -07:00
dmu_send.c Prefer VERIFY0P(n) over VERIFY3P(n, ==, NULL) 2025-08-07 11:41:42 -07:00
dmu_traverse.c Allow physical rewrite without logical 2025-08-06 10:36:07 -07:00
dmu_tx.c Prefer VERIFY0(n) over VERIFY(n == 0) 2025-08-07 11:40:59 -07:00
dmu_zfetch.c Wire O_DIRECT also to Uncached I/O (#17218) 2025-05-13 14:26:55 -07:00
dmu.c Prefer VERIFY0(n) over VERIFY(n == 0) 2025-08-07 11:40:59 -07:00
dnode_sync.c Prefer VERIFY0P(n) over VERIFY3P(n, ==, NULL) 2025-08-07 11:41:42 -07:00
dnode.c Prefer VERIFY0P(n) over VERIFY3P(n, ==, NULL) 2025-08-07 11:41:42 -07:00
dsl_bookmark.c Prefer VERIFY0P(n) over VERIFY3P(n, ==, NULL) 2025-08-07 11:41:42 -07:00
dsl_crypt.c Prefer VERIFY0P(n) over VERIFY3P(n, ==, NULL) 2025-08-07 11:41:42 -07:00
dsl_dataset.c Prefer VERIFY0P(n) over VERIFY(n == NULL) 2025-08-07 11:41:37 -07:00
dsl_deadlist.c Fix missed assertion update in physical rewrite patch 2025-08-13 15:56:25 -04:00
dsl_deleg.c Prefer VERIFY0(n) over VERIFY(n == 0) 2025-08-07 11:40:59 -07:00
dsl_destroy.c Prefer VERIFY0P(n) over VERIFY3P(n, ==, NULL) 2025-08-07 11:41:42 -07:00
dsl_dir.c Prefer VERIFY0(n) over VERIFY(n == 0) 2025-08-07 11:40:59 -07:00
dsl_pool.c Prefer VERIFY0P(n) over VERIFY(n == NULL) 2025-08-07 11:41:37 -07:00
dsl_prop.c Prefer VERIFY0(n) over VERIFY(n == 0) 2025-08-07 11:40:59 -07:00
dsl_scan.c Prefer VERIFY0P(n) over VERIFY3P(n, ==, NULL) 2025-08-07 11:41:42 -07:00
dsl_synctask.c dmu_tx_assign: make all VERIFY0 calls use DMU_TX_SUSPEND 2025-05-28 10:28:59 -07:00
dsl_userhold.c Prefer VERIFY0(n) over VERIFY(n == 0) 2025-08-07 11:40:59 -07:00
edonr_zfs.c SPDX: license tags: CDDL-1.0 2025-03-13 17:56:27 -07:00
fm.c Prefer VERIFY0(n) over VERIFY(n == 0) 2025-08-07 11:40:59 -07:00
gzip.c SPDX: license tags: CDDL-1.0 2025-03-13 17:56:27 -07:00
hkdf.c SPDX: license tags: CDDL-1.0 2025-03-13 17:56:27 -07:00
lz4_zfs.c SPDX: license tags: BSD-2-Clause 2025-03-13 17:56:46 -07:00
lz4.c SPDX: license tags: BSD-2-Clause 2025-03-13 17:56:46 -07:00
lzjb.c SPDX: license tags: CDDL-1.0 2025-03-13 17:56:27 -07:00
metaslab.c Prefer VERIFY0P(n) over VERIFY3P(n, ==, NULL) 2025-08-07 11:41:42 -07:00
mmp.c Prefer VERIFY0P(n) over VERIFY(n == NULL) 2025-08-07 11:41:37 -07:00
multilist.c Allow vmem_alloc backed multilists 2025-08-12 13:36:03 -07:00
objlist.c SPDX: license tags: CDDL-1.0 2025-03-13 17:56:27 -07:00
pathname.c SPDX: license tags: CDDL-1.0 2025-03-13 17:56:27 -07:00
range_tree.c Prefer VERIFY0P(n) over VERIFY3P(n, ==, NULL) 2025-08-07 11:41:42 -07:00
refcount.c Implement allocation size ranges and use for gang leaves (#17111) 2025-05-02 15:32:18 -07:00
rrwlock.c Prefer VERIFY0P(n) over VERIFY(n == NULL) 2025-08-07 11:41:37 -07:00
sa.c Prefer VERIFY0(n) over VERIFY3U(n, ==, 0) 2025-08-07 11:41:25 -07:00
sha2_zfs.c SPDX: license tags: CDDL-1.0 2025-03-13 17:56:27 -07:00
skein_zfs.c SPDX: license tags: CDDL-1.0 2025-03-13 17:56:27 -07:00
spa_checkpoint.c SPDX: license tags: CDDL-1.0 2025-03-13 17:56:27 -07:00
spa_config.c Retire zfs_autoimport_disable kmod option 2025-08-14 14:58:58 -07:00
spa_errlog.c Allow physical rewrite without logical 2025-08-06 10:36:07 -07:00
spa_history.c dmu_tx: rename dmu_tx_assign() flags from TXG_* to DMU_TX_* (#17143) 2025-03-18 16:04:22 -07:00
spa_log_spacemap.c SPDX: license tags: CDDL-1.0 2025-03-13 17:56:27 -07:00
spa_misc.c Retire zfs_autoimport_disable kmod option 2025-08-14 14:58:58 -07:00
spa_stats.c Prefer VERIFY0(n) over VERIFY(n == 0) 2025-08-07 11:40:59 -07:00
spa.c Prefer VERIFY0P(n) over VERIFY3P(n, ==, NULL) 2025-08-07 11:41:42 -07:00
space_map.c Prefer VERIFY0P(n) over VERIFY(n == NULL) 2025-08-07 11:41:37 -07:00
space_reftree.c Prefer VERIFY0(n) over VERIFY(n == 0) 2025-08-07 11:40:59 -07:00
THIRDPARTYLICENSE.cityhash OpenZFS 8484 - Implement aggregate sum and use for arc counters 2018-06-06 09:35:59 -07:00
THIRDPARTYLICENSE.cityhash.descrip OpenZFS 8484 - Implement aggregate sum and use for arc counters 2018-06-06 09:35:59 -07:00
txg.c txg_wait_synced_flags: add TXG_WAIT_SUSPEND flag to not wait if pool suspended 2025-05-28 10:27:46 -07:00
uberblock.c SPDX: license tags: CDDL-1.0 2025-03-13 17:56:27 -07:00
unique.c SPDX: license tags: CDDL-1.0 2025-03-13 17:56:27 -07:00
vdev_draid_rand.c SPDX: license tags: LicenseRef-OpenZFS-ThirdParty-PublicDomain 2025-03-13 17:57:31 -07:00
vdev_draid.c Prefer VERIFY0P(n) over VERIFY3P(n, ==, NULL) 2025-08-07 11:41:42 -07:00
vdev_file.c Implement allocation size ranges and use for gang leaves (#17111) 2025-05-02 15:32:18 -07:00
vdev_indirect_births.c SPDX: license tags: CDDL-1.0 2025-03-13 17:56:27 -07:00
vdev_indirect_mapping.c SPDX: license tags: CDDL-1.0 2025-03-13 17:56:27 -07:00
vdev_indirect.c Prefer VERIFY0P(n) over VERIFY3P(n, ==, NULL) 2025-08-07 11:41:42 -07:00
vdev_initialize.c Prefer VERIFY0P(n) over VERIFY3P(n, ==, NULL) 2025-08-07 11:41:42 -07:00
vdev_label.c Prefer VERIFY0(n) over VERIFY(n == 0) 2025-08-07 11:40:59 -07:00
vdev_mirror.c Allow physical rewrite without logical 2025-08-06 10:36:07 -07:00
vdev_missing.c Implement allocation size ranges and use for gang leaves (#17111) 2025-05-02 15:32:18 -07:00
vdev_queue.c Prefer VERIFY0P(n) over VERIFY3P(n, ==, NULL) 2025-08-07 11:41:42 -07:00
vdev_raidz_math_aarch64_neon_common.h SPDX: license tags: CDDL-1.0 2025-03-13 17:56:27 -07:00
vdev_raidz_math_aarch64_neon.c SPDX: license tags: CDDL-1.0 2025-03-13 17:56:27 -07:00
vdev_raidz_math_aarch64_neonx2.c SPDX: license tags: CDDL-1.0 2025-03-13 17:56:27 -07:00
vdev_raidz_math_avx2.c SPDX: license tags: CDDL-1.0 2025-03-13 17:56:27 -07:00
vdev_raidz_math_avx512bw.c SPDX: license tags: CDDL-1.0 2025-03-13 17:56:27 -07:00
vdev_raidz_math_avx512f.c SPDX: license tags: CDDL-1.0 2025-03-13 17:56:27 -07:00
vdev_raidz_math_impl.h SPDX: license tags: CDDL-1.0 2025-03-13 17:56:27 -07:00
vdev_raidz_math_powerpc_altivec_common.h SPDX: license tags: CDDL-1.0 2025-03-13 17:56:27 -07:00
vdev_raidz_math_powerpc_altivec.c SPDX: license tags: CDDL-1.0 2025-03-13 17:56:27 -07:00
vdev_raidz_math_scalar.c SPDX: license tags: CDDL-1.0 2025-03-13 17:56:27 -07:00
vdev_raidz_math_sse2.c SPDX: license tags: CDDL-1.0 2025-03-13 17:56:27 -07:00
vdev_raidz_math_ssse3.c SPDX: license tags: CDDL-1.0 2025-03-13 17:56:27 -07:00
vdev_raidz_math.c tunables: don't assert initialisation in impl getters 2025-05-28 16:50:22 -07:00
vdev_raidz.c Prefer VERIFY0P(n) over VERIFY3P(n, ==, NULL) 2025-08-07 11:41:42 -07:00
vdev_rebuild.c Prefer VERIFY0P(n) over VERIFY3P(n, ==, NULL) 2025-08-07 11:41:42 -07:00
vdev_removal.c Prefer VERIFY0P(n) over VERIFY3P(n, ==, NULL) 2025-08-07 11:41:42 -07:00
vdev_root.c Implement allocation size ranges and use for gang leaves (#17111) 2025-05-02 15:32:18 -07:00
vdev_trim.c Prefer VERIFY0P(n) over VERIFY3P(n, ==, NULL) 2025-08-07 11:41:42 -07:00
vdev.c Prefer VERIFY0P(n) over VERIFY3P(n, ==, NULL) 2025-08-07 11:41:42 -07:00
zap_leaf.c SPDX: license tags: CDDL-1.0 2025-03-13 17:56:27 -07:00
zap_micro.c Prefer VERIFY0(n) over VERIFY(n == 0) 2025-08-07 11:40:59 -07:00
zap.c Prefer VERIFY0(n) over VERIFY3U(n, ==, 0) 2025-08-07 11:41:25 -07:00
zcp_get.c zcp: get_prop: fix encryptionroot and encryption 2025-05-27 20:04:37 -04:00
zcp_global.c SPDX: license tags: CDDL-1.0 2025-03-13 17:56:27 -07:00
zcp_iter.c SPDX: license tags: CDDL-1.0 2025-03-13 17:56:27 -07:00
zcp_set.c SPDX: license tags: CDDL-1.0 2025-03-13 17:56:27 -07:00
zcp_synctask.c zcp_synctask: add zfs.sync.clone() 2025-06-10 14:53:10 -07:00
zcp.c Prefer VERIFY0P(n) over VERIFY3P(n, ==, NULL) 2025-08-07 11:41:42 -07:00
zfeature.c Prefer VERIFY0(n) over VERIFY(n == 0) 2025-08-07 11:40:59 -07:00
zfs_byteswap.c SPDX: license tags: CDDL-1.0 2025-03-13 17:56:27 -07:00
zfs_chksum.c Faster checksum benchmark on system boot 2025-07-29 17:09:48 -07:00
zfs_crrd.c Add TXG timestamp database 2025-08-06 10:31:21 -07:00
zfs_debug_common.c nvlist: Add nvlist_snprintf() and zfs_dbgmsg_nvlist() 2025-04-18 09:22:16 -04:00
zfs_fm.c events: include zio type in IO error reports 2025-05-30 10:29:29 -04:00
zfs_fuid.c Prefer VERIFY0(n) over VERIFY(n == 0) 2025-08-07 11:40:59 -07:00
zfs_impl.c SPDX: license tags: CDDL-1.0 2025-03-13 17:56:27 -07:00
zfs_ioctl.c Prefer VERIFY0P(n) over VERIFY3P(n, ==, NULL) 2025-08-07 11:41:42 -07:00
zfs_log.c ZIL: pass commit errors back to ITX callbacks 2025-08-08 16:43:20 -07:00
zfs_onexit.c SPDX: license tags: CDDL-1.0 2025-03-13 17:56:27 -07:00
zfs_quota.c Prefer VERIFY0(n) over VERIFY(n == 0) 2025-08-07 11:40:59 -07:00
zfs_ratelimit.c SPDX: license tags: CDDL-1.0 2025-03-13 17:56:27 -07:00
zfs_replay.c dmu_tx: rename dmu_tx_assign() flags from TXG_* to DMU_TX_* (#17143) 2025-03-18 16:04:22 -07:00
zfs_rlock.c Prefer VERIFY0(n) over VERIFY3U(n, ==, 0) 2025-08-07 11:41:25 -07:00
zfs_sa.c ZIL: allow zil_commit() to fail with error 2025-08-08 16:43:09 -07:00
zfs_vnops.c ZIL: allow zil_commit() to fail with error 2025-08-08 16:43:09 -07:00
zfs_znode.c Add default user/group/project quota properties 2025-04-03 10:35:22 -07:00
zil.c ZIL: Make allocations more flexible 2025-08-14 08:50:17 -07:00
zio_checksum.c Prefer VERIFY0(n) over VERIFY(n == 0) 2025-08-07 11:40:59 -07:00
zio_compress.c Removed unused zio_decompress_fail_fraction variable 2025-08-06 17:10:03 -07:00
zio_inject.c Prefer VERIFY0P(n) over VERIFY3P(n, ==, NULL) 2025-08-07 11:41:42 -07:00
zio.c ZIL: Make allocations more flexible 2025-08-14 08:50:17 -07:00
zle.c SPDX: license tags: CDDL-1.0 2025-03-13 17:56:27 -07:00
zrlock.c Prefer VERIFY0P(n) over VERIFY3P(n, ==, NULL) 2025-08-07 11:41:42 -07:00
zthr.c Prefer VERIFY0P(n) over VERIFY3P(n, ==, NULL) 2025-08-07 11:41:42 -07:00
zvol.c zvol: stop using zvol_state_lock to protect OS-side private data 2025-08-19 10:06:34 -07:00