mirror_zfs/module/zfs
George Wilson ffd2e15d65
zio can deadlock during device removal
When doing a device removal on a pool with gang blocks, the zio pipeline
can deadlock when trying to free blocks from a device which is being
removed with a stack similar to this:

 0xffff8ab9a13a1740 UNINTERRUPTIBLE       4
                   __schedule+0x2e5
                   __schedule+0x2e5
                   schedule+0x33
                   schedule_preempt_disabled+0xe
                   __mutex_lock.isra.12+0x2a7
                   __mutex_lock.isra.12+0x2a7
                   __mutex_lock_slowpath+0x13
                   mutex_lock+0x2c
                   free_from_removing_vdev+0x61
                   metaslab_free_impl+0xd6
                   metaslab_free_dva+0x5e
                   metaslab_free+0x196
                   zio_free_sync+0xe4
                   zio_free_gang+0x38
                   zio_gang_tree_issue+0x42
                   zio_gang_tree_issue+0xa2
                   zio_gang_issue+0x6d
                   zio_execute+0x94
                   zio_execute+0x94
                   taskq_thread+0x23b
                   kthread+0x120
                   ret_from_fork+0x1f

Since there are gang blocks we have to read the gang members as part of
the free. This can be seen with a zio dependency tree that looks like
this:

sdb> echo 0xffff900c24f8a700 | zio -rc | zio
ADDRESS                       TYPE  STAGE            WAITER
0xffff900c24f8a700            NULL  CHECKSUM_VERIFY  0xffff900ddfd31740
0xffff900c24f8c920            FREE  GANG_ASSEMBLE    -
0xffff900d93d435a0            READ  DONE

In the illustration above we are processing frees but because of gang
block we have to read the constituents blocks. Once we finish the READ
in the zio pipeline we will execute the parent. In this case the parent
is a FREE but the zio taskq is a READ and we continue to process the
pipeline leading to the stack above. In the stack above, we are blocked
waiting for the svr_lock so as a result a READ interrupt taskq thread
is now consumed. Eventually, all of the READ taskq threads end up
blocked and we're unable to complete any read requests.

In zio_notify_parent there is an optimization to continue to use
the taskq thread to exectue the parent's pipeline. To resolve the
deadlock above, we only allow this optimization if the parent's
zio type matches the child which just completed.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: George Wilson <gwilson@delphix.com>
External-issue: DLPX-80130
Closes #14236
2022-12-02 17:46:29 -08:00
..
abd.c abd_return_buf() should call zfs_refcount_remove_many() early 2022-10-19 17:11:01 -07:00
aggsum.c Remove bcopy(), bzero(), bcmp() 2022-03-15 15:13:42 -07:00
arc.c Fix arc_p aggressive increase 2022-11-11 10:41:36 -08:00
blake3_zfs.c Fix memory allocation issue for BLAKE3 context 2022-06-21 14:32:09 -07:00
blkptr.c Remove bcopy(), bzero(), bcmp() 2022-03-15 15:13:42 -07:00
bplist.c Replace dead opensolaris.org license link 2022-07-11 14:16:13 -07:00
bpobj.c Replace dead opensolaris.org license link 2022-07-11 14:16:13 -07:00
bptree.c Replace dead opensolaris.org license link 2022-07-11 14:16:13 -07:00
bqueue.c zfs recv hangs if max recordsize is less than received recordsize 2022-09-16 13:52:25 -07:00
btree.c Optimize microzaps 2022-10-20 11:57:15 -07:00
dataset_kstats.c Introduce kmem_scnprintf() 2022-10-29 13:05:11 -07:00
dbuf_stats.c Revert "Reduce dbuf_find() lock contention" 2022-09-22 12:59:41 -07:00
dbuf.c Fix NULL pointer dereference in dbuf_prefetch_indirect_done() 2022-11-29 10:00:50 -08:00
ddt_zap.c Replace dead opensolaris.org license link 2022-07-11 14:16:13 -07:00
ddt.c Replace dead opensolaris.org license link 2022-07-11 14:16:13 -07:00
dmu_diff.c Replace dead opensolaris.org license link 2022-07-11 14:16:13 -07:00
dmu_object.c Cleanup: Specify unsignedness on things that should not be signed 2022-09-27 16:42:41 -07:00
dmu_objset.c Convert enum zio_flag to uint64_t 2022-10-27 09:54:54 -07:00
dmu_recv.c Deny receiving into encrypted datasets if the keys are not loaded 2022-11-03 09:55:13 -07:00
dmu_redact.c Fix incorrect size given to bqueue_enqueue() call in dmu_redact.c 2022-09-15 16:21:21 -07:00
dmu_send.c Cleanup: Delete dead code from send_merge_thread() 2022-11-29 09:59:53 -08:00
dmu_traverse.c Convert enum zio_flag to uint64_t 2022-10-27 09:54:54 -07:00
dmu_tx.c Replace dead opensolaris.org license link 2022-07-11 14:16:13 -07:00
dmu_zfetch.c Cleanup: 64-bit kernel module parameters should use fixed width types 2022-10-13 10:03:29 -07:00
dmu.c Convert enum zio_flag to uint64_t 2022-10-27 09:54:54 -07:00
dnode_sync.c Replace dead opensolaris.org license link 2022-07-11 14:16:13 -07:00
dnode.c Remove few pointer dereferences in dbuf_read() 2022-11-29 09:49:02 -08:00
dsl_bookmark.c Cleanup: Address Clang's static analyzer's unused code complaints 2022-10-14 13:37:54 -07:00
dsl_crypt.c Handle and detect #13709's unlock regression (#14161) 2022-11-15 14:44:12 -08:00
dsl_dataset.c Fix setting the large_block feature after receiving a snapshot 2022-11-18 11:38:37 -08:00
dsl_deadlist.c Cleanup: 64-bit kernel module parameters should use fixed width types 2022-10-13 10:03:29 -07:00
dsl_deleg.c Replace dead opensolaris.org license link 2022-07-11 14:16:13 -07:00
dsl_destroy.c Prevent zevent list from consuming all of kernel memory 2022-08-22 12:36:22 -07:00
dsl_dir.c Fix potential NULL pointer dereference regression 2022-11-10 13:56:28 -08:00
dsl_pool.c Cleanup: Address Clang's static analyzer's unused code complaints 2022-10-14 13:37:54 -07:00
dsl_prop.c dsl_prop_known_index(): check for invalid prop 2022-11-08 10:16:01 -08:00
dsl_scan.c Cleanup: 64-bit kernel module parameters should use fixed width types 2022-10-13 10:03:29 -07:00
dsl_synctask.c Replace dead opensolaris.org license link 2022-07-11 14:16:13 -07:00
dsl_userhold.c Replace dead opensolaris.org license link 2022-07-11 14:16:13 -07:00
edonr_zfs.c Remove bcopy(), bzero(), bcmp() 2022-03-15 15:13:42 -07:00
fm.c fm_fmri_hc_create() must call va_end() before returning 2022-10-18 15:34:36 -07:00
gzip.c Replace dead opensolaris.org license link 2022-07-11 14:16:13 -07:00
hkdf.c Remove bcopy(), bzero(), bcmp() 2022-03-15 15:13:42 -07:00
lz4_zfs.c Updated the lz4 decompressor 2022-01-07 10:36:49 -08:00
lz4.c lz4: Cherrypick fix for CVE-2021-3520 2022-01-12 16:14:36 -08:00
lzjb.c Replace dead opensolaris.org license link 2022-07-11 14:16:13 -07:00
metaslab.c Cleanup: metaslab_alloc_dva() should not NULL check mg->mg_next 2022-10-18 15:39:40 -07:00
mmp.c Cleanup: Address Clang's static analyzer's unused code complaints 2022-10-14 13:37:54 -07:00
multilist.c Cleanup: Specify unsignedness on things that should not be signed 2022-09-27 16:42:41 -07:00
objlist.c Implement Redacted Send/Receive 2019-06-19 09:48:12 -07:00
pathname.c Replace dead opensolaris.org license link 2022-07-11 14:16:13 -07:00
range_tree.c Add defensive assertions 2022-10-12 11:25:18 -07:00
refcount.c Cleanup: Specify unsignedness on things that should not be signed 2022-09-27 16:42:41 -07:00
rrwlock.c Replace dead opensolaris.org license link 2022-07-11 14:16:13 -07:00
sa.c Fix double const qualifier declarations 2022-09-30 15:34:39 -07:00
sha256.c Replace dead opensolaris.org license link 2022-07-11 14:16:13 -07:00
skein_zfs.c Remove bcopy(), bzero(), bcmp() 2022-03-15 15:13:42 -07:00
spa_checkpoint.c Cleanup: 64-bit kernel module parameters should use fixed width types 2022-10-13 10:03:29 -07:00
spa_config.c zed: mark disks as REMOVED when they are removed 2022-09-28 09:48:46 -07:00
spa_errlog.c Cleanup: Specify unsignedness on things that should not be signed 2022-09-27 16:42:41 -07:00
spa_history.c Replace dead opensolaris.org license link 2022-07-11 14:16:13 -07:00
spa_log_spacemap.c Address warnings about possible division by zero from clangsa 2022-11-03 09:58:14 -07:00
spa_misc.c zed: post a udev change event from spa_vdev_attach() 2022-11-18 11:39:59 -08:00
spa_stats.c Cleanup: Specify unsignedness on things that should not be signed 2022-09-27 16:42:41 -07:00
spa.c zed: Prevent special vdev to be replaced by hot spare 2022-11-04 11:33:47 -07:00
space_map.c Replace dead opensolaris.org license link 2022-07-11 14:16:13 -07:00
space_reftree.c Replace dead opensolaris.org license link 2022-07-11 14:16:13 -07:00
THIRDPARTYLICENSE.cityhash OpenZFS 8484 - Implement aggregate sum and use for arc counters 2018-06-06 09:35:59 -07:00
THIRDPARTYLICENSE.cityhash.descrip OpenZFS 8484 - Implement aggregate sum and use for arc counters 2018-06-06 09:35:59 -07:00
txg.c Fix the last two CFI callback prototype mismatches 2022-11-29 09:56:16 -08:00
uberblock.c Replace dead opensolaris.org license link 2022-07-11 14:16:13 -07:00
unique.c Replace dead opensolaris.org license link 2022-07-11 14:16:13 -07:00
vdev_cache.c Cleanup: Specify unsignedness on things that should not be signed 2022-09-27 16:42:41 -07:00
vdev_draid_rand.c Distributed Spare (dRAID) Feature 2020-11-13 13:51:51 -08:00
vdev_draid.c vdev_draid_lookup_map() should not iterate outside draid_maps 2022-09-12 12:51:17 -07:00
vdev_indirect_births.c Remove bcopy(), bzero(), bcmp() 2022-03-15 15:13:42 -07:00
vdev_indirect_mapping.c Remove bcopy(), bzero(), bcmp() 2022-03-15 15:13:42 -07:00
vdev_indirect.c Bump checksum error counter before reporting to ZED 2022-12-02 17:42:22 -08:00
vdev_initialize.c Cleanup: 64-bit kernel module parameters should use fixed width types 2022-10-13 10:03:29 -07:00
vdev_label.c Replace dead opensolaris.org license link 2022-07-11 14:16:13 -07:00
vdev_mirror.c Improve too large physical ashift handling 2022-09-08 10:30:53 -07:00
vdev_missing.c Replace dead opensolaris.org license link 2022-07-11 14:16:13 -07:00
vdev_queue.c Convert enum zio_flag to uint64_t 2022-10-27 09:54:54 -07:00
vdev_raidz_math_aarch64_neon_common.h Replace dead opensolaris.org license link 2022-07-11 14:16:13 -07:00
vdev_raidz_math_aarch64_neon.c Replace dead opensolaris.org license link 2022-07-11 14:16:13 -07:00
vdev_raidz_math_aarch64_neonx2.c Fix Clang 15 compilation errors 2022-11-30 13:46:26 -08:00
vdev_raidz_math_avx2.c Replace dead opensolaris.org license link 2022-07-11 14:16:13 -07:00
vdev_raidz_math_avx512bw.c Replace dead opensolaris.org license link 2022-07-11 14:16:13 -07:00
vdev_raidz_math_avx512f.c Replace dead opensolaris.org license link 2022-07-11 14:16:13 -07:00
vdev_raidz_math_impl.h Cleanup Raid-Z Typo fixes 2022-09-06 09:43:21 -07:00
vdev_raidz_math_powerpc_altivec_common.h Replace dead opensolaris.org license link 2022-07-11 14:16:13 -07:00
vdev_raidz_math_powerpc_altivec.c Replace dead opensolaris.org license link 2022-07-11 14:16:13 -07:00
vdev_raidz_math_scalar.c Replace dead opensolaris.org license link 2022-07-11 14:16:13 -07:00
vdev_raidz_math_sse2.c Replace dead opensolaris.org license link 2022-07-11 14:16:13 -07:00
vdev_raidz_math_ssse3.c Replace dead opensolaris.org license link 2022-07-11 14:16:13 -07:00
vdev_raidz_math.c Convert some sprintf() calls to kmem_scnprintf() 2022-11-28 13:49:58 -08:00
vdev_raidz.c Bump checksum error counter before reporting to ZED 2022-12-02 17:42:22 -08:00
vdev_rebuild.c Fix sequential resilver drive failure race condition 2022-10-19 15:48:13 -07:00
vdev_removal.c Cleanup: Specify unsignedness on things that should not be signed 2022-09-27 16:42:41 -07:00
vdev_root.c Replace dead opensolaris.org license link 2022-07-11 14:16:13 -07:00
vdev_trim.c Propagate extent_bytes change to autotrim thread 2022-10-28 10:16:31 -07:00
vdev.c zed: unclean disk attachment faults the vdev 2022-11-29 09:24:10 -08:00
zap_leaf.c Optimize microzaps 2022-10-20 11:57:15 -07:00
zap_micro.c Optimize microzaps 2022-10-20 11:57:15 -07:00
zap.c Replace dead opensolaris.org license link 2022-07-11 14:16:13 -07:00
zcp_get.c Cleanup: Address Clang's static analyzer's unused code complaints 2022-10-14 13:37:54 -07:00
zcp_global.c OpenZFS 8600 - ZFS channel programs - snapshot 2018-02-08 15:29:24 -08:00
zcp_iter.c module/*.ko: prune .data, global .rodata 2022-01-14 15:37:55 -08:00
zcp_set.c Support setting user properties in a channel program 2020-02-14 13:41:42 -08:00
zcp_synctask.c Add zfs.sync.snapshot_rename 2022-09-02 13:31:19 -07:00
zcp.c Fix too few arguments to formatting function 2022-10-29 13:04:52 -07:00
zfeature.c Replace dead opensolaris.org license link 2022-07-11 14:16:13 -07:00
zfs_byteswap.c Replace dead opensolaris.org license link 2022-07-11 14:16:13 -07:00
zfs_chksum.c Introduce kmem_scnprintf() 2022-10-29 13:05:11 -07:00
zfs_fm.c Remove an unused variable 2022-11-03 10:17:17 -07:00
zfs_fuid.c Replace dead opensolaris.org license link 2022-07-11 14:16:13 -07:00
zfs_ioctl.c Cleanup: 64-bit kernel module parameters should use fixed width types 2022-10-13 10:03:29 -07:00
zfs_log.c zfs_rename: support RENAME_* flags 2022-10-28 09:49:20 -07:00
zfs_onexit.c zfs_onexit_add_cb: make action_handle point to a uintptr_t 2022-11-03 09:52:12 -07:00
zfs_quota.c Replace dead opensolaris.org license link 2022-07-11 14:16:13 -07:00
zfs_ratelimit.c Replace dead opensolaris.org license link 2022-07-11 14:16:13 -07:00
zfs_replay.c Support idmapped mount in user namespace 2022-11-08 10:28:56 -08:00
zfs_rlock.c Replace dead opensolaris.org license link 2022-07-11 14:16:13 -07:00
zfs_sa.c Replace dead opensolaris.org license link 2022-07-11 14:16:13 -07:00
zfs_vnops.c Support idmapped mount in user namespace 2022-11-08 10:28:56 -08:00
zil.c Optionally skip zil_close during zvol_create_minor_impl 2022-11-08 12:38:08 -08:00
zio_checksum.c Fix double const qualifier declarations 2022-09-30 15:34:39 -07:00
zio_compress.c Fix declarations of non-global variables 2022-10-18 11:05:32 -07:00
zio_inject.c Cleanup: Switch to strlcpy from strncpy 2022-09-27 16:35:29 -07:00
zio.c zio can deadlock during device removal 2022-12-02 17:46:29 -08:00
zle.c Replace dead opensolaris.org license link 2022-07-11 14:16:13 -07:00
zrlock.c Micro-optimize zrl_remove() 2022-11-29 09:26:03 -08:00
zthr.c Switch from _Noreturn to __attribute__((noreturn)) 2022-03-23 08:51:00 -07:00
zvol.c zfs_rename: support RENAME_* flags 2022-10-28 09:49:20 -07:00