mirror_zfs/module/zfs
George Wilson 07ce5d7390 OpenZFS 8857 - zio_remove_child() panic due to already destroyed parent zio
PROBLEM
=======
It's possible for a parent zio to complete even though it has children
which have not completed. This can result in the following panic:
    > $C
    ffffff01809128c0 vpanic()
    ffffff01809128e0 mutex_panic+0x58(fffffffffb94c904, ffffff597dde7f80)
    ffffff0180912950 mutex_vector_enter+0x347(ffffff597dde7f80)
    ffffff01809129b0 zio_remove_child+0x50(ffffff597dde7c58, ffffff32bd901ac0,
    ffffff3373370908)
    ffffff0180912a40 zio_done+0x390(ffffff32bd901ac0)
    ffffff0180912a70 zio_execute+0x78(ffffff32bd901ac0)
    ffffff0180912b30 taskq_thread+0x2d0(ffffff33bae44140)
    ffffff0180912b40 thread_start+8()
    > ::status
    debugging crash dump vmcore.2 (64-bit) from batfs0390
    operating system: 5.11 joyent_20170911T171900Z (i86pc)
    image uuid: (not set)
    panic message: mutex_enter: bad mutex, lp=ffffff597dde7f80
    owner=ffffff3c59b39480 thread=ffffff0180912c40
    dump content: kernel pages only
The problem is that dbuf_prefetch along with l2arc can create a zio tree
which confuses the parent zio and allows it to complete while children
still exist. Here's the scenario:
    zio tree:
        pio
         |--- lio
The parent zio, pio, has entered the zio_done stage and begins to check its
children to see whether any of them have not yet completed. In zio_done(),
the children are checked in the following order:
    zio_wait_for_children(zio, ZIO_CHILD_VDEV, ZIO_WAIT_DONE)
    zio_wait_for_children(zio, ZIO_CHILD_GANG, ZIO_WAIT_DONE)
    zio_wait_for_children(zio, ZIO_CHILD_DDT, ZIO_WAIT_DONE)
    zio_wait_for_children(zio, ZIO_CHILD_LOGICAL, ZIO_WAIT_DONE)
If pio finds any child which has not completed, it stops executing and
goes to sleep. Each call to zio_wait_for_children() grabs the io_lock
while checking the particular child type and releases it before returning,
so the lock is not held between calls.
In this scenario, the pio has completed the first call to
zio_wait_for_children() to check for any ZIO_CHILD_VDEV children. Since
the only zio in the zio tree right now is the logical zio, lio, the pio
completes that call and prepares to check the next child type.
In the meantime, the lio completes and in its callback creates a child vdev
zio, cio. The zio tree looks like this:
    zio tree:
        pio
         |--- lio
         |--- cio
The lio then grabs the parent's io_lock and removes itself.
    zio tree:
        pio
         |--- cio
The pio continues to run but has already completed its check for ZIO_CHILD_VDEV
and will erroneously complete. When the child zio, cio, completes it will panic
the system trying to reference the parent zio which has been destroyed.
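
The interleaving above can be replayed deterministically in a small
single-threaded model. This is only an illustrative sketch, not the code in
zio.c: struct parent, lio_completes(), parent_done_per_type() and
children_left() are hypothetical names, and the race_after argument stands in
for the window between two checks during which io_lock is not held.

```c
#include <assert.h>
#include <stdbool.h>

/* Child types, in the order zio_done() checks them. */
enum child_type {
	CHILD_VDEV, CHILD_GANG, CHILD_DDT, CHILD_LOGICAL, CHILD_TYPES
};

/* Hypothetical stand-in for a parent zio: outstanding children per type. */
struct parent {
	int pending[CHILD_TYPES];
};

/* Models lio's completion: it adds a vdev child (cio), then removes itself. */
static void
lio_completes(struct parent *pio)
{
	assert(pio->pending[CHILD_LOGICAL] > 0);
	pio->pending[CHILD_VDEV]++;
	pio->pending[CHILD_LOGICAL]--;
}

/*
 * Models the old per-type checks. The lock is dropped between checks, so
 * another thread may run there; race_after selects the check after which
 * lio_completes() is interleaved (pass -1 for no interleaving).
 * Returns true if the parent believes all of its children are done.
 */
static bool
parent_done_per_type(struct parent *pio, int race_after)
{
	for (int t = 0; t < CHILD_TYPES; t++) {
		if (pio->pending[t] != 0)
			return (false);	/* the parent would go to sleep here */
		if (t == race_after)
			lio_completes(pio);
	}
	return (true);
}

/* Number of children still attached to the parent. */
static int
children_left(const struct parent *pio)
{
	int n = 0;

	for (int t = 0; t < CHILD_TYPES; t++)
		n += pio->pending[t];
	return (n);
}
```

Starting with one logical child and calling
parent_done_per_type(&pio, CHILD_VDEV) makes the model report completion
while children_left(&pio) is still 1, which corresponds to pio being
destroyed while cio is outstanding. With race_after set to -1 the same call
reports that the parent must wait.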
SOLUTION
========
The fix is to rework the zio_wait_for_children() logic to accept a bitfield
for all the children types that it's interested in checking. The
io_lock is held the entire time we check all the children types. Since
the function now accepts a bitfield, a simple ZIO_CHILD_BIT() macro is provided
to allow for the conversion between a ZIO_CHILD type and the bitfield used by
the zio_wait_for_children() logic.
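
A minimal userland sketch of the bitfield approach follows. It is a model
under stated assumptions rather than the actual change to zio.c: struct
model_zio, model_wait_for_children() and the ZIO_CHILD_BIT_IS_SET() helper are
illustrative names, the macro bodies are assumed, and a pthread mutex stands in
for the kernel io_lock. It only reports whether the caller would have to wait;
it does not model stalling the pipeline.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

enum zio_child {
	ZIO_CHILD_VDEV, ZIO_CHILD_GANG, ZIO_CHILD_DDT, ZIO_CHILD_LOGICAL,
	ZIO_CHILD_TYPES
};

/* Convert a child type into its bit in the bitfield (assumed definition). */
#define	ZIO_CHILD_BIT(x)		(1U << (x))
#define	ZIO_CHILD_BIT_IS_SET(val, x)	(((val) & ZIO_CHILD_BIT(x)) != 0)

/* Hypothetical, heavily reduced stand-in for zio_t. */
struct model_zio {
	pthread_mutex_t io_lock;
	uint64_t io_children[ZIO_CHILD_TYPES];	/* outstanding, per type */
};

/*
 * Returns true if the caller must wait, i.e. some child of a type named in
 * childbits is still outstanding. Every requested type is examined under a
 * single hold of io_lock, so the set of children cannot change between the
 * individual checks.
 */
static bool
model_wait_for_children(struct model_zio *zio, uint8_t childbits)
{
	bool waiting = false;

	assert(childbits != 0);
	pthread_mutex_lock(&zio->io_lock);
	for (int c = 0; c < ZIO_CHILD_TYPES; c++) {
		if (!ZIO_CHILD_BIT_IS_SET(childbits, c))
			continue;
		if (zio->io_children[c] != 0) {
			waiting = true;
			break;
		}
	}
	pthread_mutex_unlock(&zio->io_lock);
	return (waiting);
}
```

In this model a caller would pass the OR of ZIO_CHILD_BIT() for all four child
types in one call instead of making four separate calls. A child that moves
from the logical count to the vdev count between two calls is still seen,
because each call inspects every requested type while holding the lock.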

Authored by: George Wilson <george.wilson@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Andriy Gapon <avg@FreeBSD.org>
Reviewed by: Youzhong Yang <youzhong@gmail.com>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Approved by: Dan McDonald <danmcd@omniti.com>
Ported-by: Giuseppe Di Natale <dinatale2@llnl.gov>

OpenZFS-issue: https://www.illumos.org/issues/8857
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/862ff6d99c
Issue #5918
Closes #7168
2018-03-14 16:10:37 -07:00
abd.c Update for cppcheck v1.80 2018-01-30 10:27:31 -06:00
arc.c Remove deprecated zfs_arc_p_aggressive_disable 2018-03-14 16:10:36 -07:00
blkptr.c OpenZFS 8067 - zdb should be able to dump literal embedded block pointer 2017-07-07 11:28:01 -07:00
bplist.c Change KM_PUSHPAGE -> KM_SLEEP 2015-01-16 14:41:26 -08:00
bpobj.c Don't dirty bpobj if it has no entries 2017-05-26 11:42:10 -07:00
bptree.c OpenZFS 7082 - bptree_iterate() passes wrong args to zfs_dbgmsg() 2017-01-17 14:49:24 -08:00
bqueue.c Call cv_signal() with mutex held 2017-06-26 14:36:49 -07:00
dbuf_stats.c Improved dnode allocation and dmu_hold_impl() (#6611) 2017-09-13 15:46:15 -07:00
dbuf.c Fix ARC hit rate 2018-01-30 10:27:31 -06:00
ddt_zap.c Change KM_PUSHPAGE -> KM_SLEEP 2015-01-16 14:41:26 -08:00
ddt.c Cache ddt_get_dedup_dspace() value if there was no ddt changes 2016-12-02 16:59:35 -07:00
dmu_diff.c OpenZFS 6950 - ARC should cache compressed data 2016-09-13 09:58:33 -07:00
dmu_object.c Improved dnode allocation and dmu_hold_impl() (#6611) 2017-09-13 15:46:15 -07:00
dmu_objset.c dmu_objset: release bonus buffer in failure path 2018-01-30 10:27:30 -06:00
dmu_send.c Skip FREEOBJECTS for objects which can't exist 2017-10-16 10:57:55 -07:00
dmu_traverse.c Fix zdb -c traverse stop on damaged objset root 2018-03-14 16:10:36 -07:00
dmu_tx.c Call commit callbacks from the tail of the list 2018-01-30 10:27:31 -06:00
dmu_zfetch.c OpenZFS 8835 - Speculative prefetch in ZFS not working for misaligned reads 2018-01-30 10:27:31 -06:00
dmu.c Fix dirty check in dmu_offset_next() 2017-11-21 13:11:29 -06:00
dnode_sync.c OpenZFS 7968 - multi-threaded spa_sync() 2017-03-20 18:36:00 -07:00
dnode.c Improved dnode allocation and dmu_hold_impl() (#6611) 2017-09-13 15:46:15 -07:00
dsl_bookmark.c OpenZFS 8377 - Panic in bookmark deletion 2017-06-30 11:11:01 -07:00
dsl_dataset.c OpenZFS 7600 - zfs rollback should pass target snapshot to kernel 2017-07-04 15:29:52 -07:00
dsl_deadlist.c OpenZFS 5428 - provide fts(), reallocarray(), and strtonum() 2017-07-08 20:35:35 -07:00
dsl_deleg.c Performance optimization of AVL tree comparator functions 2016-08-31 14:35:34 -07:00
dsl_destroy.c OpenZFS 7254 - ztest failed assertion in ztest_dataset_dirobj_verify: dirobjs + 1 == usedobjs 2017-01-27 11:43:42 -08:00
dsl_dir.c Reduce stack usage of dsl_dir_tempreserve_impl 2017-06-12 11:41:03 -07:00
dsl_pool.c Multi-modifier protection (MMP) 2017-07-13 13:54:00 -04:00
dsl_prop.c Fix dsl_props_set_sync_impl to work with nested nvlist 2016-12-20 18:46:59 -08:00
dsl_scan.c OpenZFS 6939 - add sysevents to zfs core for commands 2017-07-12 21:28:13 -07:00
dsl_synctask.c Illumos 4951 - ZFS administrative commands should use reserved space 2015-05-04 09:41:10 -07:00
dsl_userhold.c OpenZFS 5428 - provide fts(), reallocarray(), and strtonum() 2017-07-08 20:35:35 -07:00
edonr_zfs.c DLPX-44812 integrate EP-220 large memory scalability 2016-11-29 14:34:27 -08:00
fm.c Linux 4.14 compat: CONFIG_GCC_PLUGIN_RANDSTRUCT 2017-12-04 17:21:39 -08:00
gzip.c GZIP compression offloading with QAT accelerator 2017-03-22 17:58:47 -07:00
lz4.c Fix LZ4_uncompress_unknownOutputSize caused panic 2017-05-19 13:45:46 -07:00
lzjb.c Change KM_PUSHPAGE -> KM_SLEEP 2015-01-16 14:41:26 -08:00
Makefile.in Multi-modifier protection (MMP) 2017-07-13 13:54:00 -04:00
metaslab.c OpenZFS 8023 - Panic destroying a metaslab deferred range tree 2017-04-09 16:12:35 -07:00
mmp.c mmp should use a fixed tag for spa_config locks 2018-03-14 16:10:37 -07:00
multilist.c OpenZFS 7968 - multi-threaded spa_sync() 2017-03-20 18:36:00 -07:00
pathname.c Add pn_alloc()/pn_free() functions 2016-04-21 09:49:25 -07:00
policy.c codebase style improvements for OpenZFS 6459 port 2017-01-22 13:25:40 -08:00
qat_compress.c Bug fix in qat_compress.c for vmalloc addr check 2018-03-14 16:10:36 -07:00
qat_compress.h GZIP compression offloading with QAT accelerator 2017-03-22 17:58:47 -07:00
range_tree.c Performance optimization of AVL tree comparator functions 2016-08-31 14:35:34 -07:00
refcount.c Linux 4.11 compat: avoid refcount_t name conflict 2017-02-28 16:10:18 -08:00
rrwlock.c Fix spelling 2017-01-03 11:31:18 -06:00
sa.c OpenZFS 8061 - sa_find_idx_tab can be declared more type-safely 2017-04-14 11:11:28 -07:00
sha256.c DLPX-44812 integrate EP-220 large memory scalability 2016-11-29 14:34:27 -08:00
skein_zfs.c DLPX-44812 integrate EP-220 large memory scalability 2016-11-29 14:34:27 -08:00
spa_boot.c Add linux kernel module support 2010-08-31 13:41:58 -07:00
spa_config.c Remove vn_rename and vn_remove dependency 2018-03-14 16:10:36 -07:00
spa_errlog.c OpenZFS 5428 - provide fts(), reallocarray(), and strtonum() 2017-07-08 20:35:35 -07:00
spa_history.c Emit history events for 'zpool create' 2017-12-04 17:21:03 -08:00
spa_misc.c Multi-modifier protection (MMP) 2017-07-13 13:54:00 -04:00
spa_stats.c Multi-modifier protection (MMP) 2017-07-13 13:54:00 -04:00
spa.c Fix multihost stale cache file import 2017-12-18 10:31:01 -08:00
space_map.c OpenZFS 8023 - Panic destroying a metaslab deferred range tree 2017-04-09 16:12:35 -07:00
space_reftree.c OpenZFS 6328 - Fix cstyle errors in zfs codebase 2017-01-12 09:42:11 -08:00
trace.c OpenZFS 6531 - Provide mechanism to artificially limit disk performance 2016-05-26 10:11:51 -07:00
txg.c OpenZFS 8063 - verify that we do not attempt to access inactive txg 2017-05-10 13:52:22 -04:00
uberblock.c Multi-modifier protection (MMP) 2017-07-13 13:54:00 -04:00
unique.c Performance optimization of AVL tree comparator functions 2016-08-31 14:35:34 -07:00
vdev_cache.c Fix wrong offset args in vdev_cache_write 2017-03-28 11:06:22 -07:00
vdev_disk.c Linux 4.14 compat: IO acct, global_page_state, etc (#6655) 2017-09-19 14:24:34 -07:00
vdev_file.c Skip spurious resilver IO on raidz vdev 2017-05-12 17:28:03 -07:00
vdev_label.c Use linear abd in vdev_copy_uberblocks() 2017-10-16 10:57:55 -07:00
vdev_mirror.c vdev_mirror: load balancing fixes 2018-01-30 10:27:30 -06:00
vdev_missing.c Skip spurious resilver IO on raidz vdev 2017-05-12 17:28:03 -07:00
vdev_queue.c vdev_mirror: load balancing fixes 2018-01-30 10:27:30 -06:00
vdev_raidz_math_aarch64_neon_common.h ABD raidz NEON support 2016-11-29 14:34:33 -08:00
vdev_raidz_math_aarch64_neon.c codebase style improvements for OpenZFS 6459 port 2017-01-22 13:25:40 -08:00
vdev_raidz_math_aarch64_neonx2.c ABD raidz NEON support 2016-11-29 14:34:33 -08:00
vdev_raidz_math_avx2.c ABD raidz avx512f support 2016-11-29 14:34:33 -08:00
vdev_raidz_math_avx512bw.c ABD: Adapt avx512bw raidz assembly 2016-12-15 17:31:33 -08:00
vdev_raidz_math_avx512f.c Use cstyle -cpP in make cstyle check 2016-12-12 10:46:26 -08:00
vdev_raidz_math_impl.h codebase style improvements for OpenZFS 6459 port 2017-01-22 13:25:40 -08:00
vdev_raidz_math_scalar.c ABD Vectorized raidz 2016-11-29 14:34:33 -08:00
vdev_raidz_math_sse2.c ABD raidz avx512f support 2016-11-29 14:34:33 -08:00
vdev_raidz_math_ssse3.c codebase style improvements for OpenZFS 6459 port 2017-01-22 13:25:40 -08:00
vdev_raidz_math.c codebase style improvements for OpenZFS 6459 port 2017-01-22 13:25:40 -08:00
vdev_raidz.c Linux 4.14 compat: CONFIG_GCC_PLUGIN_RANDSTRUCT 2017-12-04 17:21:39 -08:00
vdev_root.c Skip spurious resilver IO on raidz vdev 2017-05-12 17:28:03 -07:00
vdev.c OpenZFS 6939 - add sysevents to zfs core for commands 2017-07-12 21:28:13 -07:00
zap_leaf.c Handle zap_add() failures in mixed case mode 2018-03-14 16:10:37 -07:00
zap_micro.c Handle zap_add() failures in mixed case mode 2018-03-14 16:10:37 -07:00
zap.c Handle zap_add() failures in mixed case mode 2018-03-14 16:10:37 -07:00
zfeature_common.c OpenZFS 2932 - support crash dumps to raidz, etc. pools 2017-04-10 10:24:17 -07:00
zfeature.c OpenZFS 6328 - Fix cstyle errors in zfs codebase 2017-01-12 09:42:11 -08:00
zfs_acl.c OpenZFS 8966 - Source file zfs_acl.c, function zfs_aclset_common contains a use after end of the lifetime of a local variable 2018-03-14 16:10:36 -07:00
zfs_byteswap.c Add linux kernel module support 2010-08-31 13:41:58 -07:00
zfs_ctldir.c Linux 4.9 compat: fix zfs_ctldir xattr handling 2017-06-05 11:26:25 -07:00
zfs_debug.c Add line info and SET_ERROR() to ZFS debug log 2017-07-25 23:09:48 -07:00
zfs_dir.c Handle zap_add() failures in mixed case mode 2018-03-14 16:10:37 -07:00
zfs_fm.c OpenZFS 6939 - add sysevents to zfs core for commands 2017-07-12 21:28:13 -07:00
zfs_fuid.c Rename zfs_sb_t -> zfsvfs_t 2017-03-10 09:51:33 -08:00
zfs_ioctl.c Fix zfs_ioc_pool_sync should not use fnvlist 2018-01-30 10:27:30 -06:00
zfs_log.c OpenZFS 7578 - Fix/improve some aspects of ZIL writing 2017-06-09 09:15:37 -07:00
zfs_onexit.c zfsdev_getminor() should check for invalid file handles 2015-06-22 17:02:13 -07:00
zfs_replay.c Rename zfs_sb_t -> zfsvfs_t 2017-03-10 09:51:33 -08:00
zfs_rlock.c Fix spelling 2017-01-03 11:31:18 -06:00
zfs_sa.c Modifying XATTRs doesnt change the ctime 2017-09-13 16:05:18 -07:00
zfs_vfsops.c Revert "Long hold the dataset during upgrade" 2017-12-06 13:25:40 -06:00
zfs_vnops.c Handle zap_add() failures in mixed case mode 2018-03-14 16:10:37 -07:00
zfs_znode.c Fix dnode allocation race 2017-08-08 10:17:33 -07:00
zil.c Preserve itx alloc size for zio_data_buf_free() 2017-12-04 17:21:39 -08:00
zio_checksum.c Remove dependency on linear ABD 2017-03-29 12:24:51 -07:00
zio_compress.c DLPX-44812 integrate EP-220 large memory scalability 2016-11-29 14:34:27 -08:00
zio_inject.c Inject zinject(8) a percentage amount of dev errs 2017-06-16 17:21:11 -07:00
zio.c OpenZFS 8857 - zio_remove_child() panic due to already destroyed parent zio 2018-03-14 16:10:37 -07:00
zle.c Fix zle_decompress out of bound access 2018-03-14 16:10:36 -07:00
zpl_ctldir.c Linux 4.12 compat: CURRENT_TIME removed 2017-05-10 09:30:48 -07:00
zpl_export.c Use cstyle -cpP in make cstyle check 2016-12-12 10:46:26 -08:00
zpl_file.c Rename zfs_sb_t -> zfsvfs_t 2017-03-10 09:51:33 -08:00
zpl_inode.c Linux 4.12 compat: CURRENT_TIME removed 2017-05-10 09:30:48 -07:00
zpl_super.c Linux 4.16 compat: inode_set_iversion() 2018-03-14 16:10:36 -07:00
zpl_xattr.c Update for cppcheck v1.80 2018-01-30 10:27:31 -06:00
zrlock.c OpenZFS 3746 - ZRLs are racy 2017-01-23 10:35:58 -08:00
zvol.c Update for cppcheck v1.80 2018-01-30 10:27:31 -06:00