mirror_zfs/module/zfs
George Wilson ddc751d56b OpenZFS 8857 - zio_remove_child() panic due to already destroyed parent zio
PROBLEM
=======
It's possible for a parent zio to complete even though it has children
which have not completed. This can result in the following panic:
    > $C
    ffffff01809128c0 vpanic()
    ffffff01809128e0 mutex_panic+0x58(fffffffffb94c904, ffffff597dde7f80)
    ffffff0180912950 mutex_vector_enter+0x347(ffffff597dde7f80)
    ffffff01809129b0 zio_remove_child+0x50(ffffff597dde7c58, ffffff32bd901ac0,
    ffffff3373370908)
    ffffff0180912a40 zio_done+0x390(ffffff32bd901ac0)
    ffffff0180912a70 zio_execute+0x78(ffffff32bd901ac0)
    ffffff0180912b30 taskq_thread+0x2d0(ffffff33bae44140)
    ffffff0180912b40 thread_start+8()
    > ::status
    debugging crash dump vmcore.2 (64-bit) from batfs0390
    operating system: 5.11 joyent_20170911T171900Z (i86pc)
    image uuid: (not set)
    panic message: mutex_enter: bad mutex, lp=ffffff597dde7f80
    owner=ffffff3c59b39480 thread=ffffff0180912c40
    dump content: kernel pages only
The problem is that dbuf_prefetch, combined with l2arc, can create a zio tree
that confuses the parent zio and allows it to complete while children still
exist. Here's the scenario:
    zio tree:
        pio
         |--- lio
The parent zio, pio, has entered the zio_done stage and begins checking its
children to see whether any of them have not yet completed. In zio_done(),
the children are checked in the following order:
    zio_wait_for_children(zio, ZIO_CHILD_VDEV, ZIO_WAIT_DONE)
    zio_wait_for_children(zio, ZIO_CHILD_GANG, ZIO_WAIT_DONE)
    zio_wait_for_children(zio, ZIO_CHILD_DDT, ZIO_WAIT_DONE)
    zio_wait_for_children(zio, ZIO_CHILD_LOGICAL, ZIO_WAIT_DONE)
If pio finds any child that has not completed, it stops executing and goes
to sleep. Each call to zio_wait_for_children() grabs the io_lock only while
checking that particular child type and releases it before returning.
In this scenario, pio has completed the first call to
zio_wait_for_children(), which checks for ZIO_CHILD_VDEV children. Since the
only child in the zio tree at this point is the logical zio, lio, that call
finds nothing to wait for, and pio prepares to check the next child type.
In the meantime, lio completes and, in its callback, creates a vdev zio,
cio, as a child of pio. The zio tree now looks like this:
    zio tree:
        pio
         |--- lio
         |--- cio
The lio then grabs the parent's io_lock and removes itself.
    zio tree:
        pio
         |--- cio
pio continues to run, but it has already completed its check for
ZIO_CHILD_VDEV children, so it erroneously completes. When the child zio,
cio, later completes, it panics the system while trying to reference the
parent zio, which has already been destroyed.
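The window described above can be replayed deterministically in a small C
model. This is a sketch under stated assumptions, not the zio.c code:
toy_zio_t, toy_enter()/toy_exit(), old_done_must_wait() and the
lio_completes() hook are hypothetical; the enum names mirror the ones in this
message but are redefined locally; and the per-wait-type counters and stage
bookkeeping of the real zio_t are omitted. The hook runs exactly where the old
code had already released io_lock, standing in for lio's concurrent
completion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Local copy of the child types, in the order the old zio_done() checked them. */
enum zio_child {
	ZIO_CHILD_VDEV = 0,
	ZIO_CHILD_GANG,
	ZIO_CHILD_DDT,
	ZIO_CHILD_LOGICAL,
	ZIO_CHILD_TYPES
};

/* Hypothetical stand-in for zio_t: a lock flag plus outstanding-child counts. */
typedef struct toy_zio {
	bool lock_held;                       /* models io_lock */
	uint64_t children[ZIO_CHILD_TYPES];   /* outstanding children per type */
} toy_zio_t;

static void toy_enter(toy_zio_t *z) { assert(!z->lock_held); z->lock_held = true; }
static void toy_exit(toy_zio_t *z)  { assert(z->lock_held);  z->lock_held = false; }

/* Old shape: one child type per call; io_lock is dropped before returning. */
static bool old_wait_for_children(toy_zio_t *z, enum zio_child type)
{
	toy_enter(z);
	bool waiting = (z->children[type] != 0);
	toy_exit(z);
	return waiting;
}

/* Hypothetical hook: work another thread does while pio's io_lock is free. */
typedef void (*race_fn)(toy_zio_t *pio, enum zio_child just_checked);

/* Old zio_done() prologue, one call per type; true means pio must sleep. */
static bool old_done_must_wait(toy_zio_t *pio, race_fn race)
{
	for (int t = ZIO_CHILD_VDEV; t < ZIO_CHILD_TYPES; t++) {
		if (old_wait_for_children(pio, (enum zio_child)t))
			return true;
		if (race != NULL)
			race(pio, (enum zio_child)t);  /* io_lock is NOT held here */
	}
	return false;
}

/* The scenario: right after the VDEV check, lio's callback adds a vdev
 * child cio to pio, then lio removes itself from pio. */
static void lio_completes(toy_zio_t *pio, enum zio_child just_checked)
{
	if (just_checked != ZIO_CHILD_VDEV)
		return;
	toy_enter(pio);
	pio->children[ZIO_CHILD_VDEV]++;      /* cio added */
	toy_exit(pio);
	toy_enter(pio);
	pio->children[ZIO_CHILD_LOGICAL]--;   /* lio removed */
	toy_exit(pio);
}
```

Starting from a pio whose only child is lio (children[ZIO_CHILD_LOGICAL] = 1),
old_done_must_wait(&pio, lio_completes) returns false, so pio would complete,
even though children[ZIO_CHILD_VDEV] is still 1 because cio is outstanding.
Without the hook, the same pio correctly reports that it must wait.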
SOLUTION
========
The fix is to rework the zio_wait_for_children() logic to accept a bitfield
of all the child types it should check. The io_lock is held the entire time
all of those child types are checked. Since the function now accepts a
bitfield, a simple ZIO_CHILD_BIT() macro is provided to convert a ZIO_CHILD
type into the bitfield used by the zio_wait_for_children() logic.
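A minimal sketch of the reworked check, under toy assumptions: toy_zio_t, its
lock helpers and wait_for_children() are hypothetical; the enum is a local
copy of the names used in this message; and the bodies of ZIO_CHILD_BIT(),
ZIO_CHILD_BIT_IS_SET() and ZIO_CHILD_ALL_BITS are assumed (the message only
says that a simple ZIO_CHILD_BIT() macro is provided). The io_stall and stage
handling a real wait needs is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Local copy of the child types (not the real zio.h). */
enum zio_child {
	ZIO_CHILD_VDEV = 0,
	ZIO_CHILD_GANG,
	ZIO_CHILD_DDT,
	ZIO_CHILD_LOGICAL,
	ZIO_CHILD_TYPES
};

/* Assumed helpers: map a child type to its bit in a childbits mask. */
#define ZIO_CHILD_BIT(x)            (1U << (x))
#define ZIO_CHILD_BIT_IS_SET(v, x)  (((v) & ZIO_CHILD_BIT(x)) != 0)
#define ZIO_CHILD_ALL_BITS \
	(ZIO_CHILD_BIT(ZIO_CHILD_VDEV) | ZIO_CHILD_BIT(ZIO_CHILD_GANG) | \
	 ZIO_CHILD_BIT(ZIO_CHILD_DDT) | ZIO_CHILD_BIT(ZIO_CHILD_LOGICAL))

/* Hypothetical stand-in for zio_t. */
typedef struct toy_zio {
	bool lock_held;                       /* models io_lock */
	int lock_acquisitions;                /* times io_lock was taken */
	uint64_t children[ZIO_CHILD_TYPES];   /* outstanding children per type */
} toy_zio_t;

static void toy_enter(toy_zio_t *z)
{
	assert(!z->lock_held);   /* a concurrent updater would block here */
	z->lock_held = true;
	z->lock_acquisitions++;
}

static void toy_exit(toy_zio_t *z)
{
	assert(z->lock_held);
	z->lock_held = false;
}

/* Fixed shape: every type named in childbits is checked under ONE hold of
 * io_lock, so the counts form a single consistent snapshot.
 * Returns true if the caller must wait. */
static bool wait_for_children(toy_zio_t *z, uint8_t childbits)
{
	bool waiting = false;

	toy_enter(z);
	for (int c = 0; c < ZIO_CHILD_TYPES; c++) {
		if (!ZIO_CHILD_BIT_IS_SET(childbits, c))
			continue;          /* caller did not ask about this type */
		if (z->children[c] != 0) {
			waiting = true;
			break;
		}
	}
	toy_exit(z);
	return waiting;
}
```

Because cio is added to pio before lio removes itself, at least one of them
is outstanding at every instant until cio itself completes. A check that
reads all child types under a single io_lock hold therefore sees a child
whether lio's callback takes the lock before or after it: for example,
wait_for_children(&pio, ZIO_CHILD_ALL_BITS) returns true both for
children[ZIO_CHILD_VDEV] = 1 and for children[ZIO_CHILD_LOGICAL] = 1, and it
takes io_lock exactly once.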

Authored by: George Wilson <george.wilson@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Andriy Gapon <avg@FreeBSD.org>
Reviewed by: Youzhong Yang <youzhong@gmail.com>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Approved by: Dan McDonald <danmcd@omniti.com>
Ported-by: Giuseppe Di Natale <dinatale2@llnl.gov>

OpenZFS-issue: https://www.illumos.org/issues/8857
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/862ff6d99c
Issue #5918
Closes #7168
2018-02-14 15:30:09 -08:00
abd.c Update for cppcheck v1.80 2017-11-18 14:08:00 -08:00
arc.c Remove deprecated zfs_arc_p_aggressive_disable 2018-02-07 11:54:20 -08:00
blkptr.c Undo c89 workarounds to match with upstream 2017-11-04 13:25:13 -07:00
bplist.c
bpobj.c Undo c89 workarounds to match with upstream 2017-11-04 13:25:13 -07:00
bptree.c Native Encryption for ZFS on Linux 2017-08-14 10:36:48 -07:00
bqueue.c Call cv_signal() with mutex held 2017-06-26 14:36:49 -07:00
dbuf_stats.c Add dbuf hash and dbuf cache kstats 2018-01-29 10:24:52 -08:00
dbuf.c Project Quota on ZFS 2018-02-13 14:54:54 -08:00
ddt_zap.c
ddt.c Sequential scrub and resilvers 2017-11-15 17:27:01 -08:00
dmu_diff.c
dmu_object.c Raw sends must be able to decrease nlevels 2018-02-02 11:43:11 -08:00
dmu_objset.c Project Quota on ZFS 2018-02-13 14:54:54 -08:00
dmu_send.c Raw sends must be able to decrease nlevels 2018-02-02 11:43:11 -08:00
dmu_traverse.c Project Quota on ZFS 2018-02-13 14:54:54 -08:00
dmu_tx.c OpenZFS 8997 - ztest assertion failure in zil_lwb_write_issue 2018-01-26 20:19:46 -08:00
dmu_zfetch.c OpenZFS 8835 - Speculative prefetch in ZFS not working for misaligned reads 2018-01-19 09:31:29 -08:00
dmu.c Project Quota on ZFS 2018-02-13 14:54:54 -08:00
dnode_sync.c Raw sends must be able to decrease nlevels 2018-02-02 11:43:11 -08:00
dnode.c Project Quota on ZFS 2018-02-13 14:54:54 -08:00
dsl_bookmark.c Undo c89 workarounds to match with upstream 2017-11-04 13:25:13 -07:00
dsl_crypt.c Fix hash_lock / keystore.sk_dk_lock lock inversion 2018-02-04 14:07:13 -08:00
dsl_dataset.c OpenZFS 8520 - lzc_rollback 2018-02-09 10:27:58 -08:00
dsl_deadlist.c OpenZFS 5428 - provide fts(), reallocarray(), and strtonum() 2017-07-08 20:35:35 -07:00
dsl_deleg.c
dsl_destroy.c OpenZFS 8677 - Open-Context Channel Programs 2018-02-08 16:05:57 -08:00
dsl_dir.c OpenZFS 7431 - ZFS Channel Programs 2018-02-08 15:28:18 -08:00
dsl_pool.c Project Quota on ZFS 2018-02-13 14:54:54 -08:00
dsl_prop.c Native Encryption for ZFS on Linux 2017-08-14 10:36:48 -07:00
dsl_scan.c Project Quota on ZFS 2018-02-13 14:54:54 -08:00
dsl_synctask.c
dsl_userhold.c Undo c89 workarounds to match with upstream 2017-11-04 13:25:13 -07:00
edonr_zfs.c
fm.c Linux 4.14 compat: CONFIG_GCC_PLUGIN_RANDSTRUCT 2017-11-28 17:33:48 -06:00
gzip.c GZIP compression offloading with QAT accelerator 2017-03-22 17:58:47 -07:00
hkdf.c Encryption patch follow-up 2017-10-11 16:54:48 -04:00
lz4.c Fix LZ4_uncompress_unknownOutputSize caused panic 2017-05-19 13:45:46 -07:00
lzjb.c
Makefile.in OpenZFS 7431 - ZFS Channel Programs 2018-02-08 15:28:18 -08:00
metaslab.c Sequential scrub and resilvers 2017-11-15 17:27:01 -08:00
mmp.c mmp should use a fixed tag for spa_config locks 2018-02-12 11:30:38 -08:00
multilist.c Undo c89 workarounds to match with upstream 2017-11-04 13:25:13 -07:00
pathname.c
policy.c
qat_compress.c Bug fix in qat_compress.c for vmalloc addr check 2018-02-05 10:26:27 -08:00
qat_compress.h GZIP compression offloading with QAT accelerator 2017-03-22 17:58:47 -07:00
range_tree.c Sequential scrub and resilvers 2017-11-15 17:27:01 -08:00
refcount.c Linux 4.11 compat: avoid refcount_t name conflict 2017-02-28 16:10:18 -08:00
rrwlock.c
sa.c Project Quota on ZFS 2018-02-13 14:54:54 -08:00
sha256.c
skein_zfs.c
spa_boot.c
spa_config.c Undo c89 workarounds to match with upstream 2017-11-04 13:25:13 -07:00
spa_errlog.c Native Encryption for ZFS on Linux 2017-08-14 10:36:48 -07:00
spa_history.c Emit history events for 'zpool create' 2017-10-23 09:45:59 -07:00
spa_misc.c Extend deadman logic 2018-01-25 13:40:38 -08:00
spa_stats.c Update the default for zfs_txg_history 2017-09-29 15:58:52 -07:00
spa.c Project Quota on ZFS 2018-02-13 14:54:54 -08:00
space_map.c Undo c89 workarounds to match with upstream 2017-11-04 13:25:13 -07:00
space_reftree.c
trace.c
txg.c OpenZFS 8585 - improve batching done in zil_commit() 2017-12-05 09:39:16 -08:00
uberblock.c Multi-modifier protection (MMP) 2017-07-13 13:54:00 -04:00
unique.c
vdev_cache.c Undo c89 workarounds to match with upstream 2017-11-04 13:25:13 -07:00
vdev_disk.c Fix printk() calls missing log level 2017-09-25 10:38:27 -07:00
vdev_file.c Skip spurious resilver IO on raidz vdev 2017-05-12 17:28:03 -07:00
vdev_label.c Undo c89 workarounds to match with upstream 2017-11-04 13:25:13 -07:00
vdev_mirror.c Linux 4.14 compat: CONFIG_GCC_PLUGIN_RANDSTRUCT 2017-11-28 17:33:48 -06:00
vdev_missing.c Skip spurious resilver IO on raidz vdev 2017-05-12 17:28:03 -07:00
vdev_queue.c Support re-prioritizing asynchronous prefetches 2017-12-21 09:13:06 -08:00
vdev_raidz_math_aarch64_neon_common.h
vdev_raidz_math_aarch64_neon.c
vdev_raidz_math_aarch64_neonx2.c
vdev_raidz_math_avx2.c
vdev_raidz_math_avx512bw.c
vdev_raidz_math_avx512f.c
vdev_raidz_math_impl.h
vdev_raidz_math_scalar.c
vdev_raidz_math_sse2.c
vdev_raidz_math_ssse3.c
vdev_raidz_math.c OpenZFS 7431 - ZFS Channel Programs 2018-02-08 15:28:18 -08:00
vdev_raidz.c Linux 4.14 compat: CONFIG_GCC_PLUGIN_RANDSTRUCT 2017-11-28 17:33:48 -06:00
vdev_root.c Undo c89 workarounds to match with upstream 2017-11-04 13:25:13 -07:00
vdev.c Extend deadman logic 2018-01-25 13:40:38 -08:00
zap_leaf.c Handle zap_add() failures in mixed case mode 2018-02-09 10:15:53 -08:00
zap_micro.c Handle zap_add() failures in mixed case mode 2018-02-09 10:15:53 -08:00
zap.c Handle zap_add() failures in mixed case mode 2018-02-09 10:15:53 -08:00
zcp_get.c OpenZFS 7431 - ZFS Channel Programs 2018-02-08 15:28:18 -08:00
zcp_global.c OpenZFS 8600 - ZFS channel programs - snapshot 2018-02-08 15:29:24 -08:00
zcp_iter.c OpenZFS 7431 - ZFS Channel Programs 2018-02-08 15:28:18 -08:00
zcp_synctask.c OpenZFS 8677 - Open-Context Channel Programs 2018-02-08 16:05:57 -08:00
zcp.c OpenZFS 8677 - Open-Context Channel Programs 2018-02-08 16:05:57 -08:00
zfeature.c Undo c89 workarounds to match with upstream 2017-11-04 13:25:13 -07:00
zfs_acl.c Project Quota on ZFS 2018-02-13 14:54:54 -08:00
zfs_byteswap.c
zfs_ctldir.c Use SET_ERROR for constant non-zero return codes 2017-08-02 21:16:12 -07:00
zfs_debug.c Add line info and SET_ERROR() to ZFS debug log 2017-07-25 23:09:48 -07:00
zfs_dir.c Project Quota on ZFS 2018-02-13 14:54:54 -08:00
zfs_fm.c OpenZFS 8731 - ASSERT3U(nui64s, <=, UINT16_MAX) fails for large blocks 2018-01-25 10:02:11 -08:00
zfs_fuid.c Rename zfs_sb_t -> zfsvfs_t 2017-03-10 09:51:33 -08:00
zfs_ioctl.c Project Quota on ZFS 2018-02-13 14:54:54 -08:00
zfs_log.c Project Quota on ZFS 2018-02-13 14:54:54 -08:00
zfs_onexit.c
zfs_ratelimit.c Add libtpool (thread pools) 2017-08-09 15:31:08 -07:00
zfs_replay.c Project Quota on ZFS 2018-02-13 14:54:54 -08:00
zfs_rlock.c
zfs_sa.c Project Quota on ZFS 2018-02-13 14:54:54 -08:00
zfs_vfsops.c Project Quota on ZFS 2018-02-13 14:54:54 -08:00
zfs_vnops.c Project Quota on ZFS 2018-02-13 14:54:54 -08:00
zfs_znode.c Project Quota on ZFS 2018-02-13 14:54:54 -08:00
zil.c Change os->os_next_write_raw to work per txg 2018-02-02 11:44:53 -08:00
zio_checksum.c Undo c89 workarounds to match with upstream 2017-11-04 13:25:13 -07:00
zio_compress.c Undo c89 workarounds to match with upstream 2017-11-04 13:25:13 -07:00
zio_crypt.c Encryption Stability and On-Disk Format Fixes 2018-02-02 11:37:16 -08:00
zio_inject.c Undo c89 workarounds to match with upstream 2017-11-04 13:25:13 -07:00
zio.c OpenZFS 8857 - zio_remove_child() panic due to already destroyed parent zio 2018-02-14 15:30:09 -08:00
zle.c Fix zle_decompress out of bound access 2018-02-09 10:08:05 -08:00
zpl_ctldir.c Linux 4.12 compat: CURRENT_TIME removed 2017-05-10 09:30:48 -07:00
zpl_export.c
zpl_file.c Project Quota on ZFS 2018-02-13 14:54:54 -08:00
zpl_inode.c Linux 4.12 compat: CURRENT_TIME removed 2017-05-10 09:30:48 -07:00
zpl_super.c Linux 4.16 compat: inode_set_iversion() 2018-02-08 21:25:19 -08:00
zpl_xattr.c Update for cppcheck v1.80 2017-11-18 14:08:00 -08:00
zrlock.c Undo c89 workarounds to match with upstream 2017-11-04 13:25:13 -07:00
zvol.c Encryption Stability and On-Disk Format Fixes 2018-02-02 11:37:16 -08:00