mirror_zfs

mirror of https://git.proxmox.com/git/mirror_zfs.git synced 2026-01-05 12:56:01 +03:00

Serapheim Dimitropoulos 0f8ff49eb6 dmu_tx_wait() hang likely due to cv_signal() in dsl_pool_dirty_delta()

Even though the bug's writeup (GitHub issue #9136) is very detailed, we still don't know exactly how we got to that state, so I wasn't able to reproduce the bug. That said, we can make an educated guess by combining the information in the filed issue with the code.

From the fact that `dp_dirty_total` was 0 (which is less than `zfs_dirty_data_max`) we know that there was one thread that set it to 0 and then signaled one of the waiters of `dp_spaceavail_cv` [see `dsl_pool_dirty_delta()`, which is also the only place where `dp_dirty_total` is changed]. Thus, the only logical explanation for the bug being hit is that the waiter that was just awakened didn't go through `dsl_pool_dirty_delta()`. Given that this function is only called by `dsl_pool_dirty_space()` or `dsl_pool_undirty_space()`, I can only think of two possible ways of the above scenario happening:

[1] The waiter didn't call into either of the two functions - which I find highly unlikely (i.e. why wait on `dp_spaceavail_cv` to begin with?).

[2] The waiter did call into one of the above functions but passed 0 as the space/delta to be dirtied (or undirtied), and the callee returned immediately (e.g. both `dsl_pool_dirty_space()` and `dsl_pool_undirty_space()` return immediately when space is 0).

In any case, and no matter how we got there, the easy fix would be to just broadcast to all waiters whenever `dp_dirty_total` hits 0. That said, and given that we've never hit this before, it makes sense to think more about why the above situation occurred.

Attempting to mimic what Prakash was doing in the issue filed, I created a dataset with `sync=always` and started doing contiguous writes in a file within that dataset.
I observed with DTrace that even though we update the pool's dirty data accounting when we dirty data, the accounting isn't decremented incrementally as the ZIOs of those writes complete (the reason being that `dbuf_write_physdone()` isn't called when we go through the override code paths, and thus `dsl_pool_undirty_space()` is never called). As a result we have to wait until we get to `dsl_pool_sync()`, where we zero out all dirty data accounting for the pool and the current TXG's metadata. In addition, as Matt noted and I later verified, the same issue arises when using dedup.

In both cases (sync & dedup) we shouldn't have to wait until `dsl_pool_sync()` zeros out the accounting data. According to the comment in that part of the code, the reasons why we do the zeroing have nothing to do with what we observe:

````
/*
 * We have written all of the accounted dirty data, so our
 * dp_space_towrite should now be zero. However, some seldom-used
 * code paths do not adhere to this (e.g. dbuf_undirty(), also
 * rounding error in dbuf_write_physdone).
 * Shore up the accounting of any dirtied space now.
 */
dsl_pool_undirty_space(dp, dp->dp_dirty_pertxg[txg & TXG_MASK], txg);
````

Ideally, what we want to do is to undirty in the accounting exactly what we dirty (I use the word ideally as we can still have rounding errors). This would make the behavior of the system clearer and more predictable.

Another interesting issue that I observed with DTrace was that we wouldn't update any of the pool's dirty data accounting whenever we dirtied and/or undirtied MOS data. In addition, every time we changed the size of a dbuf through `dbuf_new_size()` we wouldn't update the accounted space dirtied in the appropriate dirty record, so when ZIOs are done we would undirty less than we dirtied from the pool's accounting point of view.
For the first two issues observed (sync & dedup) this patch ensures that we still update the pool's accounting when we undirty data, regardless of the write being physical or not.

For changes in the MOS, we first make sure to zero out the pool's dirty data accounting in `dsl_pool_sync()` after we have synced the MOS. Then we can go ahead and enable the update of the pool's dirty data accounting whenever we change MOS data.

Another fix is that we now update the accounting explicitly for counting errors in `dbuf_write_done()`.

Finally, `dbuf_new_size()` now correctly updates the accounted space of the appropriate dirty record.

The problem is that we still don't know how the bug came up in the issue filed. That said, the issues fixed seem to be very relevant, so instead of going with the broadcasting solution right away, I decided to leave this patch as is.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Signed-off-by: Serapheim Dimitropoulos <serapheim@delphix.com>
External-issue: DLPX-47285
Closes #9137		2019-08-15 17:53:53 -06:00
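The accounting described in the commit message above can be sketched as a small userspace model. This is only an illustration under stated assumptions, not the OpenZFS code: `pool_model_t`, `wake_t`, and the `model_*` functions are hypothetical names. The clamp in `model_undirty_space()` is an assumption of the model. The `WAKE_ALL` case stands for the "easy fix" (broadcast when the total hits 0) that the message discusses but deliberately does not adopt. Instead of calling `cv_signal()` or `cv_broadcast()`, each function returns which kind of wakeup it would have issued.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model constants; not the real TXG_SIZE/TXG_MASK. */
#define MODEL_TXG_SIZE 4
#define MODEL_TXG_MASK (MODEL_TXG_SIZE - 1)

/* Which wakeup a change to the dirty total would trigger. */
typedef enum { WAKE_NONE, WAKE_ONE, WAKE_ALL } wake_t;

typedef struct pool_model {
	uint64_t dirty_total;                  /* models dp_dirty_total */
	uint64_t dirty_pertxg[MODEL_TXG_SIZE]; /* models dp_dirty_pertxg */
	uint64_t dirty_max;                    /* models zfs_dirty_data_max */
} pool_model_t;

/* The single place where the total changes, as the message describes. */
static wake_t
model_dirty_delta(pool_model_t *p, int64_t delta)
{
	assert(delta > 0 || p->dirty_total >= (uint64_t)(-delta));
	if (delta >= 0)
		p->dirty_total += (uint64_t)delta;
	else
		p->dirty_total -= (uint64_t)(-delta);

	if (p->dirty_total == 0)
		return (WAKE_ALL);	/* the discussed broadcast variant */
	if (p->dirty_total < p->dirty_max)
		return (WAKE_ONE);	/* models the single cv_signal() */
	return (WAKE_NONE);
}

static wake_t
model_dirty_space(pool_model_t *p, uint64_t space, uint64_t txg)
{
	if (space == 0)
		return (WAKE_NONE);	/* early return: nobody is woken */
	p->dirty_pertxg[txg & MODEL_TXG_MASK] += space;
	return (model_dirty_delta(p, (int64_t)space));
}

static wake_t
model_undirty_space(pool_model_t *p, uint64_t space, uint64_t txg)
{
	uint64_t *pertxg = &p->dirty_pertxg[txg & MODEL_TXG_MASK];

	if (space == 0)
		return (WAKE_NONE);	/* early return: nobody is woken */
	if (*pertxg < space)
		space = *pertxg;	/* assumption: never undirty more than was dirtied */
	*pertxg -= space;
	return (model_dirty_delta(p, -(int64_t)space));
}
```

In this model, a caller that passes 0 as the space never reaches `model_dirty_delta()`, which is scenario [2] from the message: a thread that consumed the only signal does not pass the wakeup on. Dirtying and then undirtying the same amount for a TXG brings both the per-TXG counter and the total back to 0, which is the "undirty exactly what we dirty" behavior the patch aims for.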
..
abd.c	single-chunk scatter ABDs can be treated as linear	2019-06-11 09:02:31 -07:00
aggsum.c	OpenZFS 9688 - aggsum_fini leaks memory	2018-10-19 12:08:03 -07:00
arc.c	hdr_recl calls zthr_wakeup() on destroyed zthr	2019-07-18 12:55:29 -07:00
blkptr.c	Undo c89 workarounds to match with upstream	2017-11-04 13:25:13 -07:00
bplist.c	Fast Clone Deletion	2019-07-26 10:54:14 -07:00
bpobj.c	Fast Clone Deletion	2019-07-26 10:54:14 -07:00
bptree.c	Implement Redacted Send/Receive	2019-06-19 09:48:12 -07:00
bqueue.c	Implement Redacted Send/Receive	2019-06-19 09:48:12 -07:00
cityhash.c	OpenZFS 8484 - Implement aggregate sum and use for arc counters	2018-06-06 09:35:59 -07:00
dataset_kstats.c	port async unlinked drain from illumos-nexenta	2019-02-12 10:41:15 -08:00
dbuf_stats.c	Prefix all refcount functions with zfs_	2018-10-01 10:42:05 -07:00
dbuf.c	dmu_tx_wait() hang likely due to cv_signal() in dsl_pool_dirty_delta()	2019-08-15 17:53:53 -06:00
ddt_zap.c	fat zap should prefetch when iterating	2019-06-12 13:13:09 -07:00
ddt.c	Remove dedupditto functionality	2019-06-19 14:54:02 -07:00
dmu_diff.c	Implement Redacted Send/Receive	2019-06-19 09:48:12 -07:00
dmu_object.c	Fix send/recv lost spill block	2019-05-07 15:18:44 -07:00
dmu_objset.c	dmu_tx_wait() hang likely due to cv_signal() in dsl_pool_dirty_delta()	2019-08-15 17:53:53 -06:00
dmu_recv.c	Allow unencrypted children of encrypted datasets	2019-06-20 12:29:51 -07:00
dmu_redact.c	Implement Redacted Send/Receive	2019-06-19 09:48:12 -07:00
dmu_send.c	Implement Redacted Send/Receive	2019-06-19 09:48:12 -07:00
dmu_traverse.c	Implement Redacted Send/Receive	2019-06-19 09:48:12 -07:00
dmu_tx.c	Improve performance by using dmu_tx_hold_*_by_dnode()	2019-07-30 09:18:30 -07:00
dmu_zfetch.c	Replace zf_rwlock with a mutex	2019-07-25 11:57:58 -07:00
dmu.c	dmu_tx_wait() hang likely due to cv_signal() in dsl_pool_dirty_delta()	2019-08-15 17:53:53 -06:00
dnode_sync.c	Decrease contention on dn_struct_rwlock	2019-07-08 13:18:50 -07:00
dnode.c	Assert that a dnode's bonuslen never exceeds its recorded size	2019-08-15 08:44:57 -06:00
dsl_bookmark.c	Implement Redacted Send/Receive	2019-06-19 09:48:12 -07:00
dsl_crypt.c	Remove VERIFY from dsl_dataset_crypt_stats()	2019-07-05 16:53:14 -07:00
dsl_dataset.c	Mark dsl_livelist_should_disable() static	2019-08-13 21:16:23 -06:00
dsl_deadlist.c	Fast Clone Deletion	2019-07-26 10:54:14 -07:00
dsl_deleg.c	Update build system and packaging	2018-05-29 16:00:33 -07:00
dsl_destroy.c	Fast Clone Deletion	2019-07-26 10:54:14 -07:00
dsl_dir.c	Fast Clone Deletion	2019-07-26 10:54:14 -07:00
dsl_pool.c	dmu_tx_wait() hang likely due to cv_signal() in dsl_pool_dirty_delta()	2019-08-15 17:53:53 -06:00
dsl_prop.c	Update build system and packaging	2018-05-29 16:00:33 -07:00
dsl_scan.c	Fast Clone Deletion	2019-07-26 10:54:14 -07:00
dsl_synctask.c	OpenZFS 9425 - channel programs can be interrupted	2019-06-22 16:51:46 -07:00
dsl_userhold.c	zfs should optionally send holds	2019-02-15 12:41:38 -08:00
edonr_zfs.c	DLPX-44812 integrate EP-220 large memory scalability	2016-11-29 14:34:27 -08:00
fm.c	Don't wakeup unnecessarily in 'zpool events -f'	2019-08-05 11:35:47 -07:00
gzip.c	Update build system and packaging	2018-05-29 16:00:33 -07:00
hkdf.c	Encryption patch follow-up	2017-10-11 16:54:48 -04:00
lz4.c	Reword comment in lz4_compress_zfs	2019-05-02 16:46:04 -07:00
lzjb.c	Change KM_PUSHPAGE -> KM_SLEEP	2015-01-16 14:41:26 -08:00
Makefile.in	Log Spacemap Project	2019-07-16 10:11:49 -07:00
metaslab.c	Metaslab max_size should be persisted while unloaded	2019-08-05 14:34:27 -07:00
mmp.c	MMP interval and fail_intervals in uberblock	2019-03-21 12:47:57 -07:00
multilist.c	Avoid extra taskq_dispatch() calls by DMU	2019-06-25 12:03:38 -07:00
objlist.c	Implement Redacted Send/Receive	2019-06-19 09:48:12 -07:00
pathname.c	Disable unused pathname::pn_path* (unneeded in Linux)	2019-07-15 13:57:56 -07:00
policy.c	Implement secpolicy_vnode_setid_retain()	2019-07-26 13:52:30 -07:00
qat_compress.c	Code improvement and bug fixes for QAT support	2019-04-16 12:38:36 -07:00
qat_crypt.c	Code improvement and bug fixes for QAT support	2019-04-16 12:38:36 -07:00
qat.c	Code improvement and bug fixes for QAT support	2019-04-16 12:38:36 -07:00
qat.h	Code improvement and bug fixes for QAT support	2019-04-16 12:38:36 -07:00
range_tree.c	Metaslab max_size should be persisted while unloaded	2019-08-05 14:34:27 -07:00
refcount.c	Prevent race in blkptr_verify against device removal	2019-08-13 21:24:43 -06:00
rrwlock.c	8659 static dtrace probes unavailable on non-GPL modules	2019-07-08 11:20:53 -07:00
sa.c	Improve performance by using dmu_tx_hold_*_by_dnode()	2019-07-30 09:18:30 -07:00
sha256.c	SHA256 QAT acceleration	2018-03-15 10:53:58 -07:00
skein_zfs.c	DLPX-44812 integrate EP-220 large memory scalability	2016-11-29 14:34:27 -08:00
spa_boot.c	Add linux kernel module support	2010-08-31 13:41:58 -07:00
spa_checkpoint.c	Get rid of space_map_update() for ms_synced_length	2019-02-12 10:38:11 -08:00
spa_config.c	Remove vn_set_fs_pwd()/vn_set_pwd() (no need to be at / during insmod)	2019-05-29 16:18:14 -07:00
spa_errlog.c	Update build system and packaging	2018-05-29 16:00:33 -07:00
spa_history.c	Fast Clone Deletion	2019-07-26 10:54:14 -07:00
spa_log_spacemap.c	Sort log spacemap tunables in alphabetical order	2019-08-12 09:49:07 -07:00
spa_misc.c	Prevent race in blkptr_verify against device removal	2019-08-13 21:24:43 -06:00
spa_stats.c	Restrict kstats and print real pointers	2019-04-04 18:57:06 -07:00
spa.c	spa_load_verify() may consume too much memory	2019-08-13 08:11:57 -06:00
space_map.c	Log Spacemap Project	2019-07-16 10:11:49 -07:00
space_reftree.c	OpenZFS 7614, 9064 - zfs device evacuation/removal	2018-04-14 12:16:17 -07:00
THIRDPARTYLICENSE.cityhash	OpenZFS 8484 - Implement aggregate sum and use for arc counters	2018-06-06 09:35:59 -07:00
THIRDPARTYLICENSE.cityhash.descrip	OpenZFS 8484 - Implement aggregate sum and use for arc counters	2018-06-06 09:35:59 -07:00
trace.c	8659 static dtrace probes unavailable on non-GPL modules	2019-07-08 11:20:53 -07:00
txg.c	Log Spacemap Project	2019-07-16 10:11:49 -07:00
uberblock.c	MMP interval and fail_intervals in uberblock	2019-03-21 12:47:57 -07:00
unique.c	Performance optimization of AVL tree comparator functions	2016-08-31 14:35:34 -07:00
vdev_cache.c	Update build system and packaging	2018-05-29 16:00:33 -07:00
vdev_disk.c	Revert "Fail early on bio corruption confirmed on 5.2-rc1"	2019-07-05 20:38:56 -07:00
vdev_file.c	Update vdev_ops_t from illumos	2019-06-20 18:29:02 -07:00
vdev_indirect_births.c	Fixes: #8934 Large kmem_alloc	2019-07-10 15:54:49 -07:00
vdev_indirect_mapping.c	Get rid of space_map_update() for ms_synced_length	2019-02-12 10:38:11 -08:00
vdev_indirect.c	Log Spacemap Project	2019-07-16 10:11:49 -07:00
vdev_initialize.c	Add TRIM support	2019-03-29 09:13:20 -07:00
vdev_label.c	panic in removal_remap test on 4K devices	2019-06-13 13:12:39 -07:00
vdev_mirror.c	Update vdev_ops_t from illumos	2019-06-20 18:29:02 -07:00
vdev_missing.c	Update vdev_ops_t from illumos	2019-06-20 18:29:02 -07:00
vdev_queue.c	Move write aggregation memory copy out of vq_lock	2019-06-13 13:08:24 -07:00
vdev_raidz_math_aarch64_neon_common.h	Linux 5.0 compat: ASM_BUG macro	2019-05-08 10:18:40 -07:00
vdev_raidz_math_aarch64_neon.c	Linux 5.0 compat: SIMD compatibility	2019-07-12 09:31:20 -07:00
vdev_raidz_math_aarch64_neonx2.c	Linux 5.0 compat: SIMD compatibility	2019-07-12 09:31:20 -07:00
vdev_raidz_math_avx2.c	Linux 5.0 compat: SIMD compatibility	2019-07-12 09:31:20 -07:00
vdev_raidz_math_avx512bw.c	Linux 5.0 compat: SIMD compatibility	2019-07-12 09:31:20 -07:00
vdev_raidz_math_avx512f.c	Linux 5.0 compat: SIMD compatibility	2019-07-12 09:31:20 -07:00
vdev_raidz_math_impl.h	codebase style improvements for OpenZFS 6459 port	2017-01-22 13:25:40 -08:00
vdev_raidz_math_scalar.c	ABD Vectorized raidz	2016-11-29 14:34:33 -08:00
vdev_raidz_math_sse2.c	Linux 5.0 compat: SIMD compatibility	2019-07-12 09:31:20 -07:00
vdev_raidz_math_ssse3.c	Linux 5.0 compat: SIMD compatibility	2019-07-12 09:31:20 -07:00
vdev_raidz_math.c	Linux 5.0 compat: SIMD compatibility	2019-07-12 09:31:20 -07:00
vdev_raidz.c	Update vdev_ops_t from illumos	2019-06-20 18:29:02 -07:00
vdev_removal.c	Log Spacemap Project	2019-07-16 10:11:49 -07:00
vdev_root.c	Update vdev_ops_t from illumos	2019-06-20 18:29:02 -07:00
vdev_trim.c	Add TRIM support	2019-03-29 09:13:20 -07:00
vdev.c	Log Spacemap Project	2019-07-16 10:11:49 -07:00
zap_leaf.c	Off-by-one in zap_leaf_array_create()	2019-01-18 09:58:46 -08:00
zap_micro.c	fat zap should prefetch when iterating	2019-06-12 13:13:09 -07:00
zap.c	fat zap should prefetch when iterating	2019-06-12 13:13:09 -07:00
zcp_get.c	Fix get_special_prop() build failure	2019-07-16 14:14:12 -07:00
zcp_global.c	OpenZFS 8600 - ZFS channel programs - snapshot	2018-02-08 15:29:24 -08:00
zcp_iter.c	Introduce getting holds and listing bookmarks through ZCP	2019-08-12 10:02:34 -07:00
zcp_synctask.c	OpenZFS 9166 - zfs storage pool checkpoint	2018-06-26 10:07:42 -07:00
zcp.c	OpenZFS 9425 - channel programs can be interrupted	2019-06-22 16:51:46 -07:00
zfeature.c	Consistently captialize GUID for features	2019-04-16 10:01:51 -07:00
zfs_acl.c	Update build system and packaging	2018-05-29 16:00:33 -07:00
zfs_byteswap.c	Update build system and packaging	2018-05-29 16:00:33 -07:00
zfs_ctldir.c	Change boolean-like uint8_t fields in znode_t to boolean_t	2019-08-13 07:58:02 -06:00
zfs_debug.c	Restrict kstats and print real pointers	2019-04-04 18:57:06 -07:00
zfs_dir.c	port async unlinked drain from illumos-nexenta	2019-02-12 10:41:15 -08:00
zfs_fm.c	Add zpool status -s (slow I/Os) and -p (parseable)	2018-11-08 16:47:24 -08:00
zfs_fuid.c	Update build system and packaging	2018-05-29 16:00:33 -07:00
zfs_ioctl.c	Don't directly cast unsigned long to void*	2019-07-25 11:59:20 -07:00
zfs_log.c	Improve write performance by using dmu_read_by_dnode()	2019-08-15 17:36:24 -06:00
zfs_onexit.c	Update build system and packaging	2018-05-29 16:00:33 -07:00
zfs_ratelimit.c	Change checksum & IO delay ratelimit values	2018-03-04 17:34:51 -08:00
zfs_replay.c	Use SEEK_{SET,CUR,END} for file seek "whence"	2019-04-25 10:17:27 -07:00
zfs_rlock.c	OpenZFS 9689 - zfs range lock code should not be zpl-specific	2018-10-11 10:19:33 -07:00
zfs_sa.c	Project Quota on ZFS	2018-02-13 14:54:54 -08:00
zfs_sysfs.c	Prevent pointer to an out-of-scope local variable	2019-06-20 18:31:52 -07:00
zfs_vfsops.c	Make txg_wait_synced conditional in zfsvfs_teardown	2019-08-15 08:27:13 -06:00
zfs_vnops.c	Fix out-of-order ZIL txtype lost on hardlinked files	2019-08-13 21:21:27 -06:00
zfs_znode.c	Change boolean-like uint8_t fields in znode_t to boolean_t	2019-08-13 07:58:02 -06:00
zil.c	Fix out-of-order ZIL txtype lost on hardlinked files	2019-08-13 21:21:27 -06:00
zio_checksum.c	Undo c89 workarounds to match with upstream	2017-11-04 13:25:13 -07:00
zio_compress.c	OpenZFS 9403 - assertion failed in arc_buf_destroy()	2018-08-29 11:33:33 -07:00
zio_crypt.c	Always call rw_init in zio_crypt_key_unwrap	2019-04-10 15:39:40 -07:00
zio_inject.c	Multiple DVA Scrubbing Fix	2019-03-15 14:14:31 -07:00
zio.c	Prevent race in blkptr_verify against device removal	2019-08-13 21:24:43 -06:00
zle.c	Fix zle_decompress out of bound access	2018-02-09 10:08:05 -08:00
zpl_ctldir.c	RHEL 7.5 compat: FMODE_KABI_ITERATE	2018-05-02 15:01:24 -07:00
zpl_export.c	Use cstyle -cpP in `make cstyle` check	2016-12-12 10:46:26 -08:00
zpl_file.c	Fix errant EFAULT during writes (#8719 )	2019-05-08 10:04:04 -07:00
zpl_inode.c	Fix errant EFAULT during writes (#8719 )	2019-05-08 10:04:04 -07:00
zpl_super.c	Fix statfs(2) for 32-bit user space	2018-09-24 17:11:25 -07:00
zpl_xattr.c	Drop redundant POSIX ACL check in zpl_init_acl()	2019-07-15 16:26:52 -07:00
zrlock.c	Update build system and packaging	2018-05-29 16:00:33 -07:00
zthr.c	Fast Clone Deletion	2019-07-26 10:54:14 -07:00
zvol.c	Add SCSI_PASSTHROUGH to zvols to enable UNMAP support	2019-06-21 09:40:56 -07:00