mirror_zfs/module/zfs
Serapheim Dimitropoulos e48afbc4eb OpenZFS 9464 - txg_kick() fails to see that we are quiescing
txg_kick() fails to see that we are quiescing, forcing transactions to
their next stages without leaving them accumulate changes

Creating a fragmented pool in a DCenter VM and continuously writing to it with
multiple instances of randwritecomp, we get the following output from txg.d:

    0ms   311MB in  4114ms (95% p1)  75MB/s  544MB (76%)  336us   153ms     0ms
    0ms     8MB in    51ms ( 0% p1) 163MB/s  474MB (66%)  129us    34ms     0ms
    0ms   366MB in  4454ms (93% p1)  82MB/s  572MB (79%)  498us    20ms     0ms
    0ms   406MB in  5212ms (95% p1)  77MB/s  591MB (82%)  661us    37ms     0ms
    0ms   340MB in  5110ms (94% p1)  66MB/s  622MB (86%) 1048us    41ms     1ms
    0ms     3MB in    61ms ( 0% p1)  51MB/s  419MB (58%)   33us     0ms     0ms
    0ms   361MB in  3555ms (88% p1) 101MB/s  542MB (75%)  335us    40ms     0ms
    0ms   356MB in  4592ms (92% p1)  77MB/s  561MB (78%)  430us    89ms     1ms
    0ms    11MB in   129ms (13% p1)  90MB/s  507MB (70%)  222us    15ms     0ms
    0ms   281MB in  2520ms (89% p1) 111MB/s  542MB (75%)  334us    42ms     0ms
    0ms   383MB in  3666ms (91% p1) 104MB/s  557MB (77%)  411us   133ms     0ms
    0ms   404MB in  5757ms (94% p1)  70MB/s  635MB (88%) 1274us   123ms     2ms
    4ms   367MB in  4172ms (89% p1)  88MB/s  556MB (77%)  401us    51ms     0ms
    0ms    42MB in   470ms (44% p1)  90MB/s  557MB (77%)  412us    43ms     0ms
    0ms   261MB in  2273ms (88% p1) 114MB/s  556MB (77%)  407us    27ms     0ms
    0ms   394MB in  3646ms (85% p1) 108MB/s  552MB (77%)  393us   304ms     0ms
    0ms   275MB in  2416ms (89% p1) 113MB/s  510MB (71%)  200us    53ms     0ms
    0ms     9MB in    53ms ( 0% p1) 169MB/s  483MB (67%)  140us   100ms     1ms

The TXGs that are getting synced and don't have lots of changes are pushed by
txg_kick() which basically forces the current open txg to get to the quiesced
state:

        if (tx->tx_syncing_txg == 0 &&
        tx->tx_quiesce_txg_waiting <= tx->tx_open_txg &&
        tx->tx_sync_txg_waiting <= tx->tx_synced_txg &&
        tx->tx_quiesced_txg <= tx->tx_synced_txg) {
        tx->tx_quiesce_txg_waiting = tx->tx_open_txg + 1;
        cv_broadcast(&tx->tx_quiesce_more_cv);
    }

The problem is that the above code doesn't check if we are currently quiescing
anything (only if a quiesce or a sync has been requested, ..etc) so the
following scenario can happen:

1] We have an open txg A that had enough dirty data (more than
   zfs_dirty_data_sync) and it was pushed to the quiesced state, and opened
   a new txg B. No txg is currently being synced.
2] Immediately after the opening of B, txg_kick() was run by some other write
   (and because of A's dirty data) and saw that we are not currently syncing
   any txg and no one has requested quiescing so it requests one by bumping
   tx_quiesce_txg_waiting and broadcasts the quiesce thread.
3] The quiesce thread just passed txg A to be synced and sees that a quiescing
   request has been sent to it so it immediately grabs B without letting it
   gather enough data, putting it in a quiesced state and opening a new txg C.

In this scenario txg B, is an example of how the entries of interest show up in
the txg.d output.

Ideally we would like txg_kick() to get triggered only when we are sure that
we are not syncing AND not quiescing any txg. This way we can kick an open TXG
to the quiescing state when we are sure that there is nothing going on and we
would benefit from the different states running concurrently.

Authored by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed by: Matt Ahrens <matt@delphix.com>
Reviewed by: Brad Lewis <brad.lewis@delphix.com>
Reviewed by: Andriy Gapon <avg@FreeBSD.org>
Approved by: Dan McDonald <danmcd@joyent.com>
Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>

OpenZFS-issue: https://illumos.org/issues/9464
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/1cd7635b
Closes #7587
2018-06-04 14:56:06 -07:00
..
abd.c Update build system and packaging 2018-05-29 16:00:33 -07:00
arc.c Update build system and packaging 2018-05-29 16:00:33 -07:00
blkptr.c Undo c89 workarounds to match with upstream 2017-11-04 13:25:13 -07:00
bplist.c Change KM_PUSHPAGE -> KM_SLEEP 2015-01-16 14:41:26 -08:00
bpobj.c OpenZFS 7614, 9064 - zfs device evacuation/removal 2018-04-14 12:16:17 -07:00
bptree.c Native Encryption for ZFS on Linux 2017-08-14 10:36:48 -07:00
bqueue.c Call cv_signal() with mutex held 2017-06-26 14:36:49 -07:00
dbuf_stats.c Update build system and packaging 2018-05-29 16:00:33 -07:00
dbuf.c Update build system and packaging 2018-05-29 16:00:33 -07:00
ddt_zap.c Update build system and packaging 2018-05-29 16:00:33 -07:00
ddt.c Update build system and packaging 2018-05-29 16:00:33 -07:00
dmu_diff.c Fix issues found with zfs diff 2018-05-01 11:24:20 -07:00
dmu_object.c OpenZFS 9329 - panic in zap_leaf_lookup() due to concurrent zapification 2018-05-31 10:53:49 -07:00
dmu_objset.c Update build system and packaging 2018-05-29 16:00:33 -07:00
dmu_send.c OpenZFS 9256 - zfs send space estimation off by > 10% on some datasets 2018-05-08 08:59:24 -07:00
dmu_traverse.c Update build system and packaging 2018-05-29 16:00:33 -07:00
dmu_tx.c OpenZFS 9464 - txg_kick() fails to see that we are quiescing 2018-06-04 14:56:06 -07:00
dmu_zfetch.c Update build system and packaging 2018-05-29 16:00:33 -07:00
dmu.c Update build system and packaging 2018-05-29 16:00:33 -07:00
dnode_sync.c Raw sends must be able to decrease nlevels 2018-02-02 11:43:11 -08:00
dnode.c Fix object reclaim when using large dnodes 2018-04-17 11:13:57 -07:00
dsl_bookmark.c Undo c89 workarounds to match with upstream 2017-11-04 13:25:13 -07:00
dsl_crypt.c Add support for decryption faults in zinject 2018-05-02 15:36:20 -07:00
dsl_dataset.c Update build system and packaging 2018-05-29 16:00:33 -07:00
dsl_deadlist.c OpenZFS 7614, 9064 - zfs device evacuation/removal 2018-04-14 12:16:17 -07:00
dsl_deleg.c Update build system and packaging 2018-05-29 16:00:33 -07:00
dsl_destroy.c Update build system and packaging 2018-05-29 16:00:33 -07:00
dsl_dir.c Update build system and packaging 2018-05-29 16:00:33 -07:00
dsl_pool.c Update build system and packaging 2018-05-29 16:00:33 -07:00
dsl_prop.c Update build system and packaging 2018-05-29 16:00:33 -07:00
dsl_scan.c Update build system and packaging 2018-05-29 16:00:33 -07:00
dsl_synctask.c Update build system and packaging 2018-05-29 16:00:33 -07:00
dsl_userhold.c Undo c89 workarounds to match with upstream 2017-11-04 13:25:13 -07:00
edonr_zfs.c DLPX-44812 integrate EP-220 large memory scalability 2016-11-29 14:34:27 -08:00
fm.c Update build system and packaging 2018-05-29 16:00:33 -07:00
gzip.c Update build system and packaging 2018-05-29 16:00:33 -07:00
hkdf.c Encryption patch follow-up 2017-10-11 16:54:48 -04:00
lz4.c Fix LZ4_uncompress_unknownOutputSize caused panic 2017-05-19 13:45:46 -07:00
lzjb.c Change KM_PUSHPAGE -> KM_SLEEP 2015-01-16 14:41:26 -08:00
Makefile.in OpenZFS 9079 - race condition in starting and ending condensing thread for indirect vdevs 2018-04-14 12:23:53 -07:00
metaslab.c Update build system and packaging 2018-05-29 16:00:33 -07:00
mmp.c Update build system and packaging 2018-05-29 16:00:33 -07:00
multilist.c Update build system and packaging 2018-05-29 16:00:33 -07:00
pathname.c Update build system and packaging 2018-05-29 16:00:33 -07:00
policy.c Take user namespaces into account in policy checks 2018-03-07 15:40:42 -08:00
qat_compress.c Fix inst_num overflow in qat_crypt.c 2018-05-01 20:44:24 -07:00
qat_crypt.c Fix inst_num overflow in qat_crypt.c 2018-05-01 20:44:24 -07:00
qat.c SHA256 QAT acceleration 2018-03-15 10:53:58 -07:00
qat.h Resolve QAT issues with incompressible data 2018-03-29 17:40:34 -07:00
range_tree.c OpenZFS 9486 - reduce memory used by device removal on fragmented pools 2018-05-24 10:18:07 -07:00
refcount.c Linux 4.11 compat: avoid refcount_t name conflict 2017-02-28 16:10:18 -08:00
rrwlock.c Fix spelling 2017-01-03 11:31:18 -06:00
sa.c Update build system and packaging 2018-05-29 16:00:33 -07:00
sha256.c SHA256 QAT acceleration 2018-03-15 10:53:58 -07:00
skein_zfs.c DLPX-44812 integrate EP-220 large memory scalability 2016-11-29 14:34:27 -08:00
spa_boot.c Add linux kernel module support 2010-08-31 13:41:58 -07:00
spa_config.c Update build system and packaging 2018-05-29 16:00:33 -07:00
spa_errlog.c Update build system and packaging 2018-05-29 16:00:33 -07:00
spa_history.c Update build system and packaging 2018-05-29 16:00:33 -07:00
spa_misc.c Update build system and packaging 2018-05-29 16:00:33 -07:00
spa_stats.c Update build system and packaging 2018-05-29 16:00:33 -07:00
spa.c OpenZFS 9235 - rename zpool_rewind_policy_t to zpool_load_policy_t 2018-06-04 14:54:20 -07:00
space_map.c OpenZFS 7614, 9064 - zfs device evacuation/removal 2018-04-14 12:16:17 -07:00
space_reftree.c OpenZFS 7614, 9064 - zfs device evacuation/removal 2018-04-14 12:16:17 -07:00
trace.c OpenZFS 7614, 9064 - zfs device evacuation/removal 2018-04-14 12:16:17 -07:00
txg.c OpenZFS 9464 - txg_kick() fails to see that we are quiescing 2018-06-04 14:56:06 -07:00
uberblock.c Multi-modifier protection (MMP) 2017-07-13 13:54:00 -04:00
unique.c Performance optimization of AVL tree comparator functions 2016-08-31 14:35:34 -07:00
vdev_cache.c Update build system and packaging 2018-05-29 16:00:33 -07:00
vdev_disk.c zpool reopen should detect expanded devices 2018-05-31 10:36:37 -07:00
vdev_file.c OpenZFS 7614, 9064 - zfs device evacuation/removal 2018-04-14 12:16:17 -07:00
vdev_indirect_births.c Update build system and packaging 2018-05-29 16:00:33 -07:00
vdev_indirect_mapping.c Update build system and packaging 2018-05-29 16:00:33 -07:00
vdev_indirect.c Update build system and packaging 2018-05-29 16:00:33 -07:00
vdev_label.c OpenZFS 9486 - reduce memory used by device removal on fragmented pools 2018-05-24 10:18:07 -07:00
vdev_mirror.c Update build system and packaging 2018-05-29 16:00:33 -07:00
vdev_missing.c OpenZFS 7614, 9064 - zfs device evacuation/removal 2018-04-14 12:16:17 -07:00
vdev_queue.c Fix zio->io_priority failed (7 < 6) assert 2018-05-29 18:13:48 -07:00
vdev_raidz_math_aarch64_neon_common.h ABD raidz NEON support 2016-11-29 14:34:33 -08:00
vdev_raidz_math_aarch64_neon.c codebase style improvements for OpenZFS 6459 port 2017-01-22 13:25:40 -08:00
vdev_raidz_math_aarch64_neonx2.c ABD raidz NEON support 2016-11-29 14:34:33 -08:00
vdev_raidz_math_avx2.c ABD raidz avx512f support 2016-11-29 14:34:33 -08:00
vdev_raidz_math_avx512bw.c ABD: Adapt avx512bw raidz assembly 2016-12-15 17:31:33 -08:00
vdev_raidz_math_avx512f.c Use cstyle -cpP in make cstyle check 2016-12-12 10:46:26 -08:00
vdev_raidz_math_impl.h codebase style improvements for OpenZFS 6459 port 2017-01-22 13:25:40 -08:00
vdev_raidz_math_scalar.c ABD Vectorized raidz 2016-11-29 14:34:33 -08:00
vdev_raidz_math_sse2.c ABD raidz avx512f support 2016-11-29 14:34:33 -08:00
vdev_raidz_math_ssse3.c codebase style improvements for OpenZFS 6459 port 2017-01-22 13:25:40 -08:00
vdev_raidz_math.c Update build system and packaging 2018-05-29 16:00:33 -07:00
vdev_raidz.c OpenZFS 7614, 9064 - zfs device evacuation/removal 2018-04-14 12:16:17 -07:00
vdev_removal.c Update build system and packaging 2018-05-29 16:00:33 -07:00
vdev_root.c OpenZFS 9075 - Improve ZFS pool import/load process and corrupted pool recovery 2018-05-08 21:35:27 -07:00
vdev.c zpool reopen should detect expanded devices 2018-05-31 10:36:37 -07:00
zap_leaf.c OpenZFS 9328 - zap code can take advantage of c99 2018-05-31 10:53:11 -07:00
zap_micro.c OpenZFS 9329 - panic in zap_leaf_lookup() due to concurrent zapification 2018-05-31 10:53:49 -07:00
zap.c OpenZFS 9328 - zap code can take advantage of c99 2018-05-31 10:53:11 -07:00
zcp_get.c Fix coverity defects: zfs channel programs 2018-02-20 11:19:42 -08:00
zcp_global.c OpenZFS 8600 - ZFS channel programs - snapshot 2018-02-08 15:29:24 -08:00
zcp_iter.c OpenZFS 7431 - ZFS Channel Programs 2018-02-08 15:28:18 -08:00
zcp_synctask.c Fix coverity defects: zfs channel programs 2018-02-20 11:19:42 -08:00
zcp.c Update build system and packaging 2018-05-29 16:00:33 -07:00
zfeature.c Undo c89 workarounds to match with upstream 2017-11-04 13:25:13 -07:00
zfs_acl.c Update build system and packaging 2018-05-29 16:00:33 -07:00
zfs_byteswap.c Update build system and packaging 2018-05-29 16:00:33 -07:00
zfs_ctldir.c Update build system and packaging 2018-05-29 16:00:33 -07:00
zfs_debug.c enable zfs_dbgmsg() by default, without dprintf() 2018-03-21 15:37:32 -07:00
zfs_dir.c Update build system and packaging 2018-05-29 16:00:33 -07:00
zfs_fm.c Update build system and packaging 2018-05-29 16:00:33 -07:00
zfs_fuid.c Update build system and packaging 2018-05-29 16:00:33 -07:00
zfs_ioctl.c Update build system and packaging 2018-05-29 16:00:33 -07:00
zfs_log.c Update build system and packaging 2018-05-29 16:00:33 -07:00
zfs_onexit.c Update build system and packaging 2018-05-29 16:00:33 -07:00
zfs_ratelimit.c Change checksum & IO delay ratelimit values 2018-03-04 17:34:51 -08:00
zfs_replay.c Update build system and packaging 2018-05-29 16:00:33 -07:00
zfs_rlock.c Update build system and packaging 2018-05-29 16:00:33 -07:00
zfs_sa.c Project Quota on ZFS 2018-02-13 14:54:54 -08:00
zfs_vfsops.c Update build system and packaging 2018-05-29 16:00:33 -07:00
zfs_vnops.c Update build system and packaging 2018-05-29 16:00:33 -07:00
zfs_znode.c Update build system and packaging 2018-05-29 16:00:33 -07:00
zil.c Update build system and packaging 2018-05-29 16:00:33 -07:00
zio_checksum.c Undo c89 workarounds to match with upstream 2017-11-04 13:25:13 -07:00
zio_compress.c Update build system and packaging 2018-05-29 16:00:33 -07:00
zio_crypt.c Update build system and packaging 2018-05-29 16:00:33 -07:00
zio_inject.c Update build system and packaging 2018-05-29 16:00:33 -07:00
zio.c Update build system and packaging 2018-05-29 16:00:33 -07:00
zle.c Fix zle_decompress out of bound access 2018-02-09 10:08:05 -08:00
zpl_ctldir.c RHEL 7.5 compat: FMODE_KABI_ITERATE 2018-05-02 15:01:24 -07:00
zpl_export.c Use cstyle -cpP in make cstyle check 2016-12-12 10:46:26 -08:00
zpl_file.c Update build system and packaging 2018-05-29 16:00:33 -07:00
zpl_inode.c Linux 4.12 compat: CURRENT_TIME removed 2017-05-10 09:30:48 -07:00
zpl_super.c Allow mounting datasets more than once 2018-04-13 10:44:05 -07:00
zpl_xattr.c Update build system and packaging 2018-05-29 16:00:33 -07:00
zrlock.c Update build system and packaging 2018-05-29 16:00:33 -07:00
zthr.c Avoid Linux hung task message in ZTHR 2018-04-15 15:12:28 -07:00
zvol.c Linux compat 4.16: blk_queue_flag_{set,clear} 2018-04-10 10:32:14 -07:00