mirror_zfs/module/zfs
Alexander Motin bd27b75401
ZIL: Relax parallel write ZIOs processing
ZIL introduced dependencies between its write ZIOs to permit flush
deferral, where vdev caches are flushed only once all the write ZIOs
have completed.  But it was recently spotted that this serializes not
only the handling of ZIO completions, but also their ready stage.  It
means the ZIO pipeline can't calculate checksums for subsequent ZIOs
until all the previous ones are checksummed, even though that is not
required.  On systems where the memory throughput of a single CPU
core is limited, this creates a single-core CPU bottleneck, which is
difficult to spot because the ZIO pipeline spreads its work across
many taskqueue threads.

While it would be great to bypass the ready stage waits, it would
require changes to the ZIO code, and I haven't found a clean way to
do it.  But I've noticed that we don't need any dependency between
the write ZIOs if the previous one has waiters, which means it won't
defer any flushes and will work as a barrier for the earlier ones.
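The dependency rule above can be sketched in C as follows.  This is a
hypothetical, simplified model, not the actual zil.c code: the names
lwb_model_t, needs_dependency() and issue_write() are illustrative
assumptions, and the relaxed flag only contrasts the old always-chain
behavior with the relaxed one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a ZIL log write block. */
typedef struct lwb_model {
	bool has_waiters;              /* e.g. an fsync() waits on it */
	struct lwb_model *depends_on;  /* previous write we are chained to */
} lwb_model_t;

/*
 * Decide whether a new write ZIO must depend on the previous one.
 * Old behavior (relaxed == false): always chain to the predecessor so
 * cache flushes can be deferred.  Relaxed behavior: a predecessor with
 * waiters won't defer its flush and already acts as a barrier for the
 * earlier writes, so no dependency is needed.
 */
static bool
needs_dependency(const lwb_model_t *prev, bool relaxed)
{
	if (prev == NULL)
		return (false);
	if (relaxed && prev->has_waiters)
		return (false);
	return (true);
}

/* Record the (possibly absent) dependency of cur on prev. */
static void
issue_write(lwb_model_t *prev, lwb_model_t *cur, bool relaxed)
{
	assert(cur != NULL);
	cur->depends_on = needs_dependency(prev, relaxed) ? prev : NULL;
}
```

In this model a write issued after one that has waiters gets no
dependency and can be checksummed in parallel, while a write issued
after one without waiters is still chained to it.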

Bypassing it won't help large single-threaded writes, since in that
case all the write ZIOs except the last one have no waiters, and so
remain dependent.  But in that case ZIO processing is less likely to
be the bottleneck: only one thread populates the write buffers, and
that thread will likely be the bottleneck itself.

But bypassing the ZIO dependency on multi-threaded write workloads
really allows them to scale beyond the checksumming throughput of
one CPU core.
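To illustrate why single-threaded writes stay serialized while
multi-threaded ones do not, the hypothetical C sketch below (not taken
from zil.c; longest_serial_chain() is an illustrative name) computes
the longest run of consecutive writes that remain chained under the
relaxed rule, given which writes have waiters.  It assumes, as the
text does, that in a single-threaded stream only the last write has
waiters, while in a multi-threaded workload each write typically does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Under the relaxed rule, write i depends on write i - 1 only if
 * write i - 1 has no waiters.  Return the length of the longest run
 * of writes whose ready stages must be processed one after another.
 */
static size_t
longest_serial_chain(const bool *has_waiters, size_t n)
{
	size_t longest = 0, cur = 0;

	assert(has_waiters != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		/* The chain continues only past a write without waiters. */
		if (i > 0 && !has_waiters[i - 1])
			cur++;
		else
			cur = 1;
		if (cur > longest)
			longest = cur;
	}
	return (longest);
}
```

For four writes with waiters only on the last one the function returns
4 (fully serialized, as before the change); with waiters on every
write it returns 1, so each write's checksum can be computed in
parallel.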

My tests writing 12 files on the same dataset from 12 threads with
1MB blocks, on a pool with 4 striped NVMe SLOGs and a Xeon Silver
4114 CPU, show a total throughput increase from 4.3GB/s to 8.5GB/s,
raising SLOG busy time from ~30% to ~70%.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #17458
2025-06-14 09:37:18 -04:00
abd.c Export correct symbols for Lustre Direct I/O 2025-04-24 13:55:21 -04:00
aggsum.c SPDX: license tags: CDDL-1.0 2025-03-13 17:56:27 -07:00
arc.c tunables: ensure tunable and variable have same define gate 2025-05-28 16:50:22 -07:00
blake3_zfs.c SPDX: license tags: CDDL-1.0 2025-03-13 17:56:27 -07:00
blkptr.c SPDX: license tags: CDDL-1.0 2025-03-13 17:56:27 -07:00
bplist.c SPDX: license tags: CDDL-1.0 2025-03-13 17:56:27 -07:00
bpobj.c SPDX: license tags: CDDL-1.0 2025-03-13 17:56:27 -07:00
bptree.c SPDX: license tags: CDDL-1.0 2025-03-13 17:56:27 -07:00
bqueue.c SPDX: license tags: CDDL-1.0 2025-03-13 17:56:27 -07:00
brt.c SPDX: license tags: CDDL-1.0 2025-03-13 17:56:27 -07:00
btree.c SPDX: license tags: CDDL-1.0 2025-03-13 17:56:27 -07:00
dataset_kstats.c SPDX: license tags: CDDL-1.0 2025-03-13 17:56:27 -07:00
dbuf_stats.c SPDX: license tags: CDDL-1.0 2025-03-13 17:56:27 -07:00
dbuf.c Improve block cloning transactions accounting 2025-06-11 11:59:16 -07:00
ddt_log.c SPDX: license tags: CDDL-1.0 2025-03-13 17:56:27 -07:00
ddt_stats.c SPDX: license tags: CDDL-1.0 2025-03-13 17:56:27 -07:00
ddt_zap.c SPDX: license tags: CDDL-1.0 2025-03-13 17:56:27 -07:00
ddt.c Improve allocation fallback handling 2025-05-31 19:12:16 -04:00
dmu_diff.c SPDX: license tags: CDDL-1.0 2025-03-13 17:56:27 -07:00
dmu_direct.c Wire O_DIRECT also to Uncached I/O (#17218) 2025-05-13 14:26:55 -07:00
dmu_object.c SPDX: license tags: CDDL-1.0 2025-03-13 17:56:27 -07:00
dmu_objset.c dsl_dataset: rename dmu_objset_clone* to dsl_dataset_clone* 2025-06-10 14:52:43 -07:00
dmu_recv.c Wire O_DIRECT also to Uncached I/O (#17218) 2025-05-13 14:26:55 -07:00
dmu_redact.c dmu_tx_assign: make all VERIFY0 calls use DMU_TX_SUSPEND 2025-05-28 10:28:59 -07:00
dmu_send.c Fix 2 bugs in non-raw send with encryption 2025-05-19 09:55:00 -07:00
dmu_traverse.c dmu_traverse: remove 'ignore_hole_birth' tunable alias 2025-05-27 15:05:09 -07:00
dmu_tx.c Improve block cloning transactions accounting 2025-06-11 11:59:16 -07:00
dmu_zfetch.c Wire O_DIRECT also to Uncached I/O (#17218) 2025-05-13 14:26:55 -07:00
dmu.c Reduce zfs_dmu_offset_next_sync penalty 2025-06-11 11:50:49 -07:00
dnode_sync.c Wire O_DIRECT also to Uncached I/O (#17218) 2025-05-13 14:26:55 -07:00
dnode.c Polish db_rwlock scope 2025-06-11 11:13:48 -07:00
dsl_bookmark.c SPDX: license tags: CDDL-1.0 2025-03-13 17:56:27 -07:00
dsl_crypt.c SPDX: license tags: CDDL-1.0 2025-03-13 17:56:27 -07:00
dsl_dataset.c dsl_dataset: rename dmu_objset_clone* to dsl_dataset_clone* 2025-06-10 14:52:43 -07:00
dsl_deadlist.c SPDX: license tags: CDDL-1.0 2025-03-13 17:56:27 -07:00
dsl_deleg.c SPDX: license tags: CDDL-1.0 2025-03-13 17:56:27 -07:00
dsl_destroy.c SPDX: license tags: CDDL-1.0 2025-03-13 17:56:27 -07:00
dsl_dir.c cred: properly pass and test creds on other threads (#17273) 2025-04-29 16:27:48 -07:00
dsl_pool.c SPDX: license tags: CDDL-1.0 2025-03-13 17:56:27 -07:00
dsl_prop.c SPDX: license tags: CDDL-1.0 2025-03-13 17:56:27 -07:00
dsl_scan.c scrub: generate scrub_finish event 2025-06-06 22:43:10 -04:00
dsl_synctask.c dmu_tx_assign: make all VERIFY0 calls use DMU_TX_SUSPEND 2025-05-28 10:28:59 -07:00
dsl_userhold.c SPDX: license tags: CDDL-1.0 2025-03-13 17:56:27 -07:00
edonr_zfs.c SPDX: license tags: CDDL-1.0 2025-03-13 17:56:27 -07:00
fm.c tunables: ensure tunable and variable have same define gate 2025-05-28 16:50:22 -07:00
gzip.c SPDX: license tags: CDDL-1.0 2025-03-13 17:56:27 -07:00
hkdf.c SPDX: license tags: CDDL-1.0 2025-03-13 17:56:27 -07:00
lz4_zfs.c SPDX: license tags: BSD-2-Clause 2025-03-13 17:56:46 -07:00
lz4.c SPDX: license tags: BSD-2-Clause 2025-03-13 17:56:46 -07:00
lzjb.c SPDX: license tags: CDDL-1.0 2025-03-13 17:56:27 -07:00
metaslab.c Include class name into struct metaslab_class 2025-06-03 11:12:59 -04:00
mmp.c SPDX: license tags: CDDL-1.0 2025-03-13 17:56:27 -07:00
multilist.c SPDX: license tags: CDDL-1.0 2025-03-13 17:56:27 -07:00
objlist.c SPDX: license tags: CDDL-1.0 2025-03-13 17:56:27 -07:00
pathname.c SPDX: license tags: CDDL-1.0 2025-03-13 17:56:27 -07:00
range_tree.c Fix off-by-one bug in range tree code 2025-05-23 10:33:33 -04:00
refcount.c Implement allocation size ranges and use for gang leaves (#17111) 2025-05-02 15:32:18 -07:00
rrwlock.c SPDX: license tags: CDDL-1.0 2025-03-13 17:56:27 -07:00
sa.c Wire O_DIRECT also to Uncached I/O (#17218) 2025-05-13 14:26:55 -07:00
sha2_zfs.c SPDX: license tags: CDDL-1.0 2025-03-13 17:56:27 -07:00
skein_zfs.c SPDX: license tags: CDDL-1.0 2025-03-13 17:56:27 -07:00
spa_checkpoint.c SPDX: license tags: CDDL-1.0 2025-03-13 17:56:27 -07:00
spa_config.c tunables: ensure tunable and variable have same define gate 2025-05-28 16:50:22 -07:00
spa_errlog.c SPDX: license tags: CDDL-1.0 2025-03-13 17:56:27 -07:00
spa_history.c dmu_tx: rename dmu_tx_assign() flags from TXG_* to DMU_TX_* (#17143) 2025-03-18 16:04:22 -07:00
spa_log_spacemap.c SPDX: license tags: CDDL-1.0 2025-03-13 17:56:27 -07:00
spa_misc.c Improve allocation fallback handling 2025-05-31 19:12:16 -04:00
spa_stats.c Wire O_DIRECT also to Uncached I/O (#17218) 2025-05-13 14:26:55 -07:00
spa.c Include class name into struct metaslab_class 2025-06-03 11:12:59 -04:00
space_map.c SPDX: license tags: CDDL-1.0 2025-03-13 17:56:27 -07:00
space_reftree.c SPDX: license tags: CDDL-1.0 2025-03-13 17:56:27 -07:00
THIRDPARTYLICENSE.cityhash OpenZFS 8484 - Implement aggregate sum and use for arc counters 2018-06-06 09:35:59 -07:00
THIRDPARTYLICENSE.cityhash.descrip OpenZFS 8484 - Implement aggregate sum and use for arc counters 2018-06-06 09:35:59 -07:00
txg.c txg_wait_synced_flags: add TXG_WAIT_SUSPEND flag to not wait if pool suspended 2025-05-28 10:27:46 -07:00
uberblock.c SPDX: license tags: CDDL-1.0 2025-03-13 17:56:27 -07:00
unique.c SPDX: license tags: CDDL-1.0 2025-03-13 17:56:27 -07:00
vdev_draid_rand.c SPDX: license tags: LicenseRef-OpenZFS-ThirdParty-PublicDomain 2025-03-13 17:57:31 -07:00
vdev_draid.c Implement allocation size ranges and use for gang leaves (#17111) 2025-05-02 15:32:18 -07:00
vdev_file.c Implement allocation size ranges and use for gang leaves (#17111) 2025-05-02 15:32:18 -07:00
vdev_indirect_births.c SPDX: license tags: CDDL-1.0 2025-03-13 17:56:27 -07:00
vdev_indirect_mapping.c SPDX: license tags: CDDL-1.0 2025-03-13 17:56:27 -07:00
vdev_indirect.c dmu_tx_assign: make all VERIFY0 calls use DMU_TX_SUSPEND 2025-05-28 10:28:59 -07:00
vdev_initialize.c dmu_tx_assign: make all VERIFY0 calls use DMU_TX_SUSPEND 2025-05-28 10:28:59 -07:00
vdev_label.c SPDX: license tags: CDDL-1.0 2025-03-13 17:56:27 -07:00
vdev_mirror.c Implement allocation size ranges and use for gang leaves (#17111) 2025-05-02 15:32:18 -07:00
vdev_missing.c Implement allocation size ranges and use for gang leaves (#17111) 2025-05-02 15:32:18 -07:00
vdev_queue.c Unified allocation throttling (#17020) 2025-03-24 09:25:01 -07:00
vdev_raidz_math_aarch64_neon_common.h SPDX: license tags: CDDL-1.0 2025-03-13 17:56:27 -07:00
vdev_raidz_math_aarch64_neon.c SPDX: license tags: CDDL-1.0 2025-03-13 17:56:27 -07:00
vdev_raidz_math_aarch64_neonx2.c SPDX: license tags: CDDL-1.0 2025-03-13 17:56:27 -07:00
vdev_raidz_math_avx2.c SPDX: license tags: CDDL-1.0 2025-03-13 17:56:27 -07:00
vdev_raidz_math_avx512bw.c SPDX: license tags: CDDL-1.0 2025-03-13 17:56:27 -07:00
vdev_raidz_math_avx512f.c SPDX: license tags: CDDL-1.0 2025-03-13 17:56:27 -07:00
vdev_raidz_math_impl.h SPDX: license tags: CDDL-1.0 2025-03-13 17:56:27 -07:00
vdev_raidz_math_powerpc_altivec_common.h SPDX: license tags: CDDL-1.0 2025-03-13 17:56:27 -07:00
vdev_raidz_math_powerpc_altivec.c SPDX: license tags: CDDL-1.0 2025-03-13 17:56:27 -07:00
vdev_raidz_math_scalar.c SPDX: license tags: CDDL-1.0 2025-03-13 17:56:27 -07:00
vdev_raidz_math_sse2.c SPDX: license tags: CDDL-1.0 2025-03-13 17:56:27 -07:00
vdev_raidz_math_ssse3.c SPDX: license tags: CDDL-1.0 2025-03-13 17:56:27 -07:00
vdev_raidz_math.c tunables: don't assert initialisation in impl getters 2025-05-28 16:50:22 -07:00
vdev_raidz.c dmu_tx_assign: make all VERIFY0 calls use DMU_TX_SUSPEND 2025-05-28 10:28:59 -07:00
vdev_rebuild.c dmu_tx_assign: make all VERIFY0 calls use DMU_TX_SUSPEND 2025-05-28 10:28:59 -07:00
vdev_removal.c dmu_tx_assign: make all VERIFY0 calls use DMU_TX_SUSPEND 2025-05-28 10:28:59 -07:00
vdev_root.c Implement allocation size ranges and use for gang leaves (#17111) 2025-05-02 15:32:18 -07:00
vdev_trim.c dmu_tx_assign: make all VERIFY0 calls use DMU_TX_SUSPEND 2025-05-28 10:28:59 -07:00
vdev.c vdev: skip faulting disks pending removal 2025-05-30 09:14:37 -07:00
zap_leaf.c SPDX: license tags: CDDL-1.0 2025-03-13 17:56:27 -07:00
zap_micro.c SPDX: license tags: CDDL-1.0 2025-03-13 17:56:27 -07:00
zap.c SPDX: license tags: CDDL-1.0 2025-03-13 17:56:27 -07:00
zcp_get.c zcp: get_prop: fix encryptionroot and encryption 2025-05-27 20:04:37 -04:00
zcp_global.c SPDX: license tags: CDDL-1.0 2025-03-13 17:56:27 -07:00
zcp_iter.c SPDX: license tags: CDDL-1.0 2025-03-13 17:56:27 -07:00
zcp_set.c SPDX: license tags: CDDL-1.0 2025-03-13 17:56:27 -07:00
zcp_synctask.c zcp_synctask: add zfs.sync.clone() 2025-06-10 14:53:10 -07:00
zcp.c txg: generalise txg_wait_synced_sig() to txg_wait_synced_flags() (#17284) 2025-05-02 15:29:50 -07:00
zfeature.c SPDX: license tags: CDDL-1.0 2025-03-13 17:56:27 -07:00
zfs_byteswap.c SPDX: license tags: CDDL-1.0 2025-03-13 17:56:27 -07:00
zfs_chksum.c SPDX: license tags: CDDL-1.0 2025-03-13 17:56:27 -07:00
zfs_debug_common.c nvlist: Add nvlist_snprintf() and zfs_dbgmsg_nvlist() 2025-04-18 09:22:16 -04:00
zfs_fm.c events: include zio type in IO error reports 2025-05-30 10:29:29 -04:00
zfs_fuid.c SPDX: license tags: CDDL-1.0 2025-03-13 17:56:27 -07:00
zfs_impl.c SPDX: license tags: CDDL-1.0 2025-03-13 17:56:27 -07:00
zfs_ioctl.c dsl_dataset: rename dmu_objset_clone* to dsl_dataset_clone* 2025-06-10 14:52:43 -07:00
zfs_log.c zfs_log_write: only put the callback on the last itx 2025-06-12 14:44:33 -07:00
zfs_onexit.c SPDX: license tags: CDDL-1.0 2025-03-13 17:56:27 -07:00
zfs_quota.c Show default quotas in zfs userspace tools 2025-04-03 10:36:45 -07:00
zfs_ratelimit.c SPDX: license tags: CDDL-1.0 2025-03-13 17:56:27 -07:00
zfs_replay.c dmu_tx: rename dmu_tx_assign() flags from TXG_* to DMU_TX_* (#17143) 2025-03-18 16:04:22 -07:00
zfs_rlock.c SPDX: license tags: CDDL-1.0 2025-03-13 17:56:27 -07:00
zfs_sa.c dmu_tx: rename dmu_tx_assign() flags from TXG_* to DMU_TX_* (#17143) 2025-03-18 16:04:22 -07:00
zfs_vnops.c Improve block cloning transactions accounting 2025-06-11 11:59:16 -07:00
zfs_znode.c Add default user/group/project quota properties 2025-04-03 10:35:22 -07:00
zil.c ZIL: Relax parallel write ZIOs processing 2025-06-14 09:37:18 -04:00
zio_checksum.c SPDX: license tags: CDDL-1.0 2025-03-13 17:56:27 -07:00
zio_compress.c SPDX: license tags: CDDL-1.0 2025-03-13 17:56:27 -07:00
zio_inject.c SPDX: license tags: CDDL-1.0 2025-03-13 17:56:27 -07:00
zio.c Include class name into struct metaslab_class 2025-06-03 11:12:59 -04:00
zle.c SPDX: license tags: CDDL-1.0 2025-03-13 17:56:27 -07:00
zrlock.c SPDX: license tags: CDDL-1.0 2025-03-13 17:56:27 -07:00
zthr.c SPDX: license tags: CDDL-1.0 2025-03-13 17:56:27 -07:00
zvol.c Improve block cloning transactions accounting 2025-06-11 11:59:16 -07:00