mirror_zfs/module
Alexander Motin bd27b75401
ZIL: Relax parallel write ZIOs processing
ZIL introduced dependencies between its write ZIOs to permit flush
deferral, where we flush vdev caches only once all the write ZIOs have
completed.  But it was recently spotted that this serializes not only
ZIO completion handling, but also the ready stage.  It means the ZIO
pipeline can't calculate checksums for the following ZIOs until all
the previous ones are checksummed, even though that is not required.
On systems where the memory throughput of a single CPU core is
limited, this creates a single-core CPU bottleneck, which is difficult
to see due to the ZIO pipeline design with many taskqueue threads.

While it would be great to bypass the ready stage waits, it would
require changes to the ZIO code, and I haven't found a clean way to
do it.  But I've noticed that we don't need any dependency between
the write ZIOs if the previous one has some waiters, which means
it won't defer any flushes and will work as a barrier for the
earlier ones.

Bypassing it won't help large single-threaded writes, since in that
case all the write ZIOs except the last won't have waiters, and so
will remain dependent.  But in that case the ZIO processing might
not be a bottleneck anyway, since there will be only one thread
populating the write buffers, and that thread will likely be the
bottleneck itself.

But bypassing the ZIO dependency on multi-threaded write workloads
really allows them to scale beyond the checksumming throughput of
one CPU core.
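
The rule described above can be illustrated with a small toy model.
This is only a sketch, not the actual zil.c change: struct toy_lwb,
toy_needs_write_dependency() and toy_count_dependencies() are
hypothetical names invented for illustration, and the model assumes
the only input to the decision is whether the previous block has
waiters.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a ZIL log write block; NOT the real lwb_t. */
struct toy_lwb {
	int waiters;	/* threads waiting for this block to become stable */
};

/*
 * Should the next write ZIO depend on the write ZIO of `prev`?
 * Assumption taken from the commit message: a previous block with
 * waiters will not defer its flush and acts as a barrier on its own,
 * so no dependency is needed in that case.
 */
bool
toy_needs_write_dependency(const struct toy_lwb *prev)
{
	if (prev == NULL)
		return (false);		/* first block in the chain */
	return (prev->waiters == 0);
}

/* Count how many blocks in a chain end up dependent on their predecessor. */
int
toy_count_dependencies(const struct toy_lwb *chain, size_t n)
{
	int deps = 0;

	assert(chain != NULL || n == 0);
	for (size_t i = 1; i < n; i++) {
		if (toy_needs_write_dependency(&chain[i - 1]))
			deps++;
	}
	return (deps);
}
```

In this model, a single large write whose blocks have no waiters
except the last one stays fully chained, while a chain where every
block has at least one waiter has no dependencies at all, which
matches the single-threaded and multi-threaded cases described above.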

My tests writing 12 files on the same dataset from 12 threads with
1MB blocks, on a pool with 4 striped NVMes as SLOGs and a system
with a Xeon Silver 4114 CPU, show a total throughput increase from
4.3GB/s to 8.5GB/s, with the SLOGs' busy level rising from ~30% to
~70%.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #17458
2025-06-14 09:37:18 -04:00
..
avl SPDX: license tags: CDDL-1.0 2025-03-13 17:56:27 -07:00
icp Linux build: silence objtool warnings 2025-06-04 17:40:09 -07:00
lua SPDX: license tags: MIT 2025-03-13 17:56:54 -07:00
nvpair nvlist: Add nvlist_snprintf() and zfs_dbgmsg_nvlist() 2025-04-18 09:22:16 -04:00
os FreeBSD: zfs_putpages: don't undirty pages until after write completes 2025-06-12 14:45:18 -07:00
unicode SPDX: license tags: CDDL-1.0 2025-03-13 17:56:27 -07:00
zcommon events: include zio type in IO error reports 2025-05-30 10:29:29 -04:00
zfs ZIL: Relax parallel write ZIOs processing 2025-06-14 09:37:18 -04:00
zstd SPDX: license tags: BSD-3-Clause OR GPL-2.0-only 2025-03-13 17:57:17 -07:00
.gitignore FreeBSD: Ignore symlink to i386 includes 2022-08-02 16:34:23 -07:00
Kbuild.in Linux build: always use objtool 2025-05-29 18:04:20 -07:00
Makefile.bsd freebsd: unbreak module/Makefile.bsd build on 15-CURRENT-arm64 2025-04-05 19:43:41 -04:00
Makefile.in Fix "make install" with DESTDIR set (#16995) 2025-02-07 16:38:58 -08:00