ZIL: Relax parallel write ZIOs processing
ZIL introduced dependencies between its write ZIOs to permit flush deferral: vdev caches are flushed only once, after all the write ZIOs have completed. But it was recently spotted that this serializes not only ZIO completion handling, but also their ready stage. That means the ZIO pipeline can't calculate checksums for later ZIOs until all the previous ones are checksummed, even though this is not required. On systems where the memory throughput of a single CPU core is limited, this creates a single-core CPU bottleneck, which is difficult to see due to the ZIO pipeline design with many taskqueue threads.

While it would be great to bypass the ready-stage waits, that would require changes to the ZIO code, and I haven't found a clean way to do it. But I've noticed that we don't need any dependency between the write ZIOs if the previous one has some waiters, since in that case it won't defer any flushes and will work as a barrier for the earlier ones. Bypassing the dependency won't help large single-threaded writes, since all the write ZIOs except the last in that case won't have waiters, and so will remain dependent. But then ZIO processing might not be the bottleneck, since only one thread will be populating the write buffers, and that will likely be the limiting factor.

Bypassing the ZIO dependency on multi-threaded write workloads, however, really allows them to scale beyond the checksumming throughput of one CPU core. My tests writing 12 files on the same dataset from 12 threads with 1MB blocks, on a pool with 4 striped NVMes as SLOGs and a Xeon Silver 4114 CPU, show total throughput increasing from 4.3GB/s to 8.5GB/s, with SLOG busyness rising from ~30% to ~70%.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #17458
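The relaxed rule described above boils down to a single check when chaining lwb write ZIOs. Below is a minimal standalone C sketch of that decision, using simplified toy types rather than the actual OpenZFS structures: zio_t and lwb_t are stand-ins, the lwb_nwaiters counter stands in for the real lwb_waiters list, and zio_add_child() here only prints the dependency edge.

    #include <stddef.h>
    #include <stdio.h>

    typedef struct zio { const char *name; } zio_t;  /* toy stand-in for a ZIO */

    typedef struct lwb {
    	zio_t *lwb_write_zio;  /* this lwb's write ZIO */
    	size_t lwb_nwaiters;   /* stand-in for the lwb_waiters list */
    } lwb_t;

    /* Toy stand-in for zio_add_child(): just report the dependency edge. */
    static void
    zio_add_child(zio_t *child, zio_t *parent)
    {
    	printf("%s now waits for %s\n", child->name, parent->name);
    }

    /*
     * Relaxed dependency rule: the new lwb's write ZIO becomes a child
     * of the previous one only if the previous lwb has no waiters,
     * i.e. only if it may defer its vdev flushes to a later lwb.  With
     * waiters present, the previous lwb flushes on its own and acts as
     * a barrier, so no write-ZIO dependency (and no ready-stage
     * serialization) is needed.
     */
    static void
    set_write_dependency(lwb_t *lwb, lwb_t *prev_lwb)
    {
    	if (prev_lwb->lwb_nwaiters == 0)
    		zio_add_child(lwb->lwb_write_zio, prev_lwb->lwb_write_zio);
    	else
    		printf("%s issued independently\n", lwb->lwb_write_zio->name);
    }

    int
    main(void)
    {
    	zio_t z1 = { "write-zio-1" }, z2 = { "write-zio-2" };
    	lwb_t prev = { &z1, 0 }, next = { &z2, 0 };

    	set_write_dependency(&next, &prev);  /* no waiters: dependency created */
    	prev.lwb_nwaiters = 1;
    	set_write_dependency(&next, &prev);  /* waiters: dependency skipped */
    	return (0);
    }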
commit bd27b75401
parent b4ebba0e04
@@ -1691,7 +1691,7 @@ zil_lwb_set_zio_dependency(zilog_t *zilog, lwb_t *lwb)
 	 * If the previous lwb's write hasn't already completed, we also want
 	 * to order the completion of the lwb write zios (above, we only order
 	 * the completion of the lwb root zios). This is required because of
-	 * how we can defer the flush commands for each lwb.
+	 * how we can defer the flush commands for any lwb without waiters.
 	 *
 	 * When the flush commands are deferred, the previous lwb will rely on
 	 * this lwb to flush the vdevs written to by that previous lwb. Thus,
@@ -1708,7 +1708,10 @@ zil_lwb_set_zio_dependency(zilog_t *zilog, lwb_t *lwb)
 	 */
 	if (prev_lwb->lwb_state == LWB_STATE_ISSUED) {
 		ASSERT3P(prev_lwb->lwb_write_zio, !=, NULL);
-		zio_add_child(lwb->lwb_write_zio, prev_lwb->lwb_write_zio);
+		if (list_is_empty(&prev_lwb->lwb_waiters)) {
+			zio_add_child(lwb->lwb_write_zio,
+			    prev_lwb->lwb_write_zio);
+		}
 	} else {
 		ASSERT3S(prev_lwb->lwb_state, ==, LWB_STATE_WRITE_DONE);
 	}