Improve write issue taskqs utilization

 - Reduce number of allocators on small systems down to one per 4
CPU cores, keeping maximum at 4 on 16+ core systems.  Small systems
should not have the lock contention that multiple allocators are
supposed to solve, while having several metaslabs open and modified
each TXG is not free.
 - Reduce number of write issue taskqs down to one per 16 CPU
cores and an integer fraction of the number of allocators.  On
mid-sized systems, where multiple allocators already make sense,
too many write issue taskqs may reduce write speed on single-file
workloads, since a single file is handled by only one taskq to
reduce fragmentation.  On large systems, which can actually benefit
from the better IOPS of many taskqs, the bottleneck is less
important, since in the worst case there will be at least 16 cores
to handle it.
 - Distribute dnodes between allocators (and taskqs) in a round-
robin fashion instead of relying on sync taskqs to be balanced.
The latter is not guaranteed and may depend on scheduling.
 - Remove io_wr_iss_tq from struct zio.  io_allocator is enough.
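
The sizing rules and the round-robin distribution described above can be sketched roughly as follows.  This is a minimal illustration based only on this commit message, not the code from the change itself: every function name here is hypothetical, and the real implementation may take additional tunables into account.

```c
#include <assert.h>

/* Hypothetical helpers for illustration; not taken from the OpenZFS source. */
static int imax(int a, int b) { return (a > b ? a : b); }
static int imin(int a, int b) { return (a < b ? a : b); }

/*
 * One allocator per cpus_per_allocator CPUs, never fewer than one and
 * never more than max_allocators (assumed to mirror spa_num_allocators).
 */
static int
sketch_alloc_count(int ncpus, int max_allocators, int cpus_per_allocator)
{
	assert(ncpus > 0);
	int by_cpus = ncpus / imax(cpus_per_allocator, 1);
	return (imax(imin(max_allocators, by_cpus), 1));
}

/*
 * One write issue taskq per cpus_per_taskq CPUs, capped by the number
 * of allocators, then lowered until it divides the allocator count so
 * that every taskq serves the same number of allocators.
 */
static int
sketch_wr_iss_taskq_count(int ncpus, int alloc_count, int cpus_per_taskq)
{
	assert(ncpus > 0 && alloc_count > 0);
	int count = imax(ncpus / imax(cpus_per_taskq, 1), 1);
	count = imin(count, alloc_count);
	while (alloc_count % count != 0)
		count--;	/* terminates: 1 divides everything */
	return (count);
}

/* Round-robin: each call hands out the next allocator index in turn. */
static int
sketch_next_allocator(unsigned *cursor, int alloc_count)
{
	assert(alloc_count > 0);
	return ((int)((*cursor)++ % (unsigned)alloc_count));
}

/* With the taskq count dividing the allocator count, a modulo suffices. */
static int
sketch_taskq_for_allocator(int allocator, int taskq_count)
{
	assert(taskq_count > 0);
	return (allocator % taskq_count);
}
```

Under these assumptions an 8-core system gets 2 allocators and 1 write issue taskq, while a 48-core system gets 4 allocators and 2 taskqs, because 3 is not an integer fraction of 4.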

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16130
Alexander Motin
2024-05-01 14:07:20 -04:00
committed by GitHub
parent 8fd3a5d02f
commit 645b833079
8 changed files with 98 additions and 47 deletions
+15 -10
@@ -525,10 +525,17 @@ most ZPL operations (e.g. write, create) will return
 .
 .It Sy spa_num_allocators Ns = Ns Sy 4 Pq int
 Determines the number of block alloctators to use per spa instance.
-Capped by the number of actual CPUs in the system.
+Capped by the number of actual CPUs in the system via
+.Sy spa_cpus_per_allocator .
 .Pp
 Note that setting this value too high could result in performance
 degredation and/or excess fragmentation.
+Set value only applies to pools imported/created after that.
+.
+.It Sy spa_cpus_per_allocator Ns = Ns Sy 4 Pq int
+Determines the minimum number of CPUs in a system for block alloctator
+per spa instance.
+Set value only applies to pools imported/created after that.
 .
 .It Sy spa_upgrade_errlog_limit Ns = Ns Sy 0 Pq uint
 Limits the number of on-disk error log entries that will be converted to the
@@ -2339,21 +2346,19 @@ Set value only applies to pools imported/created after that.
 .
 .It Sy zio_taskq_batch_tpq Ns = Ns Sy 0 Pq uint
 Number of worker threads per taskq.
-Lower values improve I/O ordering and CPU utilization,
-while higher reduces lock contention.
+Higher values improve I/O ordering and CPU utilization,
+while lower reduce lock contention.
+Set value only applies to pools imported/created after that.
 .Pp
 If
 .Sy 0 ,
 generate a system-dependent value close to 6 threads per taskq.
-Set value only applies to pools imported/created after that.
 .
-.It Sy zio_taskq_wr_iss_ncpus Ns = Ns Sy 0 Pq uint
-Determines the number of CPUs to run write issue taskqs.
-.Pp
-When 0 (the default), the value to use is computed internally
-as the number of actual CPUs in the system divided by the
-.Sy spa_num_allocators
-value.
+.It Sy zio_taskq_write_tpq Ns = Ns Sy 16 Pq uint
+Determines the minumum number of threads per write issue taskq.
+Higher values improve CPU utilization on high throughput,
+while lower reduce taskq locks contention on high IOPS.
+Set value only applies to pools imported/created after that.
 .
 .It Sy zio_taskq_read Ns = Ns Sy fixed,1,8 null scale null Pq charp