mirror of
https://git.proxmox.com/git/mirror_zfs.git
synced 2026-05-22 18:40:43 +03:00
Improve ZFS objset sync parallelism
As part of transaction group commit, dsl_pool_sync() sequentially calls
dsl_dataset_sync() for each dirty dataset, which subsequently calls
dmu_objset_sync(). dmu_objset_sync() in turn uses up to 75% of CPU cores
to run sync_dnodes_task() in taskq threads to sync the dirty dnodes
(files).

There are two problems:

1. Each ZVOL in a pool is a separate dataset/objset having a single
   dnode. This means the objsets are synchronized serially, which leads
   to a bottleneck of ~330K blocks written per second per pool.

2. In the case of multiple dirty dnodes/files on a dataset/objset on a
   big system they will be sync'd in parallel taskq threads. However, it
   is inefficient to use 75% of CPU cores of a big system to do that,
   because of (a) bottlenecks on a single write issue taskq, and
   (b) allocation throttling. In addition, if not for the allocation
   throttling sorting write requests by bookmarks (logical address),
   writes for different files may reach space allocators interleaved,
   leading to unwanted fragmentation.

The solution to both problems is to always sync no more and (if
possible) no fewer dnodes at the same time than there are allocators in
the pool.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Edmund Nadolski <edmund.nadolski@ixsystems.com>
Closes #15197
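The sizing change described in the message can be illustrated with a small sketch. This is not the kernel implementation: the function names are hypothetical, and the integer rounding and the floor of one thread are assumptions. Only the 75% figure, the default of 4 allocators, and the cap at the CPU count come from this commit's text.

```python
def old_sync_threads(ncpus: int, batch_pct: int = 75) -> int:
    """Previous sizing: a percentage of the CPU cores.

    Assumption: integer rounding down with a floor of one thread.
    """
    return max(1, ncpus * batch_pct // 100)


def allocator_count(ncpus: int, spa_num_allocators: int = 4) -> int:
    """Allocators per pool: the tunable, capped by the number of CPUs."""
    return max(1, min(spa_num_allocators, ncpus))


def dnodes_in_flight(dirty_dnodes: int, ncpus: int,
                     spa_num_allocators: int = 4) -> int:
    """New target: no more dnodes synced at once than there are allocators,
    and no fewer when enough dirty dnodes exist."""
    return min(dirty_dnodes, allocator_count(ncpus, spa_num_allocators))


if __name__ == "__main__":
    # On a hypothetical 64-core system with the default tunables:
    print(old_sync_threads(64))       # threads under the old 75% rule
    print(dnodes_in_flight(100, 64))  # concurrent dnodes under the new rule
```

Under these assumptions, a 64-core system drops from 48 sync threads to 4 dnodes in flight, which matches the message's point that more parallelism than there are allocators only adds contention.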
+15
-7
@@ -496,6 +496,13 @@ If we have less than this amount of free space,
 most ZPL operations (e.g. write, create) will return
 .Sy ENOSPC .
 .
+.It Sy spa_num_allocators Ns = Ns Sy 4 Pq int
+Determines the number of block allocators to use per spa instance.
+Capped by the number of actual CPUs in the system.
+.Pp
+Note that setting this value too high could result in performance
+degradation and/or excess fragmentation.
+.
 .It Sy spa_upgrade_errlog_limit Ns = Ns Sy 0 Pq uint
 Limits the number of on-disk error log entries that will be converted to the
 new format when enabling the
@@ -1974,13 +1981,6 @@ and may need to load new metaslabs to satisfy these allocations.
 .It Sy zfs_sync_pass_rewrite Ns = Ns Sy 2 Pq uint
 Rewrite new block pointers starting in this pass.
 .
-.It Sy zfs_sync_taskq_batch_pct Ns = Ns Sy 75 Ns % Pq int
-This controls the number of threads used by
-.Sy dp_sync_taskq .
-The default value of
-.Sy 75%
-will create a maximum of one thread per CPU.
-.
 .It Sy zfs_trim_extent_bytes_max Ns = Ns Sy 134217728 Ns B Po 128 MiB Pc Pq uint
 Maximum size of TRIM command.
 Larger ranges will be split into chunks no larger than this value before
@@ -2265,6 +2265,14 @@ If
 .Sy 0 ,
 generate a system-dependent value close to 6 threads per taskq.
 .
+.It Sy zio_taskq_wr_iss_ncpus Ns = Ns Sy 0 Pq uint
+Determines the number of CPUs to run write issue taskqs.
+.Pp
+When 0 (the default), the value to use is computed internally
+as the number of actual CPUs in the system divided by the
+.Sy spa_num_allocators
+value.
+.
 .It Sy zvol_inhibit_dev Ns = Ns Sy 0 Ns | Ns 1 Pq uint
 Do not create zvol device nodes.
 This may slightly improve startup time on