mirror of
https://git.proxmox.com/git/mirror_zfs.git
synced 2026-05-22 02:27:36 +03:00
Only examine best metaslabs on each vdev
On a system with very high fragmentation, we may need to do lots of gang allocations (e.g. most indirect block allocations (~50KB) may need to gang). Before failing a "normal" allocation and resorting to ganging, we try every metaslab. This has the impact of loading every metaslab (not a huge deal since we now typically keep all metaslabs loaded), and also iterating over every metaslab for every failing allocation. If there are many metaslabs (more than the typical ~200, e.g. due to vdev expansion or very large vdevs), the CPU cost of this iteration can be very impactful. This iteration is done with the mg_lock held, creating long hold times and high lock contention for concurrent allocations, ultimately causing long txg sync times and poor application performance.

To address this, this commit changes the behavior of "normal" (not try_hard, not ZIL) allocations. These will now only examine the 100 best metaslabs (as determined by their ms_weight). If none of these have a large enough free segment, then the allocation will fail and we'll fall back on ganging.

To accomplish this, we will now (normally) gang before doing a `try_hard` allocation. Non-try_hard allocations will only examine the 100 best metaslabs of each vdev. In summary, we will first try normal allocation. If that fails then we will do a gang allocation. If that fails then we will do a "try hard" gang allocation. If that fails then we will have a multi-layer gang block.

Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #11327
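The fallback order summarized above can be sketched in C. This is a minimal illustration, not code from the commit: the `alloc_step_t` enum and `alloc_fallback_order()` are hypothetical names, and the branch for the tunable follows the description of `zfs_metaslab_try_hard_before_gang` in the man page change.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical step names for illustration; not identifiers from ZFS. */
typedef enum {
	STEP_NORMAL,
	STEP_TRY_HARD,
	STEP_GANG,
	STEP_TRY_HARD_GANG,
	STEP_MULTI_LAYER_GANG
} alloc_step_t;

/*
 * Fill `out` with the order in which allocation strategies are attempted
 * and return the number of entries written (4 or 5).
 *
 * try_hard_before_gang == false: normal, gang, try-hard gang, multi-layer.
 * try_hard_before_gang == true:  normal, try-hard, gang, try-hard gang,
 *                                multi-layer.
 */
size_t
alloc_fallback_order(bool try_hard_before_gang, alloc_step_t out[5])
{
	size_t n = 0;

	out[n++] = STEP_NORMAL;
	if (try_hard_before_gang)
		out[n++] = STEP_TRY_HARD;
	out[n++] = STEP_GANG;
	out[n++] = STEP_TRY_HARD_GANG;
	out[n++] = STEP_MULTI_LAYER_GANG;
	return (n);
}
```

With the default setting the expensive "try hard" pass is attempted only after an ordinary gang allocation has already failed, which is the reordering this commit describes.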
@@ -526,6 +526,40 @@ memory that is the threshold.
Default value: \fB25 percent\fR
.RE

.sp
.ne 2
.na
\fBzfs_metaslab_try_hard_before_gang\fR (int)
.ad
.RS 12n
If not set (the default), we will first try normal allocation.
If that fails then we will do a gang allocation.
If that fails then we will do a "try hard" gang allocation.
If that fails then we will have a multi-layer gang block.
.sp
If set, we will first try normal allocation.
If that fails then we will do a "try hard" allocation.
If that fails we will do a gang allocation.
If that fails we will do a "try hard" gang allocation.
If that fails then we will have a multi-layer gang block.
.sp
Default value: \fB0 (false)\fR
.RE

.sp
.ne 2
.na
\fBzfs_metaslab_find_max_tries\fR (int)
.ad
.RS 12n
When not trying hard, we only consider this number of the best metaslabs.
This improves performance, especially when there are many metaslabs per vdev
and the allocation can't actually be satisfied (so we would otherwise iterate
all the metaslabs).
.sp
Default value: \fB100\fR
.RE

.sp
.ne 2
.na
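The effect of `zfs_metaslab_find_max_tries` can be illustrated with a simplified search loop. This is a sketch under stated assumptions, not the metaslab allocator itself: `toy_metaslab_t`, its fields, and `find_metaslab()` are hypothetical, the array is assumed to be sorted by descending weight already, and the ZIL exemption mentioned in the commit message is not modeled.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for a metaslab; field names are hypothetical. */
typedef struct {
	uint64_t weight;	/* higher is better */
	uint64_t max_free_seg;	/* largest contiguous free segment, bytes */
} toy_metaslab_t;

/*
 * `ms` must be sorted by descending weight. Returns the index of the
 * first metaslab whose largest free segment can hold `size` bytes, or
 * -1 if none of the examined metaslabs can.
 *
 * When `try_hard` is false, at most `max_tries` metaslabs are examined,
 * so a failing allocation costs O(max_tries) rather than O(count).
 */
long
find_metaslab(const toy_metaslab_t *ms, size_t count, uint64_t size,
    bool try_hard, size_t max_tries)
{
	size_t limit = (try_hard || max_tries > count) ? count : max_tries;

	for (size_t i = 0; i < limit; i++) {
		if (ms[i].max_free_seg >= size)
			return ((long)i);
	}
	return (-1);
}
```

A caller that gets -1 from the bounded pass would fall back to ganging, and only later retry with `try_hard` set, which removes the limit.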