Mirror of https://git.proxmox.com/git/mirror_zfs.git, synced 2026-05-22 10:37:35 +03:00
Introduce auxiliary metaslab histograms
This patch introduces 3 new histograms per metaslab. These histograms track segments that have made it to the metaslab's space map histogram (and are part of the space map) but have not yet reached the ms_allocatable tree of loaded metaslabs, because these metaslabs are currently syncing and haven't gone through metaslab_sync_done() yet.

The histograms help when we decide whether to load an unloaded metaslab in order to allocate from it. Traditionally, when calculating the weight of an unloaded metaslab, we look at the highest bucket of its space map's histogram. The problem is that we are not guaranteed to be able to allocate that segment when we load the metaslab, because it may still be in the freeing, freed, or defer trees. The new histograms address this: when we calculate an unloaded metaslab's weight, we use them to remove the segments that would not be in the allocatable tree at runtime. Note that this method is not completely accurate, as adjacent segments are not always consolidated in the space map histogram of a metaslab.

In addition, to make things deterministic, we always reset the weight of unloaded metaslabs based on their space map weight (instead of doing so on an as-needed basis). Thus, every time a metaslab is loaded and its weight is reset again (from the weight based on its space map to the one based on its allocatable range tree), we expect (and assert) that the weight either stays the same or gets better.

Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Ahrens <mahrens@delphix.com>
Signed-off-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Closes #8358
Committed by Brian Behlendorf
Parent: bb1be77a35
Commit: 928e8ad47d
@@ -119,6 +119,7 @@ void metaslab_group_histogram_remove(metaslab_group_t *, metaslab_t *);
void metaslab_group_alloc_decrement(spa_t *, uint64_t, void *, int, int,
    boolean_t);
void metaslab_group_alloc_verify(spa_t *, const blkptr_t *, void *, int);
void metaslab_recalculate_weight_and_sort(metaslab_t *);

#ifdef __cplusplus
}

@@ -402,6 +402,49 @@ struct metaslab {
	boolean_t	ms_loaded;
	boolean_t	ms_loading;

	/*
	 * The following histograms count entries that are in the
	 * metaslab's space map (and its histogram) but are not in
	 * ms_allocatable yet, because they are in ms_freed, ms_freeing,
	 * or ms_defer[].
	 *
	 * When the metaslab is not loaded, its ms_weight needs to
	 * reflect what is allocatable (i.e. what will be part of
	 * ms_allocatable if it is loaded). The weight is computed from
	 * the spacemap histogram, but that includes ranges that are
	 * not yet allocatable (because they are in ms_freed,
	 * ms_freeing, or ms_defer[]). Therefore, when calculating the
	 * weight, we need to remove those ranges.
	 *
	 * The ranges in the ms_freed and ms_defer[] range trees are all
	 * present in the spacemap. However, the spacemap may have
	 * multiple entries to represent a contiguous range, because it
	 * is written across multiple sync passes, but the changes of
	 * all sync passes are consolidated into the range trees.
	 * Adjacent ranges that are freed in different sync passes of
	 * one txg will be represented separately (as 2 or more entries)
	 * in the space map (and its histogram), but these adjacent
	 * ranges will be consolidated (represented as one entry) in the
	 * ms_freed/ms_defer[] range trees (and their histograms).
	 *
	 * When calculating the weight, we can not simply subtract the
	 * range trees' histograms from the spacemap's histogram,
	 * because the range trees' histograms may have entries in
	 * higher buckets than the spacemap, due to consolidation.
	 * Instead we must subtract the exact entries that were added to
	 * the spacemap's histogram. ms_synchist and ms_deferhist[]
	 * represent these exact entries, so we can subtract them from
	 * the spacemap's histogram when calculating ms_weight.
	 *
	 * ms_synchist represents the same ranges as ms_freeing +
	 * ms_freed, but without consolidation across sync passes.
	 *
	 * ms_deferhist[i] represents the same ranges as ms_defer[i],
	 * but without consolidation across sync passes.
	 */
	uint64_t	ms_synchist[SPACE_MAP_HISTOGRAM_SIZE];
	uint64_t	ms_deferhist[TXG_DEFER_SIZE][SPACE_MAP_HISTOGRAM_SIZE];

	/*
	 * Tracks the exact amount of allocated space of this metaslab
	 * (and specifically the metaslab's space map) up to the most

@@ -201,6 +201,7 @@ int space_map_iterate(space_map_t *sm, uint64_t length,
int space_map_incremental_destroy(space_map_t *sm, sm_cb_t callback, void *arg,
    dmu_tx_t *tx);

boolean_t space_map_histogram_verify(space_map_t *sm, range_tree_t *rt);
void space_map_histogram_clear(space_map_t *sm);
void space_map_histogram_add(space_map_t *sm, range_tree_t *rt,
    dmu_tx_t *tx);