mirror_zfs/module/zfs
Etienne Dechamps bb3250d07e Allocate disk space fairly in the presence of vdevs of unequal size.
The metaslab allocator device selection algorithm contains a bias
mechanism whose goal is to achieve roughly equal disk space usage across
all top-level vdevs.

It seems that the initial rationale for this code was to allow newly
added (empty) vdevs to "come up to speed" faster in an attempt to make
the pool quickly converge to a steady state where all vdevs are equally
utilized.

While the code seems to work reasonably well for this use case, there
is another scenario in which this algorithm fails miserably: the case
where top-level vdevs don't have the same sizes (capacities). ZFS
allows this, and it is a good feature to have, so that users who simply
want to build a pool with the disks they happen to have lying around can
do so even if the disks have heteregenous sizes.

Here's a script that simulates a pool with two vdevs, with one 4X larger
than the other:

    dd if=/dev/zero of=/tmp/d1 bs=1 count=1 seek=134217728
    dd if=/dev/zero of=/tmp/d2 bs=1 count=1 seek=536870912
    zpool create testspace /tmp/d1 /tmp/d2
    dd if=/dev/zero of=/testspace/foobar bs=1M count=256
    zpool iostat -v testspace

Before this commit, the script would output the following:

                   capacity
    pool        alloc   free
    ----------  -----  -----
    testspace    252M   375M
      /tmp/d1    104M  18.5M
      /tmp/d2    148M   356M
    ----------  -----  -----

This demonstrates that the current code handles this situation very
poorly: d1 shows 85% usage despite the pool itself being only 40% full.
d1 is quite saturated at this point, and is slowing down the entire pool
due to saturation, fragmentation and the like.

In contrast, here's the result with the code in this commit:

                   capacity
    pool        alloc   free
    ----------  -----  -----
    testspace    252M   375M
      /tmp/d1   56.7M  66.3M
      /tmp/d2    195M   309M
   ----------  -----  ------

This looks much better. d1 is 46% used, which is close to the overall
pool utilization (40%). The code still doesn't result in perfectly
balanced allocation, probably because of the way mg_bias is applied
which does not guarantee perfect accuracy, but this is still much better
than before.

Signed-off-by: Etienne Dechamps <etienne@edechamps.fr>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3389
2015-06-22 14:18:29 -07:00
..
arc.c Rename cv_wait_interruptible() to cv_wait_sig() 2015-06-11 10:50:47 -07:00
blkptr.c Illumos 4757, 4913 2014-08-01 14:28:05 -07:00
bplist.c Change KM_PUSHPAGE -> KM_SLEEP 2015-01-16 14:41:26 -08:00
bpobj.c Illumos 5810 - zdb should print details of bpobj 2015-05-11 15:10:24 -07:00
bptree.c Illumos 5027 - zfs large block support 2015-05-11 12:23:16 -07:00
dbuf_stats.c Illumos 5497 - lock contention on arcs_mtx 2015-06-11 10:27:25 -07:00
dbuf.c Illumos 5369 - arc flags should be an enum 2015-06-11 10:27:25 -07:00
ddt_zap.c Change KM_PUSHPAGE -> KM_SLEEP 2015-01-16 14:41:26 -08:00
ddt.c Change KM_PUSHPAGE -> KM_SLEEP 2015-01-16 14:41:26 -08:00
dmu_diff.c Illumos 5369 - arc flags should be an enum 2015-06-11 10:27:25 -07:00
dmu_object.c Illumos 3693 - restore_object uses at least two transactions to restore an object 2014-10-21 15:26:50 -07:00
dmu_objset.c Illumos 5369 - arc flags should be an enum 2015-06-11 10:27:25 -07:00
dmu_send.c Illumos 5369 - arc flags should be an enum 2015-06-11 10:27:25 -07:00
dmu_traverse.c Wait interruptibly in prefetch thread 2015-06-16 16:18:11 -07:00
dmu_tx.c Illumos 5027 - zfs large block support 2015-05-11 12:23:16 -07:00
dmu_zfetch.c Change KM_PUSHPAGE -> KM_SLEEP 2015-01-16 14:41:26 -08:00
dmu.c Illumos 5820 - verify failed in zio_done(): BP_EQUAL(bp, io_bp_orig) 2015-05-04 10:49:49 -07:00
dnode_sync.c Illumos 5350 - clean up code in dnode_sync() 2015-05-11 15:09:51 -07:00
dnode.c Illumos 5422 - preserve AVL invariants in dn_dbufs 2015-05-11 15:09:29 -07:00
dsl_bookmark.c Illumos 4951 - ZFS administrative commands should use reserved space 2015-05-04 09:41:10 -07:00
dsl_dataset.c Illumos 5393 - spurious failures from dsl_dataset_hold_obj() 2015-05-13 08:56:04 -07:00
dsl_deadlist.c Illumos 5027 - zfs large block support 2015-05-11 12:23:16 -07:00
dsl_deleg.c Illumos 4951 - ZFS administrative commands should use reserved space 2015-05-04 09:41:10 -07:00
dsl_destroy.c Illumos 5027 - zfs large block support 2015-05-11 12:23:16 -07:00
dsl_dir.c Illumos 4951 - ZFS administrative commands should use reserved space 2015-05-04 09:41:10 -07:00
dsl_pool.c Increase the number of iput taskq threads 2015-06-22 10:22:10 -07:00
dsl_prop.c Illumos 4951 - ZFS administrative commands should use reserved space 2015-05-04 09:41:10 -07:00
dsl_scan.c Illumos 5369 - arc flags should be an enum 2015-06-11 10:27:25 -07:00
dsl_synctask.c Illumos 4951 - ZFS administrative commands should use reserved space 2015-05-04 09:41:10 -07:00
dsl_userhold.c Illumos 4951 - ZFS administrative commands should use reserved space 2015-05-04 09:41:10 -07:00
fm.c Rename cv_wait_interruptible() to cv_wait_sig() 2015-06-11 10:50:47 -07:00
gzip.c cstyle: Resolve C style issues 2013-12-18 16:46:35 -08:00
lz4.c Change KM_PUSHPAGE -> KM_SLEEP 2015-01-16 14:41:26 -08:00
lzjb.c Change KM_PUSHPAGE -> KM_SLEEP 2015-01-16 14:41:26 -08:00
Makefile.in Illumos 5497 - lock contention on arcs_mtx 2015-06-11 10:27:25 -07:00
metaslab.c Allocate disk space fairly in the presence of vdevs of unequal size. 2015-06-22 14:18:29 -07:00
multilist.c Illumos 5497 - lock contention on arcs_mtx 2015-06-11 10:27:25 -07:00
range_tree.c Change KM_PUSHPAGE -> KM_SLEEP 2015-01-16 14:41:26 -08:00
refcount.c Change KM_PUSHPAGE -> KM_SLEEP 2015-01-16 14:41:26 -08:00
rrwlock.c Change KM_PUSHPAGE -> KM_SLEEP 2015-01-16 14:41:26 -08:00
sa.c Illumos 5562 - ZFS sa_handle's violate kmem invariants, debug kernels panic on boot 2015-05-11 15:10:57 -07:00
sha256.c Add linux sha2 support 2010-08-31 13:41:59 -07:00
spa_boot.c Add linux kernel module support 2010-08-31 13:41:58 -07:00
spa_config.c Use vmem_alloc() in spa_config_write() 2015-04-07 15:10:19 -07:00
spa_errlog.c Illumos 4914 - zfs on-disk bookmark structure should be named *_phys_t 2014-08-06 14:48:41 -07:00
spa_history.c Illumos 5027 - zfs large block support 2015-05-11 12:23:16 -07:00
spa_misc.c Illumos 5818 - zfs {ref}compressratio is incorrect with 4k sector size 2015-06-10 16:24:01 -07:00
spa_stats.c Illumos 5369 - arc flags should be an enum 2015-06-11 10:27:25 -07:00
spa.c Illumos 5818 - zfs {ref}compressratio is incorrect with 4k sector size 2015-06-10 16:24:01 -07:00
space_map.c Change KM_PUSHPAGE -> KM_SLEEP 2015-01-16 14:41:26 -08:00
space_reftree.c Change KM_PUSHPAGE -> KM_SLEEP 2015-01-16 14:41:26 -08:00
trace.c Illumos 5497 - lock contention on arcs_mtx 2015-06-11 10:27:25 -07:00
txg.c Rename cv_wait_interruptible() to cv_wait_sig() 2015-06-11 10:50:47 -07:00
uberblock.c Illumos #3598 2013-10-31 14:58:04 -07:00
unique.c Change KM_PUSHPAGE -> KM_SLEEP 2015-01-16 14:41:26 -08:00
vdev_cache.c Change KM_PUSHPAGE -> KM_SLEEP 2015-01-16 14:41:26 -08:00
vdev_disk.c Illumos 5027 - zfs large block support 2015-05-11 12:23:16 -07:00
vdev_file.c Illumos #5244 - zio pipeline callers should explicitly invoke next stage 2015-04-30 15:07:47 -07:00
vdev_label.c Change KM_PUSHPAGE -> KM_SLEEP 2015-01-16 14:41:26 -08:00
vdev_mirror.c Illumos #5244 - zio pipeline callers should explicitly invoke next stage 2015-04-30 15:07:47 -07:00
vdev_missing.c Illumos #5244 - zio pipeline callers should explicitly invoke next stage 2015-04-30 15:07:47 -07:00
vdev_queue.c Illumos 5027 - zfs large block support 2015-05-11 12:23:16 -07:00
vdev_raidz.c Illumos #5244 - zio pipeline callers should explicitly invoke next stage 2015-04-30 15:07:47 -07:00
vdev_root.c Illumos #3598 2013-10-31 14:58:04 -07:00
vdev.c Remove unused variable in vdev_add_child() 2015-06-11 10:22:38 -07:00
zap_leaf.c Illumos 5314 - Remove "dbuf phys" db->db_data pointer aliases in ZFS 2015-04-28 16:25:20 -07:00
zap_micro.c Illumos 5027 - zfs large block support 2015-05-11 12:23:16 -07:00
zap.c Illumos 3654,3656 2015-05-04 09:41:09 -07:00
zfeature_common.c Illumos 5027 - zfs large block support 2015-05-11 12:23:16 -07:00
zfeature.c Use cached feature info in spa_add_feature_stats() 2015-03-05 14:11:10 -08:00
zfs_acl.c Change KM_PUSHPAGE -> KM_SLEEP 2015-01-16 14:41:26 -08:00
zfs_byteswap.c Add linux kernel module support 2010-08-31 13:41:58 -07:00
zfs_ctldir.c Illumos 4368, 4369. 2014-07-29 10:55:29 -07:00
zfs_debug.c Change KM_PUSHPAGE -> KM_SLEEP 2015-01-16 14:41:26 -08:00
zfs_dir.c Revert "Revert "Revert "Fix unlink/xattr deadlock""" 2014-08-11 16:12:36 -07:00
zfs_fm.c Change KM_PUSHPAGE -> KM_SLEEP 2015-01-16 14:41:26 -08:00
zfs_fuid.c Illumos #3522 2013-10-30 14:51:27 -07:00
zfs_ioctl.c Relax restriction on zfs_ioc_next_obj() iteration 2015-05-14 11:16:08 -07:00
zfs_log.c Illumos 5027 - zfs large block support 2015-05-11 12:23:16 -07:00
zfs_onexit.c Change KM_PUSHPAGE -> KM_SLEEP 2015-01-16 14:41:26 -08:00
zfs_replay.c Linux AIO Support 2014-09-05 15:11:43 -07:00
zfs_rlock.c Change KM_PUSHPAGE -> KM_SLEEP 2015-01-16 14:41:26 -08:00
zfs_sa.c Illumos 5056 - ZFS deadlock on db_mtx and dn_holds 2015-04-28 16:25:34 -07:00
zfs_vfsops.c Add zfs_sb_prune_aliases() function 2015-06-22 10:22:49 -07:00
zfs_vnops.c Illumos 5027 - zfs large block support 2015-05-11 12:23:16 -07:00
zfs_znode.c Illumos 5027 - zfs large block support 2015-05-11 12:23:16 -07:00
zil.c Illumos 5369 - arc flags should be an enum 2015-06-11 10:27:25 -07:00
zio_checksum.c Illumos 4757, 4913 2014-08-01 14:28:05 -07:00
zio_compress.c Illumos 4757, 4913 2014-08-01 14:28:05 -07:00
zio_inject.c Illumos 5497 - lock contention on arcs_mtx 2015-06-11 10:27:25 -07:00
zio.c Illumos 5369 - arc flags should be an enum 2015-06-11 10:27:25 -07:00
zle.c Update core ZFS code from build 121 to build 141. 2010-05-28 13:45:14 -07:00
zpl_ctldir.c Mark additional functions as PF_FSTRANS 2015-04-17 09:35:24 -07:00
zpl_export.c Return -ESTALE to force lookup for missing NFS file handles 2015-05-14 11:16:52 -07:00
zpl_file.c Linux 4.1 compat: use read_iter() / write_iter() 2015-06-18 12:06:59 -07:00
zpl_inode.c Safely handle security / ACL failures 2015-05-11 14:35:14 -07:00
zpl_super.c Extend PF_FSTRANS critical regions 2015-04-24 09:54:22 -07:00
zpl_xattr.c Mark additional functions as PF_FSTRANS 2015-04-17 09:35:24 -07:00
zrlock.c Illumos 5812 - assertion failed in zrl_tryenter(): zr_owner==NULL 2015-04-30 14:43:40 -07:00
zvol.c Set the maximum ZVOL transfer size correctly 2015-03-25 11:58:42 -07:00