mirror_zfs/include/sys
smh 9f500936c8 FreeBSD r256956: Improve ZFS N-way mirror read performance by using load and locality information.
The existing algorithm selects a preferred leaf vdev based on the offset of the
zio request modulo the number of members in the mirror. It assumes the devices
are of equal performance and that spreading the requests randomly over the
drives will be sufficient to saturate them. In practice this results in the
leaf vdevs being underutilized.
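
As a rough illustration of the offset-based selection described above, the
following is a minimal sketch, not the code from the tree. The function name
stripe_preferred_child and the MIRROR_SHIFT granularity are assumptions made
for this example; the only detail taken from the description is that the
preferred child is the request offset modulo the number of mirror members.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical granularity: offsets within the same 2 MiB region map to
 * the same child. This value is an assumption for illustration only.
 */
#define	MIRROR_SHIFT	21

/*
 * Pick the preferred child purely from the I/O offset. Queue depth and
 * media type play no part, which is the weakness described above.
 */
static int
stripe_preferred_child(uint64_t io_offset, int children)
{
	assert(children > 0);
	return ((int)((io_offset >> MIRROR_SHIFT) % (uint64_t)children));
}
```

With three children, a reader streaming sequentially through a file is moved
to the next device every MIRROR_SHIFT-sized region regardless of how busy
that device is.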

The new algorithm takes into account the following additional factors:
* Load of the vdevs (number of outstanding I/O requests)
* The locality of the last queued I/O vs the new I/O request.

Within the locality calculation, additional knowledge about the underlying vdev
is considered, such as whether the device backing the vdev is rotating media.
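
The load and locality calculation can be sketched roughly as follows. This is
a simplified, self-contained model and not the code from the commit: the
struct child_state type, the child_load and pick_child functions, the
half-increment rule for nearby I/O, and the tunable values are all assumptions
chosen for illustration. The variable names loosely mirror the sysctls listed
in this commit message.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative tunable values (assumptions, not taken from the commit). */
static int rotating_inc = 0;
static int rotating_seek_inc = 5;
static uint64_t rotating_seek_offset = 1024 * 1024;
static int non_rotating_inc = 0;
static int non_rotating_seek_inc = 1;

/* Hypothetical per-child state; the real vdev structures differ. */
struct child_state {
	int		pending;	/* outstanding I/O requests */
	uint64_t	last_offset;	/* offset after the last queued I/O */
	int		nonrot;		/* non-zero for non-rotating media */
};

/* Load = queue depth plus a locality penalty that depends on media type. */
static int
child_load(const struct child_state *c, uint64_t io_offset)
{
	uint64_t dist;
	int load = c->pending;

	if (c->nonrot) {
		if (c->last_offset == io_offset)
			return (load + non_rotating_inc);
		return (load + non_rotating_seek_inc);
	}

	/* Rotating media: I/O that directly follows the last queued I/O. */
	if (c->last_offset == io_offset)
		return (load + rotating_inc);

	/* Nearby I/O is assumed to cost half of a full seek. */
	dist = (c->last_offset > io_offset) ?
	    c->last_offset - io_offset : io_offset - c->last_offset;
	if (dist < rotating_seek_offset)
		return (load + rotating_seek_inc / 2);

	return (load + rotating_seek_inc);
}

/* Return the index of the first child with the lowest load. */
static int
pick_child(const struct child_state *children, int n, uint64_t io_offset)
{
	int best = 0, best_load, i, load;

	assert(n > 0);
	best_load = child_load(&children[0], io_offset);
	for (i = 1; i < n; i++) {
		load = child_load(&children[i], io_offset);
		if (load < best_load) {
			best = i;
			best_load = load;
		}
	}
	return (best);
}
```

In this model a busy disk that would continue a sequential run can still win
over an idle disk that would need a long seek. Ties are broken by taking the
first candidate, which is a simplification made for this sketch.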

This results in performance increases across the board as well as significant
increases for predominantly streaming loads and for configurations which don't
have evenly performing devices.

The following are results from a setup with a 3-way mirror with 2 x HDs and
1 x SSD from a basic test running multiple parallel dd's.

With pre-fetch disabled (vfs.zfs.prefetch_disable=1):

== Stripe Balanced (default) ==
Read 15360MB using bs: 1048576, readers: 3, took 161 seconds @ 95 MB/s
== Load Balanced (zfslinux) ==
Read 15360MB using bs: 1048576, readers: 3, took 297 seconds @ 51 MB/s
== Load Balanced (locality freebsd) ==
Read 15360MB using bs: 1048576, readers: 3, took 54 seconds @ 284 MB/s

With pre-fetch enabled (vfs.zfs.prefetch_disable=0):

== Stripe Balanced (default) ==
Read 15360MB using bs: 1048576, readers: 3, took 91 seconds @ 168 MB/s
== Load Balanced (zfslinux) ==
Read 15360MB using bs: 1048576, readers: 3, took 108 seconds @ 142 MB/s
== Load Balanced (locality freebsd) ==
Read 15360MB using bs: 1048576, readers: 3, took 48 seconds @ 320 MB/s

In addition to the performance changes the code was also restructured, with
the help of Justin Gibbs, to provide a more logical flow which also ensures
vdev loads are only calculated from the set of valid candidates.

The following additional sysctls were added to allow the administrator
to tune the behaviour of the load algorithm:
* vfs.zfs.vdev.mirror.rotating_inc
* vfs.zfs.vdev.mirror.rotating_seek_inc
* vfs.zfs.vdev.mirror.rotating_seek_offset
* vfs.zfs.vdev.mirror.non_rotating_inc
* vfs.zfs.vdev.mirror.non_rotating_seek_inc

These changes were based on work started by the zfsonlinux developers:
https://github.com/zfsonlinux/zfs/pull/1487

Reviewed by:	gibbs, mav, will
MFC after:	2 weeks
Sponsored by:	Multiplay

References:
  https://github.com/freebsd/freebsd@5c7a6f5d
  https://github.com/freebsd/freebsd@31b7f68d
  https://github.com/freebsd/freebsd@e186f564

Performance Testing:
  https://github.com/zfsonlinux/zfs/pull/4334#issuecomment-189057141

Porting notes:
- The tunables were adjusted to have ZoL-style names.
- The code was modified to use ZoL's vd_nonrot.
- Fixes were done to make cstyle.pl happy
- Merge conflicts were handled manually
- freebsd/freebsd@e186f564bc by my colleague Andriy Gapon has been
  included. It applied perfectly, but added a cstyle regression.
- This replaces 556011dbec entirely.
- A typo "IO'a" has been corrected to say "IO's"
- Descriptions of new tunables were added to man/man5/zfs-module-parameters.5.

Ported-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #4334
2016-02-26 11:24:35 -08:00
fm Illumos 3749 - zfs event processing should work on R/O root filesystems 2016-01-12 14:42:32 -08:00
fs Illumos 4929 - want prevsnap property 2016-01-11 11:58:26 -08:00
arc_impl.h Illumos 6214 - zpools going south 2015-09-11 11:14:38 -07:00
arc.h Illumos 5987 - zfs prefetch code needs work 2016-01-12 09:02:33 -08:00
avl_impl.h Support custom build directories and move includes 2010-09-08 12:38:56 -07:00
avl.h Illumos 4745 - fix AVL code misspellings 2015-07-10 11:58:37 -07:00
blkptr.h Illumos 4757, 4913 2014-08-01 14:28:05 -07:00
bplist.h Support custom build directories and move includes 2010-09-08 12:38:56 -07:00
bpobj.h Illumos 5810 - zdb should print details of bpobj 2015-05-11 15:10:24 -07:00
bptree.h Illumos 4914 - zfs on-disk bookmark structure should be named *_phys_t 2014-08-06 14:48:41 -07:00
bqueue.h Illumos 5960, 5925 2016-01-08 15:08:19 -08:00
dbuf.h Illumos 5960, 5925 2016-01-08 15:08:19 -08:00
ddt.h Add ddt, ddt_entry, and l2arc_hdr caches 2014-01-07 10:33:11 -08:00
dmu_impl.h Illumos 4757, 4913 2014-08-01 14:28:05 -07:00
dmu_objset.h Illumos 6267 - dn_bonus evicted too early 2015-10-13 14:12:02 -07:00
dmu_send.h Illumos 5765 - add support for estimating send stream size with lzc_send_space when source is a bookmark 2015-05-13 09:03:59 -07:00
dmu_traverse.h Illumos 4914 - zfs on-disk bookmark structure should be named *_phys_t 2014-08-06 14:48:41 -07:00
dmu_tx.h dmu_tx kstat cleanup 2014-03-04 12:22:24 -08:00
dmu_zfetch.h Illumos 5987 - zfs prefetch code needs work 2016-01-12 09:02:33 -08:00
dmu.h Illumos 4950 - files sometimes can't be removed from a full filesystem 2016-01-21 16:59:30 -08:00
dnode.h Illumos 5141 - zfs minimum indirect block size is 4K 2016-01-12 14:11:31 -08:00
dsl_bookmark.h Illumos 4368, 4369. 2014-07-29 10:55:29 -07:00
dsl_dataset.h Illumos 6171 - dsl_prop_unregister() slows down dataset eviction. 2016-01-12 10:53:12 -08:00
dsl_deadlist.h Support custom build directories and move includes 2010-09-08 12:38:56 -07:00
dsl_deleg.h Illumos 4368, 4369. 2014-07-29 10:55:29 -07:00
dsl_destroy.h Illumos #3888 2013-11-04 11:18:14 -08:00
dsl_dir.h Illumos 6171 - dsl_prop_unregister() slows down dataset eviction. 2016-01-12 10:53:12 -08:00
dsl_pool.h Illumos 5981 - Deadlock in dmu_objset_find_dp 2015-07-06 09:31:35 -07:00
dsl_prop.h Illumos 6171 - dsl_prop_unregister() slows down dataset eviction. 2016-01-12 10:53:12 -08:00
dsl_scan.h Illumos 4914 - zfs on-disk bookmark structure should be named *_phys_t 2014-08-06 14:48:41 -07:00
dsl_synctask.h Illumos 4951 - ZFS administrative commands should use reserved space 2015-05-04 09:41:10 -07:00
dsl_userhold.h Illumos #3740 2013-11-04 11:17:48 -08:00
efi_partition.h Ext4's typical GPT partition type not recognized 2015-12-04 09:27:00 -08:00
Makefile.am Illumos 5960, 5925 2016-01-08 15:08:19 -08:00
metaslab_impl.h Remove fastwrite mutex 2016-01-15 15:38:35 -08:00
metaslab.h Illumos 5213 - panic in metaslab_init due to space_map_open returning ENXIO 2014-11-14 15:37:45 -08:00
mntent.h Honor xattr=sa dataset property 2015-09-19 14:04:14 -07:00
multilist.h Illumos 5497 - lock contention on arcs_mtx 2015-06-11 10:27:25 -07:00
nvpair_impl.h Support custom build directories and move includes 2010-09-08 12:38:56 -07:00
nvpair.h Replace __va_list with va_list 2014-08-13 10:35:00 -07:00
range_tree.h Illumos #4374 2014-07-30 09:20:35 -07:00
refcount.h Illumos 5045 - use atomic_{inc,dec}_* instead of atomic_add_* 2016-01-15 15:38:36 -08:00
rrwlock.h Illumos 5008 - lock contention (rrw_exit) while running a read only load 2015-07-06 09:34:13 -07:00
sa_impl.h Illumos 5056 - ZFS deadlock on db_mtx and dn_holds 2015-04-28 16:25:34 -07:00
sa.h Prevent SA length overflow 2015-12-30 13:20:12 -08:00
sdt.h Swap DTRACE_PROBE* with Linux tracepoints 2014-11-17 11:13:55 -08:00
spa_boot.h Support custom build directories and move includes 2010-09-08 12:38:56 -07:00
spa_impl.h Illumos 3749 - zfs event processing should work on R/O root filesystems 2016-01-12 14:42:32 -08:00
spa.h Illumos 5746 - more checksumming in zfs send 2015-12-30 14:24:14 -08:00
space_map.h Illumos 5164-5165 - space map fixes 2014-10-23 15:30:32 -07:00
space_reftree.h Illumos #4101, #4102, #4103, #4105, #4106 2014-07-22 09:39:16 -07:00
trace_acl.h Fix build failure with Linux 4.1 and FTRACE 2015-07-29 07:35:06 -07:00
trace_arc.h Illumos 5987 - zfs prefetch code needs work 2016-01-12 09:02:33 -08:00
trace_dbgmsg.h SET_ERROR should print strings 2016-01-15 15:38:35 -08:00
trace_dbuf.h SET_ERROR should print strings 2016-01-15 15:38:35 -08:00
trace_dmu.h Fix build failure with Linux 4.1 and FTRACE 2015-07-29 07:35:06 -07:00
trace_dnode.h Fix build failure with Linux 4.1 and FTRACE 2015-07-29 07:35:06 -07:00
trace_multilist.h Fix build failure with Linux 4.1 and FTRACE 2015-08-18 16:47:21 -07:00
trace_txg.h Fix build failure with Linux 4.1 and FTRACE 2015-07-29 07:35:06 -07:00
trace_zil.h Fix build failure with Linux 4.1 and FTRACE 2015-07-29 07:35:06 -07:00
trace_zrlock.h SET_ERROR should print strings 2016-01-15 15:38:35 -08:00
trace.h Remove duplicate typedefs from trace.h 2015-01-06 16:53:24 -08:00
txg_impl.h Illumos #4045 write throttle & i/o scheduler performance work 2013-12-06 09:32:43 -08:00
txg.h Illumos 4753 - increase number of outstanding async writes when sync task is waiting 2014-09-23 13:50:55 -07:00
u8_textprep_data.h Support custom build directories and move includes 2010-09-08 12:38:56 -07:00
u8_textprep.h Support custom build directories and move includes 2010-09-08 12:38:56 -07:00
uberblock_impl.h Support custom build directories and move includes 2010-09-08 12:38:56 -07:00
uberblock.h Illumos 5347 - idle pool may run itself out of space 2015-07-14 10:35:21 -07:00
uio_impl.h Add basic uio support 2011-02-10 09:21:43 -08:00
unique.h Illumos #3742 2013-11-04 10:55:25 -08:00
uuid.h Support custom build directories and move includes 2010-09-08 12:38:56 -07:00
vdev_disk.h cstyle: Resolve C style issues 2013-12-18 16:46:35 -08:00
vdev_file.h Update all default taskq settings 2015-06-25 08:58:16 -07:00
vdev_impl.h FreeBSD r256956: Improve ZFS N-way mirror read performance by using load and locality information. 2016-02-26 11:24:35 -08:00
vdev.h FreeBSD r256956: Improve ZFS N-way mirror read performance by using load and locality information. 2016-02-26 11:24:35 -08:00
xvattr.h Add xvattr support 2011-03-02 11:43:50 -08:00
zap_impl.h Illumos 5027 - zfs large block support 2015-05-11 12:23:16 -07:00
zap_leaf.h Illumos 5056 - ZFS deadlock on db_mtx and dn_holds 2015-04-28 16:25:34 -07:00
zap.h Add zap_prefetch() interface 2015-12-04 09:39:20 -08:00
zfeature.h Illumos 4370, 4371 2014-07-28 14:29:58 -07:00
zfs_acl.h Illumos #3742 2013-11-04 10:55:25 -08:00
zfs_context.h Illumos 6815179, 6844191 2016-01-22 09:39:46 -08:00
zfs_ctldir.h Use spa as key besides objsetid for snapentry 2015-12-08 16:38:56 -08:00
zfs_debug.h Add dbgmsg kstat 2015-09-04 16:08:14 -07:00
zfs_delay.h cstyle: Resolve C style issues 2013-12-18 16:46:35 -08:00
zfs_dir.h Prototype/structure update for Linux 2011-02-10 09:27:21 -08:00
zfs_fuid.h Prototype/structure update for Linux 2011-02-10 09:27:21 -08:00
zfs_ioctl.h Illumos 5746 - more checksumming in zfs send 2015-12-30 14:24:14 -08:00
zfs_onexit.h Support custom build directories and move includes 2010-09-08 12:38:56 -07:00
zfs_rlock.h Illumos #3742 2013-11-04 10:55:25 -08:00
zfs_sa.h Illumos 5027 - zfs large block support 2015-05-11 12:23:16 -07:00
zfs_stat.h Support custom build directories and move includes 2010-09-08 12:38:56 -07:00
zfs_vfsops.h Fix zsb->z_hold_mtx deadlock 2016-01-15 15:33:45 -08:00
zfs_vnops.h Add zfs_iput_async() interface 2014-08-11 16:11:43 -07:00
zfs_znode.h Fix zsb->z_hold_mtx deadlock 2016-01-15 15:33:45 -08:00
zil_impl.h Illumos 5027 - zfs large block support 2015-05-11 12:23:16 -07:00
zil.h Illumos 5269 - zpool import slow 2015-06-09 13:48:02 -07:00
zio_checksum.h Illumos 5960, 5925 2016-01-08 15:08:19 -08:00
zio_compress.h Illumos #3742 2013-11-04 10:55:25 -08:00
zio_impl.h Illumos #3836 2013-11-05 12:14:56 -08:00
zio_priority.h Illumos 5960, 5925 2016-01-08 15:08:19 -08:00
zio.h Illumos 5960, 5925 2016-01-08 15:08:19 -08:00
zpl.h Add temporary mount options 2015-09-03 14:14:55 -07:00
zrlock.h Support custom build directories and move includes 2010-09-08 12:38:56 -07:00
zvol.h Check large block feature flag on volumes 2015-08-28 09:25:03 -07:00