mirror_zfs/module/zfs
Don Brady 3dfb57a35e OpenZFS 7090 - zfs should throttle allocations

Authored by: George Wilson <george.wilson@delphix.com>
Reviewed by: Alex Reece <alex@delphix.com>
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Paul Dagnelie <paul.dagnelie@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Sebastien Roy <sebastien.roy@delphix.com>
Approved by: Matthew Ahrens <mahrens@delphix.com>
Ported-by: Don Brady <don.brady@intel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>

When write I/Os are issued, they are issued in block order but the ZIO
pipeline will drive them asynchronously through the allocation stage
which can result in blocks being allocated out-of-order. It would be
nice to preserve as much of the logical order as possible.

In addition, allocations are scattered evenly across all top-level
VDEVs, but not all top-level VDEVs are created equal. The pipeline
should be able to detect devices that are more capable of handling
allocations and should allocate more blocks to those devices. This
allows for dynamic allocation distribution when devices are imbalanced,
as fuller devices will tend to be slower than empty devices.

The change adds a new pool-wide allocation queue which throttles and
orders allocations in the ZIO pipeline. The queue is ordered by issue
time and offset and provides an initial amount of allocation work to
each top-level vdev. The allocation logic uses a reservation system to
reserve allocations that will be performed by the allocator. Once an
allocation completes successfully, it is scheduled on a given top-level
vdev. Each top-level vdev maintains a maximum number of allocations
that it can handle (mg_alloc_queue_depth). The pool-wide reserved
allocations (top-levels * mg_alloc_queue_depth) are distributed
round-robin across the metaslab groups of all eligible top-level vdevs
to spread the work. As top-levels complete their work, they receive
additional work from the pool-wide allocation queue until the queue is
empty.
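
The reservation and round-robin scheme described above can be sketched
roughly as follows. This is a simplified illustration and not the code
from this commit: every type and function name here (sketch_group_t,
sketch_pool_t, sketch_reserve, sketch_pick_group, sketch_done) is
hypothetical, locking is omitted, and only the per-group limit
corresponds to mg_alloc_queue_depth from the text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a metaslab group. */
typedef struct sketch_group {
	uint64_t sg_max_depth;	/* plays the role of mg_alloc_queue_depth */
	uint64_t sg_inflight;	/* allocations currently scheduled here */
} sketch_group_t;

/* Hypothetical pool-wide throttle state (no locking in this sketch). */
typedef struct sketch_pool {
	sketch_group_t *sp_groups;
	size_t sp_ngroups;
	size_t sp_rotor;	/* next group to try, round robin */
	uint64_t sp_reserved;	/* outstanding pool-wide reservations */
} sketch_pool_t;

/*
 * Pool-wide limit: the sum of the per-group depths, which equals
 * top-levels * mg_alloc_queue_depth when every group has the same depth.
 */
static uint64_t
sketch_pool_limit(const sketch_pool_t *sp)
{
	uint64_t limit = 0;

	for (size_t i = 0; i < sp->sp_ngroups; i++)
		limit += sp->sp_groups[i].sg_max_depth;
	return (limit);
}

/* Take one reservation; on failure the I/O stays on the queue. */
static bool
sketch_reserve(sketch_pool_t *sp)
{
	if (sp->sp_reserved >= sketch_pool_limit(sp))
		return (false);
	sp->sp_reserved++;
	return (true);
}

/*
 * Choose the next group with spare depth, starting at the rotor.
 * Returns the group index, or -1 if every group is saturated.
 */
static int
sketch_pick_group(sketch_pool_t *sp)
{
	for (size_t n = 0; n < sp->sp_ngroups; n++) {
		size_t i = (sp->sp_rotor + n) % sp->sp_ngroups;
		sketch_group_t *sg = &sp->sp_groups[i];

		if (sg->sg_inflight < sg->sg_max_depth) {
			sg->sg_inflight++;
			sp->sp_rotor = (i + 1) % sp->sp_ngroups;
			return ((int)i);
		}
	}
	return (-1);
}

/* An allocation on group i finished: free its slot and reservation. */
static void
sketch_done(sketch_pool_t *sp, size_t i)
{
	assert(i < sp->sp_ngroups);
	assert(sp->sp_groups[i].sg_inflight > 0 && sp->sp_reserved > 0);
	sp->sp_groups[i].sg_inflight--;
	sp->sp_reserved--;
}
```

In this sketch a caller would call sketch_reserve() before allocating,
sketch_pick_group() to choose a target, and sketch_done() on completion,
at which point another queued I/O can take the freed reservation.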

OpenZFS-issue: https://www.illumos.org/issues/7090
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/4756c3d7
Closes #5258 

Porting Notes:
- Maintained minimal stack in zio_done
- Preserved Linux-specific I/O sizes in zio_write_compress
- Added module params and documentation
- Updated to use optimized AVL cmp macros
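
For illustration, an ordering by issue time and then offset, as the
commit message describes for the allocation queue, could be expressed
with a branch-free three-way comparison like the one below. This is a
sketch under assumptions: SKETCH_CMP is a local stand-in and is not
claimed to match the AVL cmp macros the port uses, sketch_io_t and its
fields are hypothetical, and qsort() stands in for insertion into an
AVL tree.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Branch-free three-way compare: evaluates to -1, 0, or 1. */
#define	SKETCH_CMP(a, b)	(((a) > (b)) - ((a) < (b)))

/* Hypothetical queued write: when it was issued and where it lands. */
typedef struct sketch_io {
	uint64_t si_queued_ts;	/* issue time */
	uint64_t si_offset;	/* logical offset, breaks timestamp ties */
} sketch_io_t;

/* Order by issue time first, then by offset. */
static int
sketch_io_compare(const void *x1, const void *x2)
{
	const sketch_io_t *a = x1;
	const sketch_io_t *b = x2;
	int cmp = SKETCH_CMP(a->si_queued_ts, b->si_queued_ts);

	if (cmp != 0)
		return (cmp);
	return (SKETCH_CMP(a->si_offset, b->si_offset));
}

/* Sort an array of queued I/Os into the order they would be drained. */
static void
sketch_sort_queue(sketch_io_t *ios, size_t n)
{
	assert(ios != NULL || n == 0);
	if (n > 1)
		qsort(ios, n, sizeof (sketch_io_t), sketch_io_compare);
}
```

The subtraction of two comparisons avoids the overflow that a plain
`a - b` comparator can hit with 64-bit values.
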
2016-10-13 17:59:18 -07:00
arc.c Fix file permissions 2016-10-08 14:57:56 -07:00
blkptr.c Illumos 4757, 4913 2014-08-01 14:28:05 -07:00
bplist.c Change KM_PUSHPAGE -> KM_SLEEP 2015-01-16 14:41:26 -08:00
bpobj.c Illumos 5810 - zdb should print details of bpobj 2015-05-11 15:10:24 -07:00
bptree.c Illumos 5960, 5925 2016-01-08 15:08:19 -08:00
bqueue.c Fix coverity defects: CID 147565-147567 2016-10-07 13:19:43 -07:00
dbuf_stats.c OpenZFS 6950 - ARC should cache compressed data 2016-09-13 09:58:33 -07:00
dbuf.c Fix coverity defects: CID 150943, 150938 2016-10-13 14:30:50 -07:00
ddt_zap.c Change KM_PUSHPAGE -> KM_SLEEP 2015-01-16 14:41:26 -08:00
ddt.c OpenZFS 4185 - add new cryptographic checksums to ZFS: SHA-512, Skein, Edon-R 2016-10-03 14:51:15 -07:00
dmu_diff.c OpenZFS 6950 - ARC should cache compressed data 2016-09-13 09:58:33 -07:00
dmu_object.c Implement large_dnode pool feature 2016-06-24 13:13:21 -07:00
dmu_objset.c Fix coverity defects: CID 153394 2016-10-12 13:24:03 -07:00
dmu_send.c OpenZFS 4185 - add new cryptographic checksums to ZFS: SHA-512, Skein, Edon-R 2016-10-03 14:51:15 -07:00
dmu_traverse.c Rename hole_birth tunable to match OpenZFS 2016-10-07 21:02:24 -07:00
dmu_tx.c Fix coverity defects: CID 147571, 147574 2016-10-13 14:25:05 -07:00
dmu_zfetch.c OpenZFS 6322 - ZFS indirect block predictive prefetch 2016-08-30 14:26:55 -07:00
dmu.c OpenZFS 4185 - add new cryptographic checksums to ZFS: SHA-512, Skein, Edon-R 2016-10-03 14:51:15 -07:00
dnode_sync.c Add support for user/group dnode accounting & quota 2016-10-07 09:45:13 -07:00
dnode.c OpenZFS 6988 spa_sync() spends half its time in dmu_objset_do_userquota_updates 2016-10-07 09:45:13 -07:00
dsl_bookmark.c OpenZFS 6314 - buffer overflow in dsl_dataset_name 2016-06-28 13:47:03 -07:00
dsl_dataset.c OpenZFS 4185 - add new cryptographic checksums to ZFS: SHA-512, Skein, Edon-R 2016-10-03 14:51:15 -07:00
dsl_deadlist.c Performance optimization of AVL tree comparator functions 2016-08-31 14:35:34 -07:00
dsl_deleg.c Performance optimization of AVL tree comparator functions 2016-08-31 14:35:34 -07:00
dsl_destroy.c OpenZFS 2605, 6980, 6902 2016-06-28 13:47:02 -07:00
dsl_dir.c Fix coverity defects: 147658, 147652, 147651 2016-09-29 12:06:14 -07:00
dsl_pool.c Fix file permissions 2016-10-08 14:57:56 -07:00
dsl_prop.c Free property names with spa_strfree() rather than strfree() 2016-09-12 09:45:26 -07:00
dsl_scan.c OpenZFS 6950 - ARC should cache compressed data 2016-09-13 09:58:33 -07:00
dsl_synctask.c Illumos 4951 - ZFS administrative commands should use reserved space 2015-05-04 09:41:10 -07:00
dsl_userhold.c OpenZFS 6314 - buffer overflow in dsl_dataset_name 2016-06-28 13:47:03 -07:00
edonr_zfs.c OpenZFS 4185 - add new cryptographic checksums to ZFS: SHA-512, Skein, Edon-R 2016-10-03 14:51:15 -07:00
fm.c Fix coverity defects: CID 147448, 147449, 147450, 147453, 147454 2016-10-02 11:24:54 -07:00
gzip.c cstyle: Resolve C style issues 2013-12-18 16:46:35 -08:00
lz4.c Change KM_PUSHPAGE -> KM_SLEEP 2015-01-16 14:41:26 -08:00
lzjb.c Change KM_PUSHPAGE -> KM_SLEEP 2015-01-16 14:41:26 -08:00
Makefile.in OpenZFS 4185 - add new cryptographic checksums to ZFS: SHA-512, Skein, Edon-R 2016-10-03 14:51:15 -07:00
metaslab.c OpenZFS 7090 - zfs should throttle allocations 2016-10-13 17:59:18 -07:00
multilist.c Identify locks flagged by lockdep 2015-12-22 10:21:33 -08:00
pathname.c Add pn_alloc()/pn_free() functions 2016-04-21 09:49:25 -07:00
policy.c Fix NFS credential 2016-06-21 09:58:37 -07:00
range_tree.c Performance optimization of AVL tree comparator functions 2016-08-31 14:35:34 -07:00
refcount.c OpenZFS 7090 - zfs should throttle allocations 2016-10-13 17:59:18 -07:00
rrwlock.c Illumos 5008 - lock contention (rrw_exit) while running a read only load 2015-07-06 09:34:13 -07:00
sa.c Performance optimization of AVL tree comparator functions 2016-08-31 14:35:34 -07:00
sha256.c OpenZFS 4185 - add new cryptographic checksums to ZFS: SHA-512, Skein, Edon-R 2016-10-03 14:51:15 -07:00
skein_zfs.c OpenZFS 4185 - add new cryptographic checksums to ZFS: SHA-512, Skein, Edon-R 2016-10-03 14:51:15 -07:00
spa_boot.c Add linux kernel module support 2010-08-31 13:41:58 -07:00
spa_config.c Fix coverity defects: CID 147565-147567 2016-10-07 13:19:43 -07:00
spa_errlog.c Illumos 4914 - zfs on-disk bookmark structure should be named *_phys_t 2014-08-06 14:48:41 -07:00
spa_history.c Fix indefinite article 2016-08-11 11:23:49 -07:00
spa_misc.c OpenZFS 7090 - zfs should throttle allocations 2016-10-13 17:59:18 -07:00
spa_stats.c Explicit integer promotion for bit shift operations 2016-09-29 15:55:41 -07:00
spa.c OpenZFS 7090 - zfs should throttle allocations 2016-10-13 17:59:18 -07:00
space_map.c Illumos 5960, 5925 2016-01-08 15:08:19 -08:00
space_reftree.c Performance optimization of AVL tree comparator functions 2016-08-31 14:35:34 -07:00
trace.c OpenZFS 6531 - Provide mechanism to artificially limit disk performance 2016-05-26 10:11:51 -07:00
txg.c txg visibility code should not execute under tc_open_lock 2016-07-27 14:11:13 -07:00
uberblock.c Illumos 5347 - idle pool may run itself out of space 2015-07-14 10:35:21 -07:00
unique.c Performance optimization of AVL tree comparator functions 2016-08-31 14:35:34 -07:00
vdev_cache.c OpenZFS 7090 - zfs should throttle allocations 2016-10-13 17:59:18 -07:00
vdev_disk.c Explicit block device plugging when submitting multiple BIOs 2016-09-29 13:13:31 -07:00
vdev_file.c OpenZFS 6531 - Provide mechanism to artificially limit disk performance 2016-05-26 10:11:51 -07:00
vdev_label.c Performance optimization of AVL tree comparator functions 2016-08-31 14:35:34 -07:00
vdev_mirror.c OpenZFS 7090 - zfs should throttle allocations 2016-10-13 17:59:18 -07:00
vdev_missing.c Illumos #5244 - zio pipeline callers should explicitly invoke next stage 2015-04-30 15:07:47 -07:00
vdev_queue.c OpenZFS 7090 - zfs should throttle allocations 2016-10-13 17:59:18 -07:00
vdev_raidz_math_aarch64_neon_common.h Add parity generation/rebuild using 128-bits NEON for Aarch64 2016-10-03 09:44:00 -07:00
vdev_raidz_math_aarch64_neon.c Add parity generation/rebuild using 128-bits NEON for Aarch64 2016-10-03 09:44:00 -07:00
vdev_raidz_math_aarch64_neonx2.c Add parity generation/rebuild using 128-bits NEON for Aarch64 2016-10-03 09:44:00 -07:00
vdev_raidz_math_avx2.c Add parity generation/rebuild using 128-bits NEON for Aarch64 2016-10-03 09:44:00 -07:00
vdev_raidz_math_impl.h Add parity generation/rebuild using 128-bits NEON for Aarch64 2016-10-03 09:44:00 -07:00
vdev_raidz_math_scalar.c Add parity generation/rebuild using 128-bits NEON for Aarch64 2016-10-03 09:44:00 -07:00
vdev_raidz_math_sse2.c Add parity generation/rebuild using 128-bits NEON for Aarch64 2016-10-03 09:44:00 -07:00
vdev_raidz_math_ssse3.c Add parity generation/rebuild using 128-bits NEON for Aarch64 2016-10-03 09:44:00 -07:00
vdev_raidz_math.c Add parity generation/rebuild using 128-bits NEON for Aarch64 2016-10-03 09:44:00 -07:00
vdev_raidz.c OpenZFS 4185 - add new cryptographic checksums to ZFS: SHA-512, Skein, Edon-R 2016-10-03 14:51:15 -07:00
vdev_root.c Illumos #3598 2013-10-31 14:58:04 -07:00
vdev.c OpenZFS 7090 - zfs should throttle allocations 2016-10-13 17:59:18 -07:00
zap_leaf.c Illumos 5314 - Remove "dbuf phys" db->db_data pointer aliases in ZFS 2015-04-28 16:25:20 -07:00
zap_micro.c Fix coverity defects: CID 147650, 147649, 147647, 147646 2016-09-25 15:08:28 -07:00
zap.c Avoid undefined shift overflow in fzap_cursor_retrieve() 2016-09-29 15:55:41 -07:00
zfeature_common.c Add support for user/group dnode accounting & quota 2016-10-07 09:45:13 -07:00
zfeature.c Revert "zhack: Add 'feature disable' command" 2016-05-17 11:52:07 -07:00
zfs_acl.c Add support for user/group dnode accounting & quota 2016-10-07 09:45:13 -07:00
zfs_byteswap.c Add linux kernel module support 2010-08-31 13:41:58 -07:00
zfs_ctldir.c Use env, not sh in zfsctl_snapshot_{,un}mount() 2016-10-08 17:43:29 +02:00
zfs_debug.c Add dbgmsg kstat 2015-09-04 16:08:14 -07:00
zfs_dir.c Remove znode's z_uid/z_gid member 2016-07-25 13:21:49 -07:00
zfs_fm.c Bring over illumos ZFS FMA logic -- phase 1 2016-09-01 11:39:45 -07:00
zfs_fuid.c Fix coverity defects 2016-09-21 18:09:00 -07:00
zfs_ioctl.c Add support for user/group dnode accounting & quota 2016-10-07 09:45:13 -07:00
zfs_log.c Remove znode's z_uid/z_gid member 2016-07-25 13:21:49 -07:00
zfs_onexit.c zfsdev_getminor() should check for invalid file handles 2015-06-22 17:02:13 -07:00
zfs_replay.c Fix coverity defects 2016-09-21 18:09:00 -07:00
zfs_rlock.c Performance optimization of AVL tree comparator functions 2016-08-31 14:35:34 -07:00
zfs_sa.c Use native inode->i_nlink instead of znode->z_links 2016-07-14 16:25:34 -07:00
zfs_vfsops.c Add support for user/group dnode accounting & quota 2016-10-07 09:45:13 -07:00
zfs_vnops.c Refactor inode->i_mode management 2016-09-27 14:08:52 -07:00
zfs_znode.c Refactor updating of immutable/appendonly flags 2016-10-05 14:47:29 -07:00
zil.c OpenZFS 6950 - ARC should cache compressed data 2016-09-13 09:58:33 -07:00
zio_checksum.c OpenZFS 6541 - Pool feature-flag check defeated if "verify" is included in the dedup property value 2016-10-03 14:51:21 -07:00
zio_compress.c Illumos 5661 - ZFS: "compression = on" should use lz4 if feature is enabled 2015-07-10 12:11:45 -07:00
zio_inject.c OpenZFS 6531 - Provide mechanism to artificially limit disk performance 2016-05-26 10:11:51 -07:00
zio.c OpenZFS 7090 - zfs should throttle allocations 2016-10-13 17:59:18 -07:00
zle.c Update core ZFS code from build 121 to build 141. 2010-05-28 13:45:14 -07:00
zpl_ctldir.c Use file_dentry and file_inode wrappers 2016-08-11 12:06:37 -07:00
zpl_export.c zfsctl: No need to sync ctldir inodes 2015-08-31 13:54:39 -07:00
zpl_file.c Use file_dentry and file_inode wrappers 2016-08-11 12:06:37 -07:00
zpl_inode.c Linux 4.7 compat: Fix deadlock during lookup on case-insensitive 2016-09-22 19:09:16 -07:00
zpl_super.c Fix memleak in zpl_parse_options 2016-05-31 16:04:26 -07:00
zpl_xattr.c Linux 4.7 compat: fix zpl_get_acl returns invalid acl pointer 2016-08-09 10:03:04 -07:00
zrlock.c Illumos 5812 - assertion failed in zrl_tryenter(): zr_owner==NULL 2015-04-30 14:43:40 -07:00
zvol.c Linux 4.8 compat: Fix removal of bio->bi_rw member 2016-08-11 11:19:34 -07:00