mirror_zfs/include/sys
Matthew Ahrens 0dc2f70c5c OpenZFS 9486 - reduce memory used by device removal on fragmented pools
Device removal allocates a new location for each allocated segment on
the disk that's being removed.  Each allocation results in one entry in
the mapping table, which maps from old location + length to new
location.  When a fragmented disk is removed, this can result in a large
number of mapping entries, and thus a large amount of memory consumed by
the mapping table.  In the worst real-world cases, we've seen around 1GB
of RAM per 1TB of storage removed.

We can improve on this situation by allocating larger segments, which
span across both allocated and free regions of the device being removed.
By including free regions in the allocation (and thus mapping), we
reduce the number of mapping entries.  For example, if we have a 4KB
allocation followed by 1KB free and then 4KB allocated, we would allocate
4+1+4 = 9KB, and then move the entire region (including allocated and
free parts).  In this case we used one mapping where previously we would
have used two, but often the ratio is much higher (up to 20:1 in
real-world use).  We then need to mark the regions that were free on the
removing device as free in the new locations, and also obsolete in the
mapping entry.
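
The coalescing idea described above can be sketched in a few lines of C.
This is only an illustration under stated assumptions, not the code from
this commit: the type `seg_t`, the function `coalesce_segments()`, and the
`max_gap` threshold are hypothetical names chosen for the example, and the
input is assumed to be sorted and non-overlapping.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type: one allocated segment on the device being removed. */
typedef struct seg {
	uint64_t start;	/* offset in bytes */
	uint64_t size;	/* length in bytes */
} seg_t;

/*
 * Coalesce sorted, non-overlapping allocated segments whose intervening
 * free gaps are at most max_gap bytes (max_gap is an assumed knob for
 * this sketch).  Merged spans are written to out, which must have room
 * for n entries.  Returns the number of spans, i.e. the number of
 * mapping entries that would be needed.  The free bytes absorbed into
 * the spans are summed in *absorbed_free; in the scheme described above
 * those ranges must be freed at the new location and marked obsolete.
 */
static size_t
coalesce_segments(const seg_t *in, size_t n, uint64_t max_gap,
    seg_t *out, uint64_t *absorbed_free)
{
	size_t nout = 0;

	*absorbed_free = 0;
	for (size_t i = 0; i < n; i++) {
		if (nout > 0) {
			seg_t *last = &out[nout - 1];
			uint64_t last_end = last->start + last->size;

			/* Input must be sorted and non-overlapping. */
			assert(in[i].start >= last_end);

			uint64_t gap = in[i].start - last_end;
			if (gap <= max_gap) {
				/* Extend the current span across the gap. */
				*absorbed_free += gap;
				last->size =
				    in[i].start + in[i].size - last->start;
				continue;
			}
		}
		/* Gap too large (or first segment): start a new span. */
		out[nout++] = in[i];
	}
	return (nout);
}
```

With the example from the text (4KB allocated at offset 0, a 1KB gap, then
4KB allocated) and a `max_gap` of 1KB, the two segments collapse into one
9KB span with 1KB of absorbed free space.  A segment that follows a much
larger gap starts a new span.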

This method preserves the fragmentation of the removing device, rather
than consolidating its allocated space into a small number of chunks
where possible.  But it results in a drastic reduction of memory used by
the mapping table - around 20x in the most-fragmented cases.

In the most fragmented real-world cases, this reduces memory used by the
mapping from ~1GB to ~50MB of RAM per 1TB of storage removed.  Less
fragmented cases will typically also see around 50-100MB of RAM per 1TB
of storage.

Porting notes:

* Add the following as module parameters:
    * zfs_condense_indirect_vdevs_enable
    * zfs_condense_max_obsolete_bytes

* Document the following module parameters:
    * zfs_condense_indirect_vdevs_enable
    * zfs_condense_max_obsolete_bytes
    * zfs_condense_min_mapping_bytes

Authored by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Ported-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Tim Chase <tim@chase2k.com>

OpenZFS-issue: https://illumos.org/issues/9486
OpenZFS-commit: https://github.com/ahrens/illumos/commit/07152e142e44c
External-issue: DLPX-57962
Closes #7536
2018-05-24 10:18:07 -07:00
crypto OpenZFS 4185 - add new cryptographic checksums to ZFS: SHA-512, Skein, Edon-R 2016-10-03 14:51:15 -07:00
fm Extend deadman logic 2018-01-25 13:40:38 -08:00
fs OpenZFS 9075 - Improve ZFS pool import/load process and corrupted pool recovery 2018-05-08 21:35:27 -07:00
lua Fix coverity defects: zfs channel programs 2018-02-20 11:19:42 -08:00
sysevent OpenZFS 8959 - Add notifications when a scrub is paused or resumed 2018-01-17 10:31:00 -08:00
abd.h OpenZFS 8416 - abd.h is not C++ friendly 2017-06-30 11:11:01 -07:00
arc_impl.h Add support for decryption faults in zinject 2018-05-02 15:36:20 -07:00
arc.h Decryption error handling improvements 2018-03-31 11:12:51 -07:00
avl_impl.h Support custom build directories and move includes 2010-09-08 12:38:56 -07:00
avl.h Remove dead code from AVL tree 2017-10-05 19:28:00 -07:00
blkptr.h OpenZFS 8067 - zdb should be able to dump literal embedded block pointer 2017-07-07 11:28:01 -07:00
bplist.h Support custom build directories and move includes 2010-09-08 12:38:56 -07:00
bpobj.h OpenZFS 7614, 9064 - zfs device evacuation/removal 2018-04-14 12:16:17 -07:00
bptree.h Illumos 4914 - zfs on-disk bookmark structure should be named *_phys_t 2014-08-06 14:48:41 -07:00
bqueue.h Illumos 5960, 5925 2016-01-08 15:08:19 -08:00
dbuf.h assertion in arc_release() during encrypted receive 2018-04-17 11:06:54 -07:00
ddt.h Incorrect maximum DVA value in DDE_GET_NDVAS() 2018-02-26 14:20:12 -08:00
dmu_impl.h Fix race in dnode_check_slots_free() 2018-04-10 11:15:05 -07:00
dmu_objset.h assertion in arc_release() during encrypted receive 2018-04-17 11:06:54 -07:00
dmu_send.h Raw receive should change key atomically 2018-02-21 12:31:03 -08:00
dmu_traverse.h Native Encryption for ZFS on Linux 2017-08-14 10:36:48 -07:00
dmu_tx.h OpenZFS 8997 - ztest assertion failure in zil_lwb_write_issue 2018-01-26 20:19:46 -08:00
dmu_zfetch.h OpenZFS 6322 - ZFS indirect block predictive prefetch 2016-08-30 14:26:55 -07:00
dmu.h assertion in arc_release() during encrypted receive 2018-04-17 11:06:54 -07:00
dnode.h OpenZFS 7614, 9064 - zfs device evacuation/removal 2018-04-14 12:16:17 -07:00
dsl_bookmark.h Illumos 4368, 4369. 2014-07-29 10:55:29 -07:00
dsl_crypt.h Add support for decryption faults in zinject 2018-05-02 15:36:20 -07:00
dsl_dataset.h OpenZFS 7614, 9064 - zfs device evacuation/removal 2018-04-14 12:16:17 -07:00
dsl_deadlist.h OpenZFS 7614, 9064 - zfs device evacuation/removal 2018-04-14 12:16:17 -07:00
dsl_deleg.h OpenZFS 7614, 9064 - zfs device evacuation/removal 2018-04-14 12:16:17 -07:00
dsl_destroy.h OpenZFS 7431 - ZFS Channel Programs 2018-02-08 15:28:18 -08:00
dsl_dir.h OpenZFS 7614, 9064 - zfs device evacuation/removal 2018-04-14 12:16:17 -07:00
dsl_pool.h OpenZFS 7614, 9064 - zfs device evacuation/removal 2018-04-14 12:16:17 -07:00
dsl_prop.h Illumos 6171 - dsl_prop_unregister() slows down dataset eviction. 2016-01-12 10:53:12 -08:00
dsl_scan.h OpenZFS 7614, 9064 - zfs device evacuation/removal 2018-04-14 12:16:17 -07:00
dsl_synctask.h Illumos 4951 - ZFS administrative commands should use reserved space 2015-05-04 09:41:10 -07:00
dsl_userhold.h Illumos #3740 2013-11-04 11:17:48 -08:00
edonr.h OpenZFS 4185 - add new cryptographic checksums to ZFS: SHA-512, Skein, Edon-R 2016-10-03 14:51:15 -07:00
efi_partition.h Fix spelling 2017-01-03 11:31:18 -06:00
frame.h Suppress incorrect objtool warnings 2017-12-07 10:28:50 -08:00
hkdf.h Encryption patch follow-up 2017-10-11 16:54:48 -04:00
Makefile.am OpenZFS 9079 - race condition in starting and ending condensing thread for indirect vdevs 2018-04-14 12:23:53 -07:00
metaslab_impl.h OpenZFS 7614, 9064 - zfs device evacuation/removal 2018-04-14 12:16:17 -07:00
metaslab.h OpenZFS 7614, 9064 - zfs device evacuation/removal 2018-04-14 12:16:17 -07:00
mmp.h Record skipped MMP writes in multihost_history 2018-03-06 15:15:15 -08:00
mntent.h Make zfs mount according to relatime config in dataset 2016-04-05 18:55:59 -07:00
multilist.h OpenZFS 7968 - multi-threaded spa_sync() 2017-03-20 18:36:00 -07:00
nvpair_impl.h Support custom build directories and move includes 2010-09-08 12:38:56 -07:00
nvpair.h Replace __va_list with va_list 2014-08-13 10:35:00 -07:00
pathname.h Add pn_alloc()/pn_free() functions 2016-04-21 09:49:25 -07:00
policy.h Add zfs allow and zfs unallow support 2016-06-07 09:16:52 -07:00
range_tree.h OpenZFS 9486 - reduce memory used by device removal on fragmented pools 2018-05-24 10:18:07 -07:00
refcount.h OpenZFS 8081 - Compiler warnings in zdb 2017-10-27 12:46:35 -07:00
rrwlock.h Illumos 5008 - lock contention (rrw_exit) while running a read only load 2015-07-06 09:34:13 -07:00
sa_impl.h Implement large_dnode pool feature 2016-06-24 13:13:21 -07:00
sa.h Project Quota on ZFS 2018-02-13 14:54:54 -08:00
sdt.h Add line info and SET_ERROR() to ZFS debug log 2017-07-25 23:09:48 -07:00
sha2.h OpenZFS 4185 - add new cryptographic checksums to ZFS: SHA-512, Skein, Edon-R 2016-10-03 14:51:15 -07:00
skein.h OpenZFS 4185 - add new cryptographic checksums to ZFS: SHA-512, Skein, Edon-R 2016-10-03 14:51:15 -07:00
spa_boot.h Support custom build directories and move includes 2010-09-08 12:38:56 -07:00
spa_checksum.h Implementation of AVX2 optimized Fletcher-4 2016-06-02 14:30:51 -07:00
spa_impl.h OpenZFS 9075 - Improve ZFS pool import/load process and corrupted pool recovery 2018-05-08 21:35:27 -07:00
spa.h OpenZFS 9075 - Improve ZFS pool import/load process and corrupted pool recovery 2018-05-08 21:35:27 -07:00
space_map.h OpenZFS 7614, 9064 - zfs device evacuation/removal 2018-04-14 12:16:17 -07:00
space_reftree.h Illumos #4101, #4102, #4103, #4105, #4106 2014-07-22 09:39:16 -07:00
sysevent.h OpenZFS 6939 - add sysevents to zfs core for commands 2017-07-12 21:28:13 -07:00
trace_acl.h Linux 4.16 compat: inode_set_iversion() 2018-02-08 21:25:19 -08:00
trace_arc.h Support re-prioritizing asynchronous prefetches 2017-12-21 09:13:06 -08:00
trace_common.h OpenZFS 6531 - Provide mechanism to artificially limit disk performance 2016-05-26 10:11:51 -07:00
trace_dbgmsg.h Add line info and SET_ERROR() to ZFS debug log 2017-07-25 23:09:48 -07:00
trace_dbuf.h Crash in dbuf_evict_one with DTRACE_PROBE 2017-08-09 11:04:41 -07:00
trace_dmu.h tx_waited -> tx_dirty_delayed in trace_dmu.h 2018-01-31 16:13:26 -08:00
trace_dnode.h Fix build-it compilation regression 2017-01-24 08:50:15 -08:00
trace_multilist.h Fix build-it compilation regression 2017-01-24 08:50:15 -08:00
trace_txg.h Fix build-it compilation regression 2017-01-24 08:50:15 -08:00
trace_vdev.h OpenZFS 7614, 9064 - zfs device evacuation/removal 2018-04-14 12:16:17 -07:00
trace_zil.h OpenZFS 8585 - improve batching done in zil_commit() 2017-12-05 09:39:16 -08:00
trace_zio.h Use cstyle -cpP in make cstyle check 2016-12-12 10:46:26 -08:00
trace_zrlock.h Fix race in trace point in zrl_add_impl 2018-03-12 11:27:02 -07:00
trace.h Remove duplicate typedefs from trace.h 2015-01-06 16:53:24 -08:00
txg_impl.h Fix spelling 2017-01-03 11:31:18 -06:00
txg.h OpenZFS 8063 - verify that we do not attempt to access inactive txg 2017-05-10 13:52:22 -04:00
u8_textprep_data.h Support custom build directories and move includes 2010-09-08 12:38:56 -07:00
u8_textprep.h Support custom build directories and move includes 2010-09-08 12:38:56 -07:00
uberblock_impl.h OpenZFS 8491 - uberblock on-disk padding to reserve space for smoothly merging zpool checkpoint & MMP in ZFS 2017-07-24 13:47:51 -04:00
uberblock.h Multi-modifier protection (MMP) 2017-07-13 13:54:00 -04:00
uio_impl.h Add basic uio support 2011-02-10 09:21:43 -08:00
unique.h Illumos #3742 2013-11-04 10:55:25 -08:00
uuid.h Support custom build directories and move includes 2010-09-08 12:38:56 -07:00
vdev_disk.h Remove custom root pool import code 2016-08-11 11:19:34 -07:00
vdev_file.h Use a dedicated taskq for vdev_file 2016-12-21 10:47:15 -08:00
vdev_impl.h OpenZFS 9075 - Improve ZFS pool import/load process and corrupted pool recovery 2018-05-08 21:35:27 -07:00
vdev_indirect_births.h OpenZFS 7614, 9064 - zfs device evacuation/removal 2018-04-14 12:16:17 -07:00
vdev_indirect_mapping.h OpenZFS 7614, 9064 - zfs device evacuation/removal 2018-04-14 12:16:17 -07:00
vdev_raidz_impl.h Revert raidz_map and _col structure types 2018-01-09 14:46:52 -08:00
vdev_raidz.h Use cstyle -cpP in make cstyle check 2016-12-12 10:46:26 -08:00
vdev_removal.h OpenZFS 9486 - reduce memory used by device removal on fragmented pools 2018-05-24 10:18:07 -07:00
vdev.h OpenZFS 9075 - Improve ZFS pool import/load process and corrupted pool recovery 2018-05-08 21:35:27 -07:00
xvattr.h Project Quota on ZFS 2018-02-13 14:54:54 -08:00
zap_impl.h OpenZFS 7793 - ztest fails assertion in dmu_tx_willuse_space 2017-03-07 09:51:59 -08:00
zap_leaf.h Fix ENOSPC in "Handle zap_add() failures in ..." 2018-04-18 14:19:50 -07:00
zap.h OpenZFS 1300 - filename normalization doesn't work for removes 2017-02-02 14:13:41 -08:00
zcp_global.h OpenZFS 7431 - ZFS Channel Programs 2018-02-08 15:28:18 -08:00
zcp_iter.h OpenZFS 7431 - ZFS Channel Programs 2018-02-08 15:28:18 -08:00
zcp_prop.h OpenZFS 7431 - ZFS Channel Programs 2018-02-08 15:28:18 -08:00
zcp.h OpenZFS 8677 - Open-Context Channel Programs 2018-02-08 16:05:57 -08:00
zfeature.h Revert "zhack: Add 'feature disable' command" 2016-05-17 11:52:07 -07:00
zfs_acl.h Project Quota on ZFS 2018-02-13 14:54:54 -08:00
zfs_context.h OpenZFS 9075 - Improve ZFS pool import/load process and corrupted pool recovery 2018-05-08 21:35:27 -07:00
zfs_ctldir.h Rename zfs_sb_t -> zfsvfs_t 2017-03-10 09:51:33 -08:00
zfs_debug.h OpenZFS 9236 - nuke spa_dbgmsg 2018-04-30 10:19:48 -07:00
zfs_delay.h cstyle: Resolve C style issues 2013-12-18 16:46:35 -08:00
zfs_dir.h Rename zfs_sb_t -> zfsvfs_t 2017-03-10 09:51:33 -08:00
zfs_fuid.h Rename zfs_sb_t -> zfsvfs_t 2017-03-10 09:51:33 -08:00
zfs_ioctl.h Add support for decryption faults in zinject 2018-05-02 15:36:20 -07:00
zfs_onexit.h Support custom build directories and move includes 2010-09-08 12:38:56 -07:00
zfs_project.h Project Quota on ZFS 2018-02-13 14:54:54 -08:00
zfs_ratelimit.h Change checksum & IO delay ratelimit values 2018-03-04 17:34:51 -08:00
zfs_rlock.h Rename zfs_sb_t -> zfsvfs_t 2017-03-10 09:51:33 -08:00
zfs_sa.h Project Quota on ZFS 2018-02-13 14:54:54 -08:00
zfs_stat.h Support custom build directories and move includes 2010-09-08 12:38:56 -07:00
zfs_vfsops.h ZIL claiming should not start user accounting 2018-02-20 16:27:31 -08:00
zfs_vnops.h RHEL 7.5 compat: FMODE_KABI_ITERATE 2018-05-02 15:01:24 -07:00
zfs_znode.h Project Quota on ZFS 2018-02-13 14:54:54 -08:00
zil_impl.h OpenZFS 8909 - 8585 can cause a use-after-free kernel panic 2017-12-28 10:18:04 -08:00
zil.h OpenZFS 7614, 9064 - zfs device evacuation/removal 2018-04-14 12:16:17 -07:00
zio_checksum.h Remove dependency on linear ABD 2017-03-29 12:24:51 -07:00
zio_compress.h DLPX-44812 integrate EP-220 large memory scalability 2016-11-29 14:34:27 -08:00
zio_crypt.h Add support for decryption faults in zinject 2018-05-02 15:36:20 -07:00
zio_impl.h Native Encryption for ZFS on Linux 2017-08-14 10:36:48 -07:00
zio_priority.h OpenZFS 7614, 9064 - zfs device evacuation/removal 2018-04-14 12:16:17 -07:00
zio.h Add support for decryption faults in zinject 2018-05-02 15:36:20 -07:00
zpl.h RHEL 7.5 compat: FMODE_KABI_ITERATE 2018-05-02 15:01:24 -07:00
zrlock.h OpenZFS 6328 - Fix cstyle errors in zfs codebase 2017-01-12 09:42:11 -08:00
zthr.h OpenZFS 9079 - race condition in starting and ending condensing thread for indirect vdevs 2018-04-14 12:23:53 -07:00
zvol.h Add port of FreeBSD 'volmode' property 2017-07-12 13:05:37 -07:00