mirror_zfs/module/zfs
Serapheim Dimitropoulos 7bf4c97a36
Bypass metaslab throttle for removal allocations
Context:
We recently had a scenario where a customer with 2x10TB disks at 95+%
fragmentation and capacity wanted to migrate their disks to a 2x20TB
setup. So they added the 2 new disks and submitted the removal of the
first 10TB disk. The removal took a lot longer than expected (on the
order of one to two weeks vs a couple of days) and once it was done it
had generated a huge indirect mapping table in RAM (~16GB vs expected ~1GB).

Root-Cause:
The removal code calls `metaslab_alloc_dva()` to allocate a new block
for each evacuating block in the removing device and it tries to batch
them into 16MB segments. If it can't find such a segment it tries for
8MBs, 4MBs, all the way down to 512 bytes.
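
The size fallback described above can be sketched roughly as follows.
This is a simplified illustration, not the actual removal code: the
`sketch_*` names, the callback type, and the toy allocator are all
hypothetical, and the real code path goes through
`metaslab_alloc_dva()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define	SKETCH_MAX_SEGMENT	(16ULL << 20)	/* 16MB, per the text above */
#define	SKETCH_MIN_SEGMENT	512ULL		/* 512 bytes */

/* Hypothetical allocator callback: true if `size` bytes can be placed. */
typedef bool (*sketch_alloc_fn)(uint64_t size, void *arg);

/*
 * Try the largest segment first and halve the size on every failure.
 * Returns the size that was allocated, or 0 if even 512 bytes failed.
 */
static uint64_t
sketch_alloc_with_fallback(sketch_alloc_fn try_alloc, void *arg)
{
	assert(try_alloc != NULL);
	for (uint64_t size = SKETCH_MAX_SEGMENT;
	    size >= SKETCH_MIN_SEGMENT; size >>= 1) {
		if (try_alloc(size, arg))
			return (size);
	}
	return (0);
}

/* Toy allocator: succeeds only if `size` fits in the largest free run. */
static bool
sketch_fits_in_free_run(uint64_t size, void *arg)
{
	uint64_t largest_free = *(const uint64_t *)arg;
	return (size <= largest_free);
}
```

With a toy allocator like this, a device whose largest free run is only
a few KB forces the loop all the way down to a few-KB segment, which is
the behavior that inflates the number of mappings.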

In our scenario what would happen is that `metaslab_alloc_dva()` from
the removal thread would pick the new devices initially but wouldn't
allocate from them because of throttling in their metaslab allocation
queue's depth (see `metaslab_group_allocatable()`), as these devices are
new and favored for most types of allocations because of their free
space. So then the removal thread would look at the old fragmented disk
for allocations, wouldn't find any contiguous space, and would keep
retrying with a smaller allocation size until it got down to the low KB
range. This caused a lot of small mappings to be generated, blowing up
the size of the indirect table. It also wasted a lot of CPU while the
removal was active, making everything slow.

This patch:
Make all allocations coming from the device removal thread bypass the
throttle checks. These allocations are not even counted in the metaslab
allocation queues anyway so why check them?

Side-Fix:
Allocations with METASLAB_DONT_THROTTLE in their flags would not be
accounted at the throttle queues but they'd still abide by the
throttling rules which seems wrong. This patch fixes this by checking
for that flag in `metaslab_group_allocatable()`. I did a quick check to
see where else this flag is used and it doesn't seem like this change
would cause issues.
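
As a rough model of the check described above (a sketch under stated
assumptions, not the code in metaslab.c): the struct, its fields, the
`sketch_*` names, and the flag value below are hypothetical stand-ins,
and the real `metaslab_group_allocatable()` takes more arguments and
considers more state.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for METASLAB_DONT_THROTTLE; the value here is arbitrary. */
#define	SKETCH_FLAG_DONT_THROTTLE	0x1

/* Hypothetical, heavily simplified view of a metaslab group. */
typedef struct sketch_group {
	uint64_t queue_depth;		/* allocations currently queued */
	uint64_t max_queue_depth;	/* throttle limit for the group */
} sketch_group_t;

/*
 * Allocations that opt out of throttling are always allowed; everything
 * else is allowed only while the queue is below its limit.
 */
static bool
sketch_group_allocatable(const sketch_group_t *g, int flags)
{
	if (flags & SKETCH_FLAG_DONT_THROTTLE)
		return (true);
	return (g->queue_depth < g->max_queue_depth);
}
```

In this model a group whose queue is full still accepts a request that
carries the bypass flag, while an unflagged request against the same
group is turned away.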

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Closes #14159
2022-12-09 10:48:33 -08:00
..
abd.c abd_return_buf() should call zfs_refcount_remove_many() early 2022-10-19 17:11:01 -07:00
aggsum.c Remove bcopy(), bzero(), bcmp() 2022-03-15 15:13:42 -07:00
arc.c Fix arc_p aggressive increase 2022-11-11 10:41:36 -08:00
blake3_zfs.c Fix memory allocation issue for BLAKE3 context 2022-06-21 14:32:09 -07:00
blkptr.c Remove bcopy(), bzero(), bcmp() 2022-03-15 15:13:42 -07:00
bplist.c Replace dead opensolaris.org license link 2022-07-11 14:16:13 -07:00
bpobj.c Replace dead opensolaris.org license link 2022-07-11 14:16:13 -07:00
bptree.c Replace dead opensolaris.org license link 2022-07-11 14:16:13 -07:00
bqueue.c zfs recv hangs if max recordsize is less than received recordsize 2022-09-16 13:52:25 -07:00
btree.c Optimize microzaps 2022-10-20 11:57:15 -07:00
dataset_kstats.c Introduce kmem_scnprintf() 2022-10-29 13:05:11 -07:00
dbuf_stats.c Revert "Reduce dbuf_find() lock contention" 2022-09-22 12:59:41 -07:00
dbuf.c Fix NULL pointer dereference in dbuf_prefetch_indirect_done() 2022-11-29 10:00:50 -08:00
ddt_zap.c Replace dead opensolaris.org license link 2022-07-11 14:16:13 -07:00
ddt.c Replace dead opensolaris.org license link 2022-07-11 14:16:13 -07:00
dmu_diff.c Replace dead opensolaris.org license link 2022-07-11 14:16:13 -07:00
dmu_object.c Cleanup: Specify unsignedness on things that should not be signed 2022-09-27 16:42:41 -07:00
dmu_objset.c Convert enum zio_flag to uint64_t 2022-10-27 09:54:54 -07:00
dmu_recv.c Deny receiving into encrypted datasets if the keys are not loaded 2022-11-03 09:55:13 -07:00
dmu_redact.c Fix incorrect size given to bqueue_enqueue() call in dmu_redact.c 2022-09-15 16:21:21 -07:00
dmu_send.c Fix dereference after null check in enqueue_range 2022-12-08 14:15:21 -08:00
dmu_traverse.c Convert enum zio_flag to uint64_t 2022-10-27 09:54:54 -07:00
dmu_tx.c Replace dead opensolaris.org license link 2022-07-11 14:16:13 -07:00
dmu_zfetch.c Cleanup: 64-bit kernel module parameters should use fixed width types 2022-10-13 10:03:29 -07:00
dmu.c Convert enum zio_flag to uint64_t 2022-10-27 09:54:54 -07:00
dnode_sync.c Replace dead opensolaris.org license link 2022-07-11 14:16:13 -07:00
dnode.c Remove few pointer dereferences in dbuf_read() 2022-11-29 09:49:02 -08:00
dsl_bookmark.c Cleanup: Address Clang's static analyzer's unused code complaints 2022-10-14 13:37:54 -07:00
dsl_crypt.c Handle and detect #13709's unlock regression (#14161) 2022-11-15 14:44:12 -08:00
dsl_dataset.c Remove duplicate statically allocated variable 2022-12-08 13:52:42 -08:00
dsl_deadlist.c Cleanup: 64-bit kernel module parameters should use fixed width types 2022-10-13 10:03:29 -07:00
dsl_deleg.c Replace dead opensolaris.org license link 2022-07-11 14:16:13 -07:00
dsl_destroy.c Prevent zevent list from consuming all of kernel memory 2022-08-22 12:36:22 -07:00
dsl_dir.c Fix potential NULL pointer dereference regression 2022-11-10 13:56:28 -08:00
dsl_pool.c Cleanup: Address Clang's static analyzer's unused code complaints 2022-10-14 13:37:54 -07:00
dsl_prop.c dsl_prop_known_index(): check for invalid prop 2022-11-08 10:16:01 -08:00
dsl_scan.c Cleanup: 64-bit kernel module parameters should use fixed width types 2022-10-13 10:03:29 -07:00
dsl_synctask.c Replace dead opensolaris.org license link 2022-07-11 14:16:13 -07:00
dsl_userhold.c Replace dead opensolaris.org license link 2022-07-11 14:16:13 -07:00
edonr_zfs.c Remove bcopy(), bzero(), bcmp() 2022-03-15 15:13:42 -07:00
fm.c fm_fmri_hc_create() must call va_end() before returning 2022-10-18 15:34:36 -07:00
gzip.c Replace dead opensolaris.org license link 2022-07-11 14:16:13 -07:00
hkdf.c Remove bcopy(), bzero(), bcmp() 2022-03-15 15:13:42 -07:00
lz4_zfs.c Updated the lz4 decompressor 2022-01-07 10:36:49 -08:00
lz4.c lz4: Cherrypick fix for CVE-2021-3520 2022-01-12 16:14:36 -08:00
lzjb.c Replace dead opensolaris.org license link 2022-07-11 14:16:13 -07:00
metaslab.c Bypass metaslab throttle for removal allocations 2022-12-09 10:48:33 -08:00
mmp.c Cleanup: Address Clang's static analyzer's unused code complaints 2022-10-14 13:37:54 -07:00
multilist.c Cleanup: Specify unsignedness on things that should not be signed 2022-09-27 16:42:41 -07:00
objlist.c Implement Redacted Send/Receive 2019-06-19 09:48:12 -07:00
pathname.c Replace dead opensolaris.org license link 2022-07-11 14:16:13 -07:00
range_tree.c Add defensive assertions 2022-10-12 11:25:18 -07:00
refcount.c Cleanup: Specify unsignedness on things that should not be signed 2022-09-27 16:42:41 -07:00
rrwlock.c Replace dead opensolaris.org license link 2022-07-11 14:16:13 -07:00
sa.c Fix double const qualifier declarations 2022-09-30 15:34:39 -07:00
sha256.c Replace dead opensolaris.org license link 2022-07-11 14:16:13 -07:00
skein_zfs.c Remove bcopy(), bzero(), bcmp() 2022-03-15 15:13:42 -07:00
spa_checkpoint.c Cleanup: 64-bit kernel module parameters should use fixed width types 2022-10-13 10:03:29 -07:00
spa_config.c zed: mark disks as REMOVED when they are removed 2022-09-28 09:48:46 -07:00
spa_errlog.c Cleanup: Specify unsignedness on things that should not be signed 2022-09-27 16:42:41 -07:00
spa_history.c Replace dead opensolaris.org license link 2022-07-11 14:16:13 -07:00
spa_log_spacemap.c Address warnings about possible division by zero from clangsa 2022-11-03 09:58:14 -07:00
spa_misc.c zed: post a udev change event from spa_vdev_attach() 2022-11-18 11:39:59 -08:00
spa_stats.c Cleanup: Specify unsignedness on things that should not be signed 2022-09-27 16:42:41 -07:00
spa.c zed: Prevent special vdev to be replaced by hot spare 2022-11-04 11:33:47 -07:00
space_map.c Replace dead opensolaris.org license link 2022-07-11 14:16:13 -07:00
space_reftree.c Replace dead opensolaris.org license link 2022-07-11 14:16:13 -07:00
THIRDPARTYLICENSE.cityhash OpenZFS 8484 - Implement aggregate sum and use for arc counters 2018-06-06 09:35:59 -07:00
THIRDPARTYLICENSE.cityhash.descrip OpenZFS 8484 - Implement aggregate sum and use for arc counters 2018-06-06 09:35:59 -07:00
txg.c Fix the last two CFI callback prototype mismatches 2022-11-29 09:56:16 -08:00
uberblock.c Replace dead opensolaris.org license link 2022-07-11 14:16:13 -07:00
unique.c Replace dead opensolaris.org license link 2022-07-11 14:16:13 -07:00
vdev_cache.c Cleanup: Specify unsignedness on things that should not be signed 2022-09-27 16:42:41 -07:00
vdev_draid_rand.c Distributed Spare (dRAID) Feature 2020-11-13 13:51:51 -08:00
vdev_draid.c vdev_draid_lookup_map() should not iterate outside draid_maps 2022-09-12 12:51:17 -07:00
vdev_indirect_births.c Remove bcopy(), bzero(), bcmp() 2022-03-15 15:13:42 -07:00
vdev_indirect_mapping.c Remove bcopy(), bzero(), bcmp() 2022-03-15 15:13:42 -07:00
vdev_indirect.c Bump checksum error counter before reporting to ZED 2022-12-02 17:42:22 -08:00
vdev_initialize.c Cleanup: 64-bit kernel module parameters should use fixed width types 2022-10-13 10:03:29 -07:00
vdev_label.c Replace dead opensolaris.org license link 2022-07-11 14:16:13 -07:00
vdev_mirror.c Improve too large physical ashift handling 2022-09-08 10:30:53 -07:00
vdev_missing.c Replace dead opensolaris.org license link 2022-07-11 14:16:13 -07:00
vdev_queue.c Convert enum zio_flag to uint64_t 2022-10-27 09:54:54 -07:00
vdev_raidz_math_aarch64_neon_common.h Replace dead opensolaris.org license link 2022-07-11 14:16:13 -07:00
vdev_raidz_math_aarch64_neon.c Replace dead opensolaris.org license link 2022-07-11 14:16:13 -07:00
vdev_raidz_math_aarch64_neonx2.c Fix Clang 15 compilation errors 2022-11-30 13:46:26 -08:00
vdev_raidz_math_avx2.c Replace dead opensolaris.org license link 2022-07-11 14:16:13 -07:00
vdev_raidz_math_avx512bw.c Replace dead opensolaris.org license link 2022-07-11 14:16:13 -07:00
vdev_raidz_math_avx512f.c Replace dead opensolaris.org license link 2022-07-11 14:16:13 -07:00
vdev_raidz_math_impl.h Cleanup Raid-Z Typo fixes 2022-09-06 09:43:21 -07:00
vdev_raidz_math_powerpc_altivec_common.h Replace dead opensolaris.org license link 2022-07-11 14:16:13 -07:00
vdev_raidz_math_powerpc_altivec.c Replace dead opensolaris.org license link 2022-07-11 14:16:13 -07:00
vdev_raidz_math_scalar.c Replace dead opensolaris.org license link 2022-07-11 14:16:13 -07:00
vdev_raidz_math_sse2.c Replace dead opensolaris.org license link 2022-07-11 14:16:13 -07:00
vdev_raidz_math_ssse3.c Replace dead opensolaris.org license link 2022-07-11 14:16:13 -07:00
vdev_raidz_math.c Convert some sprintf() calls to kmem_scnprintf() 2022-11-28 13:49:58 -08:00
vdev_raidz.c Bump checksum error counter before reporting to ZED 2022-12-02 17:42:22 -08:00
vdev_rebuild.c Fix sequential resilver drive failure race condition 2022-10-19 15:48:13 -07:00
vdev_removal.c Bypass metaslab throttle for removal allocations 2022-12-09 10:48:33 -08:00
vdev_root.c Replace dead opensolaris.org license link 2022-07-11 14:16:13 -07:00
vdev_trim.c Propagate extent_bytes change to autotrim thread 2022-10-28 10:16:31 -07:00
vdev.c zed: unclean disk attachment faults the vdev 2022-11-29 09:24:10 -08:00
zap_leaf.c Optimize microzaps 2022-10-20 11:57:15 -07:00
zap_micro.c Optimize microzaps 2022-10-20 11:57:15 -07:00
zap.c Replace dead opensolaris.org license link 2022-07-11 14:16:13 -07:00
zcp_get.c Cleanup: Address Clang's static analyzer's unused code complaints 2022-10-14 13:37:54 -07:00
zcp_global.c OpenZFS 8600 - ZFS channel programs - snapshot 2018-02-08 15:29:24 -08:00
zcp_iter.c module/*.ko: prune .data, global .rodata 2022-01-14 15:37:55 -08:00
zcp_set.c Support setting user properties in a channel program 2020-02-14 13:41:42 -08:00
zcp_synctask.c Add zfs.sync.snapshot_rename 2022-09-02 13:31:19 -07:00
zcp.c Fix too few arguments to formatting function 2022-10-29 13:04:52 -07:00
zfeature.c Replace dead opensolaris.org license link 2022-07-11 14:16:13 -07:00
zfs_byteswap.c Replace dead opensolaris.org license link 2022-07-11 14:16:13 -07:00
zfs_chksum.c Introduce kmem_scnprintf() 2022-10-29 13:05:11 -07:00
zfs_fm.c Remove an unused variable 2022-11-03 10:17:17 -07:00
zfs_fuid.c Replace dead opensolaris.org license link 2022-07-11 14:16:13 -07:00
zfs_ioctl.c Cleanup: 64-bit kernel module parameters should use fixed width types 2022-10-13 10:03:29 -07:00
zfs_log.c zfs_rename: support RENAME_* flags 2022-10-28 09:49:20 -07:00
zfs_onexit.c zfs_onexit_add_cb: make action_handle point to a uintptr_t 2022-11-03 09:52:12 -07:00
zfs_quota.c Replace dead opensolaris.org license link 2022-07-11 14:16:13 -07:00
zfs_ratelimit.c Replace dead opensolaris.org license link 2022-07-11 14:16:13 -07:00
zfs_replay.c Support idmapped mount in user namespace 2022-11-08 10:28:56 -08:00
zfs_rlock.c Replace dead opensolaris.org license link 2022-07-11 14:16:13 -07:00
zfs_sa.c Replace dead opensolaris.org license link 2022-07-11 14:16:13 -07:00
zfs_vnops.c Support idmapped mount in user namespace 2022-11-08 10:28:56 -08:00
zil.c Optionally skip zil_close during zvol_create_minor_impl 2022-11-08 12:38:08 -08:00
zio_checksum.c Fix double const qualifier declarations 2022-09-30 15:34:39 -07:00
zio_compress.c Fix declarations of non-global variables 2022-10-18 11:05:32 -07:00
zio_inject.c Cleanup: Switch to strlcpy from strncpy 2022-09-27 16:35:29 -07:00
zio.c zio can deadlock during device removal 2022-12-02 17:46:29 -08:00
zle.c Replace dead opensolaris.org license link 2022-07-11 14:16:13 -07:00
zrlock.c Micro-optimize zrl_remove() 2022-11-29 09:26:03 -08:00
zthr.c Switch from _Noreturn to __attribute__((noreturn)) 2022-03-23 08:51:00 -07:00
zvol.c zfs_rename: support RENAME_* flags 2022-10-28 09:49:20 -07:00