mirror_zfs/include/sys
Matthew Ahrens 3442c2a02d
Revise ARC shrinker algorithm
The ARC shrinker callback `arc_shrinker_count/_scan()` is invoked by the
kernel's shrinker mechanism when the system is running low on free
pages.  This happens via 2 code paths:

1. "direct reclaim": The system is attempting to allocate a page, but we
are low on memory.  The ARC shrinker callback is invoked from the
page-allocation code path.

2. "indirect reclaim": kswapd notices that there aren't many free pages,
so it invokes the ARC shrinker callback.

In both cases, the kernel's shrinker code requests that the ARC shrinker
callback release some of its cache, and then it measures how many pages
were released.  However, it's measurement of released pages does not
include pages that are freed via `__free_pages()`, which is how the ARC
releases memory (via `abd_free_chunks()`).  Rather, the kernel shrinker
code is looking for pages to be placed on the lists of reclaimable pages
(which is separate from actually-free pages).

Because the kernel shrinker code doesn't detect that the ARC has
released pages, it may call the ARC shrinker callback many times,
resulting in the ARC "collapsing" down to `arc_c_min`.  This has several
negative impacts:

1. ZFS doesn't use RAM to cache data effectively.

2. In the direct reclaim case, a single page allocation may wait a long
time (e.g. more than a minute) while we evict the entire ARC.

3. Even with the improvements made in 67c0f0dedc ("ARC shrinking blocks
reads/writes"), occasionally `arc_size` may stay above `arc_c` for the
entire time of the ARC collapse, thus blocking ZFS read/write operations
in `arc_get_data_impl()`.

To address these issues, this commit limits the ways that the ARC
shrinker callback can be used by the kernel shrinker code, and mitigates
the impact of arc_is_overflowing() on ZFS read/write operations.

With this commit:

1. We limit the amount of data that can be reclaimed from the ARC via
the "direct reclaim" shrinker.  This limits the amount of time it takes
to allocate a single page.

2. We do not allow the ARC to shrink via kswapd (indirect reclaim).
Instead we rely on `arc_evict_zthr` to monitor free memory and reduce
the ARC target size to keep sufficient free memory in the system.  Note
that we can't simply rely on limiting the amount that we reclaim at once
(as for the direct reclaim case), because kswapd's "boosted" logic can
invoke the callback an unlimited number of times (see
`balance_pgdat()`).

3. When `arc_is_overflowing()` and we want to allocate memory,
`arc_get_data_impl()` will wait only for a multiple of the requested
amount of data to be evicted, rather than waiting for the ARC to no
longer be overflowing.  This allows ZFS reads/writes to make progress
even while the ARC is overflowing, while also ensuring that the eviction
thread makes progress towards reducing the total amount of memory used
by the ARC.

4. The amount of memory that the ARC always tries to keep free for the
rest of the system, `arc_sys_free` is increased.

5. Now that the shrinker callback is able to provide feedback to the
kernel's shrinker code about our progress, we can safely enable
the kswapd hook. This will allow the arc to receive notifications
when memory pressure is first detected by the kernel. We also
re-enable the appropriate kstats to track these callbacks.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: George Wilson <george.wilson@delphix.com>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #10600
2020-07-31 21:10:52 -07:00
..
crypto Avoid installing kernel headers on FreeBSD 2020-06-27 17:40:14 -07:00
fm Avoid installing kernel headers on FreeBSD 2020-06-27 17:40:14 -07:00
fs Remove dependency on sharetab file and refactor sharing logic 2020-07-13 09:19:18 -07:00
lua Avoid installing kernel headers on FreeBSD 2020-06-27 17:40:14 -07:00
sysevent Avoid installing kernel headers on FreeBSD 2020-06-27 17:40:14 -07:00
abd_impl.h Removing ZERO_PAGE abd_alloc_zero_scatter 2020-06-10 17:54:11 -07:00
abd.h Rename refcount.h to zfs_refcount.h 2020-07-29 16:35:33 -07:00
aggsum.h Reduce number of atomic_add() calls in aggsum 2020-02-06 13:21:06 -08:00
arc_impl.h Revise ARC shrinker algorithm 2020-07-31 21:10:52 -07:00
arc.h Rename refcount.h to zfs_refcount.h 2020-07-29 16:35:33 -07:00
avl_impl.h Support custom build directories and move includes 2010-09-08 12:38:56 -07:00
avl.h Restore avl_update() calls and related functions 2020-06-03 09:49:32 -07:00
bitops.h Reduce loaded range tree memory usage 2019-10-09 10:36:03 -07:00
blkptr.h OpenZFS 8067 - zdb should be able to dump literal embedded block pointer 2017-07-07 11:28:01 -07:00
bplist.h Fast Clone Deletion 2019-07-26 10:54:14 -07:00
bpobj.h Fast Clone Deletion 2019-07-26 10:54:14 -07:00
bptree.h Illumos 4914 - zfs on-disk bookmark structure should be named *_phys_t 2014-08-06 14:48:41 -07:00
bqueue.h Implement Redacted Send/Receive 2019-06-19 09:48:12 -07:00
btree.h Fix typos 2020-06-09 21:24:09 -07:00
dataset_kstats.h port async unlinked drain from illumos-nexenta 2019-02-12 10:41:15 -08:00
dbuf.h Rename refcount.h to zfs_refcount.h 2020-07-29 16:35:33 -07:00
ddt.h Fix gcc10.1 truncation error 2020-06-13 11:02:00 -07:00
dmu_impl.h Prevent race condition in dnode_dest (#10101) 2020-03-12 10:25:56 -07:00
dmu_objset.h File incorrectly zeroed when receiving incremental stream that toggles -L 2020-06-09 10:41:01 -07:00
dmu_recv.h filesystem_limit/snapshot_limit is incorrectly enforced against root 2020-07-11 17:18:02 -07:00
dmu_redact.h Implement Redacted Send/Receive 2019-06-19 09:48:12 -07:00
dmu_send.h Add 'zfs send --saved' flag 2020-01-10 10:16:58 -08:00
dmu_traverse.h Implement Redacted Send/Receive 2019-06-19 09:48:12 -07:00
dmu_tx.h Rename refcount.h to zfs_refcount.h 2020-07-29 16:35:33 -07:00
dmu_zfetch.h Replace zf_rwlock with a mutex 2019-07-25 11:57:58 -07:00
dmu.h Annotate unused parameters on inline definitions as such 2020-07-23 17:41:48 -07:00
dnode.h Rename refcount.h to zfs_refcount.h 2020-07-29 16:35:33 -07:00
dsl_bookmark.h Rename refcount.h to zfs_refcount.h 2020-07-29 16:35:33 -07:00
dsl_crypt.h dmu_objset_from_ds must be called with dp_config_rwlock held 2020-03-12 10:55:02 -07:00
dsl_dataset.h Rename refcount.h to zfs_refcount.h 2020-07-29 16:35:33 -07:00
dsl_deadlist.h Add fast path for zfs_ioc_space_snaps() handling of empty_bpobj 2019-08-20 11:34:52 -07:00
dsl_deleg.h Remove code for zfs remap 2019-06-24 16:44:01 -07:00
dsl_destroy.h Fast Clone Deletion 2019-07-26 10:54:14 -07:00
dsl_dir.h Rename refcount.h to zfs_refcount.h 2020-07-29 16:35:33 -07:00
dsl_pool.h Eliminate Linux specific inode usage from common code 2019-12-11 11:53:57 -08:00
dsl_prop.h Support inheriting properties in channel programs 2020-01-22 17:03:17 -08:00
dsl_scan.h Add device rebuild feature 2020-07-03 11:05:50 -07:00
dsl_synctask.h OpenZFS 9425 - channel programs can be interrupted 2019-06-22 16:51:46 -07:00
dsl_userhold.h Illumos #3740 2013-11-04 11:17:48 -08:00
edonr.h OpenZFS 4185 - add new cryptographic checksums to ZFS: SHA-512, Skein, Edon-R 2016-10-03 14:51:15 -07:00
efi_partition.h Fix typos in include/ 2019-08-30 09:53:15 -07:00
frame.h Suppress incorrect objtool warnings 2017-12-07 10:28:50 -08:00
hkdf.h Encryption patch follow-up 2017-10-11 16:54:48 -04:00
Makefile.am Rename refcount.h to zfs_refcount.h 2020-07-29 16:35:33 -07:00
metaslab_impl.h Use a struct to organize metaslab-group-allocator fields 2020-04-22 10:26:56 -07:00
metaslab.h Extend zdb to print inconsistencies in livelists and metaslabs 2020-07-14 17:51:05 -07:00
mmp.h Add zfs_multihost_interval tunable handler for FreeBSD 2020-06-23 13:32:42 -07:00
mntent.h Add FreeBSD required defines to mntent.h 2019-11-30 15:49:09 -08:00
mod.h Wrap Linux module macros 2019-11-01 10:41:03 -07:00
multilist.h Avoid extra taskq_dispatch() calls by DMU 2019-06-25 12:03:38 -07:00
note.h Update build system and packaging 2018-05-29 16:00:33 -07:00
nvpair_impl.h OpenZFS 9580 - Add a hash-table on top of nvlist to speed-up operations 2018-07-30 11:30:03 -07:00
nvpair.h Add new fnvlist_lookup_* functions 2018-10-03 15:30:55 -07:00
objlist.h Implement Redacted Send/Receive 2019-06-19 09:48:12 -07:00
pathname.h Disable unused pathname::pn_path* (unneeded in Linux) 2019-07-15 13:57:56 -07:00
qat.h QAT related bug fixes 2019-09-12 13:33:44 -07:00
range_tree.h Improve compatibility with C++ consumers 2020-06-06 12:54:04 -07:00
rrwlock.h Rename refcount.h to zfs_refcount.h 2020-07-29 16:35:33 -07:00
sa_impl.h Rename refcount.h to zfs_refcount.h 2020-07-29 16:35:33 -07:00
sa.h Fix typos in include/ 2019-08-30 09:53:15 -07:00
skein.h OpenZFS 4185 - add new cryptographic checksums to ZFS: SHA-512, Skein, Edon-R 2016-10-03 14:51:15 -07:00
spa_boot.h Support custom build directories and move includes 2010-09-08 12:38:56 -07:00
spa_checkpoint.h Serialize ZTHR operations to eliminate races 2019-01-13 10:09:46 -08:00
spa_checksum.h Implementation of AVX2 optimized Fletcher-4 2016-06-02 14:30:51 -07:00
spa_impl.h Rename refcount.h to zfs_refcount.h 2020-07-29 16:35:33 -07:00
spa_log_spacemap.h Log Spacemap Project 2019-07-16 10:11:49 -07:00
spa.h Prefix zfs internal endian checks with _ZFS 2020-07-28 13:02:49 -07:00
space_map.h Extend zdb to print inconsistencies in livelists and metaslabs 2020-07-14 17:51:05 -07:00
space_reftree.h Reduce loaded range tree memory usage 2019-10-09 10:36:03 -07:00
sysevent.h OpenZFS 6939 - add sysevents to zfs core for commands 2017-07-12 21:28:13 -07:00
txg_impl.h Fix typos in include/ 2019-08-30 09:53:15 -07:00
txg.h OpenZFS 9425 - channel programs can be interrupted 2019-06-22 16:51:46 -07:00
u8_textprep_data.h Support custom build directories and move includes 2010-09-08 12:38:56 -07:00
u8_textprep.h Support custom build directories and move includes 2010-09-08 12:38:56 -07:00
uberblock_impl.h MMP interval and fail_intervals in uberblock 2019-03-21 12:47:57 -07:00
uberblock.h Multi-modifier protection (MMP) 2017-07-13 13:54:00 -04:00
uio_impl.h deadlock between mm_sem and tx assign in zfs_write() and page fault 2018-10-16 11:11:24 -07:00
unique.h Illumos #3742 2013-11-04 10:55:25 -08:00
uuid.h Support custom build directories and move includes 2010-09-08 12:38:56 -07:00
vdev_disk.h Make struct vdev_disk_t be platform private 2020-06-16 11:43:33 -07:00
vdev_file.h Add zfs_file_* interface, remove vnodes 2019-11-21 09:32:57 -08:00
vdev_impl.h Add device rebuild feature 2020-07-03 11:05:50 -07:00
vdev_indirect_births.h OpenZFS 7614, 9064 - zfs device evacuation/removal 2018-04-14 12:16:17 -07:00
vdev_indirect_mapping.h OpenZFS 7614, 9064 - zfs device evacuation/removal 2018-04-14 12:16:17 -07:00
vdev_initialize.h Add TRIM support 2019-03-29 09:13:20 -07:00
vdev_raidz_impl.h Add prototypes 2020-06-18 12:21:32 -07:00
vdev_raidz.h Linux 5.0 compat: SIMD compatibility 2019-07-12 09:31:20 -07:00
vdev_rebuild.h Add device rebuild feature 2020-07-03 11:05:50 -07:00
vdev_removal.h panic in removal_remap test on 4K devices 2019-06-13 13:12:39 -07:00
vdev_trim.h Trim L2ARC 2020-06-09 10:15:08 -07:00
vdev.h Add device rebuild feature 2020-07-03 11:05:50 -07:00
xvattr.h Linux 4.18 compat: inode timespec -> timespec64 2018-06-19 21:51:18 -07:00
zap_impl.h OpenZFS 7793 - ztest fails assertion in dmu_tx_willuse_space 2017-03-07 09:51:59 -08:00
zap_leaf.h Fix ENOSPC in "Handle zap_add() failures in ..." 2018-04-18 14:19:50 -07:00
zap.h fat zap should prefetch when iterating 2019-06-12 13:13:09 -07:00
zcp_global.h OpenZFS 7431 - ZFS Channel Programs 2018-02-08 15:28:18 -08:00
zcp_iter.h OpenZFS 7431 - ZFS Channel Programs 2018-02-08 15:28:18 -08:00
zcp_prop.h OpenZFS 7431 - ZFS Channel Programs 2018-02-08 15:28:18 -08:00
zcp_set.h Support setting user properties in a channel program 2020-02-14 13:41:42 -08:00
zcp.h filesystem_limit/snapshot_limit is incorrectly enforced against root 2020-07-11 17:18:02 -07:00
zfeature.h Revert "zhack: Add 'feature disable' command" 2016-05-17 11:52:07 -07:00
zfs_acl.h Return an error code from zfs_acl_chmod_setattr 2019-11-01 10:19:11 -07:00
zfs_context.h Introduce names for ZTHRs 2020-07-29 09:43:33 -07:00
zfs_debug.h Remove sdt.h 2019-10-25 13:38:37 -07:00
zfs_delay.h Update build system and packaging 2018-05-29 16:00:33 -07:00
zfs_file.h Re-share zfsdev_getminor and zfs_onexit_fd_hold 2020-02-28 14:50:32 -08:00
zfs_fuid.h Replace sprintf()->snprintf() and strcpy()->strlcpy() 2020-06-07 11:42:12 -07:00
zfs_ioctl_impl.h Restore support for in-kernel ZFS ioctls 2020-06-08 13:57:22 -07:00
zfs_ioctl.h drr_begin: can't forward declare untagged struct 2020-06-16 11:57:04 -07:00
zfs_onexit.h Remove deduplicated send/receive code 2020-04-23 10:06:57 -07:00
zfs_project.h Minor diff reduction with ZoF in include/sys 2019-11-27 11:11:03 -08:00
zfs_quota.h File incorrectly zeroed when receiving incremental stream that toggles -L 2020-06-09 10:41:01 -07:00
zfs_ratelimit.h Change checksum & IO delay ratelimit values 2018-03-04 17:34:51 -08:00
zfs_refcount.h Rename refcount.h to zfs_refcount.h 2020-07-29 16:35:33 -07:00
zfs_rlock.h Add a "try" operation for range locks 2020-07-06 11:53:31 -07:00
zfs_sa.h Project Quota on ZFS 2018-02-13 14:54:54 -08:00
zfs_stat.h Support custom build directories and move includes 2010-09-08 12:38:56 -07:00
zfs_sysfs.h Fix in-kernel sysfs entries 2018-09-06 21:44:52 -07:00
zfs_znode.h Fix typos 2020-06-09 21:24:09 -07:00
zil_impl.h make zil max block size tunable 2019-06-10 11:48:42 -07:00
zil.h Add prototypes 2020-06-18 12:21:32 -07:00
zio_checksum.h Remove dependency on linear ABD 2017-03-29 12:24:51 -07:00
zio_compress.h lz4_decompress_abd declared but not defined 2019-06-13 13:14:34 -07:00
zio_crypt.h Rename refcount.h to zfs_refcount.h 2020-07-29 16:35:33 -07:00
zio_impl.h Fix typos in include/ 2019-08-30 09:53:15 -07:00
zio_priority.h Add device rebuild feature 2020-07-03 11:05:50 -07:00
zio.h Improve compatibility with C++ consumers 2020-06-06 12:54:04 -07:00
zrlock.h OpenZFS 6328 - Fix cstyle errors in zfs codebase 2017-01-12 09:42:11 -08:00
zthr.h Introduce names for ZTHRs 2020-07-29 09:43:33 -07:00
zvol_impl.h Connect dataset_kstats for FreeBSD 2020-06-05 17:17:02 -07:00
zvol.h async zvol minor node creation interferes with receive 2020-02-03 09:33:14 -08:00