mirror_zfs/module/zfs
Matthew Ahrens 3442c2a02d
Revise ARC shrinker algorithm
The ARC shrinker callback `arc_shrinker_count/_scan()` is invoked by the
kernel's shrinker mechanism when the system is running low on free
pages.  This happens via two code paths:

1. "direct reclaim": The system is attempting to allocate a page, but we
are low on memory.  The ARC shrinker callback is invoked from the
page-allocation code path.

2. "indirect reclaim": kswapd notices that there aren't many free pages,
so it invokes the ARC shrinker callback.

In both cases, the kernel's shrinker code requests that the ARC shrinker
callback release some of its cache, and then it measures how many pages
were released.  However, its measurement of released pages does not
include pages that are freed via `__free_pages()`, which is how the ARC
releases memory (via `abd_free_chunks()`).  Rather, the kernel shrinker
code is looking for pages to be placed on the lists of reclaimable pages
(which are separate from actually-free pages).

Because the kernel shrinker code doesn't detect that the ARC has
released pages, it may call the ARC shrinker callback many times,
resulting in the ARC "collapsing" down to `arc_c_min`.  This has several
negative impacts:

1. ZFS doesn't use RAM to cache data effectively.

2. In the direct reclaim case, a single page allocation may wait a long
time (e.g. more than a minute) while we evict the entire ARC.

3. Even with the improvements made in 67c0f0dedc ("ARC shrinking blocks
reads/writes"), occasionally `arc_size` may stay above `arc_c` for the
entire time of the ARC collapse, thus blocking ZFS read/write operations
in `arc_get_data_impl()`.
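
The collapse described above can be illustrated with a deliberately
simplified toy model.  This is only a sketch: `toy_cache_t`,
`toy_cache_shrink()` and `toy_reclaim()` are hypothetical names, not
kernel or ZFS code, and the real reclaim logic is far more involved.
The model assumes a reclaim loop that keeps calling the callback until
it has been credited with enough pages, and that directly-freed pages
may or may not be credited.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy model; not kernel or ZFS code. */
typedef struct toy_cache {
	uint64_t size;		/* pages currently cached */
	uint64_t min_size;	/* floor, standing in for arc_c_min */
} toy_cache_t;

/* Evict up to nr pages without going below the floor; return pages freed. */
static uint64_t
toy_cache_shrink(toy_cache_t *c, uint64_t nr)
{
	assert(c->size >= c->min_size);
	uint64_t evictable = c->size - c->min_size;
	uint64_t freed = nr < evictable ? nr : evictable;

	c->size -= freed;
	return (freed);
}

/*
 * Model a reclaim loop that wants `want` pages and asks for `batch`
 * pages per callback.  Freed pages count towards the goal only when
 * `credit_direct_frees` is nonzero.  Returns the number of callbacks.
 */
static unsigned
toy_reclaim(toy_cache_t *c, uint64_t want, uint64_t batch,
    int credit_direct_frees)
{
	uint64_t credited = 0;
	unsigned calls = 0;

	while (credited < want && c->size > c->min_size) {
		uint64_t freed = toy_cache_shrink(c, batch);

		calls++;
		if (credit_direct_frees)
			credited += freed;
	}
	return (calls);
}
```

In this model, when freed pages are credited the loop stops after a
single callback; when they are not, it only stops once the cache has
been driven all the way down to its floor.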

To address these issues, this commit limits the ways that the ARC
shrinker callback can be used by the kernel shrinker code, and mitigates
the impact of `arc_is_overflowing()` on ZFS read/write operations.

With this commit:

1. We limit the amount of data that can be reclaimed from the ARC via
the "direct reclaim" shrinker.  This limits the amount of time it takes
to allocate a single page.

2. We do not allow the ARC to shrink via kswapd (indirect reclaim).
Instead we rely on `arc_evict_zthr` to monitor free memory and reduce
the ARC target size to keep sufficient free memory in the system.  Note
that we can't simply rely on limiting the amount that we reclaim at once
(as for the direct reclaim case), because kswapd's "boosted" logic can
invoke the callback an unlimited number of times (see
`balance_pgdat()`).

3. When `arc_is_overflowing()` is true and we want to allocate memory,
`arc_get_data_impl()` will wait only for a multiple of the requested
amount of data to be evicted, rather than waiting for the ARC to no
longer be overflowing.  This allows ZFS reads/writes to make progress
even while the ARC is overflowing, while also ensuring that the eviction
thread makes progress towards reducing the total amount of memory used
by the ARC.

4. The amount of memory that the ARC always tries to keep free for the
rest of the system, `arc_sys_free`, is increased.

5. Now that the shrinker callback is able to provide feedback to the
kernel's shrinker code about our progress, we can safely enable
the kswapd hook. This will allow the ARC to receive notifications
when memory pressure is first detected by the kernel. We also
re-enable the appropriate kstats to track these callbacks.
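
Items 1 and 3 above can be sketched roughly as follows.  This is an
illustration under stated assumptions, not the code in `arc.c`: the
`sketch_*` functions and both constants are hypothetical, and the
values chosen for the limit and the multiple are placeholders rather
than the ones this commit uses.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical placeholders; not the identifiers or values in arc.c. */
#define	SKETCH_SHRINKER_LIMIT	10000	/* max pages reported per call */
#define	SKETCH_WAIT_MULTIPLE	2	/* multiple of the request to wait for */

/*
 * Item 1: cap the number of pages reported as evictable, which bounds
 * how much one reclaim attempt can ask the cache to give up.
 */
static uint64_t
sketch_shrinker_count(uint64_t evictable_pages)
{
	return (evictable_pages < SKETCH_SHRINKER_LIMIT ?
	    evictable_pages : SKETCH_SHRINKER_LIMIT);
}

/*
 * Item 3: compute the value a running "bytes evicted" counter must
 * reach before a waiting allocation of request_bytes may proceed.
 */
static uint64_t
sketch_wait_target(uint64_t evicted_so_far, uint64_t request_bytes)
{
	assert(request_bytes > 0);
	return (evicted_so_far + request_bytes * SKETCH_WAIT_MULTIPLE);
}

/* The waiter proceeds once the counter reaches its target. */
static int
sketch_may_proceed(uint64_t evicted_now, uint64_t target)
{
	return (evicted_now >= target);
}
```

The point of the second half is that the waiter's condition depends
only on how much has been evicted since it started waiting, not on
whether the cache as a whole is still overflowing.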

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: George Wilson <george.wilson@delphix.com>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #10600
2020-07-31 21:10:52 -07:00
abd.c Add gang ABD child to parent gang ABD 2020-07-24 21:09:20 -07:00
aggsum.c Reduce number of atomic_add() calls in aggsum 2020-02-06 13:21:06 -08:00
arc.c Revise ARC shrinker algorithm 2020-07-31 21:10:52 -07:00
blkptr.c Add include files for prototypes 2020-06-18 12:21:25 -07:00
bplist.c Fast Clone Deletion 2019-07-26 10:54:14 -07:00
bpobj.c Rename refcount.h to zfs_refcount.h 2020-07-29 16:35:33 -07:00
bptree.c Rename refcount.h to zfs_refcount.h 2020-07-29 16:35:33 -07:00
bqueue.c Implement Redacted Send/Receive 2019-06-19 09:48:12 -07:00
btree.c Fix typos 2020-06-09 21:24:09 -07:00
dataset_kstats.c Fix panic on DilOS with kstat per dataset statistics 2019-09-03 12:12:31 -07:00
dbuf_stats.c Mark functions as static 2020-06-18 12:20:38 -07:00
dbuf.c Make use of ZFS_DEBUG consistent within kmod sources 2020-07-25 20:07:44 -07:00
ddt_zap.c Refactor dnode dirty context from dbuf_dirty 2020-02-26 16:09:17 -08:00
ddt.c Remove dead code 2020-06-18 12:21:18 -07:00
dmu_diff.c Mark write_record static 2019-12-03 09:51:44 -08:00
dmu_object.c Add include files for prototypes 2020-06-18 12:21:25 -07:00
dmu_objset.c Remove duplicate include of sys/zfeature.h in dmu_objset.c 2020-07-31 09:04:45 -07:00
dmu_recv.c filesystem_limit/snapshot_limit is incorrectly enforced against root 2020-07-11 17:18:02 -07:00
dmu_redact.c dmu_objset_from_ds must be called with dp_config_rwlock held 2020-03-12 10:55:02 -07:00
dmu_send.c Replace sprintf()->snprintf() and strcpy()->strlcpy() 2020-06-07 11:42:12 -07:00
dmu_traverse.c Make module tunables cross platform 2019-09-05 14:49:49 -07:00
dmu_tx.c Refactor dnode dirty context from dbuf_dirty 2020-02-26 16:09:17 -08:00
dmu_zfetch.c Make module tunables cross platform 2019-09-05 14:49:49 -07:00
dmu.c Mark functions as static 2020-06-18 12:20:38 -07:00
dnode_sync.c Make use of ZFS_DEBUG consistent within kmod sources 2020-07-25 20:07:44 -07:00
dnode.c Prevent race condition in dnode_dest (#10101) 2020-03-12 10:25:56 -07:00
dsl_bookmark.c Fix typos 2020-06-09 21:24:09 -07:00
dsl_crypt.c Mark functions as static 2020-06-18 12:20:38 -07:00
dsl_dataset.c zfs promote does not delete livelist of origin 2020-07-31 08:59:00 -07:00
dsl_deadlist.c Rename refcount.h to zfs_refcount.h 2020-07-29 16:35:33 -07:00
dsl_deleg.c Reduce loaded range tree memory usage 2019-10-09 10:36:03 -07:00
dsl_destroy.c Memory leak in dsl_destroy_snapshots_nvl error case 2020-05-26 16:13:41 -07:00
dsl_dir.c Make use of ZFS_DEBUG consistent within kmod sources 2020-07-25 20:07:44 -07:00
dsl_pool.c Mark functions as static 2020-06-18 12:20:38 -07:00
dsl_prop.c Replace sprintf()->snprintf() and strcpy()->strlcpy() 2020-06-07 11:42:12 -07:00
dsl_scan.c Add device rebuild feature 2020-07-03 11:05:50 -07:00
dsl_synctask.c Fix typos in module/zfs/ 2019-09-02 17:56:41 -07:00
dsl_userhold.c Replace sprintf()->snprintf() and strcpy()->strlcpy() 2020-06-07 11:42:12 -07:00
edonr_zfs.c Add include files for prototypes 2020-06-18 12:21:25 -07:00
fm.c Enable zpool events tunables and tests on FreeBSD 2020-02-18 11:22:56 -08:00
gzip.c Add include files for prototypes 2020-06-18 12:21:25 -07:00
hkdf.c Encryption patch follow-up 2017-10-11 16:54:48 -04:00
lz4.c Prefix zfs internal endian checks with _ZFS 2020-07-28 13:02:49 -07:00
lzjb.c Add include files for prototypes 2020-06-18 12:21:25 -07:00
Makefile.in Add device rebuild feature 2020-07-03 11:05:50 -07:00
metaslab.c Make use of ZFS_DEBUG consistent within kmod sources 2020-07-25 20:07:44 -07:00
mmp.c Add zfs_multihost_interval tunable handler for FreeBSD 2020-06-23 13:32:42 -07:00
multilist.c Make use of ZFS_DEBUG consistent within kmod sources 2020-07-25 20:07:44 -07:00
objlist.c Implement Redacted Send/Receive 2019-06-19 09:48:12 -07:00
pathname.c Disable unused pathname::pn_path* (unneeded in Linux) 2019-07-15 13:57:56 -07:00
range_tree.c Function name and comment updates 2019-10-11 10:13:21 -07:00
refcount.c Rename refcount.h to zfs_refcount.h 2020-07-29 16:35:33 -07:00
rrwlock.c Rename refcount.h to zfs_refcount.h 2020-07-29 16:35:33 -07:00
sa.c Mark functions as static 2020-06-18 12:20:38 -07:00
sha256.c Add include files for prototypes 2020-06-18 12:21:25 -07:00
skein_zfs.c Add include files for prototypes 2020-06-18 12:21:25 -07:00
spa_boot.c Add include files for prototypes 2020-06-18 12:21:25 -07:00
spa_checkpoint.c Refactor dnode dirty context from dbuf_dirty 2020-02-26 16:09:17 -08:00
spa_config.c freebsd: changes necessary to coexist with dtrace in tree 2020-07-01 09:10:08 -07:00
spa_errlog.c Fix typos in module/zfs/ 2019-09-02 17:56:41 -07:00
spa_history.c Make spa_history_zone platform-dependent in kernel 2020-03-02 09:43:30 -08:00
spa_log_spacemap.c Make module tunables cross platform 2019-09-05 14:49:49 -07:00
spa_misc.c Add device rebuild feature 2020-07-03 11:05:50 -07:00
spa.c Introduce names for ZTHRs 2020-07-29 09:43:33 -07:00
space_map.c Rename refcount.h to zfs_refcount.h 2020-07-29 16:35:33 -07:00
space_reftree.c Reduce loaded range tree memory usage 2019-10-09 10:36:03 -07:00
THIRDPARTYLICENSE.cityhash OpenZFS 8484 - Implement aggregate sum and use for arc counters 2018-06-06 09:35:59 -07:00
THIRDPARTYLICENSE.cityhash.descrip OpenZFS 8484 - Implement aggregate sum and use for arc counters 2018-06-06 09:35:59 -07:00
txg.c Use boot_ncpus in place of max_ncpus in taskq_create 2020-05-20 10:07:21 -07:00
uberblock.c MMP interval and fail_intervals in uberblock 2019-03-21 12:47:57 -07:00
unique.c Reduce loaded range tree memory usage 2019-10-09 10:36:03 -07:00
vdev_cache.c Replace ASSERTV macro with compiler annotation 2019-12-05 12:37:00 -08:00
vdev_indirect_births.c Fixes: #8934 Large kmem_alloc 2019-07-10 15:54:49 -07:00
vdev_indirect_mapping.c Replace ASSERTV macro with compiler annotation 2019-12-05 12:37:00 -08:00
vdev_indirect.c Rename refcount.h to zfs_refcount.h 2020-07-29 16:35:33 -07:00
vdev_initialize.c Rename refcount.h to zfs_refcount.h 2020-07-29 16:35:33 -07:00
vdev_label.c Add device rebuild feature 2020-07-03 11:05:50 -07:00
vdev_mirror.c Add device rebuild feature 2020-07-03 11:05:50 -07:00
vdev_missing.c Update vdev_ops_t from illumos 2019-06-20 18:29:02 -07:00
vdev_queue.c Add device rebuild feature 2020-07-03 11:05:50 -07:00
vdev_raidz_math_aarch64_neon_common.h Minor performance fix for NEON RAID-Z 2019-12-17 19:34:52 -08:00
vdev_raidz_math_aarch64_neon.c Linux 5.0 compat: SIMD compatibility 2019-07-12 09:31:20 -07:00
vdev_raidz_math_aarch64_neonx2.c Linux 5.0 compat: SIMD compatibility 2019-07-12 09:31:20 -07:00
vdev_raidz_math_avx2.c OpenZFS restructuring - move platform specific headers 2019-09-05 09:34:54 -07:00
vdev_raidz_math_avx512bw.c Refactor ccompile.h to not include system headers 2020-07-25 20:09:50 -07:00
vdev_raidz_math_avx512f.c Make clang happy with vdev_raidz_ code 2019-10-10 09:45:37 -07:00
vdev_raidz_math_impl.h Fix const-correctness in raidz math 2020-02-03 10:52:41 -08:00
vdev_raidz_math_powerpc_altivec_common.h Add AltiVec RAID-Z 2020-01-23 11:01:24 -08:00
vdev_raidz_math_powerpc_altivec.c Prefix zfs internal endian checks with _ZFS 2020-07-28 13:02:49 -07:00
vdev_raidz_math_scalar.c Linux 5.3: Fix switch() fall though compiler errors 2019-08-21 09:29:23 -07:00
vdev_raidz_math_sse2.c Make clang happy with vdev_raidz_ code 2019-10-10 09:45:37 -07:00
vdev_raidz_math_ssse3.c Refactor ccompile.h to not include system headers 2020-07-25 20:09:50 -07:00
vdev_raidz_math.c FreeBSD: Fixes required to build ZFS on PowerPC 2020-07-25 11:00:23 -07:00
vdev_raidz.c Fix typos 2020-06-09 21:24:09 -07:00
vdev_rebuild.c Add device rebuild feature 2020-07-03 11:05:50 -07:00
vdev_removal.c Trim L2ARC 2020-06-09 10:15:08 -07:00
vdev_root.c Enable splitting mirrors with indirect vdevs 2020-05-06 10:32:28 -07:00
vdev_trim.c Rename refcount.h to zfs_refcount.h 2020-07-29 16:35:33 -07:00
vdev.c Fix error handling of vdev_top_zap 2020-07-29 17:04:34 -07:00
zap_leaf.c Refactor dnode dirty context from dbuf_dirty 2020-02-26 16:09:17 -08:00
zap_micro.c Rename refcount.h to zfs_refcount.h 2020-07-29 16:35:33 -07:00
zap.c Rename refcount.h to zfs_refcount.h 2020-07-29 16:35:33 -07:00
zcp_get.c Add include files for prototypes 2020-06-18 12:21:25 -07:00
zcp_global.c OpenZFS 8600 - ZFS channel programs - snapshot 2018-02-08 15:29:24 -08:00
zcp_iter.c Fix typos in module/zfs/ 2019-09-02 17:56:41 -07:00
zcp_set.c Support setting user properties in a channel program 2020-02-14 13:41:42 -08:00
zcp_synctask.c filesystem_limit/snapshot_limit is incorrectly enforced against root 2020-07-11 17:18:02 -07:00
zcp.c filesystem_limit/snapshot_limit is incorrectly enforced against root 2020-07-11 17:18:02 -07:00
zfeature.c Replace ASSERTV macro with compiler annotation 2019-12-05 12:37:00 -08:00
zfs_byteswap.c Mark functions as static 2020-06-18 12:20:38 -07:00
zfs_fm.c Add zpool status -s (slow I/Os) and -p (parseable) 2018-11-08 16:47:24 -08:00
zfs_fuid.c Rename refcount.h to zfs_refcount.h 2020-07-29 16:35:33 -07:00
zfs_ioctl.c Make use of ZFS_DEBUG consistent within kmod sources 2020-07-25 20:07:44 -07:00
zfs_log.c Add prototypes 2020-06-18 12:21:32 -07:00
zfs_onexit.c Remove deduplicated send/receive code 2020-04-23 10:06:57 -07:00
zfs_quota.c File incorrectly zeroed when receiving incremental stream that toggles -L 2020-06-09 10:41:01 -07:00
zfs_ratelimit.c Change checksum & IO delay ratelimit values 2018-03-04 17:34:51 -08:00
zfs_replay.c Simplify FreeBSD's locking requirements in zfs_replay.c 2020-01-22 17:55:56 -08:00
zfs_rlock.c Add a "try" operation for range locks 2020-07-06 11:53:31 -07:00
zfs_sa.c Add convenience wrappers for common uio usage 2020-06-14 10:09:55 -07:00
zil.c Mark functions as static 2020-06-18 12:20:38 -07:00
zio_checksum.c Mark functions as static 2020-06-18 12:20:38 -07:00
zio_compress.c zio_decompress_data always ASSERTs successful decompression 2019-12-10 15:51:58 -08:00
zio_inject.c Replace ASSERTV macro with compiler annotation 2019-12-05 12:37:00 -08:00
zio.c Mark functions as static 2020-06-18 12:20:38 -07:00
zle.c Add include files for prototypes 2020-06-18 12:21:25 -07:00
zrlock.c Remove dead code 2020-06-18 12:21:18 -07:00
zthr.c Introduce names for ZTHRs 2020-07-29 09:43:33 -07:00
zvol.c Fix typos 2020-06-09 21:24:09 -07:00