mirror_zfs/module/zfs
Matthew Ahrens df4474f92d Illumos #3805 arc shouldn't cache freed blocks
3805 arc shouldn't cache freed blocks
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Reviewed by: Richard Elling <richard.elling@dey-sys.com>
Reviewed by: Will Andrews <will@firepipe.net>
Approved by: Dan McDonald <danmcd@nexenta.com>

References:
  illumos/illumos-gate@6e6d5868f5
  https://www.illumos.org/issues/3805

ZFS should proactively evict freed blocks from the cache.

On dcenter, we saw that we were caching ~256GB of metadata, while the
pool only had <4GB of metadata on disk.  We were wasting about half the
system's RAM (252GB) on blocks that have been freed.

Even though these freed blocks will never be used again, and thus will
eventually be evicted, this causes us to use memory inefficiently for 2
reasons:

1. A block that is freed has no chance of being accessed again, but will
be kept in memory preferentially to a block that was accessed before it
(and is thus older) but has not been freed and thus has at least some
chance of being accessed again.

2. We partition the ARC into several buckets:
user data that has been accessed only once (MRU)
metadata that has been accessed only once (MRU)
user data that has been accessed more than once (MFU)
metadata that has been accessed more than once (MFU)

The user data vs metadata split is somewhat arbitrary, and the primary
control on how much memory is used to cache data vs metadata is to
simply try to keep the proportion the same as it has been in the past
(each bucket "evicts against" itself).  The secondary control is to
evict data before evicting metadata.

Because of this bucketing, we may end up with one bucket mostly
containing freed blocks that are very old, while another bucket has more
recently accessed, still-allocated blocks.  Data in the useful bucket
(with still-allocated blocks) may be evicted in preference to data in
the useless bucket (with old, freed blocks).

On dcenter, we saw that the MFU metadata bucket was 230MB, while the MFU
data bucket was 27GB and the MRU metadata bucket was 256GB.  However,
the vast majority of data in the MRU metadata bucket (256GB) was freed
blocks, and thus useless.  Meanwhile, the MFU metadata bucket (230MB)
was constantly evicting useful blocks that will be soon needed.

The problem of cache segmentation is a larger problem that needs more
investigation.  However, if we stop caching freed blocks, it should
reduce the impact of this more fundamental issue.

Ported-by: Richard Yao <ryao@cs.stonybrook.edu>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #1503
2013-06-20 09:55:52 -07:00
..
arc.c Illumos #3805 arc shouldn't cache freed blocks 2013-06-20 09:55:52 -07:00
bplist.c Switch KM_SLEEP to KM_PUSHPAGE 2012-08-27 12:01:37 -07:00
bpobj.c Illumos #3006 2013-06-19 15:14:10 -07:00
bptree.c Illumos #3006 2013-06-19 15:14:10 -07:00
dbuf.c Illumos #3006 2013-06-19 15:14:10 -07:00
ddt_zap.c Add ddt_object_count() error handling 2012-10-29 08:57:45 -07:00
ddt.c Fix incorrect assertions in ddt_phys_decref and ddt_sync_entry 2013-05-06 14:10:55 -07:00
dmu_diff.c Update to onnv_147 2010-08-26 14:24:34 -07:00
dmu_object.c Add linux kernel module support 2010-08-31 13:41:58 -07:00
dmu_objset.c Switch KM_SLEEP to KM_PUSHPAGE 2013-02-06 11:19:58 -08:00
dmu_send.c Illumos #3006 2013-06-19 15:14:10 -07:00
dmu_traverse.c Illumos #3006 2013-06-19 15:14:10 -07:00
dmu_tx.c Illumos #3006 2013-06-19 15:14:10 -07:00
dmu_zfetch.c Switch KM_SLEEP to KM_PUSHPAGE 2012-08-27 12:01:37 -07:00
dmu.c Illumos #3086: unnecessarily setting DS_FLAG_INCONSISTENT on async 2013-01-08 10:35:43 -08:00
dnode_sync.c Illumos #3006 2013-06-19 15:14:10 -07:00
dnode.c Illumos #3006 2013-06-19 15:14:10 -07:00
dsl_dataset.c Illumos #3006 2013-06-19 15:14:10 -07:00
dsl_deadlist.c Illumos #3104: eliminate empty bpobjs 2013-01-08 10:35:43 -08:00
dsl_deleg.c Illumos #2619 and #2747 2013-01-08 10:35:35 -08:00
dsl_dir.c Illumos #3006 2013-06-19 15:14:10 -07:00
dsl_pool.c Illumos #3006 2013-06-19 15:14:10 -07:00
dsl_prop.c Switch KM_SLEEP to KM_PUSHPAGE 2012-09-17 11:22:23 -07:00
dsl_scan.c Illumos #2619 and #2747 2013-01-08 10:35:35 -08:00
dsl_synctask.c Illumos #3006 2013-06-19 15:14:10 -07:00
fm.c Condition variable usage, zevent_cv 2012-10-15 16:01:54 -07:00
gzip.c Fix zmod.h usage in userspace 2010-08-31 08:38:46 -07:00
lz4.c Linux 3.9 compat: Undefine GCC_VERSION 2013-03-06 15:48:48 -08:00
lzjb.c Switch KM_SLEEP to KM_PUSHPAGE 2012-08-27 12:01:37 -07:00
Makefile.in Illumos #3035 LZ4 compression support in ZFS and GRUB 2013-01-29 09:28:20 -08:00
metaslab.c Illumos #3552, #3564 2013-06-19 16:22:39 -07:00
refcount.c Switch KM_SLEEP to KM_PUSHPAGE 2012-08-27 12:01:37 -07:00
rrwlock.c Enable rrwlock.c compilation 2010-12-07 16:05:25 -08:00
sa.c Constify structures containing function pointers 2013-03-04 08:49:32 -08:00
sha256.c Add linux sha2 support 2010-08-31 13:41:59 -07:00
spa_boot.c Add linux kernel module support 2010-08-31 13:41:58 -07:00
spa_config.c Illumos #2619 and #2747 2013-01-08 10:35:35 -08:00
spa_errlog.c Add linux kernel module support 2010-08-31 13:41:58 -07:00
spa_history.c Switch KM_SLEEP to KM_PUSHPAGE 2012-08-27 12:01:37 -07:00
spa_misc.c Illumos #3329, #3330, #3331, #3335 2013-05-06 12:39:34 -07:00
spa.c Illumos #3006 2013-06-19 15:14:10 -07:00
space_map.c Illumos #3552, #3564 2013-06-19 16:22:39 -07:00
txg.c Fix txg_quiesce thread deadlock 2013-04-26 14:42:36 -07:00
uberblock.c Update core ZFS code from build 121 to build 141. 2010-05-28 13:45:14 -07:00
unique.c Switch KM_SLEEP to KM_PUSHPAGE 2012-08-27 12:01:37 -07:00
vdev_cache.c Switch KM_SLEEP to KM_PUSHPAGE 2012-08-27 12:01:37 -07:00
vdev_disk.c 3246 ZFS I/O deadman thread 2013-05-01 17:05:52 -07:00
vdev_file.c Illumos #3581 spa_zio_taskq[ZIO_TYPE_FREE][ZIO_TASKQ_ISSUE]->tq_lock contention 2013-05-06 14:05:37 -07:00
vdev_label.c Illumos #3090 and #3102 2013-01-08 10:35:42 -08:00
vdev_mirror.c Illumos #1948: zpool list should show more detailed pool info 2012-09-19 13:39:05 -07:00
vdev_missing.c Illumos #1948: zpool list should show more detailed pool info 2012-09-19 13:39:05 -07:00
vdev_queue.c 3246 ZFS I/O deadman thread 2013-05-01 17:05:52 -07:00
vdev_raidz.c Illumos #3006 2013-06-19 15:14:10 -07:00
vdev_root.c Illumos #1948: zpool list should show more detailed pool info 2012-09-19 13:39:05 -07:00
vdev.c Illumos #3552, #3564 2013-06-19 16:22:39 -07:00
zap_leaf.c Switch KM_SLEEP to KM_PUSHPAGE 2012-09-05 08:44:58 -07:00
zap_micro.c Illumos #3006 2013-06-19 15:14:10 -07:00
zap.c Illumos #3006 2013-06-19 15:14:10 -07:00
zfeature_common.c Illumos #3035 LZ4 compression support in ZFS and GRUB 2013-01-29 09:28:20 -08:00
zfeature.c Illumos #3104: eliminate empty bpobjs 2013-01-08 10:35:43 -08:00
zfs_acl.c Avoid gcc -Werror=maybe-uninitialized warnings 2013-01-28 09:10:29 -08:00
zfs_byteswap.c Add linux kernel module support 2010-08-31 13:41:58 -07:00
zfs_ctldir.c Use dsl_dataset_snap_lookup() 2013-01-25 15:07:40 -08:00
zfs_debug.c Illumos #3006 2013-06-19 15:14:10 -07:00
zfs_dir.c Trivial spelling fix 2013-04-19 15:43:16 -07:00
zfs_fm.c 3246 ZFS I/O deadman thread 2013-05-01 17:05:52 -07:00
zfs_fuid.c Drop HAVE_XVATTR macros 2011-03-02 11:44:34 -08:00
zfs_ioctl.c Illumos #3006 2013-06-19 15:14:10 -07:00
zfs_log.c Revert "Remove TSD zfs_fsyncer_key" 2012-12-20 09:56:28 -08:00
zfs_onexit.c Add linux kernel device support 2010-08-31 13:41:50 -07:00
zfs_replay.c Constify structures containing function pointers 2013-03-04 08:49:32 -08:00
zfs_rlock.c Illumos #3006 2013-06-19 15:14:10 -07:00
zfs_sa.c Revert "Use SA_HDL_PRIVATE for SA xattrs" 2012-08-25 09:25:56 -07:00
zfs_vfsops.c Illumos #3006 2013-06-19 15:14:10 -07:00
zfs_vnops.c Illumos #3006 2013-06-19 15:14:10 -07:00
zfs_znode.c Illumos #3006 2013-06-19 15:14:10 -07:00
zil.c Illumos #3006 2013-06-19 15:14:10 -07:00
zio_checksum.c Update core ZFS code from build 121 to build 141. 2010-05-28 13:45:14 -07:00
zio_compress.c Illumos #3035 LZ4 compression support in ZFS and GRUB 2013-01-29 09:28:20 -08:00
zio_inject.c 3246 ZFS I/O deadman thread 2013-05-01 17:05:52 -07:00
zio.c Illumos #3805 arc shouldn't cache freed blocks 2013-06-20 09:55:52 -07:00
zle.c Update core ZFS code from build 121 to build 141. 2010-05-28 13:45:14 -07:00
zpl_ctldir.c Add d_clear_d_op() compatibility 2013-01-23 16:33:29 -08:00
zpl_export.c Implement .commit_metadata hook for NFS export 2012-10-03 10:49:45 -07:00
zpl_file.c Remove .readdir from zpl_file_operations table 2013-04-19 15:36:47 -07:00
zpl_inode.c Add explicit MAXNAMELEN check 2013-02-12 10:27:39 -08:00
zpl_super.c Update SAs when an inode is dirtied 2012-12-14 12:18:54 -08:00
zpl_xattr.c Only check directory xattr on ENOENT 2013-05-10 12:24:56 -07:00
zrlock.c Export ZFS symbols needed by Lustre. 2010-09-17 16:24:15 -07:00
zvol.c Add snapdev=[hidden|visible] dataset property 2013-03-05 12:37:54 -08:00