mirror_zfs/module/zfs
Alexander Motin 891568c990
Split dmu_zfetch() speculation and execution parts
To make better predictions on parallel workloads, dmu_zfetch() should
be called as early as possible to reduce possible request reordering.
In particular, it should be called before dmu_buf_hold_array_by_dnode()
calls dbuf_hold(), which may sleep waiting for indirect blocks and
then wake up multiple threads at the same time on completion.  That
can significantly reorder the requests, making the stream look
random.  But we should not issue prefetch requests before the
on-demand ones, since they may get to the disks first despite the I/O
scheduler, increasing on-demand request latency.

This patch splits dmu_zfetch() into two functions: dmu_zfetch_prepare()
and dmu_zfetch_run().  The first can be executed as early as needed.
It only updates statistics and makes predictions without issuing any
I/Os.  The I/O issuance is handled by dmu_zfetch_run(), which can be
called later when all on-demand I/Os are already issued.  It even
tracks the activity of other concurrent threads, issuing the prefetch
only when _all_ on-demand requests are issued.
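
As a rough illustration of the split described above, here is a minimal
single-threaded sketch.  It is not the code from this patch: the
toy_stream_t type, its fields, the TOY_DISTANCE constant and both
toy_zfetch_*() functions are hypothetical names invented for the
example, and the locking that the real code does under zf_lock is left
out.  The prepare step only advances the prediction and counts a
pending on-demand request; the run step reports blocks to prefetch only
once the last pending request has been matched.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical prefetch distance, in blocks, for this sketch only. */
#define	TOY_DISTANCE	8

/* Hypothetical toy stream; not the real stream structure layout. */
typedef struct toy_stream {
	uint64_t next_blkid;	/* block the next sequential read starts at */
	uint64_t planned_end;	/* speculation wants prefetch up to here */
	uint64_t issued_end;	/* prefetch was already issued up to here */
	int	 pending;	/* prepare calls not yet matched by run */
} toy_stream_t;

/*
 * Speculation part: update the prediction and issue no I/O.
 * Returns the stream if the access continues it, NULL otherwise.
 */
toy_stream_t *
toy_zfetch_prepare(toy_stream_t *zs, uint64_t blkid, uint64_t nblks)
{
	if (blkid != zs->next_blkid)
		return (NULL);		/* not sequential: no prediction */
	zs->next_blkid = blkid + nblks;
	if (zs->next_blkid + TOY_DISTANCE > zs->planned_end)
		zs->planned_end = zs->next_blkid + TOY_DISTANCE;
	zs->pending++;
	return (zs);
}

/*
 * Execution part: called after the caller issued its on-demand I/O.
 * Returns how many blocks of prefetch would be issued now; this is
 * zero while other on-demand requests are still pending.
 */
uint64_t
toy_zfetch_run(toy_stream_t *zs)
{
	uint64_t start, count;

	if (zs == NULL)
		return (0);
	assert(zs->pending > 0);
	if (--zs->pending > 0)
		return (0);		/* somebody else is still issuing */
	/* Skip blocks already prefetched or already read on demand. */
	start = zs->issued_end > zs->next_blkid ?
	    zs->issued_end : zs->next_blkid;
	if (start >= zs->planned_end)
		return (0);
	count = zs->planned_end - start;
	zs->issued_end = zs->planned_end;
	return (count);
}
```

In this sketch, two readers that prepare blocks 0-3 and 4-7 back to
back both get the stream; the first toy_zfetch_run() returns 0 and the
second returns the whole planned range, so no prefetch is reported
ahead of an on-demand request.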

For many years this was a big problem for storage servers handling
deeper request queues from their clients.  They had to either
serialize sequential reads to make the ZFS prefetcher usable, or
execute the incoming requests as-is and get almost no prefetch from
ZFS, relying only on deep enough prefetch by the clients.  Benefits of
those ways varied, but neither was perfect.  With this patch, deeper
queue sequential read benchmarks with CrystalDiskMark from Windows via
iSCSI to a FreeBSD target show me much better throughput with an
almost 100% prefetcher hit rate, compared to almost zero before.

While there, I also removed the per-stream zs_lock as useless, since
it is completely covered by the parent zf_lock.  I also reused the
zs_blocks refcount to track zf_stream linkage of the stream, since I
believe the previous zs_fetch == NULL check in
dmu_zfetch_stream_done() was racy.

Delete prefetch streams when they reach the ends of files.  It saves
up to 1KB of RAM per file and reduces searches through the stream
list.

Block data prefetch (speculation and indirect block prefetch are still
done since they are cheaper) if all dbufs of the stream are already
in the DMU cache.  The first cache miss immediately fires all the
prefetch that would have been done for the stream by that time.  It
saves some CPU time if the same files within DMU cache capacity are
read over and over.
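
The deferral described above can be pictured with another small
sketch.  Again, this is only an illustration under assumed names
(toy_cstream_t, toy_speculate() and toy_run_cached() are hypothetical
and do not come from this patch): speculation keeps moving the target
forward, but no data prefetch is reported until an on-demand read
misses the cache, at which point everything deferred so far is
released at once.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy stream for the cached-file case. */
typedef struct toy_cstream {
	uint64_t pf_pos;	/* next block data prefetch would issue */
	uint64_t pf_target;	/* speculation wants prefetch up to here */
	bool	 missed;	/* has any on-demand read missed the cache? */
} toy_cstream_t;

/* Speculation is always done: only move the target forward. */
void
toy_speculate(toy_cstream_t *zs, uint64_t new_target)
{
	if (new_target > zs->pf_target)
		zs->pf_target = new_target;
}

/*
 * Execution part.  'miss' says whether the on-demand read just issued
 * had to go to disk.  Returns the number of blocks prefetched now.
 */
uint64_t
toy_run_cached(toy_cstream_t *zs, bool miss)
{
	uint64_t count;

	if (miss)
		zs->missed = true;
	if (!zs->missed)
		return (0);	/* everything was cached: keep deferring */
	/* Release all prefetch deferred so far, then stay caught up. */
	count = zs->pf_target - zs->pf_pos;
	zs->pf_pos = zs->pf_target;
	return (count);
}
```

While every read hits the cache, toy_run_cached() returns 0 and only
the cheap bookkeeping in toy_speculate() is paid; after the first miss
the stream behaves like an ordinary prefetch stream.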

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Adam Moss <c@yotes.com>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #11652
2021-03-19 22:56:11 -07:00
abd.c Fix abd_get_offset_struct() may allocate new abd 2021-03-05 12:22:57 -08:00
aggsum.c Implement memory and CPU hotplug 2020-12-10 14:09:23 -08:00
arc.c Restore FreeBSD resource usage accounting 2021-02-19 22:34:33 -08:00
blkptr.c Add zstd support to zfs 2020-08-20 10:30:06 -07:00
bplist.c Fast Clone Deletion 2019-07-26 10:54:14 -07:00
bpobj.c Rename refcount.h to zfs_refcount.h 2020-07-29 16:35:33 -07:00
bptree.c Rename refcount.h to zfs_refcount.h 2020-07-29 16:35:33 -07:00
bqueue.c Implement Redacted Send/Receive 2019-06-19 09:48:12 -07:00
btree.c Fix typo in btree.c 2020-08-17 15:25:37 -07:00
dataset_kstats.c Fix panic on DilOS with kstat per dataset statistics 2019-09-03 12:12:31 -07:00
dbuf_stats.c Eliminate gratuitous bzeroing in dbuf_stats_hash_table_data 2020-09-30 13:24:38 -07:00
dbuf.c Split dmu_zfetch() speculation and execution parts 2021-03-19 22:56:11 -07:00
ddt_zap.c Refactor dnode dirty context from dbuf_dirty 2020-02-26 16:09:17 -08:00
ddt.c Remove dead code 2020-06-18 12:21:18 -07:00
dmu_diff.c Mark write_record static 2019-12-03 09:51:44 -08:00
dmu_object.c Introduce CPU_SEQID_UNSTABLE 2020-11-02 11:51:12 -08:00
dmu_objset.c Relax special_small_blocks assertion. 2021-01-23 15:45:27 -08:00
dmu_recv.c implicit conversion from 'boolean_t' to 'ds_hold_flags_t' 2020-12-27 16:31:02 -08:00
dmu_redact.c Fix dnode refcount tracking 2020-11-10 10:37:10 -08:00
dmu_send.c implicit conversion from 'boolean_t' to 'ds_hold_flags_t' 2020-12-27 16:31:02 -08:00
dmu_traverse.c zil_parse: make callback parameters const 2020-10-09 09:34:54 -07:00
dmu_tx.c Document monotonicity of dmu_tx_assign() and txg_hold_open() 2021-02-02 10:11:37 -08:00
dmu_zfetch.c Split dmu_zfetch() speculation and execution parts 2021-03-19 22:56:11 -07:00
dmu.c Split dmu_zfetch() speculation and execution parts 2021-03-19 22:56:11 -07:00
dnode_sync.c Improve zfs receive performance with lightweight write 2020-12-11 10:26:02 -08:00
dnode.c Improve zfs receive performance with lightweight write 2020-12-11 10:26:02 -08:00
dsl_bookmark.c Fix kernel panic induced by redacted send 2020-12-11 10:22:29 -08:00
dsl_crypt.c Fix raw sends on encrypted datasets when copying back snapshots 2020-12-04 14:34:29 -08:00
dsl_dataset.c Revert special case code from pre-hashtable nvlist era 2021-01-27 21:31:51 -08:00
dsl_deadlist.c Fix i/o error handling of livelists and zap iteration 2020-08-05 10:22:09 -07:00
dsl_deleg.c Reduce loaded range tree memory usage 2019-10-09 10:36:03 -07:00
dsl_destroy.c Revert special case code from pre-hashtable nvlist era 2021-01-27 21:31:51 -08:00
dsl_dir.c Add 'zfs rename -u' to rename without remounting 2020-09-01 16:14:16 -07:00
dsl_pool.c dsl_pool: extend comment on DSL Pool Configuration Lock 2020-12-19 18:04:05 -08:00
dsl_prop.c Replace sprintf()->snprintf() and strcpy()->strlcpy() 2020-06-07 11:42:12 -07:00
dsl_scan.c Checksum errors may not be counted 2021-02-19 22:33:15 -08:00
dsl_synctask.c nowait synctask must succeed 2020-09-04 10:29:39 -07:00
dsl_userhold.c Replace sprintf()->snprintf() and strcpy()->strlcpy() 2020-06-07 11:42:12 -07:00
edonr_zfs.c Add include files for prototypes 2020-06-18 12:21:25 -07:00
fm.c Avoid posting duplicate zpool events 2020-09-04 10:34:28 -07:00
gzip.c Add include files for prototypes 2020-06-18 12:21:25 -07:00
hkdf.c Encryption patch follow-up 2017-10-11 16:54:48 -04:00
lz4.c Prefix zfs internal endian checks with _ZFS 2020-07-28 13:02:49 -07:00
lzjb.c Add include files for prototypes 2020-06-18 12:21:25 -07:00
Makefile.in Distributed Spare (dRAID) Feature 2020-11-13 13:51:51 -08:00
metaslab.c Initialize metaslab range trees in metaslab_init 2021-03-19 22:36:02 -07:00
mmp.c Distributed Spare (dRAID) Feature 2020-11-13 13:51:51 -08:00
multilist.c Implement memory and CPU hotplug 2020-12-10 14:09:23 -08:00
objlist.c Implement Redacted Send/Receive 2019-06-19 09:48:12 -07:00
pathname.c Replace ZFS on Linux references with OpenZFS 2020-10-08 20:10:13 -07:00
range_tree.c Fix incorrect deletion order in range_tree_add_impl gap case 2020-10-14 08:59:54 -07:00
refcount.c Reference_tracking_enable should be a module param 2021-03-16 14:56:17 -07:00
rrwlock.c Rename refcount.h to zfs_refcount.h 2020-07-29 16:35:33 -07:00
sa.c Extending FreeBSD UIO Struct 2021-01-20 21:27:30 -08:00
sha256.c Add include files for prototypes 2020-06-18 12:21:25 -07:00
skein_zfs.c Add include files for prototypes 2020-06-18 12:21:25 -07:00
spa_boot.c Add include files for prototypes 2020-06-18 12:21:25 -07:00
spa_checkpoint.c Refactor dnode dirty context from dbuf_dirty 2020-02-26 16:09:17 -08:00
spa_config.c Cleaning up uio headers 2021-02-20 20:16:50 -08:00
spa_errlog.c Fix typos in module/zfs/ 2019-09-02 17:56:41 -07:00
spa_history.c record ioctl elapsed time in zpool history 2021-01-11 09:29:25 -08:00
spa_log_spacemap.c Make module tunables cross platform 2019-09-05 14:49:49 -07:00
spa_misc.c FreeBSD: Fix scope of deadman tunables 2021-03-11 19:23:24 -08:00
spa_stats.c FreeBSD: Add support for procfs_list 2020-09-23 16:43:51 -07:00
spa.c Add "compatibility" property for zpool feature sets 2021-02-17 21:30:45 -08:00
space_map.c Rename refcount.h to zfs_refcount.h 2020-07-29 16:35:33 -07:00
space_reftree.c Reduce loaded range tree memory usage 2019-10-09 10:36:03 -07:00
THIRDPARTYLICENSE.cityhash OpenZFS 8484 - Implement aggregate sum and use for arc counters 2018-06-06 09:35:59 -07:00
THIRDPARTYLICENSE.cityhash.descrip OpenZFS 8484 - Implement aggregate sum and use for arc counters 2018-06-06 09:35:59 -07:00
txg.c Document monotonicity of dmu_tx_assign() and txg_hold_open() 2021-02-02 10:11:37 -08:00
uberblock.c MMP interval and fail_intervals in uberblock 2019-03-21 12:47:57 -07:00
unique.c Reduce loaded range tree memory usage 2019-10-09 10:36:03 -07:00
vdev_cache.c Replace ASSERTV macro with compiler annotation 2019-12-05 12:37:00 -08:00
vdev_draid_rand.c Distributed Spare (dRAID) Feature 2020-11-13 13:51:51 -08:00
vdev_draid.c Clean up RAIDZ/DRAID ereport code 2021-03-19 16:22:10 -07:00
vdev_indirect_births.c Fixes: #8934 Large kmem_alloc 2019-07-10 15:54:49 -07:00
vdev_indirect_mapping.c Replace ASSERTV macro with compiler annotation 2019-12-05 12:37:00 -08:00
vdev_indirect.c Clean up RAIDZ/DRAID ereport code 2021-03-19 16:22:10 -07:00
vdev_initialize.c Cancel TRIM / initialize on FAULTED non-writeable vdevs 2021-03-02 10:27:27 -08:00
vdev_label.c Parallelize vdev_validate 2021-01-26 19:36:51 -08:00
vdev_mirror.c Clean up RAIDZ/DRAID ereport code 2021-03-19 16:22:10 -07:00
vdev_missing.c Distributed Spare (dRAID) Feature 2020-11-13 13:51:51 -08:00
vdev_queue.c allow callers to allocate and provide the abd_t struct 2021-01-20 11:24:37 -08:00
vdev_raidz_math_aarch64_neon_common.h FreeBSD: fix the build with Clang 11 2020-08-17 15:40:17 -07:00
vdev_raidz_math_aarch64_neon.c Linux 5.0 compat: SIMD compatibility 2019-07-12 09:31:20 -07:00
vdev_raidz_math_aarch64_neonx2.c Linux 5.0 compat: SIMD compatibility 2019-07-12 09:31:20 -07:00
vdev_raidz_math_avx2.c FreeBSD: fix the build with Clang 11 2020-08-17 15:40:17 -07:00
vdev_raidz_math_avx512bw.c Refactor ccompile.h to not include system headers 2020-07-25 20:09:50 -07:00
vdev_raidz_math_avx512f.c FreeBSD: fix the build with Clang 11 2020-08-17 15:40:17 -07:00
vdev_raidz_math_impl.h Distributed Spare (dRAID) Feature 2020-11-13 13:51:51 -08:00
vdev_raidz_math_powerpc_altivec_common.h FreeBSD: fix the build with Clang 11 2020-08-17 15:40:17 -07:00
vdev_raidz_math_powerpc_altivec.c Prefix zfs internal endian checks with _ZFS 2020-07-28 13:02:49 -07:00
vdev_raidz_math_scalar.c Linux 5.3: Fix switch() fall though compiler errors 2019-08-21 09:29:23 -07:00
vdev_raidz_math_sse2.c FreeBSD: fix the build with Clang 11 2020-08-17 15:40:17 -07:00
vdev_raidz_math_ssse3.c Refactor ccompile.h to not include system headers 2020-07-25 20:09:50 -07:00
vdev_raidz_math.c Reduce fletcher4 and raidz benchmark times 2020-12-06 09:57:20 -08:00
vdev_raidz.c Clean up RAIDZ/DRAID ereport code 2021-03-19 16:22:10 -07:00
vdev_rebuild.c Fix vdev_rebuild_thread deadlock 2021-02-24 10:01:00 -08:00
vdev_removal.c Set aside a metaslab for ZIL blocks 2021-01-21 15:12:54 -08:00
vdev_root.c Distributed Spare (dRAID) Feature 2020-11-13 13:51:51 -08:00
vdev_trim.c Cancel TRIM / initialize on FAULTED non-writeable vdevs 2021-03-02 10:27:27 -08:00
vdev.c Allow setting bootfs property on pools with indirect vdevs 2021-03-19 22:46:43 -07:00
zap_leaf.c Refactor dnode dirty context from dbuf_dirty 2020-02-26 16:09:17 -08:00
zap_micro.c Rename refcount.h to zfs_refcount.h 2020-07-29 16:35:33 -07:00
zap.c Rename refcount.h to zfs_refcount.h 2020-07-29 16:35:33 -07:00
zcp_get.c Add include files for prototypes 2020-06-18 12:21:25 -07:00
zcp_global.c OpenZFS 8600 - ZFS channel programs - snapshot 2018-02-08 15:29:24 -08:00
zcp_iter.c Fix typos in module/zfs/ 2019-09-02 17:56:41 -07:00
zcp_set.c Support setting user properties in a channel program 2020-02-14 13:41:42 -08:00
zcp_synctask.c filesystem_limit/snapshot_limit is incorrectly enforced against root 2020-07-11 17:18:02 -07:00
zcp.c Channel program may spuriously fail with "memory limit exhausted" 2020-11-11 17:16:15 -08:00
zfeature.c Throw const on some strings 2020-10-02 17:44:10 -07:00
zfs_byteswap.c Mark functions as static 2020-06-18 12:20:38 -07:00
zfs_fm.c Clean up RAIDZ/DRAID ereport code 2021-03-19 16:22:10 -07:00
zfs_fuid.c Fix regression in POSIX mode behavior 2021-03-19 22:50:46 -07:00
zfs_ioctl.c Macroify teardown lock handling 2021-03-12 15:51:39 -08:00
zfs_log.c Fix zfs_get_data access to files with wrong generation 2021-03-19 22:53:31 -07:00
zfs_onexit.c Remove deduplicated send/receive code 2020-04-23 10:06:57 -07:00
zfs_quota.c File incorrectly zeroed when receiving incremental stream that toggles -L 2020-06-09 10:41:01 -07:00
zfs_ratelimit.c Change checksum & IO delay ratelimit values 2018-03-04 17:34:51 -08:00
zfs_replay.c Simplify FreeBSD's locking requirements in zfs_replay.c 2020-01-22 17:55:56 -08:00
zfs_rlock.c Add a "try" operation for range locks 2020-07-06 11:53:31 -07:00
zfs_sa.c Extending FreeBSD UIO Struct 2021-01-20 21:27:30 -08:00
zfs_vnops.c Fix zfs_get_data access to files with wrong generation 2021-03-19 22:53:31 -07:00
zil.c Fix zfs_get_data access to files with wrong generation 2021-03-19 22:53:31 -07:00
zio_checksum.c Mark functions as static 2020-06-18 12:20:38 -07:00
zio_compress.c Avoid symbol collision with in-kernel zstdlib 2020-08-24 12:20:41 -07:00
zio_inject.c Distributed Spare (dRAID) Feature 2020-11-13 13:51:51 -08:00
zio.c Clean up RAIDZ/DRAID ereport code 2021-03-19 16:22:10 -07:00
zle.c Add include files for prototypes 2020-06-18 12:21:25 -07:00
zrlock.c Remove dead code 2020-06-18 12:21:18 -07:00
zthr.c Retain thread name when resuming a zthr 2020-09-03 20:09:52 -07:00
zvol.c Fix zfs_get_data access to files with wrong generation 2021-03-19 22:53:31 -07:00