Wire O_DIRECT also to Uncached I/O (#17218)

Before Direct I/O was implemented, I implemented a lighter version
that I called Uncached I/O.  It uses the normal DMU/ARC data path with
some optimizations, but evicts data from the caches as soon as is
possible and reasonable.  Originally I wired it only to the
primarycache property, but now I am completing the integration all
the way up to the VFS.

While Direct I/O has the lowest possible memory bandwidth usage,
it also has a significant number of limitations.  It requires I/Os
to be page aligned, does not allow speculative prefetch, etc.
Uncached I/O does not have those limitations, but instead requires
an additional memory copy, though still one less than regular cached
I/O.  As such it should fill the gap in between.  Considering this,
I've disabled the annoying EINVAL errors on misaligned requests,
adding a tunable for those who want to test their applications.
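
As a rough illustration of the behavior described above, here is a
simplified model of what happens to an O_DIRECT request.  The function
name dio_outcome and its arguments are hypothetical.  The sketch
assumes that a misaligned request is served through Uncached I/O when
the strict tunable is off, as the commit title suggests.  The real
check lives in the kernel and may consider more than offset and length.

```shell
#!/bin/sh
# Hypothetical model, not the kernel logic: classify an O_DIRECT
# request by the page alignment of its offset and size.
#   $1 = file offset, $2 = request size, $3 = page size,
#   $4 = value of the strict tunable (0 or 1)
dio_outcome() {
	offset=$1 size=$2 pagesize=$3 strict=$4
	if [ $((offset % pagesize)) -eq 0 ] &&
	    [ $((size % pagesize)) -eq 0 ]; then
		# Page-aligned request: eligible for Direct I/O.
		echo "direct"
	elif [ "$strict" -eq 0 ]; then
		# Misaligned, not strict: assumed Uncached I/O fallback.
		echo "uncached"
	else
		# Misaligned and strict: the request is rejected.
		echo "EINVAL"
	fi
}

dio_outcome 0 16384 4096 1	# prints "direct"
dio_outcome 0 512 4096 0	# prints "uncached"
dio_outcome 0 512 4096 1	# prints "EINVAL"
```

The last two calls mirror the 512-byte stride_dd cases in the tests
in this commit: they pass with DIO_STRICT set to 0 and fail with it
set to 1.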

To pass the information between the layers I had to change a number
of APIs.  But as a side effect, upper layers can now control not only
the caching, but also speculative prefetch.  I haven't wired that to
the VFS yet, since it requires looking into some OS specifics.  But
while there I've implemented speculative prefetch of indirect blocks
for Direct I/O, controllable via all the same mechanisms.

Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Fixes #17027
Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Alexander Motin
2025-05-13 17:26:55 -04:00
committed by GitHub
parent e2ba0f7643
commit 734eba251d
35 changed files with 397 additions and 294 deletions
@@ -107,6 +107,7 @@ VOL_USE_BLK_MQ UNSUPPORTED zvol_use_blk_mq
BCLONE_ENABLED bclone_enabled zfs_bclone_enabled
BCLONE_WAIT_DIRTY bclone_wait_dirty zfs_bclone_wait_dirty
DIO_ENABLED dio_enabled zfs_dio_enabled
DIO_STRICT dio_strict zfs_dio_strict
XATTR_COMPAT xattr_compat zfs_xattr_compat
ZEVENT_LEN_MAX zevent.len_max zfs_zevent_len_max
ZEVENT_RETAIN_MAX zevent.retain_max zfs_zevent_retain_max
@@ -40,8 +40,10 @@
verify_runnable "global"
log_must save_tunable DIO_STRICT
function cleanup
{
restore_tunable DIO_STRICT
zfs set recordsize=$rs $TESTPOOL/$TESTFS
zfs set direct=standard $TESTPOOL/$TESTFS
log_must rm -f $tmp_file
@@ -61,6 +63,13 @@ file_size=$((rs * 8))
log_must stride_dd -i /dev/urandom -o $tmp_file -b $file_size -c 1
log_must set_tunable32 DIO_STRICT 0
log_must zfs set direct=standard $TESTPOOL/$TESTFS
# sub-pagesize direct writes/read will always pass if not strict.
log_must stride_dd -i /dev/urandom -o $tmp_file -b 512 -c 8 -D
log_must stride_dd -i $tmp_file -o /dev/null -b 512 -c 8 -d
log_must set_tunable32 DIO_STRICT 1
log_must zfs set direct=standard $TESTPOOL/$TESTFS
# sub-pagesize direct writes/read will always fail if direct=standard.
log_mustnot stride_dd -i /dev/urandom -o $tmp_file -b 512 -c 8 -D
@@ -48,6 +48,7 @@ TESTDS=${TESTPOOL}/${TESTFS}
TESTFILE=${TESTDIR}/${TESTFILE0}
log_must save_tunable DIO_ENABLED
log_must save_tunable DIO_STRICT
typeset recordsize_saved=$(get_prop recordsize $TESTDS)
typeset direct_saved=$(get_prop direct $TESTDS)
@@ -57,6 +58,7 @@ function cleanup
zfs set recordsize=$recordsize_saved $TESTDS
zfs set direct=$direct_saved $TESTDS
restore_tunable DIO_ENABLED
restore_tunable DIO_STRICT
}
log_onexit cleanup
@@ -154,6 +156,7 @@ for krs in 4 8 16 32 64 128 256 512 ; do
done
# reset for write tests
log_must set_tunable32 DIO_STRICT 1
log_must zfs set recordsize=16K $TESTDS
log_must zfs set direct=standard $TESTDS
@@ -173,4 +176,12 @@ log_must zpool sync
assert_dioalign $TESTFILE $PAGE_SIZE 16384
log_mustnot dd if=/dev/urandom of=$TESTFILE bs=1024 count=256 oflag=direct
# same again, but without strict, which should succeed.
log_must set_tunable32 DIO_STRICT 0
log_must rm -f $TESTFILE
log_must touch $TESTFILE
log_must zpool sync
assert_dioalign $TESTFILE $PAGE_SIZE 16384
log_must dd if=/dev/urandom of=$TESTFILE bs=1024 count=256 oflag=direct
log_pass $CLAIM
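
The tests above toggle DIO_STRICT through the test suite's
set_tunable32 helper.  Outside the test suite, the tunable can
presumably be set by hand.  The sketch below only prints the command
one might run.  It assumes the usual OpenZFS conventions: the Linux
name from the tunables table (zfs_dio_strict) is exposed under
/sys/module/zfs/parameters, and the FreeBSD name (dio_strict) under
the vfs.zfs sysctl tree.  The helper name dio_strict_cmd is
hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) a command that would set
# the DIO strict tunable on the given OS.
#   $1 = OS name as reported by "uname -s", $2 = value (0 or 1)
dio_strict_cmd() {
	os=$1 value=$2
	case "$os" in
	Linux)
		# Assumed location of ZFS module parameters on Linux.
		echo "echo $value > /sys/module/zfs/parameters/zfs_dio_strict"
		;;
	FreeBSD)
		# Assumed sysctl prefix for ZFS tunables on FreeBSD.
		echo "sysctl vfs.zfs.dio_strict=$value"
		;;
	*)
		echo "unsupported OS: $os" >&2
		return 1
		;;
	esac
}

dio_strict_cmd Linux 0
dio_strict_cmd FreeBSD 1
```

Printing the command rather than executing it keeps the sketch safe
to run on a machine without ZFS loaded.  Running the printed command
would require root privileges.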