Always validate checksums for Direct I/O reads

This fixes an oversight in the Direct I/O PR. There is nothing that
stops a process from manipulating the contents of a buffer for a
Direct I/O read while the I/O is in flight. This can lead to checksum
verify failures even though the disk contents are still correct, which
results in false reports of checksum validation failures.

To remedy this, all Direct I/O reads that have a checksum verification
failure are treated as suspicious. If a checksum validation failure
occurs for a Direct I/O read, the I/O request is reissued through the
ARC. This allows for actual validation to happen and
removes any possibility of the buffer being manipulated after the I/O
has been issued.
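
The retry described above can be modeled in user space. The following is a
minimal sketch, not the ZFS implementation: toy_checksum(), the
inflight_hook_t callback, flip_first_byte(), and dio_read_sketch() are all
hypothetical names, the "disk" is a plain memory buffer, and a private
malloc() buffer stands in for the ARC. Copying the verified data back into
the user buffer at the end is an assumption of this model.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Toy checksum; a stand-in for the real block checksum. */
static uint32_t
toy_checksum(const uint8_t *data, size_t len)
{
	uint32_t a = 0, b = 0;

	for (size_t i = 0; i < len; i++) {
		a += data[i];
		b += a;
	}
	return ((b << 16) ^ a);
}

/* Hypothetical hook modeling a process that scribbles on its own buffer. */
typedef void (*inflight_hook_t)(uint8_t *buf, size_t len);

static void
flip_first_byte(uint8_t *buf, size_t len)
{
	if (len > 0)
		buf[0] ^= 0xff;
}

/*
 * Returns 0 if the direct read verified, 1 if verification failed but the
 * retry through a private buffer succeeded, and -1 if the retry also failed
 * (the data really is bad, or the private buffer could not be allocated).
 */
static int
dio_read_sketch(const uint8_t *disk, uint32_t expected, uint8_t *ubuf,
    size_t len, inflight_hook_t hook)
{
	uint8_t *priv;
	int rc;

	assert(disk != NULL && ubuf != NULL && len > 0);

	/* Direct path: data lands straight in user-visible memory. */
	memcpy(ubuf, disk, len);
	if (hook != NULL)
		hook(ubuf, len);	/* buffer changed while "in flight" */
	if (toy_checksum(ubuf, len) == expected)
		return (0);

	/* Suspicious failure: retry into memory the user cannot touch. */
	priv = malloc(len);
	if (priv == NULL)
		return (-1);
	memcpy(priv, disk, len);
	if (toy_checksum(priv, len) == expected) {
		memcpy(ubuf, priv, len);
		rc = 1;
	} else {
		rc = -1;
	}
	free(priv);
	return (rc);
}
```

In this model a return value of 1 corresponds to the false positive the
commit addresses: the data on disk was fine and only the user buffer was
modified.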

Just as with Direct I/O write checksum validation failures, Direct I/O
read checksum validation failures are reported through zpool status -d in
the DIO column. Also the zevent has been updated to have both:
1. dio_verify_wr -> Checksum verification failure for writes
2. dio_verify_rd -> Checksum verification failure for reads.
This allows for determining what I/O operation was the culprit for the
checksum verification failure. All DIO errors are reported only on the
top-level VDEV.
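
Because the two zevents carry distinct names, the offending operation can be
picked out of captured event output with a simple per-line match, much like
the grep -c "dio_verify_${op}" call in the test library further down. A
minimal sketch, assuming the event text is already in memory;
count_dio_verify_events() is a hypothetical helper and not part of any ZFS
tool:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Count the lines of `text` that contain "dio_verify_<op>", where op is
 * "rd" or "wr". Each line is counted at most once, like grep -c.
 */
static int
count_dio_verify_events(const char *text, const char *op)
{
	char needle[32];
	int count = 0;

	assert(text != NULL && op != NULL);
	(void) snprintf(needle, sizeof (needle), "dio_verify_%s", op);

	while (*text != '\0') {
		const char *eol = strchr(text, '\n');
		size_t len = (eol != NULL) ? (size_t)(eol - text) :
		    strlen(text);
		const char *hit = strstr(text, needle);

		/* Only count a match that lies entirely within this line. */
		if (hit != NULL &&
		    (size_t)(hit - text) + strlen(needle) <= len)
			count++;
		if (eol == NULL)
			break;
		text = eol + 1;
	}
	return (count);
}
```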

Even though FreeBSD can write protect pages (stable pages) it still has
the same issue as Linux with Direct I/O reads.

This commit updates the following:
1. Propagates checksum failures for reads all the way up to the
   top-level VDEV.
2. Reports errors through zpool status -d as DIO.
3. Has two zevents for checksum verify errors with Direct I/O. One for
   read and one for write.
4. Updates FreeBSD ABD code to also check for ABD_FLAG_FROM_PAGES and
   handle ABD buffer contents validation the same as Linux.
5. Updates manipulate_user_buffer.c to also manipulate a buffer while a
   Direct I/O read is taking place.
6. Adds a new ZTS test case dio_read_verify that stress tests the new
   code.
7. Updates man pages.
8. Adds an IMPLY statement to zio_checksum_verify() to make sure that
   Direct I/O reads are not issued as speculative.
9. Removes self healing through mirror, raidz, and dRAID VDEVs for
   Direct I/O reads.
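
The IMPLY check in item 8 expresses "if this is a Direct I/O read, then it is
not speculative". A minimal sketch of that invariant follows, with IMPLY
modeled on assert() and with made-up flag bits: SKETCH_FLAG_DIO_READ and
SKETCH_FLAG_SPECULATIVE are hypothetical stand-ins, not the real zio flag
values, and checksum_verify_sketch() is not the real zio_checksum_verify().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* IMPLY(A, B): whenever A holds, B must hold too. Modeled with assert(). */
#define	IMPLY(A, B)	assert(!(A) || (B))

/* Hypothetical flag bits used only for this sketch. */
#define	SKETCH_FLAG_DIO_READ	(1u << 0)
#define	SKETCH_FLAG_SPECULATIVE	(1u << 1)

/* Same invariant as a predicate, so it can be checked without aborting. */
static bool
dio_read_flags_ok(uint32_t flags)
{
	return (!(flags & SKETCH_FLAG_DIO_READ) ||
	    !(flags & SKETCH_FLAG_SPECULATIVE));
}

/* Where the check would sit at the top of a verify routine. */
static void
checksum_verify_sketch(uint32_t flags)
{
	IMPLY(flags & SKETCH_FLAG_DIO_READ,
	    !(flags & SKETCH_FLAG_SPECULATIVE));
	/* ... checksum verification would follow here ... */
}
```

Only the combination of both flags violates the invariant; a speculative read
that is not a Direct I/O read passes the check.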

This issue was first observed when installing a Windows 11 VM on a ZFS
dataset with the dataset property direct set to always. The zpool
devices would report checksum failures, but a subsequent zpool scrub
would repair no data and report no errors.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Brian Atkinson <batkinson@lanl.gov>
Closes #16598
This commit is contained in:
Brian Atkinson
2024-10-09 15:28:08 -04:00
committed by Brian Behlendorf
parent 774dcba86d
commit 26ecd8b993
24 changed files with 510 additions and 146 deletions
+2 -2
@@ -697,8 +697,8 @@ tags = ['functional', 'delegate']
tests = ['dio_aligned_block', 'dio_async_always', 'dio_async_fio_ioengines',
'dio_compression', 'dio_dedup', 'dio_encryption', 'dio_grow_block',
'dio_max_recordsize', 'dio_mixed', 'dio_mmap', 'dio_overwrites',
'dio_property', 'dio_random', 'dio_recordsize', 'dio_unaligned_block',
'dio_unaligned_filesize']
'dio_property', 'dio_random', 'dio_read_verify', 'dio_recordsize',
'dio_unaligned_block', 'dio_unaligned_filesize']
tags = ['functional', 'direct']
[tests/functional/exec]
+121 -59
@@ -20,7 +20,7 @@
*/
/*
* Copyright (c) 2022 by Triad National Security, LLC.
* Copyright (c) 2024 by Triad National Security, LLC.
*/
#include <sys/types.h>
@@ -39,51 +39,59 @@
#define MIN(a, b) ((a) < (b)) ? (a) : (b)
#endif
static char *outputfile = NULL;
static char *filename = NULL;
static int blocksize = 131072; /* 128K */
static int wr_err_expected = 0;
static int err_expected = 0;
static int read_op = 0;
static int write_op = 0;
static int numblocks = 100;
static char *execname = NULL;
static int print_usage = 0;
static int randompattern = 0;
static int ofd;
static int fd;
char *buf = NULL;
typedef struct {
int entire_file_written;
int entire_file_completed;
} pthread_args_t;
static void
usage(void)
{
(void) fprintf(stderr,
"usage %s -o outputfile [-b blocksize] [-e wr_error_expected]\n"
" [-n numblocks] [-p randpattern] [-h help]\n"
"usage %s -f filename [-b blocksize] [-e wr_error_expected]\n"
" [-n numblocks] [-p randompattern] -r read_op \n"
" -w write_op [-h help]\n"
"\n"
"Testing whether checksum verify works correctly for O_DIRECT.\n"
"when manipulating the contents of a userspace buffer.\n"
"\n"
" outputfile: File to write to.\n"
" blocksize: Size of each block to write (must be at \n"
" least >= 512).\n"
" wr_err_expected: Whether pwrite() is expected to return EIO\n"
" while manipulating the contents of the\n"
" buffer.\n"
" numblocks: Total number of blocksized blocks to\n"
" write.\n"
" randpattern: Fill data buffer with random data. Default\n"
" behavior is to fill the buffer with the \n"
" known data pattern (0xdeadbeef).\n"
" filename: File to read or write to.\n"
" blocksize: Size of each block to write (must be at \n"
" least >= 512).\n"
" err_expected: Whether write() is expected to return EIO\n"
" while manipulating the contents of the\n"
" buffer.\n"
" numblocks: Total number of blocksized blocks to\n"
" write.\n"
" read_op: Perform reads to the filename file while\n"
" while manipulating the buffer contents\n"
" write_op: Perform writes to the filename file while\n"
" manipulating the buffer contents\n"
" randompattern: Fill data buffer with random data for \n"
" writes. Default behavior is to fill the \n"
" buffer with known data pattern (0xdeadbeef)\n"
" help: Print usage information and exit.\n"
"\n"
" Required parameters:\n"
" outputfile\n"
" filename\n"
" read_op or write_op\n"
"\n"
" Default Values:\n"
" blocksize -> 131072\n"
" wr_err_expexted -> false\n"
" numblocks -> 100\n"
" randpattern -> false\n",
" randompattern -> false\n",
execname);
(void) exit(1);
}
@@ -97,16 +105,21 @@ parse_options(int argc, char *argv[])
extern int optind, optopt;
execname = argv[0];
while ((c = getopt(argc, argv, "b:ehn:o:p")) != -1) {
while ((c = getopt(argc, argv, "b:ef:hn:rw")) != -1) {
switch (c) {
case 'b':
blocksize = atoi(optarg);
break;
case 'e':
wr_err_expected = 1;
err_expected = 1;
break;
case 'f':
filename = optarg;
break;
case 'h':
print_usage = 1;
break;
@@ -115,12 +128,12 @@ parse_options(int argc, char *argv[])
numblocks = atoi(optarg);
break;
case 'o':
outputfile = optarg;
case 'r':
read_op = 1;
break;
case 'p':
randompattern = 1;
case 'w':
write_op = 1;
break;
case ':':
@@ -141,7 +154,8 @@ parse_options(int argc, char *argv[])
if (errflag || print_usage == 1)
(void) usage();
if (blocksize < 512 || outputfile == NULL || numblocks <= 0) {
if (blocksize < 512 || filename == NULL || numblocks <= 0 ||
(read_op == 0 && write_op == 0)) {
(void) fprintf(stderr,
"Required parameter(s) missing or invalid.\n");
(void) usage();
@@ -160,10 +174,10 @@ write_thread(void *arg)
ssize_t wrote = 0;
pthread_args_t *args = (pthread_args_t *)arg;
while (!args->entire_file_written) {
wrote = pwrite(ofd, buf, blocksize, offset);
while (!args->entire_file_completed) {
wrote = pwrite(fd, buf, blocksize, offset);
if (wrote != blocksize) {
if (wr_err_expected)
if (err_expected)
assert(errno == EIO);
else
exit(2);
@@ -173,7 +187,35 @@ write_thread(void *arg)
left -= blocksize;
if (left == 0)
args->entire_file_written = 1;
args->entire_file_completed = 1;
}
pthread_exit(NULL);
}
/*
* Read blocksize * numblocks from the file using O_DIRECT.
*/
static void *
read_thread(void *arg)
{
size_t offset = 0;
int total_data = blocksize * numblocks;
int left = total_data;
ssize_t read = 0;
pthread_args_t *args = (pthread_args_t *)arg;
while (!args->entire_file_completed) {
read = pread(fd, buf, blocksize, offset);
if (read != blocksize) {
exit(2);
}
offset = ((offset + blocksize) % total_data);
left -= blocksize;
if (left == 0)
args->entire_file_completed = 1;
}
pthread_exit(NULL);
@@ -189,7 +231,7 @@ manipulate_buf_thread(void *arg)
char rand_char;
pthread_args_t *args = (pthread_args_t *)arg;
while (!args->entire_file_written) {
while (!args->entire_file_completed) {
rand_offset = (rand() % blocksize);
rand_char = (rand() % (126 - 33) + 33);
buf[rand_offset] = rand_char;
@@ -202,9 +244,9 @@ int
main(int argc, char *argv[])
{
const char *datapattern = "0xdeadbeef";
int ofd_flags = O_WRONLY | O_CREAT | O_DIRECT;
int fd_flags = O_DIRECT;
mode_t mode = S_IRUSR | S_IWUSR;
pthread_t write_thr;
pthread_t io_thr;
pthread_t manipul_thr;
int left = blocksize;
int offset = 0;
@@ -213,9 +255,15 @@ main(int argc, char *argv[])
parse_options(argc, argv);
ofd = open(outputfile, ofd_flags, mode);
if (ofd == -1) {
(void) fprintf(stderr, "%s, %s\n", execname, outputfile);
if (write_op) {
fd_flags |= (O_WRONLY | O_CREAT);
} else {
fd_flags |= O_RDONLY;
}
fd = open(filename, fd_flags, mode);
if (fd == -1) {
(void) fprintf(stderr, "%s, %s\n", execname, filename);
perror("open");
exit(2);
}
@@ -228,24 +276,22 @@ main(int argc, char *argv[])
exit(2);
}
if (!randompattern) {
/* Putting known data pattern in buffer */
while (left) {
size_t amt = MIN(strlen(datapattern), left);
memcpy(&buf[offset], datapattern, amt);
offset += amt;
left -= amt;
if (write_op) {
if (!randompattern) {
/* Putting known data pattern in buffer */
while (left) {
size_t amt = MIN(strlen(datapattern), left);
memcpy(&buf[offset], datapattern, amt);
offset += amt;
left -= amt;
}
} else {
/* Putting random data in buffer */
for (int i = 0; i < blocksize; i++)
buf[i] = rand();
}
} else {
/* Putting random data in buffer */
for (int i = 0; i < blocksize; i++)
buf[i] = rand();
}
/*
* Writing using O_DIRECT while manipulating the buffer contents until
* the entire file is written.
*/
if ((rc = pthread_create(&manipul_thr, NULL, manipulate_buf_thread,
&args))) {
fprintf(stderr, "error: pthreads_create, manipul_thr, "
@@ -253,18 +299,34 @@ main(int argc, char *argv[])
exit(2);
}
if ((rc = pthread_create(&write_thr, NULL, write_thread, &args))) {
fprintf(stderr, "error: pthreads_create, write_thr, "
"rc: %d\n", rc);
exit(2);
if (write_op) {
/*
* Writing using O_DIRECT while manipulating the buffer contents
* until the entire file is written.
*/
if ((rc = pthread_create(&io_thr, NULL, write_thread, &args))) {
fprintf(stderr, "error: pthreads_create, io_thr, "
"rc: %d\n", rc);
exit(2);
}
} else {
/*
* Reading using O_DIRECT while manipulating the buffer contents
* until the entire file is read.
*/
if ((rc = pthread_create(&io_thr, NULL, read_thread, &args))) {
fprintf(stderr, "error: pthreads_create, io_thr, "
"rc: %d\n", rc);
exit(2);
}
}
pthread_join(write_thr, NULL);
pthread_join(io_thr, NULL);
pthread_join(manipul_thr, NULL);
assert(args.entire_file_written == 1);
assert(args.entire_file_completed == 1);
(void) close(ofd);
(void) close(fd);
free(buf);
+1
@@ -1477,6 +1477,7 @@ nobase_dist_datadir_zfs_tests_tests_SCRIPTS += \
functional/direct/dio_overwrites.ksh \
functional/direct/dio_property.ksh \
functional/direct/dio_random.ksh \
functional/direct/dio_read_verify.ksh \
functional/direct/dio_recordsize.ksh \
functional/direct/dio_unaligned_block.ksh \
functional/direct/dio_unaligned_filesize.ksh \
@@ -84,8 +84,9 @@ function get_zpool_status_chksum_verify_failures # pool_name vdev_type
function get_zed_dio_verify_events # pool
{
typeset pool=$1
typeset op=$2
val=$(zpool events $pool | grep -c dio_verify)
val=$(zpool events $pool | grep -c "dio_verify_${op}")
echo "$val"
}
@@ -96,11 +97,12 @@ function get_zed_dio_verify_events # pool
# zpool events
# After getting the counts, clears out the ZPool errors and events
#
function check_dio_write_chksum_verify_failures # pool vdev_type expect_errors
function check_dio_chksum_verify_failures # pool vdev_type expect_errors op
{
typeset pool=$1
typeset vdev_type=$2
typeset expect_errors=$3
typeset op=$4
typeset note_str="expecting none"
if [[ $expect_errors -ne 0 ]]; then
@@ -108,10 +110,10 @@ function check_dio_write_chksum_verify_failures # pool vdev_type expect_errors
fi
log_note "Checking for Direct I/O write checksum verify errors \
$note_str on ZPool: $pool"
$note_str on ZPool: $pool with $vdev_type"
status_failures=$(get_zpool_status_chksum_verify_failures $pool $vdev_type)
zed_dio_verify_events=$(get_zed_dio_verify_events $pool)
zed_dio_verify_events=$(get_zed_dio_verify_events $pool $op)
if [[ $expect_errors -ne 0 ]]; then
if [[ $status_failures -eq 0 ||
+107
@@ -0,0 +1,107 @@
#!/bin/ksh -p
#
# CDDL HEADER START
#
# The contents of this file are subject to the terms of the
# Common Development and Distribution License (the "License").
# You may not use this file except in compliance with the License.
#
# You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
# or https://opensource.org/licenses/CDDL-1.0.
# See the License for the specific language governing permissions
# and limitations under the License.
#
# When distributing Covered Code, include this CDDL HEADER in each
# file and include the License file at usr/src/OPENSOLARIS.LICENSE.
# If applicable, add the following below this CDDL HEADER, with the
# fields enclosed by brackets "[]" replaced with your own identifying
# information: Portions Copyright [yyyy] [name of copyright owner]
#
# CDDL HEADER END
#
#
# Copyright (c) 2024 by Triad National Security, LLC.
#
. $STF_SUITE/include/libtest.shlib
. $STF_SUITE/tests/functional/direct/dio.cfg
. $STF_SUITE/tests/functional/direct/dio.kshlib
#
# DESCRIPTION:
# Verify checksum verify works for Direct I/O reads.
#
# STRATEGY:
# 1. Create a zpool from each vdev type.
# 2. Start a Direct I/O read workload while manipulating the user buffer
# contents.
# 3. Verify there are Direct I/O read verify failures using
# zpool status -d and checking for zevents. We also make sure no
# data errors are reported.
#
verify_runnable "global"
log_assert "Verify checksum verify works for Direct I/O reads."
log_onexit dio_cleanup
NUMBLOCKS=300
BS=$((128 * 1024)) # 128k
log_must truncate -s $MINVDEVSIZE $DIO_VDEVS
# We will verify that there are no checksum errors for every Direct I/O read
# while manipulating the buffer contents while the I/O is still in flight and
# also that Direct I/O checksum verify failures and dio_verify_rd zevents are
# reported.
for type in "" "mirror" "raidz" "draid"; do
typeset vdev_type=$type
if [[ "${vdev_type}" == "" ]]; then
vdev_type="stripe"
fi
log_note "Verifying every Direct I/O read verify with VDEV type \
${vdev_type}"
create_pool $TESTPOOL1 $type $DIO_VDEVS
log_must eval "zfs create -o recordsize=128k -o compression=off \
$TESTPOOL1/$TESTFS1"
mntpnt=$(get_prop mountpoint $TESTPOOL1/$TESTFS1)
prev_dio_rd=$(get_iostats_stat $TESTPOOL1 direct_read_count)
prev_arc_rd=$(get_iostats_stat $TESTPOOL1 arc_read_count)
# Create the file before trying to manipulate the contents
log_must stride_dd -o "$mntpnt/direct-write.iso" -i /dev/urandom \
-b $BS -c $NUMBLOCKS -D
# Manipulate the buffer contents while reading the file with Direct I/O
log_must manipulate_user_buffer -f "$mntpnt/direct-write.iso" \
-n $NUMBLOCKS -b $BS -r
# Getting new Direct I/O and ARC read counts.
curr_dio_rd=$(get_iostats_stat $TESTPOOL1 direct_read_count)
curr_arc_rd=$(get_iostats_stat $TESTPOOL1 arc_read_count)
total_dio_rd=$((curr_dio_rd - prev_dio_rd))
total_arc_rd=$((curr_arc_rd - prev_arc_rd))
log_note "Making sure there are no checksum errors with the ZPool"
log_must check_pool_status $TESTPOOL1 "errors" "No known data errors"
log_note "Making sure we have Direct I/O and ARC reads logged"
if [[ $total_dio_rd -lt 1 ]]; then
log_fail "No Direct I/O reads $total_dio_rd"
fi
if [[ $total_arc_rd -lt 1 ]]; then
log_fail "No ARC reads $total_arc_rd"
fi
log_note "Making sure we have Direct I/O read checksum verifies with ZPool"
check_dio_chksum_verify_failures "$TESTPOOL1" "$vdev_type" 1 "rd"
destroy_pool $TESTPOOL1
done
log_pass "Verified checksum verify works for Direct I/O reads."
@@ -46,7 +46,7 @@ verify_runnable "global"
function cleanup
{
log_must rm -f "$mntpnt/direct-write.iso"
check_dio_write_chksum_verify_failures $TESTPOOL "raidz" 0
check_dio_chksum_verify_failures $TESTPOOL "raidz" 0 "wr"
}
log_assert "Verify stable pages work for Direct I/O writes."
@@ -76,8 +76,8 @@ do
# Manipulate the user's buffer while running O_DIRECT write
# workload with the buffer.
log_must manipulate_user_buffer -o "$mntpnt/direct-write.iso" \
-n $NUMBLOCKS -b $BS
log_must manipulate_user_buffer -f "$mntpnt/direct-write.iso" \
-n $NUMBLOCKS -b $BS -w
# Reading back the contents of the file
log_must stride_dd -i $mntpnt/direct-write.iso -o /dev/null \
@@ -91,8 +91,8 @@ log_must set_tunable32 VDEV_DIRECT_WR_VERIFY 0
log_note "Verifying no panics for Direct I/O writes with compression"
log_must zfs set compression=on $TESTPOOL/$TESTFS
prev_dio_wr=$(get_iostats_stat $TESTPOOL direct_write_count)
log_must manipulate_user_buffer -o "$mntpnt/direct-write.iso" -n $NUMBLOCKS \
-b $BS
log_must manipulate_user_buffer -f "$mntpnt/direct-write.iso" -n $NUMBLOCKS \
-b $BS -w
curr_dio_wr=$(get_iostats_stat $TESTPOOL direct_write_count)
total_dio_wr=$((curr_dio_wr - prev_dio_wr))
@@ -116,8 +116,8 @@ for i in $(seq 1 $ITERATIONS); do
$i of $ITERATIONS with zfs_vdev_direct_write_verify=0"
prev_dio_wr=$(get_iostats_stat $TESTPOOL direct_write_count)
log_must manipulate_user_buffer -o "$mntpnt/direct-write.iso" \
-n $NUMBLOCKS -b $BS
log_must manipulate_user_buffer -f "$mntpnt/direct-write.iso" \
-n $NUMBLOCKS -b $BS -w
# Reading file back to verify checksum errors
filesize=$(get_file_size "$mntpnt/direct-write.iso")
@@ -144,7 +144,7 @@ for i in $(seq 1 $ITERATIONS); do
fi
log_note "Making sure we have no Direct I/O write checksum verifies \
with ZPool"
check_dio_write_chksum_verify_failures $TESTPOOL "raidz" 0
check_dio_chksum_verify_failures $TESTPOOL "raidz" 0 "wr"
log_must rm -f "$mntpnt/direct-write.iso"
done
@@ -166,8 +166,8 @@ for i in $(seq 1 $ITERATIONS); do
$ITERATIONS with zfs_vdev_direct_write_verify=1"
prev_dio_wr=$(get_iostats_stat $TESTPOOL direct_write_count)
log_must manipulate_user_buffer -o "$mntpnt/direct-write.iso" \
-n $NUMBLOCKS -b $BS -e
log_must manipulate_user_buffer -f "$mntpnt/direct-write.iso" \
-n $NUMBLOCKS -b $BS -e -w
# Reading file back to verify there are no checksum errors
filesize=$(get_file_size "$mntpnt/direct-write.iso")
@@ -175,7 +175,7 @@ for i in $(seq 1 $ITERATIONS); do
log_must stride_dd -i "$mntpnt/direct-write.iso" -o /dev/null -b $BS \
-c $num_blocks
# Getting new Direct I/O and ARC Write counts.
# Getting new Direct I/O write counts.
curr_dio_wr=$(get_iostats_stat $TESTPOOL direct_write_count)
total_dio_wr=$((curr_dio_wr - prev_dio_wr))
@@ -188,7 +188,7 @@ for i in $(seq 1 $ITERATIONS); do
fi
log_note "Making sure we have Direct I/O write checksum verifies with ZPool"
check_dio_write_chksum_verify_failures "$TESTPOOL" "raidz" 1
check_dio_chksum_verify_failures "$TESTPOOL" "raidz" 1 "wr"
done
log_must rm -f "$mntpnt/direct-write.iso"