mirror of
https://git.proxmox.com/git/mirror_zfs.git
synced 2026-05-22 02:27:36 +03:00
Always validate checksums for Direct I/O reads
This fixes an oversight in the Direct I/O PR. Nothing stops a process from manipulating the contents of a buffer for a Direct I/O read while the I/O is in flight. This can lead to checksum verify failures. However, the disk contents are still correct, so this would lead to false reporting of checksum validation failures.

To remedy this, all Direct I/O reads that have a checksum verification failure are treated as suspicious. If a checksum validation failure occurs for a Direct I/O read, the I/O request is reissued through the ARC. This allows for actual validation to happen and removes any possibility of the buffer being manipulated after the I/O has been issued.

Just as with Direct I/O write checksum validation failures, Direct I/O read checksum validation failures are reported through zpool status -d in the DIO column. The zevent has also been updated to have both:
1. dio_verify_wr -> Checksum verification failure for writes
2. dio_verify_rd -> Checksum verification failure for reads

This allows for determining which I/O operation was the culprit for the checksum verification failure. All DIO errors are reported only on the top-level VDEV.

Even though FreeBSD can write protect pages (stable pages), it still has the same issue as Linux with Direct I/O reads.

This commit updates the following:
1. Propagates checksum failures for reads all the way up to the top-level VDEV.
2. Reports errors through zpool status -d as DIO.
3. Has two zevents for checksum verify errors with Direct I/O. One for read and one for write.
4. Updates FreeBSD ABD code to also check for ABD_FLAG_FROM_PAGES and handle ABD buffer contents validation the same as Linux.
5. Updated manipulate_user_buffer.c to also manipulate a buffer while a Direct I/O read is taking place.
6. Adds a new ZTS test case dio_read_verify that stress tests the new code.
7. Updated man pages.
8. Added an IMPLY statement to zio_checksum_verify() to make sure that Direct I/O reads are not issued as speculative.
9. Removed self healing through mirror, raidz, and dRAID VDEVs for Direct I/O reads.

This issue was first observed when installing a Windows 11 VM on a ZFS dataset with the dataset property direct set to always. The zpool devices would report checksum failures, but running a subsequent zpool scrub would not repair any data and would report no errors.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Brian Atkinson <batkinson@lanl.gov>
Closes #16598
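The retry behaviour described in the commit message can be illustrated with a small user-space simulation. This is only a sketch under stated assumptions, not the OpenZFS implementation: `sim_vdev`, `sim_checksum`, `sim_scribble`, and `sim_dio_read` are hypothetical names invented here, the checksum is a toy Fletcher-style sum, and the "ARC" is modelled as a private buffer that the caller cannot touch while the read is being verified.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SIM_BLOCK_SIZE 16

/* Hypothetical stand-in for a top-level VDEV; not an OpenZFS structure. */
struct sim_vdev {
	uint8_t  disk[SIM_BLOCK_SIZE];	/* on-disk contents */
	uint32_t expected_cksum;	/* checksum recorded for the block */
	uint64_t dio_verify_rd;		/* Direct I/O read verify error count */
};

/* Callback that models another thread touching the buffer in flight. */
typedef void (*sim_inflight_hook_t)(uint8_t *buf, size_t len);

/* Toy Fletcher-style checksum, used only for this illustration. */
static uint32_t
sim_checksum(const uint8_t *buf, size_t len)
{
	uint32_t a = 0, b = 0;

	for (size_t i = 0; i < len; i++) {
		a += buf[i];
		b += a;
	}
	return ((b << 16) ^ a);
}

/* Example hook: flip every bit of the first byte of the user buffer. */
static void
sim_scribble(uint8_t *buf, size_t len)
{
	if (len > 0)
		buf[0] ^= 0xff;
}

/*
 * Simulated Direct I/O read. Returns 0 on success and -1 if the data
 * still fails verification after being re-read into a private buffer.
 */
static int
sim_dio_read(struct sim_vdev *vd, uint8_t *user_buf, sim_inflight_hook_t hook)
{
	uint8_t private_buf[SIM_BLOCK_SIZE];

	/* Direct path: data lands straight in the caller's buffer. */
	memcpy(user_buf, vd->disk, SIM_BLOCK_SIZE);
	if (hook != NULL)
		hook(user_buf, SIM_BLOCK_SIZE);

	if (sim_checksum(user_buf, SIM_BLOCK_SIZE) == vd->expected_cksum)
		return (0);

	/* Verify failed: treat as suspicious and count it as a DIO error. */
	vd->dio_verify_rd++;

	/* Reissue through a buffer the caller cannot modify in flight. */
	memcpy(private_buf, vd->disk, SIM_BLOCK_SIZE);
	if (sim_checksum(private_buf, SIM_BLOCK_SIZE) != vd->expected_cksum)
		return (-1);	/* the disk contents really are bad */

	memcpy(user_buf, private_buf, SIM_BLOCK_SIZE);
	return (0);
}
```

In this model, a read whose buffer is scribbled on in flight still succeeds and only bumps the `dio_verify_rd` counter, while a read of genuinely corrupted disk contents fails on the second, private pass as well.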
@@ -436,7 +436,7 @@ write.
 It can also help to identify if reported checksum errors are tied to Direct I/O
 writes.
 Each verify error causes a
-.Sy dio_verify
+.Sy dio_verify_wr
 zevent.
 Direct Write I/O checkum verify errors can be seen with
 .Nm zpool Cm status Fl d .
@@ -98,7 +98,10 @@ This can be an indicator of problems with the underlying storage device.
 The number of delay events is ratelimited by the
 .Sy zfs_slow_io_events_per_second
 module parameter.
-.It Sy dio_verify
+.It Sy dio_verify_rd
+Issued when there was a checksum verify error after a Direct I/O read has been
+issued.
+.It Sy dio_verify_wr
 Issued when there was a checksum verify error after a Direct I/O write has been
 issued.
 This event can only take place if the module parameter
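Because the single dio_verify event is split into dio_verify_rd and dio_verify_wr, a consumer of zevents can tell which operation triggered the verify failure. A minimal sketch of such a classifier follows. The enum and function names are hypothetical, and the `ereport.fs.zfs.` prefix used in the usage note is an assumption about how the event class is presented to the consumer; the code only matches on the event-name suffix.

```c
#include <string.h>

/* Hypothetical result type; not part of any OpenZFS header. */
enum dio_verify_op {
	DIO_VERIFY_NONE,	/* not a Direct I/O verify event */
	DIO_VERIFY_READ,	/* dio_verify_rd */
	DIO_VERIFY_WRITE	/* dio_verify_wr */
};

/* Return 1 if s ends with suffix, otherwise 0. */
static int
has_suffix(const char *s, const char *suffix)
{
	size_t slen = strlen(s);
	size_t xlen = strlen(suffix);

	if (xlen > slen)
		return (0);
	return (strcmp(s + (slen - xlen), suffix) == 0);
}

/* Map an event class string to the I/O operation that failed verification. */
static enum dio_verify_op
classify_dio_event(const char *event_class)
{
	if (has_suffix(event_class, "dio_verify_rd"))
		return (DIO_VERIFY_READ);
	if (has_suffix(event_class, "dio_verify_wr"))
		return (DIO_VERIFY_WRITE);
	return (DIO_VERIFY_NONE);
}
```

For example, `classify_dio_event("ereport.fs.zfs.dio_verify_rd")` yields `DIO_VERIFY_READ`, while the old unsuffixed name `dio_verify` is deliberately not recognised.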
@@ -82,14 +82,18 @@ Specify
 .Sy --json-pool-key-guid
 to set pool GUID as key for pool objects instead of pool names.
 .It Fl d
-Display the number of Direct I/O write checksum verify errors that have occured
-on a top-level VDEV.
+Display the number of Direct I/O read/write checksum verify errors that have
+occured on a top-level VDEV.
 See
 .Sx zfs_vdev_direct_write_verify
 in
 .Xr zfs 4
 for details about the conditions that can cause Direct I/O write checksum
 verify failures to occur.
+Direct I/O reads checksum verify errors can also occur if the contents of the
+buffer are being manipulated after the I/O has been issued and is in flight.
+In the case of Direct I/O read checksum verify errors, the I/O will be reissued
+through the ARC.
 .It Fl D
 Display a histogram of deduplication statistics, showing the allocated
 .Pq physically present on disk