mirror of
https://git.proxmox.com/git/mirror_zfs.git
synced 2026-05-22 02:27:36 +03:00
Always validate checksums for Direct I/O reads
This fixes an oversight in the Direct I/O PR. Nothing stops a process from manipulating the contents of a buffer for a Direct I/O read while the I/O is in flight. This can lead to checksum verify failures. However, the disk contents are still correct, so these would be reported as false checksum validation failures. To remedy this, all Direct I/O reads that have a checksum verification failure are treated as suspicious. In the event a checksum validation failure occurs for a Direct I/O read, the I/O request is reissued through the ARC. This allows for actual validation to happen and removes any possibility of the buffer being manipulated after the I/O has been issued.

Just as with Direct I/O write checksum validation failures, Direct I/O read checksum validation failures are reported through zpool status -d in the DIO column. Also the zevent has been updated to have both:
1. dio_verify_wr -> Checksum verification failure for writes
2. dio_verify_rd -> Checksum verification failure for reads

This allows for determining which I/O operation was the culprit for the checksum verification failure. All DIO errors are reported only on the top-level VDEV.

Even though FreeBSD can write protect pages (stable pages) it still has the same issue as Linux with Direct I/O reads.

This commit updates the following:
1. Propagates checksum failures for reads all the way up to the top-level VDEV.
2. Reports errors through zpool status -d as DIO.
3. Has two zevents for checksum verify errors with Direct I/O. One for read and one for write.
4. Updates FreeBSD ABD code to also check for ABD_FLAG_FROM_PAGES and handle ABD buffer contents validation the same as Linux.
5. Updated manipulate_user_buffer.c to also manipulate a buffer while a Direct I/O read is taking place.
6. Adds a new ZTS test case dio_read_verify that stress tests the new code.
7. Updated man pages.
8. Added an IMPLY statement to zio_checksum_verify() to make sure that Direct I/O reads are not issued as speculative.
9. Removed self healing through mirror, raidz, and dRAID VDEVs for Direct I/O reads.

This issue was first observed when installing a Windows 11 VM on a ZFS dataset with the dataset property direct set to always. The zpool devices would report checksum failures, but running a subsequent zpool scrub would not repair any data and would report no errors.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Brian Atkinson <batkinson@lanl.gov>
Closes #16598
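The retry behavior described above can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not the ZFS implementation: every name here (toy_checksum, toy_dio_read, toy_arc_read, toy_read, struct toy_stats, TOY_MAX_IO) is hypothetical, the checksum is a toy, and the "ARC" is modeled as nothing more than a private buffer that the caller cannot touch until verification has finished.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define	TOY_MAX_IO	64	/* hypothetical cap on I/O size for this sketch */

/* Hypothetical counter, named after the dio_verify_rd zevent. */
struct toy_stats {
	unsigned dio_verify_rd;
};

/* Toy checksum; stands in for a real ZFS checksum function. */
static uint32_t
toy_checksum(const uint8_t *buf, size_t n)
{
	uint32_t sum = 0;

	for (size_t i = 0; i < n; i++)
		sum = sum * 31 + buf[i];
	return (sum);
}

/*
 * Direct path: data lands in the caller's buffer before it is verified,
 * so a racing writer (simulated by 'tamper') can spoil the checksum.
 */
static int
toy_dio_read(const uint8_t *disk, uint32_t expected, uint8_t *user_buf,
    size_t n, bool tamper)
{
	memcpy(user_buf, disk, n);
	if (tamper && n > 0)
		user_buf[0] ^= 0xff;
	return (toy_checksum(user_buf, n) == expected ? 0 : -1);
}

/*
 * Buffered path: verify in a private buffer first, then copy out. A
 * failure here cannot be blamed on the caller's buffer.
 */
static int
toy_arc_read(const uint8_t *disk, uint32_t expected, uint8_t *user_buf,
    size_t n)
{
	uint8_t kbuf[TOY_MAX_IO];

	if (n > sizeof (kbuf))
		return (-1);
	memcpy(kbuf, disk, n);
	if (toy_checksum(kbuf, n) != expected)
		return (-1);
	memcpy(user_buf, kbuf, n);
	return (0);
}

/* Try the direct path; on a verify failure, count it and retry buffered. */
static int
toy_read(const uint8_t *disk, uint32_t expected, uint8_t *user_buf,
    size_t n, bool tamper, struct toy_stats *st)
{
	if (toy_dio_read(disk, expected, user_buf, n, tamper) == 0)
		return (0);
	st->dio_verify_rd++;
	return (toy_arc_read(disk, expected, user_buf, n));
}
```

In this model a tampered direct read still returns the correct data and only bumps the counter, while a block whose on-disk contents really do not match the expected checksum fails on both paths.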
@@ -620,9 +620,16 @@ abd_borrow_buf_copy(abd_t *abd, size_t n)
 
 /*
  * Return a borrowed raw buffer to an ABD. If the ABD is scattered, this will
- * no change the contents of the ABD and will ASSERT that you didn't modify
- * the buffer since it was borrowed. If you want any changes you made to buf to
- * be copied back to abd, use abd_return_buf_copy() instead.
+ * not change the contents of the ABD. If you want any changes you made to
+ * buf to be copied back to abd, use abd_return_buf_copy() instead. If the
+ * ABD is not constructed from user pages from Direct I/O then an ASSERT
+ * checks to make sure the contents of the buffer have not changed since it was
+ * borrowed. We can not ASSERT the contents of the buffer have not changed if
+ * it is composed of user pages. While Direct I/O write pages are placed under
+ * write protection and can not be changed, this is not the case for Direct I/O
+ * reads. The pages of a Direct I/O read could be manipulated at any time.
+ * Checksum verifications in the ZIO pipeline check for this issue and handle
+ * it by returning an error on checksum verification failure.
  */
 void
 abd_return_buf(abd_t *abd, void *buf, size_t n)
@@ -632,8 +639,34 @@ abd_return_buf(abd_t *abd, void *buf, size_t n)
 #ifdef ZFS_DEBUG
 	(void) zfs_refcount_remove_many(&abd->abd_children, n, buf);
 #endif
-	if (abd_is_linear(abd)) {
+	if (abd_is_from_pages(abd)) {
+		if (!abd_is_linear_page(abd))
+			zio_buf_free(buf, n);
+	} else if (abd_is_linear(abd)) {
 		ASSERT3P(buf, ==, abd_to_buf(abd));
+	} else if (abd_is_gang(abd)) {
+#ifdef ZFS_DEBUG
+		/*
+		 * We have to be careful with gang ABD's that we do not ASSERT
+		 * for any ABD's that contain user pages from Direct I/O. See
+		 * the comment above about Direct I/O read buffers possibly
+		 * being manipulated. In order to handle this, we just iterate
+		 * through the gang ABD and only verify ABD's that are not from
+		 * user pages.
+		 */
+		void *cmp_buf = buf;
+
+		for (abd_t *cabd = list_head(&ABD_GANG(abd).abd_gang_chain);
+		    cabd != NULL;
+		    cabd = list_next(&ABD_GANG(abd).abd_gang_chain, cabd)) {
+			if (!abd_is_from_pages(cabd)) {
+				ASSERT0(abd_cmp_buf(cabd, cmp_buf,
+				    cabd->abd_size));
+			}
+			cmp_buf = (char *)cmp_buf + cabd->abd_size;
+		}
+#endif
+		zio_buf_free(buf, n);
 	} else {
 		ASSERT0(abd_cmp_buf(abd, buf, n));
 		zio_buf_free(buf, n);
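The gang ABD walk in the hunk above can be modeled outside the kernel. The following is a minimal sketch, assuming a flat array of children in place of the real abd_gang_chain list. The names struct toy_child and toy_gang_mismatches are hypothetical, and the function returns a mismatch count where the real code would trip an ASSERT0. The point it illustrates is that the comparison cursor advances past every child, including the ones skipped because they come from user pages.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for one child ABD of a gang ABD. */
struct toy_child {
	const uint8_t *data;	/* current contents of the child */
	size_t size;		/* plays the role of abd_size */
	bool from_pages;	/* true if backed by Direct I/O user pages */
};

/*
 * Compare a borrowed flat buffer against the children it was copied from.
 * Children backed by user pages are skipped because their contents may
 * legitimately change at any time, but the cursor still moves past them.
 * Returns the number of kernel-owned children that no longer match.
 */
static size_t
toy_gang_mismatches(const struct toy_child *children, size_t nchildren,
    const uint8_t *buf)
{
	const uint8_t *cmp_buf = buf;
	size_t bad = 0;

	for (size_t i = 0; i < nchildren; i++) {
		const struct toy_child *c = &children[i];

		if (!c->from_pages && memcmp(c->data, cmp_buf, c->size) != 0)
			bad++;
		cmp_buf += c->size;
	}
	return (bad);
}
```

A change to a user-page child leaves the count at zero, whereas a change to the part of the borrowed buffer that shadows a kernel-owned child is still detected.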