Fixes in persistent error log

Address the following bugs in persistent error log:

1) Check nested clones, e.g. "fs->snap->clone->snap2->clone2".

2) When deleting files containing error blocks in those clones (e.g. in
   "clone" from the example above), do not break the check chain.

3) When deleting files in the originating fs before syncing the errlog
   to disk, do not break the check chain. This happens because at the
   time of introducing the error block in the error list, we do not have
   its birth txg and the head filesystem. If the original file is
   deleted before the error list is synced to the error log (which is
   when we actually look up the birth txg and the head filesystem), then
   we do not have access to this info anymore and break the check chain.
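
As a rough illustration of (1) and (2), the toy model below walks a chain
such as "fs->snap1->clone1->snap2->clone2->snap3->clone3" and keeps
descending even when an intermediate dataset no longer references the error
block. This is only a sketch under stated assumptions: toy_ds_t and
toy_check_chain() are hypothetical names that do not exist in ZFS, and the
real check also depends on birth txgs and on-disk state that are not modeled
here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a dataset (filesystem, snapshot or clone). */
typedef struct toy_ds {
	const char *name;
	int refs_block;			/* non-zero if the error block is still referenced */
	const struct toy_ds *child;	/* first snapshot of a fs, or first clone of a snapshot */
	const struct toy_ds *sibling;	/* next dataset at the same level */
} toy_ds_t;

/*
 * Collect the names of all datasets below (and including) ds that still
 * reference the block. Returns the number of names stored in out.
 */
static size_t
toy_check_chain(const toy_ds_t *ds, const char **out, size_t cap, size_t n)
{
	assert(out != NULL);
	for (; ds != NULL; ds = ds->sibling) {
		if (ds->refs_block && n < cap)
			out[n++] = ds->name;
		/*
		 * Descend unconditionally: a dataset whose file was deleted
		 * must not hide its own snapshots and clones.
		 */
		n = toy_check_chain(ds->child, out, cap, n);
	}
	return (n);
}
```

In this model, deleting the file in "fs" and in "clone2" clears refs_block
on those two nodes only, so the walk still reports snap1, clone1, snap2,
snap3 and clone3.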

The most prominent change is related to achieving (3). We expand the
spa_error_entry_t structure to accommodate the newly introduced
zbookmark_err_phys_t structure (containing the birth txg of the error
block). Due to compatibility reasons we cannot remove the
zbookmark_phys_t structure and we also need to place the new structure
after se_avl, so it is not accounted for in avl_find(). Then we modify
spa_log_error() to also provide the birth txg of the error block. With
these changes in place we simplify the previously introduced function
get_head_and_birth_txg() (now named get_head_ds()).
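
The layout constraint described above can be sketched with stand-in types.
Everything prefixed with toy_ is hypothetical, and apart from se_avl the
member names are assumptions made for illustration; the real definitions
live in the ZFS headers. The sketch assumes a comparator that looks only at
the original bookmark, which is why the birth txg stored after se_avl does
not influence lookups.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for zbookmark_phys_t: the lookup key of an error entry. */
typedef struct toy_bookmark {
	uint64_t zb_objset;
	uint64_t zb_object;
	int64_t zb_level;
	uint64_t zb_blkid;
} toy_bookmark_t;

/* Stand-in for zbookmark_err_phys_t: additionally carries the birth txg. */
typedef struct toy_err_bookmark {
	uint64_t zb_object;
	int64_t zb_level;
	uint64_t zb_blkid;
	uint64_t zb_birth;
} toy_err_bookmark_t;

/* Stand-in for avl_node_t. */
typedef struct toy_avl_node {
	struct toy_avl_node *child[2];
	uintptr_t pcb;
} toy_avl_node_t;

/* Stand-in for spa_error_entry_t. */
typedef struct toy_error_entry {
	toy_bookmark_t se_bookmark;	/* kept for compatibility; the key */
	char *se_name;
	toy_avl_node_t se_avl;
	toy_err_bookmark_t se_zep;	/* after se_avl; not part of the key */
} toy_error_entry_t;

static_assert(offsetof(toy_error_entry_t, se_zep) >
    offsetof(toy_error_entry_t, se_avl), "se_zep must follow se_avl");

/* Compare two entries by bookmark only; the birth txg is ignored. */
static int
toy_error_entry_compare(const void *a, const void *b)
{
	const toy_error_entry_t *sa = a;
	const toy_error_entry_t *sb = b;
	int ret = memcmp(&sa->se_bookmark, &sb->se_bookmark,
	    sizeof (toy_bookmark_t));

	return ((ret > 0) - (ret < 0));
}

/* Record the birth txg when the error is logged, not at sync time. */
static void
toy_log_error(toy_error_entry_t *se, const toy_bookmark_t *zb, uint64_t birth)
{
	memset(se, 0, sizeof (*se));
	se->se_bookmark = *zb;
	se->se_zep.zb_object = zb->zb_object;
	se->se_zep.zb_level = zb->zb_level;
	se->se_zep.zb_blkid = zb->zb_blkid;
	se->se_zep.zb_birth = birth;
}
```

Because the birth txg is captured at logging time in this sketch, it remains
available even if the file is deleted before the error list is synced.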

We chose not to follow the same approach for the head filesystem (thus
completely removing get_head_ds()) to avoid introducing new lock
contentions.

The stack sizes of nested functions (as measured by checkstack.pl in the
Linux kernel) are:
check_filesystem [zfs]: 272 (was 912)
check_clones [zfs]: 64

We also introduced two new tests covering the above changes.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #14633
George Amanakis
2023-03-29 01:51:58 +02:00
committed by GitHub
parent 65d10bd87c
commit 431083f75b
16 changed files with 423 additions and 173 deletions
@@ -1169,6 +1169,8 @@ nobase_dist_datadir_zfs_tests_tests_SCRIPTS += \
functional/cli_root/zpool_status/zpool_status_003_pos.ksh \
functional/cli_root/zpool_status/zpool_status_004_pos.ksh \
functional/cli_root/zpool_status/zpool_status_005_pos.ksh \
functional/cli_root/zpool_status/zpool_status_006_pos.ksh \
functional/cli_root/zpool_status/zpool_status_007_pos.ksh \
functional/cli_root/zpool_status/zpool_status_features_001_pos.ksh \
functional/cli_root/zpool_sync/cleanup.ksh \
functional/cli_root/zpool_sync/setup.ksh \
@@ -61,11 +61,13 @@ dd if=/$TESTPOOL2/10m_file bs=1M || true
log_must zfs snapshot $TESTPOOL2@snap
log_must zfs clone $TESTPOOL2@snap $TESTPOOL2/clone
log_must zfs create $TESTPOOL2/$TESTFS1
# Check that the snapshot, clone and filesystem report errors for our file
log_must zpool status -v $TESTPOOL2
log_must eval "zpool status -v | grep '$TESTPOOL2@snap:/10m_file'"
log_must eval "zpool status -v | grep '$TESTPOOL2/clone/10m_file'"
log_must eval "zpool status -v | grep '$TESTPOOL2/10m_file'"
log_mustnot eval "zpool status -v | grep '$TESTFS1'"
log_pass "'zpool status -v' outputs affected filesystem, snapshot & clone"
@@ -24,7 +24,6 @@
# Copyright (c) 2022 George Amanakis. All rights reserved.
#
. $STF_SUITE/include/libtest.shlib
#
# DESCRIPTION:
# Verify correct output with 'zpool status -v' after corrupting a file
@@ -34,7 +33,12 @@
# 2. zinject checksum errors
# 3. Unmount the filesystem and unload the key
# 4. Scrub the pool
# 5. Verify we report errors in the pool in 'zpool status -v'
# 5. Verify we report that errors were detected but we do not report
# the filename since the key is not loaded.
# 6. Load the key and mount the encrypted fs.
# 7. Verify we report errors in the pool in 'zpool status -v'
. $STF_SUITE/include/libtest.shlib
verify_runnable "both"
@@ -66,13 +70,21 @@ log_must dd if=/dev/urandom of=$file bs=1024 count=1024 oflag=sync
log_must eval "echo 'aaaaaaaa' >> "$file
corrupt_blocks_at_level $file 0
log_must zfs unmount $TESTPOOL2/$TESTFS1
log_must zfs unload-key $TESTPOOL2/$TESTFS1
log_must zfs umount $TESTPOOL2/$TESTFS1
log_must zfs unload-key -a
log_must zpool sync $TESTPOOL2
log_must zpool scrub $TESTPOOL2
log_must zpool wait -t scrub $TESTPOOL2
log_must zpool status -v $TESTPOOL2
log_must eval "zpool status -v $TESTPOOL2 | \
grep \"Permanent errors have been detected\""
log_mustnot eval "zpool status -v $TESTPOOL2 | grep '$file'"
log_must eval "cat /$TESTPOOL2/pwd | zfs load-key $TESTPOOL2/$TESTFS1"
log_must zfs mount $TESTPOOL2/$TESTFS1
log_must zpool status -v $TESTPOOL2
log_must eval "zpool status -v $TESTPOOL2 | \
grep \"Permanent errors have been detected\""
log_must eval "zpool status -v $TESTPOOL2 | grep '$file'"
log_pass "Verify reporting errors with unloaded keys works"
@@ -0,0 +1,97 @@
#!/bin/ksh -p
#
# CDDL HEADER START
#
# The contents of this file are subject to the terms of the
# Common Development and Distribution License (the "License").
# You may not use this file except in compliance with the License.
#
# You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
# or https://opensource.org/licenses/CDDL-1.0.
# See the License for the specific language governing permissions
# and limitations under the License.
#
# When distributing Covered Code, include this CDDL HEADER in each
# file and include the License file at usr/src/OPENSOLARIS.LICENSE.
# If applicable, add the following below this CDDL HEADER, with the
# fields enclosed by brackets "[]" replaced with your own identifying
# information: Portions Copyright [yyyy] [name of copyright owner]
#
# CDDL HEADER END
#
#
# Copyright (c) 2023 George Amanakis. All rights reserved.
#
#
# DESCRIPTION:
# Verify reporting errors when deleting files
#
# STRATEGY:
# 1. Create a pool, and a file
# 2. zinject checksum errors
# 3. Create snapshots and clones like:
# fs->snap1->clone1->snap2->clone2->...
# 4. Read the original file and immediately delete it
# 5. Delete the file in clone2
# 6. Snapshot clone2->snapxx and clone into snapxx->clonexx
# 7. Verify we report errors in the pool in 'zpool status -v'
# 8. Promote clone1
# 9. Verify we report errors in the pool in 'zpool status -v'
. $STF_SUITE/include/libtest.shlib
verify_runnable "both"
function cleanup
{
log_must zinject -c all
destroy_pool $TESTPOOL2
rm -f $TESTDIR/vdev_a
}
log_assert "Verify reporting errors when deleting files"
log_onexit cleanup
typeset file="/$TESTPOOL2/$TESTFILE0"
truncate -s $MINVDEVSIZE $TESTDIR/vdev_a
log_must zpool create -f -o feature@head_errlog=enabled $TESTPOOL2 $TESTDIR/vdev_a
log_must dd if=/dev/urandom of=$file bs=1024 count=1024 oflag=sync
log_must zinject -t data -e checksum -f 100 -am $file
for i in {1..3}; do
lastfs="$(zfs list -r $TESTPOOL2 | tail -1 | awk '{print $1}')"
log_must zfs snap $lastfs@snap$i
log_must zfs clone $lastfs@snap$i $TESTPOOL2/clone$i
done
log_mustnot dd if=$file of=/dev/null bs=1024
log_must rm $file /$TESTPOOL2/clone2/$TESTFILE0
log_must zfs snap $TESTPOOL2/clone2@snapxx
log_must zfs clone $TESTPOOL2/clone2@snapxx $TESTPOOL2/clonexx
log_must zpool status -v $TESTPOOL2
log_must eval "zpool status -v $TESTPOOL2 | \
grep \"Permanent errors have been detected\""
log_must eval "zpool status -v | grep '$TESTPOOL2@snap1:/$TESTFILE0'"
log_must eval "zpool status -v | grep '$TESTPOOL2/clone1/$TESTFILE0'"
log_must eval "zpool status -v | grep '$TESTPOOL2/clone1@snap2:/$TESTFILE0'"
log_mustnot eval "zpool status -v | grep '$TESTPOOL2/clone2/$TESTFILE0'"
log_mustnot eval "zpool status -v | grep '$TESTPOOL2/clonexx/$TESTFILE0'"
log_must eval "zpool status -v | grep '$TESTPOOL2/clone2@snap3:/$TESTFILE0'"
log_must eval "zpool status -v | grep '$TESTPOOL2/clone3/$TESTFILE0'"
log_must zfs promote $TESTPOOL2/clone1
log_must eval "zpool status -v $TESTPOOL2 | \
grep \"Permanent errors have been detected\""
log_must eval "zpool status -v | grep '$TESTPOOL2/clone1@snap1:/$TESTFILE0'"
log_must eval "zpool status -v | grep '$TESTPOOL2/clone1/$TESTFILE0'"
log_must eval "zpool status -v | grep '$TESTPOOL2/clone1@snap2:/$TESTFILE0'"
log_mustnot eval "zpool status -v | grep '$TESTPOOL2/clone2/$TESTFILE0'"
log_mustnot eval "zpool status -v | grep '$TESTPOOL2/clonexx/$TESTFILE0'"
log_must eval "zpool status -v | grep '$TESTPOOL2/clone2@snap3:/$TESTFILE0'"
log_must eval "zpool status -v | grep '$TESTPOOL2/clone3/$TESTFILE0'"
log_pass "Verify reporting errors when deleting files"
@@ -0,0 +1,98 @@
#!/bin/ksh -p
#
# CDDL HEADER START
#
# The contents of this file are subject to the terms of the
# Common Development and Distribution License (the "License").
# You may not use this file except in compliance with the License.
#
# You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
# or https://opensource.org/licenses/CDDL-1.0.
# See the License for the specific language governing permissions
# and limitations under the License.
#
# When distributing Covered Code, include this CDDL HEADER in each
# file and include the License file at usr/src/OPENSOLARIS.LICENSE.
# If applicable, add the following below this CDDL HEADER, with the
# fields enclosed by brackets "[]" replaced with your own identifying
# information: Portions Copyright [yyyy] [name of copyright owner]
#
# CDDL HEADER END
#
#
# Copyright (c) 2023 George Amanakis. All rights reserved.
#
#
# DESCRIPTION:
# Verify reporting errors when deleting corrupted files after scrub
#
# STRATEGY:
# 1. Create a pool, and a file
# 2. Corrupt the file
# 3. Create snapshots and clones like:
# fs->snap1->clone1->snap2->clone2->...
# 4. Read the original file and immediately delete it
# 5. Delete the file in clone2
# 6. Snapshot clone2->snapxx and clone into snapxx->clonexx
# 7. Verify we report errors in the pool in 'zpool status -v'
# 8. Promote clone1
# 9. Verify we report errors in the pool in 'zpool status -v'
. $STF_SUITE/include/libtest.shlib
verify_runnable "both"
function cleanup
{
destroy_pool $TESTPOOL2
rm -f $TESTDIR/vdev_a
}
log_assert "Verify reporting errors when deleting corrupted files after scrub"
log_onexit cleanup
typeset file="/$TESTPOOL2/$TESTFS1/$TESTFILE0"
truncate -s $MINVDEVSIZE $TESTDIR/vdev_a
log_must zpool create -f $TESTPOOL2 $TESTDIR/vdev_a
log_must zfs create -o primarycache=none $TESTPOOL2/$TESTFS1
log_must dd if=/dev/urandom of=$file bs=1024 count=1024 oflag=sync
corrupt_blocks_at_level $file 0
lastfs="$(zfs list -r $TESTPOOL2 | tail -1 | awk '{print $1}')"
for i in {1..3}; do
log_must zfs snap $lastfs@snap$i
log_must zfs clone $lastfs@snap$i $TESTPOOL2/clone$i
lastfs="$(zfs list -r $TESTPOOL2/clone$i | tail -1 | awk '{print $1}')"
done
log_must zpool scrub -w $TESTPOOL2
log_must rm $file /$TESTPOOL2/clone2/$TESTFILE0
log_must zfs snap $TESTPOOL2/clone2@snapxx
log_must zfs clone $TESTPOOL2/clone2@snapxx $TESTPOOL2/clonexx
log_must zpool status -v $TESTPOOL2
log_must eval "zpool status -v $TESTPOOL2 | \
grep \"Permanent errors have been detected\""
log_must eval "zpool status -v | grep '$TESTPOOL2/$TESTFS1@snap1:/$TESTFILE0'"
log_must eval "zpool status -v | grep '$TESTPOOL2/clone1/$TESTFILE0'"
log_must eval "zpool status -v | grep '$TESTPOOL2/clone1@snap2:/$TESTFILE0'"
log_mustnot eval "zpool status -v | grep '$TESTPOOL2/clone2/$TESTFILE0'"
log_mustnot eval "zpool status -v | grep '$TESTPOOL2/clonexx/$TESTFILE0'"
log_must eval "zpool status -v | grep '$TESTPOOL2/clone2@snap3:/$TESTFILE0'"
log_must eval "zpool status -v | grep '$TESTPOOL2/clone3/$TESTFILE0'"
log_must zfs promote $TESTPOOL2/clone1
log_must eval "zpool status -v $TESTPOOL2 | \
grep \"Permanent errors have been detected\""
log_must eval "zpool status -v | grep '$TESTPOOL2/clone1@snap1:/$TESTFILE0'"
log_must eval "zpool status -v | grep '$TESTPOOL2/clone1/$TESTFILE0'"
log_must eval "zpool status -v | grep '$TESTPOOL2/clone1@snap2:/$TESTFILE0'"
log_mustnot eval "zpool status -v | grep '$TESTPOOL2/clone2/$TESTFILE0'"
log_mustnot eval "zpool status -v | grep '$TESTPOOL2/clonexx/$TESTFILE0'"
log_must eval "zpool status -v | grep '$TESTPOOL2/clone2@snap3:/$TESTFILE0'"
log_must eval "zpool status -v | grep '$TESTPOOL2/clone3/$TESTFILE0'"
log_pass "Verify reporting errors when deleting corrupted files after scrub"