Allow zhack label repair to restore detached devices.

This commit extends the zhack label repair command from d04b5c9.
A new -u option undetaches a device by regenerating its uberblocks.
The existing checksum repair is now selected with -c. When no
options are given, the previous behavior is retained.
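
As a rough illustration of the invocation modes described above, the sketch below only prints the command line for each mode and does not run zhack. The helper name zhack_repair_cmd and the device path /dev/sdX1 are hypothetical placeholders, not part of this commit.

```shell
#!/bin/sh
# Illustrative only: prints the zhack invocation for each repair mode
# instead of running it. zhack_repair_cmd and /dev/sdX1 are hypothetical.
zhack_repair_cmd() {
    mode="$1"
    dev="$2"
    case "$mode" in
        default)  echo "zhack label repair $dev" ;;      # no options: previous behavior
        checksum) echo "zhack label repair -c $dev" ;;   # fix label checksums
        undetach) echo "zhack label repair -u $dev" ;;   # regenerate uberblocks
        both)     echo "zhack label repair -cu $dev" ;;  # both in a single command
        *)        echo "unknown mode: $mode" >&2; return 1 ;;
    esac
}

for mode in default checksum undetach both; do
    zhack_repair_cmd "$mode" /dev/sdX1
done
```

The tests added in this commit exercise -c, -u, and -cu separately, so each mode above corresponds to one of the tested paths.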

The changes are heavily inspired by Jeff Bonwick's labelfix
utility, as archived at:

https://gist.github.com/jjwhitney/baaa63144da89726e482

Additionally, the command can now properly determine the size of
block devices and other media, and it handles sizes that are not
divisible by 2^18. This should make it viable for use on physical
devices and partitions, in addition to files.
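
The size handling can be illustrated with the same arithmetic the test library in this commit uses: round the size down to a multiple of the 2^18-byte label size, then place two labels at the front and two at the end. This is a sketch, not the zhack implementation; label_offsets is a hypothetical helper name and the sizes are arbitrary examples.

```shell
#!/bin/sh
# Sketch of the label placement arithmetic used by the test library.
# label_offsets is a hypothetical helper; sizes are in bytes.
LABEL_SIZE=$((256 * 1024))   # 2^18 bytes per label

label_offsets() {
    size="$1"
    # Round down to a whole number of labels, ignoring any trailing remainder.
    usable=$(( (size / LABEL_SIZE) * LABEL_SIZE ))
    echo "0 $LABEL_SIZE $((usable - 2 * LABEL_SIZE)) $((usable - LABEL_SIZE))"
}

label_offsets 67108864   # exactly 64 MiB: prints 0 262144 66584576 66846720
label_offsets 67120000   # not a multiple of 2^18: prints the same offsets
```

Both calls print the same four offsets because the second size exceeds 64 MiB by less than one label, so the remainder is discarded.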

These changes should make it possible to import zpools that have had
their uberblocks erased, such as in the case of pools rendered
inaccessible by erroneous detach commands.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: buzzingwires <buzzingwires@outlook.com>
Closes #14773
buzzingwires
2023-05-03 12:03:57 -04:00
committed by GitHub
parent 9de5300c7f
commit a46001adb9
10 changed files with 932 additions and 165 deletions
+2 -1
@@ -325,7 +325,8 @@ tests = ['zfs_wait_deleteq', 'zfs_wait_getsubopt']
tags = ['functional', 'cli_root', 'zfs_wait']
[tests/functional/cli_root/zhack]
tests = ['zhack_label_checksum']
tests = ['zhack_label_repair_001', 'zhack_label_repair_002',
'zhack_label_repair_003', 'zhack_label_repair_004']
pre =
post =
tags = ['functional', 'cli_root', 'zhack']
+5 -1
@@ -250,6 +250,7 @@ nobase_dist_datadir_zfs_tests_tests_DATA += \
functional/cli_root/zpool_upgrade/zpool_upgrade.cfg \
functional/cli_root/zpool_upgrade/zpool_upgrade.kshlib \
functional/cli_root/zpool_wait/zpool_wait.kshlib \
functional/cli_root/zhack/library.kshlib \
functional/cli_user/misc/misc.cfg \
functional/cli_user/zfs_list/zfs_list.cfg \
functional/cli_user/zfs_list/zfs_list.kshlib \
@@ -932,7 +933,10 @@ nobase_dist_datadir_zfs_tests_tests_SCRIPTS += \
functional/cli_root/zfs/zfs_001_neg.ksh \
functional/cli_root/zfs/zfs_002_pos.ksh \
functional/cli_root/zfs/zfs_003_neg.ksh \
functional/cli_root/zhack/zhack_label_checksum.ksh \
functional/cli_root/zhack/zhack_label_repair_001.ksh \
functional/cli_root/zhack/zhack_label_repair_002.ksh \
functional/cli_root/zhack/zhack_label_repair_003.ksh \
functional/cli_root/zhack/zhack_label_repair_004.ksh \
functional/cli_root/zpool_add/add_nested_replacing_spare.ksh \
functional/cli_root/zpool_add/add-o_ashift.ksh \
functional/cli_root/zpool_add/add_prop_ashift.ksh \
@@ -0,0 +1,361 @@
#!/bin/ksh
#
# This file and its contents are supplied under the terms of the
# Common Development and Distribution License ("CDDL"), version 1.0.
# You may only use this file in accordance with the terms of version
# 1.0 of the CDDL.
#
# A full copy of the text of the CDDL should have accompanied this
# source. A copy of the CDDL is also available via the Internet at
# http://www.illumos.org/license/CDDL.
#
#
# Copyright (c) 2021 by vStack. All rights reserved.
#
. "$STF_SUITE"/include/libtest.shlib
. "$STF_SUITE"/include/blkdev.shlib
#
# Description:
#
# Test whether zhack label repair commands can recover detached devices
# and corrupted checksums with a variety of sizes, and ensure
# the purpose of each option is cleanly separated from the other.
#
# Strategy:
#
# Tests are done on loopback devices with sizes divisible by label size and sizes that are not.
#
# Test one:
#
# 1. Create pool on a loopback device with some test data
# 2. Export the pool.
# 3. Corrupt all label checksums in the pool
# 4. Check that pool cannot be imported
# 5. Verify that it cannot be imported after using zhack label repair -u
# to ensure that the -u option will quit on corrupted checksums.
# 6. Use zhack label repair -c on device
# 7. Check that pool can be imported and that data is intact
#
# Test two:
#
# 1. Create a mirrored pool on two loopback devices with some test data
# 2. Detach the second device from the mirror
# 3. Export the pool
# 4. Remove the non-detached device and its backing file
# 5. Verify that the remaining detached device cannot be imported
# 6. Verify that it cannot be imported after using zhack label repair -c
# to ensure that the -c option will not undetach a device.
# 7. Use zhack label repair -u on device
# 8. Verify that the detached device can be imported and that data is intact
#
# Test three:
#
# 1. Create a mirrored pool on two loopback devices with some test data
# 2. Detach the second device from the mirror
# 3. Export the pool
# 4. Remove the non-detached device and its backing file
# 5. Corrupt all label checksums on the remaining device
# 6. Verify that the remaining detached device cannot be imported
# 7. Verify that it cannot be imported after using zhack label repair -u
# to ensure that the -u option will quit on corrupted checksums.
# 8. Verify that it cannot be imported after using zhack label repair -c
# -c should repair the checksums, but not undetach a device.
# 9. Use zhack label repair -u on device
# 10. Verify that the detached device can be imported and that data is intact
#
# Test four:
#
# 1. Create a mirrored pool on two loopback devices with some test data
# 2. Detach the second device from the mirror
# 3. Export the pool
# 4. Remove the non-detached device and its backing file
# 5. Corrupt all label checksums on the remaining device
# 6. Verify that the remaining detached device cannot be imported
# 7. Use zhack label repair -cu on device to attempt to fix checksums and
# undetach the device in a single operation.
# 8. Verify that the detached device can be imported and that data is intact
#
log_assert "Verify zhack label repair <operation> <vdev> will repair label checksums and uberblocks"
log_onexit cleanup
LABEL_SIZE="$((2**18))"
LABEL_NVLIST_END="$((LABEL_SIZE / 2))"
LABEL_CKSUM_SIZE="32"
LABEL_CKSUM_START="$(( LABEL_NVLIST_END - LABEL_CKSUM_SIZE ))"
VIRTUAL_DISK=$TEST_BASE_DIR/disk
VIRTUAL_MIRROR_DISK=$TEST_BASE_DIR/mirrordisk
VIRTUAL_DEVICE=
VIRTUAL_MIRROR_DEVICE=
function cleanup_lo
{
L_DEVICE="$1"
if [[ -e $L_DEVICE ]]; then
if is_linux; then
log_must losetup -d "$L_DEVICE"
elif is_freebsd; then
log_must mdconfig -d -u "$L_DEVICE"
else
log_must lofiadm -d "$L_DEVICE"
fi
fi
}
function cleanup
{
poolexists "$TESTPOOL" && destroy_pool "$TESTPOOL"
cleanup_lo "$VIRTUAL_DEVICE"
cleanup_lo "$VIRTUAL_MIRROR_DEVICE"
VIRTUAL_DEVICE=
VIRTUAL_MIRROR_DEVICE=
[[ -f "$VIRTUAL_DISK" ]] && log_must rm "$VIRTUAL_DISK"
[[ -f "$VIRTUAL_MIRROR_DISK" ]] && log_must rm "$VIRTUAL_MIRROR_DISK"
}
RAND_MAX="$((2**15 - 1))"
function get_devsize
{
if [ "$RANDOM" -gt "$(( RAND_MAX / 2 ))" ]; then
echo "$(( MINVDEVSIZE + RANDOM ))"
else
echo "$MINVDEVSIZE"
fi
}
function pick_logop
{
L_SHOULD_SUCCEED="$1"
l_logop="log_mustnot"
if [ "$L_SHOULD_SUCCEED" = true ]; then
l_logop="log_must"
fi
echo "$l_logop"
}
function check_dataset
{
L_SHOULD_SUCCEED="$1"
L_LOGOP="$(pick_logop "$L_SHOULD_SUCCEED")"
"$L_LOGOP" mounted "$TESTPOOL"/"$TESTFS"
"$L_LOGOP" test -f "$TESTDIR"/"test"
}
function setup_dataset
{
log_must zfs create "$TESTPOOL"/"$TESTFS"
log_must mkdir -p "$TESTDIR"
log_must zfs set mountpoint="$TESTDIR" "$TESTPOOL"/"$TESTFS"
log_must mounted "$TESTPOOL"/"$TESTFS"
log_must touch "$TESTDIR"/"test"
log_must test -f "$TESTDIR"/"test"
log_must zpool sync "$TESTPOOL"
check_dataset true
}
function get_practical_size
{
L_SIZE="$1"
if [ "$((L_SIZE % LABEL_SIZE))" -ne 0 ]; then
echo "$(((L_SIZE / LABEL_SIZE) * LABEL_SIZE))"
else
echo "$L_SIZE"
fi
}
function corrupt_sized_label_checksum
{
L_SIZE="$1"
L_LABEL="$2"
L_DEVICE="$3"
L_PRACTICAL_SIZE="$(get_practical_size "$L_SIZE")"
typeset -a L_OFFSETS=("$LABEL_CKSUM_START" \
"$((LABEL_SIZE + LABEL_CKSUM_START))" \
"$(((L_PRACTICAL_SIZE - LABEL_SIZE*2) + LABEL_CKSUM_START))" \
"$(((L_PRACTICAL_SIZE - LABEL_SIZE) + LABEL_CKSUM_START))")
dd if=/dev/urandom of="$L_DEVICE" \
seek="${L_OFFSETS["$L_LABEL"]}" bs=1 count="$LABEL_CKSUM_SIZE" \
conv=notrunc
}
function corrupt_labels
{
L_SIZE="$1"
L_DISK="$2"
corrupt_sized_label_checksum "$L_SIZE" 0 "$L_DISK"
corrupt_sized_label_checksum "$L_SIZE" 1 "$L_DISK"
corrupt_sized_label_checksum "$L_SIZE" 2 "$L_DISK"
corrupt_sized_label_checksum "$L_SIZE" 3 "$L_DISK"
}
function try_import_and_repair
{
L_REPAIR_SHOULD_SUCCEED="$1"
L_IMPORT_SHOULD_SUCCEED="$2"
L_OP="$3"
L_POOLDISK="$4"
L_REPAIR_LOGOP="$(pick_logop "$L_REPAIR_SHOULD_SUCCEED")"
L_IMPORT_LOGOP="$(pick_logop "$L_IMPORT_SHOULD_SUCCEED")"
log_mustnot zpool import "$TESTPOOL" -d "$L_POOLDISK"
"$L_REPAIR_LOGOP" zhack label repair "$L_OP" "$L_POOLDISK"
"$L_IMPORT_LOGOP" zpool import "$TESTPOOL" -d "$L_POOLDISK"
check_dataset "$L_IMPORT_SHOULD_SUCCEED"
}
function prepare_vdev
{
L_SIZE="$1"
L_BACKFILE="$2"
l_devname=
if truncate -s "$L_SIZE" "$L_BACKFILE"; then
if is_linux; then
l_devname="$(losetup -f "$L_BACKFILE" --show)"
elif is_freebsd; then
l_devname=/dev/"$(mdconfig -a -t vnode -f "$L_BACKFILE")"
else
l_devname="$(lofiadm -a "$L_BACKFILE")"
fi
fi
echo "$l_devname"
}
function run_test_one
{
L_SIZE="$1"
VIRTUAL_DEVICE="$(prepare_vdev "$L_SIZE" "$VIRTUAL_DISK")"
log_must test -e "$VIRTUAL_DEVICE"
log_must zpool create "$TESTPOOL" "$VIRTUAL_DEVICE"
setup_dataset
log_must zpool export "$TESTPOOL"
corrupt_labels "$L_SIZE" "$VIRTUAL_DISK"
try_import_and_repair false false "-u" "$VIRTUAL_DEVICE"
try_import_and_repair true true "-c" "$VIRTUAL_DEVICE"
cleanup
log_pass "zhack label repair corruption test passed with a randomized size of $L_SIZE"
}
function make_mirrored_pool
{
L_SIZE="$1"
VIRTUAL_DEVICE="$(prepare_vdev "$L_SIZE" "$VIRTUAL_DISK")"
log_must test -e "$VIRTUAL_DEVICE"
VIRTUAL_MIRROR_DEVICE="$(prepare_vdev "$L_SIZE" "$VIRTUAL_MIRROR_DISK")"
log_must test -e "$VIRTUAL_MIRROR_DEVICE"
log_must zpool create "$TESTPOOL" "$VIRTUAL_DEVICE"
log_must zpool attach "$TESTPOOL" "$VIRTUAL_DEVICE" "$VIRTUAL_MIRROR_DEVICE"
}
function export_and_cleanup_vdisk
{
log_must zpool export "$TESTPOOL"
cleanup_lo "$VIRTUAL_DEVICE"
VIRTUAL_DEVICE=
log_must rm "$VIRTUAL_DISK"
}
function run_test_two
{
L_SIZE="$1"
make_mirrored_pool "$L_SIZE"
setup_dataset
log_must zpool detach "$TESTPOOL" "$VIRTUAL_MIRROR_DEVICE"
export_and_cleanup_vdisk
try_import_and_repair false false "-c" "$VIRTUAL_MIRROR_DEVICE"
try_import_and_repair true true "-u" "$VIRTUAL_MIRROR_DEVICE"
cleanup
log_pass "zhack label repair detached test passed with a randomized size of $L_SIZE"
}
function run_test_three
{
L_SIZE="$1"
make_mirrored_pool "$L_SIZE"
setup_dataset
log_must zpool detach "$TESTPOOL" "$VIRTUAL_MIRROR_DEVICE"
export_and_cleanup_vdisk
corrupt_labels "$L_SIZE" "$VIRTUAL_MIRROR_DISK"
try_import_and_repair false false "-u" "$VIRTUAL_MIRROR_DEVICE"
try_import_and_repair true false "-c" "$VIRTUAL_MIRROR_DEVICE"
try_import_and_repair true true "-u" "$VIRTUAL_MIRROR_DEVICE"
cleanup
log_pass "zhack label repair corruption and detached test passed with a randomized size of $L_SIZE"
}
function run_test_four
{
L_SIZE="$1"
make_mirrored_pool "$L_SIZE"
setup_dataset
log_must zpool detach "$TESTPOOL" "$VIRTUAL_MIRROR_DEVICE"
export_and_cleanup_vdisk
corrupt_labels "$L_SIZE" "$VIRTUAL_MIRROR_DISK"
try_import_and_repair true true "-cu" "$VIRTUAL_MIRROR_DEVICE"
cleanup
log_pass "zhack label repair corruption and detached single-command test passed with a randomized size of $L_SIZE."
}
@@ -1,64 +0,0 @@
#!/bin/ksh
#
# This file and its contents are supplied under the terms of the
# Common Development and Distribution License ("CDDL"), version 1.0.
# You may only use this file in accordance with the terms of version
# 1.0 of the CDDL.
#
# A full copy of the text of the CDDL should have accompanied this
# source. A copy of the CDDL is also available via the Internet at
# http://www.illumos.org/license/CDDL.
#
#
# Copyright (c) 2021 by vStack. All rights reserved.
#
. $STF_SUITE/include/libtest.shlib
. $STF_SUITE/include/blkdev.shlib
#
# Description:
# zhack label repair <vdev> will calculate and rewrite label checksum if invalid
#
# Strategy:
# 1. Create pool with some number of vdevs and export it
# 2. Corrupt all labels checksums
# 3. Check that pool cannot be imported
# 4. Use zhack to repair labels checksums
# 5. Check that pool can be imported
#
log_assert "Verify zhack label repair <vdev> will repair labels checksums"
log_onexit cleanup
VIRTUAL_DISK=$TEST_BASE_DIR/disk
function cleanup
{
poolexists $TESTPOOL && destroy_pool $TESTPOOL
[[ -f $VIRTUAL_DISK ]] && log_must rm $VIRTUAL_DISK
}
log_must truncate -s $(($MINVDEVSIZE * 8)) $VIRTUAL_DISK
log_must zpool create $TESTPOOL $VIRTUAL_DISK
log_must zpool export $TESTPOOL
log_mustnot zhack label repair $VIRTUAL_DISK
corrupt_label_checksum 0 $VIRTUAL_DISK
corrupt_label_checksum 1 $VIRTUAL_DISK
corrupt_label_checksum 2 $VIRTUAL_DISK
corrupt_label_checksum 3 $VIRTUAL_DISK
log_mustnot zpool import $TESTPOOL -d $TEST_BASE_DIR
log_must zhack label repair $VIRTUAL_DISK
log_must zpool import $TESTPOOL -d $TEST_BASE_DIR
cleanup
log_pass "zhack label repair works correctly."
@@ -0,0 +1,30 @@
#!/bin/ksh
#
# This file and its contents are supplied under the terms of the
# Common Development and Distribution License ("CDDL"), version 1.0.
# You may only use this file in accordance with the terms of version
# 1.0 of the CDDL.
#
#
# Description:
#
# Test whether zhack label repair -c can recover
# corrupted checksums on devices of varied size,
# and that -u alone cannot.
#
# Strategy:
#
# 1. Create pool on a loopback device with some test data
# 2. Export the pool.
# 3. Corrupt all label checksums in the pool
# 4. Check that pool cannot be imported
# 5. Verify that it cannot be imported after using zhack label repair -u
# to ensure that the -u option will quit on corrupted checksums.
# 6. Use zhack label repair -c on device
# 7. Check that pool can be imported and that data is intact
. "$STF_SUITE"/tests/functional/cli_root/zhack/library.kshlib
run_test_one "$(get_devsize)"
@@ -0,0 +1,31 @@
#!/bin/ksh
#
# This file and its contents are supplied under the terms of the
# Common Development and Distribution License ("CDDL"), version 1.0.
# You may only use this file in accordance with the terms of version
# 1.0 of the CDDL.
#
#
# Description:
#
# Test whether zhack label repair -u can recover
# detached devices of varied size, and that -c
# alone cannot.
#
# Strategy:
#
# 1. Create a mirrored pool on two loopback devices with some test data
# 2. Detach the second device from the mirror
# 3. Export the pool
# 4. Remove the non-detached device and its backing file
# 5. Verify that the remaining detached device cannot be imported
# 6. Verify that it cannot be imported after using zhack label repair -c
# to ensure that the -c option will not undetach a device.
# 7. Use zhack label repair -u on device
# 8. Verify that the detached device can be imported and that data is intact
. "$STF_SUITE"/tests/functional/cli_root/zhack/library.kshlib
run_test_two "$(get_devsize)"
@@ -0,0 +1,33 @@
#!/bin/ksh
#
# This file and its contents are supplied under the terms of the
# Common Development and Distribution License ("CDDL"), version 1.0.
# You may only use this file in accordance with the terms of version
# 1.0 of the CDDL.
#
#
# Description:
#
# Test whether zhack label repair can recover a device of varied size with
# corrupted checksums and which has been detached.
#
# Strategy:
#
# 1. Create a mirrored pool on two loopback devices with some test data
# 2. Detach the second device from the mirror
# 3. Export the pool
# 4. Remove the non-detached device and its backing file
# 5. Corrupt all label checksums on the remaining device
# 6. Verify that the remaining detached device cannot be imported
# 7. Verify that it cannot be imported after using zhack label repair -u
# to ensure that the -u option will quit on corrupted checksums.
# 8. Verify that it cannot be imported after using zhack label repair -c
# -c should repair the checksums, but not undetach a device.
# 9. Use zhack label repair -u on device
# 10. Verify that the detached device can be imported and that data is intact
. "$STF_SUITE"/tests/functional/cli_root/zhack/library.kshlib
run_test_three "$(get_devsize)"
@@ -0,0 +1,30 @@
#!/bin/ksh
#
# This file and its contents are supplied under the terms of the
# Common Development and Distribution License ("CDDL"), version 1.0.
# You may only use this file in accordance with the terms of version
# 1.0 of the CDDL.
#
#
# Description:
#
# Test whether zhack label repair can recover a device of varied size with
# corrupted checksums and which has been detached (in one command).
#
# Strategy:
#
# 1. Create a mirrored pool on two loopback devices with some test data
# 2. Detach the second device from the mirror
# 3. Export the pool
# 4. Remove the non-detached device and its backing file
# 5. Corrupt all label checksums on the remaining device
# 6. Verify that the remaining detached device cannot be imported
# 7. Use zhack label repair -cu on device to attempt to fix checksums and
# undetach the device in a single operation.
# 8. Verify that the detached device can be imported and that data is intact
. "$STF_SUITE"/tests/functional/cli_root/zhack/library.kshlib
run_test_four "$(get_devsize)"