Add allocation profile export and zhack subcommand for import

When attempting to debug performance problems on large systems, one of
the major factors that affect performance is free space
fragmentation. This heavily affects the allocation process, which is an
area of active development in ZFS. Unfortunately, fragmenting a large
pool for testing purposes is time consuming; it usually involves filling
the pool and then repeatedly overwriting data until the free space
becomes fragmented, which can take many hours. And even if the time is
available, artificial workloads rarely generate the same fragmentation
patterns as the natural workloads they're attempting to mimic.

This patch has two parts. First, zdb gains the ability to export the
full allocation map of the pool. It iterates over each vdev, printing
every allocated segment in the ms_allocatable range tree. This can be
done while the pool is online, though in that case the exported map may
mix state from several different TXGs, because metaslabs are loaded on
demand.

The second is a new subcommand for zhack, zhack metaslab leak (and its
supporting kernel changes). It imports a pool and then modifies the
range trees of the metaslabs, allowing the sync process to write them
out normally. It does not currently store
those allocations anywhere to make them reversible, and there is no
corresponding free subcommand (which would be extremely dangerous); this
is an irreversible process, only intended for performance testing. The
only way to reclaim the space afterwards is to destroy the pool or roll
back to a checkpoint.
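
As a rough illustration of how the two parts fit together, the sketch
below prints the command sequence used by the test in this commit. The
function name, pool names, and map path are hypothetical; the target
pool is assumed to have the same vdev configuration as the source, as it
does in the test. The commands are only printed, not executed.

```shell
#!/bin/sh
# Hypothetical helper: print the steps for copying an allocation profile
# from one pool to another. Nothing is executed; review the output first.
plan_profile_copy() {
    src_pool=$1 dst_pool=$2 map_file=$3
    # 1. Export the allocation map of the source pool with zdb.
    echo "zdb -m --allocated-map $src_pool > $map_file"
    # 2. The test exports the target pool before running zhack on it.
    echo "zpool export $dst_pool"
    # 3. Mark the same segments as allocated in the target (irreversible).
    echo "zhack metaslab leak $dst_pool < $map_file"
    # 4. Bring the target pool back online.
    echo "zpool import $dst_pool"
}

plan_profile_copy tank testpool /tmp/tank.allocmap
```

Because the leak cannot be undone, taking a checkpoint of the target
pool beforehand may be worth considering.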

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Closes #17576
Paul Dagnelie
2025-04-24 13:01:00 -07:00
committed by Brian Behlendorf
parent ca4f7d6d49
commit 26983d6fa7
10 changed files with 329 additions and 12 deletions
@@ -1009,6 +1009,7 @@ nobase_dist_datadir_zfs_tests_tests_SCRIPTS += \
functional/cli_root/zhack/zhack_label_repair_002.ksh \
functional/cli_root/zhack/zhack_label_repair_003.ksh \
functional/cli_root/zhack/zhack_label_repair_004.ksh \
functional/cli_root/zhack/zhack_metaslab_leak.ksh \
functional/cli_root/zpool_add/add_nested_replacing_spare.ksh \
functional/cli_root/zpool_add/add-o_ashift.ksh \
functional/cli_root/zpool_add/add_prop_ashift.ksh \
@@ -0,0 +1,70 @@
#!/bin/ksh
# SPDX-License-Identifier: CDDL-1.0
#
# This file and its contents are supplied under the terms of the
# Common Development and Distribution License ("CDDL"), version 1.0.
# You may only use this file in accordance with the terms of version
# 1.0 of the CDDL.
#
#
# Description:
#
# Test whether zhack metaslab leak functions correctly
#
# Strategy:
#
# 1. Create pool on a loopback device with some test data
# 2. Gather pool capacity stats
# 3. Generate fragmentation data with zdb
# 4. Destroy the pool
# 5. Create a new pool with the same configuration
# 6. Export the pool
# 7. Apply the fragmentation information with zhack metaslab leak
# 8. Import the pool
# 9. Verify that pool capacity stats match
. "$STF_SUITE"/include/libtest.shlib
verify_runnable "global"
function cleanup
{
	zpool destroy $TESTPOOL
	# -f: do not complain if the temp file was never created
	rm -f $tmp
}
log_onexit cleanup
log_assert "zhack metaslab leak leaks the right amount of space"
typeset tmp=$(mktemp)
log_must zpool create $TESTPOOL $DISKS
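# Write 16 files of 16 MiB each, syncing after every file.
for i in $(seq 1 16); do
	log_must dd if=/dev/urandom of=/$TESTPOOL/f$i bs=1M count=16
	log_must zpool sync $TESTPOOL
done
# Remove every second file to leave gaps in the allocated space.
for i in $(seq 2 2 16); do
	log_must rm /$TESTPOOL/f$i
done
# Create 16 empty files, syncing after every file.
for i in $(seq 1 16); do
	log_must touch /$TESTPOOL/g$i
	log_must zpool sync $TESTPOOL
done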
# Record the allocated space and export the allocation map.
alloc=$(zpool get -Hpo value alloc $TESTPOOL)
log_must eval "zdb -m --allocated-map $TESTPOOL > $tmp"
# Recreate the pool and apply the saved map while it is exported.
log_must zpool destroy $TESTPOOL
log_must zpool create $TESTPOOL $DISKS
log_must zpool export $TESTPOOL
log_must eval "zhack metaslab leak $TESTPOOL < $tmp"
log_must zpool import $TESTPOOL
alloc2=$(zpool get -Hpo value alloc $TESTPOOL)
# The new allocation must exceed the old one, but by less than 5%.
[[ $((alloc * 1.05)) -gt $alloc2 ]] && [[ $alloc -lt $alloc2 ]] || \
	log_fail "space usage changed too much: $alloc to $alloc2"
log_pass "zhack metaslab leak behaved correctly"
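
The final check in the test accepts the result only when the new
allocation is strictly greater than the original and less than 1.05
times it. As a minimal sketch, the same condition can be written with
integer arithmetic only, which may be useful in shells without
floating-point support. The function name and sample values are
hypothetical and not part of this commit.

```shell
#!/bin/sh
# Hypothetical helper: succeed when `after` is strictly greater than
# `before` but by less than 5%, using integer arithmetic only.
alloc_within_tolerance() {
    before=$1 after=$2
    # after * 100 < before * 105 is equivalent to after < before * 1.05
    [ "$after" -gt "$before" ] && [ $((after * 100)) -lt $((before * 105)) ]
}

if alloc_within_tolerance 1000000 1030000; then
    echo "within tolerance"
else
    echo "space usage changed too much"
fi
```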