CI: Add ZTS -O option, log Setup Testing Machines step

Add a -O option to zfs-test.sh to dump debug information on test
timeout.  The debug info includes:

- 30 lines from 'top'
- /proc/<PID>/stack output of process with highest CPU usage
- Last lines strace-ing process with highest CPU usage
- /proc/sysrq-trigger kernel stack traces

All debug information gets dumped to /dev/kmsg (Linux only).

In addition, print out the VM console lines from the "Setup Testing
Machines" step.  We have often see VMs timeout at this step and don't
know why.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #17753
This commit is contained in:
Tony Hutter
2025-09-29 16:32:05 -07:00
parent 9d48e0150c
commit c2a641e4b8
5 changed files with 93 additions and 13 deletions
+8 -1
View File
@@ -37,6 +37,7 @@ DEBUG=""
CLEANUP="yes"
CLEANUPALL="no"
KMSG=""
TIMEOUT_DEBUG=""
LOOPBACK="yes"
STACK_TRACER="no"
FILESIZE="4G"
@@ -363,6 +364,7 @@ OPTIONS:
-k Disable cleanup after test failure
-K Log test names to /dev/kmsg
-f Use files only, disables block device tests
-O Dump debugging info to /dev/kmsg on test timeout
-S Enable stack tracer (negative performance impact)
-c Only create and populate constrained path
-R Automatically rerun failing tests
@@ -401,7 +403,7 @@ $0 -x
EOF
}
while getopts 'hvqxkKfScRmn:d:Ds:r:?t:T:u:I:' OPTION; do
while getopts 'hvqxkKfScRmOn:d:Ds:r:?t:T:u:I:' OPTION; do
case $OPTION in
h)
usage
@@ -444,6 +446,9 @@ while getopts 'hvqxkKfScRmn:d:Ds:r:?t:T:u:I:' OPTION; do
export NFS=1
. "$nfsfile"
;;
O)
TIMEOUT_DEBUG="yes"
;;
d)
FILEDIR="$OPTARG"
;;
@@ -766,6 +771,7 @@ msg "${TEST_RUNNER}" \
"${DEBUG:+-D}" \
"${KMEMLEAK:+-m}" \
"${KMSG:+-K}" \
"${TIMEOUT_DEBUG:+-O}" \
"-c \"${RUNFILES}\"" \
"-T \"${TAGS}\"" \
"-i \"${STF_SUITE}\"" \
@@ -776,6 +782,7 @@ msg "${TEST_RUNNER}" \
${DEBUG:+-D} \
${KMEMLEAK:+-m} \
${KMSG:+-K} \
${TIMEOUT_DEBUG:+-O} \
-c "${RUNFILES}" \
-T "${TAGS}" \
-i "${STF_SUITE}" \