ZTS: Use QEMU for tests on Linux and FreeBSD

This commit adds functional tests for these systems:
- AlmaLinux 8, AlmaLinux 9, ArchLinux
- CentOS Stream 9, Fedora 39, Fedora 40
- Debian 11, Debian 12
- FreeBSD 13, FreeBSD 14, FreeBSD 15
- Ubuntu 20.04, Ubuntu 22.04, Ubuntu 24.04

Enabled by default:
- AlmaLinux 8, AlmaLinux 9
- Debian 11, Debian 12
- Fedora 39, Fedora 40
- FreeBSD 13, FreeBSD 14

Workflow for each operating system:
- install qemu on the github runner
- download current cloud image of operating system
- start and init that image via cloud-init
- install dependencies and poweroff system
- start system and build openzfs and then poweroff again
- clone the built system and start 2 instances of it
- run the functional tests, which complete in around 3h
- when the tests are done, prepare the logfiles
- show detailed results for each system
- in the end, generate the job summary

Real-world benefits from this PR:

1. The github runner scripts are in the zfs repo itself. That means
   you can just open a PR against zfs, like "Add Fedora 41 tester", and
   see the results directly in the PR. ZFS admins no longer need to
   manually log in to the buildbot server to update the buildbot config
   with new versions of Fedora/AlmaLinux.

2. Github runners allow you to run the entire test suite against your
   private branch before submitting a formal PR to openzfs. Just open a
   PR against your private zfs repo, and the exact same
   Fedora/Alma/FreeBSD runners will fire up and run ZTS. This can be
   useful if you want to iterate on a ZTS change before submitting a
   formal PR.

3. buildbot is incredibly cumbersome. Our buildbot config files alone
   are ~1500 lines (not including any build/setup scripts)!
   It's a huge pain to set up.

4. We're running the super ancient buildbot 0.8.12. It's so ancient
   it requires python2. We actually have to build python2 from source
   for almalinux9 just to get it to run. Upgrading to a more modern
   buildbot is a huge undertaking, and the UI on the newer versions is
   worse.

5. Buildbot uses EC2 instances. EC2 is a pain because:
   * It costs money.
   * They throttle IOPS and CPU usage, leading to mysterious,
     hard-to-diagnose failures and timeouts in ZTS.
   * EC2 is high maintenance. We have to set up security groups, SSH
     keys, networking, users, etc. in AWS, and it's a pain. We also
     have to periodically go in and kill zombie EC2 instances that
     buildbot is unable to kill off.

6. Buildbot doesn't always handle failures well. One of the things we
   saw in the past was the FreeBSD builders would often die, and each
   builder death would take up a "slot" in buildbot. So we would
   periodically have to restart buildbot via a cron job to get the slots
   back.

7. This PR divides up the ZTS test list into two parts, launches two
   VMs, and on each VM runs half the test suite. The test results are
   then merged and shown in the summary page. So we're basically
   parallelizing ZTS on the same github runner. This leads to lower
   overall ZTS runtimes (2.5-3 hours vs 4+ hours on buildbot), and one
   unified set of results per runner, which is nice.

8. Since the tests are running on a VM, we have much more control over
   what happens. We can capture the serial console output even if the
   test completely brings down the VM. In the future, we could also
   restart the test on the VM where it left off, so that if a single test
   panics the VM, we can just restart it and run the remaining ZTS tests
   (this functionality is not yet implemented though, just an idea).

9. Using the runners, users can manually kill or restart a test run
   via the github UI. That really isn't possible with buildbot unless
   you're an admin.

10. Anecdotally, the tests seem to be more stable and consistent under
    the QEMU runners.
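
To illustrate the test splitting described in item 7, here is a minimal
sketch of the modulus-based interleaving. The helper names pick_fraction
and is_fraction are hypothetical and are not part of zfs-tests.sh; the
real script derives the tag list from the runfiles, while this sketch
takes the tags as arguments.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in zfs-tests.sh): print the comma separated
# subset of the given tags that belongs to fraction NUM/DEN.
pick_fraction() {
	local num=$1 den=$2
	shift 2
	# One tag per line, sorted and de-duplicated. Keep every line whose
	# 1-based position modulo DEN equals NUM - 1, so consecutive tags
	# land in different portions.
	printf '%s\n' "$@" | sort -u | \
	    awk -v num="$num" -v den="$den" \
	    'NR % den == (num - 1) { printf "%s%s", sep, $0; sep = "," }
	     END { if (sep != "") print "" }'
}

# Hypothetical helper: succeed only if the argument looks like "N/M".
is_fraction() {
	echo "$1" | grep -Eq '^[0-9]+/[0-9]+$'
}

# Two VMs, each taking one half of six example tags.
pick_fraction 1 2 append atime bootfs cachefile checksum cp_files
# prints: atime,cachefile,cp_files
pick_fraction 2 2 append atime bootfs cachefile checksum cp_files
# prints: append,bootfs,checksum
```

Because the selection is by position modulo the denominator, each VM gets
a mix of tags from across the sorted list rather than one contiguous
alphabetical block.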

Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #16537
This commit is contained in:
Tino Reichardt
2024-06-17 16:52:58 +02:00
committed by Brian Behlendorf
parent c4d1a19b33
commit bca9b64e7b
14 changed files with 1490 additions and 3 deletions
+70 -3
@@ -1,5 +1,6 @@
#!/bin/sh
#!/usr/bin/env bash
# shellcheck disable=SC2154
# shellcheck disable=SC2292
#
# CDDL HEADER START
#
@@ -208,6 +209,49 @@ find_runfile() {
fi
}
# Given a TAGS with a format like "1/3" or "2/3" then divide up the test list
# into portions and print that portion. So "1/3" for "the first third of the
# test tags".
#
#
split_tags() {
# Get numerator and denominator
NUM=$(echo "$TAGS" | cut -d/ -f1)
DEN=$(echo "$TAGS" | cut -d/ -f2)
# At the point this is called, RUNFILES will contain a comma separated
# list of full paths to the runfiles, like:
#
# "/home/hutter/qemu/tests/runfiles/common.run,/home/hutter/qemu/tests/runfiles/linux.run"
#
# So to get tags for our selected tests we do:
#
# 1. Remove unneeded chars: [],\
# 2. Print out the last field of each tag line. This will be the tag
# for the test (like 'zpool_add').
# 3. Remove duplicates between the runfiles. If the same tag is defined
# in multiple runfiles, then when you do '-T <tag>' ZTS is smart
# enough to know to run the tag in each runfile. So '-T zpool_add'
# will run the zpool_add from common.run and linux.run.
# 4. Ignore the 'functional' tag since we only want individual tests
# 5. Print out the tests in our fraction of all tests. This uses modulus
# so "1/3" will run tests 3,6,9 etc. That way the tests are
# interleaved so, say, "3/4" isn't running all the zpool_* tests that
# appear alphabetically at the end.
# 6. Remove trailing comma from list
#
# TAGS will then look like:
#
# "append,atime,bootfs,cachefile,checksum,cp_files,deadman,dos_attributes, ..."
# Change the comma to a space for easy processing
_RUNFILES=${RUNFILES//","/" "}
# shellcheck disable=SC2002,SC2086
cat $_RUNFILES | tr -d "[],\'" | awk '/tags = /{print $NF}' | sort | \
uniq | grep -v functional | \
awk -v num="$NUM" -v den="$DEN" '{ if(NR % den == (num - 1)) {printf "%s,",$0}}' | \
sed -E 's/,$//'
}
#
# Symlink file if it appears under any of the given paths.
#
@@ -331,10 +375,14 @@ OPTIONS:
-t PATH|NAME Run single test at PATH relative to test suite,
or search for test by NAME
-T TAGS Comma separated list of tags (default: 'functional')
Alternately, specify a fraction like "1/3" or "2/3" to
run the first third of tests or 2nd third of the tests. This
is useful for splitting up the test amongst different
runners.
-u USER Run single test as USER (default: root)
EXAMPLES:
# Run the default ($(echo "${DEFAULT_RUNFILES}" | sed 's/\.run//')) suite of tests and output the configuration used.
# Run the default ${DEFAULT_RUNFILES//\.run/} suite of tests and output the configuration used.
$0 -v
# Run a smaller suite of tests designed to run more quickly.
@@ -347,7 +395,7 @@ $0 -t tests/functional/cli_root/zfs_bookmark/zfs_bookmark_cliargs.ksh
$0 -t zfs_bookmark_cliargs
# Cleanup a previous run of the test suite prior to testing, run the
# default ($(echo "${DEFAULT_RUNFILES}" | sed 's/\.run//')) suite of tests and perform no cleanup on exit.
# default ${DEFAULT_RUNFILES//\.run/} suite of tests and perform no cleanup on exit.
$0 -x
EOF
@@ -489,6 +537,8 @@ fi
#
TAGS=${TAGS:='functional'}
#
# Attempt to locate the runfiles describing the test workload.
#
@@ -509,6 +559,23 @@ done
unset IFS
RUNFILES=${R#,}
# The tag can be a fraction to indicate which portion of ZTS to run, like:
#
# "1/3": Run first one third of all tests in runfiles
# "2/3": Run second one third of all tests in runfiles
# "6/10": Run 6th tenth of all tests in runfiles
#
# This is useful for splitting up the test across multiple runners.
#
# After this code block, TAGS will be transformed from something like
# "1/3" to a comma separated taglist, like:
#
# "append,atime,bootfs,cachefile,checksum,cp_files,deadman,dos_attributes, ..."
#
if echo "$TAGS" | grep -Eq '^[0-9]+/[0-9]+$' ; then
TAGS=$(split_tags)
fi
#
# This script should not be run as root. Instead the test user, which may
# be a normal user account, needs to be configured such that it can