This commit adds functional tests for these systems:
- AlmaLinux 8, AlmaLinux 9, ArchLinux
- CentOS Stream 9, Fedora 39, Fedora 40
- Debian 11, Debian 12
- FreeBSD 13, FreeBSD 14, FreeBSD 15
- Ubuntu 20.04, Ubuntu 22.04, Ubuntu 24.04
- enabled by default:
- AlmaLinux 8, AlmaLinux 9
- Debian 11, Debian 12
- Fedora 39, Fedora 40
- FreeBSD 13, FreeBSD 14
Workflow for each operating system:
- install qemu on the github runner
- download current cloud image of operating system
- start and init that image via cloud-init
- install dependencies and poweroff system
- start system and build openzfs and then poweroff again
- clone build system and start 2 instances of it
- run functional testings and complete in around 3h
- when tests are done, do some logfile preparing
- show detailed results for each system
- in the end, generate the job summary
Real-world benefits from this PR:
1. The github runner scripts are in the zfs repo itself. That means
you can just open a PR against zfs, like "Add Fedora 41 tester", and
see the results directly in the PR. ZFS admins no longer need
manually to login to the buildbot server to update the buildbot config
with new version of Fedora/Almalinux.
2. Github runners allow you to run the entire test suite against your
private branch before submitting a formal PR to openzfs. Just open a
PR against your private zfs repo, and the exact same
Fedora/Alma/FreeBSD runners will fire up and run ZTS. This can be
useful if you want to iterate on a ZTS change before submitting a
formal PR.
3. buildbot is incredibly cumbersome. Our buildbot config files alone
are ~1500 lines (not including any build/setup scripts)!
It's a huge pain to setup.
4. We're running the super ancient buildbot 0.8.12. It's so ancient
it requires python2. We actually have to build python2 from source
for almalinux9 just to get it to run. Ugrading to a more modern
buildbot is a huge undertaking, and the UI on the newer versions is
worse.
5. Buildbot uses EC2 instances. EC2 is a pain because:
* It costs money
* They throttle IOPS and CPU usage, leading to mysterious,
* hard-to-diagnose, failures and timeouts in ZTS.
* EC2 is high maintenance. We have to setup security groups, SSH
* keys, networking, users, etc, in AWS and it's a pain. We also
* have to periodically go in an kill zombie EC2 instances that
* buildbot is unable to kill off.
6. Buildbot doesn't always handle failures well. One of the things we
saw in the past was the FreeBSD builders would often die, and each
builder death would take up a "slot" in buildbot. So we would
periodically have to restart buildbot via a cron job to get the slots
back.
7. This PR divides up the ZTS test list into two parts, launches two
VMs, and on each VM runs half the test suite. The test results are
then merged and shown in the sumary page. So we're basically
parallelizing ZTS on the same github runner. This leads to lower
overall ZTS runtimes (2.5-3 hours vs 4+ hours on buildbot), and one
unified set of results per runner, which is nice.
8. Since the tests are running on a VM, we have much more control over
what happens. We can capture the serial console output even if the
test completely brings down the VM. In the future, we could also
restart the test on the VM where it left off, so that if a single test
panics the VM, we can just restart it and run the remaining ZTS tests
(this functionaly is not yet implemented though, just an idea).
9. Using the runners, users can manually kill or restart a test run
via the github IU. That really isn't possible with buildbot unless
you're an admin.
10. Anecdotally, the tests seem to be more stable and constant under
the QEMU runners.
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes#16537
That URL shortening scheme should stop working soon [1], while we
don't really need it here.
1. https://developers.googleblog.com/en/google-url-shortener-links-will-no-longer-be-available/
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Rob Norris <rob.norris@klarasystems.com>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
The test runner accumulates output from individual tests, then writes it
to the log at the end. If a test hangs or crashes the system half way
through, we get no insight into how it got to where it did.
This adds a -D option for "debug". When set, all test output is written
to stdout.
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Akash B <akash-b@hpe.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes#16096
test-runner.py orchestrates all of the ZTS executions. The `Cmd` object
manages these process, and its `run` method specifically invokes these
possibly long-running processes, possibly retrying in the event of a
timeout. Since its inception, memory leak detection using the kmemleak
infrastructure [1], and kernel logging [2] have been added to this run
mechanism.
However, the callback to cull a process beyond its timeout threshold,
`kill_cmd`, has evaded modernization by both of these changes. As a
result, this function fails to properly invoke `run`, leading to an
untrapped exception and unreported test failure.
This patch extends `kill_cmd` to receive these kernel devices through
the `options` parameter, and regularizes all the `.run` calls from
`Cmd`, and its subclasses, to accept that parameter.
[1] Commit a69765ea5b
[2] Commit fc2c0256c5
Reviewed-by: John Wren Kennedy <john.kennedy@delphix.com>
Signed-off-by: Antonio Russo <aerusso@aerusso.net>
Closes#14849
Make the test runner try to use the included python monotonic time
function instead of calling librt.
This makes the test runner work on macos where librt wasn't available.
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Andrew Innes <andrew.c12@gmail.com>
Closes#14700
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes#14450
Add a -K option to the test suite to log each test name to /dev/kmsg
(on Linux), so if there's a kernel warning we'll be able to match
it up to a particular test.
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #13227
- Kmemleak `clear` is invoked right before every test case run.
- Kmemleak `scan` is requested right after each test case is finished.
- Kmemleak instrumentation is not used for
setup/cleanup/pretest/posttest/failsafe stages to shorten the test
case execution time.
- Kmemleak periodic scan is disabled (`scan=0`) before the test suite
run to avoid interfering with the on-demand scan results.
- There are unavoidable potential false positives coming from kernel
areas other than OpenZFS module.
- The ZTS with kmemleak enabled duration is increased by ~50%.
Example run
```
Running Time: 07:12:13
Percent passed: 98.3%
unreferenced object 0xffff9da82aea5410 (size 80):
comm "kworker/u32:10", pid 942206, jiffies 4296749716 (age 2615.516s)
hex dump (first 32 bytes):
00 30 30 00 00 00 00 00 ff 8f 30 00 00 00 00 00 .00.......0.....
51 e6 77 05 a8 9d ff ff 00 00 00 00 00 00 00 00 Q.w.............
backtrace:
[<000000005cf1fea2>] alloc_extent_state+0x1d/0xb0 [btrfs]
[<0000000083f78ae5>] set_extent_bit+0x2ff/0x670 [btrfs]
[<00000000de29249e>] lock_extent_bits+0x6b/0xa0 [btrfs]
[<00000000b241f424>] lock_and_cleanup_extent_if_need+0xaf/0x1c0
[btrfs]
[<0000000093ca72b5>] btrfs_buffered_write+0x297/0x7d0 [btrfs]
[<000000002c2938c8>] btrfs_file_write_iter+0x127/0x390 [btrfs]
[<00000000b888f720>] do_iter_readv_writev+0x152/0x1b0
[<00000000320f0bcc>] do_iter_write+0x7c/0x1c0
[<000000000b5a8fe0>] lo_write_bvec+0x62/0x150 [loop]
[<000000009aa03c73>] loop_process_work+0x250/0xbd0 [loop]
[<00000000c7487d8a>] process_one_work+0x1f1/0x390
[<000000000b236831>] worker_thread+0x53/0x3e0
[<0000000023cb3e57>] kthread+0x127/0x150
[<000000002d48676a>] ret_from_fork+0x22/0x30
```
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: szubersk <szuberskidamian@gmail.com>
Closes#13084
CLOCK_MONOTONIC_RAW is only a thing on Linux and macOS. I'm not
actually sure why the previous hardcoding of a constant didn't
error out, but when we removed it, it sure does now.
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Rich Ercolani <rincebrain@gmail.com>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes#12995
Deprecation of Python versions below 3.6 gives opportunity to unify the
build and install requirements for OpenZFS packages. The minimal
supported Python version is 3.6 as this is the most recent Python
package CentOS/RHEL 7 users can get.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rich Ercolani <rincebrain@gmail.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: szubersk <szuberskidamian@gmail.com>
Closes#12925
This was a project proposed as part of the Quality theme for the
hackthon for the 2021 OpenZFS Developer Summit. The idea is to improve
the usability of the automated tests that get run when a PR is created
by having failing tests automatically rerun in order to make flaky
tests less impactful.
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes#12740
A bunch of places need to edit files to incorporate the configured paths
i.e. bindir, sbindir etc. Move this logic into a common file.
Create arc_summary by copying arc_summary[23] as appropriate at build
time instead of install time.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Closes#10559