mirror_zfs

mirror of https://git.proxmox.com/git/mirror_zfs.git synced 2026-05-23 19:04:45 +03:00

Author	SHA1	Message	Date
Tony Hutter	7534fa4df7	CI/GCC: Add Fedora 44, fix build errors and threadsappend - Add Fedora 44 to CI tests - Fix build issues from the newer compiler. These are mostly 'char *' to 'const char *' conversions. - Fix threadsappend.c test waiting for the same thread TID twice. This caused the test to hang on F44 (but strangely not other OSs?) Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com> Signed-off-by: Tony Hutter <hutter2@llnl.gov> Closes #18478	2026-05-04 10:38:46 -07:00
Andriy Tkachuk	76fd64ac9f	Fix rare cksum errors after rebuild Currently, after rebuild (aka sequential resilver), checksum errors can sometimes be seen on the spare vdev or draid spare. On my laptop, it happens 2 to 4 times when running the redundancy_draid_spare1 test in a loop 100 times. It looks like there's a race in vdev_rebuild_thread() when the rebuild of space map ranges is finished and we re-enable allocations from the metaslab too soon: new allocations may happen from that metaslab before the txg with the rebuilt ranges is synced, causing undesirable interference. Solution: wait for the txg to be synced before enabling the metaslab. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Akash B <akash-b@hpe.com> Signed-off-by: Andriy Tkachuk <atkachuk@wasabi.com> Closes #18307 Closes #18319 Closes #18473	2026-05-04 10:38:46 -07:00
Brian Behlendorf	b0c1dcb531	ZTS: add targeted redundancy_draid_spare exception When sequentially resilvering a dRAID pool it's possible that a few correctable checksum errors will be reported. This is a known issue which is occasionally observed in the CI. Until it's resolved we want the test case to tolerate a few checksum errors in this scenario to prevent false positives in the CI. This change also has the additional side effect of standardizing in one location how the dRAID pool integrity is verified. Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #18307 Issue #18319 Closes #18436	2026-05-04 10:38:46 -07:00
Gary Guo	e7524594a9	Fix read corruption after block clone after truncate When copy_file_range overwrites a recent truncation, subsequent reads can incorrectly determine that they are reading a hole instead of reading the cloned blocks. This can happen when the following conditions are met: - Truncate adds blkid to dn_free_ranges - A new TXG is created - copy_file_range calls dmu_brt_clone, which overrides the block pointer and sets DB_NOFILL - Subsequent read, given DB_NOFILL, hits dbuf_read_impl and dbuf_read_hole - dbuf_read_hole calls dnode_block_freed, which returns TRUE because the truncated blkids are still in dn_free_ranges This will not happen if the clone and truncate are in the same TXG, because the block clone would update the current TXG's dn_free_ranges, which is why this bug only triggers under high IO load (such as compilation). Fix this by skipping the dnode_block_freed call if the block is overridden. The fix shouldn't cause an issue when the cloned block is subsequently freed in later TXGs, as dbuf_undirty would remove the override. This requires a dedicated test program as it is much harder to trigger with scripts (this needs to generate a lot of I/O in a short period of time for the bug to trigger reliably). Assisted-by: Gemini:gemini-3.1-pro Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Gary Guo <gary@kernel.org> Closes #18412 Closes #18421	2026-04-23 15:02:27 -07:00
Andriy Tkachuk	da44040bbb	draid: fix cksum errors after rebuild with degraded disks Currently, when more than nparity disks get faulted during the rebuild, only first nparity disks would go to faulted state, and all the remaining disks would go to degraded state. When a hot spare is attached to that degraded disk for rebuild creating the spare mirror, only that hot spare is getting rebuilt, but not the degraded device. So when later during scrub some other attached draid spare happens to map to that spare, it will end up with cksum error. Moreover, if the user clears the degraded disk from errors, the data won't be resilvered to it, hot spare will be detached almost immediately and the data that was resilvered only to it will be lost. Solution: write to all mirrored devices during rebuild, similar to traditional/healing resilvering, but only if we can verify the integrity of the data, or when it's the draid spare we are writing to, in which case we are writing to a reserved spare space, and there is no danger to overwrite any good data. The argument that writing only to rebuilding draid spare vdev is faster than writing to normal device doesn't hold since, at a specific offset being rebuilt, draid spare will be mapped to a normal device anyway. redundancy_draid_degraded2 automation test is added also to cover the scenario. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Andriy Tkachuk <atkachuk@wasabi.com> Closes #18414	2026-04-23 15:00:46 -07:00
Brian Behlendorf	e9a8c6e080	draid: allow seq resilver reads from degraded vdevs When sequentially resilvering allow a dRAID child to be read as long as the DTLs indicate it should have a good copy of the data and the leaf isn't being rebuilt. The previous check was slightly too broad and would skip dRAID spare and replacing vdevs if one of their children was being replaced. As long as there exists enough additional redundancy this is fine, but when there isn't this vdev must be read in order to correctly reconstruct the missing data. A new test case has been added which exhausts the available redundancy, faults another device causing it to be degraded, and then performs a sequential resilver for the degraded device. In such a situation enough redundancy exists to perform the replacement and a scrub should detect no checksum errors. Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com> Reviewed-by: Andriy Tkachuk <andriy.tkachuk@seagate.com> Reviewed-by: Akash B <akash-b@hpe.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #18405	2026-04-23 14:59:47 -07:00
Brian Behlendorf	1bc922516e	ZTS: Add back redundancy_draid_spare3 exception Observed again in the CI. Put the maybe exception back in place and reference a newly created issue for this sporadic failure. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #18320	2026-04-23 14:57:08 -07:00
Brian Behlendorf	7894a5e884	ZTS: redundancy_draid_spare{1,3} exceptions Update the redundancy_draid_spare1 exception to reference an issue which describes the failure. Remove the exception for the redundancy_draid_spare3 test. I have not observed it in local testing. If it reproduces in the CI we can create a new issue for it and put back the exception. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #18308	2026-04-23 14:57:00 -07:00
Andriy Tkachuk	938c8c98b1	draid: fix data corruption after disk clear Currently, when there are several faulted disks with attached dRAID spares, and one of those disks is cleared from errors (zpool clear), followed by its spare being detached, the data in all the remaining spares that were attached while the cleared disk was in FAULTED state might get corrupted (which can be seen by running scrub). In some cases, when too many disks get cleared at a time, this can result in data corruption/loss. dRAID spare is a virtual device whose blocks are distributed among other disks. Those disks can be also in FAULTED state with attached spares on their own. When a disk gets sequentially resilvered (rebuilt), the changes made by that resilvering won't get captured in the DTL (Dirty Time Log) of other FAULTED disks with the attached spares to which the data is written during the resilvering (as it would normally be done for the changes made by the user if a new file is written or some existing one is deleted). It is because sequential resilvering works on the block level, without touching or looking into metadata, so it doesn't know anything about the old BPs or transaction groups that it is resilvering. So later on, when that disk gets cleared from errors and healing resilvering is trying to sync all the data from its spare onto it, all the changes made on its spare during the resilvering of other disks will be missed because they won't be captured in its DTL. That's why other dRAID spares may get corrupted. Here's another way to explain it that might be helpful. Imagine a scenario: 1. d1 fails and gets resilvered to some spare s1 - OK. 2. d2 fails and gets sequentially resilvered on draid spare s2. Now, in some slices, s2 would map to d1, which is failed. But d1 has s1 spare attached, so the data from that resilvering goes to s1, but not recorded in d1's DTL. 3. Now, d1 gets cleared and its s1 gets detached. 
All the changes done by the user (writes or deletions) have their txgs captured in d1's DTL, so they will be resilvered by the healing resilver from its spare (s1) - that part works fine. But the data which was written during resilvering of d2 and went to s1 - that one will be missed from d1's DTL and won't get resilvered to it. So here we are: 4. s2 under d2 is corrupted in the slices which map to d1, because d1 doesn't have that data resilvered from s1. Now, if there are more failed disks with draid spares attached which were sequentially resilvered while d1 was failed, d3+s3, d4+s4 and so on - all their spares will be corrupted. Because, in some slices, each of them will map to d1 which will miss their data. Solution: add all known txgs starting from TXG_INITIAL to DTLs of non-writable devices during sequential resilvering so when healing resilver starts on disk clear, it would be able to check and heal blocks from all txgs. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com> Reviewed-by: Akash B <akash-b@hpe.com> Signed-off-by: Andriy Tkachuk <andriy.tkachuk@seagate.com> Closes #18286 Closes #18294	2026-04-23 14:54:23 -07:00
Rob Norris	faddb7f5ca	Linux 7.0: explicitly set setlease handler to kernel implementation The upcoming 7.0 kernel will no longer fall back to generic_setlease(), instead returning EINVAL if .setlease is NULL. So, we set it explicitly. To ensure that we catch any future kernel change, adds a sanity test for F_SETLEASE and F_GETLEASE too. Since this is a Linux-specific test, also a small adjustment to the test runner to allow OS-specific helper programs. Sponsored-by: TrueNAS Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Rob Norris <rob.norris@truenas.com> Closes #18215	2026-04-23 14:30:53 -07:00
Rob Norris	fc44c73021	build: add SPDX license tags to build system files Sponsored-by: https://despairlabs.com/sponsor/ Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Rob Norris <robn@despairlabs.com> Closes #18077	2026-04-23 14:29:46 -07:00
Austin Wise	612d4019f1	Fix activating large_microzap on receive This ensures that the in-memory state of the feature is recorded and that `dsl_dataset_activate_feature` is not called when the feature is already active. Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Austin Wise <AustinWise@gmail.com> Closes #18143 Closes #18144	2026-02-17 11:54:58 -08:00
Marc Sladek	a0350f61c4	Fix `send:raw` permission for send `-w -I` When performing an incremental raw send with intermediates (-w -I), the standard 'send' permission was incorrectly required instead of allowing 'send:raw'. This was due to a strict boolean comparison on the 'rawok' flag in zfs_secpolicy_send() with non-boolean value. This change normalizes the 'rawok' variable to be strictly 0/1 and updates the test suite to properly verify delegated raw send behavior. Introduced-by: https://github.com/openzfs/zfs/pull/17543 Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Marc Sladek <marc@sladek.dev> Closes #18198 Closes #18193	2026-02-11 11:41:13 -08:00
Tony Hutter	936a98c716	ZTS: Fix zed_synchronous_zedlet Wait for scrub_finish (as the comments in the code suggest) rather than trim_finish in zed_synchronous_zedlet.ksh. This seems to work around the ZTS failures in #18192. Also, fix some typos. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tony Hutter <hutter2@llnl.gov> Closes #18192 Closes #18196	2026-02-11 11:41:13 -08:00
Brian Behlendorf	618cfa02ea	ZTS: update the relevant mmp test cases - mmp_concurrent_import: added test case to verify concurrent import correctness. The pool may only be imported once. - mmp_exported_import: an activity check is now required for pools which were cleanly exported if the system and pool hostids don't match. - mmp_inactive_import: an activity check is now required for any pool which wasn't cleanly exported, even if the system and pool hostids match. - mmp_on_uberblocks: updated expected uberblocks to take into account the value MMP_INTERVAL_DEFAULT is set to. - mmp_reset_interval: reduce the number of iterations from 10 to 3. This is sufficient to verify functionality and significantly speeds up the test. - mmp_on_uberblocks: adjust the thresholds and increase the runtime to avoid false positives observed in CI. - Update tests to use 'zhack action idle' instead of ztest to improve the reliability of the tests. - Add additional log_note messages to test cases which have multiple verification steps to make it clear which portion of a test failed when reviewing the logs. - Replace default_setup/cleanup_noexit calls with 'zpool create' and 'zpool destroy' calls to avoid additional unnecessary dataset creation work. - Update activity/noactivity check helper functions to use the ZFS_LOAD_INFO_DEBUG information now available from 'zpool import' to determine if this activity check ran and why. This is more reliable in the CI than measuring the runtime. - Removed all mmp tests from the zts-report.py exceptions list. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Olaf Faaland <faaland1@llnl.gov> Reviewed-by: Akash B <akash-b@hpe.com>	2026-02-10 17:01:29 -08:00
Alexander Moch	8dec2d94b4	CI: Add Alpine Linux 3.23 runner to the pipeline (#18087) Add an Alpine Linux 3.23 runner to the CI chain to run OpenZFS builds and tests against musl libc. Currently, zfs_send_sparse is killed after 10 minutes on Alpine, causing cascading EBUSY failures in the test suite. With zfs_send_sparse disabled, the ZFS test suite reaches a pass rate of 94.62%. This commit introduces the required Alpine-specific setup and a small set of shell and cloud-init compatibility fixes that also apply to existing Linux runners. The Alpine runner is not enabled by default and is not executed for new pull requests. Sponsored-by: ERNW Research GmbH - https://ernw-research.de/ Signed-off-by: Alexander Moch <amoch@ernw.de> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>	2026-02-10 16:59:18 -08:00
Martin Matuška	d69f7c5e9b	FreeBSD: unbreak compilation on i386 tests/zfs-tests/cmd/mmap_seek.c: use correct printf specifier module/zfs/vdev.c: vdev_clear(): correctly cast argument to atomic_add_64(). Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Martin Matuska <mm@FreeBSD.org> Closes #18096	2026-02-05 13:48:31 -08:00
shuppy	6218a5eb03	Fix history logging for `zpool create -t` `zpool create` is supposed to log the command to the new pool’s history, as a special record that never gets evicted from the ring buffer. but when you create a pool with `zpool create -t`, no such record is ever logged (#18102). that bug may be the cause of issues like #16408. `zpool create -t` (`83e9986f6e`) and `zpool import -t` (`26b42f3f9d`) are both designed to override the on-disk zpool property `name` with an in-core “temporary” name, but they work somewhat differently under the hood. importing with a temporary name sets `spa->spa_import_flags |= ZFS_IMPORT_TEMP_NAME` in ZFS_IOC_POOL_IMPORT, which tells spa_write_cachefile() and spa_config_generate() to use the ZPOOL_CONFIG_POOL_NAME in `spa->spa_config` instead of `spa->spa_name`. creating with a temporary name permanently(!) sets the internal zpool property `tname` (ZPOOL_PROP_TNAME) in the `zc->zc_nvlist_src` of ZFS_IOC_POOL_CREATE, which tells zfs_ioc_pool_create() (`4ceb8dd6fd`) and spa_create() to use that name instead of `zc->zc_name`, then sets `spa->spa_import_flags |= ZFS_IMPORT_TEMP_NAME` like an import. but zfsdev_ioctl_common() fails to check for `tname` when saving the pool name to `zfs_allow_log_key`, so when we call ZFS_IOC_LOG_HISTORY, we call spa_open() on the wrong pool name and get ENOENT, so the logging silently fails. this patch fixes #18102 by checking for `tname` in zfsdev_ioctl_common() like we do in zfs_ioc_pool_create(). Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com> Signed-off-by: delan azabani <dazabani@igalia.com> Closes #18118 Closes #18102	2026-02-05 13:48:31 -08:00
Austin Wise	65e13c33d8	When receiving a stream with the large block flag, activate feature ZFS send streams include a feature flag DMU_BACKUP_FEATURE_LARGE_BLOCKS to indicate the presence of large blocks in the dataset. On the sending side, this flag is included if the `-L` flag is passed to `zfs send` and the feature is active in the dataset. On the receive side, the stream is refused if the feature is active in the destination dataset but the stream does not include the feature flag. The problem is the feature is only activated when a large block is born. If a large block has been born in the destination, but never the source, the send can't work. This can arise when sending streams back and forth between two datasets. This commit fixes the problem by always activating the large blocks feature when receiving a stream with the large block feature flag. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com> Signed-off-by: Austin Wise <AustinWise@gmail.com> Closes #18105	2026-02-05 13:48:31 -08:00
shuppy	bc3320f0cc	ZTS: add regression test for #17180 In #17180, we fixed an interesting bug that i believe i hit in one of my pools, but as far as i can tell, there was no test for it. this patch adds a regression test for #17180, minimised from my attempts to reproduce the bug in a way that resembled the history of my pool. Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Adam Moss <c@yotes.com> Signed-off-by: delan azabani <dazabani@igalia.com> Closes #18109	2026-02-05 13:48:31 -08:00
Ivan Shapovalov	6ab8f46c6c	cmd/zfs: clone: accept `-u` to not mount newly created datasets Signed-off-by: Ivan Shapovalov <intelfx@intelfx.name> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com> Closes #18080	2026-02-05 13:48:31 -08:00
Alexander Motin	d5724f8f3f	ZTS: Fix zvol_misc_fua SLOG writes check Instead of comparing number of SLOG writes to number of normal writes we should just make sure SLOG got the required number of writes. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com> Closes #18033	2026-02-05 13:48:30 -08:00
Ameer Hamza	47319ef7a6	ZTS: Add test for snapshot automount race Add snapshot_019_pos to verify parallel snapshot automount operations don't cause AVL tree panic. Regression test for commit `4ce030e025`. Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ameer Hamza <ahamza@ixsystems.com> Closes #18035	2025-12-10 10:21:29 -08:00
Chunwei Chen	028d66b9dd	Fix ddtprune causing space leak In zio_ddt_free, if a pruned dde is still in ddt, it would do nothing and cause space leak. Reviewed-by: Rob Norris <robn@despairlabs.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Allan Jude <allan@klarasystems.com> Signed-off-by: Chunwei Chen <david.chen@nutanix.com> Closes #17982 Closes #17983	2025-12-10 10:21:29 -08:00
Tony Hutter	206487b9b1	CI: Fix Ubuntu 22.01 rsend failures For whatever reason, the single `log_note` in the `directory_diff` function causes the function to stop executing on Ubuntu 22. This causes most of the rsend tests to fail. Remove the line since it's only informational. Signed-off-by: Tony Hutter <hutter2@llnl.gov>	2025-12-10 10:21:29 -08:00
Brian Behlendorf	ed87bc593f	ZTS: Add slow_vdev_degraded_sit_out retry While not common the draid3 vdev type has been observed to not always sit out a vdev when run in the CI. To prevent continued false positives allow the test to be retried up to three times before considering it a failure. Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #18003	2025-12-10 10:21:29 -08:00
Alexander Motin	68c1df8db3	ZFS: Enable more logs for raidz_001_neg The output is not so big here, so let's collect something useful. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Rob Norris <robn@despairlabs.com> Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com> Closes #17977	2025-12-10 10:21:29 -08:00
Mariusz Zaborski	1e8c96d7d5	Add knob to disable slow io notifications Introduce a new vdev property `VDEV_PROP_SLOW_IO_REPORTING` that allows users to disable notifications for slow devices. This prevents ZED and/or ZFSD from degrading the pool due to slow I/O. Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Mariusz Zaborski <oshogbo@FreeBSD.org> Closes #17477	2025-11-12 13:07:14 -08:00
Alexander Motin	41878d57ea	Add BRT support to zpool prefetch command Implement BRT (Block Reference Table) prefetch functionality similar to existing DDT prefetch. This allows preloading BRT metadata into ARC to improve performance for block cloning operations and frees of earlier cloned blocks. Make -t parameter optional. When omitted, prefetch all supported metadata types (both DDT and BRT now). Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com> Closes #17890	2025-11-12 13:07:09 -08:00
Toomas Soome	82d59f7666	ZTS: autotrim_config.ksh is missing pool type functional/trim tests do create pools of different types to test trim, autotrim_config.ksh is missing the type from zpool create command line while we are looping over different pool types. Sponsored-by: Edgecast Cloud LLC. Signed-off-by: Toomas Soome <tsoome@me.com> Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #17874	2025-11-12 13:04:47 -08:00
Rob Norris	f43839e7fd	ZTS: fail test run if test runner crashes unexpectedly zfs-tests.sh executes test-runner.py to do the actual test work. Any exit code < 4 is interpreted as success, with the actual value describing the outcome of the tests inside. If a Python program crashes in some way (eg an uncaught exception), the process exit code is 1. Taken together, this means that test-runner.py can crash during setup, but return a "success" error code to zfs-tests.sh, which will report and exit 0. This in turn causes the CI runner to believe the test run completed successfully. This commit addresses this by making zfs-tests.sh interpret an exit code of 255 as a failure in the runner itself. Then, in test-runner.py, the "fail()" function defaults to a 255 return, and the main function gets wrapped in a generic exception handler, which prints it and calls fail(). All together, this should mean that any unexpected failure in the test runner itself will be propagated out of zfs-tests.sh for CI or any other calling program to deal with. Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Rob Norris <rob.norris@klarasystems.com> Closes #17858	2025-11-12 12:57:59 -08:00
Rob Norris	1956417b54	mmap_seek: print error code and text on failure If lseek() returns an unexpected error, it's useful to know the error code to help connect it to the trouble spot inside the module. Since the two seek functions should be basically identical, lift them into a single generic function. Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Reviewed-by: Robert Evans <evansr@google.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Rob Norris <rob.norris@klarasystems.com> Closes #17843	2025-10-21 09:50:43 -07:00
Ameer Hamza	30a3e609a2	zpool_reopen_004_pos: Clear label from offline disk after destroy zpool_reopen_004_pos destroys a pool with an offline disk, leaving its label intact. In TrueNAS local repo, zpool_reopen_005_pos is skipped, causing zpool_reopen_007_pos to fail as it doesn't use -f flag when creating pools unlike zpool_reopen_005_pos. Signed-off-by: Ameer Hamza <ahamza@ixsystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com> Closes #17831	2025-10-21 09:50:43 -07:00
Tony Hutter	e09c86cb1f	zvol: verify IO type is supported ZVOLs don't support all block layer IO request types. Add a check for the IO types we do support. Also, remove references to io_is_secure_erase() since they are not supported on ZVOLs. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com> Signed-off-by: Tony Hutter <hutter2@llnl.gov> Closes #17803	2025-10-21 09:50:43 -07:00
Paul Dagnelie	073b34b3ee	Fix display of default xattr to show 'sa' When the default value of the xattr property was changed from 'dir' to 'sa', the code that displays the property's value was not affected. The problem with this state of affairs is that 1) user tooling that specifically looked for 'sa' before will be confused now that the code displays 'on' instead. And 2) users may be confused when manually running the commands about which specific type of xattr is in use unless they are up to date on the latest zfs changes. The fix here is to show the actual type always, rather than 'on' if we happen to be using the default. This turns out to be easy to do, by simply reordering the list of xattr values in the properties code. When the property is displayed, we iterate down the table until we find a row with a matching value, and use that row's name as the display. Reordering the row fixes the display without affecting any other code. Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Rob Norris <robn@despairlabs.com> Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: Paul Dagnelie <paul.dagnelie@klarasystems.com> Closes #17801	2025-10-21 09:50:43 -07:00
Rob Norris	35ec4b14ab	zpool iostat: refresh pool list every interval When running zpool iostat in interval mode, it would not notice any new pools created or imported, and would forget any destroyed or exported, so would not notice if they came back. This leads to outputting "no pools available" every interval until killed. It looks like this was at least intended to work; the comment above zpool_do_iostat() indicates that it is expected to "deal with pool creation/destruction" and that pool_list_update() would detect new pools. That call however was removed in `3e43edd2c5`, though it's unclear if that broke this behaviour and it wasn't noticed, or if it never worked, or if something later broke it. That said, the lack of pool_list_update() is only part of the reason it doesn't work properly. The fundamental problem is that the various things involved in refreshing or updating the list of pools would aggressively ignore, remove, skip or fail on pools that stop existing, or that already exist. Mostly this meant that once a pool is removed from the list, it will never be seen again. Restoring pool_list_update() to the zpool_do_iostat() loop only partially fixes this - it would find "new" pools again, but only in the "all pools" (no args) mode, and because its iterator callback add_pool() would abort the iterator if it already has a pool listed, it would only add pools if there weren't any already. So, this commit reworks the structure somewhat. pool_list_update() becomes pool_list_refresh(), and will ensure the state of all pools in the list is updated. In the "all pools" mode, it will also add new pools and remove pools that disappear, but when a fixed list of pools is used, the list doesn't change, only the state of the pools within it. The rest of the commit is adjusting things for this much simpler structure. Regardless of the mode in use, pool_list_refresh() will always do the right thing, so the driver code can just get on with the display. 
Now that pools can appear and disappear, I've made it so the header (if enabled) is re-printed when the list changes, so that it's easier to see what's happening if the column widths change. Since this is all rather complicated, I've included tests for the "all pools" and "set of pools" modes. Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Igor Kozhukhov <igor@dilos.org> Signed-off-by: Rob Norris <rob.norris@klarasystems.com> Closes #17786	2025-09-29 16:50:49 -07:00
Tony Hutter	abda34b1c0	CI: Add ZTS -O option, log Setup Testing Machines step Add a -O option to zfs-tests.sh to dump debug information on test timeout. The debug info includes: - 30 lines from 'top' - /proc/<PID>/stack output of process with highest CPU usage - Last lines strace-ing process with highest CPU usage - /proc/sysrq-trigger kernel stack traces All debug information gets dumped to /dev/kmsg (Linux only). In addition, print out the VM console lines from the "Setup Testing Machines" step. We have often seen VMs time out at this step without knowing why. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tony Hutter <hutter2@llnl.gov> Closes #17753	2025-09-29 16:50:46 -07:00
Tony Hutter	9079f986ae	zvol: Fix blk-mq sync The zvol blk-mq codepaths would erroneously send FLUSH and TRIM commands down the read codepath, rather than write. This fixes the issue, and updates the zvol_misc_fua test to verify that sync writes are actually happening. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com> Reviewed-by: Ameer Hamza <ahamza@ixsystems.com> Signed-off-by: Tony Hutter <hutter2@llnl.gov> Closes #17761 Closes #17765	2025-09-29 16:50:43 -07:00
patrickxia	e1a6ec42d4	zdb: add ZFS_KEYFORMAT_RAW support for -K option This change adds support for ZFS_KEYFORMAT_RAW to zdb_derive_key in zdb.c. The implementation reads the raw key from the file specified by the -K option which is consistent with how raw keys are handled in the other parts of ZFS, along with a check to ensure that the keyfile doesn't have too many bytes. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Patrick Xia <patrickx@google.com> Closes #17783	2025-09-25 12:08:20 -07:00
Brian Behlendorf	d33d0cac5a	Fix 'zpool add' safety check corner cases Three cases were discovered where 'zpool add' would fail to warn when adding vdevs to a pool with a mismatched replication level. These are: 1. When a pool contains mixed file and disk vdevs. 2. When a pool contains an active dRAID distributed spare. 3. When a pool contains an active hot spare. The lack of warnings is caused by get_replication() assessing the current pool configuration as inconsistent and disabling the mismatched replication check for the new pool configuration after 'zpool add'. This change updates get_replication() to be slightly more tolerant in the non-fatal case. The zpool_add_010_pos.ksh test case was split into separate tests: zpool_add_warn_create.ksh, zpool_add_warn_degraded.ksh, and zpool_add_warn_removal.ksh. These tests were extended to include coverage for dRAID pools and the three scenarios described above. Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #17780	2025-09-25 12:08:09 -07:00
Brian Behlendorf	9bd8f4379c	ZTS: update upgrade_readonly_pool.ksh Modify the test case to use the `zfs mount` command instead of directly calling the mount command, create a dedicated dataset, and use the default mount point. These changes are intended to preserve the intent of the original test case and resolve some spurious mount failures which have been observed by the CI. Reviewed-by: Igor Kozhukhov <igor@dilos.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #17785	2025-09-25 12:08:06 -07:00
buzzingwires	a056b3c341	Add `typeset`s to `zhack label repair` test scripts As a quality assurance measure, `typeset` is added to local variable declarations to actually enforce their intended scope. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: buzzingwires <buzzingwires@outlook.com> Closes #17732	2025-09-17 16:34:04 -07:00
buzzingwires	5f7253ca11	Refactor `zhack label repair` and fix `-c` regression on nonzero TXG This commit fixes a likely regression introduced by 64db435 where the checksum repair functionality (`-c` or default behavior) will perform checks and access data associated with the newer undetach (`-u`) functionality, resulting in a failure when an uberblock's TXG is not 0, as required by `-u` but not `-c`. Additionally, code is refactored for better separation of tasks. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: buzzingwires <buzzingwires@outlook.com> Closes #17732	2025-09-17 16:33:59 -07:00
Brian Behlendorf	a4cb155e8d	ZTS: default to random data in fill_fs Update the fill_fs helper function to request a random fill pattern when the "data" argument isn't specified. This ensures the default behavior is to perform a more realistic fill of incompressible blocks. Additionally, update a few test cases to specify a random fill. Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #17739	2025-09-15 12:43:52 -07:00
Brian Behlendorf	53c8d7071d	ZTS: Fix zfs_send_delegation_user test Correct the path in the common.run file. The zfs_send_delegation_user test is installed under cli_user not cli_root. Reviewed-by: Allan Jude <allan@klarasystems.com> Reviewed-by: Paul Dagnelie <paul.dagnelie@klarasystems.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #17740	2025-09-15 12:43:48 -07:00
Paul Dagnelie	cac483dbd4	Fix time database update calculations The time database update math assumed that the timestamps were in nanoseconds, but at some point in the development or review process they changed to seconds. This PR fixes the math to use seconds instead. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com> Signed-off-by: Paul Dagnelie <paul.dagnelie@klarasystems.com> Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Closes #17735	2025-09-15 12:43:34 -07:00
Brian Behlendorf	c9de42e089	ZTS: refreserv/refreserv_raidz improvements Several small changes intended to make this test reliable. - Leave the default compression enabled for the pool and switch to using /dev/urandom as the data source. Functionally this shouldn't impact the test but it's preferable to test with the pool defaults when possible. - Verify the device is created and removed as required. Switch to a unique volume name for more clarity in the logs. - Use the ZVOL_DEVDIR to specify the device path. - Speed up the test by creating the pool with ashift=12 and testing 4K, 8K, 128K volblocksizes. Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #17725	2025-09-12 15:05:26 -07:00
Paul Dagnelie	2f41193a26	Make new zhack test a little more reliable Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Paul Dagnelie <paul.dagnelie@klarasystems.com> Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Closes #17728	2025-09-12 15:05:14 -07:00
JT Pennington	43a9d9ac57	Add send:encrypted test Create tests for the new send:encrypted permission Sponsored-by: Klara, Inc. Sponsored-by: Karakun AG Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com> Signed-off-by: JT Pennington <jt.pennington@klarasystems.com> Closes #17543	2025-09-12 15:05:05 -07:00
Tony Hutter	4a7a04630d	zed: Add synchronous zedlets Historically, ZED has blindly spawned off zedlets in parallel and never worried about their completion order. This means that you can potentially have zedlets for event number 2 starting before zedlets for event number 1 had finished. Most of the time this is fine, and it actually helps a lot when the system is getting spammed with hundreds of events. However, there are times when you want your zedlets to be executed in sequence with the event ID. That is where synchronous zedlets come in. ZED will wait for all previously spawned zedlets to finish before running a synchronous zedlet. Synchronous zedlets are guaranteed to be the only zedlet running. No other zedlets may run in parallel with a synchronous zedlet. Users should be careful to only use synchronous zedlets when needed, since they decrease parallelism. To make a zedlet synchronous, simply add a "-sync-" immediately following the event name in the zedlet's file name: EVENT_NAME-sync-ZEDLETNAME.sh For example, if you wanted a synchronous statechange script: statechange-sync-myzedlet.sh Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tony Hutter <hutter2@llnl.gov> Closes #17335	2025-09-11 15:58:59 -07:00
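
The naming rule in the "zed: Add synchronous zedlets" message (EVENT_NAME-sync-ZEDLETNAME.sh) can be illustrated with a small sketch. This is not ZED's actual code: the function name parse_zedlet_name is hypothetical, and the sketch assumes that event names contain no hyphen, so the event name is everything before the first "-".

```python
def parse_zedlet_name(filename):
    """Split a zedlet file name into (event, is_sync, zedlet_name).

    Hypothetical helper for illustration only. Assumes the event name
    contains no hyphen, so it ends at the first "-" in the file name.
    """
    # Drop the ".sh" suffix used in the commit message's examples.
    stem = filename[:-3] if filename.endswith(".sh") else filename

    # Everything before the first hyphen is treated as the event name.
    event, sep, rest = stem.partition("-")
    if not sep:
        # No hyphen at all: no separate zedlet name, not synchronous.
        return stem, False, ""

    # A synchronous zedlet has "-sync-" immediately after the event name.
    if rest.startswith("sync-"):
        return event, True, rest[len("sync-"):]
    return event, False, rest


print(parse_zedlet_name("statechange-sync-myzedlet.sh"))
print(parse_zedlet_name("statechange-myzedlet.sh"))
```

Under these assumptions, only the first file name is classified as synchronous. A name such as statechange-mysync-foo.sh is not, because "sync-" does not immediately follow the event name.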


1568 Commits