mirror_zfs

mirror of https://git.proxmox.com/git/mirror_zfs.git synced 2026-05-26 12:12:13 +03:00

Author	SHA1	Message	Date
Ryan Moeller	ac0fd40c8c	Add zpool properties for allocation class space The existing zpool properties accounting pool space (size, allocated, fragmentation, expandsize, free, capacity) are based on the normal metaslab class or are cumulative properties of several classes combined. Add properties reporting the space accounting metrics for each metaslab class individually. Also introduce pool-wide AVAIL, USABLE, and USED properties reporting values corresponding to FREE, SIZE, and ALLOC deflated for raidz. Update ZTS to recognize the new properties and validate reported values. While in zpool_get_parsable.cfg, add "fragmentation" to the list of parsable properties. Sponsored-by: Klara, Inc. Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ameer Hamza <ahamza@ixsystems.com> Signed-off-by: Ryan Moeller <ryan.moeller@klarasystems.com> Closes #18238	2026-03-02 15:50:23 -08:00
Ryan Moeller	6ba3f915d0	zcommon: Fix description of vdev capacity format Capacity is reported as a percentage not a size. Sponsored-by: Klara, Inc. Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ameer Hamza <ahamza@ixsystems.com> Signed-off-by: Ryan Moeller <ryan.moeller@klarasystems.com> Closes #18238	2026-03-02 15:49:23 -08:00
Akash B	f8e5af53e9	Fix redundant declaration of dsl_pool_t Remove redundant dsl_pool variable and duplicate spa_get_dsl() call in vdev_rebuild_thread. Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Akash B <akash-b@hpe.com> Closes #18263	2026-02-27 10:39:52 -08:00
Andriy Tkachuk	f8457fbdc4	Fix deadlock on dmu_tx_assign() from vdev_rebuild() vdev_rebuild() is always called with spa_config_lock held in RW_WRITER mode. However, when it tries to call dmu_tx_assign() the latter may hang on dmu_tx_wait() waiting for available txg. But that available txg may not happen because txg_sync takes spa_config_lock in order to process the current txg. So we have a deadlock case here: - dmu_tx_assign() waits for txg holding spa_config_lock; - txg_sync waits for spa_config_lock not progressing with txg. Here are the stacks: __schedule+0x24e/0x590 schedule+0x69/0x110 cv_wait_common+0xf8/0x130 [spl] __cv_wait+0x15/0x20 [spl] dmu_tx_wait+0x8e/0x1e0 [zfs] dmu_tx_assign+0x49/0x80 [zfs] vdev_rebuild_initiate+0x39/0xc0 [zfs] vdev_rebuild+0x84/0x90 [zfs] spa_vdev_attach+0x305/0x680 [zfs] zfs_ioc_vdev_attach+0xc7/0xe0 [zfs] cv_wait_common+0xf8/0x130 [spl] __cv_wait+0x15/0x20 [spl] spa_config_enter+0xf9/0x120 [zfs] spa_sync+0x6d/0x5b0 [zfs] txg_sync_thread+0x266/0x2f0 [zfs] The solution is to pass txg returned by spa_vdev_enter(spa) at the top of spa_vdev_attach() to vdev_rebuild() and call dmu_tx_create_assigned(txg) which doesn't wait for txg. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Akash B <akash-b@hpe.com> Reviewed-by: Alek Pinchuk <apinchuk@axcient.com> Signed-off-by: Andriy Tkachuk <andriy.tkachuk@seagate.com> Closes #18210 Closes #18258	2026-02-26 11:18:02 -08:00
Rob Norris	f3d4c79496	zpl_super: prefer "new" mount API when available This API has been available since kernel 5.2, and having it available (almost) everywhere should give us a lot more flexibility for mount management in the future. Sponsored-by: TrueNAS Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Rob Norris <rob.norris@truenas.com> Closes #18260	2026-02-25 13:17:33 -08:00
Rob Norris	09c27a14a3	icp: add SHA512 implementation using Intel SHA512 extensions Generated from crypto/sha/asm/sha512-x86_64.pl in openssl/openssl@241d4826f8. Sponsored-by: TrueNAS Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Attila Fülöp <attila@fueloep.org> Signed-off-by: Rob Norris <rob.norris@truenas.com> Closes #18233	2026-02-25 12:48:30 -08:00
Rob Norris	3547a358fd	simd: detect and surface support for Intel SHA512 extensions Recent Intel CPUs (starting with Arrow Lake and Lunar Lake) include new vectorised SHA512 instructions. Detect them and make them available to the rest of the system. Note the internal name "sha512ext". This is to disambiguate from other uses of "sha512". Sponsored-by: TrueNAS Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Attila Fülöp <attila@fueloep.org> Signed-off-by: Rob Norris <rob.norris@truenas.com> Closes #18233	2026-02-25 12:47:48 -08:00
clefru	6495dafd58	range_tree: use zfs_panic_recover() for partial-overlap remove zfs_range_tree_remove_impl() used a bare panic() when a segment to be removed was not completely overlapped by an existing tree entry. Every other consistency check in range_tree.c uses zfs_panic_recover(), which respects the zfs_recover tunable and allows pools with on-disk corruption to be imported and recovered. This one call was inconsistent, making the partial-overlap case unrecoverable regardless of zfs_recover. Replace panic() with zfs_panic_recover() so that operators can set zfs_recover=1 to import a corrupted pool and reclaim data, consistent with all other range tree error paths. Related-to: https://github.com/openzfs/zfs/issues/13483 Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Clemens Fruhwirth <clemens@endorphin.org> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> Closes #18255	2026-02-25 11:26:10 -08:00
Tony Hutter	4da3f059a3	CI: Remove deprecated Fedora 41 Fedora 41 was deprecated on Dec 15 2025. Remove it from CI tests. Reviewed-by: Rob Norris <robn@despairlabs.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: Tony Hutter <hutter2@llnl.gov> Closes #18261	2026-02-25 11:20:23 -08:00
Alexander Motin	991fc56fae	Introduce dedupused/dedupsaved pool properties Currently there is only a dedup ratio reported via pool properties. If dedup is enabled only for some datasets, it is impossible to say how much space the ratio actually covers. Fix this by introducing dedupused/dedupsaved pool properties, similar to earlier added block cloning ones. Combined with work to expose allocation classes stats, it should give user-space enough visibility to correlate `zpool list` and `zfs list` space numbers. Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Ryan Moeller <ryan.moeller@klarasystems.com> Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com> Closes #18245	2026-02-25 09:41:38 -05:00
Mateusz Piotrowski	3408332d71	zhack: Fix importing large allocation profiles on small pools (#18256) This patch fixes a segmentation fault in zhack metaslab leak which might be triggered by feeding zhack with a fragmentation profile that's exported from a pool larger than the target pool. Fixes: `8f15d2e4d5` Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Reviewed-by: Paul Dagnelie <paul.dagnelie@klarasystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Mateusz Piotrowski <mateusz.piotrowski@klarasystems.com>	2026-02-24 10:24:22 -08:00
Rob Norris	0f608aa6ca	Linux 7.0: add shims for the fs_context-based mount API The traditional mount API has been removed, so detect when it's not available and instead use a small adapter to allow our existing mount functions to keep working. Sponsored-by: TrueNAS Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Rob Norris <rob.norris@truenas.com> Closes #18216	2026-02-23 09:45:12 -08:00
Rob Norris	d34fd6cff3	Linux 7.0: posix_acl_to_xattr() now allocates memory Kernel devs noted that almost all callers to posix_acl_to_xattr() would check the ACL value size and allocate a buffer before making the call. To reduce the repetition, they've changed it to allocate this buffer internally and return it. Unfortunately that's not true for us; most of our calls are from xattr_handler->get() to convert a stored ACL to an xattr, and that call provides a buffer. For now we have no other option, so this commit detects the new version and wraps to copy the value back into the provided buffer and then free it. Sponsored-by: TrueNAS Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Rob Norris <rob.norris@truenas.com> Closes #18216	2026-02-23 09:44:48 -08:00
Rob Norris	204de946eb	Linux 7.0: blk_queue_nonrot() renamed to blk_queue_rot() It does exactly the same thing, just inverts the return. Detect its presence or absence and call the right one. Sponsored-by: TrueNAS Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Rob Norris <rob.norris@truenas.com> Closes #18216	2026-02-23 09:44:20 -08:00
Attila Fülöp	7744f04962	SIMD: libspl: test the correct CPUID bit for AVX512VL Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Attila Fülöp <attila@fueloep.org> Closes #18254	2026-02-23 09:42:25 -08:00
Christos Longros	6a717f31e6	Improve misleading error messages for ZPOOL_STATUS_CORRUPT_POOL When devices are missing or claimed by another subsystem (e.g. mdadm, LVM), zpool import reports "The pool metadata is corrupted" and suggests destroying the pool. This is misleading because the metadata is not necessarily corrupted -- it may simply be incomplete due to inaccessible devices. Update the status, action, and recovery messages to acknowledge that missing devices can trigger this status, and suggest checking device availability before resorting to pool destruction. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Chris Longros <chris.longros@gmail.com> Closes #18251 Closes #8236	2026-02-23 09:41:24 -08:00
Louis Leseur	bbf0106c6b	build: get objtool from $kernelbuild On systems where `$kernelsrc` is different than `$kernelbuild`, the objtool binary will be located in `$kernelbuild` as it's the result of running `make prepare` during kernel build. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Attila Fülöp <attila@fueloep.org> Signed-off-by: Louis Leseur <louis.leseur@gmail.com> Closes #18248 Closes #18249	2026-02-23 09:39:51 -08:00
MigeljanImeri	4975430cf5	Add vdev property to disable vdev scheduler Added vdev property to disable the vdev scheduler. The intention behind this property is to improve IOPS performance when using o_direct. Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com> Signed-off-by: MigeljanImeri <ImeriMigel@gmail.com> Closes #17358	2026-02-23 09:34:33 -08:00
Tony Hutter	d2f5cb3a50	Move range_tree, btree, highbit64 to common code Break out the range_tree, btree, and highbit64/lowbit64 code from kernel space into shared kernel and userspace code. This is needed for the updated `zpool status -vv` error byte range reporting that will be coming in a future commit. That commit needs the range_tree code in kernel and userspace. Reviewed-by: Rob Norris <robn@despairlabs.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tony Hutter <hutter2@llnl.gov> Closes #18133	2026-02-22 11:43:51 -08:00
Rob Norris	168023b603	Linux 7.0: explicitly set setlease handler to kernel implementation The upcoming 7.0 kernel will no longer fall back to generic_setlease(), instead returning EINVAL if .setlease is NULL. So, we set it explicitly. To ensure that we catch any future kernel change, adds a sanity test for F_SETLEASE and F_GETLEASE too. Since this is a Linux-specific test, also a small adjustment to the test runner to allow OS-specific helper programs. Sponsored-by: TrueNAS Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Rob Norris <rob.norris@truenas.com> Closes #18215	2026-02-22 11:39:06 -08:00
Rob Norris	d11c661544	zdb: handle key load/derive failures a bit more gracefully There's no real need to outright crash if key loading fails; we can just unwind nicely. Sponsored-by: TrueNAS Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com> Signed-off-by: Rob Norris <rob.norris@truenas.com> Closes #18230	2026-02-20 13:37:43 -08:00
Rob Norris	9f874ad092	zdb: don't try to load key for unencrypted dataset Previously using -K/--key on an unencrypted dataset would trip a VERIFY, because the dataset has nowhere to load the key into. Now, just ignore it. This makes zdb much easier to drive when there's a mix of encrypted and non-encrypted datasets, as the key can be provided for all of them (at least, assuming the same encryption root, which is a common enough case). Sponsored-by: TrueNAS Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com> Signed-off-by: Rob Norris <rob.norris@truenas.com> Closes #18230	2026-02-20 13:37:11 -08:00
Rob Norris	b021cb60aa	ZTS: make get_same_blocks() fail harder if zdb fails Because it's called in $(...), it will swallow all errors, so we have to work harder to recognise failure and echo a string that can't ever match what the test is expecting. Sponsored-by: TrueNAS Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com> Signed-off-by: Rob Norris <rob.norris@truenas.com> Closes #18230	2026-02-20 13:36:49 -08:00
Rob Norris	aeb9fb3828	sha2_test: do correctness checks for all implementations Sponsored-by: TrueNAS Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Attila Fülöp <attila@fueloep.org> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com> Signed-off-by: Rob Norris <rob.norris@truenas.com> Closes #18232	2026-02-19 15:16:36 -08:00
Rob Norris	b291d9aa22	get_cpu_freq: handle CPUs with variable frequency If a CPU has variable frequency, then lscpu will list separate "CPU min freq" and "CPU max freq" values. In this case, take the maximum. Sponsored-by: TrueNAS Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Attila Fülöp <attila@fueloep.org> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com> Signed-off-by: Rob Norris <rob.norris@truenas.com> Closes #18232	2026-02-19 15:16:18 -08:00
Alexander Motin	d06a1d9ac3	Fix available space accounting for special/dedup (#18222) Currently, spa_dspace (base to calculate dataset AVAIL) only includes the normal allocation class capacity, but dd_used_bytes tracks space allocated across all classes. Since we don't want to report free space of other classes as available (we can't promise new allocations will be able to use it), report only allocated space, similar to how we report space saved by dedup and block cloning. Since we need deflated space here, make allocation classes track deflated allocated space also. While here, make mc_deferred also deflated, matching its use contexts. Also while there, use atomic_load() to read the allocation class stats. Reviewed-by: Rob Norris <robn@despairlabs.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com> Closes #18190 Closes #18222	2026-02-19 10:36:35 -08:00
Tony Hutter	640a217faf	CI: Test & fix Linux ZFS built-in build ZFS can be built directly into the Linux kernel. Add a test build of this to the CI to verify it works. The test build is only enabled on Fedora runners (since they run the newest kernels) and is done in parallel with ZTS. The test build is done on vm2, since it typically finishes ~15min before vm1 and thus has time to spare. In addition: - Update 'copy-builtin' to check that $1 is a directory - Fix some VERIFYs that were causing the built-in build to fail Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tony Hutter <hutter2@llnl.gov> Closes #18234	2026-02-19 10:15:41 -08:00
Attila Fülöp	c8a72a27e5	ICP: AES-GCM assembly: remove unused Gmul functions In the AES-GCM assembly files we are defining Gmul functions we don't use anywhere. Just remove the dead code. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Attila Fülöp <attila@fueloep.org> Closes #18226	2026-02-19 10:10:02 -08:00
Alexander Motin	370570890f	Remove parent ZIO from dbuf_prefetch() I am not sure why it was added there 10 years ago, but it seems not needed now. According to my tests removing it improves sequential read performance with recordsize=4K by 5-10% by reducing the CPU overhead in prefetcher. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Rob Norris <robn@despairlabs.com> Reviewed-by: Ameer Hamza <ahamza@ixsystems.com> Reviewed-by: Akash B <akash-b@hpe.com> Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com> Closes #18214	2026-02-18 18:12:13 -08:00
Attila Fülöp	d489677280	ICP: AES-GCM VAES-AVX2: fix typos and document source files Require AVX2 compiler support and document source files for `aesni-gcm-avx2-vaes.S`. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Attila Fülöp <attila@fueloep.org> Closes #18225	2026-02-17 16:51:32 -08:00
Jessica Clarke	bfb276e55c	freebsd: Fix TIMESPEC_OVERFLOW for PowerPC Once upon a time, 32-bit PowerPC did indeed have a 32-bit time_t, but FreeBSD 12.0 switched to a 64-bit time_t for PowerPC as an ABI break, which predates the addition of FreeBSD support to OpenZFS. Moreover, 64-bit PowerPC has existed since FreeBSD 9.0, where __powerpc__ is also defined (alongside __powerpc64__ to disambiguate), which has always had a 64-bit time_t. This code has therefore always been wrong for all PowerPC variants. Fix this by limiting the 32-bit case to just i386, which is the only architecture in FreeBSD to have a 32-bit time_t and not have broken ABI, due to its special legacy compatibility status. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com> Signed-off-by: Jessica Clarke <jrtc27@jrtc27.com> Closes #18217 Closes #18218	2026-02-17 16:46:02 -08:00
Attila Fülöp	bee53d8c10	Linux 6.19 compat: in-tree build: fix duplicate GCM assembly functions Linux 6.19 added an AES-GCM VAES-AVX2 assembly implementation. It's basically a translation from the BoringSSL perlasm syntax to macro assembly. We're using the same source but the perlasm generated flat assembly which shares some global function names with the former. When building in-tree this results in the linker failing due to the duplicate symbols. To avoid the error we prepend `icp_` via a macro to our function names. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Moch <mail@alexmoch.com> Signed-off-by: Attila Fülöp <attila@fueloep.org> Closes #18204 Closes #18224	2026-02-17 13:09:41 -08:00
Alexander Motin	0f9564e85b	Simplify dnode_level_is_l2cacheable() We should not dereference through dn_handle->dnh_dnode once we already have a dnode pointer. The result will be the same. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com> Closes #18212	2026-02-16 10:34:22 -05:00
Alexander Motin	ba970eb202	Cleanup allocation class selection - For multilevel gang blocks it seemed possible to fallback from normal to special class, since they don't have proper object type, and DMU_OT_NONE is a "metadata". They should never fallback. - Fix possible inversion with zfs_user_indirect_is_special = 0, when indirects written to normal vdev, while small data to special. Make small indirect blocks also follow special_small_blocks there. - With special_small_blocks now applying to both files and ZVOLs, make it apply to all non-metadata without extra checks, since there are no other non-metadata types. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com> Closes #18208	2026-02-16 10:33:21 -05:00
Mariusz Zaborski	cdf89f413c	Flush RRD only when TXGs contain data This change modifies the behavior of spa_sync_time_logger when flushing the RRD database. Previously, once the sync interval elapsed, a flush would always be generated. On solid-state devices, especially when the pool was otherwise idle, this caused disks to wake up solely to write RRD data. Since RRD is best-effort telemetry, this behavior is unnecessary and wasteful. With this change, spa_sync_time_logger delays flushing until a TXG that already contains data is being synced. The RRD update is appended to that TXG instead of forcing the creation of a new write-only TXG. During pool export, flushing is forced regardless of whether the TXG contains user data. At that stage, data durability takes precedence and a write must be issued. Sponsored by: [Wasabi Technology, Inc.; Klara, Inc.] Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Mariusz Zaborski <mariusz.zaborski@klarasystems.com> Closes #18082 Closes #18138	2026-02-11 11:35:45 -08:00
Marc Sladek	cc184fe98b	Fix `send:raw` permission for send `-w -I` When performing an incremental raw send with intermediates (-w -I), the standard 'send' permission was incorrectly required instead of allowing 'send:raw'. This was due to a strict boolean comparison on the 'rawok' flag in zfs_secpolicy_send() with non-boolean value. This change normalizes the 'rawok' variable to be strictly 0/1 and updates the test suite to properly verify delegated raw send behavior. Introduced-by: https://github.com/openzfs/zfs/pull/17543 Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Marc Sladek <marc@sladek.dev> Closes #18198 Closes #18193	2026-02-11 10:30:26 -08:00
Tony Hutter	3463d40779	ZTS: Fix zed_synchronous_zedlet Wait for scrub_finish (as the comments in the code suggest) rather than trim_finish in zed_synchronous_zedlet.ksh. This seems to work around the ZTS failures in #18192. Also, fix some typos. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tony Hutter <hutter2@llnl.gov> Closes #18192 Closes #18196	2026-02-11 10:05:14 -08:00
Tony Hutter	fdd70565cb	Linux 6.19 compat: META Update the META file to reflect compatibility with the 6.19 kernel. Reviewed-by: Rob Norris <robn@despairlabs.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tony Hutter <hutter2@llnl.gov> Closes #18197	2026-02-11 09:37:02 -08:00
Christos Longros	040ba7a7ca	libzfs: improve error message for zpool create with ENXIO When zpool create fails because a vdev cannot be opened (ENXIO), the error falls through to zpool_standard_error() which reports the generic 'one or more devices is currently unavailable'. This is misleading when the real cause is a block size mismatch or other device open failure. Add an explicit ENXIO case in zpool_create()'s error handling to provide a more descriptive message. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Christos Longros <chris.longros@gmail.com> Closes #18184 Closes #11087	2026-02-10 13:19:44 -08:00
Tony Hutter	e601a1fb77	CI: Test build Lustre against ZFS The Lustre filesystem calls a number of exported ZFS functions. Do a test build on the Almalinux runners to make sure we're not breaking Lustre. We do the Lustre build in parallel with the normal ZTS test for efficiency, since ZTS isn't very CPU intensive. The full Lustre build takes around 15min when run on its own. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tony Hutter <hutter2@llnl.gov> Closes #18161	2026-02-10 09:54:17 -08:00
Alexander Motin	aa29455dd7	Restrict cloning with different properties While technically it's not a problem to clone between datasets with different properties, it might create an expectation of new properties being applied during data move, while actually it won't happen. For copies and checksum it may mean incorrect safety expectations. For dedup, compression and special_small_blocks -- performance and space usage. New zfs_bclone_strict_properties tunable controls it. Reviewed-by: Rob Norris <robn@despairlabs.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com> Closes #18180	2026-02-10 09:53:24 -08:00
rmacklem	1412bdc6c2	zfs_vnops_os.c: Move a vput() to after zfs_setattr_dir() Without this patch, the following crash can occur when a file system is configured with "xattr=dir". VNASSERT failed: locked not true at /posix-acl/freebsd-rdma/sys/kern/vfs_subr.c:5786 (assert_vop_locked) hold count flags () flags () lock type zfs: UNLOCKED panic: zfs_dirent_lookup: vnode is not locked but should be cpuid = 3 time = 1770520763 KDB: stack backtrace: db_trace_self_wrapper() at db_trace_self_wrapper+0x2b vpanic() at vpanic+0x136/frame 0xfffffe00914c8270 panic() at panic+0x43/frame 0xfffffe00914c82d0 assert_vop_locked() at assert_vop_locked+0x78 zfs_dirent_lookup() at zfs_dirent_lookup+0x41 zfs_setattr_dir() at zfs_setattr_dir+0x123 zfs_setattr() at zfs_setattr+0x1389 zfs_freebsd_setattr() at zfs_freebsd_setattr+0x56b VOP_SETATTR_APV() at VOP_SETATTR_APV+0x5d setfown() at setfown+0xb1 kern_fchownat() at kern_fchownat+0x192 This patch fixes the problem by moving the vput() call for attrzp to after the zfs_setattr_dir() call that takes it as an argument. Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Rick Macklem <rmacklem@uoguelph.ca> Closes: #18188	2026-02-10 09:29:37 -05:00
Tim Hatch	64bae56b00	Include missing newline in 'man' error Because the `strerror` result doesn't include a newline, we need to add one. Observed on a minimal system that doesn't have `man` installed, which behaves like this before the fix: ``` [root@upper tim]# zpool help import couldn't run man program: No such file or directory[root@upper tim]# ``` Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tim Hatch <tim@timhatch.com> Closes #18183	2026-02-09 10:19:08 -08:00
Alexander Motin	2646bd5585	Allow rewrite skip cloned and snapshotted blocks Rewrite of cloned and snapshotted blocks can allocate additional space, which may be undesired. In some cases it may make sense to still rewrite snapshotted blocks, expecting the snapshots to rotate with time, freeing space. In other cases rewrite of cloned blocks may be acceptable, despite persistent space usage increase. For this reason add them as separate flags to `zfs rewrite`. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Rob Norris <robn@despairlabs.com> Reviewed-by: Ameer Hamza <ahamza@ixsystems.com> Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com> Closes #18179	2026-02-09 10:17:56 -08:00
Rob Norris	15fbf534c6	AUTHORS: add names of recent new contributors "Welcome to my house! Enter freely. Go safely, and leave something of the happiness you bring!" Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Rob Norris <robn@despairlabs.com> Closes #18189	2026-02-09 10:11:09 -08:00
Brian Behlendorf	ae488e496f	ZTS: update the relevant mmp test cases - mmp_concurrent_import: added test case to verify concurrent import correctness. The pool may only be imported once. - mmp_exported_import: an activity check is now required for pools which were cleanly exported if the system and pool hostids don't match. - mmp_inactive_import: an activity check is now required for any pool which wasn't cleanly exported, even if the system and pool hostids match. - mmp_on_uberblocks: updated expected uberblocks to take into account the value MMP_INTERVAL_DEFAULT is set to. - mmp_reset_interval: reduce the number of iterations from 10 to 3. This is sufficient to verify functionality and significantly speeds up the test. - mmp_on_uberblocks: adjust the thresholds and increase the runtime to avoid false positives observed in CI. - Update tests to use 'zhack action idle' instead of ztest to improve the reliability of the tests. - Add additional log_note messages to test cases which have multiple verification steps to make it clear which portion of a test failed when reviewing the logs. - Replace default_setup/cleanup_noexit calls with 'zpool create' and 'zpool destroy' calls to avoid additional unnecessary dataset creation work. - Update activity/noactivity check helper functions to use the ZFS_LOAD_INFO_DEBUG information now available from 'zpool import' to determine if this activity check ran and why. This is more reliable in the CI than measuring the runtime. - Removed all mmp tests from the zts-report.py exceptions list. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Olaf Faaland <faaland1@llnl.gov> Reviewed-by: Akash B <akash-b@hpe.com>	2026-02-09 09:36:18 -08:00
Brian Behlendorf	d4c0e52188	zhack: add "action idle" subcommand In order to reliably test the multihost protection we need two (or more) systems attempting to import the pool at the same time. Historically, we've used ztest running in userspace to simulate an active pool and attempted to import the pool with the kernel modules. This works but ztest is a bit unwieldy for this and if it crashes for unrelated reasons it can result in false positives. All we really need is the pool imported in userspace so the MMP thread is active and writing out uberblocks. We can extend zhack which already knows how to import the pool read/write and add an option to leave the pool open and idle. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Olaf Faaland <faaland1@llnl.gov> Reviewed-by: Akash B <akash-b@hpe.com>	2026-02-09 09:36:14 -08:00
Brian Behlendorf	731ff0a5ac	zhack: add -G option to dump debug buffer Add a -G option to zhack to dump the internal debug buffer on exit. We were able to use the same code from zdb for this which was nice. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Olaf Faaland <faaland1@llnl.gov> Reviewed-by: Akash B <akash-b@hpe.com>	2026-02-09 09:36:10 -08:00
Brian Behlendorf	20176224ee	mmp: claim sequence id before final import As part of SPA_LOAD_IMPORT add an additional activity check to detect simultaneous imports from different hosts. This check is only required when the timing is such that there's no activity for the read-only tryimport check to detect. This extra safety check operates as follows: 1. Repeats the following MMP check 10 times: a. Write out an MMP uberblock with the best txg and a random sequence id to all primary pool vdevs. b. Verify a minimum number of good writes such that even if the pool appears degraded on the remote host it will see at least one of the updated MMP uberblocks. c. Wait for the MMP interval; this leaves a window for other racing hosts to make similar modifications which can be detected. d. Call vdev_uberblock_load() to determine the best uberblock to use; this should be the MMP uberblock just written. e. Verify the txg and random sequence number match the MMP uberblock written in 1a. 2. Restore the original MMP uberblocks. This allows the check to be performed again if the pool fails to import for an unrelated reason. This change also includes some refactoring and minor improvements. - Never try loading earlier txgs during import when the import fails with EREMOTEIO or EINTR. These errors don't indicate the txg is damaged but instead that it's either in use on a remote host or the import was interactively cancelled. No rewind is also performed for EBADD which can result from a stale trusted config when doing a verbatim import. - Refactor the code for consistent logging of the multihost activity check using spa_load_note() and console messages indicating when the activity check was triggered and the result. - Added MMP_*_MASK and MMP_SEQ_CLEAR() macros to allow easier modification of the sequence number in an uberblock. - Added ZFS_LOAD_INFO_DEBUG environment variable which can be set to dump to stdout the spa_load_info nvlist returned during import. This is used by the updated mmp test cases to determine if an activity check was run and its result. - Standardize the mmp messages similarly to make it easier to find all the relevant mmp lines in the debug log. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Olaf Faaland <faaland1@llnl.gov> Reviewed-by: Akash B <akash-b@hpe.com>	2026-02-09 09:36:01 -08:00
Brian Behlendorf	2f048ced4d	mmp: add spa_load_name() for tryimport Tryimport adds a unique prefix to the pool name to avoid name collisions. This makes it awkward to log user-friendly info during a tryimport. Add a spa_load_name() function which can be used to report the unmodified pool name. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Olaf Faaland <faaland1@llnl.gov> Reviewed-by: Akash B <akash-b@hpe.com>	2026-02-09 09:35:03 -08:00
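
The import-time activity check listed in the "mmp: claim sequence id before final import" entry (20176224ee) can be sketched as a toy simulation. This is only an illustration of steps 1a to 1e and step 2 as the commit message words them, not the OpenZFS implementation: the names `write_label`, `claim_sequence`, and `racer`, the `(txg, timestamp, seq)` tuple layout, and the counter standing in for a timestamp are all hypothetical, and the real wait for the MMP interval is replaced by a callback that lets a simulated racing host write.

```python
import itertools
import random

_clock = itertools.count(1)  # hypothetical stand-in for the uberblock timestamp


def write_label(labels, index, txg, seq):
    """Overwrite one simulated vdev label with (txg, timestamp, seq)."""
    labels[index] = (txg, next(_clock), seq)


def claim_sequence(labels, best_txg, rounds=10, min_good_writes=1, racer=None):
    """Return True if no competing writer is seen during `rounds` checks."""
    original = list(labels)  # kept so step 2 can restore the labels
    try:
        for _ in range(rounds):
            seq = random.getrandbits(16)
            # 1a. Write the best txg and a random sequence id to every label.
            for i in range(len(labels)):
                write_label(labels, i, best_txg, seq)
            # 1b. Require a minimum number of good writes.
            good = sum(1 for lbl in labels
                       if lbl[0] == best_txg and lbl[2] == seq)
            if good < min_good_writes:
                return False
            # 1c. The real check waits one MMP interval here; this toy model
            #     only gives a simulated racing host the chance to write.
            if racer is not None:
                racer(labels)
            # 1d. Pick the best label: highest txg, then newest timestamp.
            txg, _, found_seq = max(labels)
            # 1e. The claim holds only if that is still the label from 1a.
            if (txg, found_seq) != (best_txg, seq):
                return False
        return True
    finally:
        # 2. Restore the original labels so the check can be repeated.
        labels[:] = original
```

With no racing writer, `claim_sequence([(99, 0, 7), (99, 0, 7)], 100)` returns `True` and leaves the list unchanged. Passing a `racer` that calls `write_label` on any label makes the call return `False`, because the newest label no longer carries the sequence id written in step 1a.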


10547 Commits