mirror_zfs

mirror of https://git.proxmox.com/git/mirror_zfs.git synced 2026-05-31 11:14:09 +03:00

Author	SHA1	Message	Date
Tony Hutter	b44a3ecf4a	zpool: Change zpool offline spares policy The zpool offline man page says that you cannot use 'zpool offline' on spares. However, testing found that you could in fact force fault (zpool offline -f) spares. Change the policy to: 1. You can never force-fault or offline dRAID spares. 2. You can only force-fault or offline traditional spares if they're active. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ameer Hamza <ahamza@ixsystems.com> Reviewed-by: Akash B <akash-b@hpe.com> Signed-off-by: Tony Hutter <hutter2@llnl.gov> Closes #18282	2026-03-25 11:08:55 -07:00
Aditya Gollamudi	b481a8bbbf	Make zpool status dedup table support raw bytes -p output Check if -p flag is enabled, and if so print dedup table with raw bytes. Restructure the logic in zutil_pool to check if -p flag is enabled before printing either the bytes or raw numbers. Calls to print the data for DDT now all use zfs_nicenum_format(). Increased DDT histogram column buffers to 32 bytes to prevent truncation when -p is enabled. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com> Signed-off-by: Adi Gollamudi <adigollamudi@gmail.com> Closes #11626 Closes #17926	2026-03-13 09:53:56 -07:00
Sean Eric Fagan	06b0abfe62	Fix the send --exclude option to work with encryption When using --exclude, filtering needs to take place in two places: in zfs_main.c via the callback previously added to support the options, and in libzfs_sendrecv.c because it generates the nvlist during a first pass, and that results in it complaining if the excluded dataset is not available for sending. (eg, excluding an encrypted dataset so you don't have to use --raw wouldn't work, because the first pass would look at the dataset and decide you couldn't use it.) Add send --exclude tests, including one that tests excluding an encrypted hierarchy. Reviewed-by: Allan Jude <allan@klarasystems.com> Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Sean Eric Fagan <sef@kithrup.ie> Closes #18278	2026-03-12 15:10:28 -07:00
siv0	7f65e04abd	libzfs: scrub: only include start and end nv pairs if needed for scrub This patch addresses running `zpool scrub <pool>` with ZFS 2.4 userspace while the loaded kernel module is still 2.3, failing with: ``` cannot scrub <pool>: the loaded zfs module does not support an option for this operation. A reboot may be required to enable this option. ``` Checking for the source of the message via `strace` showed the scrub ioctl failing and setting errno to ZFS_ERR_IOC_ARG_UNAVAIL[0]. With that and the comments in `module/zfs/zfs_ioctl.c`[1] commit: `894edd084` seemed like a likely cause for the backward incompatibility. The corresponding kernelspace code in `module/zfs/zfs_ioctl.c` defaults to a setting of 0 if either parameter is not set, so not providing the nvpairs in case both are 0 should not make a semantic difference. Tested by: * loading zfs.ko in version 2.3.6 * running `zpool scrub testpool` with zpool from master (error occurs) * running `zpool scrub testpool` with this patch applied (scrub is started) This should help users who are still stuck on an older kernel module, while their distribution ships newer ZFS userspace. This was observed in the Proxmox community forum: https://forum.proxmox.com/threads/.180467/ [0] https://github.com/openzfs/zfs/blob/d35951b18d6e12afeb0d5b0539ff2467ab4bfbcf/include/sys/fs/zfs.h#L1762 [1] https://github.com/openzfs/zfs/blob/d35951b18d6e12afeb0d5b0539ff2467ab4bfbcf/module/zfs/zfs_ioctl.c#L7799 Fixes: `894edd084` ("Add TXG timestamp database") Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com> Co-authored-by: Stoiko Ivanov <s.ivanov@proxmox.com> Closes #18314	2026-03-12 15:06:23 -07:00
Christos Longros	d35951b18d	zpool clear: remove undocumented rewind flags Remove the -F, -n, and -X flags from zpool clear. These flags were inherited from OpenSolaris but are not applicable in this context. Unlike zpool import, where the pool is not yet loaded and a specific TXG can be selected, zpool clear operates on an already imported pool whose in-memory state is ahead of what is on disk. Rewinding transactions would require force-exporting the pool first. The rewind policy passed to zpool_clear() is now always ZPOOL_NO_REWIND. Tested on FreeBSD 16.0-CURRENT (amd64). Verified that -F, -n, and -X are properly rejected as invalid options and that the usage output reflects the change. Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Christos Longros <chris.longros@gmail.com> Closes #13825 Closes #18300	2026-03-11 15:15:45 -07:00
Rob Norris	62fa8bcb3c	abi: updates for mnttab cleanup Sponsored-by: TrueNAS Reviewed-by: Ameer Hamza <ahamza@ixsystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Rob Norris <rob.norris@truenas.com> Closes #18296	2026-03-10 13:07:07 -07:00
Rob Norris	a59e712d25	libspl/mnttab: remove struct extmnttab The two additional fields are never used by calling code, and we can replace their sole internal use with an extra stack param. Sponsored-by: TrueNAS Reviewed-by: Ameer Hamza <ahamza@ixsystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Rob Norris <rob.norris@truenas.com> Closes #18296	2026-03-10 13:07:07 -07:00
Rob Norris	c0ea89db9f	libzfs/mnttab: shorten names, reorg a bit We can't change the public interface, but internally we don't need so much redundant naming. Sponsored-by: TrueNAS Reviewed-by: Ameer Hamza <ahamza@ixsystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Rob Norris <rob.norris@truenas.com> Closes #18296	2026-03-10 13:07:07 -07:00
Rob Norris	f43cb1fef6	libzfs/mnttab: lift node alloc/free Sponsored-by: TrueNAS Reviewed-by: Ameer Hamza <ahamza@ixsystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Rob Norris <rob.norris@truenas.com> Closes #18296	2026-03-10 13:07:07 -07:00
Rob Norris	0ecf5e3f62	libzfs/mnttab: always enable the cache There's no real reason not to enable it always; the `zfs` command always enables it anyway, and right now there's multiple places that do mount work that don't go through the cache anyway. Having it always be on lets us remove a bunch of the fallback code. Sponsored-by: TrueNAS Reviewed-by: Ameer Hamza <ahamza@ixsystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Rob Norris <rob.norris@truenas.com> Closes #18296	2026-03-10 13:07:07 -07:00
Rob Norris	b5637fba1c	libzfs/mnttab: use SPL mutexes More consistent, less typing, and we can check ownership. Sponsored-by: TrueNAS Reviewed-by: Ameer Hamza <ahamza@ixsystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Rob Norris <rob.norris@truenas.com> Closes #18296	2026-03-10 13:07:07 -07:00
Rob Norris	02224bca40	libzfs/mnttab: lift mnttab cache into separate file Sponsored-by: TrueNAS Reviewed-by: Ameer Hamza <ahamza@ixsystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Rob Norris <rob.norris@truenas.com> Closes #18296	2026-03-10 13:07:07 -07:00
Alek P	ae7fcd5f92	fix libzfs diff mem leak in an error path Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com> Signed-off-by: Alek Pinchuk <apinchuk@axcient.com> Closes #18301	2026-03-10 12:39:49 -07:00
Ivan Shapovalov	8531621aba	zfs_main: create, clone, rename: accept `-pp` for non-mountable parents Teach `zfs {create,clone,rename}` to accept a doubled `-p` flag (`-pp`) to create non-existing ancestor datasets with `canmount=off`. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Rob Norris <robn@despairlabs.com> Signed-off-by: Ivan Shapovalov <intelfx@intelfx.name> Closes #17000	2026-03-09 14:50:18 -07:00
Ivan Shapovalov	2f3f1ab1ba	libzfs: teach zfs_create_ancestors() to accept properties This will be used to support creating non-mountable ancestors in zfs(8). Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Rob Norris <robn@despairlabs.com> Signed-off-by: Ivan Shapovalov <intelfx@intelfx.name> Closes #17000	2026-03-09 14:49:52 -07:00
Ameer Hamza	1eace59060	libzfs: use mount_setattr for selective remount including legacy mounts When a namespace property is changed via zfs set, libzfs remounts the filesystem to propagate the new VFS mount flags. The current approach uses mount(2) with MS_REMOUNT, which reads all namespace properties from ZFS and applies them together. This has two problems: 1. Linux VFS resets unspecified per-mount flags on remount. If an administrator sets a temporary flag (e.g. mount -o remount,noatime), a subsequent zfs set on any namespace property clobbers it. 2. Two concurrent zfs set operations on different namespace properties can overwrite each other's mount flags. Additionally, legacy datasets (mountpoint=legacy) were never remounted on namespace property changes since zfs_is_mountable() returns false for them. Add zfs_mount_setattr() which uses mount_setattr(2) to selectively update only the mount flags that correspond to the changed property. For legacy datasets, /proc/mounts is iterated to update all mountpoints. On kernels without mount_setattr (ENOSYS), non-legacy datasets fall back to a full remount; legacy mounts are skipped to avoid clobbering temporary flags. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com> Signed-off-by: Ameer Hamza <ahamza@ixsystems.com> Closes #18257	2026-03-09 11:06:22 -07:00
Christos Longros	304de7f19b	libzfs: handle EDOM error in zpool_create When creating a pool with devices that have incompatible block sizes, the kernel returns EDOM. However, zpool_create() did not handle this errno, falling through to zpool_standard_error() which produced a confusing message about invalid property values. Add a case EDOM handler in zpool_create() to return EZFS_BADDEV with a descriptive auxiliary message, consistent with the existing EDOM handler in zpool_vdev_add(). Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Christos Longros <chris.longros@gmail.com> Closes #18268	2026-03-08 12:59:10 -07:00
Andrew Walker	c5905b2cb7	Implement lzc_send_progress This commit adds an implementation of lzc_send_progress, which existed in the libzfs_core header, but not in ABI and lacked an actual implementation. The libzfs_send_progress function is altered so that it wraps around the lzc operation. This fills a functional gap in libzfs core. Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ameer Hamza <ahamza@ixsystems.com> Signed-off-by: Andrew Walker <andrew.walker@truenas.com> Closes #18288	2026-03-06 11:05:58 -08:00
Idefix2020	5dad9459d5	Add --no-preserve-encryption flag * Add an option to send datasets with params or replicate without preserving encryption * Add a test case for the new functionality Reviewed-by: Paul Dagnelie <paul.dagnelie@klarasystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com> Signed-off-by: Chris Jacobs <idefix2020dev@gmail.com> Closes #18240	2026-03-05 15:08:17 -08:00
Ryan Moeller	ac0fd40c8c	Add zpool properties for allocation class space The existing zpool properties accounting pool space (size, allocated, fragmentation, expandsize, free, capacity) are based on the normal metaslab class or are cumulative properties of several classes combined. Add properties reporting the space accounting metrics for each metaslab class individually. Also introduce pool-wide AVAIL, USABLE, and USED properties reporting values corresponding to FREE, SIZE, and ALLOC deflated for raidz. Update ZTS to recognize the new properties and validate reported values. While in zpool_get_parsable.cfg, add "fragmentation" to the list of parsable properties. Sponsored-by: Klara, Inc. Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ameer Hamza <ahamza@ixsystems.com> Signed-off-by: Ryan Moeller <ryan.moeller@klarasystems.com> Closes #18238	2026-03-02 15:50:23 -08:00
Alexander Motin	991fc56fae	Introduce dedupused/dedupsaved pool properties Currently there is only a dedup ratio reported via pool properties. If dedup is enabled only for some datasets, it is impossible to say how much space the ratio actually covers. Fix this by introducing dedupused/dedupsaved pool properties, similar to earlier added block cloning ones. Combined with work to expose allocation classes stats, it should give user-space enough visibility to correlate `zpool list` and `zfs list` space numbers. Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Ryan Moeller <ryan.moeller@klarasystems.com> Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com> Closes #18245	2026-02-25 09:41:38 -05:00
Christos Longros	6a717f31e6	Improve misleading error messages for ZPOOL_STATUS_CORRUPT_POOL When devices are missing or claimed by another subsystem (e.g. mdadm, LVM), zpool import reports "The pool metadata is corrupted" and suggests destroying the pool. This is misleading because the metadata is not necessarily corrupted -- it may simply be incomplete due to inaccessible devices. Update the status, action, and recovery messages to acknowledge that missing devices can trigger this status, and suggest checking device availability before resorting to pool destruction. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Chris Longros <chris.longros@gmail.com> Closes #18251 Closes #8236	2026-02-23 09:41:24 -08:00
MigeljanImeri	4975430cf5	Add vdev property to disable vdev scheduler Added vdev property to disable the vdev scheduler. The intention behind this property is to improve IOPS performance when using o_direct. Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com> Signed-off-by: MigeljanImeri <ImeriMigel@gmail.com> Closes #17358	2026-02-23 09:34:33 -08:00
Tony Hutter	d2f5cb3a50	Move range_tree, btree, highbit64 to common code Break out the range_tree, btree, and highbit64/lowbit64 code from kernel space into shared kernel and userspace code. This is needed for the updated `zpool status -vv` error byte range reporting that will be coming in a future commit. That commit needs the range_tree code in kernel and userspace. Reviewed-by: Rob Norris <robn@despairlabs.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tony Hutter <hutter2@llnl.gov> Closes #18133	2026-02-22 11:43:51 -08:00
Christos Longros	040ba7a7ca	libzfs: improve error message for zpool create with ENXIO When zpool create fails because a vdev cannot be opened (ENXIO), the error falls through to zpool_standard_error() which reports the generic 'one or more devices is currently unavailable'. This is misleading when the real cause is a block size mismatch or other device open failure. Add an explicit ENXIO case in zpool_create()'s error handling to provide a more descriptive message. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Christos Longros <chris.longros@gmail.com> Closes #18184 Closes #11087	2026-02-10 13:19:44 -08:00
Brian Behlendorf	20176224ee	mmp: claim sequence id before final import As part of SPA_LOAD_IMPORT add an additional activity check to detect simultaneous imports from different hosts. This check is only required when the timing is such that there's no activity for the read-only tryimport check to detect. This extra safety check operates as follows: 1. Repeats the following MMP check 10 times: a. Write out an MMP uberblock with the best txg and a random sequence id to all primary pool vdevs. b. Verify a minimum number of good writes such that even if the pool appears degraded on the remote host it will see at least one of the updated MMP uberblocks. c. Wait for the MMP interval; this leaves a window for other racing hosts to make similar modifications which can be detected. d. Call vdev_uberblock_load() to determine the best uberblock to use, this should be the MMP uberblock just written. e. Verify the txg and random sequence number match the MMP uberblock written in 1a. 2. Restore the original MMP uberblocks. This allows the check to be performed again if the pool fails to import for an unrelated reason. This change also includes some refactoring and minor improvements. - Never try loading earlier txgs during import when the import fails with EREMOTEIO or EINTER. These errors don't indicate the txg is damaged but instead that it's either in use on a remote host or the import was interactively cancelled. No rewind is also performed for EBADD which can result from a stale trusted config when doing a verbatim import. - Refactor the code for consistent logging of the multihost activity check using spa_load_note() and console messages indicating when the activity check was triggered and the result. - Added MMP_*_MASK and MMP_SEQ_CLEAR() macros to allow easier modification of the sequence number in an uberblock. - Added ZFS_LOAD_INFO_DEBUG environment variable which can be set to dump to stdout the spa_load_info nvlist returned during import. This is used by the updated mmp test cases to determine if an activity check was run and its result. - Standardize the mmp messages similarly to make it easier to find all the relevant mmp lines in the debug log. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Olaf Faaland <faaland1@llnl.gov> Reviewed-by: Akash B <akash-b@hpe.com>	2026-02-09 09:36:01 -08:00
Rob Norris	85391ee931	build: add SPDX license tags to build system files Sponsored-by: https://despairlabs.com/sponsor/ Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Rob Norris <robn@despairlabs.com> Closes #18077	2026-01-08 15:08:03 -08:00
Wolfgang Hoschek	c77f17b750	Add snapshots_changed_nsecs dataset property Add a read-only dataset property, snapshots_changed_nsecs, which exposes the nanosecond resolution version of snapshots_changed. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com> Signed-off-by: Wolfgang Hoschek <wolfgang.hoschek@mac.com> Closes #17998 Closes #18031	2026-01-06 09:36:20 -08:00
Rob Norris	0d44b58d7f	libshare: fold into libzfs and reorg headers a little libzfs is the only user of libshare, and only internally, so there's no particular reason to build it separately, nor to export its symbols. So, pull it into libzfs proper, remove its "public" header, and hide its symbols. The bare minimum "public" API is just to count and enumerate the supported share types. These are moved to libzfs.h with the other share API. Sponsored-by: https://despairlabs.com/sponsor/ Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Rob Norris <robn@despairlabs.com> Closes #18072	2025-12-19 19:52:33 -08:00
Ameer Hamza	88d012a1d6	Fix snapshot automount expiry cancellation deadlock A deadlock occurs when snapshot expiry tasks are cancelled while holding locks. The snapshot expiry task (snapentry_expire) spawns an umount process and waits for it to complete. Concurrently, ARC memory pressure triggers arc_prune which calls zfs_exit_fs(), attempting to cancel the expiry task while holding locks. The umount process spawned by the expiry task blocks trying to acquire locks held by arc_prune, which is blocked waiting for the expiry task to complete. This creates a circular dependency: expiry task waits for umount, umount waits for arc_prune, arc_prune waits for expiry task. Fix by adding non-blocking cancellation support to taskq_cancel_id(). The zfs_exit_fs() path calls zfsctl_snapshot_unmount_delay() to reschedule the unmount, which needs to cancel any existing expiry task. It now uses non-blocking cancellation to avoid waiting while holding locks, breaking the deadlock by returning immediately when the task is already running. The per-entry se_taskqid_lock has been removed, with all taskqid operations now protected by the global zfs_snapshot_lock held as WRITER. Additionally, an se_in_umount flag prevents recursive waits when zfsctl_destroy() is called during unmount. The taskqid is now only cleared by the caller on successful cancellation; running tasks clear their own taskqid upon completion. Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ameer Hamza <ahamza@ixsystems.com> Closes #17941	2025-12-01 14:43:42 -08:00
Rob Norris	71609a9264	zfs: replace tpool with taskq They're basically the same thing; let's just carry one. Sponsored-by: https://despairlabs.com/sponsor/ Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Rob Norris <robn@despairlabs.com> Closes #17948	2025-11-19 08:16:51 -08:00
Rob Norris	adb316f411	libuutil: remove the whole thing Sponsored-by: https://despairlabs.com/sponsor/ Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Rob Norris <robn@despairlabs.com> Closes #17934	2025-11-17 06:23:05 -08:00
Rob Norris	b593748287	zfs: replace uu_avl with sys/avl Let's just use the AVL implementation we use everywhere else. Sponsored-by: https://despairlabs.com/sponsor/ Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Rob Norris <robn@despairlabs.com> Closes #17934	2025-11-17 06:21:26 -08:00
Brian Behlendorf	6015edb374	lib: update ABI meta following libspl changes In theory they should not have resulted in a change. In practice, the way visibility is set up currently means that many of our convenience libraries will "leak through" into the available symbols in our public libraries. In this commit, we're seeing all the new symbols in libspl through libuutil, libzfs and libzfs_core. Importantly, none have been removed, so consumers of these libraries will not notice. Sponsored-by: https://despairlabs.com/sponsor/ Signed-off-by: Rob Norris <robn@despairlabs.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #17861	2025-11-12 10:25:14 -08:00
Brian Behlendorf	913bdbf4d1	libzpool: remove global libzpool includes Only include the zfs headers where they're currently required to compile. Unfortunately, including zfs_ioctl.h in user space pulls in a bunch of internal zfs headers as a side effect. We'll need to move these structures into a new shared header to avoid this. We should not need to add the LIBZPOOL_CPPFLAGS when building the zed, zinject, zpool, libzfs, or libzfs_core. Sponsored-by: https://despairlabs.com/sponsor/ Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Rob Norris <robn@despairlabs.com> Closes #17861	2025-11-12 10:03:15 -08:00
Brian Behlendorf	cb6b249f8c	Update all ABI files Refresh all ABI files using the CI generated files to reflect the library interfaces to be published for the 2.4 release. Reviewed-by: Rob Norris <robn@despairlabs.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #17911	2025-11-12 09:39:00 -08:00
Brian Behlendorf	e4fe41a79f	Bump SONAME of libzfs and libzpool The ABI of libzfs and libzpool have breaking changes since the last major release. Bump the SONAME for the upcoming 2.4 release branch to libzfs7 and libzpool7. Reviewed-by: Rob Norris <robn@despairlabs.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #17911	2025-11-12 09:38:48 -08:00
Adi-Goll	24aaf3a3f9	Reduce timeout to zero when running inside a container Detect container environments and set timeout to zero unless ZFS_MODULE_TIMEOUT is already set. This avoids an unnecessary ten second delay after running zfs/zpool commands in a container where /dev/zfs is unavailable. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Adi Gollamudi <adigollamudi@gmail.com> Closes #15165 Closes #17922	2025-11-11 15:01:37 -08:00
Mariusz Zaborski	02fdd26e51	Add knob to disable slow io notifications Introduce a new vdev property `VDEV_PROP_SLOW_IO_REPORTING` that allows users to disable notifications for slow devices. This prevents ZED and/or ZFSD from degrading the pool due to slow I/O. Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Mariusz Zaborski <oshogbo@FreeBSD.org> Closes #17477	2025-11-11 10:42:17 -08:00
Alexander Motin	b4f073b5a6	Add BRT support to zpool prefetch command Implement BRT (Block Reference Table) prefetch functionality similar to existing DDT prefetch. This allows preloading BRT metadata into ARC to improve performance for block cloning operations and frees of earlier cloned blocks. Make -t parameter optional. When omitted, prefetch all supported metadata types (both DDT and BRT now). Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com> Closes #17890	2025-11-10 16:16:22 -08:00
Toomas Soome	fe553581f0	libzfs: ignoring unreachable code We have an infinite loop and, on a certain condition, we exit this loop and the thread with pthread_exit(). But after this loop we also have code to perform pthread_cleanup_pop() and return from the thread. The problem is that modern compilers are able to recognize that we never actually reach the statements after the loop, and therefore treat them as dead code. I think, instead of pthread_exit(), it is better to break out of the loop and let the last statements work as intended. This is because we do need to keep pthread_cleanup_pop() anyhow. Of course, it is a matter of taste whether we use return or pthread_exit as the very last statement in this function. Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Igor Kozhukhov <igor@dilos.org> Signed-off-by: Toomas Soome <tsoome@me.com> Closes #17900	2025-11-07 09:27:18 -05:00
Toomas Soome	5d33801802	get_key_material_https: label 'kfdok' defined but not used The label 'kfdok' is only used with O_TMPFILE, we need to use the same #ifdef around this label. Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Toomas Soome <tsoome@me.com> Closes #17894	2025-11-04 13:13:07 -08:00
Alexander Motin	f4276479c9	Suppress some ashift warnings Do not warn about vdev ashifts being smaller than physical ashifts in a pool status if the pool ashift property is set and the vdev ashift satisfies it (bigger or equal), since the user explicitly requested this. The ashifts of individual vdevs are still reported. Do not warn about vdev ashifts in zpool import, since it doesn't matter much, and we don't even report individual vdevs' ashifts there. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Allan Jude <allan@klarasystems.com> Reviewed-by: Rob Norris <rob.norris@klarasystems.com> Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com> Closes #17830	2025-10-13 10:41:42 -07:00
Rob Norris	5605a6d79b	pool_iter_refresh: don't refresh pools twice In "all pools" mode, pool_iter_refresh() will call zpool_iter(), which will call zpool_refresh_stats() before calling add_pool(). If we already have the pool, this is a different handle, so we just release it and return. Back in pool_iter_refresh(), we then call zpool_stats_refresh() again for our handle on the same pool. All together, this means we're doing two ZFS_IOC_POOL_STATS calls into the kernel for every pool in the system. This isn't wrong, but it does double the pressure on global locks. Instead, we add a new function zpool_refresh_stats_from_handle() that simply copies the pool config and state from one handle to another, and use it to update our handle before we release it in add_pool(), so we only have one call per pool per interval. Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Rob Norris <rob.norris@klarasystems.com> Closes #17807	2025-10-03 14:39:09 -07:00
Paul Dagnelie	d64711c202	Detect a slow raidz child during reads A single slow responding disk can affect the overall read performance of a raidz group. When a raidz child disk is determined to be a persistent slow outlier, then have it sit out during reads for a period of time. The raidz group can use parity to reconstruct the data that was skipped. Each time a slow disk is placed into a sit out period, its `vdev_stat.vs_slow_ios count` is incremented and a zevent class `ereport.fs.zfs.delay` is posted. The length of the sit out period can be changed using the `raid_read_sit_out_secs` module parameter. Setting it to zero disables slow outlier detection. Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Paul Dagnelie <paul.dagnelie@klarasystems.com> Contributions-by: Don Brady <don.brady@klarasystems.com> Contributions-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #17227	2025-09-10 15:25:03 -07:00
Rob Norris	f7bdd84328	Prefer VERIFY0P(n) over VERIFY(n == NULL) Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com> Signed-off-by: Rob Norris <robn@despairlabs.com> Sponsored-by: https://despairlabs.com/sponsor/ Closes #17591	2025-08-07 11:41:37 -07:00
Rob Norris	c39e076f23	Prefer VERIFY0(n) over VERIFY(n == 0) Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com> Signed-off-by: Rob Norris <robn@despairlabs.com> Sponsored-by: https://despairlabs.com/sponsor/ Closes #17591	2025-08-07 11:40:59 -07:00
Alexander Motin	60f714e6e2	Implement physical rewrites Based on the previous commit, this implements the `zfs rewrite -P` flag, making ZFS keep blocks' logical birth times while rewriting files. It should exclude the rewritten blocks from incremental sends, snapshot diffs, etc. At the same time, snapshot space usage will reflect the additional space used by newly allocated blocks. Since this begins to use the new "rewrite" flag in the block pointers, this commit introduces a new read-compatible per-dataset feature physical_rewrite. It must be enabled for the command to not fail; it is activated on first use and deactivated on deletion of the last affected dataset. Reviewed-by: Rob Norris <robn@despairlabs.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com> Closes #17565	2025-08-06 10:36:56 -07:00
Mariusz Zaborski	894edd084e	Add TXG timestamp database This feature enables tracking of when TXGs are committed to disk, providing an estimated timestamp for each TXG. With this information, it becomes possible to perform scrubs based on specific date ranges, improving the granularity of data management and recovery operations. Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com> Reviewed-by: Paul Dagnelie <paul.dagnelie@klarasystems.com> Signed-off-by: Mariusz Zaborski <mariusz.zaborski@klarasystems.com> Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Closes #16853	2025-08-06 10:31:21 -07:00
Alexander Motin	f70c85086b	BRT: Fix ZAP entry endianness During original block cloning implementation a mistake was made, making BRT ZAP entries an array of 8 1-byte entries instead of 1 entry of 8 bytes. This makes the pools non-endian-safe. This commit introduces a new read-compatible pool feature "com.truenas:block_cloning_endian", fixing the endianness issue for new pools while maintaining compatibility with existing ones. The feature is automatically activated when creating the first BRT ZAP (ensuring we don't activate it on pools that already have BRT entries in the old format). When active, BRT entries are stored as single 8-byte values. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com> Closes #17572	2025-07-30 09:42:47 -07:00
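
Commit 7f65e04abd above describes sending the scrub start and end pairs only when they are needed, so that an older kernel module does not reject arguments it does not know about. The following is a minimal Python sketch of that idea, not the libzfs code itself: the function name and every dictionary key are hypothetical, and a plain dict stands in for the nvlist.

```python
def build_scrub_args(scan_type, scan_command, date_start=0, date_end=0):
    """Hypothetical helper: build the argument mapping for a scrub request.

    A dict stands in for the nvlist; all key names here are illustrative
    assumptions, not the real ioctl argument names.
    """
    args = {"scan_type": scan_type, "scan_command": scan_command}
    # Add the optional range pairs only when at least one of them is set.
    # Per the commit message, the kernel side defaults a missing value to 0,
    # so omitting both when both are 0 should not change the semantics.
    if date_start != 0 or date_end != 0:
        args["scan_date_start"] = date_start
        args["scan_date_end"] = date_end
    return args


if __name__ == "__main__":
    # Plain scrub: no optional keys, so an older receiver sees only known keys.
    print(build_scrub_args(1, 0))
    # Ranged scrub: both optional keys are included.
    print(build_scrub_args(1, 0, date_start=100, date_end=200))
```

Leaving out arguments that carry their default value is what keeps the newer sender compatible with an older receiver in this sketch.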

945 Commits