mirror_zfs

mirror of https://git.proxmox.com/git/mirror_zfs.git synced 2026-01-25 10:12:13 +03:00

Author	SHA1	Message	Date
Brian Behlendorf	82a0868ce4	CI: Remove Debian backports The latest Debian 11 image includes bullseye-backports as a default repository in the /etc/apt/sources.list. However, this repository has gone end of life which effectively breaks the default install. We shouldn't need anything in backports so lets unconditionally remove backports on all Debian builders to resolve the issue. Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #17569	2025-08-12 17:22:45 -07:00
Coleman Kane	e7e0bb3b61	linux: Fix out-of-src builds The linux kernel modules haven't been building successfully when the build occurs in a separate directory than the source code, which is a common build pattern in Linux. Was not able to determine the root cause, but the %.o targets in subdirectories are no longer being matched by the pattern targets in the Linux Kbuild system. This change fixes the issue by dynamically creating the missing ones inside our Kbuild. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Coleman Kane <ckane@colemankane.org> Closes #17517	2025-08-12 17:22:40 -07:00
Paul Dagnelie	6af1f61ad4	Fix zdb pool/ with -k When examining the root dataset with zdb -k, we get into a mismatched state. main() knows we are not examining the whole pool, but it strips off the trailing slash. import_checkpointed_state() then thinks we are examining the whole pool, and does not update the target path appropriately. The fix is to directly inform import_checkpointed_state that we are examining a filesystem, and not the whole pool. Sponsored-by: Klara, Inc. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Rob Norris <rob.norris@klarasystems.com> Signed-off-by: Paul Dagnelie <paul.dagnelie@klarasystems.com> Co-authored-by: Paul Dagnelie <paul.dagnelie@klarasystems.com> Closes #17536	2025-08-12 17:21:47 -07:00
Carl George	8c4f625c12	CI: Add CentOS Stream 9/10 to the FULL_OS runner list Testing on CentOS Stream provides several months advance notice of changes coming to the RHEL kernel. This should help OpenZFS be proactive instead of reactive to new RHEL minor versions. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de> Signed-off-by: Carl George <carlwgeorge@gmail.com> ZFS-CI-Type: full Closes #16904 Closes #17526	2025-08-12 17:20:16 -07:00
Tino Reichardt	7882e85a9b	Delete unused .cirrus.yml The Cirrus_CI was planned for testing FreeBSD, but never really used I think. Currently it's not needed anymore, so remove it. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de> Closes #17155 Closes #17535	2025-08-12 17:19:43 -07:00
Tino Reichardt	6b38d0f7ff	ZTS: Fix FreeBSD 15.0 ksh errors The package ksh93 is replaced by ksh now. This works for FreeBSD 13 and 14 also. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de> Closes #17523	2025-08-12 17:19:32 -07:00
Alexander Motin	80b6457fcd	CI: Switch from FreeBSD 13.4 to 13.5 FreeBSD 13.4 is EOL since June 30, 2025. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Closes #17519	2025-08-12 17:19:07 -07:00
Brian Behlendorf	2518f4b124	Revert "Fix incorrect expected error in ztest" This reverts commit `2076011e0c`. The comment which explains EINVAL should be expected for this case was wrong, not the code. The kernel will return ENOTSUP when attaching a distributed spare to the wrong top-level dRAID vdev. See the check for this in spa_vdev_attach(). Reviewed-by: Paul Dagnelie <paul.dagnelie@klarasystems.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #17503	2025-08-12 17:18:40 -07:00
Igor Ostapenko	90d2c4407a	ztest: Fix false positive of ENOSPC handling Before running a pass zs_enospc_count is checked to free up some space by destroying a random dataset. But the space freed may still be not re-usable during the TXG_DEFER window breaking the next dataset creation in ztest_generic_run(). Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Igor Ostapenko <igor.ostapenko@klarasystems.com> Closes #17506	2025-08-12 17:18:34 -07:00
Brian Behlendorf	f7698f47e8	CI: run ztest on compressed zpool When running ztest under the CI a common failure mode is for the underlying filesystem to run out of available free space. Since the storage associated with a GitHub-hosted running is fixed, we instead create a pool and use a compressed ZFS dataset to store the ztest vdev files. This significantly increases the available capacity since the data written by ztest is highly compressible. A compression ratio of over 40:1 is conservatively achieved using the default lz4 compression. Autotrimming is enabled to ensure freed blocks are discarded from the backing cipool vdev file. Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de> Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #17501	2025-08-12 17:17:49 -07:00
Martin Rüegg	6c1130a730	pyzfs: Adapt python lib directory evaluation from ax_python_devel.m4 `71216b91d2` introduced a regression on debian/ubuntu systems during build. The reason being, that building the RPM for pyzfs was using a different library path than building the library itself. This is now harmonized. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Martin Rüegg <martin.rueegg@metaworx.ch> Closes #16155 Closes #17480	2025-08-12 17:17:24 -07:00
Martin Rüegg	74b539d3dc	pyzfs: Update ax_python_devel.m4 to serial 37 Fixes an obvious typo, where a variable was missing the required leading dollar sign ($) Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Martin Rüegg <martin.rueegg@metaworx.ch> Closes #17480	2025-08-12 17:17:17 -07:00
Chunwei Chen	024e60b927	Missing tests in make pkg ``` Warning: TestGroup '/var/tmp/tests/functional/ctime' not added to this run. Auxiliary script '/var/tmp/tests/functional/ctime/setup' failed verification. ``` Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Chunwei Chen <david.chen@nutanix.com> Closes #17491	2025-08-12 17:17:09 -07:00
Olivier Certner	5289f6f961	spa: ZIO_TASKQ_ISSUE: Use symbolic priority This allows to change the meaning of priority differences in FreeBSD without requiring code changes in ZFS. This upstreams commit fd141584cf89d7d2 from FreeBSD src. Sponsored-by: The FreeBSD Foundation Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Olivier Certner <olce@FreeBSD.org> Closes #17489	2025-08-12 17:16:00 -07:00
Paul Dagnelie	094305c937	Fix TestGroup warning due to missing tags Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Rob Norris <rob.norris@klarasystems.com> Signed-off-by: Paul Dagnelie <paul.dagnelie@klarasystems.com> Co-authored-by: Paul Dagnelie <paul.dagnelie@klarasystems.com> Closes #17473	2025-08-12 15:59:59 -07:00
Tino Reichardt	a826f7a993	ZTS: Use FreeBSD cloudinit images FreeBSD provides CI-IMAGES since some time. These images are based on nuageinit, which does not support fqdn and sudo for example. So we need currently some workarounds to get it working. The FreeBSD images will be more compatible with cloud-init in some near future. Then we can remove the workaround things. These versions are used for testing: - freebsd13-4r (RELEASE) - freebsd14-3s (STABLE) - freebsd15-0c (CURRENT) Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de> Closes #17462	2025-08-12 15:58:46 -07:00
Attila Fülöp	86bf73c1eb	objtool wrapper: use absolute path to call the wrapper Older kernel versions run make outside of the build directory. This works since all paths are absolute. Relative paths will fail in such a scenario. Use an absolute path to the objtool wrapper as well, since the relative path breaks the build on older kernels. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Attila Fülöp <attila@fueloep.org> Closes #17541	2025-08-07 13:10:58 -04:00
Attila Fülöp	1d293b377a	Linux build: handle CONFIG_OBJTOOL_WERROR=y Linux 5.16 by default fails the build on objtool warnings. We have known and understood objtool warnings we can't fix without involving Linux maintainers. To work around this we introduce an objtool wrapper script which removes the `--Werror` flag. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Attila Fülöp <attila@fueloep.org> Closes #17456	2025-08-07 13:10:33 -04:00
Alexander Motin	22eb2bdce3	Make TX abort after assign safer It is not right, but there are few examples when TX is aborted after being assigned in case of error. To handle it better on production systems add extra cleanup steps. While here, replace couple dmu_tx_abort() in simple cases. Reviewed-by: Rob Norris <robn@despairlabs.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Igor Kozhukhov <igor@dilos.org> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Closes #17438	2025-08-07 12:41:40 -04:00
Alexander Motin	809b553940	Introduce zfs rewrite subcommand (#17246 ) This allows to rewrite content of specified file(s) as-is without modifications, but at a different location, compression, checksum, dedup, copies and other parameter values. It is faster than read plus write, since it does not require data copying to user-space. It is also faster for sync=always datasets, since without data modification it does not require ZIL writing. Also since it is protected by normal range range locks, it can be done under any other load. Also it does not affect file's modification time or other properties. Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Rob Norris <robn@despairlabs.com>	2025-08-07 12:34:28 -04:00
Rob Norris	abb6211e7a	Linux 6.16: remove writepage and readahead_page Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Rob Norris <robn@despairlabs.com> Closes #17443	2025-08-07 12:29:45 -04:00
khoang98	c405a7a35c	Skip dbuf_evict_one() from dbuf_evict_notify() for reclaim thread Avoid calling dbuf_evict_one() from memory reclaim contexts (e.g. Linux kswapd, FreeBSD pagedaemon). This prevents deadlock caused by reclaim threads waiting for the dbuf hash lock in the call sequence: dbuf_evict_one -> dbuf_destroy -> arc_buf_destroy Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com> Signed-off-by: Kaitlin Hoang <kthoang@amazon.com> Closes #17561	2025-08-07 12:15:14 -04:00
shodanshok	4808641e71	enforce arc_dnode_limit Linux kernel shrinker in the context of null/root memcg does not scan dentry and inode caches added by a task running in non-root memcg. For ZFS this means that dnode cache routinely overflows, evicting valuable meta/data and putting additional memory pressure on the system. This patch restores zfs_prune_aliases as fallback when the kernel shrinker does nothing, enabling zfs to actually free dnodes. Moreover, it (indirectly) calls arc_evict when dnode_size > dnode_limit. Reviewed-by: Rob Norris <robn@despairlabs.com> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Gionatan Danti <g.danti@assyoma.it> Closes #17487 Closes #17542	2025-08-07 12:11:34 -04:00
Alexander Motin	30fa92bff3	Increase meta-dnode redundancy in "some" mode Loss of one indirect block of the meta dnode likely means loss of the whole dataset. It is worse than one file that the man page promises, and in my opinion is not much better than "none" mode. This change restores redundancy of the meta-dnode indirect blocks, while same time still corrects expectations in the man page. Reviewed-by: Akash B <akash-b@hpe.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Rob Norris <robn@despairlabs.com> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Closes #17339	2025-08-05 13:15:44 -04:00
Paul Dagnelie	fd5a27c9db	Ensure that gang_copies is always at least as large as copies As discussed in the comments of PR #17004, you can theoretically run into a case where a gang child has more copies than the gang header, which can lead to some odd accounting behavior (and even trip a VERIFY). While the accounting code could be changed to handle this, it fundamentally doesn't seem to make a lot of sense to allow this to happen. If the data is supposed to have a certain level of reliability, that isn't actually achieved unless the gang_copies property is set to match it. Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Paul Dagnelie <paul.dagnelie@klarasystems.com> Closes #17484	2025-08-05 13:14:45 -04:00
Rob Norris	3ad3f439bb	zts: add spdx license tags to gang_blocks tests (#17160 ) Missed in #17073, probably because that PR was branched before #17001 was landed and never rebased. Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Signed-off-by: Rob Norris <rob.norris@klarasystems.com> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Tony Hutter <hutter2@llnl.gov>	2025-08-05 13:14:11 -04:00
Paul Dagnelie	a46ce73ca8	Make ganging redundancy respect redundant_metadata property (#17073 ) The redundant_metadata setting in ZFS allows users to trade resilience for performance and space savings. This applies to all data and metadata blocks in zfs, with one exception: gang blocks. Gang blocks currently just take the copies property of the IO being ganged and, if it's 1, sets it to 2. This means that we always make at least two copies of a gang header, which is good for resilience. However, if the users care more about performance than resilience, their gang blocks will be even more of a penalty than usual. We add logic to calculate the number of gang headers copies directly, and store it as a separate IO property. This is stored in the IO properties and not calculated when we decide to gang because by that point we may not have easy access to the relevant information about what kind of block is being stored. We also check the redundant_metadata property when doing so, and use that to decide whether to store an extra copy of the gang headers, compared to the underlying blocks. Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Signed-off-by: Paul Dagnelie <paul.dagnelie@klarasystems.com> Co-authored-by: Paul Dagnelie <paul.dagnelie@klarasystems.com> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Tony Hutter <hutter2@llnl.gov>	2025-08-05 13:10:40 -04:00
Ameer Hamza	90790955a6	SPDX: Add missing CDDL-1.0 license Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>	2025-08-05 12:51:35 -04:00
Igor Ostapenko	95abbc71c3	range_tree: Provide more debug details upon unexpected add/remove Sponsored-by: Klara, Inc. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com> Signed-off-by: Igor Ostapenko <igor.ostapenko@klarasystems.com> Closes #17581	2025-08-05 12:34:54 -04:00
Tino Reichardt	fc658b9935	Faster checksum benchmark on system boot While booting, only the needed 256KiB benchmarks are done now. The delay for checking all checksums occurs when requested via: - Linux: cat /proc/spl/kstat/zfs/chksum_bench - FreeBSD: sysctl kstat.zfs.misc.chksum_bench Reported by: Lahiru Gunathilake <gunathilakebllg@gmail.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de> Co-authored-by: Colin Percival <cperciva@tarsnap.com> Closes #17563 Closes #17560	2025-08-05 12:34:13 -04:00
Paul Dagnelie	271b9797c5	Don't use wrong weight when passivating group When we're passivating a metaslab group we start by passivating the metaslabs that have been activated for each of the allocators. To do that, we need to provide a weight. However, currently this erroneously always uses a segment-based weight, even if segment-based weighting is disabled. Use the normal weight function, which will decide which type of weight to use. Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com> Signed-off-by: Paul Dagnelie <paul.dagnelie@klarasystems.com> Closes #17566	2025-08-05 12:33:52 -04:00
Brian Behlendorf	582e7847f6	Default to zfs_bclone_wait_dirty=1 Update the default FICLONE and FICLONERANGE ioctl behavior to wait on dirty blocks. While this does remove some control from the application, in practice ZFS is better positioned to the optimial thing and immediately force a TXG sync. Reviewed-by: Rob Norris <robn@despairlabs.com> Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com> Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #17455	2025-08-05 12:33:30 -04:00
Andriy Tkachuk	6d378564b4	zdb: fix checksum calculation for decompressed blocks Currently, when reading compressed blocks with -R and decompressing them with :d option and specifying lsize, which is normally bigger than psize for compressed blocks, the checksum is calculated on decompressed data. But it makes no sense since zfs always calculates checksum on physical, i.e. compressed data. So reading the same block produces different checksum results depending on how we read it, whether we decompress it or not, which, again, makes no sense. Fix: use psize instead of lsize when calculating the checksum so that it is always calculated on the physical block size, no matter was it compressed or not. Signed-off-by: Andriy Tkachuk <andriy.tkachuk@seagate.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com> Closes #17547	2025-08-05 12:33:00 -04:00
Ameer Hamza	0c928f7a37	ZED: Fix device type detection and pool iteration logic During hotplug REMOVED events, devid matching fails for partition-based spares because devid information is not stored in pool config for partitioned devices. However, when devid is populated by the hotplug event, the original code skipped the search logic entirely, skipping vdev_guid matching and resulting in wrong device type detection that caused spares to be incorrectly identified as l2arc devices. Additionally, fix zfs_agent_iter_pool() to use the return value from zfs_agent_iter_vdev() instead of relying on search parameters, which was previously ignored. Also add pool_guid optimization to enable targeted pool searching when pool_guid is available. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Ameer Hamza <ahamza@ixsystems.com> Closes #17545	2025-08-05 12:32:28 -04:00
Chunwei Chen	c79d5e4f33	Define sops->free_inode() to prevent use-after-free during lookup On Linux, when doing path lookup with LOOKUP_RCU, dentry and inode can be dereferenced without refcounts and locks. For this reason, dentry and inode must only be freed after RCU grace period. However, zfs currently frees inode in zfs_inode_destroy synchronously and we can't use GPL-only call_rcu() in zfs directly. Fortunately, on Linux 5.2 and after, if we define sops->free_inode(), the kernel will do call_rcu() for us. This issue may be triggered more easily with init_on_free=1 boot parameter: BUG: kernel NULL pointer dereference, address: 0000000000000020 RIP: 0010:selinux_inode_permission+0x10e/0x1c0 Call Trace: ? show_trace_log_lvl+0x1be/0x2d9 ? show_trace_log_lvl+0x1be/0x2d9 ? show_trace_log_lvl+0x1be/0x2d9 ? security_inode_permission+0x37/0x60 ? __die_body.cold+0x8/0xd ? no_context+0x113/0x220 ? exc_page_fault+0x6d/0x130 ? asm_exc_page_fault+0x1e/0x30 ? selinux_inode_permission+0x10e/0x1c0 security_inode_permission+0x37/0x60 link_path_walk.part.0.constprop.0+0xb5/0x360 ? path_init+0x27d/0x3c0 path_lookupat+0x3e/0x1a0 filename_lookup+0xc0/0x1d0 ? __check_object_size.part.0+0x123/0x150 ? strncpy_from_user+0x4e/0x130 ? getname_flags.part.0+0x4b/0x1c0 vfs_statx+0x72/0x120 ? ioctl_has_perm.constprop.0.isra.0+0xbd/0x120 __do_sys_newlstat+0x39/0x70 ? __x64_sys_ioctl+0x8d/0xd0 do_syscall_64+0x30/0x40 entry_SYSCALL_64_after_hwframe+0x62/0xc7 Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Rob Norris <rob.norris@klarasystems.com> Signed-off-by: Chunwei Chen <david.chen@nutanix.com> Co-authored-by: Chunwei Chen <david.chen@nutanix.com> Closes #17546	2025-08-05 12:30:23 -04:00
Alexander Motin	347d68048a	ZIL: Force writing of open LWB on suspend Under parallel workloads ZIL may delay writes of open LWBs that are not full enough. On suspend we do not expect anything new to appear since zil_get_commit_list() will not let it pass, only returning TXG number to wait for. But I suspect that waiting for the TXG commit without having the last LWB issued may not wait for its completion, resulting in panic described in #17509. Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Rob Norris <rob.norris@klarasystems.com> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Closes #17521	2025-08-05 12:28:41 -04:00
Paul Dagnelie	acf3871ef8	Correct weight recalculation of space-based metaslabs Currently, after a failed allocation, the metaslab code recalculates the weight for a metaslab. However, for space-based metaslabs, it uses the maximum free segment size instead of the normal weighting algorithm. This is presumably because the normal metaslab weight is (roughly) intended to estimate the size of the largest free segment, but it doesn't do that reliably at most fragmentation levels. This means that recalculated metaslabs are forced to a weight that isn't really using the same units as the rest of them, resulting in undesirable behaviors. We switch this to use the normal space-weighting function. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Paul Dagnelie <paul.dagnelie@klarasystems.com> Sponsored-by: Wasabi Technology, Inc. Sponsored-by: Klara, Inc. Closes #17531	2025-08-05 12:28:34 -04:00
Ameer Hamza	21d5f25724	Validate mountpoint on path-based unmount using statx Use statx to verify that path-based unmounts proceed only if the mountpoint reported by statx matches the MNTTAB entry reported by libzfs, aborting the operation if they differ. Align `zfs umount /path` behavior with `zfs umount dataset`. Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Ameer Hamza <ahamza@ixsystems.com> Closes #17481	2025-08-05 12:27:25 -04:00
Paul Dagnelie	7e945a5b3f	Fix other nonrot bugs There are still a variety of bugs involving the vdev_nonrot property that will cause problems if you try to run the test suite with segment-based weighting disabled, and with other things in the weighting code. Parents' nonrot property need to be updated when children are added. When vdevs are expanded and more metaslabs are added, the weights have to be recalculated (since the number of metaslabs is an input to the lba bias function). When opening, faulted or unopenable children should not be considered for whether a vdev is nonrot or not (since the nonrot property is determined during a successful open, this can cause false negatives). And draid spares need to have the nonrot property set correctly. Sponsored-by: Eshtek, creators of HexOS Sponsored-by: Klara, Inc. Reviewed-by: Allan Jude <allan@klarasystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Rob Norris <robn@despairlabs.com> Signed-off-by: Paul Dagnelie <paul.dagnelie@klarasystems.com> Closes #17469	2025-08-05 12:25:26 -04:00
Alexander Motin	85ce6b8ab2	Polish db_rwlock scope dbuf_verify(): Don't need the lock, since we only compare pointers. dbuf_findbp(): Don't need the lock, since aside of unneeded assert we only produce the pointer, but don't de-reference it. dnode_next_offset_level(): When working on top level indirection should lock dnode buffer's db_rwlock, since it is our parent. If dnode has no buffer, then it is meta-dnode or one of quotas and we should lock the dataset's ds_bp_rwlock instead. Reviewed-by: Alan Somers <asomers@gmail.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Closes #17441	2025-08-05 12:23:32 -04:00
Mariusz Zaborski	954894ee53	scrub: generate scrub_finish event The `scn_min_txg` can now be used not only with resilver. Instead of checking `scn_min_txg` to determine whether it’s a resilver or a scrub, simply check which function is defined. Thanks to this change, a scrub_finish event is generated when performing a scrub from the saved txg. Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Mariusz Zaborski <mariusz.zaborski@klarasystems.com> Closes #17432	2025-08-05 12:21:46 -04:00
Alexander Motin	a4e775d2ca	Some arc_release() cleanup - Don't drop L2ARC header if we have more buffers in this header. Since we leave them the header, leave them the L2ARC header also. Honestly we are not required to drop it even if there are no other buffers, but then we'd need to allocate it a separate header, which we might drop soon if the old block is really deleted. Multiple buffers in a header likely mean active snapshots or dedup, so we know that the block in L2ARC will remain valid. It might be rare, but why not? - Remove some impossible assertions and conditions. Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Closes #17126	2025-08-05 12:16:27 -04:00
Paul Dagnelie	661310ff5c	FDT dedup log sync -- remove incremental This PR condenses the FDT dedup log syncing into a single sync pass. This reduces the overhead of modifying indirect blocks for the dedup table multiple times per txg. In addition, changes were made to the formula for how much to sync per txg. We now also consider the backlog we have to clear, to prevent it from growing too large, or remaining large on an idle system. Sponsored-by: Klara, Inc. Sponsored-by: iXsystems, Inc. Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Authored-by: Don Brady <don.brady@klarasystems.com> Authored-by: Paul Dagnelie <paul.dagnelie@klarasystems.com> Signed-off-by: Paul Dagnelie <paul.dagnelie@klarasystems.com> Closes #17038	2025-08-05 12:15:21 -04:00
Alexander Motin	f9d59b579e	ZIL: Relax parallel write ZIOs processing ZIL introduced dependencies between its write ZIOs to permit flush defer, when we flush vdev caches only once all the write ZIOs has completed. But it was recently spotted that it serializes not only ZIO completions handling, but also their ready stage. It means ZIO pipeline can't calculate checksums for the following ZIOs until all the previous are checksumed, even though it is not required. On a systems where memory throughput of a single CPU core is limited, it creates single-core CPU bottleneck, which is difficult to see due to ZIO pipeline design with many taskqueue threads. While it would be great to bypass the ready stage waits, it would require changes to ZIO code, and I haven't found a clean way to do it. But I've noticed that we don't need any dependency between the write ZIOs if the previous one has some waiters, which means it won't defer any flushes and work as a barrier for the earlier ones. Bypassing it won't help large single-thread writes, since all the write ZIOs except the last in that case won't have waiters, and so will be dependent. But in that case the ZIO processing might not be a bottleneck, since there will be only one thread populating the write buffers, that will likely be the bottleneck. But bypassing the ZIO dependency on multi-threaded write workloads really allows them to scale beyond the checksuming throughput of one CPU core. My tests with writing 12 files on a same dataset on a pool with 4 striped NVMes as SLOGs from 12 threads with 1MB blocks on a system with Xeon Silver 4114 CPU show total throughput increase from 4.3GB/s to 8.5GB/s, increasing the SLOGs busy from ~30% to ~70%. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Rob Norris <robn@despairlabs.com> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Closes #17458	2025-08-05 12:14:18 -04:00
Brian Behlendorf	1af41fd203	Tag zfs-2.3.3 META file and changelog updated. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2025-06-19 08:21:04 -07:00
Tony Hutter	faefa5ffc3	Linux 6.15 compat: META Update the META file to reflect compatibility with the 6.15 kernel. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tony Hutter <hutter2@llnl.gov> Closes #17393	2025-06-17 10:50:27 -07:00
Germano Massullo	777d8ee345	Fix mixed-use-of-spaces-and-tabs rpmlint warning Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Germano Massullo <germano.massullo@gmail.com> Closes #17461	2025-06-17 10:50:27 -07:00
Rob Norris	b00bc81b05	ioctl: remove FICLONE/FICLONERANGE/FIDEDUPERANGE compat These are only required to support these ioctls on Linux <4.5. Since 4.18 is our cutoff, we don't need this code anymore. Also removing related test things that will never match again. Sponsored-by: https://despairlabs.com/sponsor/ Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Rob Norris <robn@despairlabs.com> Closes #17308	2025-06-17 10:50:27 -07:00
Alexander Motin	f7e6dcc68d	Relax zfs_vnops_read_chunk_size limitations It makes no sense to limit read size below the block size, since DMU will any way consume resources for the whole block, while the current zfs_vnops_read_chunk_size is only 1MB, which is smaller that maximum block size of 16MB. Plus in case of misaligned Uncached I/O the buffer may get evicted between the chunks, requiring repeating I/Os. On 64-bit platforms increase zfs_vnops_read_chunk_size to 32MB. It allows to less depend on speculative prefetcher if application requests specific size, first not waiting for prefetcher to start and later not prefetching more than needed. Also while there, we don't need to align reads to the chunk size, but only to a block size, which is smaller and so more forgiving. My profiles show ~4% of CPU time saving when reading 16MB blocks. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed by: Igor Kozhukhov <ikozhukhov@gmail.com> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Closes #17415	2025-06-17 10:50:27 -07:00
Rob Norris	e2de00ca44	dmu_traverse: remove 'ignore_hole_birth' tunable alias It's been many years, we can probably do without. Sponsored-by: https://despairlabs.com/sponsor/ Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Pavel Snajdr <snajpa@snajpa.net> Signed-off-by: Rob Norris <robn@despairlabs.com> Closes #17376	2025-06-17 10:50:27 -07:00

1 2 3 4 5 ...

9983 Commits