mirror_zfs

mirror of https://git.proxmox.com/git/mirror_zfs.git synced 2026-05-22 18:40:43 +03:00

Author	SHA1	Message	Date
rmacklem	725886d67a	FreeBSD: Add support for _PC_HAS_HIDDENSYSTEM In FreeBSD there is now a pathconf name _PC_HAS_HIDDENSYSTEM. This patch adds support for it to OpenZFS. Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Rick Macklem <rmacklem@uoguelph.ca> Closes #17518	2025-08-19 10:30:04 -07:00
Mark Johnston	c3d74a0d6f	FreeBSD: Ensure that z_pflags is initialized for new znodes The field is subsequently accessed in zfs_mknode(), in zfs_inherit_projid(). The Linux implementation of zfs_create_fs() has this initialization already; there is no counterpart to zfs_create_share_dir() that I can see. Reported-by: KMSAN Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Rob Norris <rob.norris@klarasystems.com> Signed-off-by: Mark Johnston <markj@FreeBSD.org> Closes #17486	2025-08-19 10:30:04 -07:00
Rob Norris	3b64a9619f	FreeBSD: zfs_putpages: don't undirty pages until after write completes In syncing mode, zfs_putpages() would put the entire range of pages onto the ZIL, then return VM_PAGER_OK for each page to the kernel. However, an associated zil_commit() or txg sync had not happened at this point, so the write may not actually be on disk. So, we rework that case to use a ZIL commit callback, and do the post-write work of undirtying the page and signaling completion there. We return VM_PAGER_PEND to the kernel instead so it knows that we will take care of it. The original version of this (`238eab7dc1`) copied the Linux model and did the cleanup in a ZIL callback for both async and sync. This was a mistake, as FreeBSD does not have a separate "busy for writeback" flag like Linux which keeps the page usable. The full sbusy flag locks the entire page out until the itx callback fires, which for async is after txg sync, which could be literal seconds in the future. For the async case, the data is already on the DMU and the in-memory ZIL, which is sufficient for async writeback, so the old method of logging it without a callback, undirtying the page and returning is more than sufficient and reclaims that lost performance. Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Mark Johnston <markj@FreeBSD.org> Signed-off-by: Rob Norris <rob.norris@klarasystems.com> Closes #17533	2025-08-12 22:41:17 -04:00
Mark Johnston	a072611eef	Revert "FreeBSD: zfs_putpages: don't undirty pages until after write completes" This causes async putpages to leave the pages sbusied for a long time, which hurts concurrency. Revert for now until we have a better approach. This reverts commit `238eab7dc1`. Reported by: Ihor Antonov <ngor@hugpoint.tech> Discussed with: Rob Norris <rob.norris@klarasystems.com> References: freebsd/freebsd-src@738a9a7 Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Mark Johnston <markj@FreeBSD.org> Ported-by: Rob Norris <rob.norris@klarasystems.com> Signed-off-by: Rob Norris <rob.norris@klarasystems.com> Closes #17533	2025-08-12 22:41:17 -04:00
Rob Norris	a49c957299	linux/zvol_os: fix crash with blk-mq on Linux 4.19 `03987f71e3` (#16069) added a workaround to get the blk-mq hardware context for older kernels that don't cache it in the struct request. However, this workaround appears to be incomplete. In 4.19, the rq data context is optional. If it's not initialised, then the cached rq->cpu will be -1, and so using it to index into mq_map causes a crash. Given that the upstream 4.19 is now in extended LTS and rarely seen, RHEL8 4.18+ has long carried "modern" blk-mq support, and the cached hardware context has been available since 5.1, I'm not going to huge lengths to get queue selection correct for the very few people that are likely to feel it. To that end, we simply call raw_smp_processor_id() to get a valid CPU id and use that instead. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com> Reviewed-by: Paul Dagnelie <paul.dagnelie@klarasystems.com> Signed-off-by: Rob Norris <rob.norris@klarasystems.com> Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Closes #17597	2025-08-12 17:24:11 -07:00
Rob Norris	0c7d6e20e6	Linux: zfs_putpage: document (and fix!) confusing sync/commit modes The structure of zfs_putpage() and its callers is tricky to follow. There's a lot more we could do to improve it, but at least now we have some description of one of the trickier bits. Writing this exposed a very subtle bug: most async pages pushed out through zpl_putpages() would go to the ZIL with commit=false, which can yield a less-efficient write policy. So this commit updates that too. Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com> Signed-off-by: Rob Norris <rob.norris@klarasystems.com> Closes #17584	2025-08-12 17:23:46 -07:00
Rob Norris	b9c45fe68c	Linux: zfs_putpage: complete async page writeback immediately For async page writeback, we do not need to wait for the page to be on disk before returning to the caller; it's enough that the data from the dirty page be on the DMU and in the in-memory ZIL, just like any other write. So, if this is not a syncing write, don't add a callback to the itx, and instead just unlock the page immediately. (This is effectively the same concept used for FreeBSD in `d323fbf49c`). Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com> Signed-off-by: Rob Norris <rob.norris@klarasystems.com> Closes #17584 Closes #14290	2025-08-12 17:23:43 -07:00
Rob Norris	f72226a75c	Linux: sync: remove async/sync accounting All this machinery is there to try to understand when there is an async writeback waiting to complete because the intent log callbacks are still outstanding, and force them with a timely zil_commit(). The next commit fixes this properly, so there's no need for all this extra housekeeping. Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com> Signed-off-by: Rob Norris <rob.norris@klarasystems.com> Closes #17584	2025-08-12 17:23:39 -07:00
Alexander Motin	809b553940	Introduce zfs rewrite subcommand (#17246 ) This allows rewriting the content of specified file(s) as-is without modifications, but at a different location, compression, checksum, dedup, copies and other parameter values. It is faster than read plus write, since it does not require data copying to user-space. It is also faster for sync=always datasets, since without data modification it does not require ZIL writing. Also since it is protected by normal range locks, it can be done under any other load. Also it does not affect file's modification time or other properties. Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Rob Norris <robn@despairlabs.com>	2025-08-07 12:34:28 -04:00
Rob Norris	abb6211e7a	Linux 6.16: remove writepage and readahead_page Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Rob Norris <robn@despairlabs.com> Closes #17443	2025-08-07 12:29:45 -04:00
khoang98	c405a7a35c	Skip dbuf_evict_one() from dbuf_evict_notify() for reclaim thread Avoid calling dbuf_evict_one() from memory reclaim contexts (e.g. Linux kswapd, FreeBSD pagedaemon). This prevents deadlock caused by reclaim threads waiting for the dbuf hash lock in the call sequence: dbuf_evict_one -> dbuf_destroy -> arc_buf_destroy Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com> Signed-off-by: Kaitlin Hoang <kthoang@amazon.com> Closes #17561	2025-08-07 12:15:14 -04:00
shodanshok	4808641e71	enforce arc_dnode_limit Linux kernel shrinker in the context of null/root memcg does not scan dentry and inode caches added by a task running in non-root memcg. For ZFS this means that dnode cache routinely overflows, evicting valuable meta/data and putting additional memory pressure on the system. This patch restores zfs_prune_aliases as fallback when the kernel shrinker does nothing, enabling zfs to actually free dnodes. Moreover, it (indirectly) calls arc_evict when dnode_size > dnode_limit. Reviewed-by: Rob Norris <robn@despairlabs.com> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Gionatan Danti <g.danti@assyoma.it> Closes #17487 Closes #17542	2025-08-07 12:11:34 -04:00
Chunwei Chen	c79d5e4f33	Define sops->free_inode() to prevent use-after-free during lookup On Linux, when doing path lookup with LOOKUP_RCU, dentry and inode can be dereferenced without refcounts and locks. For this reason, dentry and inode must only be freed after RCU grace period. However, zfs currently frees inode in zfs_inode_destroy synchronously and we can't use GPL-only call_rcu() in zfs directly. Fortunately, on Linux 5.2 and after, if we define sops->free_inode(), the kernel will do call_rcu() for us. This issue may be triggered more easily with init_on_free=1 boot parameter: BUG: kernel NULL pointer dereference, address: 0000000000000020 RIP: 0010:selinux_inode_permission+0x10e/0x1c0 Call Trace: ? show_trace_log_lvl+0x1be/0x2d9 ? show_trace_log_lvl+0x1be/0x2d9 ? show_trace_log_lvl+0x1be/0x2d9 ? security_inode_permission+0x37/0x60 ? __die_body.cold+0x8/0xd ? no_context+0x113/0x220 ? exc_page_fault+0x6d/0x130 ? asm_exc_page_fault+0x1e/0x30 ? selinux_inode_permission+0x10e/0x1c0 security_inode_permission+0x37/0x60 link_path_walk.part.0.constprop.0+0xb5/0x360 ? path_init+0x27d/0x3c0 path_lookupat+0x3e/0x1a0 filename_lookup+0xc0/0x1d0 ? __check_object_size.part.0+0x123/0x150 ? strncpy_from_user+0x4e/0x130 ? getname_flags.part.0+0x4b/0x1c0 vfs_statx+0x72/0x120 ? ioctl_has_perm.constprop.0.isra.0+0xbd/0x120 __do_sys_newlstat+0x39/0x70 ? __x64_sys_ioctl+0x8d/0xd0 do_syscall_64+0x30/0x40 entry_SYSCALL_64_after_hwframe+0x62/0xc7 Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Rob Norris <rob.norris@klarasystems.com> Signed-off-by: Chunwei Chen <david.chen@nutanix.com> Co-authored-by: Chunwei Chen <david.chen@nutanix.com> Closes #17546	2025-08-05 12:30:23 -04:00
Rob Norris	b00bc81b05	ioctl: remove FICLONE/FICLONERANGE/FIDEDUPERANGE compat These are only required to support these ioctls on Linux <4.5. Since 4.18 is our cutoff, we don't need this code anymore. Also removing related test things that will never match again. Sponsored-by: https://despairlabs.com/sponsor/ Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Rob Norris <robn@despairlabs.com> Closes #17308	2025-06-17 10:50:27 -07:00
Rob Norris	a65225ec7e	FreeBSD: zfs_putpages: don't undirty pages until after write completes zfs_putpages() would put the entire range of pages onto the ZIL, then return VM_PAGER_OK for each page to the kernel. However, an associated zil_commit() or txg sync had not happened at this point, so the write may not actually be on disk. So, we rework it to use a ZIL commit callback, and do the post-write work of undirtying the page and signaling completion there. We return VM_PAGER_PEND to the kernel instead so it knows that we will take care of it. Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Reviewed-by: Mark Johnston <markj@FreeBSD.org> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Rob Norris <rob.norris@klarasystems.com> Closes #17445	2025-06-17 10:50:26 -07:00
Rob Norris	e1dd433a44	zpl_sync_fs: work around kernels that ignore sync_fs errors If the kernel will honour our error returns, use them. If not, fool it by setting a writeback error on the superblock, if available. Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Reviewed-by: Paul Dagnelie <paul.dagnelie@klarasystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Rob Norris <rob.norris@klarasystems.com> Closes #17420	2025-06-17 10:50:26 -07:00
Rob Norris	08cec6532e	zfs_sync: return error when pool suspends If the pool is suspended, we'll just block in zil_commit(). If the system is shutting down, blocking wouldn't help anyone. So, we should keep this test for now, but at least return an error for anyone who is actually interested. Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Reviewed-by: Paul Dagnelie <paul.dagnelie@klarasystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Rob Norris <rob.norris@klarasystems.com> Closes #17420	2025-06-17 10:50:26 -07:00
Rob Norris	d944641502	zfs_sync: remove support for impossible scenarios The superblock pointer will always be set, as will z_log, so remove code supporting cases that can't occur (on Linux at least). Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Reviewed-by: Paul Dagnelie <paul.dagnelie@klarasystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Rob Norris <rob.norris@klarasystems.com> Closes #17420	2025-06-17 10:50:26 -07:00
Rob Norris	04493ca819	linux/zvol_os: don't try to set disk ops if alloc fails If the kernel fails to allocate the gendisk, zvo_disk will be NULL, and dereferencing it will explode. So don't do that. Sponsored-by: Klara, Inc. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Rob Norris <rob.norris@klarasystems.com> Closes #17396	2025-06-17 10:50:26 -07:00
Rob Norris	d7bb6bbf13	tunables: fix spelling Three occurences with an 'e', and all of them mine. Maybe it's an British thing? Sponsored-by: https://despairlabs.com/sponsor/ Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Pavel Snajdr <snajpa@snajpa.net> Signed-off-by: Rob Norris <robn@despairlabs.com> Closes #17377	2025-06-17 10:50:26 -07:00
Rob Norris	06fd6dc6f7	tunables: use Linux ullong param ops for u64 Since 3.17 Linux has provided param ops for 64-bit ints, so we don't need to use our own anymore. Sponsored-by: https://despairlabs.com/sponsor/ Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Pavel Snajdr <snajpa@snajpa.net> Signed-off-by: Rob Norris <robn@despairlabs.com> Closes #17377	2025-06-17 10:50:26 -07:00
Rob Norris	28ff5ff1c6	tunables: remove support for s64 tunables Nothing uses them now. Sponsored-by: https://despairlabs.com/sponsor/ Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Pavel Snajdr <snajpa@snajpa.net> Signed-off-by: Rob Norris <robn@despairlabs.com> Closes #17377	2025-06-17 10:50:26 -07:00
Rob Norris	e9002887e2	tunables: remove direct use of module_param_cb The use for spl_taskq_kick was the only use, and the comment that module_param_call is obsolete is no longer true - it's still very much used even in recent kernels. Sponsored-by: https://despairlabs.com/sponsor/ Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Pavel Snajdr <snajpa@snajpa.net> Signed-off-by: Rob Norris <robn@despairlabs.com> Closes #17377	2025-06-17 10:50:26 -07:00
Rob Norris	840b070ec7	tunables: remove FreeBSD compat macros for Linux module params Nothing in any FreeBSD code uses them. Sponsored-by: https://despairlabs.com/sponsor/ Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Pavel Snajdr <snajpa@snajpa.net> Signed-off-by: Rob Norris <robn@despairlabs.com> Closes #17377	2025-06-17 10:50:26 -07:00
Fedor Uporov	d187e3e1a7	ZVOL: Comment platform-specific empty functions bodies on FreeBSD side Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Fedor Uporov <fuporov.vstack@gmail.com> Closes #17383	2025-06-17 10:49:40 -07:00
Rob Norris	8c0f7619b2	Linux 6.2/6.15: del_timer_sync() renamed to timer_delete_sync() Renamed in 6.2, and the compat wrapper removed in 6.15. No signature or functional change apart from that, so a very minimal update for us. Sponsored-by: https://despairlabs.com/sponsor/ Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Pavel Snajdr <snajpa@snajpa.net> Signed-off-by: Rob Norris <robn@despairlabs.com> Closes #17229 (cherry picked from commit `841be1d049`)	2025-05-28 16:00:28 -07:00
Rob Norris	e64d4718a7	Linux 6.15: mkdir now returns struct dentry * The intent is that the filesystem may have a reference to an "old" version of the new directory, eg if it was keeping it alive because a remote NFS client still had it open. We don't need anything like that, so this really just changes things so we return error codes encoded in pointers. Sponsored-by: https://despairlabs.com/sponsor/ Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Pavel Snajdr <snajpa@snajpa.net> Signed-off-by: Rob Norris <robn@despairlabs.com> Closes #17229 (cherry picked from commit `bb740d66de`)	2025-05-28 16:00:28 -07:00
Rob Norris	db988fabfb	linux/uio: remove "skip" offset for UIO_ITER For UIO_ITER, we are just wrapping a kernel iterator. It will take care of its own offsets if necessary. We don't need to do anything, and if we do try to do anything with it (like advancing the iterator by the skip in zfs_uio_advance) we're just confusing the kernel iterator, ending up at the wrong position or worse, off the end of the memory region. Sponsored-by: https://despairlabs.com/sponsor/ Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de> Reviewed-by: Brian Atkinson <batkinson@lanl.gov> Signed-off-by: Rob Norris <robn@despairlabs.com> Closes #17298 (cherry picked from commit `2ee5b51a57`)	2025-05-28 16:00:28 -07:00
Olivier Certner	51ed9640e9	FreeBSD: Use new SYSCTL_SIZEOF() SYSCTL_SIZEOF() has been introduced in FreeBSD by commit "sysctl(9): Ease exporting struct sizes; Discourage doing that" (713abc9880aa) in branch 'main'. It will soon be backported to 'stable/14'. We will thus be able to remove the old, alternate version left in the '#else' branch as soon as 'stable/13' goes out of support (April 30, 2026). Sponsored-by: The FreeBSD Foundation Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Olivier Certner <olce@FreeBSD.org> Closes #17309 (cherry picked from commit `78628a5c15`)	2025-05-28 16:00:28 -07:00
Rob Norris	c85f2fd531	cred: properly pass and test creds on other threads (#17273 ) ### Background Various admin operations will be invoked by some userspace task, but the work will be done on a separate kernel thread at a later time. Snapshots are an example, which are triggered through zfs_ioc_snapshot() -> dsl_dataset_snapshot(), but the actual work is from a task dispatched to dp_sync_taskq. Many such tasks end up in dsl_enforce_ds_ss_limits(), where various limits and permissions are enforced. Among other things, it is necessary to ensure that the invoking task (that is, the user) has permission to do things. We can't simply check if the running task has permission; it is a privileged kernel thread, which can do anything. However, in the general case it's not safe to simply query the task for its permissions at the check time, as the task may not exist any more, or its permissions may have changed since it was first invoked. So instead, we capture the permissions by saving CRED() in the user task, and then using it for the check through the secpolicy_* functions. ### Current implementation The current code calls CRED() to get the credential, which gets a pointer to the cred_t inside the current task and passes it to the worker task. However, it doesn't take a reference to the cred_t, and so expects that it won't change, and that the task continues to exist. In practice that is always the case, because we don't let the calling task return from the kernel until the work is done. For Linux, we also take a reference to the current task, because the Linux credential APIs for the most part do not check an arbitrary credential, but rather, query what a task can do. See secpolicy_zfs_proc(). Again, we don't take a reference on the task, just a pointer to it. ### Changes We change to calling crhold() on the task credential, and crfree() when we're done with it. This ensures it stays alive and unchanged for the duration of the call. 
On the Linux side, we change the main policy checking function priv_policy_ns() to use override_creds()/revert_creds() if necessary to make the provided credential active in the current task, allowing the standard task-permission APIs to do the needed check. Since the task pointer is no longer required, this lets us entirely remove secpolicy_zfs_proc() and the need to carry a task pointer around as well. Sponsored-by: https://despairlabs.com/sponsor/ Signed-off-by: Rob Norris <robn@despairlabs.com> Reviewed-by: Pavel Snajdr <snajpa@snajpa.net> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Kyle Evans <kevans@FreeBSD.org> Reviewed-by: Tony Hutter <hutter2@llnl.gov> (cherry picked from commit `c8fa39b46c`)	2025-05-28 16:00:28 -07:00
Brian Atkinson	a77d641f01	Export correct symbols for Lustre Direct I/O Originally the Lustre ZFS OSD code was going to use zfs_uio_t structs for supporting Direct I/O with ZFS. However, this has changed to using abd_t structs instead. This exports the proper symbols that will be used by the Lustre ZFS OSD code. Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Brian Atkinson <batkinson@lanl.gov> Closes #17256 (cherry picked from commit `7031a48c70`)	2025-05-28 16:00:28 -07:00
Tony Hutter	20f00819f3	Linux 6.0 compat: Check for migratepage VFS (#17217 ) The 6.0 kernel removes the 'migratepage' VFS op. Check for migratepage. Signed-off-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Rob Norris <robn@despairlabs.com> Reviewed-by: Alexander Motin <mav@FreeBSD.org>	2025-04-16 09:59:45 -07:00
aokblast	153c982aac	spl_vfs: fix vrele task runner signature mismatch Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: SHENGYI HONG <aokblast@FreeBSD.org> Closes #17101	2025-04-16 09:59:45 -07:00
Pavel Snajdr	c22f5c1c55	Linux: Fix zfs_prune panics v2 (#17121 ) It turns out that approach taken in the original version of the patch was wrong. So now, we're taking approach in-line with how kernel actually does it - when sb is being torn down, access to it is serialized via sb->s_umount rwsem, only when that lock is taken is it okay to work with s_flags - and the other mistake I was doing was trying to make SB_ACTIVE work, but apparently the kernel checks the negative variant - not SB_DYING and not SB_BORN. Kernels pre-6.6 don't have SB_DYING, but check if sb is hashed instead. Signed-off-by: Pavel Snajdr <snajpa@snajpa.net> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>	2025-04-16 09:59:45 -07:00
Rob Norris	9e009acbdc	dmu_tx: rename dmu_tx_assign() flags from TXG_* to DMU_TX_* (#17143 ) This helps to avoid confusion with the similarly-named txg_wait_synced(). Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Signed-off-by: Rob Norris <rob.norris@klarasystems.com> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Mariusz Zaborski <mariusz.zaborski@klarasystems.com> Reviewed-by: Tony Hutter <hutter2@llnl.gov>	2025-04-16 09:59:45 -07:00
Rob Norris	6b2c046d18	SPDX: license tags: GPL-2.0-or-later Sponsored-by: https://despairlabs.com/sponsor/ Signed-off-by: Rob Norris <robn@despairlabs.com> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>	2025-04-16 09:59:44 -07:00
Rob Norris	865ca576ab	SPDX: license tags: BSD-2-Clause Sponsored-by: https://despairlabs.com/sponsor/ Signed-off-by: Rob Norris <robn@despairlabs.com> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>	2025-04-16 09:59:44 -07:00
Rob Norris	9530eb64e0	SPDX: license tags: CDDL-1.0 Sponsored-by: https://despairlabs.com/sponsor/ Signed-off-by: Rob Norris <robn@despairlabs.com> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>	2025-04-16 09:59:44 -07:00
Rob Norris	6503f8c6f0	Linux/vnops: implement STATX_DIOALIGN This statx(2) mask returns the alignment restrictions for O_DIRECT access on the given file. We're expected to return both memory and IO alignment. For memory, it's always PAGE_SIZE. For IO, we return the current block size for the file, which is the required alignment for an arbitrary block, and for the first block we'll fall back to the ARC when necessary, so it should always work. Sponsored-by: https://despairlabs.com/sponsor/ Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Rob Norris <robn@despairlabs.com> Closes #16972	2025-04-02 17:04:14 -07:00
Rob Norris	af062c480c	vdev_file: unify FreeBSD and Linux implementations (#17046 ) Kernel & userspace specifics are in zfs_file_os.c, so there's no particular reason these have to be separate. The one platform-specific part is in the Linux kernel part, to offload flushes to a taskq if we're already inside a filesystem transaction. This would normally be an unsatisfying wart, but I'm intending to remove this shortly, so I'm content to leave it gated for the moment. Reviewed-by: Allan Jude <allan@klarasystems.com> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Signed-off-by: Rob Norris <rob.norris@klarasystems.com>	2025-02-28 00:42:29 +05:00
vandanarungta	c4fa9c2962	Free memory in an error path in spl-kmem-cache.c skc->skc_name also needs to be freed in an error path. Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de> Signed-off-by: Vandana Rungta <vrungta@amazon.com> Closes #17041	2025-02-28 00:42:29 +05:00
Rob Norris	3266d4d655	Linux 6.14: BLK_MQ_F_SHOULD_MERGE was removed According to the upstream change, all callers set it, and all block devices either honoured it or ignored it, so removing it entirely allows a bunch of handling for the "unset" case to be removed, and it becomes effectively implied. We follow suit, and keep setting it for older kernels. Sponsored-by: https://despairlabs.com/sponsor/ Signed-off-by: Rob Norris <robn@despairlabs.com> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Tony Hutter <hutter2@llnl.gov>	2025-02-28 00:42:29 +05:00
Rob Norris	51bec16060	Linux 6.14: dops->d_revalidate now takes four args This is a convenience for filesystems that need the inode of their parent or their own name, as it's often complicated to get that information. We don't need those things, so this is just detecting which prototype is expected and adjusting our callback to match. Sponsored-by: https://despairlabs.com/sponsor/ Signed-off-by: Rob Norris <robn@despairlabs.com> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Tony Hutter <hutter2@llnl.gov>	2025-02-28 00:42:29 +05:00
Brian Atkinson	0e21e473a7	Update pin_user_pages() calls for Direct I/O Originally #16856 updated Linux Direct I/O requests to use the new pin_user_pages API. However, it was an oversight that this PR only handled iov_iter's of type ITER_IOVEC and ITER_UBUF. Other iov_iter types may try and use the pin_user_pages API if it is available. This can lead to panics as the iov_iter is not being iterated over correctly in zfs_uio_pin_user_pages(). Unfortunately, generic iov_iter APIs that call pin_user_page_fast() are protected as GPL only. Rather than update zfs_uio_pin_user_pages() to account for all iov_iter types, we can simply just call zfs_uio_get_dio_page_iov_iter() if the iov_iter type is not ITER_IOVEC or ITER_UBUF. zfs_uio_get_dio_page_iov_iter() calls the iov_iter_get_pages() calls that can handle any iov_iter type. In the future it might be worth using the exposed iov_iter iterator functions that are included in the header iov_iter.h since v6.7. These functions allow for any iov_iter type to be iterated over and advanced while applying a step function during iteration. This could possibly be leveraged in zfs_uio_pin_user_pages(). A new ZFS test case was added to test that an ITER_BVEC is handled correctly using this new code path. This test case was provided through issue #16956. Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ameer Hamza <ahamza@ixsystems.com> Signed-off-by: Brian Atkinson <batkinson@lanl.gov> Closes #16956 Closes #17006	2025-02-25 22:33:25 +05:00
Alan Somers	6e9911212e	Make the vfs.zfs.vdev.raidz_impl sysctl cross-platform Reviewed-by: Allan Jude <allan@klarasystems.com> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Alan Somers <asomers@gmail.com> Sponsored by: ConnectWise Closes #16980	2025-02-25 22:32:11 +05:00
rmacklem	42bad93414	FreeBSD: Add setting of the VFCF_FILEREV flag The flag VFCF_FILEREV was recently defined in FreeBSD so that a file system could indicate that it increments va_filerev by one for each change. Since ZFS does do this, set the flag if defined for the kernel being built. This allows the NFSv4.2 server to reply with the correct change_attr_type attribute value. Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Rick Macklem <rmacklem@uoguelph.ca> Closed #16976	2025-02-25 22:29:39 +05:00
Alexander Motin	675b49d2a1	FreeBSD: Use ashift in vdev_check_boot_reserve() We should not hardcode 512-byte read size when checking for loader in the boot area before RAIDZ expansion. Disk might be unable to handle that I/O as is, and the code zio_vdev_io_start() handling the padding asserts doing it only for top-level vdev. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Closes #16942	2025-02-25 22:24:59 +05:00
pstef	cfec8f13a2	zfs_vnops_os.c: fallocate is valid but not supported on FreeBSD This works around /usr/lib/go-1.18/pkg/tool/linux_amd64/link: mapping output file failed: invalid argument It's happened to me under a Linux jail, but it's also happened to other people, see https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=270247#c4 Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: pstef <pstef@users.noreply.github.com> Closes #16918	2025-01-03 15:23:49 -08:00
Andrew Walker	679b164cd3	Add missing zfs_exit() when snapdir is disabled (#16912 ) zfs_vget doesn't zfs_exit when erroring out due to snapdir being disabled. Signed-off-by: Andrew Walker <awalker@ixsystems.com> Reviewed-by: @bmeagherix Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Ameer Hamza <ahamza@ixsystems.com> Reviewed-by: Tony Hutter <hutter2@llnl.gov>	2025-01-02 17:04:10 -08:00
shodanshok	c2d9494f99	set zfs_arc_shrinker_limit to 0 by default zfs_arc_shrinker_limit was introduced to avoid ARC collapse due to aggressive kernel reclaim. While useful, the current default (10000) is too prone to OOM especially when MGLRU-enabled kernels with default min_ttl_ms are used. Even when no OOM happens, it often causes too much swap usage. This patch sets zfs_arc_shrinker_limit=0 to not ignore kernel reclaim requests. ARC now plays better with both kernel shrinker and pagecache but, should ARC collapse happen again, MGLRU behavior can be tuned or even disabled. Anyway, zfs should not cause OOM when ARC can be released. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Gionatan Danti <g.danti@assyoma.it> Closes #16909	2024-12-29 11:53:45 -08:00

960 Commits