mirror_zfs

mirror of https://git.proxmox.com/git/mirror_zfs.git synced 2026-01-25 10:12:13 +03:00

Author	SHA1	Message	Date
Rob Norris	46a4075100	Linux 6.16: remove writepage and readahead_page Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Rob Norris <robn@despairlabs.com> Closes #17443	2025-06-23 15:51:02 -04:00
Rob Norris	238eab7dc1	FreeBSD: zfs_putpages: don't undirty pages until after write completes zfs_putpages() would put the entire range of pages onto the ZIL, then return VM_PAGER_OK for each page to the kernel. However, an associated zil_commit() or txg sync had not happened at this point, so the write may not actually be on disk. So, we rework it to use a ZIL commit callback, and do the post-write work of undirtying the page and signaling completion there. We return VM_PAGER_PEND to the kernel instead so it knows that we will take care of it. Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Reviewed-by: Mark Johnston <markj@FreeBSD.org> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Rob Norris <rob.norris@klarasystems.com> Closes #17445	2025-06-12 14:45:18 -07:00
Fedor Uporov	3dfa98d013	ZVOL: Make zvol_inhibit_dev module parameter platform-independent The module parameter now is represented in FreeBSD sysctls list with name: 'vfs.zfs.vol.inhibit_dev'. The default value is '0', same as on Linux side. Sponsored-by: vStack, Inc. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Rob Norris <rob.norris@klarasystems.com> Signed-off-by: Fedor Uporov <fuporov.vstack@gmail.com> Closes #17384	2025-05-29 09:37:41 -04:00
Rob Norris	9392be427e	tunables: remove __check_old_set_param workaround This was fully removed from Linux in 4.15, so we won't be seeing it again. Sponsored-by: https://despairlabs.com/sponsor/ Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Pavel Snajdr <snajpa@snajpa.net> Signed-off-by: Rob Norris <robn@despairlabs.com> Closes #17377	2025-05-28 16:50:22 -07:00
Rob Norris	dd4e2f99f0	tunables: remove unused param get/set aliases Sponsored-by: https://despairlabs.com/sponsor/ Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Pavel Snajdr <snajpa@snajpa.net> Signed-off-by: Rob Norris <robn@despairlabs.com> Closes #17377	2025-05-28 16:50:22 -07:00
Rob Norris	589d99171f	tunables: use Linux ullong param ops for u64 Since 3.17 Linux has provided param ops for 64-bit ints, so we don't need to use our own anymore. Sponsored-by: https://despairlabs.com/sponsor/ Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Pavel Snajdr <snajpa@snajpa.net> Signed-off-by: Rob Norris <robn@despairlabs.com> Closes #17377	2025-05-28 16:50:22 -07:00
Rob Norris	6e7e7ea7ef	tunables: remove support for s64 tunables Nothing uses them now. Sponsored-by: https://despairlabs.com/sponsor/ Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Pavel Snajdr <snajpa@snajpa.net> Signed-off-by: Rob Norris <robn@despairlabs.com> Closes #17377	2025-05-28 16:50:22 -07:00
Rob Norris	7b183f1918	tunables: remove FreeBSD compat macros for Linux module params Nothing in any FreeBSD code uses them. Sponsored-by: https://despairlabs.com/sponsor/ Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Pavel Snajdr <snajpa@snajpa.net> Signed-off-by: Rob Norris <robn@despairlabs.com> Closes #17377	2025-05-28 16:50:22 -07:00
Alexander Motin	0aa83dce99	Linux: Stop using NR_FILE_PAGES for ARC scaling I've found that QEMU/KVM guest memory accounted as shared also included into NR_FILE_PAGES. But it is actually a non-evictable anonymous memory. Using it as a base for zfs_arc_pc_percent parameter makes ARC to ignore shrinker requests while page cache does not really have anything to evict, ending up in OOM killer killing the QEMU process. Instead use of NR_ACTIVE_FILE + NR_INACTIVE_FILE should represent the part of a page cache that is actually evictable, which should be safer to use as a reference for ARC scaling. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ameer Hamza <ahamza@ixsystems.com> Reviewed-by: Pavel Snajdr <snajpa@snajpa.net> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Closes #17334	2025-05-14 09:29:02 -04:00
Rob Norris	2ee5b51a57	linux/uio: remove "skip" offset for UIO_ITER For UIO_ITER, we are just wrapping a kernel iterator. It will take care of its own offsets if necessary. We don't need to do anything, and if we do try to do anything with it (like advancing the iterator by the skip in zfs_uio_advance) we're just confusing the kernel iterator, ending up at the wrong position or worse, off the end of the memory region. Sponsored-by: https://despairlabs.com/sponsor/ Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de> Reviewed-by: Brian Atkinson <batkinson@lanl.gov> Signed-off-by: Rob Norris <robn@despairlabs.com> Closes #17298	2025-05-11 12:46:40 -04:00
Rob Norris	4800181b3b	ioctl: remove FICLONE/FICLONERANGE/FIDEDUPERANGE compat These are only required to support these ioctls on Linux <4.5. Since 4.18 is our cutoff, we don't need this code anymore. Also removing related test things that will never match again. Sponsored-by: https://despairlabs.com/sponsor/ Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Rob Norris <robn@despairlabs.com> Closes #17308	2025-05-08 10:32:52 -04:00
Rob Norris	c8fa39b46c	cred: properly pass and test creds on other threads (#17273 ) ### Background Various admin operations will be invoked by some userspace task, but the work will be done on a separate kernel thread at a later time. Snapshots are an example, which are triggered through zfs_ioc_snapshot() -> dsl_dataset_snapshot(), but the actual work is from a task dispatched to dp_sync_taskq. Many such tasks end up in dsl_enforce_ds_ss_limits(), where various limits and permissions are enforced. Among other things, it is necessary to ensure that the invoking task (that is, the user) has permission to do things. We can't simply check if the running task has permission; it is a privileged kernel thread, which can do anything. However, in the general case it's not safe to simply query the task for its permissions at the check time, as the task may not exist any more, or its permissions may have changed since it was first invoked. So instead, we capture the permissions by saving CRED() in the user task, and then using it for the check through the secpolicy_* functions. ### Current implementation The current code calls CRED() to get the credential, which gets a pointer to the cred_t inside the current task and passes it to the worker task. However, it doesn't take a reference to the cred_t, and so expects that it won't change, and that the task continues to exist. In practice that is always the case, because we don't let the calling task return from the kernel until the work is done. For Linux, we also take a reference to the current task, because the Linux credential APIs for the most part do not check an arbitrary credential, but rather, query what a task can do. See secpolicy_zfs_proc(). Again, we don't take a reference on the task, just a pointer to it. ### Changes We change to calling crhold() on the task credential, and crfree() when we're done with it. This ensures it stays alive and unchanged for the duration of the call. On the Linux side, we change the main policy checking function priv_policy_ns() to use override_creds()/revert_creds() if necessary to make the provided credential active in the current task, allowing the standard task-permission APIs to do the needed check. Since the task pointer is no longer required, this lets us entirely remove secpolicy_zfs_proc() and the need to carry a task pointer around as well. Sponsored-by: https://despairlabs.com/sponsor/ Signed-off-by: Rob Norris <robn@despairlabs.com> Reviewed-by: Pavel Snajdr <snajpa@snajpa.net> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Kyle Evans <kevans@FreeBSD.org> Reviewed-by: Tony Hutter <hutter2@llnl.gov>	2025-04-29 16:27:48 -07:00
Alexander Motin	4866c2fabf	Cleanup VERIFY() macros (#17163 ) - Fix VERIFY3B() when given non-boolean values. - Map EQUIV() into VERIFY3B(,==,) as equivalent. - Tune messages for better readability and to closer match source code for easier search. Unify user-space messages with kernel. - Tune printed types and remove %px outside of Linux kernel. Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Reviewed-by: @ImAwsumm Reviewed-by: Rob Norris <robn@despairlabs.com> Reviewed-by: Tony Hutter <hutter2@llnl.gov>	2025-04-16 09:01:32 -07:00
Ameer Hamza	2a8d9d9607	Add default user/group/project quota properties This adds default userquota, groupquota, and projectquota properties to MASTER_NODE_OBJ to make them accessible during zfsvfs_init() (regular DSL properties require dsl_config_lock, which cannot be safely acquired in this context). The zfs_fill_zplprops_impl() logic is updated to read these default properties directly from MASTER_NODE_OBJ. Signed-off-by: Ameer Hamza <ahamza@ixsystems.com> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Tony Hutter <hutter2@llnl.gov>	2025-04-03 10:35:22 -07:00
Piotr Kubaj	11ca12dbd3	simd_powerpc.h: enable FPU on FreeBSD FreeBSD nowadays supports FPU in the kernel on powerpc*, so enable it. Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Piotr Kubaj <pkubaj@FreeBSD.org> Closes #17191	2025-04-01 09:18:38 -04:00
Rob Norris	d28d2e3007	linux/kstat: allow multi-level module names Module names are mapped directly to directory names in procfs, but nothing is done to create the intermediate directories, or remove them. This makes it impossible to sensibly present kstats about sub-objects. This commit loops through '/'-separated names in the full module name, creates a separate module for each, and hooks them up with a parent pointer and child counter, and then unrolls this on the other side when deleting a module. Sponsored-by: Klara, Inc. Sponsored-by: Syneto Signed-off-by: Rob Norris <rob.norris@klarasystems.com> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Tony Hutter <hutter2@llnl.gov>	2025-03-20 16:24:50 -07:00
Rob Norris	f83431b3bd	SPDX: license tags: GPL-2.0-or-later Sponsored-by: https://despairlabs.com/sponsor/ Signed-off-by: Rob Norris <robn@despairlabs.com> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>	2025-03-13 17:57:09 -07:00
Rob Norris	137045be98	SPDX: license tags: BSD-2-Clause Sponsored-by: https://despairlabs.com/sponsor/ Signed-off-by: Rob Norris <robn@despairlabs.com> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>	2025-03-13 17:56:46 -07:00
Rob Norris	eb9098ed47	SPDX: license tags: CDDL-1.0 Sponsored-by: https://despairlabs.com/sponsor/ Signed-off-by: Rob Norris <robn@despairlabs.com> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>	2025-03-13 17:56:27 -07:00
Fabian-Gruenbichler	41823a0ede	linux: zvols: correctly detect flush requests (#17131 ) since 4.10, bio->bi_opf needs to be checked to determine all kinds of flush requests. this was the case prior to the commit referenced below, but the order of ifdefs was not the usual one (newest up top), which might have caused this to slip through. this fixes a regression when using zvols as Qemu block devices, but might have broken other use cases as well. the symptoms are that all sync writes from within a VM configured to use such a virtual block devices are ignored and treated as async writes by the host ZFS layer. this can be verified using fio in sync mode inside the VM, for example with fio \ --filename=/dev/sda --ioengine=libaio --loops=1 --size=10G \ --time_based --runtime=60 --group_reporting --stonewall --name=cc1 \ --description="CC1" --rw=write --bs=4k --direct=1 --iodepth=1 \ --numjobs=1 --sync=1 which shows an IOPS number way above what the physical device underneath supports, with "zpool iostat -r 1" on the hypervisor side showing no sync IO occuring during the benchmark. with the regression fixed, both fio inside the VM and the IO stats on the host show the expected numbers. Fixes: `846b598519` "config: remove HAVE_REQ_OP_* and HAVE_REQ_*" Signed-off-by: Fabian-Gruenbichler <f.gruenbichler@proxmox.com> Co-authored-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Tony Hutter <hutter2@llnl.gov>	2025-03-12 14:39:01 -07:00
Brian Atkinson	1e32c57893	Update pin_user_pages() calls for Direct I/O Originally #16856 updated Linux Direct I/O requests to use the new pin_user_pages API. However, it was an oversight that this PR only handled iov_iter's of type ITER_IOVEC and ITER_UBUF. Other iov_iter types may try and use the pin_user_pages API if it is available. This can lead to panics as the iov_iter is not being iterated over correctly in zfs_uio_pin_user_pages(). Unfortunately, generic iov_iter API's that call pin_user_page_fast() are protected as GPL only. Rather than update zfs_uio_pin_user_pages() to account for all iov_iter types, we can simply just call zfs_uio_get_dio_page_iov_iter() if the iov_iter type is not ITER_IOVEC or ITER_UBUF. zfs_uio_get_dio_page_iov_iter() calls the iov_iter_get_pages() calls that can handle any iov_iter type. In the future it might be worth using the exposed iov_iter iterator functions that are included in the header iov_iter.h since v6.7. These functions allow for any iov_iter type to be iterated over and advanced while applying a step function during iteration. This could possibly be leveraged in zfs_uio_pin_user_pages(). A new ZFS test case was added to test that a ITER_BVEC is handled correctly using this new code path. This test case was provided though issue #16956. Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ameer Hamza <ahamza@ixsystems.com> Signed-off-by: Brian Atkinson <batkinson@lanl.gov> Closes #16956 Closes #17006	2025-01-30 15:53:59 -08:00
Alan Somers	12f0baf348	Make the vfs.zfs.vdev.raidz_impl sysctl cross-platform Reviewed-by: Allan Jude <allan@klarasystems.com> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Alan Somers <asomers@gmail.com> Sponsored by: ConnectWise Closes #16980	2025-01-29 09:18:09 -05:00
Brian Atkinson	c6442bd3b6	Removing old code outside of 4.18 kernsls There were checks still in place to verify we could completely use iov_iter's on the Linux side. All interfaces are available as of kernel 4.18, so there is no reason to check whether we should use that interface at this point. This PR completely removes the UIO_USERSPACE type. It also removes the check for the direct_IO interface checks. Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Brian Atkinson <batkinson@lanl.gov> Closes #16856	2024-12-16 10:23:45 -08:00
Rob Norris	e0039c7057	Remove unnecessary CSTYLED escapes on top-level macro invocations cstyle can handle these cases now, so we don't need to disable it. Sponsored-by: https://despairlabs.com/sponsor/ Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Rob Norris <robn@despairlabs.com> Closes #16840	2024-12-06 08:53:57 -08:00
Alexander Motin	654ade8ca2	FreeBSD: Remove some illumos compat from vnode.h Should make no difference, just some dead code cleanup. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Martin Matuska <mm@FreeBSD.org> Signed-off-by:Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Closes #16808	2024-12-03 09:32:54 -08:00
Alexander Motin	ae00c807dc	FreeBSD: Return ifndef IN_BASE back to fix the build FreeBSD's libprocstat seems to build kernel code in user space, which does not work here due to undefined vnode_t. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Martin Matuska <mm@FreeBSD.org> Signed-off-by:Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Closes #16808	2024-12-03 09:32:07 -08:00
Alexander Motin	3e9ba0f223	Add missing parenthesis in VERIFYF() Without them the order of operations might get unexpected. Reviewed-by: Adam Moss <c@yotes.com> Reviewed-by: Jorgen Lundman <lundman@lundman.net> Reviewed-by: Rob Norris <robn@despairlabs.com> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Closes #16826	2024-12-02 16:55:51 -08:00
Alexander Motin	b3b0ce64d5	FreeBSD: Lock vnode in zfs_ioctl() Previously vnode was not locked there, unlike Linux. It required locking it in vn_flush_cached_data(), which recursed on the lock if called from zfs_clone_range(), having the vnode locked. Reviewed-by: Alan Somers <asomers@gmail.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Closes #16789 Closes #16796	2024-11-23 14:26:52 -08:00
Alexander Motin	0d6306be8c	Fix few __VA_ARGS typos in assertions It should be __VA_ARGS__, not __VA_ARGS. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Rob Norris <robn@despairlabs.com> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Closes #16780	2024-11-19 06:48:06 -08:00
Dimitry Andric	c480e06d88	Fix gcc unused value warning in FreeBSD simd.h The macros `simd_stat_init()` and `simd_stat_fini()` in FreeBSD's `simd.h` are defined as zero, but they are actually only used as statements. Replace the definitions with `do {} while (0)` instead, to avoid gcc `-Wunused-value` warnings. Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Toomas Soome <tsoome@me.com> Signed-off-by: Dimitry Andric <dimitry@andric.com> Closes #16693	2024-10-29 14:49:54 -04:00
Rob Norris	96f382d113	spl/thread: explicitly define thread_func_t as noreturn All of our thread entry functions have this signature: void ()(void) __attribute__((noreturn)) The low-level `__thread_create()` function accepts a `thread_func_t` as the entry point, which is defined more simply as: void ()(void ) And then the `thread_create()` and `thread_create_named()` macros cast the passed-in function point down to `thread_func_t`, that is, casting away the `noreturn` attribute. Clang considers casting between these two types to be invalid because both the caller and the callee may have elided parts of the stack frame save and restore, knowing that they won't be needed. Recent Linux appears to be setting `-Wcast-function-type-strict`, which causes this invalid cast to emit a warning, which with `-Werror` is converted to an error, breaking the build. This commit fixes this in the simplest possible way: adding `noreturn` to the `thread_func_t` attribute. Since all our thread entry functions already have this attribute, it's arguably a just a consistency fix anyway. I considered removing the casts in the macros, which silences the warnings, but it turns out that Clang has a bug that won't emit this error for implicit conversions, only explicit casts. So leaving them there seems like a reasonable belt-and-suspenders approach. Also, frankly, this whole mechanism seems a little undercooked inside LLVM, so I'm content go with my intuition about the smallest, least invaisve change. NOTE: `__thread_create` is exported by `spl.ko` and has a `thread_func_t` arg, so this is an ABI break. Whether that matters in practice, I have no idea. Further reading: - `1aad641c79` - https://github.com/llvm/llvm-project/issues/7325 - https://github.com/llvm/llvm-project/issues/41465 Sponsored-by: https://despairlabs.com/sponsor/ Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Rob Norris <robn@despairlabs.com> Closes #16672 Closes #16673	2024-10-21 11:38:30 -07:00
Tomohiro Kusumi	a9851ea3dd	Fix compile-time warnings caused by duplicate struct typedefs Some compiler/versions warn these typedefs according to #16660. The platform specific header sys/abd_os.h shouldn't define or use abd_t, as it's defined in its non-platform specific consumer sys/abd.h. Do the same as what FreeBSD header does. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@gmail.com> Closes #16660 Closes #16665	2024-10-20 09:43:16 -07:00
Umer Saleem	27e8f56102	Fix inconsistent mount options for ZFS root While mounting ZFS root during boot on Linux distributions from initrd, mount from busybox is effectively used which executes mount system call directly. This skips the ZFS helper mount.zfs, which checks and enables the mount options as specified in dataset properties. As a result, datasets mounted during boot from initrd do not have correct mount options as specified in ZFS dataset properties. There has been an attempt to use mount.zfs in zfs initrd script, responsible for mounting the ZFS root filesystem (PR#13305). This was later reverted (PR#14908) after discovering that using mount.zfs breaks mounting of snapshots on root (/) and other child datasets of root have the same issue (Issue#9461). This happens because switching from busybox mount to mount.zfs correctly parses the mount options but also adds 'mntpoint=/root' to the mount options, which is then prepended to the snapshot mountpoint in '.zfs/snapshot'. '/root' is the directory on Debian with initramfs-tools where root filesystem is mounted before pivot_root. When Linux runtime is reached, trying to access the snapshots on root results in automounting the snapshot on '/root/.zfs/*', which fails. This commit attempts to fix the automounting of snapshots on root, while using mount.zfs in initrd script. Since the mountpoint of dataset is stored in vfs_mntpoint field, we can check if current mountpoint of dataset and vfs_mntpoint are same or not. If they are not same, reset the vfs_mntpoint field with current mountpoint. This fixes the mountpoints of root dataset and children in respective vfs_mntpoint fields when we try to access the snapshots of root dataset or its children. With correct mountpoint for root dataset and children stored in vfs_mntpoint, all snapshots of root dataset are mounted correctly and become accessible. This fix will come into play only if current process, that is trying to access the snapshots is not in chroot context. The Linux kernel API that is used to convert struct path into char format (d_path), returns the complete path for given struct path. It works in chroot environment as well and returns the correct path from original filesystem root. However d_path fails to return the complete path if any directory from original root filesystem is mounted using --bind flag or --rbind flag in chroot environment. In this case, if we try to access the snapshot from outside the chroot environment, d_path returns the path correctly, i.e. it returns the correct path to the directory that is mounted with --bind flag. However inside the chroot environment, it only returns the path inside chroot. For now, there is not a better way in my understanding that gives the complete path in char format and handles the case where directories from root filesystem are mounted with --bind or --rbind on another path which user will later chroot into. So this fix gets enabled if current process trying to access the snapshot is not in chroot context. With the snapshots issue fixed for root filesystem, using mount.zfs in ZFS initrd script, mounts the datasets with correct mount options. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Ameer Hamza <ahamza@ixsystems.com> Signed-off-by: Umer Saleem <usaleem@ixsystems.com> Closes #16646	2024-10-17 09:09:39 -04:00
Warner Losh	38a04f0a7c	freebsd: Use compiler.h from FreeBSD's base's linuxkpi The FreeBSD linux/compiler.h in OpenZFS was copied from a very old version of FreeBSD's linuxkpi's linux/compiler.h. There's no need for this duplication. Use FreeBSD's linuxkpi version instead, and provide zfs_fallthrough to augment it (it's all that's needed). Use #pragma once to avoid naming issues for guard variables. Since this is a complete rewrite, use my copyright here (the original code in FreeBSD still credits everybody). This works back at least to FreeBSD 12.4, which is not out of support, and all newer releases. Remove extra copies of macros that were defined elsewhere, but are now properly defined in LinuxKPI so are redundant. Sponsored-by: Netflix Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Warner Losh <imp@bsdimp.com> Closes #16650	2024-10-16 13:00:40 -04:00
Martin Matuška	efeb60b86a	FreeBSD: ignore some includes when not building kernel The function abd_alloc_from_pages() is used only in kernel. Excluding sys/vm.h, and vm/vm_page.h includes avoids dependency problems. Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Martin Matuska <mm@FreeBSD.org> Closes #16616	2024-10-09 09:27:46 -07:00
Brian Behlendorf	d34d4f97a8	snapdir: add 'disabled' value to make .zfs inaccessible In some environments, just making the .zfs control dir hidden from sight might not be enough. In particular, the following scenarios might warrant not allowing access at all: - old snapshots with wrong permissions/ownership - old snapshots with exploitable setuid/setgid binaries - old snapshots with sensitive contents Introducing a new 'disabled' value that not only hides the control dir, but prevents access to its contents by returning ENOENT solves all of the above. The new property value takes advantage of 'iuv' semantics ("ignore unknown value") to automatically fall back to the old default value when a pool is accessed by an older version of ZFS that doesn't yet know about 'disabled' semantics. I think that technically the zfs_dirlook change is enough to prevent access, but preventing lookups and dir entries in an already opened .zfs handle might also be a good idea to prevent races when modifying the property at runtime. Add zfs_snapshot_no_setuid parameter to control whether automatically mounted snapshots have the setuid mount option set or not. this could be considered a partial fix for one of the scenarios mentioned in desired. Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de> Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com> Co-authored-by: Fabian Grünbichler <f.gruenbichler@proxmox.com> Closes #3963 Closes #16587	2024-10-02 09:12:02 -07:00
Rob Norris	0cf14bf4b5	Linux 6.12: PG_error flag was removed torvalds/linux@09022bc196 removes the flag, and the corresponding SetPageError() and ClearPageError() macros, with no replacement offered. Going back through the upstream history, use of this flag has been gradually removed over the last year as part of the long tail of converting everything to folios. Interesting tidbit comments from torvalds/linux@29e9412b25 and torvalds/linux@420e05d0de suggest that this flag has not been used meaningfully since page writeback failures started being recorded in errseq_t instead (the whole "fsyncgate" thing, ~2017, around torvalds/linux@8ed1e46aaf). Given that, it's possible that since perhaps Linux 4.13 we haven't been getting anything by setting the flag. I don't know if that's true and/or if there's something we should be doing instead, but my gut feel is that its probably fine we only use the page cache as a proxy to allow mmap() to work, rather than backing IO with it. As such, I'm expecting that removing this will do no harm, but I'm leaving it in for older kernels to maintain status quo, and if there is an overall better way, that is left for a future change. Sponsored-by: https://despairlabs.com/sponsor/ Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Rob Norris <robn@despairlabs.com> Closes #16582	2024-10-01 13:54:05 -07:00
Rob Norris	1c7f2f6a50	Linux 6.12: f_version removed from struct file linux/torvalds@11068e0b64 removes it, suggesting this was a always there as a helper to handle concurrent seeks, which all filesystems now handle themselves if necessary. Without looking into the mechanism, I can imagine how it might have been used, but we have always set it to zero and never read from it, presumably because we've always tracked per-caller position through the znode anyway. So I don't see how there can be any functional change for us by removing it. I've stayed conservative though and left it in for older kernels, since its clearly not hurting anything there. Sponsored-by: https://despairlabs.com/sponsor/ Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Rob Norris <robn@despairlabs.com> Closes #16582	2024-10-01 13:54:00 -07:00
Rob Norris	df3b9d881b	Linux 6.12: FMODE_UNSIGNED_OFFSET is now FOP_UNSIGNED_OFFSET torvalds/linux@641bb4394f asserts that this is a static flag, not intended to be variable per-file, so it moves it to file_operations instead. We just change our check to follow. No configure check is necessary because FOP_UNSIGNED_OFFSET didn't exist before this commit, and FMODE_UNSIGNED_OFFSET flag is removed in the same commit, so there's no chance of a conflict. It's not clear to me that we need this check at all, as we never set this flag on our own files, and I can't see any way that our llseek handler could recieve a file from another filesystem. But, the whole zpl_llseek() has a number of opportunities for pleasing cleanup that are nothing to do with this change, so I'll leave that for a future change. Sponsored-by: https://despairlabs.com/sponsor/ Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Rob Norris <robn@despairlabs.com> Closes #16582	2024-10-01 13:53:55 -07:00
Rob Norris	bc96b80550	Linux 6.12: avoid kmem_cache_create redefinition torvalds/linux@b2e7456b5c makes kmem_cache_create() a macro, which gets in the way of our our own redefinition, so we undef the macro first for our own clients. This follows what we did for kmem_cache_alloc(), see `e951dba48`. Sponsored-by: https://despairlabs.com/sponsor/ Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Rob Norris <robn@despairlabs.com> Closes #16582	2024-10-01 13:53:33 -07:00
Sanjeev Bagewadi	20232ecfaa	Support for longnames for files/directories (Linux part) This patch adds the ability for zfs to support file/dir name up to 1023 bytes. This number is chosen so we can support up to 255 4-byte characters. This new feature is represented by the new feature flag feature@longname. A new dataset property "longname" is also introduced to toggle longname support for each dataset individually. This property can be disabled, even if it contains longname files. In such case, new file cannot be created with longname but existing longname files can still be looked up. Note that, to my knowledge native Linux filesystems don't support name longer than 255 bytes. So there might be programs not able to work with longname. Note that NFS server may needs to use exportfs_get_name to reconnect dentries, and the buffer being passed is limit to NAME_MAX+1 (256). So NFS may not work when longname is enabled. Note, FreeBSD vfs layer imposes a limit of 255 name lengh, so even though we add code to support it here, it won't actually work. Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Chunwei Chen <david.chen@nutanix.com> Closes #15921	2024-10-01 13:40:27 -07:00
Rich Ercolani	5d01243964	Add SIMD metadata in /proc on Linux Too many times, people's performance problems have amounted to "somehow your SIMD support isn't working", and determining that at runtime is difficult to describe to people. This adds a /proc/spl/kstat/zfs/simd node, which exposes metadata about which instructions ZFS thinks it can use, on AArch64 and x86_64 Linux, to make investigating things like this much easier. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Rich Ercolani <rincebrain@gmail.com> Closes #16530	2024-09-20 08:16:44 -07:00
Rob Norris	a83762b3f4	linux: remove kernel version checks for unsupported kernels Following `2b069768a` (#16479), anything gated on a kernel version before 4.18 can be always included/excluded. Sponsored-by: https://despairlabs.com/sponsor/ Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de> Signed-off-by: Rob Norris <robn@despairlabs.com> Closes #16545	2024-09-19 15:43:44 -07:00
Rob Norris	5df65ca9c1	config: remove HAVE_GET_USER_PAGES_* get_user_pages_unlocked() had stabilised by 4.9. Sponsored-by: https://despairlabs.com/sponsor/ Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de> Signed-off-by: Rob Norris <robn@despairlabs.com> Closes #16479	2024-09-18 11:23:51 -07:00
Rob Norris	1cb46d9d1a	config: remove HAVE_MODE_LOOKUP_BDEV Sponsored-by: https://despairlabs.com/sponsor/ Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de> Signed-off-by: Rob Norris <robn@despairlabs.com> Closes #16479	2024-09-18 11:23:51 -07:00
Rob Norris	0a61e51736	config: rework ZFS_GENHD_FL_* Sponsored-by: https://despairlabs.com/sponsor/ Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de> Signed-off-by: Rob Norris <robn@despairlabs.com> Closes #16479	2024-09-18 11:23:51 -07:00
Rob Norris	d4e5538014	config: remove HAVE_GENERIC_IO_ACCT_3ARG Sponsored-by: https://despairlabs.com/sponsor/ Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de> Signed-off-by: Rob Norris <robn@despairlabs.com> Closes #16479	2024-09-18 11:23:51 -07:00
Rob Norris	7a02229293	config: remove HAVE_VFSMOUNT_IOPS_GETATTR Sponsored-by: https://despairlabs.com/sponsor/ Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de> Signed-off-by: Rob Norris <robn@despairlabs.com> Closes #16479	2024-09-18 11:23:51 -07:00
Rob Norris	dcb8e5ec7c	config: remove HAVE_BLK_MQ Sponsored-by: https://despairlabs.com/sponsor/ Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de> Signed-off-by: Rob Norris <robn@despairlabs.com> Closes #16479	2024-09-18 11:23:50 -07:00
Rob Norris	1bf93713d8	config: remove HAVE_BLK_QUEUE_FLAG_* Sponsored-by: https://despairlabs.com/sponsor/ Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de> Signed-off-by: Rob Norris <robn@despairlabs.com> Closes #16479	2024-09-18 11:23:50 -07:00

1 2 3 4 5 ...

492 Commits