mirror_zfs

mirror of https://git.proxmox.com/git/mirror_zfs.git synced 2026-04-17 08:54:52 +03:00

Author	SHA1	Message	Date
Paul Dagnelie	246e5883bb	Implement allocation size ranges and use for gang leaves (#17111 ) When forced to resort to ganging, ZFS currently allocates three child blocks, each one third of the size of the original. This is true regardless of whether larger allocations could be made, which would allow us to have fewer gang leaves. This improves performance when fragmentation is high enough to require ganging, but not so high that all the free ranges are only just big enough to hold a third of the recordsize. This is also useful for improving the behavior of a future change to allow larger gang headers. We add the ability for the allocation codepath to allocate a range of sizes instead of a single fixed size. We then use this to pre-allocate the DVAs for the gang children. If those allocations fail, we fall back to the normal write path, which will likely re-gang. Signed-off-by: Paul Dagnelie <paul.dagnelie@klarasystems.com> Co-authored-by: Paul Dagnelie <paul.dagnelie@klarasystems.com> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Tony Hutter <hutter2@llnl.gov>	2025-05-02 15:32:18 -07:00
Artem	27f3d94940	Sort the blocking snapshots list #12751 (#17264 ) When multiple snapshots prevent the destruction/rollback of the respective dataset/snapshot/volume via zfs destroy or zfs rollback, the error message does not list the blocking snapshots sorted according to their order of creation. This causes inconvenience and can lead to confusion, and also creates a contrast with a returned message from zfs list -t snap function. Closes: #12751 Signed-off-by: Artem-OSSRevival <artem.vlasenko@ossrevival.org> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Tony Hutter <hutter2@llnl.gov>	2025-05-01 17:40:23 -07:00
Rob Norris	c8fa39b46c	cred: properly pass and test creds on other threads (#17273 ) ### Background Various admin operations will be invoked by some userspace task, but the work will be done on a separate kernel thread at a later time. Snapshots are an example, which are triggered through zfs_ioc_snapshot() -> dsl_dataset_snapshot(), but the actual work is from a task dispatched to dp_sync_taskq. Many such tasks end up in dsl_enforce_ds_ss_limits(), where various limits and permissions are enforced. Among other things, it is necessary to ensure that the invoking task (that is, the user) has permission to do things. We can't simply check if the running task has permission; it is a privileged kernel thread, which can do anything. However, in the general case it's not safe to simply query the task for its permissions at the check time, as the task may not exist any more, or its permissions may have changed since it was first invoked. So instead, we capture the permissions by saving CRED() in the user task, and then using it for the check through the secpolicy_* functions. ### Current implementation The current code calls CRED() to get the credential, which gets a pointer to the cred_t inside the current task and passes it to the worker task. However, it doesn't take a reference to the cred_t, and so expects that it won't change, and that the task continues to exist. In practice that is always the case, because we don't let the calling task return from the kernel until the work is done. For Linux, we also take a reference to the current task, because the Linux credential APIs for the most part do not check an arbitrary credential, but rather, query what a task can do. See secpolicy_zfs_proc(). Again, we don't take a reference on the task, just a pointer to it. ### Changes We change to calling crhold() on the task credential, and crfree() when we're done with it. This ensures it stays alive and unchanged for the duration of the call. 
On the Linux side, we change the main policy checking function priv_policy_ns() to use override_creds()/revert_creds() if necessary to make the provided credential active in the current task, allowing the standard task-permission APIs to do the needed check. Since the task pointer is no longer required, this lets us entirely remove secpolicy_zfs_proc() and the need to carry a task pointer around as well. Sponsored-by: https://despairlabs.com/sponsor/ Signed-off-by: Rob Norris <robn@despairlabs.com> Reviewed-by: Pavel Snajdr <snajpa@snajpa.net> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Kyle Evans <kevans@FreeBSD.org> Reviewed-by: Tony Hutter <hutter2@llnl.gov>	2025-04-29 16:27:48 -07:00
Sebastian Pauka	1b4826b9a2	Support using llvm-libunwind This commit adds support for using llvm-libunwind for kernels built using llvm and clang. The two differences are that the largest register index is given by _LIBUNWIND_HIGHEST_DWARF_REGISTER, we need to check whether the register is a floating point register and the prototype for unw_regname takes the unwind cursor as the first argument. Reviewed-by: Rob Norris <robn@despairlabs.com> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Sebastian Pauka <me@spauka.se> Closes #17230	2025-04-24 13:58:48 -04:00
Artem-OSSRevival	37a3e26552	Add more descriptive destroy error message Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de> Reviewed by: Attila Fülöp <attila@fueloep.org> Signed-off-by: Artem-OSSRevival <artem.vlasenko@ossrevival.org> Fixes: #14538 Closes: #17234	2025-04-23 21:17:52 -04:00
Tony Hutter	8d1489735b	nvlist: Add nvlist_snprintf() and zfs_dbgmsg_nvlist() Add nvlist_snprintf() to print a nvlist to a buffer. This is basically the snprintf() version of dump_nvlist(). Along with that, add a zfs_dbgmsg_nvlist() to print out an nvlist to dbgmsg. This will aid in debugging. Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Tony Hutter <hutter2@llnl.gov> Closes #17215	2025-04-18 09:22:16 -04:00
Alexander Motin	4866c2fabf	Cleanup VERIFY() macros (#17163 ) - Fix VERIFY3B() when given non-boolean values. - Map EQUIV() into VERIFY3B(,==,) as equivalent. - Tune messages for better readability and to closer match source code for easier search. Unify user-space messages with kernel. - Tune printed types and remove %px outside of Linux kernel. Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Reviewed-by: @ImAwsumm Reviewed-by: Rob Norris <robn@despairlabs.com> Reviewed-by: Tony Hutter <hutter2@llnl.gov>	2025-04-16 09:01:32 -07:00
Rob Norris	131df3bbf2	vdev_to_nvlist_iter: ignore draid parameters when matching names (#17228 ) Various tools will display draid vdev names with parameters embedded in them, but would not accept them as valid vdev names when looking them up, making it difficult to build pipelines involving draid vdevs. This commit makes it so that if a full draid name is offered for match, it gets truncated at the first ':' character. Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Signed-off-by: Rob Norris <rob.norris@klarasystems.com> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Tony Hutter <hutter2@llnl.gov>	2025-04-14 17:10:48 -07:00
Richard Kojedzinszky	09fc7bb47e	Fix memory leaks in pool properties handling Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Richard Kojedzinszky <richard@kojedz.in> Closes #17208	2025-04-05 19:40:55 -04:00
Ameer Hamza	6f6c504700	Show default quotas in zfs userspace tools Update zfs userspace, groupspace, and projectspace to display the default quotas when no per-ID specific quota is configured. This ensures tool outputs align with enforced limits. Signed-off-by: Ameer Hamza <ahamza@ixsystems.com> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Tony Hutter <hutter2@llnl.gov>	2025-04-03 10:36:45 -07:00
Ameer Hamza	2a8d9d9607	Add default user/group/project quota properties This adds default userquota, groupquota, and projectquota properties to MASTER_NODE_OBJ to make them accessible during zfsvfs_init() (regular DSL properties require dsl_config_lock, which cannot be safely acquired in this context). The zfs_fill_zplprops_impl() logic is updated to read these default properties directly from MASTER_NODE_OBJ. Signed-off-by: Ameer Hamza <ahamza@ixsystems.com> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Tony Hutter <hutter2@llnl.gov>	2025-04-03 10:35:22 -07:00
Rob Norris	4eafa9e5e8	SPDX: license tags: BSD-3-Clause Sponsored-by: https://despairlabs.com/sponsor/ Signed-off-by: Rob Norris <robn@despairlabs.com> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>	2025-03-13 17:56:50 -07:00
Rob Norris	137045be98	SPDX: license tags: BSD-2-Clause Sponsored-by: https://despairlabs.com/sponsor/ Signed-off-by: Rob Norris <robn@despairlabs.com> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>	2025-03-13 17:56:46 -07:00
Rob Norris	eb9098ed47	SPDX: license tags: CDDL-1.0 Sponsored-by: https://despairlabs.com/sponsor/ Signed-off-by: Rob Norris <robn@despairlabs.com> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>	2025-03-13 17:56:27 -07:00
Tony Hutter	ece35e0e66	zpool: allow relative vdev paths `zpool create` won't let you use relative paths to disks. This is annoying when you want to do: zpool create tank ./diskfile But have to do.. zpool create tank `pwd`/diskfile This fixes it. Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tony Hutter <hutter2@llnl.gov> Closes #17042	2025-02-25 14:40:20 -05:00
Rob Norris	c43df8bbbf	vdev_file: unify FreeBSD and Linux implementations (#17046) Kernel & userspace specifics are in zfs_file_os.c, so there's no particular reason these have to be separate. The one platform-specific part is in the Linux kernel part, to offload flushes to a taskq if we're already inside a filesystem transaction. This would normally be an unsatisfying wart, but I'm intending to remove this shortly, so I'm content to leave it gated for the moment. Reviewed-by: Allan Jude <allan@klarasystems.com> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Signed-off-by: Rob Norris <rob.norris@klarasystems.com>	2025-02-20 10:42:42 -08:00
Umer Saleem	b901d4a0b6	Update the dataset name in handle after zfs_rename (#17040 ) For zfs_rename, after the dataset name is successfully updated, the dataset handle that was passed to zfs_rename, still contains the old name, due to which, the dataset handle becomes invalid. The following operations performed using this handle result in error since the dataset with old name cannot be found anymore. changelist_rename does update the names in dataset handles, but those are temporary handles that were created during changelist_gather. The original handle that was used to call zfs_rename is not updated. We should update the name in original ZFS handle after the IOCTL for rename returns success for the operation. Signed-off-by: Umer Saleem <usaleem@ixsystems.com> Reviewed-by: Ameer Hamza <ahamza@ixsystems.com> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Tony Hutter <hutter2@llnl.gov>	2025-02-11 09:07:29 -08:00
George Amanakis	c2458ba921	optimize recv_fix_encryption_hierarchy() recv_fix_encryption_hierarchy() in its present state goes through all stream filesystems, and for each one traverses the snapshots in order to find one that exists locally. This happens by calling guid_to_name() for each snapshot, which iterates through all children of the filesystem. This results in CPU utilization of 100% for several minutes (for ~1000 filesystems on a Ryzen 4350G) for 1 thread at the end of a raw receive (-w, regardless whether encrypted or not, dryrun or not). Fix this by following a different logic: using the top_fs name, call gather_nvlist() to gather the nvlists for all local filesystems. For each one filesystem, go through the snapshots to find the corresponding stream's filesystem (since we know the snapshots guid and can search with it in stream_avl for the stream's fs). Then go on to fix the encryption roots and locations as in its present state. Avoiding guid_to_name() iteratively makes recv_fix_encryption_hierarchy() significantly faster (from several minutes to seconds for ~1000 filesystems on a Ryzen 4350G). Another problem is the following: in case we have promoted a clone of the filesystem outside the top filesystem specified in zfs send, zfs receive does not fail but returns an error: recv_incremental_replication() fails to find its origin and errors out with needagain=1. This results in recv_fix_hierarchy() not being called which may render some children of the top fs not mountable since their encryption root was not updated. To circumvent this make recv_incremental_replication() silently ignore this error. Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: George Amanakis <gamanakis@gmail.com> Closes #16929	2025-02-06 15:43:47 -05:00
Rob Norris	779c5a5deb	zpool_get_vdev_prop_value: show missing vdev userprops If a vdev userprop is not found, present it as value '-', default source, so it matches the output from pool userprops. Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Rob Norris <rob.norris@klarasystems.com> Closes #16887	2024-12-29 11:11:40 -08:00
Umer Saleem	219a89cbbf	Skip iterating over snapshots for share properties Setting sharenfs and sharesmb properties on a dataset can become costly if there are a large number of snapshots, since setting the share properties iterates over all snapshots present for a dataset. If it is the root dataset for which we are trying to set the share property, snapshots for all child datasets and their children will also be iterated. There is no need to iterate over snapshots for share properties because we do not allow share properties, or any other property except user properties, to be set on a snapshot itself. This commit skips iterating over snapshots for share properties and instead iterates over all child datasets and their children. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Umer Saleem <usaleem@ixsystems.com> Closes #16877	2024-12-19 15:02:58 -05:00
Brian Atkinson	c6442bd3b6	Removing old code outside of 4.18 kernels There were checks still in place to verify we could completely use iov_iter's on the Linux side. All interfaces are available as of kernel 4.18, so there is no reason to check whether we should use that interface at this point. This PR completely removes the UIO_USERSPACE type. It also removes the direct_IO interface checks. Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Brian Atkinson <batkinson@lanl.gov> Closes #16856	2024-12-16 10:23:45 -08:00
Rob Norris	ecc0970e3e	backtrace: fix off-by-one on string output sizeof("foo") includes the trailing null byte, so all the output had nulls through it. Most terminals quietly ignore it, but it makes some tools misdetect file types and other annoyances. Easy fix: subtract 1. Sponsored-by: https://despairlabs.com/sponsor/ Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Rob Norris <robn@despairlabs.com> Closes #16862	2024-12-13 10:12:14 -08:00
Rob Norris	e0039c7057	Remove unnecessary CSTYLED escapes on top-level macro invocations cstyle can handle these cases now, so we don't need to disable it. Sponsored-by: https://despairlabs.com/sponsor/ Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Rob Norris <robn@despairlabs.com> Closes #16840	2024-12-06 08:53:57 -08:00
Mariusz Zaborski	4b4e346b9f	Add ability to scrub from last scrubbed txg Some users might want to scrub only new data because they would like to know whether newly written data was corrupted. This PR adds the possibility to scrub only newly written data. This introduces a new `last_scrubbed_txg` property, indicating the transaction group (TXG) up to which the most recent scrub operation has checked and repaired the dataset, so users can run scrub only from the last saved point. We use scn_max_txg and scn_min_txg, which are already built into scrub, to accomplish that. Reviewed-by: Allan Jude <allan@klarasystems.com> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Mariusz Zaborski <mariusz.zaborski@klarasystems.com> Sponsored-By: Wasabi Technology, Inc. Sponsored-By: Klara Inc. Closes #16301	2024-12-04 14:21:45 -05:00
shodanshok	1cd2419ece	Fix race in libzfs_run_process_impl When replacing a disk, a child process is forked to run a script called zfs_prepare_disk (which can be useful for disk firmware update or health check). The parent then calls waitpid and checks the child error/status code. However, the _reap_children thread (created from zed_exec_process to manage zedlets) also waits for all children with the same PGID and can steal the signal, causing the replace operation to be aborted. As waitpid returns -1, the parent incorrectly assumes that the child process had an error or was killed. This, in turn, leaves the newly added disk in REMOVED or UNAVAIL status rather than completing the replace process. This patch changes the PGID of the child process executing the prepare script, shielding it from the _reap_children thread. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Gionatan Danti <g.danti@assyoma.it> Closes #16801	2024-12-04 05:36:10 -05:00
Umer Saleem	1c9a4c8cb4	Fix user properties output for zpool list In zpool_get_user_prop, when called from zpool_expand_proplist and collect_pool, we often have zpool_props present in zpool_handle_t equal to NULL. This mostly happens when only one user property is requested using zpool list -o <user_property>. Checking for this case and correctly initializing the zpool_props field in zpool_handle_t fixes this issue. Interestingly, this issue does not occur if we query any other property like name or guid along with a user property with -o flag because while accessing properties like guid, zpool_prop_get_int is called which checks for this case specifically and calls zpool_get_all_props. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Umer Saleem <usaleem@ixsystems.com> Closes #16734	2024-11-11 09:46:45 -08:00
наб	1c7d4b4c94	module: unicode: remove unused uconv.c Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> Closes #16702	2024-11-01 12:12:13 -07:00
Alexander Motin	fba6a90696	zfs_debug: Restore log size limit for userspace For some reason it was dropped when split from kernel, that makes raidz_test to accumulate in RAM up to 100GB of logs we don't need. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Igor Kozhukhov <igor@dilos.org> Reviewed-by: Rob Norris <robn@despairlabs.com> Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Closes #16492 Closes #16566 Closes #16664	2024-10-20 09:39:05 -07:00
Rob Norris	b85c564161	libspl/backtrace: comment and harden libunwind backtracer This is the sort of code that we get right once and never look at again. Anyone reading this code is already likely in the middle of a debugging nightmare, and then they have a wall of manual string construction and an unfamiliar and idiosyncratic library to deal with. So, comment the whole thing to try to make it clear what's going on. In pursuit of the above, I've added return checks to some of the libunwind calls, fixed the frame loop to not skip the "top" frame (however unseful it may be), and fix a couple of calls to spl_bt_u64_to_hex_str() which requested 18 digits instead of 16. Sponsored-by: https://despairlabs.com/sponsor/ Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de> Signed-off-by: Rob Norris <robn@despairlabs.com> Closes #16653	2024-10-20 09:36:02 -07:00
Rob Norris	2596a75306	libspl/backtrace: rename and document hex conversion function Sponsored-by: https://despairlabs.com/sponsor/ Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de> Signed-off-by: Rob Norris <robn@despairlabs.com> Closes #16653	2024-10-20 09:36:00 -07:00
Rob Norris	c7e47b3d9a	libspl/backtrace: helper macros for output My eyes are going blurry looking at all those write calls. This is much nicer. Sponsored-by: https://despairlabs.com/sponsor/ Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de> Signed-off-by: Rob Norris <robn@despairlabs.com> Close #16653	2024-10-20 09:35:55 -07:00
Rob Norris	0a001f3088	libspl/backtrace: dump registers in libunwind backtraces More useful stuff, especially when trying to follow a disassembly. Sponsored-by: https://despairlabs.com/sponsor/ Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de> Signed-off-by: Rob Norris <robn@despairlabs.com> Closes #16653	2024-10-20 09:35:43 -07:00
Brian Behlendorf	4319e71402	ztest: Fix scrub check in ztest_raidz_expand_check() The scrub code may return EBUSY under several possible scenarios causing ztest to incorrectly ASSERT when verifying the result of a raidz expansion. Update the test case to allow EBUSY since it does not indicate pool damage. Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #16627	2024-10-08 20:41:17 -07:00
Martin Matuška	ab777f436c	Return boolean_t in inline functions of lib/libspl/include/sys/uio.h The inline functions zfs_dio_offset_aligned(), zfs_dio_size_aligned() and zfs_dio_aligned() are declared as boolean_t but return the bool type. This fixes the build of FreeBSD. Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Martin Matuska <mm@FreeBSD.org> Closes #16613	2024-10-07 10:31:46 -07:00
Shengqi Chen	e8f0aa143e	Bump SONAME of libzfs and libzpool The ABI of libzfs and libzpool have breaking changes since last SONAME bump in commit `fe6babc`: * libzfs: `zpool_print_unsup_feat` removed (used by zpool cmd). * libzpool: multiple `ddt_*` symbols removed (used by zdb cmd). Bump them to avoid ABI breakage. See: https://github.com/openzfs/zfs/pull/11817 Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Shengqi Chen <harry-chen@outlook.com> Closes #16609	2024-10-06 14:49:33 -07:00
Rob Norris	224393a321	feature: large_microzap In `a4b21eadec` we added the zap_micro_max_size tuneable to raise the size at which "micro" (single-block) ZAPs are upgraded to "fat" (multi-block) ZAPs. Before this, a microZAP was limited to 128KiB, which was the old largest block size. The side effect of raising the max size past 128KiB is that it be stored in a large block, requiring the large_blocks feature. Unfortunately, this means that a backup stream created without the --large-block (-L) flag to zfs send would split the microZAP block into smaller blocks and send those, as is normal behaviour for large blocks. This would be received correctly, but since microZAPs are limited to the first block in the object by definition, the entries in the later blocks would be inaccessible. For directory ZAPs, this gives the appearance of files being lost. This commit adds a feature flag, large_microzap, that must be enabled for microZAPs to grow beyond 128KiB, and which will be activated the first time that occurs. This feature is later checked when generating the stream and if active, the send operation will abort unless --large-block has also been requested. Changing the limit still requires zap_micro_max_size to be changed. The state of this flag effectively sets the upper value for this tuneable, that is, if the feature is disabled, the tuneable will be clamped to 128KiB. A stream flag is also added to ensure that the receiver also activates its own feature flag upon receiving the stream. This is not strictly necessary to _use_ the received microZAP, since it doesn't care how large its block is, but it is required to send the microZAP object on, otherwise the original problem occurs again. 
Because it's difficult to reliably distinguish a microZAP from a fatZAP from outside the ZAP code, and because it seems unlikely that most users are affected (a fairly niche tuneable combined with what should be an uncommon use of send), and for the sake of expediency, this change activates the feature the first time a microZAP grows to use a large block, and is never deactivated after that. This can be improved in the future. This commit changes nothing for existing pools that already have large microZAPs. The feature will not be retroactively applied, but will be activated the next time a microZAP grows past the limit. Don't use large_blocks feature for enable/disable tests. The large_microzap depends on large_blocks, so it gets enabled as a dependency, breaking the test. Instead use feature "longname", which has the exact same feature characteristics. Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Reviewed-by: Allan Jude <allan@klarasystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Rob Norris <rob.norris@klarasystems.com> Closes #16593	2024-10-02 20:47:11 -07:00
rilysh	86737c5927	Avoid computing strlen() inside loops Compiling with -O0 (no proper optimizations), strlen() call in loops for comparing the size, isn't being called/initialized before the actual loop gets started, which causes n-numbers of strlen() calls (as long as the string is). Keeping the length before entering in the loop is a good idea. On some places, even with -O2, both GCC and Clang can't recognize this pattern, which seem to happen in an array of char pointer. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: rilysh <nightquick@proton.me> Closes #16584	2024-10-02 09:10:06 -07:00
Brian Behlendorf	e8cbb5952d	Update all ABI files Refresh all ABI files using the CI generated files as of commit `0cf14bf4b5`. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #16592	2024-10-01 17:10:23 -07:00
Sanjeev Bagewadi	20232ecfaa	Support for longnames for files/directories (Linux part) This patch adds the ability for zfs to support file/dir names up to 1023 bytes. This number is chosen so we can support up to 255 4-byte characters. This new feature is represented by the new feature flag feature@longname. A new dataset property "longname" is also introduced to toggle longname support for each dataset individually. This property can be disabled, even if the dataset contains longname files. In that case, new files cannot be created with a longname, but existing longname files can still be looked up. Note that, to my knowledge, native Linux filesystems don't support names longer than 255 bytes, so there might be programs that are not able to work with longname. Note that the NFS server may need to use exportfs_get_name to reconnect dentries, and the buffer being passed is limited to NAME_MAX+1 (256), so NFS may not work when longname is enabled. Note that the FreeBSD vfs layer imposes a limit of 255 on name length, so even though we add code to support it here, it won't actually work. Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Chunwei Chen <david.chen@nutanix.com> Closes #15921	2024-10-01 13:40:27 -07:00
Alexander Motin	3014dcb762	Reduce and handle EAGAIN errors on AIO label reads At least FreeBSD has a limit of 256 simultaneous AIO requests per process. Attempt to issue more results in EAGAIN errors. Since we issue 4 requests per disk/partition from 2xCPUs threads, it is quite easy to reach that limit on large systems, that results in random pool import failures. It annoyed me for quite a while on a system with 64 CPUs and 70+ partitioned disks. This patch from one side limits the number of threads to avoid the error, while from another should softly fall back to sync reads in case of error. It takes into account _SC_AIO_MAX as a system-wide AIO limit and _SC_AIO_LISTIO_MAX as a closest value to per-process limit. The last not exactly right, but it is the best I found. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Closes #16551	2024-09-21 10:36:25 -07:00
Rich Ercolani	5d01243964	Add SIMD metadata in /proc on Linux Too many times, people's performance problems have amounted to "somehow your SIMD support isn't working", and determining that at runtime is difficult to describe to people. This adds a /proc/spl/kstat/zfs/simd node, which exposes metadata about which instructions ZFS thinks it can use, on AArch64 and x86_64 Linux, to make investigating things like this much easier. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Rich Ercolani <rincebrain@gmail.com> Closes #16530	2024-09-20 08:16:44 -07:00
Rob Norris	e8ede2ba78	zfs_debug: specific variant for userspace Just nice and simple, with room to grow. Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Rob Norris <robn@despairlabs.com> Sponsored-by: https://despairlabs.com/sponsor/ Closes #16492	2024-09-19 15:49:50 -07:00
Rob Norris	c22d56e3ed	zfs_znode: lift common code to a single shared file For now, userspace has no znode implementation. Some of the property and path handling code is used there though and is the same on all platforms, so we only need a single copy of it. Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Rob Norris <robn@despairlabs.com> Sponsored-by: https://despairlabs.com/sponsor/ Closes #16492	2024-09-19 15:49:45 -07:00
Rob Norris	4c9b59e541	zfs_racct: copy Linux implementation for userspace The no-op is fine for both. Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Rob Norris <robn@despairlabs.com> Sponsored-by: https://despairlabs.com/sponsor/ Closes #16492	2024-09-19 15:49:39 -07:00
Rob Norris	305d0a5fba	libzpool: don't include trace.c It does nothing in userspace anyway. Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Rob Norris <robn@despairlabs.com> Sponsored-by: https://despairlabs.com/sponsor/ Closes #16492	2024-09-19 15:49:34 -07:00
Rob Norris	d70b2c0687	vdev_label_os: copy Linux implementation for userspace The no-op is fine for both. Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Rob Norris <robn@despairlabs.com> Sponsored-by: https://despairlabs.com/sponsor/ Closes #16492	2024-09-19 15:49:29 -07:00
Rob Norris	8fc0beb66b	arc_os: split userspace and Linux kernel code The Linux arc_os.c carries userspace and kernel code, with very little overlap between the two. This lifts the userspace parts out into a separate arc_os.c for libzpool and removes it from the Linux side. Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Rob Norris <robn@despairlabs.com> Sponsored-by: https://despairlabs.com/sponsor/ Closes #16492	2024-09-19 15:48:54 -07:00
Shengqi Chen	0ae4460c61	zcommon: add specialized versions of cityhash4 Specializing cityhash4 on 32-bit architectures can reduce the size of stack frames as well as instruction count. This is a tiny but useful optimization, since some callers invoke it frequently. When specializing into 1/2/3/4-arg versions, the stack usage (in bytes) on some 32-bit arches are listed as follows: - x86: 32, 32, 32, 40 - arm-v7a: 20, 20, 28, 36 - riscv: 0, 0, 0, 16 - power: 16, 16, 16, 32 - mipsel: 8, 8, 8, 24 And each actual argument (even if passing 0) contributes evenly to the number of multiplication instructions generated: - x86: 9, 12, 15 ,18 - arm-v7a: 6, 8, 10, 12 - riscv / power: 12, 18, 20, 24 - mipsel: 9, 12, 15, 19 On 64-bit architectures, the tendencies are similar. But both stack sizes and instruction counts are significantly smaller thus negligible. Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de> Signed-off-by: Shengqi Chen <harry-chen@outlook.com> Closes #16131 Closes #16483	2024-09-19 15:18:59 -07:00
Rob Norris	f245541e24	zfs_file: implement zfs_file_deallocate for FreeBSD 14 FreeBSD 14 gained a `VOP_DEALLOCATE` VFS operation and a `fspacectl` syscall to use it. At minimum, these zero the given region, and if the underlying filesystem supports it, can make the region sparse. We can use this to get TRIM-like behaviour for file vdevs. Sponsored-by: https://despairlabs.com/sponsor/ Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de> Signed-off-by: Rob Norris <robn@despairlabs.com> Closes #16496	2024-09-18 11:35:48 -07:00
Rob Norris	fa330646b9	zfs_file: rename zfs_file_fallocate to zfs_file_deallocate We only use it on a specific way: to punch a hole in (make sparse) a region of a file, in order to implement TRIM-like behaviour. So, call the op "deallocate", and move the Linux-style mode flags down into the Linux implementation, since they're an implementation detail. FreeBSD gets a no-op stub (for the moment). Sponsored-by: https://despairlabs.com/sponsor/ Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de> Signed-off-by: Rob Norris <robn@despairlabs.com> Closes #16496	2024-09-18 11:35:04 -07:00
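
The off-by-one described in ecc0970e3e ("backtrace: fix off-by-one on string output") can be illustrated with a minimal sketch. This is not the code from that commit: `LITERAL_LEN`, `WRITE_LITERAL`, `buggy_len` and `fixed_len` are hypothetical names introduced here for illustration only. It shows that `sizeof` on a string literal counts the terminating NUL byte, so subtracting 1 gives the number of bytes that should be written.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <unistd.h>

/* sizeof on a string literal counts the terminating NUL byte. */
static_assert(sizeof("foo") == 4, "three characters plus the NUL");

/* Hypothetical macro: length of a string literal without the NUL. */
#define LITERAL_LEN(s) (sizeof(s) - 1)

/*
 * Hypothetical helper: write a string literal to a file descriptor,
 * omitting the trailing NUL so no stray zero bytes reach the output.
 */
#define WRITE_LITERAL(fd, s) write((fd), (s), LITERAL_LEN(s))

/* The buggy length: includes the NUL, so a NUL would be written too. */
size_t
buggy_len(void)
{
	return (sizeof("foo"));
}

/* The fixed length: matches what strlen() would report. */
size_t
fixed_len(void)
{
	return (LITERAL_LEN("foo"));
}
```

With these definitions, `WRITE_LITERAL(STDOUT_FILENO, "foo\n")` would emit exactly four bytes, whereas passing `sizeof("foo\n")` as the length would emit a fifth, NUL, byte.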


1439 Commits