mirror_zfs

mirror of https://git.proxmox.com/git/mirror_zfs.git synced 2026-05-23 10:54:35 +03:00

Author	SHA1	Message	Date
Brian Behlendorf	7a919fb70c	Update all ABI files Refresh all ABI files using the CI generated files to reflect the library interfaces to be published for the 2.4 release. Reviewed-by: Rob Norris <robn@despairlabs.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #17911	2025-11-12 13:07:36 -08:00
Brian Behlendorf	5b2489caf2	Bump SONAME of libzfs and libzpool The ABI of libzfs and libzpool have breaking changes since the last major release. Bump the SONAME for the upcoming 2.4 release branch to libzfs7 and libzpool7. Reviewed-by: Rob Norris <robn@despairlabs.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #17911	2025-11-12 13:07:28 -08:00
Adi-Goll	7ebb5e9b3f	Reduce timeout to zero when running inside a container Detect container environments and set timeout to zero unless ZFS_MODULE_TIMEOUT is already set. This avoids an unnecessary ten second delay after running zfs/zpool commands in a container where /dev/zfs is unavailable. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Adi Gollamudi <adigollamudi@gmail.com> Closes #15165 Closes #17922	2025-11-12 13:07:20 -08:00
Mariusz Zaborski	1e8c96d7d5	Add knob to disable slow io notifications Introduce a new vdev property `VDEV_PROP_SLOW_IO_REPORTING` that allows users to disable notifications for slow devices. This prevents ZED and/or ZFSD from degrading the pool due to slow I/O. Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Mariusz Zaborski <oshogbo@FreeBSD.org> Closes 17477	2025-11-12 13:07:14 -08:00
Alexander Motin	41878d57ea	Add BRT support to zpool prefetch command Implement BRT (Block Reference Table) prefetch functionality similar to existing DDT prefetch. This allows preloading BRT metadata into ARC to improve performance for block cloning operations and frees of earlier cloned blocks. Make -t parameter optional. When omitted, prefetch all supported metadata types (both DDT and BRT now). Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com> Closes #17890	2025-11-12 13:07:09 -08:00
Toomas Soome	4fd926ab40	libzfs: ignoring unreachable code We have infinite loop and on certain condition, we exit this loop and thread with pthread_exit(). But also after this loop, we have a code to perform pthread_cleanup_pop() and return from the thread. The problem is that modern compilers are able to recognize that we actually never get to the statements after loop and therefore it is dead code there. I think, instead of pthread_exit(), it is better to break out of loop and let the last statements to work as intended. This is because we do need to keep pthread_cleanup_pop() anyhow. Of course, it is matter of taste if we want to use return or pthread_exit as very last statement in this function. Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Igor Kozhukhov <igor@dilos.org> Signed-off-by: Toomas Soome <tsoome@me.com> Closes #17900	2025-11-12 13:06:15 -08:00
Toomas Soome	612e8f1e57	get_key_material_https: label 'kfdok' defined but not used The label 'kfdok' is only used with O_TMPFILE, we need to use the same #ifdef around this label. Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Toomas Soome <tsoome@me.com> Closes #17894	2025-11-12 13:05:45 -08:00
Alexander Motin	f0bff230f9	Suppress some ashift warnings Do not warn about vdev ashifts being smaller then physical ashifts in a pool status if the pool ashift property set and vdev ashift satisfies it (bigger or equal), since user explicitly requested this. The ashift of individual vdevs are still reported. Do not warn about vdev ashifts in zpool import, since it doesn't matter much, and we don't even report individual vdevs ashifts there. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Allan Jude <allan@klarasystems.com> Reviewed-by: Rob Norris <rob.norris@klarasystems.com> Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com> Closes #17830	2025-10-21 09:50:43 -07:00
Rob Norris	3e7e19e028	pool_iter_refresh: don't refresh pools twice In "all pools" mode, pool_iter_refresh() will call zpool_iter(), which will call zpool_refresh_stats() before calling add_pool(). If we already have the pool, this is a different handle, so we just release it and return. Back in pool_iter_refresh(), we then call zpool_stats_refresh() again for our handle on the same pool. All together, this means we're doing two ZFS_IOC_POOL_STATS calls into the kernel for every pool in the system. This isn't wrong, but it does double the pressure on global locks. Instead, we add a new function zpool_refresh_stats_from_handle() that simply copies the pool config and state from one handle to another, and use it to update our handle before we release it in add_pool(), so we only have one call per pool per interval. Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Rob Norris <rob.norris@klarasystems.com> Closes #17807	2025-10-21 09:50:43 -07:00
Paul Dagnelie	df55ba7c49	Detect a slow raidz child during reads A single slow responding disk can affect the overall read performance of a raidz group. When a raidz child disk is determined to be a persistent slow outlier, then have it sit out during reads for a period of time. The raidz group can use parity to reconstruct the data that was skipped. Each time a slow disk is placed into a sit out period, its `vdev_stat.vs_slow_ios count` is incremented and a zevent class `ereport.fs.zfs.delay` is posted. The length of the sit out period can be changed using the `raid_read_sit_out_secs` module parameter. Setting it to zero disables slow outlier detection. Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Paul Dagnelie <paul.dagnelie@klarasystems.com> Contributions-by: Don Brady <don.brady@klarasystems.com> Contributions-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #17227	2025-09-10 15:31:30 -07:00
Rob Norris	f7bdd84328	Prefer VERIFY0P(n) over VERIFY(n == NULL) Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com> Signed-off-by: Rob Norris <robn@despairlabs.com> Sponsored-by: https://despairlabs.com/sponsor/ Closes #17591	2025-08-07 11:41:37 -07:00
Rob Norris	c39e076f23	Prefer VERIFY0(n) over VERIFY(n == 0) Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com> Signed-off-by: Rob Norris <robn@despairlabs.com> Sponsored-by: https://despairlabs.com/sponsor/ Closes #17591	2025-08-07 11:40:59 -07:00
Alexander Motin	60f714e6e2	Implement physical rewrites Based on previous commit this implements `zfs rewrite -P` flag, making ZFS to keep blocks logical birth times while rewriting files. It should exclude the rewritten blocks from incremental sends, snapshot diffs, etc. Snapshots space usage same time will reflect the additional space usage from newly allocated blocks. Since this begins to use new "rewrite" flag in the block pointers, this commit introduces a new read-compatible per-dataset feature physical_rewrite. It must be enabled for the command to not fail, it is activated on first use and deactivated on deletion of the last affected dataset. Reviewed-by: Rob Norris <robn@despairlabs.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com> Closes #17565	2025-08-06 10:36:56 -07:00
Mariusz Zaborski	894edd084e	Add TXG timestamp database This feature enables tracking of when TXGs are committed to disk, providing an estimated timestamp for each TXG. With this information, it becomes possible to perform scrubs based on specific date ranges, improving the granularity of data management and recovery operations. Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com> Reviewed-by: Paul Dagnelie <paul.dagnelie@klarasystems.com> Signed-off-by: Mariusz Zaborski <mariusz.zaborski@klarasystems.com> Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Closes #16853	2025-08-06 10:31:21 -07:00
Alexander Motin	f70c85086b	BRT: Fix ZAP entry endianness During original block cloning implementation a mistake was made, making BRT ZAP entries an array of 8 1-byte entries instead of 1 entry of 8 bytes. This makes the pools non-endian-safe. This commit introduces a new read-compatible pool feature "com.truenas:block_cloning_endian", fixing the endianness issue for new pools while maintaining compatibility with existing ones. The feature is automatically activated when creating the first BRT ZAP (ensuring we don't activate it on pools that already have BRT entries in the old format). When active, BRT entries are stored as single 8-byte values. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com> Closes #17572	2025-07-30 09:42:47 -07:00
Akash B	b6e8db509d	zpool/zfs: Add '-a\|--all' option to scrub, trim, initialize Add support for the '-a \| --all' option to perform trim, scrub, and initialize operations on all pools. Previously, specifying a pool name was mandatory for these operations. With this enhancement, users can now execute these operations across all pools at once, without needing to manually iterate over each pool from the command line. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de> Signed-off-by: Akash B <akash-b@hpe.com> Closes #17524	2025-07-29 14:50:44 -07:00
Rob Norris	bf38c15071	everywhere: misc unnecessary var init/update These are all cases where we initialise or update a variable, and then never use it. None of them particularly matter, as the compiler should optimise them all away during dead store elimination, but some static analysers complain about them and they are extra work for casual readers to follow, so worth removing. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Rob Norris <robn@despairlabs.com> Sponsored-by: https://despairlabs.com/sponsor/ Closes #17551	2025-07-22 15:23:58 -07:00
Rob Norris	cb9742e532	libspl: add API for manipulating tunables Sponsored-by: https://despairlabs.com/sponsor/ Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Rob Norris <robn@despairlabs.com> Closes #17537	2025-07-15 15:46:58 -07:00
Paul Dagnelie	a981cb69e4	Implement dynamic gang header sizes ZFS gang block headers are currently fixed at 512 bytes. This is increasingly wasteful in the era of larger disk sector sizes. This PR allows any size allocation to work as a gang header. It also contains supporting changes to ZDB to make gang headers easier to work with. Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Rob Norris <rob.norris@klarasystems.com> Reviewed-by: Allan Jude <allan@klarasystems.com> Signed-off-by: Paul Dagnelie <paul.dagnelie@klarasystems.com> Closes #17004	2025-07-09 14:02:53 -07:00
Paul Dagnelie	e845be28e7	Add no-upgrade featureflag Adds a featureflag that is not enabled during upgrades unless listed explicitly. This is useful for features that could cause issues unless applied carefully; for example, a feature that could make a root pool unbootable if bootloaders don't yet have support for it. Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Rob Norris <rob.norris@klarasystems.com> Reviewed-by: Allan Jude <allan@klarasystems.com> Signed-off-by: Paul Dagnelie <paul.dagnelie@klarasystems.com> Closes #17004	2025-07-09 14:01:59 -07:00
Alexander Motin	4e92aee233	Relax special_small_blocks restrictions special_small_blocks is applied to blocks after compression, so it makes no sense to demand its values to be power of 2. At most they could be multiple of 512, but that would still buy us nothing, so lets allow them be any within SPA_MAXBLOCKSIZE. Also special_small_blocks does not really need to depend on the set recordsize, enabled pool features or presence of special vdev. At worst in any of those cases it will just do nothing, so we should not complicate users lives by artificial limitations. While there, polish comments for recordsize and volblocksize. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Closes #17497	2025-07-02 11:11:37 -07:00
Rob Norris	44e3266894	events: include zio type in IO error reports Usually the IO type can be inferred from the other fields (in particular, priority and flags) sometimes it's not easy to see. This is just another little debug helper. May 27 2025 00:54:54.024110493 ereport.fs.zfs.data class = "ereport.fs.zfs.data" ena = 0x1f5ecfae600801 ... zio_delta = 0x0 zio_type = 0x2 [WRITE] zio_priority = 0x3 [ASYNC_WRITE] zio_objset = 0x0 Document zio_type and zio_priority. Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Rob Norris <rob.norris@klarasystems.com> Closes #17381	2025-05-30 10:29:29 -04:00
Rob Norris	06fa8f3f69	zfs_cmd: reorganise zfs_cmd_t to match original size `2aa3fbe761` extended zinject_record_t, and in doing so inadvertently extended zfs_cmd_t, which broke compatibility with userspace tools without the change. This fixes that by using some of the unused space in zfs_cmd_t for the extra fields. We also add an assert to trigger a compile error if the size ever changes. Sponsored-by: https://despairlabs.com/sponsor/ Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Rob Norris <robn@despairlabs.com> Closes #17367	2025-05-27 20:01:06 -04:00
Ameer Hamza	2a91d577b1	Expose dataset encryption status via fast stat path In truenas_pylibzfs, we query list of encrypted datasets several times, which is expensive. This commit exposes a public API zfs_is_encrypted() to get encryption status from fast stat path without having to refresh the properties. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Rob Norris <robn@despairlabs.com> Signed-off-by: Ameer Hamza <ahamza@ixsystems.com> Closes #17368	2025-05-26 22:11:03 -04:00
Rob Norris	f454cc1723	libzfs: ensure all ioctl calls go through lzc_ioctl_fd() Sponsored-by: https://despairlabs.com/sponsor/ Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Rob Norris <robn@despairlabs.com> Closes #17344	2025-05-21 09:23:23 -07:00
Paul Dagnelie	086105f4c4	Cause zpool scan resume commands to get logged in history Currently, commands that resume a scrub/errorscrub from a paused state don't get logged in the pool history. This is because resumes actually return ECANCELED, instead of 0. This causes the tsd code in the common ioctl logic to not think the ioctl succeeded, which causes the log_history ioctl to fail with EPERM. However, for resuming a scrub from a paused state, ECANCELED is success. There are two options for how to deal with this. The first is the one that I implemented here; I can't find a good reason for dmu_scan to return ECANCELED on resume instead of 0, so let's just not. The only place we check for the ECANCELED value is in zpool_scan, where we just convert it back to zero. However, I am aware that this is changing an ioctl interface, which I believe is a breaking change. I don't think it's an important change, but maybe there is someone who relies on it. The other option that could be implemented is to either allow ECANCELED specifically from dsl_scan in the common ioctl code, or add a generic facility to the common ioctl code that allows each command to specify whether or not success happened, regardless of the return values. I am open to feedback on which option people think would be better. Reviewed-by: Rob Norris <robn@despairlabs.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Paul Dagnelie <paul.dagnelie@klarasystems.com> Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Closes #17301	2025-05-16 13:19:04 -04:00
Paul Dagnelie	246e5883bb	Implement allocation size ranges and use for gang leaves (#17111 ) When forced to resort to ganging, ZFS currently allocates three child blocks, each one third of the size of the original. This is true regardless of whether larger allocations could be made, which would allow us to have fewer gang leaves. This improves performance when fragmentation is high enough to require ganging, but not so high that all the free ranges are only just big enough to hold a third of the recordsize. This is also useful for improving the behavior of a future change to allow larger gang headers. We add the ability for the allocation codepath to allocate a range of sizes instead of a single fixed size. We then use this to pre-allocate the DVAs for the gang children. If those allocations fail, we fall back to the normal write path, which will likely re-gang. Signed-off-by: Paul Dagnelie <paul.dagnelie@klarasystems.com> Co-authored-by: Paul Dagnelie <paul.dagnelie@klarasystems.com> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Tony Hutter <hutter2@llnl.gov>	2025-05-02 15:32:18 -07:00
Artem	27f3d94940	Sort the blocking snapshots list #12751 (#17264 ) When multiple snapshots prevent the destruction/rollback of the respective dataset/snapshot/volume via zfs destroy or zfs rollback, the error message does not list the blocking snapshots sorted according to their order of creation. This causes inconvenience and can lead to confusion, and also creates a contrast with a returned message from zfs list -t snap function. Closes: #12751 Signed-off-by: Artem-OSSRevival <artem.vlasenko@ossrevival.org> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Tony Hutter <hutter2@llnl.gov>	2025-05-01 17:40:23 -07:00
Artem-OSSRevival	37a3e26552	Add more descriptive destroy error message Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de> Reviewed by: Attila Fülöp <attila@fueloep.org> Signed-off-by: Artem-OSSRevival <artem.vlasenko@ossrevival.org> Fixes: #14538 Closes: #17234	2025-04-23 21:17:52 -04:00
Tony Hutter	8d1489735b	nvlist: Add nvlist_snprintf() and zfs_dbgmsg_nvlist() Add nvlist_snprintf() to print a nvlist to a buffer. This is basically the snprintf() version of dump_nvlist(). Along with that, add a zfs_dbgmsg_nvlist() to print out an nvlist to dbgmsg. This will aid in debugging. Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Tony Hutter <hutter2@llnl.gov> Closes #17215	2025-04-18 09:22:16 -04:00
Rob Norris	131df3bbf2	vdev_to_nvlist_iter: ignore draid parameters when matching names (#17228 ) Various tools will display draid vdev names with parameters embedded in them, but would not accept them as valid vdev names when looking them up, making it difficult to build pipelines involving draid vdevs. This commit makes it so that if a full draid name is offered for match, it gets truncated at the first ':' character. Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Signed-off-by: Rob Norris <rob.norris@klarasystems.com> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Tony Hutter <hutter2@llnl.gov>	2025-04-14 17:10:48 -07:00
Richard Kojedzinszky	09fc7bb47e	Fix memory leaks in pool properties handling Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Richard Kojedzinszky <richard@kojedz.in> Closes #17208	2025-04-05 19:40:55 -04:00
Ameer Hamza	6f6c504700	Show default quotas in zfs userspace tools Update zfs userspace, groupspace, and projectspace to display the default quotas when no per-ID specific quota is configured. This ensures tool outputs align with enforced limits. Signed-off-by: Ameer Hamza <ahamza@ixsystems.com> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Tony Hutter <hutter2@llnl.gov>	2025-04-03 10:36:45 -07:00
Ameer Hamza	2a8d9d9607	Add default user/group/project quota properties This adds default userquota, groupquota, and projectquota properties to MASTER_NODE_OBJ to make them accessible during zfsvfs_init() (regular DSL properties require dsl_config_lock, which cannot be safely acquired in this context). The zfs_fill_zplprops_impl() logic is updated to read these default properties directly from MASTER_NODE_OBJ. Signed-off-by: Ameer Hamza <ahamza@ixsystems.com> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Tony Hutter <hutter2@llnl.gov>	2025-04-03 10:35:22 -07:00
Rob Norris	137045be98	SPDX: license tags: BSD-2-Clause Sponsored-by: https://despairlabs.com/sponsor/ Signed-off-by: Rob Norris <robn@despairlabs.com> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>	2025-03-13 17:56:46 -07:00
Rob Norris	eb9098ed47	SPDX: license tags: CDDL-1.0 Sponsored-by: https://despairlabs.com/sponsor/ Signed-off-by: Rob Norris <robn@despairlabs.com> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>	2025-03-13 17:56:27 -07:00
Umer Saleem	b901d4a0b6	Update the dataset name in handle after zfs_rename (#17040 ) For zfs_rename, after the dataset name is successfully updated, the dataset handle that was passed to zfs_rename, still contains the old name, due to which, the dataset handle becomes invalid. The following operations performed using this handle result in error since the dataset with old name cannot be found anymore. changelist_rename does update the names in dataset handles, but those are temporary handles that were created during changelist_gather. The original handle that was used to call zfs_rename is not updated. We should update the name in original ZFS handle after the IOCTL for rename returns success for the operation. Signed-off-by: Umer Saleem <usaleem@ixsystems.com> Reviewed-by: Ameer Hamza <ahamza@ixsystems.com> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Tony Hutter <hutter2@llnl.gov>	2025-02-11 09:07:29 -08:00
George Amanakis	c2458ba921	optimize recv_fix_encryption_hierarchy() recv_fix_encryption_hierarchy() in its present state goes through all stream filesystems, and for each one traverses the snapshots in order to find one that exists locally. This happens by calling guid_to_name() for each snapshot, which iterates through all children of the filesystem. This results in CPU utilization of 100% for several minutes (for ~1000 filesystems on a Ryzen 4350G) for 1 thread at the end of a raw receive (-w, regardless whether encrypted or not, dryrun or not). Fix this by following a different logic: using the top_fs name, call gather_nvlist() to gather the nvlists for all local filesystems. For each one filesystem, go through the snapshots to find the corresponding stream's filesystem (since we know the snapshots guid and can search with it in stream_avl for the stream's fs). Then go on to fix the encryption roots and locations as in its present state. Avoiding guid_to_name() iteratively makes recv_fix_encryption_hierarchy() significantly faster (from several minutes to seconds for ~1000 filesystems on a Ryzen 4350G). Another problem is the following: in case we have promoted a clone of the filesystem outside the top filesystem specified in zfs send, zfs receive does not fail but returns an error: recv_incremental_replication() fails to find its origin and errors out with needagain=1. This results in recv_fix_hierarchy() not being called which may render some children of the top fs not mountable since their encryption root was not updated. To circumvent this make recv_incremental_replication() silently ignore this error. Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: George Amanakis <gamanakis@gmail.com> Closes #16929	2025-02-06 15:43:47 -05:00
Rob Norris	779c5a5deb	zpool_get_vdev_prop_value: show missing vdev userprops If a vdev userprop is not found, present it as value '-', default source, so it matches the output from pool userprops. Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Rob Norris <rob.norris@klarasystems.com> Closes #16887	2024-12-29 11:11:40 -08:00
Umer Saleem	219a89cbbf	Skip iterating over snapshots for share properties Setting sharenfs and sharesmb properties on a dataset can become costly if there are large number of snapshots, since setting the share properties iterates over all snapshots present for a dataset. If it is the root dataset for which we are trying to set the share property, snapshots for all child datasets and their children will also be iterated. There is no need to iterate over snapshots for share properties because we do not allow share properties or any other property, to be set on a snapshot itself execpt for user properties. This commit skips iterating over snapshots for share properties, instead iterate over all child dataset and their children for share properties. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Umer Saleem <usaleem@ixsystems.com> Closes #16877	2024-12-19 15:02:58 -05:00
Mariusz Zaborski	4b4e346b9f	Add ability to scrub from last scrubbed txg Some users might want to scrub only new data because they would like to know if the new write wasn't corrupted. This PR adds possibility scrub only newly written data. This introduces new `last_scrubbed_txg` property, indicating the transaction group (TXG) up to which the most recent scrub operation has checked and repaired the dataset, so users can run scrub only from the last saved point. We use a scn_max_txg and scn_min_txg which are already built into scrub, to accomplish that. Reviewed-by: Allan Jude <allan@klarasystems.com> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Mariusz Zaborski <mariusz.zaborski@klarasystems.com> Sponsored-By: Wasabi Technology, Inc. Sponsored-By: Klara Inc. Closes #16301	2024-12-04 14:21:45 -05:00
shodanshok	1cd2419ece	Fix race in libzfs_run_process_impl When replacing a disk, a child process is forked to run a script called zfs_prepare_disk (which can be useful for disk firmware update or health check). The parent than calls waitpid and checks the child error/status code. However, the _reap_children thread (created from zed_exec_process to manage zedlets) also waits for all children with the same PGID and can stole the signal, causing the replace operation to be aborted. As waitpid returns -1, the parent incorrectly assume that the child process had an error or was killed. This, in turn, leaves the newly added disk in REMOVED or UNAVAIL status rather than completing the replace process. This patch changes the PGID of the child process execuing the prepare script, shielding it from the _reap_children thread. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Gionatan Danti <g.danti@assyoma.it> Closes #16801	2024-12-04 05:36:10 -05:00
Umer Saleem	1c9a4c8cb4	Fix user properties output for zpool list In zpool_get_user_prop, when called from zpool_expand_proplist and collect_pool, we often have zpool_props present in zpool_handle_t equal to NULL. This mostly happens when only one user property is requested using zpool list -o <user_property>. Checking for this case and correctly initializing the zpool_props field in zpool_handle_t fixes this issue. Interestingly, this issue does not occur if we query any other property like name or guid along with a user property with -o flag because while accessing properties like guid, zpool_prop_get_int is called which checks for this case specifically and calls zpool_get_all_props. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Umer Saleem <usaleem@ixsystems.com> Closes #16734	2024-11-11 09:46:45 -08:00
Brian Behlendorf	4319e71402	ztest: Fix scrub check in ztest_raidz_expand_check() The scrub code may return EBUSY under several possible scenarios causing ztest to incorrectly ASSERT when verifying the result of a raidz expansion. Update the test case to allow EBUSY since it does not indicate pool damage. Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #16627	2024-10-08 20:41:17 -07:00
Shengqi Chen	e8f0aa143e	Bump SONAME of libzfs and libzpool The ABI of libzfs and libzpool have breaking changes since last SONAME bump in commit `fe6babc`: * libzfs: `zpool_print_unsup_feat` removed (used by zpool cmd). * libzpool: multiple `ddt_*` symbols removed (used by zdb cmd). Bump them to avoid ABI breakage. See: https://github.com/openzfs/zfs/pull/11817 Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Shengqi Chen <harry-chen@outlook.com> Closes #16609	2024-10-06 14:49:33 -07:00
Rob Norris	224393a321	feature: large_microzap In `a4b21eadec` we added the zap_micro_max_size tuneable to raise the size at which "micro" (single-block) ZAPs are upgraded to "fat" (multi-block) ZAPs. Before this, a microZAP was limited to 128KiB, which was the old largest block size. The side effect of raising the max size past 128KiB is that it be stored in a large block, requiring the large_blocks feature. Unfortunately, this means that a backup stream created without the --large-block (-L) flag to zfs send would split the microZAP block into smaller blocks and send those, as is normal behaviour for large blocks. This would be received correctly, but since microZAPs are limited to the first block in the object by definition, the entries in the later blocks would be inaccessible. For directory ZAPs, this gives the appearance of files being lost. This commit adds a feature flag, large_microzap, that must be enabled for microZAPs to grow beyond 128KiB, and which will be activated the first time that occurs. This feature is later checked when generating the stream and if active, the send operation will abort unless --large-block has also been requested. Changing the limit still requires zap_micro_max_size to be changed. The state of this flag effectively sets the upper value for this tuneable, that is, if the feature is disabled, the tuneable will be clamped to 128KiB. A stream flag is also added to ensure that the receiver also activates its own feature flag upon receiving the stream. This is not strictly necessary to _use_ the received microZAP, since it doesn't care how large its block is, but it is required to send the microZAP object on, otherwise the original problem occurs again. Because it's difficult to reliably distinguish a microZAP from a fatZAP from outside the ZAP code, and because it seems unlikely that most users are affected (a fairly niche tuneable combined with what should be an uncommon use of send), and for the sake of expediency, this change activates the feature the first time a microZAP grows to use a large block, and is never deactivated after that. This can be improved in the future. This commit changes nothing for existing pools that already have large microZAPs. The feature will not be retroactively applied, but will be activated the next time a microZAP grows past the limit. Don't use large_blocks feature for enable/disable tests. The large_microzap depends on large_blocks, so it gets enabled as a dependency, breaking the test. Instead use feature "longname", which has the exact same feature characteristics. Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Reviewed-by: Allan Jude <allan@klarasystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Rob Norris <rob.norris@klarasystems.com> Closes #16593	2024-10-02 20:47:11 -07:00
rilysh	86737c5927	Avoid computing strlen() inside loops Compiling with -O0 (no proper optimizations), strlen() call in loops for comparing the size, isn't being called/initialized before the actual loop gets started, which causes n-numbers of strlen() calls (as long as the string is). Keeping the length before entering in the loop is a good idea. On some places, even with -O2, both GCC and Clang can't recognize this pattern, which seem to happen in an array of char pointer. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: rilysh <nightquick@proton.me> Closes #16584	2024-10-02 09:10:06 -07:00
Brian Behlendorf	e8cbb5952d	Update all ABI files Refresh all ABI files using the CI generated files as of commit `0cf14bf4b5`. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #16592	2024-10-01 17:10:23 -07:00
Sanjeev Bagewadi	20232ecfaa	Support for longnames for files/directories (Linux part) This patch adds the ability for zfs to support file/dir name up to 1023 bytes. This number is chosen so we can support up to 255 4-byte characters. This new feature is represented by the new feature flag feature@longname. A new dataset property "longname" is also introduced to toggle longname support for each dataset individually. This property can be disabled, even if it contains longname files. In such case, new file cannot be created with longname but existing longname files can still be looked up. Note that, to my knowledge native Linux filesystems don't support name longer than 255 bytes. So there might be programs not able to work with longname. Note that NFS server may needs to use exportfs_get_name to reconnect dentries, and the buffer being passed is limit to NAME_MAX+1 (256). So NFS may not work when longname is enabled. Note, FreeBSD vfs layer imposes a limit of 255 name lengh, so even though we add code to support it here, it won't actually work. Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Chunwei Chen <david.chen@nutanix.com> Closes #15921	2024-10-01 13:40:27 -07:00
Shengqi Chen	0ae4460c61	zcommon: add specialized versions of cityhash4 Specializing cityhash4 on 32-bit architectures can reduce the size of stack frames as well as instruction count. This is a tiny but useful optimization, since some callers invoke it frequently. When specializing into 1/2/3/4-arg versions, the stack usage (in bytes) on some 32-bit arches are listed as follows: - x86: 32, 32, 32, 40 - arm-v7a: 20, 20, 28, 36 - riscv: 0, 0, 0, 16 - power: 16, 16, 16, 32 - mipsel: 8, 8, 8, 24 And each actual argument (even if passing 0) contributes evenly to the number of multiplication instructions generated: - x86: 9, 12, 15 ,18 - arm-v7a: 6, 8, 10, 12 - riscv / power: 12, 18, 20, 24 - mipsel: 9, 12, 15, 19 On 64-bit architectures, the tendencies are similar. But both stack sizes and instruction counts are significantly smaller thus negligible. Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de> Signed-off-by: Shengqi Chen <harry-chen@outlook.com> Closes #16131 Closes #16483	2024-09-19 15:18:59 -07:00

1 2 3 4 5 ...

910 Commits