mirror_zfs

mirror of https://git.proxmox.com/git/mirror_zfs.git synced 2025-12-01 09:32:08 +03:00

Author	SHA1	Message	Date
Umer Saleem	40fccc423a	ZTS: Test for all known zpool feature sets zpool_create_features_007_pos only tested for compat-2020 feature set. It would be useful to test for all known features sets. If any additional feature is found enabled that is not present in compatibility list or feature set, it should be caught and reported earlier. This commit also removes encryption from openzfsonosx-1.8.1 compatibility list. Encryption enables bookmark_v2, since it is a dependency of encryption, but not listed in openzfsonoxx-1.8.1 compatibility list. Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Umer Saleem <usaleem@ixsystems.com> Closes #15505	2023-11-09 10:58:23 -08:00
Don Brady	5caeef02fa	RAID-Z expansion feature This feature allows disks to be added one at a time to a RAID-Z group, expanding its capacity incrementally. This feature is especially useful for small pools (typically with only one RAID-Z group), where there isn't sufficient hardware to add capacity by adding a whole new RAID-Z group (typically doubling the number of disks). == Initiating expansion == A new device (disk) can be attached to an existing RAIDZ vdev, by running `zpool attach POOL raidzP-N NEW_DEVICE`, e.g. `zpool attach tank raidz2-0 sda`. The new device will become part of the RAIDZ group. A "raidz expansion" will be initiated, and the new device will contribute additional space to the RAIDZ group once the expansion completes. The `feature@raidz_expansion` on-disk feature flag must be `enabled` to initiate an expansion, and it remains `active` for the life of the pool. In other words, pools with expanded RAIDZ vdevs can not be imported by older releases of the ZFS software. == During expansion == The expansion entails reading all allocated space from existing disks in the RAIDZ group, and rewriting it to the new disks in the RAIDZ group (including the newly added device). The expansion progress can be monitored with `zpool status`. Data redundancy is maintained during (and after) the expansion. If a disk fails while the expansion is in progress, the expansion pauses until the health of the RAIDZ vdev is restored (e.g. by replacing the failed disk and waiting for reconstruction to complete). The pool remains accessible during expansion. Following a reboot or export/import, the expansion resumes where it left off. == After expansion == When the expansion completes, the additional space is available for use, and is reflected in the `available` zfs property (as seen in `zfs list`, `df`, etc). Expansion does not change the number of failures that can be tolerated without data loss (e.g. a RAIDZ2 is still a RAIDZ2 even after expansion). A RAIDZ vdev can be expanded multiple times. After the expansion completes, old blocks remain with their old data-to-parity ratio (e.g. 5-wide RAIDZ2, has 3 data to 2 parity), but distributed among the larger set of disks. New blocks will be written with the new data-to-parity ratio (e.g. a 5-wide RAIDZ2 which has been expanded once to 6-wide, has 4 data to 2 parity). However, the RAIDZ vdev's "assumed parity ratio" does not change, so slightly less space than is expected may be reported for newly-written blocks, according to `zfs list`, `df`, `ls -s`, and similar tools. Sponsored-by: The FreeBSD Foundation Sponsored-by: iXsystems, Inc. Sponsored-by: vStack Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Mark Maybee <mark.maybee@delphix.com> Authored-by: Matthew Ahrens <mahrens@delphix.com> Contributions-by: Fedor Uporov <fuporov.vstack@gmail.com> Contributions-by: Stuart Maybee <stuart.maybee@comcast.net> Contributions-by: Thorsten Behrens <tbehrens@outlook.com> Contributions-by: Fmstrat <nospam@nowsci.com> Contributions-by: Don Brady <dev.fs.zfs@gmail.com> Signed-off-by: Don Brady <dev.fs.zfs@gmail.com> Closes #15022	2023-11-08 10:19:41 -08:00
Umer Saleem	78ac868824	Remove obsolete_counts from grub2 compatibility list PR#15459 add all read-only compatible zpool features to grub2 compatibility list. 'obsolete_counts' is a read-only features that depends on 'device_removal' feature which is not read-only and is marked as ZFEATURE_FLAG_MOS. Creating a pool with grub2 compatibility enables 'device_removal' feature as well, which is not desired. This commit removes the 'obsolete_counts' feature from grub2 compatibility list, as GRUB only supports read-only compatible features. Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Umer Saleem <usaleem@ixsystems.com> Closes #15499	2023-11-07 12:04:56 -08:00
Tony Hutter	358ce2cf28	zed: misc vdev_enc_sysfs_path fixes There have been rare cases where the VDEV_ENC_SYSFS_PATH value that zed gets passed is stale. To mitigate this, dynamically check the sysfs path at the time of zed event processing, and use the dynamic value if possible. Note that there will be other times when we can not dynamically detect the sysfs path (like if a disk disappears) and have to rely on the old value for things like turning on the fault LED. That is to say, we can't just blindly use the dynamic path in every case. Also: - Add enclosure sysfs entry when running 'zpool add' - Fix 'slot' and 'enc' zpool.d scripts for nvme Reviewed-by: Don Brady <dev.fs.zfs@gmail.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tony Hutter <hutter2@llnl.gov> Closes #15462	2023-11-07 09:09:24 -08:00
Umer Saleem	cba99a046e	Add all read-only compatible zpool features to grub2 compatibility GRUB opens the boot pool in read-only mode. All read-only compatible features for zpool can be enabled and added to grub2 compatibility, as GRUB does not open the boot-pool for write. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Umer Saleem <usaleem@ixsystems.com> Closes #15459	2023-10-31 09:51:54 -07:00
Alexander Motin	3afdc97d91	ZIO: Remove READY pipeline stage from root ZIOs zio_root() has no arguments for ready callback or parent ZIO. Except one recent case in ZIL code if root ZIOs ever have a parent it is also a root ZIO. It means we do not need READY pipeline stage for them, which takes some time to process, but even more time to wait for the children and be woken by them, and both for no good reason. The most visible effect of this change is that it avoids one taskq wakeup per ZIL block written, previously used to run zio_ready() for lwb_root_zio and skipped now. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Closes #15398	2023-10-25 15:22:25 -07:00
ofthesun9	e57909265b	"ARC prefetch metadata accesses:" appears twice in the output. The first occurrence should be "ARC prefetch data accesses:" Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: ofthesun9 <olivier@ofthesun.net> Closes #15427	2023-10-23 13:41:29 -07:00
Don Brady	f0f330e121	Fix ZED auto-replace for VDEVs using by-id paths The change is simple -- restore the original code so that the VDEV path is updated when using by-id paths. The more challenging part was to devise a second ZTS test, that would test auto-replace for 'by-id' and help prevent a future regression. With that new test, we can now do an A\|B test with , and without, the fix to confirm that auto-replace for by-id paths works. The existing auto-replace test, functional/fault/auto_replace_001_pos, will confirm that we didn't break auto-replace for 'by-vdev' paths. In the original functional/fault/auto_replace_001_pos test, the disk wipe (using dd) was not effective in removing the partitioning since the kernel was never informed of the wipe. Added a call to wipefs(8) so that the kernel is informed and ZED will re-partition the device. Added a validation step that the re-partitioning occurred by confirming that the GPT partition UUID changes. Sponsored-By: OpenDrives Inc. Sponsored-By: Klara Inc. Reviewed-by: Rob Norris <rob.norris@klarasystems.com> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Don Brady <don.brady@klarasystems.com> Closes #15363	2023-10-20 09:29:02 -07:00
Rob Norris	54b1b1d893	import: require force when cachefile hostid doesn't match on-disk Previously, if a cachefile is passed to zpool import, the cached config is mostly offered as-is to ZFS_IOC_POOL_TRYIMPORT->spa_tryimport(), and the results are taken as the canonical pool config and handed back to ZFS_IOC_POOL_IMPORT. In the course of its operation, spa_load() will inspect the pool and build a new config from what it finds on disk. However, it then regenerates a new config ready to import, and so rightly sets the hostid and hostname for the local host in the config it returns. Because of this, the "require force" checks always decide the pool is exported and last touched by the local host, even if this is not true, which is possible in a HA environment when MMP is not enabled. The pool may be imported on another head, but the import checks still pass here, so the pool ends up imported on both. (This doesn't happen when a cachefile isn't used, because the pool config is discovered in userspace in zpool_find_import(), and that does find the on-disk hostid and hostname correctly). Since the systemd zfs-import-cache.service unit uses cachefile imports, this can lead to a system returning after a crash with a "valid" cachefile on disk and automatically, quietly, importing a pool that has already been taken up by a secondary head. This commit causes the on-disk hostid and hostname to be included in the ZPOOL_CONFIG_LOAD_INFO item in the returned config, and then changes the "force" checks for zpool import to use them if present. This method should give no change in behaviour for old userspace on new kernels (they won't know to look for the new config items) and for new userspace on old kernels (the won't find the new config items). Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Rob Norris <rob.norris@klarasystems.com> Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Closes #15290	2023-10-06 09:24:44 -07:00
Umer Saleem	4e16964e1c	Add '-u' - nomount flag for zfs set This commit adds '-u' flag for zfs set operation. With this flag, mountpoint, sharenfs and sharesmb properties can be updated without actually mounting or sharing the dataset. Previously, if dataset was unmounted, and mountpoint property was updated, dataset was not mounted after the update. This behavior is changed in #15240. We mount the dataset whenever mountpoint property is updated, regardless if it's mounted or not. To provide the user with option to keep the dataset unmounted and still update the mountpoint without mounting the dataset, '-u' flag can be used. If any of mountpoint, sharenfs or sharesmb properties are updated with '-u' flag, the property is set to desired value but the operation to (re/un)mount and/or (re/un)share the dataset is not performed and dataset remains as it was before. Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Umer Saleem <usaleem@ixsystems.com> Closes #15322	2023-10-02 16:58:54 -07:00
George Amanakis	fe4d055b36	Report ashift of L2ARC devices in zdb Commit `8af1104f` does not actually store the ashift of cache devices in their label. However, in order to facilitate reporting the ashift through zdb, we enable this in the present commit. We also document how the retrieval of the ashift is done. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: George Amanakis <gamanakis@gmail.com> Closes #15331	2023-10-02 16:57:09 -07:00
Paul Dagnelie	2e2a46e0a5	Invoke zdb by guid to avoid import errors The problem that was occurring is basically that a device was removed by ztest and replaced with another device. It was then reguided. The import then failed because there were two possible imports with the same name; one with the new guid, and one with the old. This can happen because the label writes from the device removal/replacement can be subject to ztest's error injection. The other ways to fix this would be to change the error injection to not trigger on removals (which may not be technically feasible), or to change the import code to not report configurations that are so short on devices (which would potentially have unpleasant end-user effects when trying to recover from data losses/device configuration issues). Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: Paul Dagnelie <pcd@delphix.com> Closes #15298	2023-09-22 16:08:51 -07:00
Alexander Motin	e5d70f4677	ZIL: Avoid dbuf_read() in ztest_get_data() While working on similar patches for zfs and zvol in #15153 I've forgot about ztest. Update it also so that we test the same code paths as use in production. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Closes #15301	2023-09-21 18:40:13 -07:00
Tony Hutter	b53077a9e7	Add zfs_prepare_disk script for disk firmware install Have libzfs call a special `zfs_prepare_disk` script before a disk is included into the pool. The user can edit this script to add things like a disk firmware update or a disk health check. Use of the script is totally optional. See the zfs_prepare_disk manpage for full details. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tony Hutter <hutter2@llnl.gov> Closes #15243	2023-09-21 08:36:26 -07:00
Umer Saleem	c63aabaf1c	Improve the handling of sharesmb,sharenfs properties For sharesmb and sharenfs properties, the status of setting the property is tied with whether we succeed to share the dataset or not. In case sharing the dataset is not successful, this is treated as overall failure of setting the property. In this case, if we check the property after the failure, it is set to on. This commit updates this behavior and the status of setting the share properties is not returned as failure, when we fail to share the dataset. For sharenfs property, if access list is provided, the syntax errors in access list/host adresses are not validated until after setting the property during postfix phase while trying to share the dataset. This is not correct, since the property has already been set when we reach there. Syntax errors in access list/host addresses are validated while validating the property list, before setting the property and failure is returned to user in this case when there are errors in access list. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Ameer Hamza <ahamza@ixsystems.com> Signed-off-by: Umer Saleem <usaleem@ixsystems.com> Closes #15240	2023-09-19 17:16:14 -07:00
Umer Saleem	bbac1d2977	Update the behavior of mountpoint property There are some inconsistencies in the handling of mountpoint property. This commit updates the behavior and makes it consistent. If mountpoint property is set when dataset is unmounted, this would update the mountpoint property. The mountpoint could be valid or invalid in this case. Setting the mountpoint property would result in success in this case. Dataset would still be unmounted here. On the other hand, if dataset is mounted and mountpoint property is updated to something invalid where mount cannot be successful, for example, setting the mountpoint inside a readonly directory. This would unmount the dataset, set the mountpoint property to requested value and tries to mount the dataset. The mount operation returns error and this error is treated as overall failure of setting the property while the property is actually set. To make the behavior consistent in case dataset is mounted or unmounted, we should try to mount the dataset whenever mountpoint property is updated. This would result in mounting the datasets if canmount property is set to on, regardless if the dataset was previously unmounted. The failure in mount operation while setting the mountpoint property should not be treated as failure, since the property is actually set now to user requested value. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Ameer Hamza <ahamza@ixsystems.com> Signed-off-by: Umer Saleem <usaleem@ixsystems.com> Closes #15240	2023-09-19 17:15:24 -07:00
Rob N	7228ba1114	cmd: add 'help' subcommand to zpool and zfs 'program help subcommand' is a reasonably common pattern for multifunction command-line programs. This commit adds support for that style to the zpool and zfs commands. When run as 'zpool help [<topic>]' or 'zfs help [<topic>]', executes the 'man' program on the PATH with the most likely manpage name for the requested topic: "zpool-<topic>" or "zfs-<topic>" for subcommands, or "zpool<topic>" or "zfs<topic>" for the "concepts" and "props" topics. If no topic is supplied, uses the top "zpool" or "zfs" pages. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Kay Pedersen <mail@mkwg.de> Signed-off-by: Rob Norris <robn@despairlabs.com> Closes #15288	2023-09-19 09:06:47 -07:00
Paul Dagnelie	2076011e0c	Fix incorrect expected error in ztest There is an occasional ztest failure that looks like ztest: attach (/var/tmp/zloop-run/ztest.13a 570425344, draid1-1-0 532152320, 1) returned 22, expected 95. This is because the value that we return is EINVAL, but expected_error is set incorrectly. Change the expected_error value to match both the comment and the actual error value. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Paul Dagnelie <pcd@delphix.com> Closes #15295	2023-09-19 09:02:23 -07:00
Tony Hutter	529bec7d7b	zed: Allow autoreplace and fault LEDs for removed vdevs Allow zed to autoreplace vdevs marked as REMOVED. Also update statechange-led zedlet to toggle fault LEDs for REMOVED vdevs. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tony Hutter <hutter2@llnl.gov> Closes #15281	2023-09-18 16:25:58 -07:00
Umer Saleem	71472bf375	Relax error reporting in zpool import and zpool split For zpool import and zpool split, zpool_enable_datasets is called to mount and share all datasets in a pool. If there is an error while mounting or sharing any dataset in the pool, the status of import or split is reported as failure. However, the changes do show up in zpool list. This commit updates the error reporting in zpool import and zpool split path. More descriptive messages are shown to user in case there is an error during mount or share. Errors in mount or share do not effect the overall status of zpool import and zpool split. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Umer Saleem <usaleem@ixsystems.com> Closes #15216	2023-09-01 17:25:11 -07:00
Serapheim Dimitropoulos	cad00d5180	checkstyle: fix action failures Reviewed-by: Don Brady <dev.fs.zfs@gmail.com> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Serapheim Dimitropoulos <serapheim@delphix.com> Closes #15220	2023-08-29 09:12:40 -07:00
Paul Dagnelie	bee9cfb813	Increase limit of redaction list by using spill block Currently redaction bookmarks and their associated redaction lists have a relatively low limit of 36 redaction snapshots. This is imposed by the number of snapshot GUIDs that fit in the bonus buffer of the redaction list object. While this is more than enough for most use cases, there are some limited cases where larger numbers would be useful to support. We tweak the redaction list creation code to use a spill block if the number of redaction snapshots is above the amount that would fit in the bonus buffer. We also make a small change to allow spill blocks to be use for types of data besides SA. In order to fully leverage this logic, we also change the redaction code to use vmem_alloc, to handle extremely large allocations if needed. Finally, small tweaks were made to the zfs commands and the test suite. Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Signed-off-by: Paul Dagnelie <pcd@delphix.com> Closes #15018	2023-08-26 11:34:43 -07:00
Brian Behlendorf	f0e34c8879	zed: update zed.d/statechange-slot_off.sh The statechange-slot_off.sh zedlet which was added in #15200 needed to be installed so it's included by the packages. Additional testing has also shown that multiple retries are often needed for the script to operate reliably. Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #15210	2023-08-26 11:22:28 -07:00
Alexander Motin	eda3fcd56f	ZIL: Second attempt to reduce scope of zl_issuer_lock. The previous patch #14841 appeared to have significant flaw, causing deadlocks if zl_get_data callback got blocked waiting for TXG sync. I already handled some of such cases in the original patch, but issue #14982 shown cases that were impossible to solve in that design. This patch fixes the problem by postponing log blocks allocation till the very end, just before the zios issue, leaving nothing blocking after that point to cause deadlocks. Before that point though any sleeps are now allowed, not causing sync thread blockage. This require slightly more complicated lwb state machine to allocate blocks and issue zios in proper order. But with removal of special early issue workarounds the new code is much cleaner now, and should even be more efficient. Since this patch uses null zios between write, I've found that null zios do not wait for logical children ready status in zio_ready(), that makes parent write to proceed prematurely, producing incorrect log blocks. Added ZIO_CHILD_LOGICAL_BIT to zio_wait_for_children() fixes it. Reviewed-by: Rob Norris <rob.norris@klarasystems.com> Reviewed-by: Mark Maybee <mark.maybee@delphix.com> Reviewed-by: George Wilson <george.wilson@delphix.com> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Closes #15122	2023-08-24 17:08:49 -07:00
Tony Hutter	11fbcacf37	zed: Add zedlet to power off slot when drive is faulted If ZED_POWER_OFF_ENCLOUSRE_SLOT_ON_FAULT is enabled in zed.rc, then power off the drive's slot in the enclosure if it becomes FAULTED. This can help silence misbehaving drives. This assumes your drive enclosure fully supports slot power control via sysfs. Reviewed-by: @AllKind Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tony Hutter <hutter2@llnl.gov> Closes #15200	2023-08-24 11:59:03 -07:00
Rob N	114a39964f	zdb: include cloned blocks in block statistics This gives `zdb -b` support for clone blocks. Previously, it didn't know what clones were, so would count their space allocation multiple times and then report leaked space (or, in debug, would assert trying to claim blocks a second time). This commit fixes those bugs, and reports the number of clones and the space "used" (saved) by them. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Kay Pedersen <mail@mkwg.de> Signed-off-by: Rob Norris <rob.norris@klarasystems.com> Sponsored-By: OpenDrives Inc. Sponsored-By: Klara Inc. Closes #15123	2023-08-01 08:56:30 -07:00
Brian Behlendorf	782312c612	zed: Reduce log noise for large JBODs For large JBODs the log message "zfs_iter_vdev: no match" can account for the bulk of the log messages (over 70%). Since this message is purely informational and not that useful we remove it. Reviewed-by: Olaf Faaland <faaland1@llnl.gov> Reviewed-by: Brian Atkinson <batkinson@lanl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #15086 Closes #15094	2023-07-25 13:55:29 -07:00
Tony Hutter	ab0b0393cb	zed: Fix zed ASSERT on slot power cycle We would see zed assert on one of our systems if we powered off a slot. Further examination showed zfs_retire_recv() was reporting a GUID of 0, which in turn would return a NULL nvlist. Add in a check for a zero GUID. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tony Hutter <hutter2@llnl.gov> Closes #15084	2023-07-21 11:46:58 -07:00
Alexander Motin	fa7b2390d4	Do not report bytes skipped by scan as issued. Scan process may skip blocks based on their birth time, DVA, etc. Traditionally those blocks were accounted as issued, that caused reporting of hugely over-inflated numbers, having nothing to do with actual disk I/O. This change utilizes never used field in struct dsl_scan_phys to account such skipped bytes, allowing to report how much data were actually scrubbed/resilvered and what is the actual I/O speed. While formally it is an on-disk format change, it should be compatible both ways, so should not need a feature flag. This should partially address the same issue as `c85ac731a0`, but from a different perspective, complementing it. Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Akash B <akash-b@hpe.com> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Closes #15007	2023-06-30 08:47:13 -07:00
Arshad Hussain	6052060c13	Don't use hard-coded 'size' value in snprintf() This patch changes the passing of "size" to snprintf from hard-coded (openended) to sizeof(errbuf). This is bringing to standard with rest of the code where- ever 'errbuf' is used. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com> Closes #15003	2023-06-30 08:37:26 -07:00
Mateusz Piotrowski	62ace21a14	zdb: Add missing poolname to -C synopsis Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de> Reviewed-by: Rob Norris <robn@despairlabs.com> Signed-off-by: Mateusz Piotrowski <0mp@FreeBSD.org> Sponsored-by: Klara Inc. Closes #15014	2023-06-29 10:54:43 -07:00
Alexander Motin	70ea484e3e	Finally drop long disabled vdev cache. It was a vdev level read cache, designed to aggregate many small reads by speculatively issuing bigger reads instead and caching the result. But since it has almost no idea about what is going on with exception of ZIO_FLAG_DONT_CACHE flag set by higher layers, it was found to make more harm than good, for which reason it was disabled for the past 12 years. These days we have much better instruments to enlarge the I/Os, such as speculative and prescient prefetches, I/O scheduler, I/O aggregation etc. Besides just the dead code removal this removes one extra mutex lock/unlock per write inside vdev_cache_write(), not otherwise disabled and trying to do some work. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Closes #14953	2023-06-09 12:40:55 -07:00
Alexander Motin	90ccfd426d	Improve l2arc reporting in arc_summary. - Do not report L2ARC as FAULTED in presence of in-flight writes. - Report read and write I/Os, bytes and errors. - Remove few numbers not important to average user. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Closes #12304 Closes #14946	2023-06-09 10:14:05 -07:00
Alexander Motin	b3ad3f48d9	Use list_remove_head() where possible. ... instead of list_head() + list_remove(). On FreeBSD the list functions are not inlined, so in addition to more compact code this also saves another function call. Reviewed-by: Brian Atkinson <batkinson@lanl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Closes #14955	2023-06-09 10:12:52 -07:00
Rob Norris	8653f1de48	zdb: add -B option to generate backup stream This is more-or-less like `zfs send`, but specifying the snapshot by its objset id for situations where it can't be referenced any other way. Sponsored-By: Klara, Inc. Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de> Reviewed-by: WHR <msl0000023508@gmail.com> Signed-off-by: Rob Norris <rob.norris@klarasystems.com> Closes #14642	2023-06-05 11:54:42 -07:00
Mike Swanson	365bae0eab	Add compatibility symlinks for FreeBSD 12.{3,4} and 13.{0,1,2} Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Mike Swanson <mikeonthecomputer@gmail.com> Closes #14902	2023-05-26 15:37:15 -07:00
Colm	d3e0138a3d	Adding new read-only compatible zpool features to compatibility.d/grub2 GRUB2 is compatible with all "read-only compatible" features, so it is safe to add new features of this type to the grub2 compatibility list. We generally want to include all compatible features, to minimize the differences between grub2-compatible pools and no-compatibility pools. Adding new properties `livelist` and `zpool_checkpoint` accordingly. Also adding them to the man page which references this file as an example, for consistency. Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Colm Buckley <colm@tuatha.org> Closes #14893	2023-05-26 10:04:19 -07:00
Richard Yao	677c6f8457	btree: Implement faster binary search algorithm This implements a binary search algorithm for B-Trees that reduces branching to the absolute minimum necessary for a binary search algorithm. It also enables the compiler to inline the comparator to ensure that the only slowdown when doing binary search is from waiting for memory accesses. Additionally, it instructs the compiler to unroll the loop, which gives an additional 40% improve with Clang and 8% improvement with GCC. Consumers must opt into using the faster algorithm. At present, only B-Trees used inside kernel code have been modified to use the faster algorithm. Micro-benchmarks suggest that this can improve binary search performance by up to 3.5 times when compiling with Clang 16 and up to 1.9 times when compiling with GCC 12.2. Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Closes #14866	2023-05-26 10:03:12 -07:00
Brian Behlendorf	91a2325c4a	Update compatibility.d files Add an openzfs-2.2 compatibility file for the next release. Edon-R support has been enabled for FreeBSD removing the need for different FreeBSD and Linux files. Symlinks for the -linux and -freebsd names are created for any scripts expecting that convention. Additionally, a symlink for ubunutu-22.04 was added. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #14833	2023-05-25 13:53:08 -07:00
Alexander Motin	b6fbe61fa6	zil: Add some more statistics. In addition to a number of actual log bytes written, account also a total written bytes including padding and total allocated bytes (bytes <= write <= alloc). It should allow to monitor zil traffic and space efficiency. Add dtrace probe for zil block size selection. Make zilstat report more information and fit it into less width. Reviewed-by: Ameer Hamza <ahamza@ixsystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Closes #14863	2023-05-25 13:51:53 -07:00
Brian Behlendorf	577e835f30	Probe vdevs before marking removed Before allowing the ZED to mark a vdev as REMOVED due to a hotplug event confirm that it is non-responsive with probe. Any device which can be successfully probed should be left ONLINE to prevent a healthy pool from being incorrectly SUSPENDED. This may occur for at least the following two scenarios. 1) Drive expansion (zpool online -e) in VMware environments. If, during the partition resize operation, a partition is removed and re-created then udev will send a removed event. 2) Re-scanning the namespaces of an NVMe device (nvme ns-rescan) may result in a udev remove and add event being delivered. Finally, update the ZED to only kick in a spare when the removal was successful. Reviewed-by: Ameer Hamza <ahamza@ixsystems.com> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #14859 Closes #14861	2023-05-19 13:05:09 -07:00
George Amanakis	482eeef804	Teach zpool scrub to scrub only blocks in error log Added a flag '-e' in zpool scrub to scrub only blocks in error log. A user can pause, resume and cancel the error scrub by passing additional command line arguments -p -s just like a regular scrub. This involves adding a new flag, creating new libzfs interfaces, a new ioctl, and the actual iteration and read-issuing logic. Error scrubbing is executed in multiple txg to make sure pool performance is not affected. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Co-authored-by: TulsiJain tulsi.jain@delphix.com Signed-off-by: George Amanakis <gamanakis@gmail.com> Closes #8995 Closes #12355	2023-05-18 11:59:42 -07:00
Brian Behlendorf	e34e15ed6d	Add the ability to uninitialize zpool initialize functions well for touching every free byte...once. But if we want to do it again, we're currently out of luck. So let's add zpool initialize -u to clear it. Co-authored-by: Rich Ercolani <rincebrain@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Rich Ercolani <rincebrain@gmail.com> Closes #12451 Closes #14873	2023-05-18 10:02:20 -07:00
Matthew Ahrens	3095ca91c2	Verify block pointers before writing them out If a block pointer is corrupted (but the block containing it checksums correctly, e.g. due to a bug that overwrites random memory), we can often detect it before the block is read, with the `zfs_blkptr_verify()` function, which is used in `arc_read()`, `zio_free()`, etc. However, such corruption is not typically recoverable. To recover from it we would need to detect the memory error before the block pointer is written to disk. This PR verifies BP's that are contained in indirect blocks and dnodes before they are written to disk, in `dbuf_write_ready()`. This way, we'll get a panic before the on-disk data is corrupted. This will help us to diagnose what's causing the corruption, as well as being much easier to recover from. To minimize performance impact, only checks that can be done without holding the spa_config_lock are performed. Additionally, when corruption is detected, the raw words of the block pointer are logged. (Note that `dprintf_bp()` is a no-op by default, but if enabled it is not safe to use with invalid block pointers.) Reviewed-by: Rich Ercolani <rincebrain@gmail.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Paul Zuchowski <pzuchowski@datto.com> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Matthew Ahrens <mahrens@delphix.com> Closes #14817	2023-05-08 11:20:23 -07:00
Brian Behlendorf	dd19821149	zdb: consistent xattr output When using zdb to output the value of an xattr only interpret it as printable characters if the entire byte array is printable. Additionally, if the --parseable option is set always output the buffer contents as octal for easy parsing. Reviewed-by: Olaf Faaland <faaland1@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #14830	2023-05-08 11:17:41 -07:00
buzzingwires	a46001adb9	Allow zhack label repair to restore detached devices. This commit expands on the zhack label repair command in `d04b5c9` by adding the -u option to undetach a device by regenerating uberblocks, in addition to the existing functionality of fixing checksums, now represented by -c. Previous behavior is retained in the case of no options. The changes are heavily inspired by Jeff Bonwick's labelfix utility, as archived at: https://gist.github.com/jjwhitney/baaa63144da89726e482 Additionally, it is now capable of properly determining the size of block devices and other media, as well as handling sizes which are not divisible by 2^18. This should make it viable for use on physical devices and partitions, in addition to files. These changes should make it possible to import zpools that have had their uberblocks erased, such as in the case of pools rendered inaccessible by erroneous detach commands. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: buzzingwires <buzzingwires@outlook.com> Closes #14773	2023-05-03 09:03:57 -07:00
Brian Behlendorf	d960beca61	zdb: Fix minor memory leak Commit `6b6aaf6dc2` introduced a small memory leak in zdb. This was detected by the LeakSanitizer and was causing all ztest runs to fail. Reviewed-by: Igor Kozhukhov <igor@dilos.org> Reviewed-by: Rich Ercolani <rincebrain@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #14796	2023-04-26 08:43:39 -07:00
Rich Ercolani	6b6aaf6dc2	Taught zdb -bb to print metadata totals People often want estimates of how much of their pool is occupied by metadata, but they end up using lots of text processing on zdb's output to get it. So let's just...provide it for them. Now, zdb -bbbs will output something like: Blocks LSIZE PSIZE ASIZE avg comp %Total Type [...] 68 1.06M 272K 544K 8K 4.00 0.00 L6 Total 1.71K 212M 6.85M 13.7M 8K 30.91 0.00 L5 Total 1.71K 212M 6.85M 13.7M 8K 30.91 0.00 L4 Total 1.73K 214M 6.92M 13.8M 8K 30.89 0.00 L3 Total 18.7K 2.29G 111M 221M 11.8K 21.19 0.00 L2 Total 3.56M 454G 28.4G 56.9G 16.0K 15.97 0.19 L1 Total 308M 36.8T 28.2T 28.6T 95.1K 1.30 99.80 L0 Total 311M 37.3T 28.3T 28.6T 94.2K 1.32 100.00 Total 50.4M 774G 113G 291G 5.77K 6.85 0.99 Metadata Total Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Rich Ercolani <rincebrain@gmail.com> Closes #14746	2023-04-24 16:55:07 -07:00
Allan Jude	8eae2d214c	Add support for zpool user properties Usage: zpool set org.freebsd:comment="this is my pool" poolname Tests are based on zfs_set's user property tests. Also stop truncating property values at MAXNAMELEN, use ZFS_MAXPROPLEN. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Allan Jude <allan@klarasystems.com> Signed-off-by: Mateusz Piotrowski <mateusz.piotrowski@klarasystems.com> Sponsored-by: Beckhoff Automation GmbH & Co. KG. Sponsored-by: Klara Inc. Closes #11680	2023-04-21 10:20:36 -07:00
rob-wing	3e4ed4213d	Create zap for root vdev And add it to the AVZ, this is not backwards compatible with older pools due to an assertion in spa_sync() that verifies the number of ZAPs of all vdevs matches the number of ZAPs in the AVZ. Granted, the assertion only applies to #DEBUG builds - still, a feature flag is introduced to avoid the assertion, com.klarasystems:vdev_zaps_v2 Notably, this allows to get/set properties on the root vdev: % zpool set user:prop=value <pool> root-0 Before this commit, it was already possible to get/set properties on top-level vdevs with the syntax <type>-<vdev_id> (e.g. mirror-0): % zpool set user:prop=value <pool> mirror-0 This syntax also applies to the root vdev as it is is of type 'root' with a vdev_id of 0, root-0. The keyword 'root' as an alias for 'root-0'. The following tests have been added: - zpool get all properties from root vdev - zpool set a property on root vdev - verify root vdev ZAP is created Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Rob Wing <rob.wing@klarasystems.com> Sponsored-by: Seagate Technology Submitted-by: Klara, Inc. Closes #14405	2023-04-20 10:07:56 -07:00

1 2 3 4 5 ...

1442 Commits