mirror_zfs

mirror of https://git.proxmox.com/git/mirror_zfs.git synced 2026-04-17 08:54:52 +03:00

Author	SHA1	Message	Date
Tony Hutter	a9520e6e59	zpool: Add slot power control, print power status Add `zpool` flags to control the slot power to drives. This assumes your SAS or NVMe enclosure supports slot power control via sysfs. The new `--power` flag is added to `zpool offline|online|clear`: zpool offline --power <pool> <device> Turn off device slot power zpool online --power <pool> <device> Turn on device slot power zpool clear --power <pool> [device] Turn on device slot power If the ZPOOL_AUTO_POWER_ON_SLOT env var is set, then the '--power' option is automatically implied for `zpool online` and `zpool clear` and does not need to be passed. zpool status also gets a --power option to print the slot power status. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Mart Frauenlob <AllKind@fastest.cc> Signed-off-by: Tony Hutter <hutter2@llnl.gov> Closes #15662	2023-12-21 10:53:16 -08:00
Mauricio Faria de Oliveira	3c7650491b	zed: fix typo in variable ZED_POWER_OFF_ENCLOUSRE_SLOT_ON_FAULT Replace ENCLO_US_RE with ENCLO_SU_RE in the name of the variable. Note this changes the user-visible string in zed.rc, thus might break current users with the wrong string, but it's ~2 months since zfs-2.2.0 tag is out, thus should not be widespread yet. Mechanical change: $ grep -rl ZED_POWER_OFF_ENCLOUSRE_SLOT_ON_FAULT cmd/zed/zed.d/zed.rc cmd/zed/zed.d/statechange-slot_off.sh $ sed -i 's/ZED_POWER_OFF_ENCLOUSRE_SLOT_ON_FAULT/<linebreak> ZED_POWER_OFF_ENCLOSURE_SLOT_ON_FAULT/g' \ cmd/zed/zed.d/zed.rc \ cmd/zed/zed.d/statechange-slot_off.sh $ grep -rl ZED_POWER_OFF_ENCLOUSRE_SLOT_ON_FAULT $ Fixes `11fbcacf37` ("zed: Add zedlet to power off slot when drive is faulted") Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Mauricio Faria de Oliveira <mfo@canonical.com> Closes #15651	2023-12-08 16:32:35 -08:00
Rob N	450f2d0b08	import: ignore return on hostid lookups Just silencing a warning. It's totally fine for a hostid to not be there. Reported-by: Coverity (CID-1573336) Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ameer Hamza <ahamza@ixsystems.com> Signed-off-by: Rob Norris <robn@despairlabs.com> Closes #15650	2023-12-07 08:41:54 -08:00
Alexander Motin	f9765b182e	zdb: Dump encrypted write and clone ZIL records Block pointers are not encrypted in TX_WRITE and TX_CLONE_RANGE records, so we can dump them, that may be useful for debugging. Related to #15543. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Closes #15629	2023-12-06 12:39:12 -08:00
Rob N	5f2700eee5	zpool: flush output before sleeping Several zpool commands (status, list, iostat) have modes that present some information, sleep a while, present the current state, sleep, etc. Some of those had ways to invoke them that when piped would appear to do nothing for a while, because non-terminals are block-buffered, not line-buffered, by default. Fix this by forcing a flush before sleeping. In particular, all of these buffered: - zpool status <pool> <interval> - zpool iostat -y<m> <pool> <interval> - zpool list <pool> <interval> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Rob Norris <robn@despairlabs.com> Closes #15593	2023-12-05 11:53:14 -08:00
Dex Wood	014265f4e6	Add Ntfy notification support to ZED This commit adds the zed_notify_ntfy() function and hooks it into zed_notify(). This will allow ZED to send notifications to ntfy.sh or a self-hosted Ntfy service, which can be received on a desktop or mobile device. It is configured with ZED_NTFY_TOPIC, ZED_NTFY_URL, and ZED_NTFY_ACCESS_TOKEN variables in zed.rc. Reviewed-by: @classabbyamp Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Dex Wood <slash2314@gmail.com> Closes #15584	2023-12-01 15:25:17 -08:00
Martin Matuška	1c38cdfe98	zdb: fix printf() length for uint64_t devid Bug introduced in `213d682967`. Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Warner Losh <imp@FreeBSD.org> Signed-off-by: Martin Matuska <mm@FreeBSD.org> Closes #15606	2023-11-29 09:18:30 -08:00
Akash B	c1a47de86f	zdb: Fix zdb '-O|-r' options with -e/exported zpool zdb with '-e' or exported zpool doesn't work along with '-O' and '-r' options as we process them before '-e' has been processed. Below errors are seen: ~> zdb -e pool-mds65/mdt65 -O oi.9/0x200000009:0x0:0x0 failed to hold dataset 'pool-mds65/mdt65': No such file or directory ~> zdb -e pool-oss0/ost0 -r file1 /tmp/filecopy1 -p. failed to hold dataset 'pool-oss0/ost0': No such file or directory zdb: internal error: No such file or directory We need to make sure to process '-O|-r' options after the '-e' option has been processed, which imports the pool to the namespace if it's not in the cachefile. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Akash B <akash-b@hpe.com> Closes #15532	2023-11-27 13:41:58 -08:00
Rob Norris	213d682967	zdb: show BRT statistics and dump its contents Same idea as the dedup stats, but for block cloning. Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Kay Pedersen <mail@mkwg.de> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Rob Norris <robn@despairlabs.com> Closes #15541	2023-11-27 13:35:07 -08:00
Rob N	92dc4ad83d	Consider `dnode_t` allocations in dbuf cache size accounting Entries in the dbuf cache contribute only the size of the dbuf data to the cache size. Attached "user" data is not counted. This can lead to the data currently "owned" by the cache consuming more memory than accounting appears to show. In some cases (eg a metadnode data block with all child dnode_t slots allocated), the actual size can be as much as 3x what the cache believes it to be. This is arguably correct behaviour, as the cache is only tracking the size of the dbuf data, not even the overhead of the dbuf_t. On the other hand, in the above case of dnodes, evicting cached metadnode dbufs is the only current way to reclaim the dnode objects, and can lead to the situation where the dbuf cache appears to be comfortably within its target memory window and yet is holding enormous amounts of slab memory that cannot be reclaimed. This commit adds a facility for a dbuf user to artificially inflate the apparent size of the dbuf for caching purposes. This at least allows for cache tuning to be adjusted to match something closer to the real memory overhead. metadnode dbufs carry a >1KiB allocation per dnode in their user data. This informs the dbuf cache machinery of that fact, allowing it to make better decisions when evicting dbufs. Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Rob Norris <rob.norris@klarasystems.com> Closes #15511	2023-11-17 13:25:53 -08:00
Umer Saleem	40fccc423a	ZTS: Test for all known zpool feature sets zpool_create_features_007_pos only tested for compat-2020 feature set. It would be useful to test for all known feature sets. If any additional feature is found enabled that is not present in compatibility list or feature set, it should be caught and reported earlier. This commit also removes encryption from openzfsonosx-1.8.1 compatibility list. Encryption enables bookmark_v2, since it is a dependency of encryption, but not listed in openzfsonosx-1.8.1 compatibility list. Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Umer Saleem <usaleem@ixsystems.com> Closes #15505	2023-11-09 10:58:23 -08:00
Don Brady	5caeef02fa	RAID-Z expansion feature This feature allows disks to be added one at a time to a RAID-Z group, expanding its capacity incrementally. This feature is especially useful for small pools (typically with only one RAID-Z group), where there isn't sufficient hardware to add capacity by adding a whole new RAID-Z group (typically doubling the number of disks). == Initiating expansion == A new device (disk) can be attached to an existing RAIDZ vdev, by running `zpool attach POOL raidzP-N NEW_DEVICE`, e.g. `zpool attach tank raidz2-0 sda`. The new device will become part of the RAIDZ group. A "raidz expansion" will be initiated, and the new device will contribute additional space to the RAIDZ group once the expansion completes. The `feature@raidz_expansion` on-disk feature flag must be `enabled` to initiate an expansion, and it remains `active` for the life of the pool. In other words, pools with expanded RAIDZ vdevs can not be imported by older releases of the ZFS software. == During expansion == The expansion entails reading all allocated space from existing disks in the RAIDZ group, and rewriting it to the new disks in the RAIDZ group (including the newly added device). The expansion progress can be monitored with `zpool status`. Data redundancy is maintained during (and after) the expansion. If a disk fails while the expansion is in progress, the expansion pauses until the health of the RAIDZ vdev is restored (e.g. by replacing the failed disk and waiting for reconstruction to complete). The pool remains accessible during expansion. Following a reboot or export/import, the expansion resumes where it left off. == After expansion == When the expansion completes, the additional space is available for use, and is reflected in the `available` zfs property (as seen in `zfs list`, `df`, etc). Expansion does not change the number of failures that can be tolerated without data loss (e.g. a RAIDZ2 is still a RAIDZ2 even after expansion). A RAIDZ vdev can be expanded multiple times. After the expansion completes, old blocks remain with their old data-to-parity ratio (e.g. 5-wide RAIDZ2, has 3 data to 2 parity), but distributed among the larger set of disks. New blocks will be written with the new data-to-parity ratio (e.g. a 5-wide RAIDZ2 which has been expanded once to 6-wide, has 4 data to 2 parity). However, the RAIDZ vdev's "assumed parity ratio" does not change, so slightly less space than is expected may be reported for newly-written blocks, according to `zfs list`, `df`, `ls -s`, and similar tools. Sponsored-by: The FreeBSD Foundation Sponsored-by: iXsystems, Inc. Sponsored-by: vStack Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Mark Maybee <mark.maybee@delphix.com> Authored-by: Matthew Ahrens <mahrens@delphix.com> Contributions-by: Fedor Uporov <fuporov.vstack@gmail.com> Contributions-by: Stuart Maybee <stuart.maybee@comcast.net> Contributions-by: Thorsten Behrens <tbehrens@outlook.com> Contributions-by: Fmstrat <nospam@nowsci.com> Contributions-by: Don Brady <dev.fs.zfs@gmail.com> Signed-off-by: Don Brady <dev.fs.zfs@gmail.com> Closes #15022	2023-11-08 10:19:41 -08:00
Umer Saleem	78ac868824	Remove obsolete_counts from grub2 compatibility list PR #15459 added all read-only compatible zpool features to grub2 compatibility list. 'obsolete_counts' is a read-only feature that depends on 'device_removal' feature which is not read-only and is marked as ZFEATURE_FLAG_MOS. Creating a pool with grub2 compatibility enables 'device_removal' feature as well, which is not desired. This commit removes the 'obsolete_counts' feature from grub2 compatibility list, as GRUB only supports read-only compatible features. Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Umer Saleem <usaleem@ixsystems.com> Closes #15499	2023-11-07 12:04:56 -08:00
Tony Hutter	358ce2cf28	zed: misc vdev_enc_sysfs_path fixes There have been rare cases where the VDEV_ENC_SYSFS_PATH value that zed gets passed is stale. To mitigate this, dynamically check the sysfs path at the time of zed event processing, and use the dynamic value if possible. Note that there will be other times when we can not dynamically detect the sysfs path (like if a disk disappears) and have to rely on the old value for things like turning on the fault LED. That is to say, we can't just blindly use the dynamic path in every case. Also: - Add enclosure sysfs entry when running 'zpool add' - Fix 'slot' and 'enc' zpool.d scripts for nvme Reviewed-by: Don Brady <dev.fs.zfs@gmail.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tony Hutter <hutter2@llnl.gov> Closes #15462	2023-11-07 09:09:24 -08:00
Umer Saleem	cba99a046e	Add all read-only compatible zpool features to grub2 compatibility GRUB opens the boot pool in read-only mode. All read-only compatible features for zpool can be enabled and added to grub2 compatibility, as GRUB does not open the boot-pool for write. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Umer Saleem <usaleem@ixsystems.com> Closes #15459	2023-10-31 09:51:54 -07:00
Alexander Motin	3afdc97d91	ZIO: Remove READY pipeline stage from root ZIOs zio_root() has no arguments for ready callback or parent ZIO. Except one recent case in ZIL code if root ZIOs ever have a parent it is also a root ZIO. It means we do not need READY pipeline stage for them, which takes some time to process, but even more time to wait for the children and be woken by them, and both for no good reason. The most visible effect of this change is that it avoids one taskq wakeup per ZIL block written, previously used to run zio_ready() for lwb_root_zio and skipped now. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Closes #15398	2023-10-25 15:22:25 -07:00
ofthesun9	e57909265b	"ARC prefetch metadata accesses:" appears twice in the output. The first occurrence should be "ARC prefetch data accesses:" Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: ofthesun9 <olivier@ofthesun.net> Closes #15427	2023-10-23 13:41:29 -07:00
Don Brady	f0f330e121	Fix ZED auto-replace for VDEVs using by-id paths The change is simple -- restore the original code so that the VDEV path is updated when using by-id paths. The more challenging part was to devise a second ZTS test, that would test auto-replace for 'by-id' and help prevent a future regression. With that new test, we can now do an A|B test with, and without, the fix to confirm that auto-replace for by-id paths works. The existing auto-replace test, functional/fault/auto_replace_001_pos, will confirm that we didn't break auto-replace for 'by-vdev' paths. In the original functional/fault/auto_replace_001_pos test, the disk wipe (using dd) was not effective in removing the partitioning since the kernel was never informed of the wipe. Added a call to wipefs(8) so that the kernel is informed and ZED will re-partition the device. Added a validation step that the re-partitioning occurred by confirming that the GPT partition UUID changes. Sponsored-By: OpenDrives Inc. Sponsored-By: Klara Inc. Reviewed-by: Rob Norris <rob.norris@klarasystems.com> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Don Brady <don.brady@klarasystems.com> Closes #15363	2023-10-20 09:29:02 -07:00
Rob Norris	54b1b1d893	import: require force when cachefile hostid doesn't match on-disk Previously, if a cachefile is passed to zpool import, the cached config is mostly offered as-is to ZFS_IOC_POOL_TRYIMPORT->spa_tryimport(), and the results are taken as the canonical pool config and handed back to ZFS_IOC_POOL_IMPORT. In the course of its operation, spa_load() will inspect the pool and build a new config from what it finds on disk. However, it then regenerates a new config ready to import, and so rightly sets the hostid and hostname for the local host in the config it returns. Because of this, the "require force" checks always decide the pool is exported and last touched by the local host, even if this is not true, which is possible in a HA environment when MMP is not enabled. The pool may be imported on another head, but the import checks still pass here, so the pool ends up imported on both. (This doesn't happen when a cachefile isn't used, because the pool config is discovered in userspace in zpool_find_import(), and that does find the on-disk hostid and hostname correctly). Since the systemd zfs-import-cache.service unit uses cachefile imports, this can lead to a system returning after a crash with a "valid" cachefile on disk and automatically, quietly, importing a pool that has already been taken up by a secondary head. This commit causes the on-disk hostid and hostname to be included in the ZPOOL_CONFIG_LOAD_INFO item in the returned config, and then changes the "force" checks for zpool import to use them if present. This method should give no change in behaviour for old userspace on new kernels (they won't know to look for the new config items) and for new userspace on old kernels (they won't find the new config items). Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Rob Norris <rob.norris@klarasystems.com> Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Closes #15290	2023-10-06 09:24:44 -07:00
Umer Saleem	4e16964e1c	Add '-u' - nomount flag for zfs set This commit adds '-u' flag for zfs set operation. With this flag, mountpoint, sharenfs and sharesmb properties can be updated without actually mounting or sharing the dataset. Previously, if dataset was unmounted, and mountpoint property was updated, dataset was not mounted after the update. This behavior is changed in #15240. We mount the dataset whenever mountpoint property is updated, regardless if it's mounted or not. To provide the user with option to keep the dataset unmounted and still update the mountpoint without mounting the dataset, '-u' flag can be used. If any of mountpoint, sharenfs or sharesmb properties are updated with '-u' flag, the property is set to desired value but the operation to (re/un)mount and/or (re/un)share the dataset is not performed and dataset remains as it was before. Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Umer Saleem <usaleem@ixsystems.com> Closes #15322	2023-10-02 16:58:54 -07:00
George Amanakis	fe4d055b36	Report ashift of L2ARC devices in zdb Commit `8af1104f` does not actually store the ashift of cache devices in their label. However, in order to facilitate reporting the ashift through zdb, we enable this in the present commit. We also document how the retrieval of the ashift is done. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: George Amanakis <gamanakis@gmail.com> Closes #15331	2023-10-02 16:57:09 -07:00
Paul Dagnelie	2e2a46e0a5	Invoke zdb by guid to avoid import errors The problem that was occurring is basically that a device was removed by ztest and replaced with another device. It was then reguided. The import then failed because there were two possible imports with the same name; one with the new guid, and one with the old. This can happen because the label writes from the device removal/replacement can be subject to ztest's error injection. The other ways to fix this would be to change the error injection to not trigger on removals (which may not be technically feasible), or to change the import code to not report configurations that are so short on devices (which would potentially have unpleasant end-user effects when trying to recover from data losses/device configuration issues). Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: Paul Dagnelie <pcd@delphix.com> Closes #15298	2023-09-22 16:08:51 -07:00
Alexander Motin	e5d70f4677	ZIL: Avoid dbuf_read() in ztest_get_data() While working on similar patches for zfs and zvol in #15153 I forgot about ztest. Update it also so that we test the same code paths as used in production. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Closes #15301	2023-09-21 18:40:13 -07:00
Tony Hutter	b53077a9e7	Add zfs_prepare_disk script for disk firmware install Have libzfs call a special `zfs_prepare_disk` script before a disk is included into the pool. The user can edit this script to add things like a disk firmware update or a disk health check. Use of the script is totally optional. See the zfs_prepare_disk manpage for full details. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tony Hutter <hutter2@llnl.gov> Closes #15243	2023-09-21 08:36:26 -07:00
Umer Saleem	c63aabaf1c	Improve the handling of sharesmb,sharenfs properties For sharesmb and sharenfs properties, the status of setting the property is tied with whether we succeed to share the dataset or not. In case sharing the dataset is not successful, this is treated as overall failure of setting the property. In this case, if we check the property after the failure, it is set to on. This commit updates this behavior and the status of setting the share properties is not returned as failure, when we fail to share the dataset. For sharenfs property, if access list is provided, the syntax errors in access list/host addresses are not validated until after setting the property during postfix phase while trying to share the dataset. This is not correct, since the property has already been set when we reach there. Syntax errors in access list/host addresses are validated while validating the property list, before setting the property and failure is returned to user in this case when there are errors in access list. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Ameer Hamza <ahamza@ixsystems.com> Signed-off-by: Umer Saleem <usaleem@ixsystems.com> Closes #15240	2023-09-19 17:16:14 -07:00
Umer Saleem	bbac1d2977	Update the behavior of mountpoint property There are some inconsistencies in the handling of mountpoint property. This commit updates the behavior and makes it consistent. If mountpoint property is set when dataset is unmounted, this would update the mountpoint property. The mountpoint could be valid or invalid in this case. Setting the mountpoint property would result in success in this case. Dataset would still be unmounted here. On the other hand, if dataset is mounted and mountpoint property is updated to something invalid where mount cannot be successful, for example, setting the mountpoint inside a readonly directory. This would unmount the dataset, set the mountpoint property to requested value and tries to mount the dataset. The mount operation returns error and this error is treated as overall failure of setting the property while the property is actually set. To make the behavior consistent in case dataset is mounted or unmounted, we should try to mount the dataset whenever mountpoint property is updated. This would result in mounting the datasets if canmount property is set to on, regardless if the dataset was previously unmounted. The failure in mount operation while setting the mountpoint property should not be treated as failure, since the property is actually set now to user requested value. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Ameer Hamza <ahamza@ixsystems.com> Signed-off-by: Umer Saleem <usaleem@ixsystems.com> Closes #15240	2023-09-19 17:15:24 -07:00
Rob N	7228ba1114	cmd: add 'help' subcommand to zpool and zfs 'program help subcommand' is a reasonably common pattern for multifunction command-line programs. This commit adds support for that style to the zpool and zfs commands. When run as 'zpool help [<topic>]' or 'zfs help [<topic>]', executes the 'man' program on the PATH with the most likely manpage name for the requested topic: "zpool-<topic>" or "zfs-<topic>" for subcommands, or "zpool<topic>" or "zfs<topic>" for the "concepts" and "props" topics. If no topic is supplied, uses the top "zpool" or "zfs" pages. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Kay Pedersen <mail@mkwg.de> Signed-off-by: Rob Norris <robn@despairlabs.com> Closes #15288	2023-09-19 09:06:47 -07:00
Paul Dagnelie	2076011e0c	Fix incorrect expected error in ztest There is an occasional ztest failure that looks like ztest: attach (/var/tmp/zloop-run/ztest.13a 570425344, draid1-1-0 532152320, 1) returned 22, expected 95. This is because the value that we return is EINVAL, but expected_error is set incorrectly. Change the expected_error value to match both the comment and the actual error value. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Paul Dagnelie <pcd@delphix.com> Closes #15295	2023-09-19 09:02:23 -07:00
Tony Hutter	529bec7d7b	zed: Allow autoreplace and fault LEDs for removed vdevs Allow zed to autoreplace vdevs marked as REMOVED. Also update statechange-led zedlet to toggle fault LEDs for REMOVED vdevs. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tony Hutter <hutter2@llnl.gov> Closes #15281	2023-09-18 16:25:58 -07:00
Umer Saleem	71472bf375	Relax error reporting in zpool import and zpool split For zpool import and zpool split, zpool_enable_datasets is called to mount and share all datasets in a pool. If there is an error while mounting or sharing any dataset in the pool, the status of import or split is reported as failure. However, the changes do show up in zpool list. This commit updates the error reporting in zpool import and zpool split path. More descriptive messages are shown to user in case there is an error during mount or share. Errors in mount or share do not affect the overall status of zpool import and zpool split. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Umer Saleem <usaleem@ixsystems.com> Closes #15216	2023-09-01 17:25:11 -07:00
Serapheim Dimitropoulos	cad00d5180	checkstyle: fix action failures Reviewed-by: Don Brady <dev.fs.zfs@gmail.com> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Serapheim Dimitropoulos <serapheim@delphix.com> Closes #15220	2023-08-29 09:12:40 -07:00
Paul Dagnelie	bee9cfb813	Increase limit of redaction list by using spill block Currently redaction bookmarks and their associated redaction lists have a relatively low limit of 36 redaction snapshots. This is imposed by the number of snapshot GUIDs that fit in the bonus buffer of the redaction list object. While this is more than enough for most use cases, there are some limited cases where larger numbers would be useful to support. We tweak the redaction list creation code to use a spill block if the number of redaction snapshots is above the amount that would fit in the bonus buffer. We also make a small change to allow spill blocks to be used for types of data besides SA. In order to fully leverage this logic, we also change the redaction code to use vmem_alloc, to handle extremely large allocations if needed. Finally, small tweaks were made to the zfs commands and the test suite. Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Signed-off-by: Paul Dagnelie <pcd@delphix.com> Closes #15018	2023-08-26 11:34:43 -07:00
Brian Behlendorf	f0e34c8879	zed: update zed.d/statechange-slot_off.sh The statechange-slot_off.sh zedlet which was added in #15200 needed to be installed so it's included by the packages. Additional testing has also shown that multiple retries are often needed for the script to operate reliably. Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #15210	2023-08-26 11:22:28 -07:00
Alexander Motin	eda3fcd56f	ZIL: Second attempt to reduce scope of zl_issuer_lock. The previous patch #14841 appeared to have a significant flaw, causing deadlocks if zl_get_data callback got blocked waiting for TXG sync. I already handled some of such cases in the original patch, but issue #14982 showed cases that were impossible to solve in that design. This patch fixes the problem by postponing log blocks allocation till the very end, just before the zios issue, leaving nothing blocking after that point to cause deadlocks. Before that point though any sleeps are now allowed, not causing sync thread blockage. This requires a slightly more complicated lwb state machine to allocate blocks and issue zios in proper order. But with removal of special early issue workarounds the new code is much cleaner now, and should even be more efficient. Since this patch uses null zios between writes, I've found that null zios do not wait for logical children ready status in zio_ready(), that makes parent write to proceed prematurely, producing incorrect log blocks. Added ZIO_CHILD_LOGICAL_BIT to zio_wait_for_children() fixes it. Reviewed-by: Rob Norris <rob.norris@klarasystems.com> Reviewed-by: Mark Maybee <mark.maybee@delphix.com> Reviewed-by: George Wilson <george.wilson@delphix.com> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Closes #15122	2023-08-24 17:08:49 -07:00
Tony Hutter	11fbcacf37	zed: Add zedlet to power off slot when drive is faulted If ZED_POWER_OFF_ENCLOUSRE_SLOT_ON_FAULT is enabled in zed.rc, then power off the drive's slot in the enclosure if it becomes FAULTED. This can help silence misbehaving drives. This assumes your drive enclosure fully supports slot power control via sysfs. Reviewed-by: @AllKind Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tony Hutter <hutter2@llnl.gov> Closes #15200	2023-08-24 11:59:03 -07:00
Rob N	114a39964f	zdb: include cloned blocks in block statistics This gives `zdb -b` support for clone blocks. Previously, it didn't know what clones were, so would count their space allocation multiple times and then report leaked space (or, in debug, would assert trying to claim blocks a second time). This commit fixes those bugs, and reports the number of clones and the space "used" (saved) by them. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Kay Pedersen <mail@mkwg.de> Signed-off-by: Rob Norris <rob.norris@klarasystems.com> Sponsored-By: OpenDrives Inc. Sponsored-By: Klara Inc. Closes #15123	2023-08-01 08:56:30 -07:00
Brian Behlendorf	782312c612	zed: Reduce log noise for large JBODs For large JBODs the log message "zfs_iter_vdev: no match" can account for the bulk of the log messages (over 70%). Since this message is purely informational and not that useful we remove it. Reviewed-by: Olaf Faaland <faaland1@llnl.gov> Reviewed-by: Brian Atkinson <batkinson@lanl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #15086 Closes #15094	2023-07-25 13:55:29 -07:00
Tony Hutter	ab0b0393cb	zed: Fix zed ASSERT on slot power cycle We would see zed assert on one of our systems if we powered off a slot. Further examination showed zfs_retire_recv() was reporting a GUID of 0, which in turn would return a NULL nvlist. Add in a check for a zero GUID. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tony Hutter <hutter2@llnl.gov> Closes #15084	2023-07-21 11:46:58 -07:00
Alexander Motin	fa7b2390d4	Do not report bytes skipped by scan as issued. Scan process may skip blocks based on their birth time, DVA, etc. Traditionally those blocks were accounted as issued, that caused reporting of hugely over-inflated numbers, having nothing to do with actual disk I/O. This change utilizes never used field in struct dsl_scan_phys to account such skipped bytes, allowing to report how much data were actually scrubbed/resilvered and what is the actual I/O speed. While formally it is an on-disk format change, it should be compatible both ways, so should not need a feature flag. This should partially address the same issue as `c85ac731a0`, but from a different perspective, complementing it. Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Akash B <akash-b@hpe.com> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Closes #15007	2023-06-30 08:47:13 -07:00
Arshad Hussain	6052060c13	Don't use hard-coded 'size' value in snprintf() This patch changes the passing of "size" to snprintf from hard-coded (openended) to sizeof(errbuf). This is bringing to standard with rest of the code wherever 'errbuf' is used. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com> Closes #15003	2023-06-30 08:37:26 -07:00
Mateusz Piotrowski	62ace21a14	zdb: Add missing poolname to -C synopsis Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de> Reviewed-by: Rob Norris <robn@despairlabs.com> Signed-off-by: Mateusz Piotrowski <0mp@FreeBSD.org> Sponsored-by: Klara Inc. Closes #15014	2023-06-29 10:54:43 -07:00
Alexander Motin	70ea484e3e	Finally drop long disabled vdev cache. It was a vdev level read cache, designed to aggregate many small reads by speculatively issuing bigger reads instead and caching the result. But since it has almost no idea about what is going on with exception of ZIO_FLAG_DONT_CACHE flag set by higher layers, it was found to make more harm than good, for which reason it was disabled for the past 12 years. These days we have much better instruments to enlarge the I/Os, such as speculative and prescient prefetches, I/O scheduler, I/O aggregation etc. Besides just the dead code removal this removes one extra mutex lock/unlock per write inside vdev_cache_write(), not otherwise disabled and trying to do some work. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Closes #14953	2023-06-09 12:40:55 -07:00
Alexander Motin	90ccfd426d	Improve l2arc reporting in arc_summary. - Do not report L2ARC as FAULTED in presence of in-flight writes. - Report read and write I/Os, bytes and errors. - Remove few numbers not important to average user. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Closes #12304 Closes #14946	2023-06-09 10:14:05 -07:00
Alexander Motin	b3ad3f48d9	Use list_remove_head() where possible. ... instead of list_head() + list_remove(). On FreeBSD the list functions are not inlined, so in addition to more compact code this also saves another function call. Reviewed-by: Brian Atkinson <batkinson@lanl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Closes #14955	2023-06-09 10:12:52 -07:00
Rob Norris	8653f1de48	zdb: add -B option to generate backup stream This is more-or-less like `zfs send`, but specifying the snapshot by its objset id for situations where it can't be referenced any other way. Sponsored-By: Klara, Inc. Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de> Reviewed-by: WHR <msl0000023508@gmail.com> Signed-off-by: Rob Norris <rob.norris@klarasystems.com> Closes #14642	2023-06-05 11:54:42 -07:00
Mike Swanson	365bae0eab	Add compatibility symlinks for FreeBSD 12.{3,4} and 13.{0,1,2} Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Mike Swanson <mikeonthecomputer@gmail.com> Closes #14902	2023-05-26 15:37:15 -07:00
Colm	d3e0138a3d	Adding new read-only compatible zpool features to compatibility.d/grub2 GRUB2 is compatible with all "read-only compatible" features, so it is safe to add new features of this type to the grub2 compatibility list. We generally want to include all compatible features, to minimize the differences between grub2-compatible pools and no-compatibility pools. Adding new properties `livelist` and `zpool_checkpoint` accordingly. Also adding them to the man page which references this file as an example, for consistency. Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Colm Buckley <colm@tuatha.org> Closes #14893	2023-05-26 10:04:19 -07:00
Richard Yao	677c6f8457	btree: Implement faster binary search algorithm This implements a binary search algorithm for B-Trees that reduces branching to the absolute minimum necessary for a binary search algorithm. It also enables the compiler to inline the comparator to ensure that the only slowdown when doing binary search is from waiting for memory accesses. Additionally, it instructs the compiler to unroll the loop, which gives an additional 40% improvement with Clang and 8% improvement with GCC. Consumers must opt into using the faster algorithm. At present, only B-Trees used inside kernel code have been modified to use the faster algorithm. Micro-benchmarks suggest that this can improve binary search performance by up to 3.5 times when compiling with Clang 16 and up to 1.9 times when compiling with GCC 12.2. Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Closes #14866	2023-05-26 10:03:12 -07:00
Brian Behlendorf	91a2325c4a	Update compatibility.d files Add an openzfs-2.2 compatibility file for the next release. Edon-R support has been enabled for FreeBSD removing the need for different FreeBSD and Linux files. Symlinks for the -linux and -freebsd names are created for any scripts expecting that convention. Additionally, a symlink for ubuntu-22.04 was added. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #14833	2023-05-25 13:53:08 -07:00
Alexander Motin	b6fbe61fa6	zil: Add some more statistics. In addition to a number of actual log bytes written, account also a total written bytes including padding and total allocated bytes (bytes <= write <= alloc). It should allow to monitor zil traffic and space efficiency. Add dtrace probe for zil block size selection. Make zilstat report more information and fit it into less width. Reviewed-by: Ameer Hamza <ahamza@ixsystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Closes #14863	2023-05-25 13:51:53 -07:00
Brian Behlendorf	577e835f30	Probe vdevs before marking removed Before allowing the ZED to mark a vdev as REMOVED due to a hotplug event confirm that it is non-responsive with probe. Any device which can be successfully probed should be left ONLINE to prevent a healthy pool from being incorrectly SUSPENDED. This may occur for at least the following two scenarios. 1) Drive expansion (zpool online -e) in VMware environments. If, during the partition resize operation, a partition is removed and re-created then udev will send a removed event. 2) Re-scanning the namespaces of an NVMe device (nvme ns-rescan) may result in a udev remove and add event being delivered. Finally, update the ZED to only kick in a spare when the removal was successful. Reviewed-by: Ameer Hamza <ahamza@ixsystems.com> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #14859 Closes #14861	2023-05-19 13:05:09 -07:00
George Amanakis	482eeef804	Teach zpool scrub to scrub only blocks in error log Added a flag '-e' in zpool scrub to scrub only blocks in error log. A user can pause, resume and cancel the error scrub by passing additional command line arguments -p -s just like a regular scrub. This involves adding a new flag, creating new libzfs interfaces, a new ioctl, and the actual iteration and read-issuing logic. Error scrubbing is executed in multiple txg to make sure pool performance is not affected. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Co-authored-by: TulsiJain tulsi.jain@delphix.com Signed-off-by: George Amanakis <gamanakis@gmail.com> Closes #8995 Closes #12355	2023-05-18 11:59:42 -07:00
Brian Behlendorf	e34e15ed6d	Add the ability to uninitialize zpool initialize functions well for touching every free byte...once. But if we want to do it again, we're currently out of luck. So let's add zpool initialize -u to clear it. Co-authored-by: Rich Ercolani <rincebrain@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Rich Ercolani <rincebrain@gmail.com> Closes #12451 Closes #14873	2023-05-18 10:02:20 -07:00
Matthew Ahrens	3095ca91c2	Verify block pointers before writing them out If a block pointer is corrupted (but the block containing it checksums correctly, e.g. due to a bug that overwrites random memory), we can often detect it before the block is read, with the `zfs_blkptr_verify()` function, which is used in `arc_read()`, `zio_free()`, etc. However, such corruption is not typically recoverable. To recover from it we would need to detect the memory error before the block pointer is written to disk. This PR verifies BP's that are contained in indirect blocks and dnodes before they are written to disk, in `dbuf_write_ready()`. This way, we'll get a panic before the on-disk data is corrupted. This will help us to diagnose what's causing the corruption, as well as being much easier to recover from. To minimize performance impact, only checks that can be done without holding the spa_config_lock are performed. Additionally, when corruption is detected, the raw words of the block pointer are logged. (Note that `dprintf_bp()` is a no-op by default, but if enabled it is not safe to use with invalid block pointers.) Reviewed-by: Rich Ercolani <rincebrain@gmail.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Paul Zuchowski <pzuchowski@datto.com> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Matthew Ahrens <mahrens@delphix.com> Closes #14817	2023-05-08 11:20:23 -07:00
Brian Behlendorf	dd19821149	zdb: consistent xattr output When using zdb to output the value of an xattr only interpret it as printable characters if the entire byte array is printable. Additionally, if the --parseable option is set always output the buffer contents as octal for easy parsing. Reviewed-by: Olaf Faaland <faaland1@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #14830	2023-05-08 11:17:41 -07:00
buzzingwires	a46001adb9	Allow zhack label repair to restore detached devices. This commit expands on the zhack label repair command in `d04b5c9` by adding the -u option to undetach a device by regenerating uberblocks, in addition to the existing functionality of fixing checksums, now represented by -c. Previous behavior is retained in the case of no options. The changes are heavily inspired by Jeff Bonwick's labelfix utility, as archived at: https://gist.github.com/jjwhitney/baaa63144da89726e482 Additionally, it is now capable of properly determining the size of block devices and other media, as well as handling sizes which are not divisible by 2^18. This should make it viable for use on physical devices and partitions, in addition to files. These changes should make it possible to import zpools that have had their uberblocks erased, such as in the case of pools rendered inaccessible by erroneous detach commands. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: buzzingwires <buzzingwires@outlook.com> Closes #14773	2023-05-03 09:03:57 -07:00
Brian Behlendorf	d960beca61	zdb: Fix minor memory leak Commit `6b6aaf6dc2` introduced a small memory leak in zdb. This was detected by the LeakSanitizer and was causing all ztest runs to fail. Reviewed-by: Igor Kozhukhov <igor@dilos.org> Reviewed-by: Rich Ercolani <rincebrain@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #14796	2023-04-26 08:43:39 -07:00
Rich Ercolani	6b6aaf6dc2	Taught zdb -bb to print metadata totals People often want estimates of how much of their pool is occupied by metadata, but they end up using lots of text processing on zdb's output to get it. So let's just...provide it for them. Now, zdb -bbbs will output something like: Blocks LSIZE PSIZE ASIZE avg comp %Total Type [...] 68 1.06M 272K 544K 8K 4.00 0.00 L6 Total 1.71K 212M 6.85M 13.7M 8K 30.91 0.00 L5 Total 1.71K 212M 6.85M 13.7M 8K 30.91 0.00 L4 Total 1.73K 214M 6.92M 13.8M 8K 30.89 0.00 L3 Total 18.7K 2.29G 111M 221M 11.8K 21.19 0.00 L2 Total 3.56M 454G 28.4G 56.9G 16.0K 15.97 0.19 L1 Total 308M 36.8T 28.2T 28.6T 95.1K 1.30 99.80 L0 Total 311M 37.3T 28.3T 28.6T 94.2K 1.32 100.00 Total 50.4M 774G 113G 291G 5.77K 6.85 0.99 Metadata Total Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Rich Ercolani <rincebrain@gmail.com> Closes #14746	2023-04-24 16:55:07 -07:00
Allan Jude	8eae2d214c	Add support for zpool user properties Usage: zpool set org.freebsd:comment="this is my pool" poolname Tests are based on zfs_set's user property tests. Also stop truncating property values at MAXNAMELEN, use ZFS_MAXPROPLEN. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Allan Jude <allan@klarasystems.com> Signed-off-by: Mateusz Piotrowski <mateusz.piotrowski@klarasystems.com> Sponsored-by: Beckhoff Automation GmbH & Co. KG. Sponsored-by: Klara Inc. Closes #11680	2023-04-21 10:20:36 -07:00
rob-wing	3e4ed4213d	Create zap for root vdev And add it to the AVZ, this is not backwards compatible with older pools due to an assertion in spa_sync() that verifies the number of ZAPs of all vdevs matches the number of ZAPs in the AVZ. Granted, the assertion only applies to #DEBUG builds - still, a feature flag is introduced to avoid the assertion, com.klarasystems:vdev_zaps_v2 Notably, this allows to get/set properties on the root vdev: % zpool set user:prop=value <pool> root-0 Before this commit, it was already possible to get/set properties on top-level vdevs with the syntax <type>-<vdev_id> (e.g. mirror-0): % zpool set user:prop=value <pool> mirror-0 This syntax also applies to the root vdev as it is of type 'root' with a vdev_id of 0, root-0. The keyword 'root' serves as an alias for 'root-0'. The following tests have been added: - zpool get all properties from root vdev - zpool set a property on root vdev - verify root vdev ZAP is created Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Rob Wing <rob.wing@klarasystems.com> Sponsored-by: Seagate Technology Submitted-by: Klara, Inc. Closes #14405	2023-04-20 10:07:56 -07:00
Low-power	f9e1c63f8c	Values printed by zpool-iostat(8) should be right-aligned This inappropriate left-alignment was introduced in 7bb7b1f. Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de> Signed-off-by: WHR <msl0000023508@gmail.com> Closes #14751	2023-04-18 11:34:41 -07:00
Rob N	baca06c258	libzfs: add v2 iterator interfaces `f6a0dac84` modified the zfs_iter_* functions to take a new "flags" parameter, and introduced a variety of flags to ask the kernel to limit the results in various ways, reducing the amount of work the caller needed to do to filter out things they didn't need. Unfortunately this change broke the ABI for existing clients (read: older versions of the `zfs` program), and was reverted `399b98198`. `dc95911d2` reintroduced the original patch, with the understanding that a backwards-compatible fix would be made before the 2.2 release branch was tagged. This commit is that fix. This introduces zfs_iter_*_v2 functions that have the new flags argument, and reverts the existing functions to not have the flags parameter, as they were before. The old functions are now reimplemented in terms of the new, with flags set to 0. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: George Wilson <george.wilson@delphix.com> Original-patch-by: George Wilson <george.wilson@delphix.com> Signed-off-by: Rob Norris <rob.norris@klarasystems.com> Sponsored-by: Klara, Inc. Closes #14597	2023-04-10 11:53:02 -07:00
Paul Dagnelie	b66c2a0899	Storage device expansion "silently" fails on degraded vdev When a vdev is degraded or faulted, we refuse to expand it when doing online -e. However, we also don't actually cause the online command to fail, even though the disk didn't expand. This is confusing and misleading, and can result in violated expectations. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Mark Maybee <mark.maybee@delphix.com> Signed-off-by: Paul Dagnelie <pcd@delphix.com> Closes #14145	2023-04-06 10:29:27 -07:00
Tino Reichardt	6ecdd35bdb	Fix "Add colored output to zfs list" Running `zfs list -o avail rpool` resulted in a core dump. This commit will fix this. Run the needed overhead only, when `use_color()` is true. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: George Wilson <gwilson@delphix.com> Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de> Closes #14712	2023-04-05 09:57:01 -07:00
Tino Reichardt	2bd0490faf	Add colored output to zfs list Use a bold header row and colorize the AVAIL column based on the used space percentage of volume. We define these colors: - when > 80%, use yellow - when > 90%, use red Reviewed-by: WHR <msl0000023508@gmail.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ethan Coe-Renner <coerenner1@llnl.gov> Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de> Closes #14621 Closes #14350	2023-03-24 10:24:11 -07:00
Tino Reichardt	7bde396aa2	Colorize zpool iostat output Use a bold header and colorize the space suffixes in iostat by order of magnitude like this: - K is green - M is yellow - G is red - T is lightblue - P is magenta - E is cyan - 0 space is colored gray Reviewed-by: WHR <msl0000023508@gmail.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ethan Coe-Renner <coerenner1@llnl.gov> Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de> Closes #14621 Closes #14459	2023-03-24 10:23:52 -07:00
Richard Yao	d1807f168e	nvpair: Constify string functions After addressing coverity complaints involving `nvpair_name()`, the compiler started complaining about dropping const. This led to a rabbit hole where not only `nvpair_name()` needed to be constified, but also `nvpair_value_string()`, `fnvpair_value_string()` and a few other static functions, plus variable pointers throughout the code. The result became a fairly big change, so it has been split out into its own patch. Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Closes #14612	2023-03-14 15:25:50 -07:00
Pawel Jakub Dawidek	67a1b03791	Implementation of block cloning for ZFS Block Cloning allows to manually clone a file (or a subset of its blocks) into another (or the same) file by just creating additional references to the data blocks without copying the data itself. Those references are kept in the Block Reference Tables (BRTs). The whole design of block cloning is documented in module/zfs/brt.c. Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Christian Schwarz <christian.schwarz@nutanix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Rich Ercolani <rincebrain@gmail.com> Signed-off-by: Pawel Jakub Dawidek <pawel@dawidek.net> Closes #13392	2023-03-10 11:59:53 -08:00
Richard Yao	37edc7ea98	Refactor loop in dump_histogram() The current loop triggers a complaint that we are using an array offset prior to a range check from cpp/offset-use-before-range-check when we are actually calculating maximum and minimum values. I was about to file a false positive report with CodeQL, but after looking at how the code is structured, I really cannot blame CodeQL for mistaking this for a range check. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Closes #14575	2023-03-08 13:52:20 -08:00
Richard Yao	17443e0b20	Cleanup: Remove constant comparisons reported by CodeQL CodeQL's cpp/constant-comparison query from its security-and-extended query set reported 4 instances where we have comparisons that always evaluate the same way. In `draid_config_by_type()`, we have an early `if (nparity == 0)` check that returns `EINVAL`, making a later `if (nparity == 0 \|\| nparity > VDEV_DRAID_MAXPARITY)` partially redundant. The later check prints an error message when parity is 0, but the early check does not. This is not useful feedback, so we move the later check to the place where the early check runs to replace the early check. In `perform_thread_merge()`, we return when `num_threads == 0`. After that block, we do `if (num_threads > 0) {`, which will always be true. We remove the `if` statement. In `sa_modify_attrs()`, we have a loop condition that is `k != 2`, but at the end of the loop, we have `if (k == 0 && hdl->sa_spill)` followed by an else that does a break. The result is that k != 2 will never be evaluated when it is false. We drop the comparison. In `zap_leaf_array_read()`, we have a for loop condition that is `i < ZAP_LEAF_ARRAY_BYTES && len > 0`. However, that loop itself is in a loop that is `while (len > 0)` and while the value of len is decremented inside the loop, when `len == 0`, it will return, such that `len > 0` inside the loop condition will always be true. We drop that part of the condition. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Closes #14575	2023-03-08 13:51:46 -08:00
Richard Yao	9368b3877c	Fix TOCTOU race in zpool_do_labelclear() Coverity reported a TOCTOU race in `zpool_do_labelclear()`. This is not believed to be a real security issue, but fixing it reduces the number of syscalls we do and will prevent other static analyzers from complaining about this. The code is expected to be equivalent. However, under rare circumstances, such as ELOOP, ENAMETOOLONG, ENOMEM, ENOTDIR and EOVERFLOW, we will display the error message that we currently display for the `open()` syscall rather than the one that we currently display for the `stat()` syscall. This is considered to be an improvement. Reported-by: Coverity (CID-1524188) Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Closes #14575	2023-03-08 13:50:51 -08:00
Alexander Motin	a8d83e2a24	More adaptive ARC eviction Traditionally ARC adaptation was limited to MRU/MFU distribution. But for years people with metadata-centric workload demanded mechanisms to also manage data/metadata distribution, that in original ZFS was just a FIFO. As result ZFS effectively got separate states for data and metadata, minimum and maximum metadata limits etc, but it all required manual tuning, was not adaptive and in its heart remained a bad FIFO. This change removes most of existing eviction logic, rewriting it from scratch. This makes MRU/MFU adaptation individual for data and metadata, same as the distribution between data and metadata themselves. Since most of required states separation was already done, it only required to make arcs_size state field specific per data/metadata. The adaptation logic is still based on previous concept of ghost hits, just now it balances ARC capacity between 4 states: MRU data, MRU metadata, MFU data and MFU metadata. To simplify arc_c changes instead of arc_p measured in bytes, this code uses 3 variable arc_meta, arc_pd and arc_pm, representing ARC balance between metadata and data, MRU and MFU for data, and MRU and MFU for metadata respectively as 32-bit fixed point fractions. Since we care about the math result only when need to evict, this moves all the logic from arc_adapt() to arc_evict(), that reduces per-block overhead, since per-block operations are limited to stats collection, now moved from arc_adapt() to arc_access() and using cheaper wmsums. This also allows to remove ugly ARC_HDR_DO_ADAPT flag from many places. This change also removes number of metadata specific tunables, part of which were actually not functioning correctly, since not all metadata are equal and some (like L2ARC headers) are not really evictable. Instead it introduced single opaque knob zfs_arc_meta_balance, tuning ARC's reaction on ghost hits, allowing administrator give more or less preference to metadata without setting strict limits. Some of old code parts like arc_evict_meta() are just removed, because since introduction of ABD ARC they really make no sense: only headers referenced by small number of buffers are not evictable, and they are really not evictable no matter what this code do. Instead just call arc_prune_async() if too much metadata appear not evictable. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Allan Jude <allan@klarasystems.com> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Closes #14359	2023-03-08 11:17:23 -08:00
Rob N	b988f32c70	Better handling for future crypto parameters The intent is that this is like ENOTSUP, but specifically for when something can't be done because we have no support for the requested crypto parameters; eg unlocking a dataset or receiving a stream encrypted with a suite we don't support. It's not intended to be recoverable without upgrading ZFS itself. If the request could be made to work by enabling a feature or modifying some other configuration item, then some other code should be used. load-key: In the future we might have more crypto suites (ie new values for the `encryption` property). Right now trying to load a key on such a future crypto suite will look up suite parameters off the end of the crypto table, resulting in misbehaviour and/or crashes (or, with debug enabled, trip the assertion in `zio_crypt_key_unwrap`). Instead, let's check the value we got from the dataset, and if we can't handle it, abort early. recv: When receiving a raw stream encrypted with an unknown crypto suite, `zfs recv` would report a generic `invalid backup stream` (EINVAL). While technically correct, it's not super helpful, so let's ship a more specific error code and message. Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Signed-off-by: Rob Norris <robn@despairlabs.com> Closes #14577	2023-03-07 14:05:14 -08:00
George Amanakis	12a240ac0b	Fix a typo in `ac2038a` Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Signed-off-by: George Amanakis <gamanakis@gmail.com> Closes #14585 Closes #14592	2023-03-07 13:50:44 -08:00
Richard Yao	bc4d210783	Fix memory leak in ztest This is tripping LeakSanitizer, which causes zloop test failures on pull requests. Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Closes #14583	2023-03-06 15:30:29 -08:00
Tino Reichardt	f9f9bef22f	Update BLAKE3 for using the new impl handling This commit changes the BLAKE3 implementation handling and also the calls to it from the ztest command. Tested-by: Rich Ercolani <rincebrain@gmail.com> Tested-by: Sebastian Gottschall <s.gottschall@dd-wrt.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de> Closes #13741	2023-03-02 13:52:27 -08:00
Rob N	163f3d3a1f	zdb: add decryption support The approach is straightforward: for dataset ops, if a key was offered, find the encryption root and the various encryption parameters, derive a wrapping key if necessary, and then unlock the encryption root. After that all the regular dataset ops will return unencrypted data, and that's kinda the whole thing. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Jorgen Lundman <lundman@lundman.net> Signed-off-by: Rob Norris <robn@despairlabs.com> Closes #11551 Closes #12707 Closes #14503	2023-03-02 13:39:09 -08:00
Paul Dagnelie	d9e64a4030	Improve error message of zfs redact We improve the error message of zfs redact by checking if the target snapshot exists, and if all the redaction snapshots exist. As a future improvement we could iterate over every snapshot provided and use that to determine which one specifically doesn't exist. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Paul Dagnelie <pcd@delphix.com> Closes #11426 Closes #14496	2023-02-21 17:30:05 -08:00
Rob N	ac7648179c	zdb: zero-pad checksum output The leading zeroes are part of the checksum so we should show them. Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Rob Norris <robn@despairlabs.com> Closes #14464	2023-02-07 13:48:22 -08:00
George Amanakis	ac2038a19c	Teach zdb about DMU_OT_ERROR_LOG objects With the persistent error log feature we need to account for spa_errlog_{scrub, last} containing mappings to other error log objects, which need to be marked as in-use as well. Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: George Amanakis <gamanakis@gmail.com> Closes #14442 Closes #14434	2023-02-02 15:17:37 -08:00
rob-wing	326f1e3d88	zfs_main.c: fix unused variable error with GCC zfs_setproctitle_init() is stubbed out on FreeBSD. Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Ameer Hamza <ahamza@ixsystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Signed-off-by: Rob Wing <rob.fx907@gmail.com> Closes #14441	2023-02-02 15:16:40 -08:00
Ameer Hamza	05b72415d1	Fix console progress reporting for recursive send After commit `19d3961`, progress reporting (-v) with replication flag enabled does not report the progress on the console. This commit fixes the issue by updating the logic to check for pa->progress instead of pa_verbosity in send_progress_thread(). Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Ameer Hamza <ahamza@ixsystems.com> Closes #14448	2023-02-02 15:09:57 -08:00
Brian Behlendorf	c85ac731a0	Improve resilver ETAs When resilvering the estimated time remaining is calculated using the average issue rate over the current pass. Where the current pass starts when a scan was started, or restarted, if the pool was exported/imported. For dRAID pools in particular this can result in wildly optimistic estimates since the issue rate will be very high while scanning when non-degraded regions of the pool are scanned. Once repair I/O starts being issued performance drops to a realistic number but the estimated performance is still significantly skewed. To address this we redefine a pass such that it starts after a scanning phase completes so the issue rate is more reflective of recent performance. Additionally, the zfs_scan_report_txgs module option can be set to reset the pass statistics more often. Reviewed-by: Akash B <akash-b@hpe.com> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #14410	2023-01-25 11:28:54 -08:00
Richard Yao	73968defdd	Reject streams that set ->drr_payloadlen to unreasonably large values In the zstream code, Coverity reported: "The argument could be controlled by an attacker, who could invoke the function with arbitrary values (for example, a very high or negative buffer size)." It did not report this in the kernel. This is likely because the userspace code stored this in an int before passing it into the allocator, while the kernel code stored it in a uint32_t. However, this did reveal a potentially real problem. On 32-bit systems and systems with only 4GB of physical memory or less in general, it is possible to pass a large enough value that the system will hang. Even worse, on Linux systems, the kernel memory allocator is not able to support allocations up to the maximum 4GB allocation size that this allows. This had already been limited in userspace to 64MB by `ZFS_SENDRECV_MAX_NVLIST`, but we need a hard limit in the kernel to protect systems. After some discussion, we settle on 256MB as a hard upper limit. Attempting to receive a stream that requires more memory than that will result in E2BIG being returned to user space. Reported-by: Coverity (CID-1529836) Reported-by: Coverity (CID-1529837) Reported-by: Coverity (CID-1529838) Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Closes #14285	2023-01-23 13:16:22 -08:00
rob-wing	69f024a56e	Configure zed's diagnosis engine with vdev properties Introduce four new vdev properties: checksum_n checksum_t io_n io_t These properties can be used for configuring the thresholds of zed's diagnosis engine and are interpreted as <N> events in T <seconds>. When this property is set to a non-default value on a top-level vdev, those thresholds will also apply to its leaf vdevs. This behavior can be overridden by explicitly setting the property on the leaf vdev. Note that these properties do not persist across vdev replacement. For this reason, it is advisable to set the property on the top-level vdev instead of the leaf vdev. The default values for zed's diagnosis engine (10 events, 600 seconds) remain unchanged. Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Allan Jude <allan@klarasystems.com> Signed-off-by: Rob Wing <rob.wing@klarasystems.com> Sponsored-by: Seagate Technology LLC Closes #13805	2023-01-23 13:14:25 -08:00
Ameer Hamza	19d3961589	Use setproctitle to report progress of zfs send This allows parsing of zfs send progress by checking the process title. Doing so requires some changes to the send code in libzfs_sendrecv.c; primarily these changes move some of the accounting around, to allow for the code to be verbose as normal, or set the process title. Unlike BSD, setproctitle() isn't standard in Linux; thus, borrowed it from libbsd with slight modifications. Authored-by: Sean Eric Fagan <sef@FreeBSD.org> Co-authored-by: Ryan Moeller <ryan@iXsystems.com> Co-authored-by: Ameer Hamza <ahamza@ixsystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Ameer Hamza <ahamza@ixsystems.com> Closes #14376	2023-01-17 10:17:35 -08:00
Rob Wing	7a85f58db6	zpool-set: print error message when pool or vdev is not valid Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Allan Jude <allan@klarasystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Rob Wing <rob.wing@klarasystems.com> Sponsored-by: Seagate Technology Submitted-by: Klara, Inc. Closes #14310	2023-01-17 09:47:24 -08:00
Rob Wing	a0276f7048	zpool-set: update usage text Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Allan Jude <allan@klarasystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Rob Wing <rob.wing@klarasystems.com> Sponsored-by: Seagate Technology Submitted-by: Klara, Inc. Closes #14310	2023-01-17 09:46:05 -08:00
rob-wing	6f2ffd272c	zpool: do guid-based comparison in is_vdev_cb() is_vdev_cb() uses string comparison to find a matching vdev and will fall back to comparing the guid via a string. These changes drop the string comparison and compare the guids instead. Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Reviewed-by: Allan Jude <allan@klarasystems.com> Signed-off-by: Rob Wing <rob.wing@klarasystems.com> Co-authored-by: Rob Wing <rob.wing@klarasystems.com> Sponsored-by: Seagate Technology Submitted-by: Klara, Inc. Closes #14311	2023-01-11 15:14:35 -08:00
Brian Behlendorf	0c8fbe5b6a	ztest: update ztest_dmu_snapshot_create_destroy() ECHRNG is returned when the channel program encounters a runtime error. For example, this can happen when a snapshot doesn't exist. We handle this error the same way as the existing EEXIST and ENOENT error checks. Additionally, improve the internal debug message to include the error describing why a pool couldn't be opened. Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #14351	2023-01-10 13:27:48 -08:00
Brian Behlendorf	549aafb7c8	ztest: ztest_dsl_prop_set_uint64() ENOSPC consistency It is possible for ztest_dsl_prop_set_uint64() to fail with ENOSPC and this needs to be handled consistently. Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #14351	2023-01-10 13:27:48 -08:00
Brian Behlendorf	f7788883ab	ztest: reduce `zpool split` frequency There's no need to so aggressively test splitting a pool. Reduce the occurrence of this test to once every 10 seconds. Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #14351	2023-01-10 13:27:48 -08:00
Brian Behlendorf	4208a052c2	ztest: update expectation for sparing a special device Commit `c23738c70e` modified the expected behavior of attach to prevent hot spares from being used as special vdev replacements. We update ztest's expectations accordingly to prevent it from failing when testing the updated behavior. Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #14351	2023-01-10 13:26:44 -08:00
Ameer Hamza	5091867ee6	zed: add hotplug support for spare vdevs This commit adds support for spare vdev hotplug. The spare vdev associated with all the pools will be marked as "Removed" when the drive is physically detached and will become "Available" when the drive is reattached. Currently, the spare vdev status does not change on the drive removal and the same is the case with reattachment. Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ameer Hamza <ahamza@ixsystems.com> Closes #14295	2023-01-09 12:43:03 -08:00
Alexander Motin	792a6ee462	Update arc_summary and arcstat outputs Recent ARC commits added new statistic counters, such as iohits, uncached state, etc. Represent those. Also some of previously reported numbers were confusing or even made no sense. Cleanup and restructure existing reports. Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Issue #14115 Issue #14123 Issue #14243 Closes #14320	2023-01-05 09:29:13 -08:00
Matthew Ahrens	018f26041d	deadlock between spa_errlog_lock and dp_config_rwlock There is a lock order inversion deadlock between `spa_errlog_lock` and `dp_config_rwlock`: A thread in `spa_delete_dataset_errlog()` is running from a sync task. It is holding the `dp_config_rwlock` for writer (see `dsl_sync_task_sync()`), and waiting for the `spa_errlog_lock`. A thread in `dsl_pool_config_enter()` is holding the `spa_errlog_lock` (see `spa_get_errlog_size()`) and waiting for the `dp_config_rwlock` (as reader). Note that this was introduced by #12812. This commit addresses this by defining the lock ordering to be dp_config_rwlock first, then spa_errlog_lock / spa_errlist_lock. spa_get_errlog() and spa_get_errlog_size() can acquire the locks in this order, and then process_error_block() and get_head_and_birth_txg() can verify that the dp_config_rwlock is already held. Additionally, a buffer overrun in `spa_get_errlog()` is corrected. Many code paths didn't check if `*count` got to zero, instead continuing to overwrite past the beginning of the userspace buffer at `uaddr`. Tested by having some errors in the pool (via `zinject -t data /path/to/file`), one thread running `zpool iostat 0.001`, and another thread running `zfs destroy` (in a loop, although it hits the first time). This reproduces the problem easily without the fix, and works with the fix. Reviewed-by: Mark Maybee <mark.maybee@delphix.com> Reviewed-by: George Amanakis <gamanakis@gmail.com> Reviewed-by: George Wilson <gwilson@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Matthew Ahrens <mahrens@delphix.com> Closes #14239 Closes #14289	2022-12-22 11:48:49 -08:00
Brian Behlendorf	b4cd4fe1aa	Revert "zdb: zdb_ddt_leak_init() reads uninitialized memory..." This reverts commit `d30db519af`. With this change applied zloop.sh fails reliably with the following ASSERT. zio_wait(zio_claim(NULL, zcb->zcb_spa, refcnt ? 0 : spa_min_claim_txg( zcb->zcb_spa), bp, NULL, NULL, ZIO_FLAG_CANFAIL)) == 0 (0x2 == 0x0) ASSERT at cmd/zdb/zdb.c:5452:zdb_count_block() Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #14306	2022-12-21 09:17:00 -08:00
Allan Jude	dc95911d21	zfs list: Allow more fields in ZFS_ITER_SIMPLE mode If the fields to be listed and sorted by are constrained to those populated by dsl_dataset_fast_stat(), then zfs list is much faster, as it does not need to open each objset and read its properties. A previous optimization by Pawel Dawidek (`0cee24064a`) took advantage of this to make listing snapshot names sorted only by name much faster. However, it was limited to `-o name -s name`; this work extends this optimization to work with: - name - guid - createtxg - numclones - inconsistent - redacted - origin and could be further extended to any other properties supported by dsl_dataset_fast_stat() or similar, that do not require extra locking or reading from disk. This was committed before (9a9e2e343dfa2af28bf7910de77ae73aa006de62), but was reverted due to a regression when used with an older kernel. If the kernel does not populate zc->zc_objset_stats, we now fall back to getting the properties via the slower interface, to avoid problems with newer userland and older kernels. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Allan Jude <allan@klarasystems.com> Closes #14110	2022-12-13 17:27:54 -08:00
Marcel Menzel	70ac2654f5	Change ZEVENT_POOL_GUID to ZEVENT_POOL to display pool names Outgoing mails for ZFS pool events include the pool GUID, but not the actual pool name. Let's change this for better readability, as it is already done in the mails for finished pool resilvers. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Signed-off-by: Marcel Menzel <mail@mcl.gg> Closes #14272	2022-12-13 17:26:10 -08:00
Richard Yao	d31a7cb4fa	Address theoretical uninitialized variable usage in zstream Coverity has long complained about the checksum being uninitialized if an END record is processed before its BEGIN record. This should not happen, but there was no code to check for it. I had left this unfixed since it was a low priority issue, but then `9f4ede63d2` added another instance of this. I am making an effort to "hold the line" to keep new coverity defect reports from going unaddressed, so I find myself forced to fix this much earlier than I had originally planned to address it. The solution is to maintain a counter and a flag. Then use VERIFY statements to verify the following runtime constraints: * Every record either has a corresponding BEGIN record, is a BEGIN record or is the end of stream END record for replication streams. * BEGIN records cannot be nested. i.e. There must be an END record before another BEGIN record may be seen. Failure to meet these constraints will cause the program to exit. This is sufficient to ensure that the checksum is never accessed when uninitialized. Reported-by: Coverity (CID 1524578) Reported-by: Coverity (CID 1524633) Reported-by: Coverity (CID 1527295) Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Damian Szuberski <szuberskidamian@gmail.com> Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Closes #14176	2022-12-12 10:40:05 -08:00
Richard Yao	f954ea26a6	zdb: Handle theoretical buffer overflow when printing float CodeQL pointed out that for extreme floating point values, `sprintf()` will overwrite a 32 character buffer. It cited 1e304 as an example, which causes `sprintf()` to print 308 characters. In practice, the numbers should never exceed 100, so this should not happen. To silence the warning and also handle unexpected situations, we change the code to use `snprintf()`. This was missed during my audit of our use of `sprintf()`, since I did not think to consider extreme floating point representations. It also really should not happen, so this change is purely defensive programming. This was found by CodeQL's cpp/overrunning-write-with-float check. Reviewed-by: Damian Szuberski <szuberskidamian@gmail.com> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Closes #14264	2022-12-08 14:15:15 -08:00
Richard Yao	d30db519af	zdb: zdb_ddt_leak_init() reads uninitialized memory when birth == 0 This was written by Jeff Bonwick and was committed to OpenSolaris on November 1, 2009. It appears that Jeff meant to continue the outer loop iteration when `ddp->ddp_phys_birth == 0`, but put his check inside the inner loop. This causes a pointer to uninitialized memory to be passed to ddt_lookup() inside a VERIFY() statement whenever that condition is true. Reported-by: Coverity (CID 1524462) Reviewed-by: Damian Szuberski <szuberskidamian@gmail.com> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Closes #14264	2022-12-08 14:15:10 -08:00
Richard Yao	2709ace096	ztest: comparisons against errno should not assign to it `888914486e` introduced this regression. I used cscope to verify that there are no other instances of this in the codebase. This is one of the few bugs that are extremely easy to identify using cscope. Reviewed-by: Damian Szuberski <szuberskidamian@gmail.com> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Closes #14264	2022-12-08 14:15:04 -08:00
Richard Yao	ba87ed1410	Fix potential buffer overflow in zpool command The ZPOOL_SCRIPTS_PATH environment variable can be passed here. This allows for arbitrarily long strings to be passed to sprintf(), which can overflow the buffer. I missed this in my earlier audit of the codebase. CodeQL's cpp/unbounded-write check caught this. Reviewed-by: Damian Szuberski <szuberskidamian@gmail.com> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Closes #14264	2022-12-08 14:14:30 -08:00
Richard Yao	ecccaede68	zdb: Fix big parameter passed by value This is not in performance critical code, but static analyzers will complain about it, so let's switch to pass by pointer here. Reported-by: Coverity (CID-1524384) Reviewed-by: Damian Szuberski <szuberskidamian@gmail.com> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Closes #14263	2022-12-08 13:52:53 -08:00
Richard Yao	aaa9a6700f	Cleanup: zhack should not declare function prototypes in main() Instead, it should include the proper header. CodeQL caught this. Reviewed-by: Damian Szuberski <szuberskidamian@gmail.com> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Closes #14263	2022-12-08 13:51:24 -08:00
szubersk	3c1e1933b6	Fix GCC 12 compilation errors Squelch false positives reported by GCC 12 with UBSan. Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: szubersk <szuberskidamian@gmail.com> Closes #14150	2022-11-30 13:45:53 -08:00
Richard Yao	887fb37843	zdb: Silence Coverity complaint about verify_livelist_allocs() svb is declared on the stack. We then set parts of svb.svb_dva with DVA_SET_VDEV(), DVA_SET_OFFSET() and DVA_SET_ASIZE(). However, the DVA contains other fields for pad, GRID and G. When setting the fields we use, we technically read uninitialized bits from the fields we do not use. This makes Coverity and Clang's Static Analyzer complain. Presumably, other static analyzers might complain too. There is no real bug here, but we are still technically reading undefined data and unless we stop doing that, static analyzers will complain about it in perpetuum and this could obscure real issues. We silence the static analyzer complaints by using a 0 struct initializer. Reported by: Coverity (CID 1524627) Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Closes #14210	2022-11-29 10:00:45 -08:00
Ameer Hamza	e996c502e4	zed: unclean disk attachment faults the vdev If the attached disk already contains a vdev GUID, it means the disk is not clean. In such a scenario, the physical path would be a match that makes the disk faulted when trying to online it. So, we would only want to proceed if either GUID matches with the last attached disk or the disk is in a clean state. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Ameer Hamza <ahamza@ixsystems.com> Closes #14181	2022-11-29 09:24:10 -08:00
наб	3069872ef5	cmd: zfs: fix missing mention of zfs diff -h Fixes: `344bbc82e7` Reviewed-by: Damian Szuberski <szuberskidamian@gmail.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> Closes #14224	2022-11-28 13:37:07 -08:00
Paul Dagnelie	9f4ede63d2	Add ability to recompress send streams with new compression algorithm As new compression algorithms are added to ZFS, it could be useful for people to recompress data with new algorithms. There is currently no mechanism to do this aside from copying the data manually into a new filesystem with the new algorithm enabled. This tool allows the transformation to happen through zfs send, allowing it to be done efficiently to remote systems and in an incremental fashion. A new zstream command is added that decompresses WRITE records and then recompresses them with a provided algorithm, and then re-emits the modified send stream. It may also be possible to re-compress embedded block pointers, but that was not attempted for the initial version. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Signed-off-by: Paul Dagnelie <pcd@delphix.com> Closes #14106	2022-11-10 15:23:46 -08:00
Mohamed Tawfik	41715771b5	Adds the `-p` option to `zfs holds` This allows for printing a machine-readable, accurate to the second, hold creation time in the form of a unix epoch timestamp. Additionally, updates relevant documentation and man pages accordingly. Reviewed-by: Allan Jude <allan@klarasystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Mohamed Tawfik <m_tawfik@aucegypt.edu> Closes #13690 Closes #14152	2022-11-08 10:08:21 -08:00
Richard Yao	f47f6a055d	Address warnings about possible division by zero from clangsa * The complaint in ztest_replay_write() is only possible if something went horribly wrong. An assertion will silence this and if it goes off, we will know that something is wrong. * The complaint in spa_estimate_metaslabs_to_flush() is not impossible, but seems very unlikely. We resolve this by passing the value from the `MIN()` that does not go to infinity when the variable is zero. There was a third report from Clang's scan-build, but that was a definite false positive and disappeared when checked again through Clang's static analyzer with Z3 refutation via CodeChecker. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Closes #14124	2022-11-03 09:58:14 -07:00
Brooks Davis	b9041e1f27	Use intptr_t when storing an integer in a pointer Cast the integer type to (u)intptr_t before casting to "void *". In CHERI C/C++ we warn on bare casts from integers to pointers to catch attempts to create pointers out of thin air. We allow the warning to be suppressed with a suitable cast through (u)intptr_t. Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Signed-off-by: Brooks Davis <brooks.davis@sri.com> Closes #14131	2022-11-03 09:52:23 -07:00
Richard Yao	97143b9d31	Introduce kmem_scnprintf() `snprintf()` is meant to protect against buffer overflows, but operating on the buffer using its return value, possibly by calling it again, can cause a buffer overflow, because it will return how many characters it would have written if it had enough space even when it did not. In a number of places, we repeatedly call snprintf() by successively incrementing a buffer offset and decrementing a buffer length, by its return value. This is a potentially unsafe usage of `snprintf()` whenever the buffer length is reached. CodeQL complained about this. To fix this, we introduce `kmem_scnprintf()`, which will return 0 when the buffer is zero or the number of written characters, minus 1 to exclude the NULL character, when the buffer was too small. In all other cases, it behaves like snprintf(). The name is inspired by the Linux and XNU kernels' `scnprintf()`. The implementation was written before I thought to look at `scnprintf()` and had a good name for it, but it turned out to have identical semantics to the Linux kernel version. That led to the name, `kmem_scnprintf()`. CodeQL only catches this issue in loops, so repeated use of snprintf() outside of a loop was not caught. As a result, a thorough audit of the codebase was done to examine all instances of `snprintf()` usage for potential problems and a few were caught. Fixes for them are included in this patch. Unfortunately, ZED is one of the places where `snprintf()` is potentially used incorrectly. Since using `kmem_scnprintf()` in it would require changing how it is linked, we modify its usage to make it safe, no matter what buffer length is used. In addition, there was a bug in the use of the return value where the NULL format character was not being written by pwrite(). That has been fixed. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Closes #14098	2022-10-29 13:05:11 -07:00
Richard Yao	2e08df84d8	Cleanup dump_bookmarks() Assertions are meant to check assumptions, but the way that this assertion is written does not check an assumption, since it is provably always true. Removing the assertion will cause a compiler warning (made into an error by -Werror) about printing up to 512 bytes to a 256-byte buffer, so instead, we change the assertion to verify the assumption that we never do a snprintf() that is truncated to avoid overrunning the 256-byte buffer. This was caught by an audit of the codebase to look for misuse of `snprintf()` after CodeQL reported that we had misused `snprintf()`. An explanation of how snprintf() can be misused is here: https://www.redhat.com/en/blog/trouble-snprintf This particular instance did not misuse `snprintf()`, but it was caught by the audit anyway. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Closes #14098	2022-10-29 13:05:02 -07:00
Aleksa Sarai	dbf6108b4d	zfs_rename: support RENAME_* flags Implement support for Linux's RENAME_* flags (for renameat2). Aside from being quite useful for userspace (providing race-free ways to exchange paths and implement mv --no-clobber), they are used by overlayfs and are thus required in order to use overlayfs-on-ZFS. In order for us to represent the new renameat2(2) flags in the ZIL, we create two new transaction types for the two flags which need transactional-level support (RENAME_EXCHANGE and RENAME_WHITEOUT). RENAME_NOREPLACE does not need any ZIL support because we know that if the operation succeeded before creating the ZIL entry, there was no file to be clobbered and thus it can be treated as a regular TX_RENAME. Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Pavel Snajdr <snajpa@snajpa.net> Signed-off-by: Aleksa Sarai <cyphar@cyphar.com> Closes #12209 Closes #14070	2022-10-28 09:49:20 -07:00
Andrew Innes	e09fdda977	Fix multiplication converted to larger type This fixes the instances of the "Multiplication result converted to larger type" alert that codeQL scanning found. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Signed-off-by: Andrew Innes <andrew.c12@gmail.com> Closes #14094	2022-10-28 09:30:37 -07:00
Ameer Hamza	0b2428da20	zed: Avoid core dump if wholedisk property does not exist zed aborts and dumps core in vdev_whole_disk_from_config() if wholedisk property does not exist. make_leaf_vdev() adds the property but there may be already pools that don't have the wholedisk in the label. Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Signed-off-by: Ameer Hamza <ahamza@ixsystems.com> Closes #14062	2022-10-21 10:46:38 -07:00
Richard Yao	4ecd96371b	Fix theoretical use of uninitialized values Clang's static analyzer complains about this. In get_configs(), if we have an invalid configuration that has no top level vdevs, we can read a couple of uninitialized variables. Aborting upon seeing this would break the userland tools for healthy pools, so we instead initialize the two variables to 0 to allow the userland tools to continue functioning for the pools with valid configurations. In zfs_do_wait(), if no wait activities are enabled, we read an uninitialized error variable. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Closes #14043	2022-10-19 17:10:21 -07:00
Richard Yao	219cf0f928	Fix userland memory leak in zfs_do_send() Clang 15's static analyzer caught this. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Closes #14045	2022-10-19 17:08:33 -07:00
Richard Yao	aa822e4d9c	Fix NULL pointer dereference in zdb Clang's static analyzer complained that we dereference a NULL pointer in dump_path() if we return 0 when there is an error. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Closes #14044	2022-10-18 15:34:24 -07:00
Richard Yao	09453dea6a	ZED: Fix uninitialized value reads Coverity complained about a couple of uninitialized value reads in ZED. * zfs_deliver_dle() can pass an uninitialized string to zed_log_msg() * An uninitialized sev.sigev_signo is passed to timer_create() The former would log garbage while the latter is not a real issue, but we might as well suppress it by initializing the field to 0 for consistency's sake. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Closes #14047	2022-10-18 12:42:14 -07:00
Tino Reichardt	27218a32fc	Fix declarations of non-global variables This patch adds the `static` keyword to non-global variables, which were found by the analysis tool smatch. Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de> Closes #13970	2022-10-18 11:05:32 -07:00
Alan Somers	a1034ee909	zstream: allow decompress to fix metadata for uncompressed records If a record is uncompressed on-disk but the block pointer insists otherwise, reading it will return EIO. This commit adds an "off" type to the "zstream decompress" command. Using it will set the compression field in a zfs stream to "off" without changing the record's data. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: Alan Somers <asomers@FreeBSD.org> Sponsored by: Axcient Closes #13997	2022-10-14 13:40:00 -07:00
Richard Yao	6a42939fcd	Cleanup: Address Clang's static analyzer's unused code complaints These were categorized as the following: * Dead assignment 23 * Dead increment 4 * Dead initialization 6 * Dead nested assignment 18 Most of these are harmless, but since actual issues can hide among them, we correct them. That said, there were a few return values that were being ignored that appeared to merit some correction: * `destroy_callback()` in `cmd/zfs/zfs_main.c` ignored the error from `destroy_batched()`. We handle it by returning -1 if there is an error. * `zfs_do_upgrade()` in `cmd/zfs/zfs_main.c` ignored the error from `zfs_for_each()`. We handle it by doing a binary OR of the error value from the subsequent `zfs_for_each()` call to the existing value. This is how errors are mostly handled inside `zfs_for_each()`. The error value here is passed to exit from the zfs command, so doing a binary or on it is better than what we did previously. * `get_zap_prop()` in `module/zfs/zcp_get.c` ignored the error from `dsl_prop_get_ds()` when the property is not of type string. We return an error when it does. There is a small concern that the `zfs_get_temporary_prop()` call would handle things, but in the case that it does not, we would be pushing an uninitialized numval onto the lua stack. It is expected that `dsl_prop_get_ds()` will succeed anytime that `zfs_get_temporary_prop()` does, so that not giving it a chance to fix things is not a problem. * `draid_merge_impl()` in `tests/zfs-tests/cmd/draid.c` used `nvlist_add_nvlist()` twice in ways in which errors are expected to be impossible, so we switch to `fnvlist_add_nvlist()`. A few notable ones did not merit use of the return value, so we suppressed it with `(void)`: * `write_free_diffs()` in `lib/libzfs/libzfs_diff.c` ignored the error value from `describe_free()`. A look through the commit history revealed that this was intentional. 
* `arc_evict_hdr()` in `module/zfs/arc.c` did not need to use the returned handle from `arc_hdr_realloc()` because it is already referenced in lists. * `spa_vdev_detach()` in `module/zfs/spa.c` has a comment explicitly saying not to use the error from `vdev_label_init()` because whatever causes the error could be the reason why a detach is being done. Unfortunately, I am not presently able to analyze the kernel modules with Clang's static analyzer, so I could have missed some cases of this. In cases where reports were present in code that is duplicated between Linux and FreeBSD, I made a conscious effort to fix the FreeBSD version too. After this commit is merged, regressions like `dee8934` should become extremely obvious with Clang's static analyzer since a regression would appear in the results as the only instance of unused code. That assumes that Coverity does not catch the issue first. My local branch with fixes from all of my outstanding non-draft pull requests shows 118 reports from Clang's static analyzer after this patch. That is down by 51 from 169. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Cedric Berger <cedric@precidata.com> Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Closes #13986	2022-10-14 13:37:54 -07:00
Richard Yao	ab8d9c1783	Cleanup: 64-bit kernel module parameters should use fixed width types Various module parameters such as `zfs_arc_max` were originally `uint64_t` on OpenSolaris/Illumos, but were changed to `unsigned long` for Linux compatibility because Linux's kernel default module parameter implementation did not support 64-bit types on 32-bit platforms. This caused problems when porting OpenZFS to Windows because its LLP64 memory model made `unsigned long` a 32-bit type on 64-bit, which created the undesirable situation that parameters that should accept 64-bit values could not on 64-bit Windows. Upon inspection, it turns out that the Linux kernel module parameter interface is extensible, such that we are allowed to define our own types. Rather than maintaining the original type change via hacks to continue shrinking module parameters on 32-bit Linux, we implement support for 64-bit module parameters on Linux. After doing a review of all 64-bit kernel parameters (found via the man page and also proposed changes by Andrew Innes), the kernel module parameters fell into a few groups: Parameters that were originally 64-bit on Illumos: * dbuf_cache_max_bytes * dbuf_metadata_cache_max_bytes * l2arc_feed_min_ms * l2arc_feed_secs * l2arc_headroom * l2arc_headroom_boost * l2arc_write_boost * l2arc_write_max * metaslab_aliquot * metaslab_force_ganging * zfetch_array_rd_sz * zfs_arc_max * zfs_arc_meta_limit * zfs_arc_meta_min * zfs_arc_min * zfs_async_block_max_blocks * zfs_condense_max_obsolete_bytes * zfs_condense_min_mapping_bytes * zfs_deadman_checktime_ms * zfs_deadman_synctime_ms * zfs_initialize_chunk_size * zfs_initialize_value * zfs_lua_max_instrlimit * zfs_lua_max_memlimit * zil_slog_bulk Parameters that were originally 32-bit on Illumos: * zfs_per_txg_dirty_frees_percent Parameters that were originally `ssize_t` on Illumos: * zfs_immediate_write_sz Note that `ssize_t` is `int32_t` on 32-bit and `int64_t` on 64-bit. It has been upgraded to 64-bit. 
Parameters that were `long`/`unsigned long` because of Linux/FreeBSD influence: * l2arc_rebuild_blocks_min_l2size * zfs_key_max_salt_uses * zfs_max_log_walking * zfs_max_logsm_summary_length * zfs_metaslab_max_size_cache_sec * zfs_min_metaslabs_to_flush * zfs_multihost_interval * zfs_unflushed_log_block_max * zfs_unflushed_log_block_min * zfs_unflushed_log_block_pct * zfs_unflushed_max_mem_amt * zfs_unflushed_max_mem_ppm New parameters that do not exist in Illumos: * l2arc_trim_ahead * vdev_file_logical_ashift * vdev_file_physical_ashift * zfs_arc_dnode_limit * zfs_arc_dnode_limit_percent * zfs_arc_dnode_reduce_percent * zfs_arc_meta_limit_percent * zfs_arc_sys_free * zfs_deadman_ziotime_ms * zfs_delete_blocks * zfs_history_output_max * zfs_livelist_max_entries * zfs_max_async_dedup_frees * zfs_max_nvlist_src_size * zfs_rebuild_max_segment * zfs_rebuild_vdev_limit * zfs_unflushed_log_txg_max * zfs_vdev_max_auto_ashift * zfs_vdev_min_auto_ashift * zfs_vnops_read_chunk_size * zvol_max_discard_blocks Rather than clutter the lists with commentary, the module parameters that need comments are repeated below. A few parameters were defined in Linux/FreeBSD specific code, where the use of ulong/long is not an issue for portability, so we leave them alone: * zfs_delete_blocks * zfs_key_max_salt_uses * zvol_max_discard_blocks The documentation for a few parameters was found to be incorrect: * zfs_deadman_checktime_ms - incorrectly documented as int * zfs_delete_blocks - not documented as Linux only * zfs_history_output_max - incorrectly documented as int * zfs_vnops_read_chunk_size - incorrectly documented as long * zvol_max_discard_blocks - incorrectly documented as ulong The documentation for these has been fixed, alongside the changes to document the switch to fixed width types. In addition, several kernel module parameters were percentages or held ashift values, so being 64-bit never made sense for them. 
They have been downgraded to 32-bit: * vdev_file_logical_ashift * vdev_file_physical_ashift * zfs_arc_dnode_limit_percent * zfs_arc_dnode_reduce_percent * zfs_arc_meta_limit_percent * zfs_per_txg_dirty_frees_percent * zfs_unflushed_log_block_pct * zfs_vdev_max_auto_ashift * zfs_vdev_min_auto_ashift Of special note are `zfs_vdev_max_auto_ashift` and `zfs_vdev_min_auto_ashift`, which were already defined as `uint64_t`, and passed to the kernel as `ulong`. This is inherently buggy on big endian 32-bit Linux, since the values would not be written to the correct locations. 32-bit FreeBSD was unaffected because its sysctl code correctly treated this as a `uint64_t`. Lastly, a code comment suggests that `zfs_arc_sys_free` is Linux-specific, but there is nothing to indicate to me that it is Linux-specific. Nothing was done about that. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Jorgen Lundman <lundman@lundman.net> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Original-patch-by: Andrew Innes <andrew.c12@gmail.com> Original-patch-by: Jorgen Lundman <lundman@lundman.net> Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Closes #13984 Closes #14004	2022-10-13 10:03:29 -07:00
Richard Yao	a6ccb36b94	Add defensive assertions Coverity complains about possible bugs involving referencing NULL return values and division by zero. The division by zero bugs require that a block pointer be corrupt, either from in-memory corruption, or on-disk corruption. The NULL return value complaints are only bugs if assumptions that we make about the state of data structures are wrong. Some seem impossible to be wrong and thus are false positives, while others are hard to analyze. Rather than dismiss these as false positives by assuming we know better, we add defensive assertions to let us know when our assumptions are wrong. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Closes #13972	2022-10-12 11:25:18 -07:00
Serapheim Dimitropoulos	e5646c5e37	zvol_wait logic may terminate prematurely Setups that have a lot of zvols may see zvol_wait terminate prematurely even though the script is still making progress. For example, we have a customer that called zvol_wait for ~7100 zvols and by the last iteration of that script it was still waiting on ~2900. Similarly another one called zvol_wait for 2200 and by the time the script terminated there were only 50 left. This patch adjusts the logic to stay within the outer loop of the script if we are making any progress whatsoever. Reviewed-by: George Wilson <gwilson@delphix.com> Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com> Reviewed-by: Don Brady <don.brady@delphix.com> Signed-off-by: Serapheim Dimitropoulos <serapheim@delphix.com> Closes #13998	2022-10-11 12:12:04 -07:00
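The "keep going while we are making any progress" rule described in e5646c5e37 can be sketched as follows. This is an illustration only, written in C rather than the shell of the real `zvol_wait` script; `demo_wait_with_progress()`, `demo_countdown()`, `demo_stalled()` and the idle-poll limit are hypothetical names and parameters, not taken from the script.

```c
#include <stddef.h>

/* Hypothetical callback: reports how many items are still pending. */
typedef size_t (*demo_count_fn)(void *arg);

/*
 * Poll until nothing is pending, giving up only after max_idle_polls
 * consecutive polls without any progress. A real waiter would sleep
 * between polls; that is omitted here to keep the sketch deterministic.
 * Returns the number of items still pending (0 means everything arrived).
 */
static size_t
demo_wait_with_progress(demo_count_fn count_remaining, void *arg,
    unsigned int max_idle_polls)
{
	size_t remaining = count_remaining(arg);
	unsigned int idle = 0;

	while (remaining > 0 && idle < max_idle_polls) {
		size_t now = count_remaining(arg);

		if (now < remaining)
			idle = 0;	/* progress was made: reset the counter */
		else
			idle++;		/* no progress during this poll */
		remaining = now;
	}
	return (remaining);
}

/* Demo callback: one pending item completes on every poll. */
static size_t
demo_countdown(void *arg)
{
	size_t *left = arg;

	if (*left > 0)
		(*left)--;
	return (*left);
}

/* Demo callback: nothing ever completes. */
static size_t
demo_stalled(void *arg)
{
	return (*(size_t *)arg);
}
```

With `demo_countdown` the wait ends with nothing pending no matter how many items there were at the start, while `demo_stalled` gives up after the configured number of idle polls.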
Richard Yao	72c99dc959	Handle possible null pointers from malloc/strdup/strndup() GCC 12.1.1_p20220625's static analyzer caught these. Of the two in the btree test, one had previously been caught by Coverity and Smatch, but GCC flagged it as a false positive. Upon examining how other test cases handle this, the solution was changed from `ASSERT3P(node, !=, NULL);` to using `perror()` to be consistent with the other fixes done to the ZTS code. That approach was also used in ZED since I did not see a better way of handling this there. Also, upon inspection, additional unchecked pointers from malloc()/calloc()/strdup() were found in ZED, so those were handled too. In other parts of the code, the existing methods to avoid issues from memory allocators returning NULL were used, such as using `umem_alloc(size, UMEM_NOFAIL)` or returning `ENOMEM`. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Closes #13979	2022-10-06 17:18:40 -07:00
shodanshok	062d3d056b	Remove ambiguity on demand vs prefetch stats reported by arc_summary arc_summary currently lists prefetch stats as "demand prefetch". However, a hit/miss can be due to demand or prefetch, not both. To remove any confusion, this patch removes the "Demand" word from the affected lines. Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: Gionatan Danti <g.danti@assyoma.it> Closes #13985	2022-10-04 11:00:02 -07:00
Umer Saleem	d9ac17a57f	Expose libzutil error info in libpc_handle_t In libzutil, for zpool_search_import and zpool_find_config, we use libpc_handle_t internally, which does not maintain error code and it is not exposed in the interface. Due to this, the error information is not propagated to the caller. Instead, an error message is printed on stderr. This commit adds lpc_error field in libpc_handle_t and exposes it in the interface, which can be used by the users of libzutil to get the appropriate error information and handle it accordingly. Users of the API can also control if they want to print the error message on stderr. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Umer Saleem <usaleem@ixsystems.com> Closes #13969	2022-10-04 09:54:35 -07:00
Richard Yao	67395be0c2	Fix userland dereference NULL return value bugs * `zstream_do_token()` does not handle failures from `libzfs_init()` * `ztest_global_vars_to_zdb_args()` does not handle failures from `calloc()`. * `zfs_snapshot_nvl()` will pass an offset to a NULL pointer as a source to `strlcpy()` if the provided nvlist is `NULL`. We handle these by doing what the existing error handling does for other errors involving these functions. Coverity complained about these. It had complained about several more, but one was fixed by `570ca4441e` and another was a false positive. The remaining complaints labelled "dereference null return value" involve fetching things stored in in-kernel data structures via `list_head()/list_next()`, `AVL_PREV()/AVL_NEXT()` and `zfs_btree_find()`. Most of them occur in void functions that have no error handling. They are much harder to analyze than the two fixed in this patch, so they are left for a follow-up patch. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Closes #13971	2022-09-30 17:02:57 -07:00
Richard Yao	55d7afa4ad	Reduce false positives from Static Analyzers Both Clang's Static Analyzer and Synopsys' Coverity would ignore assertions. Following Clang's advice, we annotate our assertions: https://clang-analyzer.llvm.org/annotations.html#custom_assertions This makes both Clang's Static Analyzer and Coverity properly identify assertions. This change reduced Clang's reported defects from 246 to 180. It also reduced the false positives reported by Coverity by 10, while enabling Coverity to find 9 more defects that previously were false negatives. A couple examples of this would be CID-1524417 and CID-1524423. After submitting a build to coverity with the modified assertions, CID-1524417 disappeared while the report for CID-1524423 no longer claimed that the assertion tripped. Coincidentally, it turns out that it is possible to more accurately annotate our headers than the Coverity modelling file permits in the case of format strings. Since we can do that and this patch annotates headers whenever `__coverity_panic__()` would have been used in the model file, we drop all models that use `__coverity_panic__()` from the model file. Upon seeing the success in eliminating false positives involving assertions, it occurred to me that we could also modify our headers to eliminate coverity's false positives involving byte swaps. We now have coverity specific byteswap macros, that do nothing, to disable Coverity's false positives when we do byte swaps. This allowed us to also drop the byteswap definitions from the model file. Lastly, a model file update has been done beyond the mentioned deletions: * The definitions of `umem_alloc_aligned()`, `umem_alloc()` and `umem_zalloc()` were originally implemented in a way that was intended to inform coverity that when KM_SLEEP has been passed to these functions, they do not return NULL. A small error in how this was done was found, so we correct it. * Definitions for umem_cache_alloc() and umem_cache_free() have been added. 
In practice, no false positives were avoided by making these changes, but in the interest of correctness from future coverity builds, we make them anyway. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Closes #13902	2022-09-30 15:30:12 -07:00
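The assertion annotation that 55d7afa4ad refers to can be sketched roughly as below, following the pattern on the Clang page linked in the commit message. The `DEMO_*` macros and `demo_*` functions are hypothetical stand-ins, not the names used in the OpenZFS headers, and the attribute is applied only when the compiler reports support for it.

```c
#include <stdio.h>
#include <stdlib.h>

/* Compilers without __has_feature simply get an empty annotation. */
#ifndef __has_feature
#define	__has_feature(x)	0
#endif

#if __has_feature(attribute_analyzer_noreturn)
#define	DEMO_ANALYZER_NORETURN	__attribute__((analyzer_noreturn))
#else
#define	DEMO_ANALYZER_NORETURN
#endif

/*
 * Hypothetical assertion failure handler. The annotation tells the
 * analyzer that execution does not continue past a failed assertion.
 */
static void demo_assert_fail(const char *expr, const char *file, int line)
    DEMO_ANALYZER_NORETURN;

static void
demo_assert_fail(const char *expr, const char *file, int line)
{
	(void) fprintf(stderr, "%s:%d: assertion failed: %s\n",
	    file, line, expr);
	abort();
}

#define	DEMO_ASSERT(cond) \
	((void)((cond) || (demo_assert_fail(#cond, __FILE__, __LINE__), 0)))

/* With the annotation, the analyzer treats s as non-NULL after the assert. */
static int
demo_first_byte(const char *s)
{
	DEMO_ASSERT(s != NULL);
	return (s[0]);
}
```

Without the annotation, an analyzer that cannot see that the handler never returns may still report a NULL dereference on the line after the assertion.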
Richard Yao	dee8934e8f	Fix unreachable code in zstreamdump `82226e4f44` was intended to prevent a warning from being printed in situations where it was inappropriate, but accidentally disabled it entirely by setting featureflags in the wrong case statement. Coverity reported this as dead code. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Closes #13946	2022-09-29 10:16:37 -07:00
Richard Yao	1b87195c3c	Fix unchecked return values `2a493a4c71` was intended to fix all instances of coverity reported unchecked return values, but unfortunately, two were missed by mistake. This commit fixes the unchecked return values that had been missed. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Neal Gompa <ngompa@datto.com> Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com> Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Closes #13945	2022-09-29 09:02:57 -07:00
Ameer Hamza	55c12724d3	zed: mark disks as REMOVED when they are removed ZED does not take any action for disk removal events if there is no spare VDEV available. Added zpool_vdev_remove_wanted() in libzfs and vdev_remove_wanted() in vdev.c to remove the VDEV through ZED on removal event. This means that if you are running zed and remove a disk, it will be properly marked as REMOVED. Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Ameer Hamza <ahamza@ixsystems.com> Closes #13797	2022-09-28 09:48:46 -07:00
Richard Yao	a51288aabb	Fix unsafe string operations Coverity caught unsafe use of `strcpy()` in `ztest_dmu_objset_own()`, `nfs_init_tmpfile()` and `dump_snapshot()`. It also caught an unsafe use of `strlcat()` in `nfs_init_tmpfile()`. Inspired by this, I did an audit of every single usage of `strcpy()` and `strcat()` in the code. If I could not prove that the usage was safe, I changed the code to use either `strlcpy()` or `strlcat()`, depending on which function was originally used. In some cases, `snprintf()` was used to replace multiple uses of `strcat` because it was cleaner. Whenever I changed a function, I preferred to use `sizeof(dst)` when the compiler is able to provide the string size via that. When it could not because the string was passed by a caller, I checked the entire call tree of the function to find out how big the buffer was and hard coded it. Hardcoding is less than ideal, but it is safe unless someone shrinks the buffer sizes being passed. Additionally, Coverity reported three more string related issues: * It caught a case where we do an overlapping memory copy in a call to `snprintf()`. We fix that via `kmem_strdup()` and `kmem_strfree()`. * It caught `sizeof (buf)` being used instead of `buflen` in `zdb_nicenum()`'s call to `zfs_nicenum()`, which is passed to `snprintf()`. We change that to pass `buflen`. * It caught a theoretical unterminated string passed to `strcmp()`. This one is likely a false positive, but we have the information needed to do this more safely, so we change this to silence the false positive not just in coverity, but potentially other static analysis tools too. We switch to `strncmp()`. * There was a false positive in tests/zfs-tests/cmd/dir_rd_update.c. We suppress it by switching to `snprintf()` since other static analysis tools might complain about it too. Interestingly, there is a possible real bug there too, since it assumes that the passed directory path ends with '/'. We add a '/' to fix that potential bug. 
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Closes #13913	2022-09-27 16:47:24 -07:00
Richard Yao	fdc2d30371	Cleanup: Specify unsignedness on things that should not be signed In #13871, zfs_vdev_aggregation_limit_non_rotating and zfs_vdev_aggregation_limit being signed was pointed out as a possible reason not to eliminate an unnecessary MAX(unsigned, 0) since the unsigned value was assigned from them. There is no reason for these module parameters to be signed and upon inspection, it was found that there are a number of other module parameters that are signed, but should not be, so we make them unsigned. Making them unsigned made it clear that some other variables in the code should also be unsigned, so we also make those unsigned. This prevents users from setting negative values that could potentially cause bad behaviors. It also makes the code slightly easier to understand. Mostly module parameters that deal with timeouts, limits, bitshifts and percentages are made unsigned by this. Any that are boolean are left signed, since whether booleans should be considered signed or unsigned does not matter. Making zfs_arc_lotsfree_percent unsigned caused a `zfs_arc_lotsfree_percent >= 0` check to become redundant, so it was removed. Removing the check was also necessary to prevent a compiler error from -Werror=type-limits. Several end of line comments had to be moved to their own lines because replacing int with uint_t caused us to exceed the 80 character limit enforced by cstyle.pl. The following were kept signed because they are passed to taskq_create(), which expects signed values and modifying the OpenSolaris/Illumos DDI is out of scope of this patch: * metaslab_load_pct * zfs_sync_taskq_batch_pct * zfs_zil_clean_taskq_nthr_pct * zfs_zil_clean_taskq_minalloc * zfs_zil_clean_taskq_maxalloc * zfs_arc_prune_task_threads Also, negative values in those parameters were found to be harmless. 
The following were left signed because either negative values make sense, or more analysis was needed to determine whether negative values should be disallowed: * zfs_metaslab_switch_threshold * zfs_pd_bytes_max * zfs_livelist_min_percent_shared zfs_multihost_history was made static to be consistent with other parameters. A number of module parameters were marked as signed, but in reality referenced unsigned variables. upgrade_errlog_limit is one of the numerous examples. In the case of zfs_vdev_async_read_max_active, it was already uint32_t, but zdb had an extern int declaration for it. Interestingly, the documentation in zfs.4 was right for upgrade_errlog_limit despite the module parameter being wrongly marked, while the documentation for zfs_vdev_async_read_max_active (and friends) was wrong. It was also wrong for zstd_abort_size, which was unsigned, but was documented as signed. Also, the documentation in zfs.4 incorrectly described the following parameters as ulong when they were int: * zfs_arc_meta_adjust_restarts * zfs_override_estimate_recordsize They are now uint_t as of this patch and thus the man page has been updated to describe them as uint. dbuf_state_index was left alone since it does nothing and perhaps should be removed in another patch. If any module parameters were missed, they were not found by `grep -r 'ZFS_MODULE_PARAM' | grep ', INT'`. I did find a few that grep missed, but only because they were in files that had hits. This patch intentionally did not attempt to address whether some of these module parameters should be elevated to 64-bit parameters, because the length of a long on 32-bit is 32-bit. Lastly, it was pointed out during review that uint_t is a better match for these variables than uint32_t because FreeBSD kernel parameter definitions are designed for uint_t, whose bit width can change in future memory models. As a result, we change the existing parameters that are uint32_t to use uint_t. 
Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Neal Gompa <ngompa@datto.com> Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Closes #13875	2022-09-27 16:42:41 -07:00
Richard Yao	7584fbe846	Cleanup: Switch to strlcpy from strncpy Coverity found a bug in `zfs_secpolicy_create_clone()` where it is possible for us to pass an unterminated string when `zfs_get_parent()` returns an error. Upon inspection, it is clear that using `strlcpy()` would have avoided this issue. Looking at the codebase, there are a number of other uses of `strncpy()` that are unsafe and even when it is used safely, switching to `strlcpy()` would make the code more readable. Therefore, we switch all instances where we use `strncpy()` to use `strlcpy()`. Unfortunately, we do not portably have access to `strlcpy()` in tests/zfs-tests/cmd/zfs_diff-socket.c because it does not link to libspl. Modifying the appropriate Makefile.am to try to link to it resulted in an error from the naming choice used in the file. Trying to disable the check on the file did not work on FreeBSD because Clang ignores `#undef` when a definition is provided by `-Dstrncpy(...)=...`. We work around that by explicitly including the C file from libspl into the test. This makes things build correctly everywhere. We add a deprecation warning to `config/Rules.am` and suppress it on the remaining `strncpy()` usage. `strlcpy()` is not portably available in tests/zfs-tests/cmd/zfs_diff-socket.c, so we use `snprintf()` there as a substitute. This patch does not tackle the related problem of `strcpy()`, which is even less safe. Thankfully, a quick inspection found that it is used far more correctly than strncpy() was used. A quick inspection did not find any problems with `strcpy()` usage outside of zhack, but it should be said that I only checked around 90% of them. Lastly, some of the fields in kstat_t varied in size by 1 depending on whether they were in userspace or in the kernel. The origin of this discrepancy appears to be `04a479f706` where it was made for no apparent reason. It conflicts with the comment on KSTAT_STRLEN, so we shrink the kernel field sizes to match the userspace field sizes. 
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Closes #13876	2022-09-27 16:35:29 -07:00
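As a small illustration of the `snprintf()` substitute mentioned in 7584fbe846, the sketch below copies a string so that the destination is always NUL-terminated and truncation is reported to the caller. `demo_copy_string()` is a hypothetical helper, not a function from the OpenZFS tree.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Copy src into dst, which holds dstsize bytes. Unlike strncpy(), the
 * result is always NUL-terminated when dstsize is nonzero.
 * Returns 0 if the whole string fit and -1 if it was truncated.
 */
static int
demo_copy_string(char *dst, size_t dstsize, const char *src)
{
	int n;

	if (dstsize == 0)
		return (-1);

	/* snprintf() returns the length the full output would have had. */
	n = snprintf(dst, dstsize, "%s", src);
	if (n < 0 || (size_t)n >= dstsize)
		return (-1);
	return (0);
}
```

By contrast, `strncpy(dst, src, dstsize)` leaves `dst` without a terminating NUL whenever `src` is at least `dstsize` bytes long, which is the kind of unterminated string the commit message describes.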
Richard Yao	ebe1d03616	Fix userland resource leaks Coverity caught these. With the exception of the file descriptor leak in tests/zfs-tests/cmd/draid.c, they are all memory leaks. Also, there is a piece of dead code in zfs_get_enclosure_sysfs_path(). We delete it as cleanup. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Closes #13921	2022-09-23 16:55:26 -07:00
Richard Yao	2a493a4c71	Fix unchecked return values and unused return values Coverity complained about unchecked return values and unused values that turned out to be unused return values. Different approaches were used to handle the different cases of unchecked return values: * cmd/zdb/zdb.c: VERIFY0 was used in one place since the existing code had no error handling. An error message was printed in another to match the rest of the code. * cmd/zed/agents/zfs_retire.c: We dismiss the return value with `(void)` because the value is expected to be potentially unset. * cmd/zpool_influxdb/zpool_influxdb.c: We dismiss the return value with `(void)` because the values are expected to be potentially unset. * cmd/ztest.c: VERIFY0 was used since we want failures if something goes wrong in ztest. * module/zfs/dsl_dir.c: We dismiss the return value with `(void)` because there is no guarantee that the zap entry will always be there. For example, old pools imported readonly would not have it and we do not want to fail here because of that. * module/zfs/zfs_fm.c: `fnvlist_add_()` was used since the allocations sleep and thus can never fail. * module/zfs/zvol.c: We dismiss the return value with `(void)` because we do not need it. This matches what is already done in the analogous `zfs_replay_write2()`. * tests/zfs-tests/cmd/draid.c: We suppress one return value with `(void)` since the code handles errors already. The other return value is handled by switching to `fnvlist_lookup_uint8_array()`. * tests/zfs-tests/cmd/file/file_fadvise.c: We add error handling. * tests/zfs-tests/cmd/mmap_sync.c: We add error handling for munmap, but ignore failures on remove() with (void) since it is expected to be able to fail. * tests/zfs-tests/cmd/mmapwrite.c: We add error handling. As for unused return values, they were all in places where there was error handling, so logic was added to handle the return values. 
Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Closes #13920	2022-09-23 16:52:03 -07:00
Tony Hutter	e9b12d4196	zpool: Don't print "repairing" on force faulted drives If you force fault a drive that's resilvering, its scan stats can get frozen in time, giving the false impression that it's being resilvered. This commit checks the vdev state to see if the vdev is healthy before reporting "resilvering" or "repairing" in zpool status. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tony Hutter <hutter2@llnl.gov> Closes #13927 Closes #13930	2022-09-23 10:24:19 -07:00
Richard Yao	e506a0ce40	Cleanup: Change 1 used in bitshifts to 1ULL Coverity complains about this. It is not a bug as long as we never shift by more than 31, but it is not terrible to change the constants from 1 to 1ULL as clean up. Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Closes #13914	2022-09-22 11:28:33 -07:00
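A minimal sketch of the `1` versus `1ULL` distinction from e506a0ce40: the constant `1` has type `int`, so on platforms with a 32-bit `int` the shift is performed in 32 bits, whereas `1ULL` is at least 64 bits wide. `demo_bit()` is a hypothetical helper used only for illustration.

```c
#include <stdint.h>

/*
 * Return a 64-bit mask with only the given bit set. Using 1ULL keeps the
 * shift in a 64-bit unsigned type, so shift counts up to 63 are well
 * defined. With a plain 1, shifting by 32 or more would be undefined
 * behavior on platforms where int is 32 bits wide.
 */
static uint64_t
demo_bit(unsigned int shift)
{
	return (1ULL << shift);
}
```

For example, `demo_bit(40)` yields 2^40, a value that does not fit in a 32-bit `int`.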
Richard Yao	de6c0d3d8c	Fix potential NULL pointer dereference in zfsdle_vdev_online() Coverity complained about this. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Chunwei Chen <david.chen@nutanix.com> Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Closes #13903	2022-09-20 15:20:04 -07:00
Richard Yao	f272960d52	Fix usage of zed_log_msg() and zfs_panic_recover() Coverity complained about the format specifiers not matching variables. In one case, the variable is a constant, so we fix it. In another, we were missing an argument (about which coverity also complained). Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Closes #13888	2022-09-19 17:32:18 -07:00
Tino Reichardt	75e8b5ad84	Fix BLAKE3 tuneable and module loading on Linux and FreeBSD Apply similar options to BLAKE3 as it is done for zfs_fletcher_4_impl. The zfs module parameter on Linux changes from icp_blake3_impl to zfs_blake3_impl. You can check and set it on Linux via sysfs like this: ``` [bash]# cat /sys/module/zfs/parameters/zfs_blake3_impl cycle [fastest] generic sse2 sse41 avx2 [bash]# echo sse2 > /sys/module/zfs/parameters/zfs_blake3_impl [bash]# cat /sys/module/zfs/parameters/zfs_blake3_impl cycle fastest generic [sse2] sse41 avx2 ``` The modprobe module parameters may also be used now: ``` [bash]# modprobe zfs zfs_blake3_impl=sse41 [bash]# cat /sys/module/zfs/parameters/zfs_blake3_impl cycle fastest generic sse2 [sse41] avx2 ``` On FreeBSD the BLAKE3 implementation can be set via sysctl like this: ``` [bsd]# sysctl vfs.zfs.blake3_impl vfs.zfs.blake3_impl: cycle [fastest] generic sse2 sse41 avx2 [bsd]# sysctl vfs.zfs.blake3_impl=sse2 vfs.zfs.blake3_impl: cycle [fastest] generic sse2 sse41 avx2 \ -> cycle fastest generic [sse2] sse41 avx2 ``` This commit changes also some Blake3 internals like these: - blake3_impl_ops_t was renamed to blake3_ops_t - all functions are named blake3_impl_NAME() now Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Co-authored-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de> Closes #13725	2022-09-16 14:25:53 -07:00
Richard Yao	b24d1c77f7	Add zfs_btree_verify_intensity kernel module parameter I see a few issues in the issue tracker that might be aided by being able to turn this on. We have no module parameter for it, so I would like to add one. Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Closes #13874	2022-09-15 16:22:33 -07:00
Richard Yao	8fdc229a9c	Fix memory leak in ztest Coverity found this. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Neal Gompa <ngompa@datto.com> Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Closes #13863	2022-09-13 16:53:21 -07:00
Richard Yao	7195c04d98	Fix file descriptor handling in zdb_copy_object() Coverity found a file descriptor leak. Eyeballing it showed that we had no handling for the `open()` call failing either. We can address both of these at once. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Neal Gompa <ngompa@datto.com> Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Closes #13862	2022-09-12 12:34:10 -07:00
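The two problems named in 7195c04d98, a missing check on `open()` and a leaked descriptor, follow a common pattern that can be sketched as below. This is not the code of `zdb_copy_object()`; `demo_write_file()` is a hypothetical helper that shows one way to check `open()` and close the descriptor on every path.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Write len bytes from buf to path. Returns 0 on success or an errno
 * value on failure. The descriptor is closed on every path after a
 * successful open(), so it cannot leak.
 */
static int
demo_write_file(const char *path, const void *buf, size_t len)
{
	const char *p = buf;
	int err = 0;
	int fd;

	fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (fd == -1)
		return (errno);		/* nothing to close yet */

	while (len > 0) {
		ssize_t n = write(fd, p, len);

		if (n < 0) {
			if (errno == EINTR)
				continue;	/* interrupted: retry */
			err = errno;
			break;
		}
		p += n;
		len -= (size_t)n;
	}

	/* Close on both the success and the error path. */
	if (close(fd) == -1 && err == 0)
		err = errno;
	return (err);
}
```

Funnelling both outcomes through a single `close()` call is a design choice that keeps the cleanup in one place, which makes a leak harder to reintroduce when new error paths are added.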


1552 Commits