mirror_zfs

mirror of https://git.proxmox.com/git/mirror_zfs.git synced 2026-04-17 08:54:52 +03:00

Author	SHA1	Message	Date
Rob N	baca06c258	libzfs: add v2 iterator interfaces `f6a0dac84` modified the zfs_iter_* functions to take a new "flags" parameter, and introduced a variety of flags to ask the kernel to limit the results in various ways, reducing the amount of work the caller needed to do to filter out things they didn't need. Unfortunately this change broke the ABI for existing clients (read: older versions of the `zfs` program), and was reverted `399b98198`. `dc95911d2` reintroduced the original patch, with the understanding that a backwards-compatible fix would be made before the 2.2 release branch was tagged. This commit is that fix. This introduces zfs_iter_*_v2 functions that have the new flags argument, and reverts the existing functions to not have the flags parameter, as they were before. The old functions are now reimplemented in terms of the new, with flags set to 0. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: George Wilson <george.wilson@delphix.com> Original-patch-by: George Wilson <george.wilson@delphix.com> Signed-off-by: Rob Norris <rob.norris@klarasystems.com> Sponsored-by: Klara, Inc. Closes #14597	2023-04-10 11:53:02 -07:00
Paul Dagnelie	b66c2a0899	Storage device expansion "silently" fails on degraded vdev When a vdev is degraded or faulted, we refuse to expand it when doing online -e. However, we also don't actually cause the online command to fail, even though the disk didn't expand. This is confusing and misleading, and can result in violated expectations. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Mark Maybee <mark.maybee@delphix.com> Signed-off-by: Paul Dagnelie <pcd@delphix.com> Closes 14145	2023-04-06 10:29:27 -07:00
Tino Reichardt	6ecdd35bdb	Fix "Add colored output to zfs list" Running `zfs list -o avail rpool` resulted in a core dump. This commit will fix this. Run the needed overhead only, when `use_color()` is true. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: George Wilson <gwilson@delphix.com> Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de> Closes #14712	2023-04-05 09:57:01 -07:00
Tino Reichardt	2bd0490faf	Add colored output to zfs list Use a bold header row and colorize the AVAIL column based on the used space percentage of volume. We define these colors: - when > 80%, use yellow - when > 90%, use red Reviewed-by: WHR <msl0000023508@gmail.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ethan Coe-Renner <coerenner1@llnl.gov> Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de> Closes #14621 Closes #14350	2023-03-24 10:24:11 -07:00
Tino Reichardt	7bde396aa2	Colorize zpool iostat output Use a bold header and colorize the space suffixes in iostat by order of magnitude like this: - K is green - M is yellow - G is red - T is lightblue - P is magenta - E is cyan - 0 space is colored gray Reviewed-by: WHR <msl0000023508@gmail.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ethan Coe-Renner <coerenner1@llnl.gov> Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de> Closes #14621 Closes #14459	2023-03-24 10:23:52 -07:00
Richard Yao	d1807f168e	nvpair: Constify string functions After addressing coverity complaints involving `nvpair_name()`, the compiler started complaining about dropping const. This lead to a rabbit hole where not only `nvpair_name()` needed to be constified, but also `nvpair_value_string()`, `fnvpair_value_string()` and a few other static functions, plus variable pointers throughout the code. The result became a fairly big change, so it has been split out into its own patch. Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Closes #14612	2023-03-14 15:25:50 -07:00
Pawel Jakub Dawidek	67a1b03791	Implementation of block cloning for ZFS Block Cloning allows to manually clone a file (or a subset of its blocks) into another (or the same) file by just creating additional references to the data blocks without copying the data itself. Those references are kept in the Block Reference Tables (BRTs). The whole design of block cloning is documented in module/zfs/brt.c. Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Christian Schwarz <christian.schwarz@nutanix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Rich Ercolani <rincebrain@gmail.com> Signed-off-by: Pawel Jakub Dawidek <pawel@dawidek.net> Closes #13392	2023-03-10 11:59:53 -08:00
Richard Yao	37edc7ea98	Refactor loop in dump_histogram() The current loop triggers a complaint that we are using an array offset prior to a range check from cpp/offset-use-before-range-check when we are actually calculating maximum and minimum values. I was about to file a false positive report with CodeQL, but after looking at how the code is structured, I really cannot blame CodeQL for mistaking this for a range check. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Closes #14575	2023-03-08 13:52:20 -08:00
Richard Yao	17443e0b20	Cleanup: Remove constant comparisons reported by CodeQL CodeQL's cpp/constant-comparison query from its security-and-extended query set reported 4 instances where we have comparions that always evaluate the same way. In `draid_config_by_type()`, we have an early `if (nparity == 0)` check that returns `EINVAL`, making a later `if (nparity == 0 \|\| nparity > VDEV_DRAID_MAXPARITY)` partially redundant. The later check prints an error message when parity is 0, but the early check does not. This is not useful feedback, so we move the later check to the place where the early check runs to replace the early check. In `perform_thread_merge()`, we return when `num_threads == 0`. After that block, we do `if (num_threads > 0) {`, which will always be true. We remove the `if` statement. In `sa_modify_attrs()`, we have a loop condition that is `k != 2`, but at the end of the loop, we have `if (k == 0 && hdl->sa_spill)` followed by an else that does a break. The result is that k != 2 will never be evaluated when it is false. We drop the comparison. In `zap_leaf_array_read()`, we have a for loop condition that is `i < ZAP_LEAF_ARRAY_BYTES && len > 0`. However, that loop itself is in a loop that is `while (len > 0)` and while the value of len is decremented inside the loop, when `len == 0`, it will return, such that `len > 0` inside the loop condition will always be true. We drop that part of the condition. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Closes #14575	2023-03-08 13:51:46 -08:00
Richard Yao	9368b3877c	Fix TOCTOU race in zpool_do_labelclear() Coverity reported a TOCTOU race in `zpool_do_labelclear()`. This is not believed to be a real security issue, but fixing it reduces the number of syscalls we do and will prevent other static analyzers from complaining about this. The code is expected to be equivalent. However, under rare circumstances, such as ELOOP, ENAMETOOLONG, ENOMEM, ENOTDIR and EOVERFLOW, we will display the error message that we currently display for the `open()` syscall rather than the one that we currently display for the `stat()` syscall. This is considered to be an improvement. Reported-by: Coverity (CID-1524188) Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Closes #14575	2023-03-08 13:50:51 -08:00
Alexander Motin	a8d83e2a24	More adaptive ARC eviction Traditionally ARC adaptation was limited to MRU/MFU distribution. But for years people with metadata-centric workload demanded mechanisms to also manage data/metadata distribution, that in original ZFS was just a FIFO. As result ZFS effectively got separate states for data and metadata, minimum and maximum metadata limits etc, but it all required manual tuning, was not adaptive and in its heart remained a bad FIFO. This change removes most of existing eviction logic, rewriting it from scratch. This makes MRU/MFU adaptation individual for data and meta- data, same as the distribution between data and metadata themselves. Since most of required states separation was already done, it only required to make arcs_size state field specific per data/metadata. The adaptation logic is still based on previous concept of ghost hits, just now it balances ARC capacity between 4 states: MRU data, MRU metadata, MFU data and MFU metadata. To simplify arc_c changes instead of arc_p measured in bytes, this code uses 3 variable arc_meta, arc_pd and arc_pm, representing ARC balance between metadata and data, MRU and MFU for data, and MRU and MFU for metadata respectively as 32-bit fixed point fractions. Since we care about the math result only when need to evict, this moves all the logic from arc_adapt() to arc_evict(), that reduces per-block overhead, since per-block operations are limited to stats collection, now moved from arc_adapt() to arc_access() and using cheaper wmsums. This also allows to remove ugly ARC_HDR_DO_ADAPT flag from many places. This change also removes number of metadata specific tunables, part of which were actually not functioning correctly, since not all metadata are equal and some (like L2ARC headers) are not really evictable. Instead it introduced single opaque knob zfs_arc_meta_balance, tuning ARC's reaction on ghost hits, allowing administrator give more or less preference to metadata without setting strict limits. Some of old code parts like arc_evict_meta() are just removed, because since introduction of ABD ARC they really make no sense: only headers referenced by small number of buffers are not evictable, and they are really not evictable no matter what this code do. Instead just call arc_prune_async() if too much metadata appear not evictable. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Allan Jude <allan@klarasystems.com> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Closes #14359	2023-03-08 11:17:23 -08:00
Rob N	b988f32c70	Better handling for future crypto parameters The intent is that this is like ENOTSUP, but specifically for when something can't be done because we have no support for the requested crypto parameters; eg unlocking a dataset or receiving a stream encrypted with a suite we don't support. Its not intended to be recoverable without upgrading ZFS itself. If the request could be made to work by enabling a feature or modifying some other configuration item, then some other code should be used. load-key: In the future we might have more crypto suites (ie new values for the `encryption` property. Right now trying to load a key on such a future crypto suite will look up suite parameters off the end of the crypto table, resulting in misbehaviour and/or crashes (or, with debug enabled, trip the assertion in `zio_crypt_key_unwrap`). Instead, lets check the value we got from the dataset, and if we can't handle it, abort early. recv: When receiving a raw stream encrypted with an unknown crypto suite, `zfs recv` would report a generic `invalid backup stream` (EINVAL). While technically correct, its not super helpful, so lets ship a more specific error code and message. Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Signed-off-by: Rob Norris <robn@despairlabs.com> Closes #14577	2023-03-07 14:05:14 -08:00
George Amanakis	12a240ac0b	Fix a typo in `ac2038a` Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Signed-off-by: George Amanakis <gamanakis@gmail.com> Closes #14585 Closes #14592	2023-03-07 13:50:44 -08:00
Richard Yao	bc4d210783	Fix memory leak in ztest This is tripping LeakSanitizer, which causes zloop test failures on pull requests. Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Closes #14583	2023-03-06 15:30:29 -08:00
Tino Reichardt	f9f9bef22f	Update BLAKE3 for using the new impl handling This commit changes the BLAKE3 implementation handling and also the calls to it from the ztest command. Tested-by: Rich Ercolani <rincebrain@gmail.com> Tested-by: Sebastian Gottschall <s.gottschall@dd-wrt.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de> Closes #13741	2023-03-02 13:52:27 -08:00
Rob N	163f3d3a1f	zdb: add decryption support The approach is straightforward: for dataset ops, if a key was offered, find the encryption root and the various encryption parameters, derive a wrapping key if necessary, and then unlock the encryption root. After that all the regular dataset ops will return unencrypted data, and that's kinda the whole thing. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Jorgen Lundman <lundman@lundman.net> Signed-off-by: Rob Norris <robn@despairlabs.com> Closes #11551 Closes #12707 Closes #14503	2023-03-02 13:39:09 -08:00
Paul Dagnelie	d9e64a4030	Improve error message of zfs redact We improve the error message of zfs redact by checking if the target snapshot exists, and if all the redaction snapshots exist. As a future improvement we could iterate over every snapshot provided and use that to determine which one specifically doesn't exist. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Paul Dagnelie <pcd@delphix.com> Closes #11426 Closes #14496	2023-02-21 17:30:05 -08:00
Rob N ★	ac7648179c	zdb: zero-pad checksum output The leading zeroes are part of the checksum so we should show them. Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Rob Norris <robn@despairlabs.com> Closes #14464	2023-02-07 13:48:22 -08:00
George Amanakis	ac2038a19c	Teach zdb about DMU_OT_ERROR_LOG objects With the persistent error log feature we need to account for spa_errlog_{scrub, last} containing mappings to other error log objects, which need to be marked as in-use as well. Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: George Amanakis <gamanakis@gmail.com> Closes #14442 Closes #14434	2023-02-02 15:17:37 -08:00
rob-wing	326f1e3d88	zfs_main.c: fix unused variable error with GCC zfs_setproctitle_init() is stubbed out on FreeBSD. Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Ameer Hamza <ahamza@ixsystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Signed-off-by: Rob Wing <rob.fx907@gmail.com> Closes #14441	2023-02-02 15:16:40 -08:00
Ameer Hamza	05b72415d1	Fix console progress reporting for recursive send After commit `19d3961`, progress reporting (-v) with replication flag enabled does not report the progress on the console. This commit fixes the issue by updating the logic to check for pa->progress instead of pa_verbosity in send_progress_thread(). Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Ameer Hamza <ahamza@ixsystems.com> Closes #14448	2023-02-02 15:09:57 -08:00
Brian Behlendorf	c85ac731a0	Improve resilver ETAs When resilvering the estimated time remaining is calculated using the average issue rate over the current pass. Where the current pass starts when a scan was started, or restarted, if the pool was exported/imported. For dRAID pools in particular this can result in wildly optimistic estimates since the issue rate will be very high while scanning when non-degraded regions of the pool are scanned. Once repair I/O starts being issued performance drops to a realistic number but the estimated performance is still significantly skewed. To address this we redefine a pass such that it starts after a scanning phase completes so the issue rate is more reflective of recent performance. Additionally, the zfs_scan_report_txgs module option can be set to reset the pass statistics more often. Reviewed-by: Akash B <akash-b@hpe.com> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #14410	2023-01-25 11:28:54 -08:00
Richard Yao	73968defdd	Reject streams that set ->drr_payloadlen to unreasonably large values In the zstream code, Coverity reported: "The argument could be controlled by an attacker, who could invoke the function with arbitrary values (for example, a very high or negative buffer size)." It did not report this in the kernel. This is likely because the userspace code stored this in an int before passing it into the allocator, while the kernel code stored it in a uint32_t. However, this did reveal a potentially real problem. On 32-bit systems and systems with only 4GB of physical memory or less in general, it is possible to pass a large enough value that the system will hang. Even worse, on Linux systems, the kernel memory allocator is not able to support allocations up to the maximum 4GB allocation size that this allows. This had already been limited in userspace to 64MB by `ZFS_SENDRECV_MAX_NVLIST`, but we need a hard limit in the kernel to protect systems. After some discussion, we settle on 256MB as a hard upper limit. Attempting to receive a stream that requires more memory than that will result in E2BIG being returned to user space. Reported-by: Coverity (CID-1529836) Reported-by: Coverity (CID-1529837) Reported-by: Coverity (CID-1529838) Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Closes #14285	2023-01-23 13:16:22 -08:00
rob-wing	69f024a56e	Configure zed's diagnosis engine with vdev properties Introduce four new vdev properties: checksum_n checksum_t io_n io_t These properties can be used for configuring the thresholds of zed's diagnosis engine and are interpeted as <N> events in T <seconds>. When this property is set to a non-default value on a top-level vdev, those thresholds will also apply to its leaf vdevs. This behavior can be overridden by explicitly setting the property on the leaf vdev. Note that, these properties do not persist across vdev replacement. For this reason, it is advisable to set the property on the top-level vdev instead of the leaf vdev. The default values for zed's diagnosis engine (10 events, 600 seconds) remains unchanged. Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Allan Jude <allan@klarasystems.com> Signed-off-by: Rob Wing <rob.wing@klarasystems.com> Sponsored-by: Seagate Technology LLC Closes #13805	2023-01-23 13:14:25 -08:00
Ameer Hamza	19d3961589	Use setproctitle to report progress of zfs send This allows parsing of zfs send progress by checking the process title. Doing so requires some changes to the send code in libzfs_sendrecv.c; primarily these changes move some of the accounting around, to allow for the code to be verbose as normal, or set the process title. Unlike BSD, setproctitle() isn't standard in Linux; thus, borrowed it from libbsd with slight modifications. Authored-by: Sean Eric Fagan <sef@FreeBSD.org> Co-authored-by: Ryan Moeller <ryan@iXsystems.com> Co-authored-by: Ameer Hamza <ahamza@ixsystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Ameer Hamza <ahamza@ixsystems.com> Closes #14376	2023-01-17 10:17:35 -08:00
Rob Wing	7a85f58db6	zpool-set: print error message when pool or vdev is not valid Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Allan Jude <allan@klarasystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Rob Wing <rob.wing@klarasystems.com> Sponsored-by: Seagate Technology Submitted-by: Klara, Inc. Closes #14310	2023-01-17 09:47:24 -08:00
Rob Wing	a0276f7048	zpool-set: update usage text Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Allan Jude <allan@klarasystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Rob Wing <rob.wing@klarasystems.com> Sponsored-by: Seagate Technology Submitted-by: Klara, Inc. Closes #14310	2023-01-17 09:46:05 -08:00
rob-wing	6f2ffd272c	zpool: do guid-based comparison in is_vdev_cb() is_vdev_cb() uses string comparison to find a matching vdev and will fallback to comparing the guid via a string. These changes drop the string comparison and compare the guids instead. Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Reviewed-by: Allan Jude <allan@klarasystems.com> Signed-off-by: Rob Wing <rob.wing@klarasystems.com> Co-authored-by: Rob Wing <rob.wing@klarasystems.com> Sponsored-by: Seagate Technology Submitted-by: Klara, Inc. Closes #14311	2023-01-11 15:14:35 -08:00
Brian Behlendorf	0c8fbe5b6a	ztest: update ztest_dmu_snapshot_create_destroy() ECHRNG is returned when the channel program encounters a runtime error. For example, this can happen when a snapshot doesn't exist. We handle this error the same way as the existing EEXIST and ENOENT error checks. Additionally, improve the internal debug message to include the error describing why a pool couldn't be opened. Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #14351	2023-01-10 13:27:48 -08:00
Brian Behlendorf	549aafb7c8	ztest: ztest_dsl_prop_set_uint64() ENOSPC consistency It is possible for ztest_dsl_prop_set_uint64() to fail with ENOSPC and this needs to be handled consistently. Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #14351	2023-01-10 13:27:48 -08:00
Brian Behlendorf	f7788883ab	ztest: reduce `zpool split` frequency There's no need to so aggressively test splitting a pool. Reduce the occurence of this test to once every 10 seconds. Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #14351	2023-01-10 13:27:48 -08:00
Brian Behlendorf	4208a052c2	ztest: update expectation for sparing a special device Commit `c23738c70e` modified the expected behavior of attach to prevent hot spares from being used as special vdev replacements. We update ztest's expectations accordingly to prevent it from failing when testing the updated behavior. Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #14351	2023-01-10 13:26:44 -08:00
Ameer Hamza	5091867ee6	zed: add hotplug support for spare vdevs This commit supports for spare vdev hotplug. The spare vdev associated with all the pools will be marked as "Removed" when the drive is physically detached and will become "Available" when the drive is reattached. Currently, the spare vdev status does not change on the drive removal and the same is the case with reattachment. Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ameer Hamza <ahamza@ixsystems.com> Closes #14295	2023-01-09 12:43:03 -08:00
Alexander Motin	792a6ee462	Update arc_summary and arcstat outputs Recent ARC commits added new statistic counters, such as iohits, uncached state, etc. Represent those. Also some of previously reported numbers were confusing or even made no sense. Cleanup and restructure existing reports. Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Issue #14115 Issue #14123 Issue #14243 Closes #14320	2023-01-05 09:29:13 -08:00
Matthew Ahrens	018f26041d	deadlock between spa_errlog_lock and dp_config_rwlock There is a lock order inversion deadlock between `spa_errlog_lock` and `dp_config_rwlock`: A thread in `spa_delete_dataset_errlog()` is running from a sync task. It is holding the `dp_config_rwlock` for writer (see `dsl_sync_task_sync()`), and waiting for the `spa_errlog_lock`. A thread in `dsl_pool_config_enter()` is holding the `spa_errlog_lock` (see `spa_get_errlog_size()`) and waiting for the `dp_config_rwlock` (as reader). Note that this was introduced by #12812. This commit address this by defining the lock ordering to be dp_config_rwlock first, then spa_errlog_lock / spa_errlist_lock. spa_get_errlog() and spa_get_errlog_size() can acquire the locks in this order, and then process_error_block() and get_head_and_birth_txg() can verify that the dp_config_rwlock is already held. Additionally, a buffer overrun in `spa_get_errlog()` is corrected. Many code paths didn't check if `*count` got to zero, instead continuing to overwrite past the beginning of the userspace buffer at `uaddr`. Tested by having some errors in the pool (via `zinject -t data /path/to/file`), one thread running `zpool iostat 0.001`, and another thread runs `zfs destroy` (in a loop, although it hits the first time). This reproduces the problem easily without the fix, and works with the fix. Reviewed-by: Mark Maybee <mark.maybee@delphix.com> Reviewed-by: George Amanakis <gamanakis@gmail.com> Reviewed-by: George Wilson <gwilson@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Matthew Ahrens <mahrens@delphix.com> Closes #14239 Closes #14289	2022-12-22 11:48:49 -08:00
Brian Behlendorf	b4cd4fe1aa	Revert "zdb: zdb_ddt_leak_init() reads uninitialized memory..." This reverts commit `d30db519af`. With this change applied zloop.sh fails reliably with the following ASSERT. zio_wait(zio_claim(NULL, zcb->zcb_spa, refcnt ? 0 : spa_min_claim_txg( zcb->zcb_spa), bp, NULL, NULL, ZIO_FLAG_CANFAIL)) == 0 (0x2 == 0x0) ASSERT at cmd/zdb/zdb.c:5452:zdb_count_block() Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #14306	2022-12-21 09:17:00 -08:00
Allan Jude	dc95911d21	zfs list: Allow more fields in ZFS_ITER_SIMPLE mode If the fields to be listed and sorted by are constrained to those populated by dsl_dataset_fast_stat(), then zfs list is much faster, as it does not need to open each objset and reads its properties. A previous optimization by Pawel Dawidek (`0cee24064a`) took advantage of this to make listing snapshot names sorted only by name much faster. However, it was limited to `-o name -s name`, this work extends this optimization to work with: - name - guid - createtxg - numclones - inconsistent - redacted - origin and could be further extended to any other properties supported by dsl_dataset_fast_stat() or similar, that do not require extra locking or reading from disk. This was committed before (9a9e2e343dfa2af28bf7910de77ae73aa006de62), but was reverted due to a regression when used with an older kernel. If the kernel does not populate zc->zc_objset_stats, we now fallback to getting the properties via the slower interface, to avoid problems with newer userland and older kernels. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Allan Jude <allan@klarasystems.com> Closes #14110	2022-12-13 17:27:54 -08:00
Marcel Menzel	70ac2654f5	Change ZEVENT_POOL_GUID to ZEVENT_POOL to display pool names Outgoing mails for ZFS pool events include the pool GUID, but not the actual pool name. Let's change this for better readability, as it is already done in the mails for finished pool resilvers. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by Richard Yao <richard.yao@alumni.stonybrook.edu> Signed-off-by: Marcel Menzel <mail@mcl.gg> Closes #14272	2022-12-13 17:26:10 -08:00
Richard Yao	d31a7cb4fa	Address theoretical uninitialized variable usage in zstream Coverity has long complained about the checksum being uninitialized if an END record is processed before its BEGIN record. This should not happen, but there was no code to check for it. I had left this unfixed since it was a low priority issue, but then `9f4ede63d2` added another instance of this. I am making an effort to "hold the line" to keep new coverity defect reports from going unaddressed, so I find myself forced to fix this much earlier than I had originally planned to address it. The solution is to maintain a counter and a flag. Then use VERIFY statements to verify the following runtime constraints: * Every record either has a corresponding BEGIN record, is a BEGIN record or is the end of stream END record for replication streams. * BEGIN records cannot be nested. i.e. There must be an END record before another BEGIN record may be seen. Failure to meet these constraints will cause the program to exit. This is sufficient to ensure that the checksum is never accessed when uninitialized. Reported-by: Coverity (CID 1524578) Reported-by: Coverity (CID 1524633) Reported-by: Coverity (CID 1527295) Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Damian Szuberski <szuberskidamian@gmail.com> Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Closes #14176	2022-12-12 10:40:05 -08:00
Richard Yao	f954ea26a6	zdb: Handle theoretical buffer overflow when printing float CodeQL pointed out that for extreme floating point values, `sprintf()` will overwrite a 32 character buffer. It cited 1e304 as an example, which causes `sprintf()` to print 308 characters. In practice, the numbers should never exceed 100, so this should not happen. To silence the warning and also handle unexpected situations, we change the code to use `snprintf()`. This was missed during my audit of our use of `sprintf()`, since I did not think to consider extreme floating point representations. It also really should not happen, so this change is purely defensive programming. This was found by CodeQL's cpp/overrunning-write-with-float check. Reviewed-by: Damian Szuberski <szuberskidamian@gmail.com> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Closes #14264	2022-12-08 14:15:15 -08:00
Richard Yao	d30db519af	zdb: zdb_ddt_leak_init() reads uninitialized memory when birth == 0 This was written by Jeff Bonick and was committed to OpenSolaris on November 1, 2009. It appears that Jeff meant to continue the outer loop iteration when `ddp->ddp_phys_birth == 0`, but put his check inside the inner loop. This causes a pointer to uninitialized memory to be passed to ddt_lookup() inside a VERIFY() statement whenever that condition is true. Reported-by: Coverity (CID 1524462) Reviewed-by: Damian Szuberski <szuberskidamian@gmail.com> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Closes #14264	2022-12-08 14:15:10 -08:00
Richard Yao	2709ace096	ztest: comparisons against errno should not assign to it `888914486e` introduced this regression. I used cscope to verify that there are no other instances of this in the codebase. This is the one of the few bugs that are extremely easy to identify using cscope. Reviewed-by: Damian Szuberski <szuberskidamian@gmail.com> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Closes #14264	2022-12-08 14:15:04 -08:00
Richard Yao	ba87ed1410	Fix potential buffer overflow in zpool command The ZPOOL_SCRIPTS_PATH environment variable can be passed here. This allows for arbitrarily long strings to be passed to sprintf(), which can overflow the buffer. I missed this in my earlier audit of the codebase. CodeQL's cpp/unbounded-write check caught this. Reviewed-by: Damian Szuberski <szuberskidamian@gmail.com> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Closes #14264	2022-12-08 14:14:30 -08:00
Richard Yao	ecccaede68	zdb: Fix big parameter passed by value This is not in performance critical code, but static analyzers will complain about it, so lets switch to pass by pointer here. Reported-by: Coverity (CID-1524384) Reviewed-by: Damian Szuberski <szuberskidamian@gmail.com> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Closes #14263	2022-12-08 13:52:53 -08:00
Richard Yao	aaa9a6700f	Cleanup: zhack should not declare function prototypes in main() Instead, it should include the proper header. CodeQL caught this. Reviewed-by: Damian Szuberski <szuberskidamian@gmail.com> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Closes #14263	2022-12-08 13:51:24 -08:00
szubersk	3c1e1933b6	Fix GCC 12 compilation errors Squelch false positives reported by GCC 12 with UBSan. Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: szubersk <szuberskidamian@gmail.com> Closes #14150	2022-11-30 13:45:53 -08:00
Richard Yao	887fb37843	zdb: Silence Coverity complaint about verify_livelist_allocs() svb is declared on the stack. We then set parts of svb.svb_dva with DVA_SET_VDEV(), DVA_SET_OFFSET() and DVA_SET_ASIZE(). However, the DVA contains other fields for pad, GRID and G. When setting the fields we use, we technically read uninitialized bits from the fields we do not use. This makes Coverity and Clang's Static Analyzer complain. Presumably, other static analyzers might complain too. There is no real bug here, but we are still technically reading undefined data and unless we stop doing that, static analyzers will complain about it in perpetuum and this could obscure real issues. We silence the static analyzer complaints by using a 0 struct initializer. Reported by: Coverity (CID 1524627) Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Closes #14210	2022-11-29 10:00:45 -08:00
Ameer Hamza	e996c502e4	zed: unclean disk attachment faults the vdev If the attached disk already contains a vdev GUID, it means the disk is not clean. In such a scenario, the physical path would be a match that makes the disk faulted when trying to online it. So, we would only want to proceed if either GUID matches with the last attached disk or the disk is in a clean state. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Ameer Hamza <ahamza@ixsystems.com> Closes #14181	2022-11-29 09:24:10 -08:00
наб	3069872ef5	cmd: zfs: fix missing mention of zfs diff -h Fixes: `344bbc82e7` Reviewed-by: Damian Szuberski <szuberskidamian@gmail.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> Closes #14224	2022-11-28 13:37:07 -08:00
Paul Dagnelie	9f4ede63d2	Add ability to recompress send streams with new compression algorithm As new compression algorithms are added to ZFS, it could be useful for people to recompress data with new algorithms. There is currently no mechanism to do this aside from copying the data manually into a new filesystem with the new algorithm enabled. This tool allows the transformation to happen through zfs send, allowing it to be done efficiently to remote systems and in an incremental fashion. A new zstream command is added that decompresses WRITE records and then recompresses them with a provided algorithm, and then re-emits the modified send stream. It may also be possible to re-compress embedded block pointers, but that was not attempted for the initial version. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Signed-off-by: Paul Dagnelie <pcd@delphix.com> Closes #14106	2022-11-10 15:23:46 -08:00

1 2 3 4 5 ...

1391 Commits