mirror_zfs

mirror of https://git.proxmox.com/git/mirror_zfs.git synced 2026-04-17 08:54:52 +03:00

Author	SHA1	Message	Date
Richard Yao	f9e109223b	Suppress Clang Static Analyzer warning in dbuf_dnode_findbp() Clang's static analyzer reports that if a `blkid == DMU_SPILL_BLKID` is passed, then we can have a NULL pointer dereference when either ->dn_have_spill or `DNODE_FLAG_SPILL_BLKPTR` is not set. This should not happen. We add an `ASSERT()` to suppress reports about NULL pointer dereferences. Originally, I wanted to use one or two IMPLY statements on pre-conditions before the call to `dbuf_findbp()`, but Clang's static analyzer did not understand it. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Closes #14575	2023-03-08 13:51:36 -08:00
Richard Yao	399bb81607	Suppress Clang Static Analyzer warning in vdev_split() Clang's static analyzer pointed out that we can have a NULL pointer dereference if we ever attempt to split a vdev that has only 1 child. If that happens, we are left with zero children, but then try to access a non-existent child. Calling vdev_split() on a vdev with only 1 child should be impossible due to how the code is structured. If this ever happens, it would be best to stop execution immediately even in a production environment to allow for the best possible chance of recovery by an expert, so we use `VERIFY3U()` instead of `ASSERT3U()`. Unfortunately, while that defensive assertion will prevent execution from ever reaching the NULL pointer dereference, Clang's static analyzer does not realize that, so we add an `ASSERT()` to inform it of this. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Closes #14575	2023-03-08 13:51:31 -08:00
Richard Yao	0b831cabc6	Suppress Clang Static Analyzer warning about SNPRINTF_BLKPTR() Clang's static analyzer pointed out that if we can pass a -1 array index to copyname[copies] if there are no valid DVAs. This is an absurd situation, but it suggests that we are missing an assertion, so we add it. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Closes #14575	2023-03-08 13:51:26 -08:00
Richard Yao	a4240a8ac7	Suppress Clang Static Analyzer false positive in the AVL tree code. This has been filed as llvm/llvm-project#60694. Switching from a copy through a C pointer dereference to an explicit memcpy() is a workaround that prevents a false positive. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Closes #14575	2023-03-08 13:51:21 -08:00
Richard Yao	8b72dfed11	Suppress Clang Static Analyzer defect report in abd_get_size() Clang's static analyzer reports a possible NULL pointer dereference in abd_get_size() when called from vdev_draid_map_alloc_write() called from vdev_draid_map_alloc_row() and vdc->vdc_nparity == 0. This should be impossible, so we add an assertion to silence the defect report. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Closes #14575	2023-03-08 13:51:09 -08:00
Richard Yao	9368b3877c	Fix TOCTOU race in zpool_do_labelclear() Coverity reported a TOCTOU race in `zpool_do_labelclear()`. This is not believed to be a real security issue, but fixing it reduces the number of syscalls we do and will prevent other static analyzers from complaining about this. The code is expected to be equivalent. However, under rare circumstances, such as ELOOP, ENAMETOOLONG, ENOMEM, ENOTDIR and EOVERFLOW, we will display the error message that we currently display for the `open()` syscall rather than the one that we currently display for the `stat()` syscall. This is considered to be an improvement. Reported-by: Coverity (CID-1524188) Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Closes #14575	2023-03-08 13:50:51 -08:00
Alexander Motin	a8d83e2a24	More adaptive ARC eviction Traditionally ARC adaptation was limited to MRU/MFU distribution. But for years people with metadata-centric workload demanded mechanisms to also manage data/metadata distribution, that in original ZFS was just a FIFO. As result ZFS effectively got separate states for data and metadata, minimum and maximum metadata limits etc, but it all required manual tuning, was not adaptive and in its heart remained a bad FIFO. This change removes most of existing eviction logic, rewriting it from scratch. This makes MRU/MFU adaptation individual for data and meta- data, same as the distribution between data and metadata themselves. Since most of required states separation was already done, it only required to make arcs_size state field specific per data/metadata. The adaptation logic is still based on previous concept of ghost hits, just now it balances ARC capacity between 4 states: MRU data, MRU metadata, MFU data and MFU metadata. To simplify arc_c changes instead of arc_p measured in bytes, this code uses 3 variable arc_meta, arc_pd and arc_pm, representing ARC balance between metadata and data, MRU and MFU for data, and MRU and MFU for metadata respectively as 32-bit fixed point fractions. Since we care about the math result only when need to evict, this moves all the logic from arc_adapt() to arc_evict(), that reduces per-block overhead, since per-block operations are limited to stats collection, now moved from arc_adapt() to arc_access() and using cheaper wmsums. This also allows to remove ugly ARC_HDR_DO_ADAPT flag from many places. This change also removes number of metadata specific tunables, part of which were actually not functioning correctly, since not all metadata are equal and some (like L2ARC headers) are not really evictable. Instead it introduced single opaque knob zfs_arc_meta_balance, tuning ARC's reaction on ghost hits, allowing administrator give more or less preference to metadata without setting strict limits. Some of old code parts like arc_evict_meta() are just removed, because since introduction of ABD ARC they really make no sense: only headers referenced by small number of buffers are not evictable, and they are really not evictable no matter what this code do. Instead just call arc_prune_async() if too much metadata appear not evictable. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Allan Jude <allan@klarasystems.com> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Closes #14359	2023-03-08 11:17:23 -08:00
Attila Fülöp	8d9752569b	ICP: AES-GCM: Unify gcm_init_ctx() and gmac_init_ctx() gmac_init_ctx() duplicates most of the code in gcm_int_ctx() while it just needs to set its own IV length and AAD tag length. Introduce gcm_init_ctx_impl() which handles the GCM and GMAC differences while reusing the duplicated code. While here, fix a flaw where the AVX implementation would accept a context using a byte swapped key schedule which it could not handle. Also constify the IV and AAD pointers passed to gcm_init{,_avx}(). Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Rob Norris <robn@despairlabs.com> Signed-off-by: Attila Fülöp <attila@fueloep.org> Closes #14529	2023-03-08 11:12:15 -08:00
Richard Yao	7d638df09b	Do not hold spa_config in ZIL while blocked on IO Otherwise, we can get a deadlock that looks like this: 1. fsync() grabs spa_config_enter(zilog->zl_spa, SCL_STATE, lwb, RW_READER) as part of zil_lwb_write_issue() . It then blocks on the txg_sync when a flush fails from a drive power cycling. 2. The txg_sync then blocks on the pool suspending due to the loss of too many disks. 3. zpool clear then blocks on spa_config_enter(spa, SCL_STATE \| SCL_L2ARC \| SCL_ZIO, spa, RW_WRITER) because it is a writer. The disks cannot be brought online due to fsync() holding that lock and the user gets upset since fsync() is uninterruptibly blocked inside the kernel. We need to grab the lock for vdev_lookup_top(), but we do not need to hold it while there is outstanding IO. This fixes a regression introduced by `1ce23dcaff`. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Richard Yao <richard.yao@klarasystems.com> Sponsored-By: Wasabi Technology, Inc. Closes #14519	2023-03-07 16:12:28 -08:00
Low-power	1f196e3107	Fix build for Linux/powerpc without CONFIG_ALTIVEC or CONFIG_VSX This fixes building ZFS for Linux 4.7+ powerpc* architecture, where Linux was configured without CONFIG_ALTIVEC or CONFIG_VSX. Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: WHR <msl0000023508@gmail.com> Closes #14591	2023-03-07 14:06:52 -08:00
Rob N	b988f32c70	Better handling for future crypto parameters The intent is that this is like ENOTSUP, but specifically for when something can't be done because we have no support for the requested crypto parameters; eg unlocking a dataset or receiving a stream encrypted with a suite we don't support. Its not intended to be recoverable without upgrading ZFS itself. If the request could be made to work by enabling a feature or modifying some other configuration item, then some other code should be used. load-key: In the future we might have more crypto suites (ie new values for the `encryption` property. Right now trying to load a key on such a future crypto suite will look up suite parameters off the end of the crypto table, resulting in misbehaviour and/or crashes (or, with debug enabled, trip the assertion in `zio_crypt_key_unwrap`). Instead, lets check the value we got from the dataset, and if we can't handle it, abort early. recv: When receiving a raw stream encrypted with an unknown crypto suite, `zfs recv` would report a generic `invalid backup stream` (EINVAL). While technically correct, its not super helpful, so lets ship a more specific error code and message. Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Signed-off-by: Rob Norris <robn@despairlabs.com> Closes #14577	2023-03-07 14:05:14 -08:00
George Amanakis	12a240ac0b	Fix a typo in `ac2038a` Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Signed-off-by: George Amanakis <gamanakis@gmail.com> Closes #14585 Closes #14592	2023-03-07 13:50:44 -08:00
Andriy Gapon	a55254be7a	[FreeBSD] fix false assert in cache_vop_rmdir when replaying ZIL The assert is enabled when DEBUG_VFS_LOCKS kernel option is set. The exact panic is: panic: condition seqc_in_modify(_vp->v_seqc) not met It happens because seqc protocol is not followed for ZIL replay. But we actually do not need to make any namecache calls at that stage, because the namecache use is not enabled until after the replay is completed. Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Andriy Gapon <avg@FreeBSD.org> Closes #14566	2023-03-07 13:48:43 -08:00
Attila Fülöp	1191387012	spl: Add cmn_err_once() to log a message only on the first call Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Brian Atkinson <batkinson@lanl.gov> Signed-off-by: Attila Fülöp <attila@fueloep.org> Closes #14567	2023-03-07 13:44:11 -08:00
q66	4628bb9c60	initramfs: fix zpool get argument order When using the zfs initramfs scripts on my system, I get various errors at initramfs stage, such as: cannot open '-o': name must begin with a letter My zfs binaries are compiled with musl libc, which may be why this happens. In any case, fix the argument order to make the zpool binary happy, and to match its --help output. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Signed-off-by: Daniel Kolesa <daniel@octaforge.org> Closes #14572	2023-03-06 17:07:01 -08:00
Tino Reichardt	84a1c48c86	Fix detection of IBM Power8 machines (ISA 2.07) An IBM POWER7 system with Power ISA 2.06 tried to execute zfs_sha256_power8() - which should only be run on ISA 2.07 machines. The detection is implemented via the zfs_isa207_available() call, but this check was not used. This pull request will fix this. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de> Signed-off-by: Low-power <msl0000023508@gmail.com> Closes #14576	2023-03-06 17:01:01 -08:00
Andriy Gapon	28bf26acb6	[FreeBSD] zfs_znode_alloc: lock the vnode earlier This is needed because of a possible error path where zfs_vnode_forget() is called. That function calls vgone() and vput(), the former requires the vnode to be exclusively locked and the latter expects it to be locked. It should be safe to lock the vnode as early as possible because it is not yet visible, so there is no interaction with other locks. While here, remove a tautological assignment to 'vp'. Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Signed-off-by: Andriy Gapon <avg@FreeBSD.org> Closes #14565	2023-03-06 16:30:54 -08:00
George Amanakis	ca9e32d3a7	Optimize the is_l2cacheable functions by placing the most common use case (no special vdevs) first and avoid allocating new variables. Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: George Amanakis <gamanakis@gmail.com> Closes #14494 Closes #14563	2023-03-06 16:13:05 -08:00
Richard Yao	bc4d210783	Fix memory leak in ztest This is tripping LeakSanitizer, which causes zloop test failures on pull requests. Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Closes #14583	2023-03-06 15:30:29 -08:00
Richard Yao	b79e7114bb	Add missing increment to dsl_deadlist_move_bpobj() `dc5c8006f6` was recently merged to prefetch up to 128 deadlists. Unfortunately, a loop was missing an increment, such that it will prefetch all deadlists. The performance properties of that patch probably should be re-evaluated. This was caught by CodeQL's cpp/constant-comparison check in an experimental branch where I am testing the security-and-extended queries. It complained about the `i < 128` part of the loop condition always evaluating to the same thing. The standard CodeQL configuration we use missed this because it does not include that check. Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Closes #14573	2023-03-06 15:28:26 -08:00
Richard Yao	8846139b45	SHA2Init() should use signed assertions when checking an enum The recent `4c5fec01a4` commit caused Coverity to report that ASSERT3U(algotype, >=, SHA256_MECH_INFO_TYPE); is always true. That is because the signed algotype and signed SHA256_MECH_INFO_TYPE values were cast to unsigned types. To fix this, we switch the assertions to use ASSERT3S(), which retains the signedness of the original values for the comparison. Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Reported-by: Coverity (CID-1535300) Closes #14573	2023-03-06 15:26:43 -08:00
Jorgen Lundman	47119d60ef	Restore ASMABI and other Unify work Make sure all SHA2 transform function has wrappers For ASMABI to work, it is required the calling convention is consistent. Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de> Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Signed-off-by: Joergen Lundman <lundman@lundman.net> Closes #14569	2023-03-06 15:24:05 -08:00
Tino Reichardt	620a977f22	Use SECTION_STATIC macro for sha2 x86_64 assembly - instead of ".section .rodata" we should use SECTION_STATIC Tested-by: Rich Ercolani <rincebrain@gmail.com> Tested-by: Sebastian Gottschall <s.gottschall@dd-wrt.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de> Closes #13741	2023-03-02 13:52:33 -08:00
Tino Reichardt	f9f9bef22f	Update BLAKE3 for using the new impl handling This commit changes the BLAKE3 implementation handling and also the calls to it from the ztest command. Tested-by: Rich Ercolani <rincebrain@gmail.com> Tested-by: Sebastian Gottschall <s.gottschall@dd-wrt.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de> Closes #13741	2023-03-02 13:52:27 -08:00
Tino Reichardt	4c5fec01a4	Add generic implementation handling and SHA2 impl The skeleton file module/icp/include/generic_impl.c can be used for iterating over different implementations of algorithms. It is used by SHA256, SHA512 and BLAKE3 currently. The Solaris SHA2 implementation got replaced with a version which is based on public domain code of cppcrypto v0.10. These assembly files are taken from current openssl master: - sha256-x86_64.S: x64, SSSE3, AVX, AVX2, SHA-NI (x86_64) - sha512-x86_64.S: x64, AVX, AVX2 (x86_64) - sha256-armv7.S: ARMv7, NEON, ARMv8-CE (arm) - sha512-armv7.S: ARMv7, NEON (arm) - sha256-armv8.S: ARMv7, NEON, ARMv8-CE (aarch64) - sha512-armv8.S: ARMv7, ARMv8-CE (aarch64) - sha256-ppc.S: Generic PPC64 LE/BE (ppc64) - sha512-ppc.S: Generic PPC64 LE/BE (ppc64) - sha256-p8.S: Power8 ISA Version 2.07 LE/BE (ppc64) - sha512-p8.S: Power8 ISA Version 2.07 LE/BE (ppc64) Tested-by: Rich Ercolani <rincebrain@gmail.com> Tested-by: Sebastian Gottschall <s.gottschall@dd-wrt.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de> Closes #13741	2023-03-02 13:52:21 -08:00
Tino Reichardt	ac678c8eee	Add SHA2 SIMD feature tests for libspl These are added via HWCAP interface: - zfs_neon_available() for arm and aarch64 - zfs_sha256_available() for arm and aarch64 - zfs_sha512_available() for aarch64 This one via cpuid() call: - zfs_shani_available() for x86_64 Tested-by: Rich Ercolani <rincebrain@gmail.com> Tested-by: Sebastian Gottschall <s.gottschall@dd-wrt.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de> Closes #13741	2023-03-02 13:52:15 -08:00
Tino Reichardt	cef1253135	Add SHA2 SIMD feature tests for Linux These are added: - zfs_neon_available() for arm and aarch64 - zfs_sha256_available() for arm and aarch64 - zfs_sha512_available() for aarch64 - zfs_shani_available() for x86_64 Tested-by: Rich Ercolani <rincebrain@gmail.com> Tested-by: Sebastian Gottschall <s.gottschall@dd-wrt.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de> Co-Authored-By: Sebastian Gottschall <s.gottschall@dd-wrt.com> Closes #13741	2023-03-02 13:52:04 -08:00
Tino Reichardt	589143c225	Add SHA2 SIMD feature tests for FreeBSD These are added: - zfs_neon_available() for arm and aarch64 - zfs_sha256_available() for arm and aarch64 - zfs_sha512_available() for aarch64 - zfs_shani_available() for x86_64 Changes: - simd_powerpc.h: change license from CDDL to BSD Tested-by: Rich Ercolani <rincebrain@gmail.com> Tested-by: Sebastian Gottschall <s.gottschall@dd-wrt.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de> Closes #13741	2023-03-02 13:51:56 -08:00
Tino Reichardt	6723d1110f	Add ARM architecture to OpenZFS buildsystem Tested-by: Rich Ercolani <rincebrain@gmail.com> Tested-by: Sebastian Gottschall <s.gottschall@dd-wrt.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de> Closes #13741	2023-03-02 13:51:50 -08:00
Tino Reichardt	3e254aaad0	Remove old or redundant SHA2 files We had three sha2.h headers in different places. The FreeBSD version, the Linux version and the generic solaris version. The only assembly used for acceleration was some old x86-64 openssl implementation for sha256 within the icp module. For FreeBSD the whole SHA2 files of FreeBSD were copied into OpenZFS, these files got removed also. Tested-by: Rich Ercolani <rincebrain@gmail.com> Tested-by: Sebastian Gottschall <s.gottschall@dd-wrt.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de> Closes #13741	2023-03-02 13:50:21 -08:00
Rob N	163f3d3a1f	zdb: add decryption support The approach is straightforward: for dataset ops, if a key was offered, find the encryption root and the various encryption parameters, derive a wrapping key if necessary, and then unlock the encryption root. After that all the regular dataset ops will return unencrypted data, and that's kinda the whole thing. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Jorgen Lundman <lundman@lundman.net> Signed-off-by: Rob Norris <robn@despairlabs.com> Closes #11551 Closes #12707 Closes #14503	2023-03-02 13:39:09 -08:00
Alexander Motin	5f42d1dbf2	System-wide speculative prefetch limit. With some pathological access patterns it is possible to make ZFS accumulate almost unlimited amount of speculative prefetch ZIOs. Combined with linear ABD allocations in RAIDZ code, it appears to be possible to exhaust system KVA, triggering kernel panic. Address this by introducing a system-wide counter of active prefetch requests and blocking prefetch distance doubling per stream hits if the number of active requests is higher that ~6% of ARC size. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Closes #14516	2023-03-01 15:27:40 -08:00
Brian Behlendorf	cd560c4474	Accommodate debug-kernel stack frame size The blk_queue_discard() and blk_queue_sector_erase() functions slightly exceed the allowed 4096 maximum stack frame size when building with the RedHat debug kernel which causes their configure checks to fail. Add an exception for these two tests so the interfaces are correctly detected. Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de> Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #14540	2023-03-01 14:40:45 -08:00
Richard Yao	4c856fb333	Fix data race between zil_commit() and zil_suspend() openzfsonwindows/openzfs#206 found that it is possible to trip `VERIFY(list_is_empty(&lwb->lwb_itxs))` when a `zil_commit()` is delayed by the scheduler long enough for a parallel `zil_suspend()` operation to exit `zil_commit_impl()`. This is a data race. To prevent this, we introduce a `zilog->zl_suspend_lock` rwlock to ensure that all outstanding `zil_commit()` operations finish before `zil_suspend()` begins and that subsequent operations fallback to `txg_wait_synced()` after `zil_suspend()` has begun. On `PREEMPT_RT` Linux kernels, the `rw_enter()` implementation suffers from writer starvation. This means that a ZIL intensive system can delay `zil_suspend()` indefinitely. This is a pre-existing problem that affects everything that uses rw locks, so it needs to be addressed in the SPL. However, builds against `PREEMPT_RT` Linux kernels are currently broken due to a GPL symbol issue (#11097), so we can safely disregard that issue for now. Reported-by: Arun KV <arun.kv@datacore.com> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Closes #14514	2023-03-01 13:23:09 -08:00
Richard Yao	9fd5fedc67	Remove bad kmem_free() oversight from previous zfsdev_state_list patch I forgot to remove the corresponding kmem_free() from zfs_kmod_fini() in `9a14ce43c3`. Clang's static analyzer did not complain, but the Coverity scan that was run after the patch was merged did. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de> Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Reported-by: Coverity (CID-1535275) Closes #14556	2023-03-01 13:20:53 -08:00
Richard Yao	dd108f5d73	Linux: zfs_fillpage() should handle partial pages from end of file After `89cd2197b9` was merged, Clang's static analyzer began complaining about a dead assignment in `zfs_fillpage()`. Upon inspection, I noticed that the dead assignment was because we are not using the calculated io_len that we should use to avoid asking the DMU to read past the end of a file. This should result in `dmu_buf_hold_array_by_dnode()` calling `zfs_panic_recover()`. This issue predates `89cd2197b9`, but its simplification of zfs_fillpage() eliminated the only use of the assignment to io_len, which made Clang's static analyzer complain about the issue. Also, as a precaution, we add an assertion that io_offset < i_size. If this ever fails, bad things will happen. Otherwise, we are blindly trusting the kernel not to give us invalid offsets. We continue to blindly trust it on non-debug kernels. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Brian Atkinson <batkinson@lanl.gov> Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Closes #14534	2023-03-01 13:19:47 -08:00
Tino Reichardt	0f9ed58f43	Update reclaim disk space script The script uses systemd-run, which does the job in background. We should take the the time and wait for the job to finish. Maybe some functional tests suffer from not really freed disk space and fail because of this. We also add some trimming in the end of the script. Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de> Closes #14554	2023-03-01 12:25:09 -08:00
Richard Yao	3a7c35119e	Handle unexpected errors in zil_lwb_commit() without ASSERT() We tripped `ASSERT(error == ENOENT \|\| error == EEXIST \|\| error == EALREADY)` in `zil_lwb_commit()` at Klara when doing robustness testing of ZIL against drive power cycles. That assertion presumably exists because when this code was written, the only errors expected from here were EIO, ENOENT, EEXIST and EALREADY, with EIO having its own handling before the assertion. However, upon doing a manual depth first search traversal of the source tree, it turns out that a large number of unexpected errors are possible here. In theory, EINVAL and ENOSPC can come from dnode_hold_impl(). However, most unexpected errors originate in the block layer and come to us from zio_wait() in various ways. One way is ->zl_get_data() -> dmu_buf_hold() -> dbuf_read() -> zio_wait(). From vdev_disk.c on Linux alone, zio_wait() can return the unexpected errors ENXIO, ENOTSUP, EOPNOTSUPP, ETIMEDOUT, ENOSPC, ENOLINK, EREMOTEIO, EBADE, ENODATA, EILSEQ and ENOMEM This was only observed after what have been likely over 1000 test iterations, so we do not expect to reproduce this again to find out what the error code was. However, circumstantial evidence suggests that the error was ENXIO. When ENXIO or any other unexpected error occurs, the `fsync()` or equivalent operation that called zil_commit() will return success, when in fact, dirty data has not been committed to stable storage. This is a violation of the Single UNIX Specification. The code should be able to handle this and any other unknown error by calling `txg_wait_synced()`. In addition to changing the code to call txg_wait_synced() on unexpected errors instead of returning, we modify it to print information about unexpected errors to dmesg. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Richard Yao <richard.yao@klarasystems.com> Sponsored-By: Wasabi Technology, Inc. Closes #14532	2023-03-01 09:39:41 -08:00
Rob N	25c4d1f032	mancheck: exclude dotfiles in man dir Its not uncommon for an editor to drop a hidden swap file in the dir while editing a file there. mancheck would find it and run mandoc on it, which would complain about its distinctly not-manpage format. A more correct solution might be to reconfigure the editor to not put swap files in the same dir, but its the default a lot of the time, and this is a very small change that gives a very nice quality-of-life improvement. Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Rob Norris <robn@despairlabs.com> Closes #14549	2023-03-01 09:38:31 -08:00
Rob N	4fe9cc5437	man: note that zdb operates directly on pool storage A frequent misunderstanding is that zdb accesses the pool through the kernel or filesystem, leading to confusion particularly when it can't access something that it seems like it should be able to. I've seen this confusion recently when zdb couldn't access a pool because the user didn't have permission to read directly from the block devices, and when it couldn't show attributes of encrypted files even though the dataset was unlocked. The manpage already speaks to another symptom of this, namely that zdb may "behave erratically" on an active pool; here I'm trying to make that a little more explicit. Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: Rob Norris <robn@despairlabs.com> Closes #14539	2023-02-28 17:35:33 -08:00
Allan Jude	cbd76b4032	Makefile.bsd: cleanup and sync with FreeBSD xxhash.c was not being compiled, so when FreeBSD's kernel switched to a newer version of ZSTD a few weeks ago, out-of-tree ZFS failed to build Sync module/Makefile.bsd with FreeBSD's sys/modules/zfs/Makefile And restore the alphabetical sort in a number of places Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Richard Yao <richard.yao@klarasystems.com> Signed-off-by: Allan Jude <allan@klarasystems.com> Sponsored-by: Klara, Inc. Closes #14508	2023-02-28 17:33:59 -08:00
Richard Yao	ae7e700650	Suppress static analyzer warning in dmu_objset_create_impl_dnstats() Clang's static analyzer claims that dereferencing ds in dmu_objset_create_impl_dnstats() could cause a NULL pointer dereference when a previous NULL check confirms that it is NULL. It is only NULL on the MOS, for which dmu_objset_userused_enabled(os) should always return false, so ds will never be dereferenced when it is NULL. We add an assertion to suppress this warning. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Brian Atkinson <batkinson@lanl.gov> Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Closes #14470	2023-02-28 17:31:58 -08:00
Richard Yao	5cc4950901	Suppress static analyzer warning in dbuf_hold_copy() Clang's static analyzer claims that dbuf_hold_copy() will have a NULL pointer dereference in data->b_data when called by dbuf_hold_impl(). This is impossible because data is dr->dt.dl.dr_data, which is non-NULL whenever db->db_level == 0, which is always the case whenever dbuf_hold_impl() calls dbuf_hold_copy(). We add an assertion to suppress the complaint. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Brian Atkinson <batkinson@lanl.gov> Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Closes #14470	2023-02-28 17:31:50 -08:00
Richard Yao	9a14ce43c3	Statically allocate first node of zfsdev_state_list This avoids a call to kmem_alloc() during module load. It also suppresses a defect report from Clang's static analyzer that claims that we will have a NULL pointer dereference in zfsdev_state_init() because it does not understand that this has already been allocated in zfs_kmod_init(). Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Brian Atkinson <batkinson@lanl.gov> Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Closes #14470	2023-02-28 17:31:41 -08:00
Richard Yao	7fc48f8378	Suppress static analyzer warning in sa_attr_iter() Clang's static analyzer points out that when IS_SA_BONUSTYPE(type) is true and .sa_length is 0 for an attribute, we have a NULL pointer dereference. We suppress this with an IMPLY() statement. This was also identified by Coverity. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Brian Atkinson <batkinson@lanl.gov> Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Reported-by: Coverity (CID-1017954) Closes #14470	2023-02-28 17:31:30 -08:00
Richard Yao	4d9bb5514c	Suppress static analyzer warnings in zio_checksum_error_impl() Clang's static analyzer informs us of multiple NULL pointer dereferences involving zio_checksum_error_impl(). The first is a NULL pointer dereference if bp is NULL and ci->ci_flags & ZCHECKSUM_FLAG_EMBEDDED is false, but bp is NULL implies that ci->ci_flags & ZCHECKSUM_FLAG_EMBEDDED is true, so we add an IMPLY() statement to suppress the report. The second and third are identical, and are duplicated because while the NULL pointer dereference occurs in zio_checksum_gang_verifier(), it is called by zio_checksum_error_impl() and there is a report for each of the two functions. The reports state that when bp is NULL, ci->ci_flags & ZCHECKSUM_FLAG_EMBEDDED is true and checksum is not ZIO_CHECKSUM_LABEL, we also have a NULL pointer dereference. bp is NULL should imply that checksum == ZIO_CHECKSUM_LABEL, so we add an IMPLY() statement to suppress the second report. The two reports are functionally identical. A fourth variation of this was also reported by Coverity. It occurs when checksum == ZIO_CHECKSUM_ZILOG2. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Brian Atkinson <batkinson@lanl.gov> Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Reported-by: Coverity (CID-1524672) Closes #14470	2023-02-28 17:31:08 -08:00
Richard Yao	d634d20d1b	icp: Prevent compilers from optimizing away memset() in gcm_clear_ctx() The recently merged `f58e513f74` was intended to zero sensitive data before exit from encryption functions to harden the code against theoretical information leaks. Unfortunately, the method by which it did that is optimized away by the compiler, so some information still leaks. This was confirmed by counting function calls in disassembly. After studying how the OpenBSD, FreeBSD and Linux kernels handle this, and looking at our disassembly, I decided on a two-factor approach to protect us from compiler dead store elimination passes. The first factor is to stop trying to inline gcm_clear_ctx(). GCC does not actually inline it in the first place, and testing suggests that dead store elimination passes appear to become more powerful in a bad way when inlining is forced, so we recognize that and move gcm_clear_ctx() to a C file. The second factor is to implement an explicit_memset() function based on the technique used by `secure_zero_memory()` in FreeBSD's blake2 implementation, which coincidentally is functionally identical to the one used by Linux. The source for this appears to be a LLVM bug: https://llvm.org/bugs/show_bug.cgi?id=15495 Unlike both FreeBSD and Linux, we explicitly avoid the inline keyword, based on my observations that GCC's dead store elimination pass becomes more powerful when inlining is forced, under the assumption that it will be equally powerful when the compiler does decide to inline function calls. Disassembly of GCC's output confirms that all 6 memset() calls are executed with this patch applied. Reviewed-by: Attila Fülöp <attila@fueloep.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Closes #14544	2023-02-28 17:28:50 -08:00
Richard Yao	2f76797ad9	Linux: Assert mutex is held in mutex_exit() A spurious mutex_exit() in a development branch caused weird issues until I identified it. An assertion prior to mutex_exit() would have caught it. Rather than adding assertions before invocations of mutex_exit() in the code, let us simply add an assertion to mutex_exit(). It is cheap and will likely improve developer productivity. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Brian Atkinson <batkinson@lanl.gov> Signed-off-by: Richard Yao <richard.yao@klarasystems.com> Sponsored-By: Wasabi Technology, Inc. Closes #14541	2023-02-28 17:27:20 -08:00
George Amanakis	13ff72ba0a	Revert zfeature_active() to static Commit `34ce4c4` made zfeature_active() non-static. This is not required. Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Brian Atkinson <batkinson@lanl.gov> Signed-off-by: George Amanakis <gamanakis@gmail.com> Closes #14546	2023-02-28 14:03:52 -08:00
Tino Reichardt	727339b118	Fix github build failures because of vdevprops.7 This small fix adds the manpage vdevprops.7 to the file contrib/debian/openzfs-zfsutils.install and the github actions will work again. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Brian Atkinson <batkinson@lanl.gov> Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de> Closes #14553	2023-02-28 14:02:48 -08:00

... 2 3 4 5 6 ...

8603 Commits