mirror_zfs

mirror of https://git.proxmox.com/git/mirror_zfs.git synced 2026-04-17 08:54:52 +03:00

Author	SHA1	Message	Date
Tino Reichardt	589143c225	Add SHA2 SIMD feature tests for FreeBSD These are added: - zfs_neon_available() for arm and aarch64 - zfs_sha256_available() for arm and aarch64 - zfs_sha512_available() for aarch64 - zfs_shani_available() for x86_64 Changes: - simd_powerpc.h: change license from CDDL to BSD Tested-by: Rich Ercolani <rincebrain@gmail.com> Tested-by: Sebastian Gottschall <s.gottschall@dd-wrt.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de> Closes #13741	2023-03-02 13:51:56 -08:00
Tino Reichardt	6723d1110f	Add ARM architecture to OpenZFS buildsystem Tested-by: Rich Ercolani <rincebrain@gmail.com> Tested-by: Sebastian Gottschall <s.gottschall@dd-wrt.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de> Closes #13741	2023-03-02 13:51:50 -08:00
Tino Reichardt	3e254aaad0	Remove old or redundant SHA2 files We had three sha2.h headers in different places. The FreeBSD version, the Linux version and the generic solaris version. The only assembly used for acceleration was some old x86-64 openssl implementation for sha256 within the icp module. For FreeBSD the whole SHA2 files of FreeBSD were copied into OpenZFS, these files got removed also. Tested-by: Rich Ercolani <rincebrain@gmail.com> Tested-by: Sebastian Gottschall <s.gottschall@dd-wrt.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de> Closes #13741	2023-03-02 13:50:21 -08:00
Rob N	163f3d3a1f	zdb: add decryption support The approach is straightforward: for dataset ops, if a key was offered, find the encryption root and the various encryption parameters, derive a wrapping key if necessary, and then unlock the encryption root. After that all the regular dataset ops will return unencrypted data, and that's kinda the whole thing. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Jorgen Lundman <lundman@lundman.net> Signed-off-by: Rob Norris <robn@despairlabs.com> Closes #11551 Closes #12707 Closes #14503	2023-03-02 13:39:09 -08:00
Alexander Motin	5f42d1dbf2	System-wide speculative prefetch limit. With some pathological access patterns it is possible to make ZFS accumulate almost unlimited amount of speculative prefetch ZIOs. Combined with linear ABD allocations in RAIDZ code, it appears to be possible to exhaust system KVA, triggering kernel panic. Address this by introducing a system-wide counter of active prefetch requests and blocking prefetch distance doubling per stream hits if the number of active requests is higher that ~6% of ARC size. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Closes #14516	2023-03-01 15:27:40 -08:00
Brian Behlendorf	cd560c4474	Accommodate debug-kernel stack frame size The blk_queue_discard() and blk_queue_sector_erase() functions slightly exceed the allowed 4096 maximum stack frame size when building with the RedHat debug kernel which causes their configure checks to fail. Add an exception for these two tests so the interfaces are correctly detected. Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de> Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #14540	2023-03-01 14:40:45 -08:00
Richard Yao	4c856fb333	Fix data race between zil_commit() and zil_suspend() openzfsonwindows/openzfs#206 found that it is possible to trip `VERIFY(list_is_empty(&lwb->lwb_itxs))` when a `zil_commit()` is delayed by the scheduler long enough for a parallel `zil_suspend()` operation to exit `zil_commit_impl()`. This is a data race. To prevent this, we introduce a `zilog->zl_suspend_lock` rwlock to ensure that all outstanding `zil_commit()` operations finish before `zil_suspend()` begins and that subsequent operations fallback to `txg_wait_synced()` after `zil_suspend()` has begun. On `PREEMPT_RT` Linux kernels, the `rw_enter()` implementation suffers from writer starvation. This means that a ZIL intensive system can delay `zil_suspend()` indefinitely. This is a pre-existing problem that affects everything that uses rw locks, so it needs to be addressed in the SPL. However, builds against `PREEMPT_RT` Linux kernels are currently broken due to a GPL symbol issue (#11097), so we can safely disregard that issue for now. Reported-by: Arun KV <arun.kv@datacore.com> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Closes #14514	2023-03-01 13:23:09 -08:00
Richard Yao	9fd5fedc67	Remove bad kmem_free() oversight from previous zfsdev_state_list patch I forgot to remove the corresponding kmem_free() from zfs_kmod_fini() in `9a14ce43c3`. Clang's static analyzer did not complain, but the Coverity scan that was run after the patch was merged did. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de> Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Reported-by: Coverity (CID-1535275) Closes #14556	2023-03-01 13:20:53 -08:00
Richard Yao	dd108f5d73	Linux: zfs_fillpage() should handle partial pages from end of file After `89cd2197b9` was merged, Clang's static analyzer began complaining about a dead assignment in `zfs_fillpage()`. Upon inspection, I noticed that the dead assignment was because we are not using the calculated io_len that we should use to avoid asking the DMU to read past the end of a file. This should result in `dmu_buf_hold_array_by_dnode()` calling `zfs_panic_recover()`. This issue predates `89cd2197b9`, but its simplification of zfs_fillpage() eliminated the only use of the assignment to io_len, which made Clang's static analyzer complain about the issue. Also, as a precaution, we add an assertion that io_offset < i_size. If this ever fails, bad things will happen. Otherwise, we are blindly trusting the kernel not to give us invalid offsets. We continue to blindly trust it on non-debug kernels. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Brian Atkinson <batkinson@lanl.gov> Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Closes #14534	2023-03-01 13:19:47 -08:00
Tino Reichardt	0f9ed58f43	Update reclaim disk space script The script uses systemd-run, which does the job in background. We should take the the time and wait for the job to finish. Maybe some functional tests suffer from not really freed disk space and fail because of this. We also add some trimming in the end of the script. Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de> Closes #14554	2023-03-01 12:25:09 -08:00
Richard Yao	3a7c35119e	Handle unexpected errors in zil_lwb_commit() without ASSERT() We tripped `ASSERT(error == ENOENT \|\| error == EEXIST \|\| error == EALREADY)` in `zil_lwb_commit()` at Klara when doing robustness testing of ZIL against drive power cycles. That assertion presumably exists because when this code was written, the only errors expected from here were EIO, ENOENT, EEXIST and EALREADY, with EIO having its own handling before the assertion. However, upon doing a manual depth first search traversal of the source tree, it turns out that a large number of unexpected errors are possible here. In theory, EINVAL and ENOSPC can come from dnode_hold_impl(). However, most unexpected errors originate in the block layer and come to us from zio_wait() in various ways. One way is ->zl_get_data() -> dmu_buf_hold() -> dbuf_read() -> zio_wait(). From vdev_disk.c on Linux alone, zio_wait() can return the unexpected errors ENXIO, ENOTSUP, EOPNOTSUPP, ETIMEDOUT, ENOSPC, ENOLINK, EREMOTEIO, EBADE, ENODATA, EILSEQ and ENOMEM This was only observed after what have been likely over 1000 test iterations, so we do not expect to reproduce this again to find out what the error code was. However, circumstantial evidence suggests that the error was ENXIO. When ENXIO or any other unexpected error occurs, the `fsync()` or equivalent operation that called zil_commit() will return success, when in fact, dirty data has not been committed to stable storage. This is a violation of the Single UNIX Specification. The code should be able to handle this and any other unknown error by calling `txg_wait_synced()`. In addition to changing the code to call txg_wait_synced() on unexpected errors instead of returning, we modify it to print information about unexpected errors to dmesg. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Richard Yao <richard.yao@klarasystems.com> Sponsored-By: Wasabi Technology, Inc. Closes #14532	2023-03-01 09:39:41 -08:00
Rob N	25c4d1f032	mancheck: exclude dotfiles in man dir Its not uncommon for an editor to drop a hidden swap file in the dir while editing a file there. mancheck would find it and run mandoc on it, which would complain about its distinctly not-manpage format. A more correct solution might be to reconfigure the editor to not put swap files in the same dir, but its the default a lot of the time, and this is a very small change that gives a very nice quality-of-life improvement. Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Rob Norris <robn@despairlabs.com> Closes #14549	2023-03-01 09:38:31 -08:00
Rob N	4fe9cc5437	man: note that zdb operates directly on pool storage A frequent misunderstanding is that zdb accesses the pool through the kernel or filesystem, leading to confusion particularly when it can't access something that it seems like it should be able to. I've seen this confusion recently when zdb couldn't access a pool because the user didn't have permission to read directly from the block devices, and when it couldn't show attributes of encrypted files even though the dataset was unlocked. The manpage already speaks to another symptom of this, namely that zdb may "behave erratically" on an active pool; here I'm trying to make that a little more explicit. Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: Rob Norris <robn@despairlabs.com> Closes #14539	2023-02-28 17:35:33 -08:00
Allan Jude	cbd76b4032	Makefile.bsd: cleanup and sync with FreeBSD xxhash.c was not being compiled, so when FreeBSD's kernel switched to a newer version of ZSTD a few weeks ago, out-of-tree ZFS failed to build Sync module/Makefile.bsd with FreeBSD's sys/modules/zfs/Makefile And restore the alphabetical sort in a number of places Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Richard Yao <richard.yao@klarasystems.com> Signed-off-by: Allan Jude <allan@klarasystems.com> Sponsored-by: Klara, Inc. Closes #14508	2023-02-28 17:33:59 -08:00
Richard Yao	ae7e700650	Suppress static analyzer warning in dmu_objset_create_impl_dnstats() Clang's static analyzer claims that dereferencing ds in dmu_objset_create_impl_dnstats() could cause a NULL pointer dereference when a previous NULL check confirms that it is NULL. It is only NULL on the MOS, for which dmu_objset_userused_enabled(os) should always return false, so ds will never be dereferenced when it is NULL. We add an assertion to suppress this warning. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Brian Atkinson <batkinson@lanl.gov> Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Closes #14470	2023-02-28 17:31:58 -08:00
Richard Yao	5cc4950901	Suppress static analyzer warning in dbuf_hold_copy() Clang's static analyzer claims that dbuf_hold_copy() will have a NULL pointer dereference in data->b_data when called by dbuf_hold_impl(). This is impossible because data is dr->dt.dl.dr_data, which is non-NULL whenever db->db_level == 0, which is always the case whenever dbuf_hold_impl() calls dbuf_hold_copy(). We add an assertion to suppress the complaint. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Brian Atkinson <batkinson@lanl.gov> Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Closes #14470	2023-02-28 17:31:50 -08:00
Richard Yao	9a14ce43c3	Statically allocate first node of zfsdev_state_list This avoids a call to kmem_alloc() during module load. It also suppresses a defect report from Clang's static analyzer that claims that we will have a NULL pointer dereference in zfsdev_state_init() because it does not understand that this has already been allocated in zfs_kmod_init(). Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Brian Atkinson <batkinson@lanl.gov> Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Closes #14470	2023-02-28 17:31:41 -08:00
Richard Yao	7fc48f8378	Suppress static analyzer warning in sa_attr_iter() Clang's static analyzer points out that when IS_SA_BONUSTYPE(type) is true and .sa_length is 0 for an attribute, we have a NULL pointer dereference. We suppress this with an IMPLY() statement. This was also identified by Coverity. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Brian Atkinson <batkinson@lanl.gov> Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Reported-by: Coverity (CID-1017954) Closes #14470	2023-02-28 17:31:30 -08:00
Richard Yao	4d9bb5514c	Suppress static analyzer warnings in zio_checksum_error_impl() Clang's static analyzer informs us of multiple NULL pointer dereferences involving zio_checksum_error_impl(). The first is a NULL pointer dereference if bp is NULL and ci->ci_flags & ZCHECKSUM_FLAG_EMBEDDED is false, but bp is NULL implies that ci->ci_flags & ZCHECKSUM_FLAG_EMBEDDED is true, so we add an IMPLY() statement to suppress the report. The second and third are identical, and are duplicated because while the NULL pointer dereference occurs in zio_checksum_gang_verifier(), it is called by zio_checksum_error_impl() and there is a report for each of the two functions. The reports state that when bp is NULL, ci->ci_flags & ZCHECKSUM_FLAG_EMBEDDED is true and checksum is not ZIO_CHECKSUM_LABEL, we also have a NULL pointer dereference. bp is NULL should imply that checksum == ZIO_CHECKSUM_LABEL, so we add an IMPLY() statement to suppress the second report. The two reports are functionally identical. A fourth variation of this was also reported by Coverity. It occurs when checksum == ZIO_CHECKSUM_ZILOG2. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Brian Atkinson <batkinson@lanl.gov> Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Reported-by: Coverity (CID-1524672) Closes #14470	2023-02-28 17:31:08 -08:00
Richard Yao	d634d20d1b	icp: Prevent compilers from optimizing away memset() in gcm_clear_ctx() The recently merged `f58e513f74` was intended to zero sensitive data before exit from encryption functions to harden the code against theoretical information leaks. Unfortunately, the method by which it did that is optimized away by the compiler, so some information still leaks. This was confirmed by counting function calls in disassembly. After studying how the OpenBSD, FreeBSD and Linux kernels handle this, and looking at our disassembly, I decided on a two-factor approach to protect us from compiler dead store elimination passes. The first factor is to stop trying to inline gcm_clear_ctx(). GCC does not actually inline it in the first place, and testing suggests that dead store elimination passes appear to become more powerful in a bad way when inlining is forced, so we recognize that and move gcm_clear_ctx() to a C file. The second factor is to implement an explicit_memset() function based on the technique used by `secure_zero_memory()` in FreeBSD's blake2 implementation, which coincidentally is functionally identical to the one used by Linux. The source for this appears to be a LLVM bug: https://llvm.org/bugs/show_bug.cgi?id=15495 Unlike both FreeBSD and Linux, we explicitly avoid the inline keyword, based on my observations that GCC's dead store elimination pass becomes more powerful when inlining is forced, under the assumption that it will be equally powerful when the compiler does decide to inline function calls. Disassembly of GCC's output confirms that all 6 memset() calls are executed with this patch applied. Reviewed-by: Attila Fülöp <attila@fueloep.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Closes #14544	2023-02-28 17:28:50 -08:00
Richard Yao	2f76797ad9	Linux: Assert mutex is held in mutex_exit() A spurious mutex_exit() in a development branch caused weird issues until I identified it. An assertion prior to mutex_exit() would have caught it. Rather than adding assertions before invocations of mutex_exit() in the code, let us simply add an assertion to mutex_exit(). It is cheap and will likely improve developer productivity. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Brian Atkinson <batkinson@lanl.gov> Signed-off-by: Richard Yao <richard.yao@klarasystems.com> Sponsored-By: Wasabi Technology, Inc. Closes #14541	2023-02-28 17:27:20 -08:00
George Amanakis	13ff72ba0a	Revert zfeature_active() to static Commit `34ce4c4` made zfeature_active() non-static. This is not required. Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Brian Atkinson <batkinson@lanl.gov> Signed-off-by: George Amanakis <gamanakis@gmail.com> Closes #14546	2023-02-28 14:03:52 -08:00
Tino Reichardt	727339b118	Fix github build failures because of vdevprops.7 This small fix adds the manpage vdevprops.7 to the file contrib/debian/openzfs-zfsutils.install and the github actions will work again. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Brian Atkinson <batkinson@lanl.gov> Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de> Closes #14553	2023-02-28 14:02:48 -08:00
John Poduska	73c383f541	Prevent incorrect datasets being mounted During a mount, zpl_mount_impl(), uses sget() with the callback zpl_test_super() to find a super_block with a matching objset, stored in z_os. It does so without taking the teardown lock on the zfsvfs. The problem is that operations like rollback will replace the z_os. And, there is a window where the objset in the rollback is freed, but z_os still points to it. Then, a mount like operation, for instance a clone, can reallocate that exact same pointer and zpl_test_super() will then match the super_block associated with the rollback as opposed to the clone. This fix tests for a match and if so, takes the teardown lock before doing the final match test. Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: John Poduska <jpoduska@datto.com> Closes #14518	2023-02-27 16:49:34 -08:00
D. Ebdrup	4b9bc6345e	Add vdevprops.7 to the Makefile Adding vdevprops.7 to the Makefile ensures that it gets installed properly on FreeBSD. Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Reviewed-by: Allan Jude <allan@klarasystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: Daniel Ebdrup Jensen <debdrup@FreeBSD.org> Closes #14527	2023-02-27 16:38:09 -08:00
Richard Yao	bff26b0220	Skip memory allocation when compressing holes Hole detection in the zio compression code allows us to opportunistically skip compression on holes. We can go a step further by not doing memory allocations on holes either. Reviewed-by: Brian Atkinson <batkinson@lanl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Richard Yao <richard.yao@klarasystems.com> Sponsored-by: Wasabi Technology, Inc. Closes #14500	2023-02-27 14:41:02 -08:00
Attila Fülöp	f58e513f74	ICP: AES-GCM: Refactor gcm_clear_ctx() Currently the temporary buffer in which decryption takes place isn't cleared on context destruction. Further in some routines we fail to call gcm_clear_ctx() on error exit. Both flaws may result in leaking sensitive data. We follow best practices and zero out the plaintext buffer before freeing the memory holding it. Also move all cleanup into gcm_clear_ctx() and call it on any context destruction. The performance impact should be negligible. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Rob Norris <robn@despairlabs.com> Signed-off-by: Attila Fülöp <attila@fueloep.org> Closes #14528	2023-02-27 14:38:12 -08:00
Mariusz Zaborski	3b9309aabe	Move zap_attribute_t to the heap in dsl_deadlist_merge In the case of a regular compilation, the compiler raises a warning for a dsl_deadlist_merge function, that the stack size is to large. In debug build this can generate an error. Move large structures to heap. Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Mariusz Zaborski <mariusz.zaborski@klarasystems.com> Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Closes #14524	2023-02-27 14:27:58 -08:00
Brian Behlendorf	000985fc15	Workaround GitHub Action failure Ubuntu 20.04 and 22.04 workflows are failing due to an error which is hit when running `apt-get update`. Until the problematic package is fixed apply the suggested workaround described here: https://github.com/orgs/community/discussions/47863 Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #14530	2023-02-27 09:19:25 -08:00
Dimitry Andric	bf1bec394e	Use .section .rodata instead of .rodata on FreeBSD In commit `0a5b942d4` the FreeBSD SECTION_STATIC macro was set to ".rodata". This assembler directive is supported by LLVM (as a convenience alias for ".section .rodata") by not by GNU as. This caused the FreeBSD builds that are done with gcc to fail. Therefore, use ".section .rodata" instead, similar to the other asm_linkage.h headers. Reviewed-by: Mateusz Guzik <mjguzik@gmail.com> Reviewed-by: Attila Fülöp <attila@fueloep.org> Reviewed-by: Jorgen Lundman <lundman@lundman.net> Signed-off-by: Dimitry Andric <dimitry@andric.com> Closes #14526	2023-02-24 16:45:48 -08:00
George Amanakis	d816bc5ec7	Move dmu_buf_rele() after dsl_dataset_sync_done() Otherwise the dataset may be freed after the last dmu_buf_rele() leading to a panic. Reviewed-by: Mark Maybee <mark.maybee@delphix.com> Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Signed-off-by: George Amanakis <gamanakis@gmail.com> Closes #14522 Closes #14523	2023-02-23 18:14:52 -07:00
Brian Behlendorf	6109d83df8	ZTS: Minor fixes - The migration_012_pos.ksh test case was failing because of a missing space after `log_must`. - None of the tests listed in the runfiles should include the .ksh suffix. Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Reviewed-by: Brian Atkinson <batkinson@lanl.gov> Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #14515	2023-02-23 17:10:46 -08:00
Brian Behlendorf	89cd2197b9	Fix buffered/direct/mmap I/O race When a page is faulted in for memory mapped I/O the page lock may be dropped before it has been read and marked up to date. If a buffered read encounters such a page in mappedread() it must wait until the page has been updated. Failure to do so will result in a panic on debug builds and incorrect data on production builds. The critical part of this change is in mappedread() where pages which are not up to date are now handled. Additionally, it includes the following simplifications. - zfs_getpage() and zfs_fillpage() could be passed an array of pages. This could be more efficient if it was used but in practice only a single page was ever provided. These interfaces were simplified to acknowledge that. - update_pages() was modified to correctly set the PG_error bit on a page when it cannot be read by dmu_read(). - Setting PG_error and PG_uptodate was moved to zfs_fillpage() from zpl_readpage_common(). This is consistent with the handling in update_pages() and mappedread(). - Minor additional refactoring to comments and variable declarations to improve readability. - Add a test case to exercise concurrent buffered, direct, and mmap IO to the same file. - Reduce the mmap_sync test case default run time. Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Reviewed-by: Brian Atkinson <batkinson@lanl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #13608 Closes #14498	2023-02-23 10:57:24 -08:00
Richard Yao	7cb67d627c	Fix NULL pointer dereference in zio_ready() Clang's static analyzer correctly identified a NULL pointer dereference in zio_ready() when ZIO_FLAG_NODATA has been set on a zio that is missing a block pointer. The NULL pointer dereference occurs because we have logic intended to disable ZIO_FLAG_NODATA when it has been set on a gang block. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Brian Atkinson <batkinson@lanl.gov> Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Closes #14469	2023-02-23 10:19:08 -08:00
Richard Yao	c9e39da9a4	Use rw_tryupgrade() in dmu_bonus_hold_by_dnode() When dn->dn_bonus == NULL, dmu_bonus_hold_by_dnode() will unlock its read lock on dn->dn_struct_rwlock and grab a write lock. This can be micro-optimized by calling rw_tryupgrade(). Linux will not benefit from this since it does not support rwlock upgrades, but FreeBSD will. Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Brian Atkinson <batkinson@lanl.gov> Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Closes #14517	2023-02-22 16:33:23 -08:00
Paul Dagnelie	d9e64a4030	Improve error message of zfs redact We improve the error message of zfs redact by checking if the target snapshot exists, and if all the redaction snapshots exist. As a future improvement we could iterate over every snapshot provided and use that to determine which one specifically doesn't exist. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Paul Dagnelie <pcd@delphix.com> Closes #11426 Closes #14496	2023-02-21 17:30:05 -08:00
rob-wing	28251d81d7	FreeBSD: don't verify recycled vnode for zfs control directory Under certain loads, the following panic is hit: panic: page fault KDB: stack backtrace: #0 0xffffffff805db025 at kdb_backtrace+0x65 #1 0xffffffff8058e86f at vpanic+0x17f #2 0xffffffff8058e6e3 at panic+0x43 #3 0xffffffff808adc15 at trap_fatal+0x385 #4 0xffffffff808adc6f at trap_pfault+0x4f #5 0xffffffff80886da8 at calltrap+0x8 #6 0xffffffff80669186 at vgonel+0x186 #7 0xffffffff80669841 at vgone+0x31 #8 0xffffffff8065806d at vfs_hash_insert+0x26d #9 0xffffffff81a39069 at sfs_vgetx+0x149 #10 0xffffffff81a39c54 at zfsctl_snapdir_lookup+0x1e4 #11 0xffffffff8065a28c at lookup+0x45c #12 0xffffffff806594b9 at namei+0x259 #13 0xffffffff80676a33 at kern_statat+0xf3 #14 0xffffffff8067712f at sys_fstatat+0x2f #15 0xffffffff808ae50c at amd64_syscall+0x10c #16 0xffffffff808876bb at fast_syscall_common+0xf8 The page fault occurs because vgonel() will call VOP_CLOSE() for active vnodes. For this reason, define vop_close for zfsctl_ops_snapshot. While here, define vop_open for consistency. After adding the necessary vop, the bug progresses to the following panic: panic: VERIFY3(vrecycle(vp) == 1) failed (0 == 1) cpuid = 17 KDB: stack backtrace: #0 0xffffffff805e29c5 at kdb_backtrace+0x65 #1 0xffffffff8059620f at vpanic+0x17f #2 0xffffffff81a27f4a at spl_panic+0x3a #3 0xffffffff81a3a4d0 at zfsctl_snapshot_inactive+0x40 #4 0xffffffff8066fdee at vinactivef+0xde #5 0xffffffff80670b8a at vgonel+0x1ea #6 0xffffffff806711e1 at vgone+0x31 #7 0xffffffff8065fa0d at vfs_hash_insert+0x26d #8 0xffffffff81a39069 at sfs_vgetx+0x149 #9 0xffffffff81a39c54 at zfsctl_snapdir_lookup+0x1e4 #10 0xffffffff80661c2c at lookup+0x45c #11 0xffffffff80660e59 at namei+0x259 #12 0xffffffff8067e3d3 at kern_statat+0xf3 #13 0xffffffff8067eacf at sys_fstatat+0x2f #14 0xffffffff808b5ecc at amd64_syscall+0x10c #15 0xffffffff8088f07b at fast_syscall_common+0xf8 This is caused by a race condition that can occur when allocating a new vnode and adding that vnode to the vfs hash. If the newly created vnode loses the race when being inserted into the vfs hash, it will not be recycled as its usecount is greater than zero, hitting the above assertion. Fix this by dropping the assertion. FreeBSD-issue: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=252700 Reviewed-by: Andriy Gapon <avg@FreeBSD.org> Reviewed-by: Mateusz Guzik <mjguzik@gmail.com> Reviewed-by: Alek Pinchuk <apinchuk@axcient.com> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Rob Wing <rob.wing@klarasystems.com> Co-authored-by: Rob Wing <rob.wing@klarasystems.com> Submitted-by: Klara, Inc. Sponsored-by: rsync.net Closes #14501	2023-02-21 17:26:33 -08:00
Allan Jude	1d56c6d017	Fix per-jail zfs.mount_snapshot setting When jail.conf set the nopersist flag during startup, it was incorrectly destroying the per-jail ZFS settings. Reported-by: Martin Matuska <mm@FreeBSD.org> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Allan Jude <allan@klarasystems.com> Sponsored-by: Modirum MDPay Sponsored-by: Klara, Inc. Closes #14509	2023-02-21 17:23:01 -08:00
George Amanakis	0f32b1f728	Partially revert `eee9362a7` With commit `34ce4c42f` applied, there is no need for `eee9362a7`. Revert that aside from the test. All tests introduced in those commits pass. Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: George Amanakis <gamanakis@gmail.com> Closes #14502	2023-02-21 09:36:22 -08:00
Richard Yao	9f08b6e31f	Sync thread should avoid holding the spa config write lock when possible spa_sync() currently grabs the write lock due to an old hack that is documented by a comment: We need the write lock here because, for aux vdevs, calling vdev_config_dirty() modifies sav_config. This is ugly and will become unnecessary when we eliminate the aux vdev wart by integrating all vdevs into the root vdev tree. This has lead to deadlocks in rare edge cases from holding the write lock. We can reduce incidence of these deadlocks by not grabbing the write lock on pools without auxillary vdevs. Sponsored-By: Wasabi Technology, Inc. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Richard Yao <richard.yao@klarasystems.com> Closes #14282	2023-02-16 14:10:52 -08:00
Paul Dagnelie	dc72c60ec1	zfs redact fails when dnodesize=auto Add handling to dmu_object_next for the case where *objectp == 0. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Paul Dagnelie <pcd@delphix.com> Closes #14479	2023-02-16 09:23:39 -08:00
Brian Behlendorf	57cfae4a2f	zdb: zero-pad checksum output follow up Apply zero padding for checksums consistently. The SNPRINTF_BLKPTR macro was not updated in commit `ac7648179c` which results in the `cli_root/zdb/zdb_checksum.ksh` test case reliably failing. Reviewed-by: Igor Kozhukhov <igor@dilos.org> Reviewed-by: Akash B <akash-b@hpe.com> Reviewed-by: Brian Atkinson <batkinson@lanl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #14497	2023-02-15 09:06:29 -08:00
Richard Yao	f04cb31e7c	Suppress Clang static analyzer complaint in zfs_replay_create() Clang's static analyzer incorrectly complains about an undefined value here when lr->lr_common.lrc_txtype == TX_SYMLINK and txtype == TX_CREATE. This is impossible, because of this line: txtype = (lr->lr_common.lrc_txtype & ~TX_CI((uint64_t)0x1 << 63)); Changing the code to compare against txtype suppresses the report. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Brian Atkinson <batkinson@lanl.gov> Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Closes #14472	2023-02-14 11:05:41 -08:00
Brian Behlendorf	3fc92adc40	Linux: use filemap_range_has_page() As of the 4.13 kernel filemap_range_has_page() can be used to check if there is a page mapped in a given file range. When available this interface should be used which eliminates the need for the zp->z_is_mapped boolean. Reviewed-by: Brian Atkinson <batkinson@lanl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #14493	2023-02-14 11:04:34 -08:00
Richard Yao	ab672133a9	Give strlcat() full buffer lengths rather than smaller buffer lengths strlcat() is supposed to be given the length of the destination buffer, including the existing contents. Unfortunately, I had been overzealous when I wrote `a51288aabb`, since I gave it the length of the destination buffer, minus the existing contents. This likely caused a regression on large strings. On the topic of being overzealous, the use of strlcat() in dmu_send_estimate_fast() was unnecessary because recv_clone_name is a fixed length string. We continue using strlcat() mostly as defensive programming, in case the string length is ever changed, even though it is unnecessary. Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Closes #14476	2023-02-14 11:03:42 -08:00
Rich Ercolani	cfd57573ff	quick fix for lingering snapdir unmount problems Unfortunately, even after `e79b6807`, I still, much more rarely, tripped asserts when playing with many ctldir mounts at once. Since this appears to happen if we dispatched twice too fast, just ignore it. We don't actually need to do anything if someone already started doing it for us. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Rich Ercolani <rincebrain@gmail.com> Closes #14462	2023-02-13 16:40:13 -08:00
George Amanakis	34ce4c42ff	Fix a race condition in dsl_dataset_sync() when activating features The zio returned from arc_write() in dmu_objset_sync() uses zio_nowait(). However we may reach the end of dsl_dataset_sync() which checks if we need to activate features in the filesystem without knowing if that zio has even run through the ZIO pipeline yet. In that case we will flag features to be activated in dsl_dataset_block_born() but dsl_dataset_sync() has already completed its run and those features will not actually be activated. Mitigate this by moving the feature activation code in dsl_dataset_sync_done(). Also add new ASSERTs in dsl_scan_visitbp() checking if a block contradicts any filesystem flags. Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Brian Atkinson <batkinson@lanl.gov> Signed-off-by: George Amanakis <gamanakis@gmail.com> Closes #13816	2023-02-13 16:37:46 -08:00
Prakash Surya	13312e2fa1	Reduce need for contiguous memory for ioctls We've had cases where we trigger an OOM despite having memory freely available on the system. For example, here, we had about 21GB free: kernel: Node 0 Normal: 24187584kB (UME) 15495338kB (UE) 016kB 032kB 064kB 0128kB 0256kB 0512kB 01024kB 02048kB 0*4096kB = 22071296kB The problem being, all the memory is in 4K and 8K contiguous regions, but the allocation request was for a 16K contiguous region: kernel: SafeExecutors-4 invoked oom-killer: gfp_mask=0x42dc0(GFP_KERNEL\|__GFP_NOWARN\|__GFP_COMP\|__GFP_ZERO), order=2, oom_score_adj=0 The offending allocation came from this call trace: kernel: Call Trace: kernel: dump_stack+0x57/0x7a kernel: dump_header+0x4f/0x1e1 kernel: oom_kill_process.cold.33+0xb/0x10 kernel: out_of_memory+0x1ad/0x490 kernel: __alloc_pages_slowpath+0xd55/0xe40 kernel: __alloc_pages_nodemask+0x2df/0x330 kernel: kmalloc_large_node+0x42/0x90 kernel: __kmalloc_node+0x25a/0x320 kernel: ? spl_kmem_free_impl+0x21/0x30 [spl] kernel: spl_kmem_alloc_impl+0xa5/0x100 [spl] kernel: spl_kmem_zalloc+0x19/0x20 [spl] kernel: zfsdev_ioctl+0x2b/0xe0 [zfs] kernel: do_vfs_ioctl+0xa9/0x640 kernel: ? __audit_syscall_entry+0xdd/0x130 kernel: ksys_ioctl+0x67/0x90 kernel: __x64_sys_ioctl+0x1a/0x20 kernel: do_syscall_64+0x5e/0x200 kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9 kernel: RIP: 0033:0x7fdca3674317 The problem is, for each ioctl that ZFS makes, it has to allocate a zfs_cmd_t structure, which is 13744 bytes in size (on my system): sdb> sizeof zfs_cmd (size_t)13744 This size, coupled with the fact that we currently allocate it with kmem_zalloc, means we need a 16K contiguous region of memory to satisfy the request. The solution taken by this change, is to use "vmem" instead of "kmem" to do the allocation, such that we don't necessarily need a contiguous 16K memory region to satisfy the allocation. Arguably, a better solution would be not to require such a large allocation to begin with (e.g. reduce the size of the zfs_cmd_t structure), but that'd be a much larger change than this "one liner". Thus, I've opted for this approach for now; we can always circle back and attempt to reduce the size of the structure in the future. Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Reviewed-by: Mark Maybee <mark.maybee@delphix.com> Reviewed-by: Don Brady <don.brady@delphix.com> Signed-off-by: Prakash Surya <prakash.surya@delphix.com> Closes #14474	2023-02-13 16:35:59 -08:00
Alexander Motin	87a4dfa561	Improve arc_read() error reporting Debugging reported NULL de-reference panic in dnode_hold_impl() I found that for certain types of errors arc_read() may only return error code, but not properly report it via done and pio arguments. Lack of done calls may result in reference and/or memory leaks in higher level code. Lack of error reporting via pio may result in unnoticed errors there. For example, dbuf_read(), where dbuf_read_impl() ignores arc_read() return, relies completely on the pio mechanism and missed the errors. This patch makes arc_read() to always call done callback and always propagate errors to parent zio, if either is provided. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Brian Atkinson <batkinson@lanl.gov> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Closes #14454	2023-02-13 13:21:53 -08:00
Andreas Vögele	7883ea2234	rpm: Use libtirpc-devel and /usr/lib on SUSE SUSE Linux distributions require libtirpc-devel. The dracut and udev directories are /usr/lib/dracut and /usr/lib/udev. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Brian Atkinson <batkinson@lanl.gov> Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Signed-off-by: Andreas Vögele <andreas@andreasvoegele.com> Closes #14467 Closes #14468	2023-02-09 11:57:50 -08:00

1 2 3 4 5 ...

8426 Commits