mirror_zfs

mirror of https://git.proxmox.com/git/mirror_zfs.git synced 2026-05-22 10:37:35 +03:00

Author	SHA1	Message	Date
Richard Yao	7584fbe846	Cleanup: Switch to strlcpy from strncpy Coverity found a bug in `zfs_secpolicy_create_clone()` where it is possible for us to pass an unterminated string when `zfs_get_parent()` returns an error. Upon inspection, it is clear that using `strlcpy()` would have avoided this issue. Looking at the codebase, there are a number of other uses of `strncpy()` that are unsafe and even when it is used safely, switching to `strlcpy()` would make the code more readable. Therefore, we switch all instances where we use `strncpy()` to use `strlcpy()`. Unfortunately, we do not portably have access to `strlcpy()` in tests/zfs-tests/cmd/zfs_diff-socket.c because it does not link to libspl. Modifying the appropriate Makefile.am to try to link to it resulted in an error from the naming choice used in the file. Trying to disable the check on the file did not work on FreeBSD because Clang ignores `#undef` when a definition is provided by `-Dstrncpy(...)=...`. We work around that by explicitly including the C file from libspl into the test. This makes things build correctly everywhere. We add a deprecation warning to `config/Rules.am` and suppress it on the remaining `strncpy()` usage. `strlcpy()` is not portably available in tests/zfs-tests/cmd/zfs_diff-socket.c, so we use `snprintf()` there as a substitute. This patch does not tackle the related problem of `strcpy()`, which is even less safe. Thankfully, a quick inspection found that it is used far more correctly than strncpy() was used. A quick inspection did not find any problems with `strcpy()` usage outside of zhack, but it should be said that I only checked around 90% of them. Lastly, some of the fields in kstat_t varied in size by 1 depending on whether they were in userspace or in the kernel. The origin of this discrepancy appears to be `04a479f706` where it was made for no apparent reason. It conflicts with the comment on KSTAT_STRLEN, so we shrink the kernel field sizes to match the userspace field sizes. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Closes #13876	2022-09-27 16:35:29 -07:00
Jitendra Patidar	3ed9d6883b	Enforce "-F" flag on resuming recv of full/newfs on existing dataset When receiving full/newfs on an existing dataset, it should be done with the "-F" flag. It's enforced for the initial receive by checks done in the zfs_receive_one function of libzfs. Similarly, resuming a full/newfs recv on an existing dataset should be done with the "-F" flag. When the dataset doesn't exist, the full/new recv is done on a newly created dataset and it's marked INCONSISTENT. But when receiving on an existing dataset, the recv is first done on %recv and it's marked INCONSISTENT. The existing dataset is not marked INCONSISTENT. Resume of a full/newfs receive with the dataset not INCONSISTENT indicates that it's resuming newfs on an existing dataset. So, enforce the "-F" flag in this case. Also return an error from dmu_recv_resume_begin_check() in the zfs kernel module when it's resuming a full/newfs recv without force. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Chunwei Chen <david.chen@nutanix.com> Signed-off-by: Jitendra Patidar <jitendra.patidar@nutanix.com> Closes #13856 Closes #13857	2022-09-27 16:34:27 -07:00
Richard Yao	a2163a96ae	Fix bad free in skein code Clang's static analyzer found a bad free caused by skein_mac_atomic(). It will allocate a context on the stack and then pass it to skein_final(), which attempts to free it. Upon inspection, skein_digest_atomic() also has the same problem. These functions were created to match the OpenSolaris ICP API, so I was curious how we avoided this in other providers and looked at the SHA2 code. It appears that SHA2 has a SHA2Final() helper function that is called by the exported sha2_mac_final()/sha2_digest_final() as well as the sha2_mac_atomic() and sha2_digest_atomic() functions. The real work is done in SHA2Final() while some checks and the free are done in sha2_mac_final()/sha2_digest_final(). We fix the use after free in the skein code by taking inspiration from the SHA2 code. We introduce a skein_final_nofree() that does most of the work, and make skein_final() into a function that calls it and then frees the memory. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Closes #13954	2022-09-27 12:36:58 -07:00
Richard Yao	f7bda2de97	Fix userspace memory leaks found by Clang Static Analyzer Recently, I have been making a push to fix things that coverity found. However, I was curious what Clang's static analyzer reported, so I ran it and found things that coverity had missed. * contrib/pam_zfs_key/pam_zfs_key.c: If prop_mountpoint is passed more than once, we leak memory. * module/zfs/zcp_get.c: We leak memory on temporary properties in userspace. * tests/zfs-tests/cmd/draid.c: On error from vdev_draid_rand(), we leak memory if best_map had been allocated by a prior iteration. * tests/zfs-tests/cmd/mkfile.c: Memory used by the loop is not freed before program termination. Arguably, these are all minor issues, but if we ignore them, then they could obscure serious bugs, so we fix them. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Closes #13955	2022-09-26 17:18:05 -07:00
Richard Yao	8ef15f9322	Cleanup: Remove ineffective unsigned comparisons against 0 Coverity found a number of places where we either do MAX(unsigned, 0) or do assertions that an unsigned variable is >= 0. These do nothing, so let us drop them all. It also found a spot where we do `if (unsigned >= 0 && ...)`. Let us also drop the unsigned >= 0 check. Reviewed-by: Neal Gompa <ngompa@datto.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Closes #13871	2022-09-26 17:02:38 -07:00
Richard Yao	52afc3443d	Linux: Fix uninitialized variable usage in zio_do_crypt_data() Coverity complained about this. An error from `hkdf_sha512()` before uio initialization will cause pointers to uninitialized memory to be passed to `zio_crypt_destroy_uio()`. This is a regression that was introduced by `cf63739191`. Interestingly, this never affected FreeBSD, since the FreeBSD version never had that patch ported. Since moving uio initialization to the top of this function would slow down the qat_crypt() path, we only move the `memset()` calls to the top of the function. This is sufficient to fix this problem. Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Neal Gompa <ngompa@datto.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Closes #13944	2022-09-26 16:44:22 -07:00
Richard Yao	2a493a4c71	Fix unchecked return values and unused return values Coverity complained about unchecked return values and unused values that turned out to be unused return values. Different approaches were used to handle the different cases of unchecked return values: * cmd/zdb/zdb.c: VERIFY0 was used in one place since the existing code had no error handling. An error message was printed in another to match the rest of the code. * cmd/zed/agents/zfs_retire.c: We dismiss the return value with `(void)` because the value is expected to be potentially unset. * cmd/zpool_influxdb/zpool_influxdb.c: We dismiss the return value with `(void)` because the values are expected to be potentially unset. * cmd/ztest.c: VERIFY0 was used since we want failures if something goes wrong in ztest. * module/zfs/dsl_dir.c: We dismiss the return value with `(void)` because there is no guarantee that the zap entry will always be there. For example, old pools imported readonly would not have it and we do not want to fail here because of that. * module/zfs/zfs_fm.c: `fnvlist_add_()` was used since the allocations sleep and thus can never fail. * module/zfs/zvol.c: We dismiss the return value with `(void)` because we do not need it. This matches what is already done in the analogous `zfs_replay_write2()`. * tests/zfs-tests/cmd/draid.c: We suppress one return value with `(void)` since the code handles errors already. The other return value is handled by switching to `fnvlist_lookup_uint8_array()`. * tests/zfs-tests/cmd/file/file_fadvise.c: We add error handling. * tests/zfs-tests/cmd/mmap_sync.c: We add error handling for munmap, but ignore failures on remove() with (void) since it is expected to be able to fail. * tests/zfs-tests/cmd/mmapwrite.c: We add error handling. As for unused return values, they were all in places where there was error handling, so logic was added to handle the return values. Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Closes #13920	2022-09-23 16:52:03 -07:00
Brian Behlendorf	505df8d133	Dynamically size dbuf hash mutex array Incorrectly sizing the array of hash locks used to protect the dbuf hash table can lead to contention and reduce performance. We could unconditionally allocate a larger array for the locks but it's wasteful, particularly for a low-memory system. Instead, dynamically allocate the array of locks and scale it based on total system memory. Additionally, add a new `dbuf_mutex_cache_shift` module option which can be used to override the hash lock array size. This is disabled by default (dbuf_mutex_cache_shift=0) and can only be set at module load time. The minimum target array size is set to 8192; this matches the current constant value. Note that the count of the dbuf hash table and count of the mutex array were added to the /proc/spl/kstat/zfs/dbufstats kstat. Finally, this change removes the _KERNEL conditional checks. These were not required since for the user space build there is no difference between the kmem and vmem interfaces. Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #13928	2022-09-22 12:59:56 -07:00
Brian Behlendorf	223b04d23d	Revert "Reduce dbuf_find() lock contention" This reverts commit `34dbc618f5`. While this change resolved the lock contention observed for certain workloads, it inadvertently reduced the maximum hash inserts/removes per second. This appears to be due to the slightly higher acquisition cost of a rwlock vs a mutex. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>	2022-09-22 12:59:41 -07:00
Richard Yao	e506a0ce40	Cleanup: Change 1 used in bitshifts to 1ULL Coverity complains about this. It is not a bug as long as we never shift by more than 31, but it is not terrible to change the constants from 1 to 1ULL as clean up. Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Closes #13914	2022-09-22 11:28:33 -07:00
youzhongyang	62e2a2881f	Fix minor issues in namespace delegation support get_user_ns() is only done once for each namespace, so put_user_ns() should be done once too. Fix two typos in user_namespace/user_namespace_002.ksh and user_namespace/user_namespace_003.ksh. Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Youzhong Yang <yyang@mathworks.com> Closes #13918	2022-09-20 15:25:21 -07:00
Mateusz Guzik	fbf874a4ac	FreeBSD: handle V_PCATCH See https://cgit.FreeBSD.org/src/commit/?id=a75d1ddd74312f5dd79bc1e965f7077679659f2e Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Closes #13910	2022-09-20 15:22:32 -07:00
Mateusz Guzik	3e5caef4c5	FreeBSD: catch up to 1400068 Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Closes #13909	2022-09-20 15:21:30 -07:00
Ameer Hamza	c50b3f14d3	Delay ZFS_PROP_SHARESMB property to handle it for encrypted raw receive For encrypted raw receive, objset creation is delayed until a call to dmu_recv_stream(). ZFS_PROP_SHARESMB property requires objset to be populated when calling zpl_earlier_version(). To correctly handle the ZFS_PROP_SHARESMB property for encrypted raw receive, this change delays setting the property. Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ameer Hamza <ahamza@ixsystems.com> Closes #13878	2022-09-20 15:19:05 -07:00
Richard Yao	3f400b0f58	FreeBSD: Cleanup zfs_readdir() The FreeBSD project's coverity scans found dead code in `zfs_readdir()`. Also, the comment above `zfs_readdir()` is out of date. I fixed the comment and deleted all of the dead code, plus additional dead code that was found upon review. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Closes #13924	2022-09-20 14:50:16 -07:00
Richard Yao	9276e202eb	FreeBSD: Fix uninitialized pointer read in spa_import_rootpool() The FreeBSD project's coverity scans found this. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Closes #13923	2022-09-20 14:43:03 -07:00
Richard Yao	f272960d52	Fix usage of zed_log_msg() and zfs_panic_recover() Coverity complained about the format specifiers not matching variables. In one case, the variable is a constant, so we fix it. In another, we were missing an argument (about which coverity also complained). Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Closes #13888	2022-09-19 17:32:18 -07:00
Richard Yao	891ac937be	Linux: Fix use-after-free in zfsvfs_create() Coverity reported that we pass a pointer to zfsvfs to `dmu_objset_disown()` after freeing zfsvfs in zfsvfs_create_impl() after a failure in zfsvfs_init(). We have nearly identical duplicate versions of this code for FreeBSD and Linux, but interestingly, the FreeBSD version of this code differs in such a way that it does not suffer from this bug. We remove the difference from the FreeBSD version to fix this bug. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Closes #13883	2022-09-19 17:30:58 -07:00
Martin Matuška	042d43a1dd	FreeBSD: fix static module build broken in `7bb707ffa` param_set_arc_free_target(SYSCTL_HANDLER_ARGS) and param_set_arc_no_grow_shift(SYSCTL_HANDLER_ARGS) defined in sysctl_os.c must be made available to arc_os.c. Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Martin Matuska <mm@FreeBSD.org> Closes #13915	2022-09-19 17:21:45 -07:00
Mateusz Guzik	9a671fe7ec	FreeBSD: stop passing LK_INTERLOCK to VOP_LOCK There is an ongoing effort to eliminate this feature. Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Closes #13908	2022-09-19 17:17:27 -07:00
Tino Reichardt	75e8b5ad84	Fix BLAKE3 tuneable and module loading on Linux and FreeBSD Apply similar options to BLAKE3 as it is done for zfs_fletcher_4_impl. The zfs module parameter on Linux changes from icp_blake3_impl to zfs_blake3_impl. You can check and set it on Linux via sysfs like this: ``` [bash]# cat /sys/module/zfs/parameters/zfs_blake3_impl cycle [fastest] generic sse2 sse41 avx2 [bash]# echo sse2 > /sys/module/zfs/parameters/zfs_blake3_impl [bash]# cat /sys/module/zfs/parameters/zfs_blake3_impl cycle fastest generic [sse2] sse41 avx2 ``` The modprobe module parameters may also be used now: ``` [bash]# modprobe zfs zfs_blake3_impl=sse41 [bash]# cat /sys/module/zfs/parameters/zfs_blake3_impl cycle fastest generic sse2 [sse41] avx2 ``` On FreeBSD the BLAKE3 implementation can be set via sysctl like this: ``` [bsd]# sysctl vfs.zfs.blake3_impl vfs.zfs.blake3_impl: cycle [fastest] generic sse2 sse41 avx2 [bsd]# sysctl vfs.zfs.blake3_impl=sse2 vfs.zfs.blake3_impl: cycle [fastest] generic sse2 sse41 avx2 \ -> cycle fastest generic [sse2] sse41 avx2 ``` This commit changes also some Blake3 internals like these: - blake3_impl_ops_t was renamed to blake3_ops_t - all functions are named blake3_impl_NAME() now Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Co-authored-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de> Closes #13725	2022-09-16 14:25:53 -07:00
Brian Behlendorf	7dee043af5	zfs_enter rework followup The zpl_fadvise() function was recently added and was not included in the initial patch. Update it accordingly. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #13831	2022-09-16 14:25:53 -07:00
Ameer Hamza	577d41d3b2	zfs recv hangs if max recordsize is less than received recordsize - Some optimizations for bqueue enqueue/dequeue. - Added a fix to prevent deadlock when both bqueue_enqueue_impl() and bqueue_dequeue() wait for a signal to be triggered. Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Ameer Hamza <ahamza@ixsystems.com> Closes #13855	2022-09-16 13:52:25 -07:00
Chunwei Chen	768eacedef	zfs_enter rework Replace ZFS_ENTER and ZFS_VERIFY_ZP, which have hidden returns, with functions that return error code. The reason we want to do this is because hidden returns are not obvious and had caused some missing fail path unwinding. This patch changes the common, linux, and freebsd parts. Also fixes fail path unwinding in zfs_fsync, zpl_fsync, zpl_xattr_{list,get,set}, and zfs_lookup(). Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Chunwei Chen <david.chen@nutanix.com> Closes #13831	2022-09-16 13:36:47 -07:00
Richard Yao	b24d1c77f7	Add zfs_btree_verify_intensity kernel module parameter I see a few issues in the issue tracker that might be aided by being able to turn this on. We have no module parameter for it, so I would like to add one. Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Closes #13874	2022-09-15 16:22:33 -07:00
Richard Yao	ddb1fd91c0	Fix incorrect size given to bqueue_enqueue() call in dmu_redact.c We pass sizeof (struct redact_record *) rather than sizeof (struct redact_record). Passing the pointer size is wrong. Coverity caught this in two places. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Closes #13885	2022-09-15 16:21:21 -07:00
Richard Yao	e949d36040	Fix assertions in crypto reference helpers The assertions are racy and the use of `membar_exit()` did nothing to fix that. The helpers use atomic functions, so we cleverly get values from the atomics that we can use to ensure that the assertions operate on the correct values. We also use `membar_producer()` prior to decrementing reference counts so that operations that happened prior to a decrement to 0 will be guaranteed to happen before the decrement on architectures that reorder atomics. This also slightly improves performance by eliminating unnecessary reads, although I doubt it would be measurable in any benchmark. Reviewed-by: Mateusz Guzik <mjguzik@gmail.com> Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Closes #13880	2022-09-15 13:24:00 -07:00
Richard Yao	fd8c3012b3	Fix use-after-free bugs in icp code These were reported by Coverity as "Read from pointer after free" bugs. Presumably, it did not report it as a use-after-free bug because it does not understand the inline assembly that implements the atomic instruction. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Closes #13881	2022-09-15 11:46:42 -07:00
Richard Yao	ccec88f11a	FreeBSD: Fix integer conversion for vnlru_free{,_vfsops}() When reviewing #13875, I noticed that our FreeBSD code has an issue where it converts from `int64_t` to `int` when calling `vnlru_free{,_vfsops}()`. The result is that if the int64_t is `1 << 36`, the int will be 0, since the low bits are 0. Even when some low bits are set, a value such as `((1 << 36) + 1)` would truncate to 1, which is wrong. There is protection against this on 32-bit platforms, but on 64-bit platforms, there is no check to protect us, so we add a check. Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Closes #13882	2022-09-14 12:51:55 -07:00
Richard Yao	4a6e8b99f5	Add assertion to dsl_dataset_set_compression_sync Coverity pointed out that if we somehow receive SPA_FEATURE_NONE, we will use a negative number as an array index. A defensive assertion seems appropriate. Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Neal Gompa <ngompa@datto.com> Reviewed-by: Allan Jude <allan@klarasystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Closes #13872	2022-09-14 12:50:03 -07:00
Richard Yao	d954ca19ba	Fix theoretical "use-after-free" in dbuf_prefetch_indirect_done() Coverity complains about a "use-after-free" bug in `dbuf_prefetch_indirect_done()` because we use a pointer value after freeing its buffer. The pointer is used for refcounting in ARC (as the reference holder). There is a theoretical situation where the pointer would be reused in a way that causes the refcounting to collide, so we change the order in which we call arc_buf_destroy() and dbuf_prefetch_fini() to match the rest of the function. This prevents the theoretical situation from being a possibility. Also, we have a few return statements with a value, despite this being a void function. We clean those up while we are making changes here. Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Neal Gompa <ngompa@datto.com> Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Closes #13869	2022-09-13 17:58:29 -07:00
Richard Yao	cf66e7e594	Cleanup: Make memory barrier definitions consistent across kernels We inherited membar_consumer() and membar_producer() from OpenSolaris, but we had replaced membar_consumer() with Linux's smp_rmb() in zfs_ioctl.c. The FreeBSD SPL consequently implemented a shim for the Linux-only smp_rmb(). We reinstate membar_consumer() in platform independent code and fix the FreeBSD SPL to implement membar_consumer() in a way analogous to Linux. Reviewed-by: Konstantin Belousov <kib@FreeBSD.org> Reviewed-by: Mateusz Guzik <mjguzik@gmail.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Neal Gompa <ngompa@datto.com> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Closes #13843	2022-09-13 16:59:33 -07:00
Richard Yao	d5d10f2aef	Cleanup dead spa_boot code Unused code detected by coverity. Reviewed-by: Allan Jude <allan@klarasystems.com> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Neal Gompa <ngompa@datto.com> Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Closes #13868	2022-09-13 16:40:10 -07:00
Richard Yao	e5327e7f97	vdev_draid_lookup_map() should not iterate outside draid_maps Coverity reported this as an out-of-bounds read. Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Neal Gompa <ngompa@datto.com> Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Closes #13865	2022-09-12 12:51:17 -07:00
Richard Yao	13f2b8fb92	Fix use-after-free in btree code Coverity static analysis found these. Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Neal Gompa <ngompa@datto.com> Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Closes #10989 Closes #13861	2022-09-12 11:22:15 -07:00
Richard Yao	0e4c830bc1	Cleanup: Use OpenSolaris functions to call scheduler In our codebase, `cond_resched()` and `schedule()` are Linux kernel functions that have replaced the OpenSolaris `kpreempt()` functions in the codebase to such an extent that `kpreempt()` in zfs_context.h was broken. Nobody noticed because we did not actually use it. The header had defined `kpreempt()` as `yield()`, which works on OpenSolaris and Illumos where `sched_yield()` is a wrapper for `yield()`, but that does not work on any other platform. The FreeBSD platform specific code implemented shims for these, but the shim for `schedule()` forced us to wait, which is different than merely rescheduling to another thread as the original Linux code does, while the shim for `cond_resched()` had the same definition as its kernel kpreempt() shim. After studying this, I have concluded that we should reintroduce the kpreempt() function in platform independent code with the following definitions: - In the Linux kernel: kpreempt(unused) -> cond_resched() - In the FreeBSD kernel: kpreempt(unused) -> kern_yield(PRI_USER) - In userspace: kpreempt(unused) -> sched_yield() In userspace, nothing changes from this cleanup. In the kernels, the function `fm_fini()` will now call `kern_yield(PRI_USER)` on FreeBSD and `cond_resched()` on Linux. This is instead of `pause("schedule", 1)` on FreeBSD and `schedule()` on Linux. This makes our behavior consistent across platforms. Note that Linux's SPL continues to use `cond_resched()` and `schedule()`. However, those functions have been removed from both the FreeBSD code and userspace code. This should have the benefit of making it slightly easier to port the code to new platforms by making how things should be mapped less confusing. Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Neal Gompa <ngompa@datto.com> Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Closes #13845	2022-09-12 09:55:37 -07:00
Ryan Moeller	60d995727a	FreeBSD: Replace legacy make_dev() interface usage The function make_dev_s() was introduced to replace make_dev() in FreeBSD 11.0. It allows further specification of properties and flags and returns an error code on failure. Using this we can fail loading the module more gracefully than a panic in situations such as when a device named zfs already exists. We already use it for zvols. Use make_dev_s() for /dev/zfs. Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #13854	2022-09-08 10:40:18 -07:00
Alexander Motin	37f6845c6f	Improve too large physical ashift handling When iterating through children physical ashifts for vdev, prefer ones above the maximum logical ashift, that we can actually use, but within the administrator defined maximum. When selecting top-level vdev ashift, do not set it to the defined maximum in case physical ashift is even higher, but just ignore one. Using the maximum does not prevent misaligned writes, but reduces space efficiency. Since ZFS tries to write data sequentially and aggregates the writes, in many cases large misaligned writes may not be as bad as the space penalty otherwise. Allow internal physical ashifts for vdevs higher than SHIFT_MAX. Maybe one day the allocator or aggregation could benefit from that. Reduce zfs_vdev_max_auto_ashift default from 16 (64KB) to 14 (16KB), so that ZFS may still use bigger ashifts up to SHIFT_MAX (64KB), but only if it really has to or is explicitly told to, and not as an "optimization". There are some read-intensive NVMe SSDs that report Preferred Write Alignment of 64KB, and an attempt to build RAIDZ2 from those leads to a space inefficiency that can't be justified. Instead these changes make ZFS fall back to logical ashift of 12 (4KB) by default and only warn the user that it may be suboptimal for performance. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Closes #13798	2022-09-08 10:30:53 -07:00
Finix1979	320f0c6022	Add Linux posix_fadvise support The purpose of this PR is to accept fadvise ioctl from userland to do read-ahead on demand. It could dramatically improve sequential read performance especially when primarycache is set to metadata or zfs_prefetch_disable is 1. If the file is mmaped, generic_fadvise is also called for page cache read-ahead besides dmu_prefetch. Only POSIX_FADV_WILLNEED and POSIX_FADV_SEQUENTIAL are supported in this PR currently. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Finix Yan <yancw@info2soft.com> Closes #13694	2022-09-08 10:29:41 -07:00
Richard Yao	380b08098e	Linux SPL module init: Handle memory allocation failures correctly Upon inspection of our code, I noticed that we assume that __alloc_percpu() cannot fail, and while it probably never has failed in practice, technically, it can fail, so we should handle that. Additionally, we incorrectly assume that `taskq_create()` in spl_kmem_cache_init() cannot fail. The same remark applies to it. Lastly, `spl_init()` failures should always return negative error values, but in some places, we are returning positive 1, which is incorrect. We change those values to their correct error codes. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Closes #13847	2022-09-08 10:28:20 -07:00
pkubaj	dff541f698	Fix build on FreeBSD/powerpc64* There's no VSX handler on FreeBSD for now. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Piotr Kubaj <pkubaj@FreeBSD.org> Closes #13848	2022-09-08 10:27:25 -07:00
Rob Wing	983096a1b4	FreeBSD: add kqfilter support for zvol cdev The only event hooked up is NOTE_ATTRIB, which is triggered when the device is resized. Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Rob Wing <rew@FreeBSD.org> Closes #13773	2022-09-06 09:49:33 -07:00
Rob Wing	9d0887402b	FreeBSD: add knlist_init_sx() for exclusive locks This will be used to implement kqfilter support for zvol cdevs. Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Rob Wing <rew@FreeBSD.org> Closes #13773	2022-09-06 09:48:57 -07:00
Richard Yao	11df48ab8b	Cleanup Raid-Z Typo fixes Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Closes #13834	2022-09-06 09:43:21 -07:00
Umer Saleem	59767479ac	Add DD_FIELD string for snapshots_changed property This commit adds DD_FIELD string used in extensified dsl_dir zap object for snapshots_changed property. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Umer Saleem <usaleem@ixsystems.com> Closes #13819	2022-09-02 13:33:50 -07:00
Andriy Gapon	ee9f3bca55	Add zfs.sync.snapshot_rename Only the single snapshot rename is provided. The recursive or more complex rename can be scripted. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: Andriy Gapon <avg@FreeBSD.org> Closes #13802	2022-09-02 13:31:19 -07:00
Ryan Moeller	7bb707ffaf	FreeBSD: Organize sysctls FreeBSD had a few platform-specific ARC tunables in the wrong place: - Move FreeBSD-specific ARC tunables into the same vfs.zfs.arc node as the rest of the ARC tunables. - Move the handlers from arc_os.c to sysctl_os.c and add compat sysctls for the legacy names. While here, some additional clean up: - Most handlers are specific to a particular variable and don't need a pointer passed through the args. - Group blocks of related variables, handlers, and sysctl declarations into logical sections. - Match variable types for temporaries in handlers with the type of the global variable. - Remove leftover comments. Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #13756	2022-09-02 13:26:24 -07:00
Alexander Motin	f933b3fd4d	Apply arc_shrink_shift to ARC above arc_c_min It makes sense to free memory in smaller chunks when approaching arc_c_min to let other kernel subsystems free more, since after that point we can't free anything. This also matches behavior on Linux, where only the size above arc_c_min is reported to the shrinker. Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Allan Jude <allan@klarasystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Closes #13794	2022-09-02 13:21:18 -07:00
Richard Yao	0b30dc484f	FreeBSD: Cleanup dead code from VFS The vfs_*_feature() macros turn anything that uses them into dead code, so we can delete all of it. As a side effect, zfs_set_fuid_feature() is now identical in module/os/freebsd/zfs/zfs_vnops_os.c and module/os/linux/zfs/zfs_vnops_os.c. A few other functions are identical too. Future cleanup could move these into a common file. Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Closes #13832	2022-09-02 13:20:10 -07:00
Brian Behlendorf	9f346abbe8	Revert "Avoid panic with recordsize > 128k, raw sending and no large_blocks" This reverts commit `80a650b7bb`. This change inadvertently introduced a regression in ztest where one of the new ASSERTs is triggered in dsl_scan_visitbp(). Reviewed-by: George Amanakis <gamanakis@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #12275 Closes #13799	2022-08-25 13:33:32 -07:00
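
Commit `ccec88f11a` in the log above describes an `int64_t` count being narrowed to `int`, so that a value like 2^36 arrives as 0 and 2^36 + 1 arrives as 1. The following is a minimal standalone sketch of that effect and of one possible guard. Both helper names are hypothetical and do not come from the OpenZFS tree, and clamping to the `int` range is an assumption about what such a check could look like, not a description of the actual patch.

```c
#include <limits.h>
#include <stdint.h>

/*
 * Hypothetical helper: show what survives when only the low 32 bits of
 * a 64-bit count are kept, as happens on common ABIs when an int64_t is
 * passed to a function that takes int.
 */
static int
low_bits_as_int(int64_t n)
{
	uint32_t low = (uint32_t)(uint64_t)n;	/* discard the high 32 bits */

	/* Implementation-defined if low > INT_MAX; not exercised here. */
	return ((int)low);
}

/*
 * Hypothetical guard: clamp the 64-bit count into the range of int
 * before narrowing, so large requests stay large instead of wrapping.
 */
static int
clamp_int64_to_int(int64_t n)
{
	if (n > INT_MAX)
		return (INT_MAX);
	if (n < INT_MIN)
		return (INT_MIN);
	return ((int)n);
}
```

With these helpers, `low_bits_as_int((int64_t)1 << 36)` yields 0 while `clamp_int64_to_int((int64_t)1 << 36)` yields `INT_MAX`. Note the explicit `(int64_t)` cast: shifting a plain `int` constant by 36 is undefined behavior in C.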

3899 Commits