mirror_zfs

mirror of https://git.proxmox.com/git/mirror_zfs.git synced 2026-05-23 02:44:41 +03:00

Author	SHA1	Message	Date
Michael D Labriola	7a7e101437	Linux 5.10 compat: also zvol_revalidate_disk() Commit `59b68723` added a configure check for 5.10, which removed revalidate_disk(), and conditionally replaced its usage with a call to the new revalidate_disk_size() function. However, the old function also invoked the device's registered callback, in our case zvol_revalidate_disk(). This commit adds a call to zvol_revalidate_disk() in zvol_update_volsize() to make sure the code path stays the same. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Michael D Labriola <michael.d.labriola@gmail.com> Closes #11358	2020-12-23 14:35:47 -08:00
Brian Behlendorf	401ba57ccd	Fix maybe uninitialized variable warning Commit `1c2358c12` restructured this code and introduced a warning about the variable maybe not being initialized. This cannot happen with the updated code but we should initialize the variable anyway to silence the warning. zpl_file.c: In function ‘zpl_iter_write’: zpl_file.c:324:9: warning: ‘count’ may be used uninitialized in this function [-Wmaybe-uninitialized] Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #11373	2020-12-23 14:35:47 -08:00
Brian Behlendorf	188950df9e	Remove iov_iter_advance() from iter_read There's no need to call iov_iter_advance() in zpl_iter_read(). This was preserved from the previous code where it wasn't needed but also didn't cause any problems. Now that the iter functions also handle pipes that's no longer the case. When fully reading a pipe buffer iov_iter_advance() may result in the pipe buf release function being called which will not be registered resulting in a NULL dereference. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #11375 Closes #11378	2020-12-23 14:35:47 -08:00
Brian Behlendorf	58bc86c5cb	Linux 5.10 compat: use iov_iter in uio structure As of the 5.10 kernel the generic splice compatibility code has been removed. All filesystems are now responsible for registering a ->splice_read and ->splice_write callback to support this operation. The good news is the VFS provided generic_file_splice_read() and iter_file_splice_write() callbacks can be used provided the ->iter_read and ->iter_write callbacks support pipes. However, this is currently not the case and only iovecs and bvecs (not pipes) are ever attached to the uio structure. This commit changes that by allowing full iov_iter structures to be attached to uios. Ever since the 4.9 kernel the iov_iter structure has supported iovecs, kvecs, bvecs, and pipes so it's desirable to pass the entire thing when possible. In conjunction with this the uio helper functions (i.e. uiomove(), uiocopy(), etc.) have been updated to understand the new UIO_ITER type. Note that using the kernel provided uio_iter interfaces allowed the existing Linux specific uio handling code to be simplified. When there's no longer a need to support kernels older than 4.9, then it will be possible to remove the iovec and bvec members from the uio structure and always use a uio_iter. Until then we need to maintain all of the existing types for older kernels. Some additional refactoring and cleanup was included in this change: - Added checks to configure to detect available iov_iter interfaces. Some are available all the way back to the 3.10 kernel and are used when available. In particular, uio_prefaultpages() now always uses iov_iter_fault_in_readable() which is available for all supported kernels. - The unused UIO_USERISPACE type has been removed. It is no longer needed now that the uio_seg enum is platform specific. - Moved zfs_uio.c from the zcommon.ko module to the Linux specific platform code for the zfs.ko module. 
This gets it out of libzfs where it was never needed and keeps this Linux specific code out of the common sources. - Removed unnecessary O_APPEND handling from zfs_iter_write(), this is redundant and O_APPEND is already handled in zfs_write(); NOTE: Cleanly applying this kernel compatibility change required applying the following commits. This makes the change larger than it absolutely needs to be, but the resulting code matches what's in the branch branch. This is both more tested and makes it easier to apply any future backports in this area. `7cf4cd824` Remove incorrect assertion `783be694f` Reduce confusion in zfs_write `af5626ac2` Return EFAULT at the end of zfs_write() when set `cc1f85be8` Simplify offset and length limit in zfs_write `9585538d0` Const some unchanging variables in zfs_write `86e74dc16` Remove redundant oid parameter to update_pages `b3d723fb0` Factor uid, gid, and projid out of loop in zfs_write `3d40b6554` Share zfs_fsync, zfs_read, zfs_write, et al between Linux and FreeBSD Reviewed-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #11351	2020-12-23 14:35:39 -08:00
Brian Behlendorf	7cf4cd8246	Remove incorrect assertion Commit `85703f6` added a new ASSERT to zfs_write() as part of the cleanup which isn't correct in the case where multiple processes are concurrently extending a file. The `zp->z_size` is updated atomically while holding a range lock on only a portion of the file. Therefore, it's possible for the file size to increase after a same check is performed earlier in the loop causing this ASSERT to fail. The code itself handles this case correctly so only the invalid ASSERT needs to be removed. Reviewed-by: Brian Atkinson <batkinson@lanl.gov> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #11235	2020-12-23 14:35:00 -08:00
Ryan Moeller	783be694f1	Reduce confusion in zfs_write Is this block when abuf != NULL ever reached? Yes, it is. Add asserts and comments to prove that when we get here, we have a full block write at an aligned offset extending past EOF. Simplify by removing the check that tx_bytes == max_blksz, since we can assert that it is always true. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #11191	2020-12-23 14:35:00 -08:00
Ryan Moeller	af5626ac27	Return EFAULT at the end of zfs_write() when set FreeBSD's VFS expects EFAULT from zfs_write() if we didn't complete the full write so it can retry the operation. Add some missing SET_ERRORs in zfs_write(). Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #11193	2020-12-23 14:35:00 -08:00
Ryan Moeller	cc1f85be8b	Simplify offset and length limit in zfs_write Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Matt Macy <mmacy@FreeBSD.org> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #11176	2020-12-23 14:34:59 -08:00
Ryan Moeller	9585538d0e	Const some unchanging variables in zfs_write Show that these values will not be changing later. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Matt Macy <mmacy@FreeBSD.org> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #11176	2020-12-23 14:34:59 -08:00
Ryan Moeller	86e74dc162	Remove redundant oid parameter to update_pages The oid comes from the znode we are already passing. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Matt Macy <mmacy@FreeBSD.org> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #11176	2020-12-23 14:34:59 -08:00
Ryan Moeller	b3d723fb0e	Factor uid, gid, and projid out of loop in zfs_write Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Matt Macy <mmacy@FreeBSD.org> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #11176	2020-12-23 14:34:59 -08:00
Matthew Macy	3d40b65540	Share zfs_fsync, zfs_read, zfs_write, et al between Linux and FreeBSD The zfs_fsync, zfs_read, and zfs_write function are almost identical between Linux and FreeBSD. With a little refactoring they can be moved to the common code which is what is done by this commit. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Matt Macy <mmacy@FreeBSD.org> Closes #11078	2020-12-23 14:34:59 -08:00
Brian Behlendorf	fa7b558bef	ZTS: Simplify zpool_initialize_verify_initialized Consider the test to be a success as long as the initializing pattern is found at least once per metaslab. This indicates that at least part of the free space was initialized. Ideally we'd check that the pattern was written to all free space but that's much trickier so this check is a reasonable compromise. Using a here-string to feed the loop in this test causes an empty string to still trigger the loop so we miss the `spacemaps=0` case. Pipe into the loop instead. While here, we can use `zpool wait -t initialize $TESTPOOL` to wait for the pool to initialize. Co-authored-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #11365	2020-12-23 14:34:59 -08:00
Matthew Ahrens	a103ae446e	special device removal space accounting fixes The space in special devices is not included in spa_dspace (or dsl_pool_adjustedsize(), or the zfs `available` property). Therefore there is always at least as much free space in the normal class, as there is allocated in the special class(es). And therefore, there is always enough free space to remove a special device. However, the checks for free space when removing special devices did not take this into account. This commit corrects that. Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Don Brady <don.brady@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Matthew Ahrens <mahrens@delphix.com> Closes #11329	2020-12-23 14:34:59 -08:00
sterlingjensen	2ab24dfded	Use the correct return type for getopt Use the correct return type for getopt otherwise clang complains about tautological-constant-out-of-range-compare. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Sterling Jensen <sterlingjensen@users.noreply.github.com> Closes #11359	2020-12-23 14:34:59 -08:00
gregory-lee-bartholomew	489633d99a	DKMS: Disable weak modules Fedora does not guarantee a stable kABI, so weak modules should be disabled. See the dkms man page for a more detailed explanation of the weak module feature. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Gregory Bartholomew <gregory.lee.bartholomew@gmail.com> Closes #9891 Closes #11128 Closes #11242 Closes #11335	2020-12-23 14:34:59 -08:00
Ryan Libby	ee49d9e02b	lua: avoid gcc -Wreturn-local-addr bug Avoid a bug with gcc's -Wreturn-local-addr warning with some obfuscation. In buggy versions of gcc, if a return value is an expression that involves the address of a local variable, and even if that address is legally converted to a non-pointer type, a warning may be emitted and the value of the address may be replaced with zero. However, buggy versions don't emit the warning or replace the value when simply returning a local variable of non-pointer type. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90737 Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Libby <rlibby@FreeBSD.org> Closes #11337	2020-12-23 14:34:59 -08:00
Ryan Libby	1fbda9caee	spa: avoid type narrowing warning Building the spa module for i386 caused gcc to emit -Wint-to-pointer-cast "cast to pointer from integer of different size" because spa.spa_did was uint64_t but pthread_join (via thread_join in spa_deactivate) takes a pointer (32-bit on i386). Define spa_did to be pointer-size instead. For now spa_did is in fact never non-zero and the thread_join could instead be ifdef'd out, but changing the size of spa_did may be more useful for the future. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Libby <rlibby@FreeBSD.org> Closes #11336	2020-12-23 14:34:59 -08:00
Ryan Libby	42bdfd3b36	FreeBSD libzfs: gcc requires __thread after static Building libzfs with gcc on FreeBSD failed because gcc is picky about the order of keywords in declarations with __thread, whereas clang is more relaxed. https://gcc.gnu.org/onlinedocs/gcc/Thread-Local.html Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Ryan Libby <rlibby@FreeBSD.org> Closes #11331	2020-12-23 14:34:59 -08:00
George Amanakis	900480bd96	Fix reporting of CKSUM errors in indirect vdevs When removing and subsequently reattaching a vdev, CKSUM errors may occur as vdev_indirect_read_all() reads from all children of a mirror in case of a resilver. Fix this by checking whether a child is missing the data and setting a flag (ic_error) which is then checked in vdev_indirect_repair() and suppresses incrementing the checksum counter. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: George Amanakis <gamanakis@gmail.com> Closes #11277	2020-12-23 14:34:59 -08:00
Ryan Moeller	058b6fd069	arc_summary3: Handle overflowing value width Some tunables shown by arc_summary3 have string values that may exceed the normal line length, leaving a negative offset between the name and value fields. The negative space is of course not valid and Python rightly barfs up an exception traceback. Handle an overflowing value field width by ignoring the line length and separating the name from the value by a single space instead. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #11270	2020-12-23 14:34:59 -08:00
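The fallback described in the arc_summary3 commit above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code from the commit: `format_tunable` and `LINE_LENGTH` are hypothetical names, and the 72-column width is an assumed value.

```python
LINE_LENGTH = 72  # assumed line width, not taken from arc_summary3


def format_tunable(name, value, line_length=LINE_LENGTH):
    """Right-align value on the line, or fall back to a single space."""
    value = str(value)
    padding = line_length - len(name) - len(value)
    if padding < 1:
        # The value overflows the line: ignore the line length and
        # separate the name from the value by one space instead.
        return f"{name} {value}"
    return name + " " * padding + value
```

A long string value then degrades to `name value` on one over-long line rather than producing a negative field width.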
Ryan Moeller	8847b06bf6	FreeBSD: Implement sysctl for fletcher4 impl There is a tunable to select the fletcher 4 checksum implementation on Linux but it was not present in FreeBSD. Implement the sysctl handler for FreeBSD and use ZFS_MODULE_PARAM_CALL to provide the tunable on both platforms. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #11270	2020-12-23 14:34:59 -08:00
Paul Dagnelie	21adfb031c	Fix kernel panic induced by redacted send In the redaction list traversal code, there is a bug in the binary search logic when looking for the resume point. Maxbufid can be decremented to -1, causing us to read the last possible block of the object instead of the one we wanted. This can cause incorrect resume behavior, or possibly even a hang in some cases. In addition, when examining non-last blocks, we can treat the block as being the same size as the last block, causing us to miss entries in the redaction list when determining where to resume. Finally, we were ignoring the case where the resume point was found in the buffer being searched, and resuming from minbufid. All these issues have been corrected, and the code has been significantly simplified to make future issues less likely. Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com> Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Signed-off-by: Paul Dagnelie <pcd@delphix.com> Closes #11297	2020-12-23 14:34:59 -08:00
Ryan Moeller	ae2cfdf8a7	FreeBSD: Fix format of vfs.zfs.arc_no_grow_shift vfs.zfs.arc_no_grow_shift has an invalid type (15) and this causes py-sysctl to format it as a bytearray when it should be an integer. "U" is not a valid format, it should be "I" and the type should match the variable type, int. We can return EINVAL if the value is set below zero. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #11318	2020-12-23 14:34:59 -08:00
Ryan Moeller	20e4513c56	FreeBSD: Update usage of py-sysctl py-sysctl now includes the CTLTYPE_NODE type nodes in the list returned by sysctl.filter() on FreeBSD head. It also provides descriptions now. Eliminate the subprocess call to get descriptions, and filter out the nodes so we only deal with values. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #11318	2020-12-23 14:34:59 -08:00
Brian Behlendorf	f217a2b902	Fix possibly uninitialized 'root_inode' variable warning Resolve an uninitialized variable warning when compiling. In function ‘zfs_domount’: warning: ‘root_inode’ may be used uninitialized in this function [-Wmaybe-uninitialized] sb->s_root = d_make_root(root_inode); Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #11306	2020-12-23 14:34:59 -08:00
George Melikov	aa5b9e1d7c	CI: add zloop workflow Run ztest via zloop for 20 minutes, total run time is ~30 minutes. Signed-off-by: George Melikov <mail@gmelikov.ru>	2020-12-23 14:34:59 -08:00
Ryan Moeller	fb3ad5d24e	FreeBSD: Do zcommon_init sooner to avoid FPU panic There has been a panic affecting some system configurations where the thread FPU context is disturbed during the fletcher 4 benchmarks, leading to a panic at boot. module_init() registers zcommon_init to run in the last subsystem (SI_SUB_LAST). Running it as soon as interrupts have been configured (SI_SUB_INT_CONFIG_HOOKS) makes sure we have finished the benchmarks before we start doing other things. While it's not clear how the FPU context was being disturbed, this does seem to avoid it. Add a module_init_early() macro to run zcommon_init() at this earlier point on FreeBSD. On Linux this is defined as module_init(). Authored by: Konstantin Belousov <kib@FreeBSD.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #11302	2020-12-23 14:34:59 -08:00
Érico Nogueira Rolim	de2ac3f700	mount_zfs: print strerror instead of errno for error reporting Tracking down an error message with the errno value can be difficult, using strerror makes the error message clearer. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Érico Rolim <erico.erc@gmail.com> Closes #11303	2020-12-23 14:34:59 -08:00
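The difference between reporting a raw errno value and its strerror text can be shown with a small Python analogue. mount_zfs itself is written in C, so this is only a sketch: `mount_error_message` and the message wording are hypothetical and are not taken from the commit.

```python
import errno
import os


def mount_error_message(dataset, err):
    """Build an error message using the errno description, not its number."""
    # os.strerror() wraps the C library's strerror(); the exact text is
    # platform dependent.
    return f"filesystem '{dataset}' can not be mounted: {os.strerror(err)}"


print(mount_error_message("tank/fs", errno.ENOENT))
```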
sterlingjensen	f9688b21d7	Drop path prefix workaround Canonicalization, the source of the trouble, was disabled in `9000a9f`. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Sterling Jensen <sterlingjensen@users.noreply.github.com> Closes #11295	2020-12-23 14:34:59 -08:00
Orivej Desh	fad85e52e5	Delete rw_semaphore.wait_lock configure check Last use of wait_lock was removed in "Linux 5.3 compat: retire rw_tryupgrade()" (`e7a99dab2b`). Fixes the issue reported in https://github.com/openzfs/zfs/issues/11097#issuecomment-714532367 Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Orivej Desh <orivej@gmx.fr> Closes #11309	2020-12-23 14:34:59 -08:00
Brian Behlendorf	038aaec1cd	Fix optional "force" arg handling in zfs_ioc_pool_sync() The fnvlist_lookup_boolean_value() function should not be used to check the force argument since it's optional. It may not be provided or may have been created with the wrong flags. Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #11281 Closes #11284	2020-12-23 14:34:59 -08:00
George Melikov	94ca328fb3	CI: add new zfs-tests-sanity workflow Run zfs-tests with sanity.run for brief results. Timeouts are rare, so minimize false positives by increasing the default from 60 to 180 seconds. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: George Melikov <mail@gmelikov.ru> Closes #11304	2020-12-23 14:34:59 -08:00
George Melikov	65c4c9a233	ZTS: zpool_trim tests throttle trim process Otherwise trim may finish before progress checks. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: George Melikov <mail@gmelikov.ru> Closes #11296	2020-12-23 14:34:59 -08:00
Brian Behlendorf	07ca433973	Reduce fletcher4 and raidz benchmark times During module load time all of the available fletcher4 and raidz implementations are benchmarked for a fixed amount of time to determine the fastest available. Manual testing has shown that this time can be significantly reduced with negligible effect on the final results. This commit changes the benchmark time to 1ms which can reduce the module load time by over a second on x86_64. On an x86_64 system with sse3, ssse3, and avx2 instructions the benchmark times are: Fletcher4 603ms -> 15ms RAIDZ 1,322ms -> 64ms Reviewed-by: Matthew Macy <mmacy@freebsd.org> Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #11282	2020-12-23 14:34:59 -08:00
Brian Behlendorf	f21d1f8fad	ZTS: adjust zpool_import_012_pos timeout When running in the CI the zpool_import_012_pos test case occasionally takes longer than the maximum 600 seconds. When this happens the test case is considered to have failed but always completes a few minutes later. Since the logs suggest nothing has actually failed this commit increases the timeout and removes the exception. Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #11286	2020-12-23 14:34:59 -08:00
Brian Behlendorf	ee8794195b	ZTS: Update zfs_share_concurrent_shares.ksh Occasionally an out of memory error is hit by this test case when mounting the filesystems. Try and reduce the likelihood of this occurring by reducing the thread count from 100 to 50. It also has the advantage of slightly speeding up the test. cannot mount 'testpool/testfs3/79': Cannot allocate memory filesystem successfully created, but not mounted Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #11283	2020-12-23 14:34:59 -08:00
Brian Behlendorf	0ef4def852	Add sanity.run file This run file contains a subset of functional tests which exercise as much functionality as possible while still executing relatively quickly. The included tests should take no more than a few seconds each to run at most. This provides a convenient way to sanity test a change before committing to a full test run which takes several hours. $ ./scripts/zfs-tests.sh -r sanity ... Results Summary PASS 813 Running Time: 00:14:42 Percent passed: 100.0% Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #11271	2020-12-23 14:34:51 -08:00
melak	aa51adf0a2	Fix trivial typo in zfs-diff.8 Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tamas TEVESZ <ice@extreme.hu> Closes #11268 Closes #11272	2020-12-23 13:09:32 -08:00
Alexander Motin	1d02bdee6c	Fix for "Reduce latency effects of non-interactive I/O" It was found that setting min_active tunables for non-interactive I/Os to zero makes them stuck. It is caused by zfs_vdev_nia_delay, that can never be reached if we never issue any I/Os due to min_active set to zero. Fix this by issuing at least one non-interactive I/O at a time when there are no interactive I/Os. When there are interactive I/Os, zero min_active allows completely blocking any non-interactive I/O. It may cause starvation in some scenarios, but who are we to deny foot shooting? Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Closes #11261	2020-12-23 13:09:17 -08:00
Alexander Motin	2080c4f27e	Reduce latency effects of non-interactive I/O Investigating influence of scrub (especially sequential) on random read latency I've noticed that on some HDDs single 4KB read may take up to 4 seconds! Deeper investigation shown that many HDDs heavily prioritize sequential reads even when those are submitted with queue depth of 1. This patch addresses the latency from two sides: - by using _min_active queue depths for non-interactive requests while the interactive request(s) are active and few requests after; - by throttling it further if no interactive requests has completed while configured amount of non-interactive did. While there, I've also modified vdev_queue_class_to_issue() to give more chances to schedule at least _min_active requests to the lowest priorities. It should reduce starvation if several non-interactive processes are running same time with some interactive and I think should make possible setting of zfs_vdev_max_active to as low as 1. I've benchmarked this change with 4KB random reads from ZVOL with 16KB block size on newly written non-fragmented pool. On fragmented pool I also saw improvements, but not so dramatic. Below are log2 histograms of the random read latency in milliseconds for different devices: 4 2x mirror vdevs of SATA HDD WDC WD20EFRX-68EUZN0 before: 0, 0, 2, 1, 12, 21, 19, 18, 10, 15, 17, 21 after: 0, 0, 0, 24, 101, 195, 419, 250, 47, 4, 0, 0 , that means maximum latency reduction from 2s to 500ms. 4 2x mirror vdevs of SATA HDD WDC WD80EFZX-68UW8N0 before: 0, 0, 2, 31, 38, 28, 18, 12, 17, 20, 24, 10, 3 after: 0, 0, 55, 247, 455, 470, 412, 181, 36, 0, 0, 0, 0 , i.e. from 4s to 250ms. 1 SAS HDD SEAGATE ST14000NM0048 before: 0, 0, 29, 70, 107, 45, 27, 1, 0, 0, 1, 4, 19 after: 1, 29, 681, 1261, 676, 1633, 67, 1, 0, 0, 0, 0, 0 , i.e. from 4s to 125ms. 
1 SAS SSD SEAGATE XS3840TE70014 before (microseconds): 0, 0, 0, 0, 0, 0, 0, 0, 70, 18343, 82548, 618 after: 0, 0, 0, 0, 0, 0, 0, 0, 283, 92351, 34844, 90 I've also measured scrub time during the test and on idle pools. On idle fragmented pool I've measured scrub getting few percent faster due to use of QD3 instead of QD2 before. On idle non-fragmented pool I've measured no difference. On busy non-fragmented pool I've measured scrub time increase about 1.5-1.7x, while IOPS increase reached 5-9x. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored-By: iXsystems, Inc. Closes #11166	2020-12-23 13:09:03 -08:00
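The HDD summaries quoted in the commit above ("from 2s to 500ms", "from 4s to 250ms", "from 4s to 125ms") can be reproduced from the histograms if bucket i is read as holding samples of roughly 2**i milliseconds. That reading is an assumption: it matches the quoted figures but is not stated in the commit. `max_latency_ms` is a hypothetical helper name.

```python
def max_latency_ms(histogram):
    """Approximate worst-case latency from a log2 histogram.

    Assumes bucket i counts samples of roughly 2**i milliseconds.
    """
    highest = max(i for i, count in enumerate(histogram) if count)
    return 2 ** highest


# Histograms copied from the commit message (WDC WD20EFRX-68EUZN0).
wd20_before = [0, 0, 2, 1, 12, 21, 19, 18, 10, 15, 17, 21]
wd20_after = [0, 0, 0, 24, 101, 195, 419, 250, 47, 4, 0, 0]

print(max_latency_ms(wd20_before), "ms ->", max_latency_ms(wd20_after), "ms")
```

Under the same assumption the WD80EFZX-68UW8N0 and ST14000NM0048 histograms give 4096 ms before and 256 ms and 128 ms after, which lines up with the rounded values in the message.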
qzdanis	49ba502f99	Add compatibility for busybox mktemp Busybox's mktemp requires at least six X's in the template, causing the current sed --in-place check to fail because the file does not exist. This change adds additional X's to mktemp templates that do not already have at least six X's in them. Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Quentin Zdanis <zdanisq@gmail.com> Closes #11269	2020-12-23 13:08:30 -08:00
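The template change in the commit above amounts to padding short templates with extra X's. The actual fix edits shell scripts; the sketch below only illustrates the rule in Python. `pad_mktemp_template` is a hypothetical helper, and it assumes the X's that matter are the trailing ones.

```python
def pad_mktemp_template(template, minimum=6):
    """Ensure a mktemp template ends in at least `minimum` X characters."""
    trailing = len(template) - len(template.rstrip("X"))
    if trailing < minimum:
        template += "X" * (minimum - trailing)
    return template


print(pad_mktemp_template("/tmp/zfs.XXX"))  # /tmp/zfs.XXXXXX
```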
Ryan Moeller	7735c9addf	FreeBSD: notify userspace when a vdev is removed This is needed for zfsd to autoreplace vdevs. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #11260	2020-12-23 13:08:12 -08:00
Andrew Sun	ed02d603a1	Make zpool status "remove:" label print in bold When ZFS_COLOR is set, zpool status shows row headings in bold, except for the "remove:" heading. This is a quick fix that makes it print in bold too. Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Andrew Sun <me@andrewsun.com> Closes #11255	2020-12-23 13:07:27 -08:00
George Melikov	529469769f	CI: simplify checkstyle runner Remove excess steps. Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: George Melikov <mail@gmelikov.ru> Closes #11262	2020-12-23 13:07:14 -08:00
Prawn	3b854534f0	ZED/zfs-list-cacher.sh: don't exit on ignored event type Check for the history_event type instead. The zfs-list-cacher.sh script currently respects the event types excluded from syslog(!) in ZED_SYSLOG_SUBCLASS_EXCLUDE. This makes little sense in this single-purpose script and silently breaks when history_events are excluded from syslog, which is the default since `13d65987a9`. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: InsanePrawn <insane.prawny@gmail.com> Closes #11164 Closes #11347	2020-12-19 18:01:21 -08:00
Brian Behlendorf	dcbf847493	Tag 2.0.0 Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> zfs-2.0.0	2020-11-30 10:13:14 -08:00
Brian Behlendorf	2757204434	Verify zfs module loaded before starting services Extend the change made in `ae12b02` to verify the zfs kernel modules are loaded to the rest of the OpenZFS services. If the modules aren't loaded then none of the share, volume, or zed services can be started. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #11243	2020-11-30 09:44:08 -08:00
Đoàn Trần Công Danh	24a6f83847	dracut: use /bin/sh instead of bash as the interpreter Although dracut has a hard dependency on bash, its modules don't; dracut only has a hard dependency on bash for module-setup (on a fully usable machine). Inside initramfs, dracut allows users to choose from a handful of other shells, e.g. bash, busybox, dash, mksh. In fact, my local machine's initramfs is being built with dash, and it has been functional for a very long time. Before `64025fa3a` (Silence 'make checkbashisms', 2020-08-20), we also allowed our users to have that right, too. Let's fix the problem 'make checkbashisms' reported and allow our users to have that right, again. For 'plymouth' case, let's simply run the command inside the if instead of checking for the existence of command before running it, because the status is also failure if plymouth is unavailable. While we're at it, let's remove an unnecessary fork for grep in zfs-generator.sh.in and replace its following complicated 'if elif fi' with a simple 'case ... esac'. To support this change, also exclude 90zfs from "make checkbashisms" because the current CI infrastructure ships an old version of "checkbashisms", which complains about "command -v", while the current latest "checkbashisms" thinks it's fine. In the near future, we can revert that change to "Makefile.am" when CI infrastructure is updated. Reviewed-by: Gabriel A. Devenyi <gdevenyi@gmail.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Đoàn Trần Công Danh <congdanhqx@gmail.com> Closes #11244	2020-11-30 09:44:02 -08:00
Brian Behlendorf	2c36eb763f	Revert "Reduce latency effects of non-interactive I/O" Under certain conditions commit `a3a4b8def` appears to result in a hang, or poor performance, when importing a pool. Until the root cause can be identified it has been reverted from the release branch. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #11245	2020-11-30 09:43:09 -08:00

6437 Commits