mirror_zfs

mirror of https://git.proxmox.com/git/mirror_zfs.git synced 2026-04-17 08:54:52 +03:00

Author	SHA1	Message	Date
Tony Hutter	11fbcacf37	zed: Add zedlet to power off slot when drive is faulted If ZED_POWER_OFF_ENCLOUSRE_SLOT_ON_FAULT is enabled in zed.rc, then power off the drive's slot in the enclosure if it becomes FAULTED. This can help silence misbehaving drives. This assumes your drive enclosure fully supports slot power control via sysfs. Reviewed-by: @AllKind Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tony Hutter <hutter2@llnl.gov> Closes #15200	2023-08-24 11:59:03 -07:00
Rob N	cae502c175	copy_file_range: fix fallback when source create on same txg In `019dea0a5` we removed the conversion from EAGAIN->EXDEV inside zfs_clone_range(), but forgot to add a test for EAGAIN to the copy_file_range() entry points to trigger fallback to a content copy. This commit fixes that. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Kay Pedersen <mail@mkwg.de> Signed-off-by: Rob Norris <robn@despairlabs.com> Closes #15170 Closes #15172	2023-08-14 17:34:14 -07:00
Alexander Motin	8e20e0ff39	ZIL: Replay blocks without next block pointer. If we get next block allocation error during log write, we trigger transaction commit. But the block we have just completed is still written and transactions it covers will be acknowledged normally. If after that we ignore the block during replay just because it is the last in the chain, we may not replay some transactions that we have acknowledged as synced, that is not right. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Closes #15132	2023-08-11 09:04:44 -07:00
Alexander Motin	bdb7df4245	ZIL: Avoid dbuf_read() before dmu_sync(). In most cases dmu_sync() works with dirty records directly and does not need actual data. The only exception is dmu_sync_late_arrival(). To save some CPU time use dmu_buf_hold_noread() in z_get_data() and explicitly call dbuf_read() in dmu_sync_late_arrival(). There is also a chance that by that time TXG will already be synced and we won't have to do it at all. Reviewed-by: Brian Atkinson <batkinson@lanl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Closes #15153	2023-08-11 09:04:08 -07:00
Coleman Kane	8ce2eba9e6	Linux 6.5 compat: Use copy_splice_read instead of filemap_splice_read Using the filemap_splice_read function for the splice_read handler was leading to occasional data corruption under certain circumstances. Favor using copy_splice_read instead, which does not demonstrate the same erroneous behavior under the tested failure cases. Reviewed-by: Brian Atkinson <batkinson@lanl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Coleman Kane <ckane@colemankane.org> Closes #15164	2023-08-08 15:42:32 -07:00
Umer Saleem	8150090257	Move zinject from openzfs-zfs-test to openzfs-zfsutils For Native Debian packaging, zinject binary and man page is packaged in ZFS test package. zinject is not not directly related to ZTS and should be packaged with other utilities, like it is present in zfs_<ver>.rpm/deb packages. This commit moves zinject binary and man page from openzfs-zfs-test to openzfs-zfsutils package. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ameer Hamza <ahamza@ixsystems.com> Signed-off-by: Umer Saleem <usaleem@ixsystems.com> Closes #15160	2023-08-08 09:40:36 -07:00
Rafael Kitover	b8c9070d09	dracut: support mountpoint=legacy for root dataset Support mountpoint=legacy for the root dataset in the dracut zfs support scripts. mountpoint=/ or mountpoint=/sysroot also works. Change zfs-env-bootfs.service to add zfsutil to BOOTFSFLAGS only for root datasets with mountpoint != legacy. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> Signed-off-by: Rafael Kitover <rkitover@gmail.com> Closes #15149	2023-08-08 09:38:34 -07:00
oromenahar	019dea0a55	zfs_clone_range should return a descriptive error codes Return the more descriptive error codes instead of `EXDEV` when the parameters don't match the requirements of the clone function. Updated the comments in `brt.c` accordingly. The first three errors are just invalid parameters, which zfs can not handle. The fourth error indicates that the block which should be cloned is created and cloned or modified in the same transaction group (`txg`). Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Rob Norris <rob.norris@klarasystems.com> Signed-off-by: Kay Pedersen <mail@mkwg.de> Closes #15148	2023-08-08 09:37:06 -07:00
наб	683edb32b7	libzfs: sendrecv: send_progress_thread: handle SIGINFO/SIGUSR1 POSIX timers target the process, not the thread (as does SIGINFO), so we need to block it in the main thread which will die if interrupted. Ref: https://101010.pl/@ed1conf@bsd.network/110731819189629373 Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Jorgen Lundman <lundman@lundman.net> Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> Closes #15113	2023-08-08 09:35:35 -07:00
Coleman Kane	36261c8238	Linux 6.5 compat: replace generic_file_splice_read with filemap_splice_read The generic_file_splice_read function was removed in Linux 6.5 in favor of filemap_splice_read. Add an autoconf test for filemap_splice_read and use it if it is found as the handler for .splice_read in the file_operations struct. Additionally, ITER_PIPE was removed in 6.5. This change removes the ITER_* macros that OpenZFS doesn't use from being tested in config/kernel-vfs-iov_iter.m4. The removal of ITER_PIPE was causing the test to fail, which also affected the code responsible for setting the .splice_read handler, above. That behavior caused run-time panics on Linux 6.5. Reviewed-by: Brian Atkinson <batkinson@lanl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Coleman Kane <ckane@colemankane.org> Closes #15155	2023-08-07 15:47:46 -07:00
Ryan Lahfa	fdb8fff916	linux/spl/kmem_cache: undefine `kmem_cache_alloc` before defining it When compiling a kernel with bcachefs and zfs, the two macros will collide, making it impossible to have both filesystems. It is sufficient to just undefine the macro before calling it. On why this should be in ZFS rather than bcachefs, currently, bcachefs is not a in-tree filesystem, but, it has a reasonably high chance of getting included soon. This avoids the breakage in ZFS early, this patch may be distributed downstream in NixOS and is already used there. Reviewed-by: Brian Atkinson <batkinson@lanl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Lahfa <ryan@lahfa.xyz> Closes #15144	2023-08-07 13:55:59 -07:00
Alexander Motin	6c94e64963	Refactor dmu_prefetch(). - Split dmu_prefetch_dnode() from dmu_prefetch() into a separate function. It is quite inconvenient to read the code where len = 0 means dnode prefetch instead indirect/data prefetch. One function doing both has no benefits, since the code paths are independent. - Improve dmu_prefetch() handling of long block ranges. Instead of limiting L0 data length to prefetch for to dmu_prefetch_max, make dmu_prefetch_max limit the actual amount of prefetch at the specified level, and, if there is more, prefetch all the rest at higher indirection level. It should improve random access times within the prefetched range of any length, reducing importance of specific dmu_prefetch_max value. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Closes #15076	2023-08-07 13:54:41 -07:00
Mateusz Piotrowski	a97b8fc2dd	Fix some typos Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Mateusz Piotrowski <0mp@FreeBSD.org> Closes #15141	2023-08-07 13:53:59 -07:00
Coleman Kane	e47e9bbe86	Linux 6.5 compat: register_sysctl_table removed Additionally, the .child element of ctl_table has been removed in 6.5. This change adds a new test for the pre-6.5 register_sysctl_table() function, and uses the old code in that case. If it isn't found, then the parentage entries in the tables are removed, and the register_sysctl call is provided the paths of "kernel/spl", "kernel/spl/kmem", and "kernel/spl/kstat" directly, to populate each subdirectory over three calls, as is the new API. Reviewed-by: Brian Atkinson <batkinson@lanl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Coleman Kane <ckane@colemankane.org> Closes #15138	2023-08-02 14:05:46 -07:00
Brian Atkinson	a5fdba1185	Revert "Linux 6.5 compat: register_sysctl_table removed" This reverts commit `b35374fd64` as there are error messages when loading the SPL module. Errors seemed to be tied to duplicate a duplicate entry. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Brian Atkinson <batkinson@lanl.gov> Closes #15134	2023-08-01 14:48:19 -07:00
Serapheim Dimitropoulos	12373b0cc7	zpool_vdev_remove() should handle EALREADY error return When the vdev properties features was merged an extra check was added in `spa_vdev_remove_top_check()` which checked whether the vdev that we want to remove is already being removed and if so return an EALREADY error. ``` static int spa_vdev_remove_top_check(vdev_t vd) { ... <snip> ... / * This device is already being removed / if (vd->vdev_removing) return (SET_ERROR(EALREADY)); ``` Before that change we'd still fail with an error but it was a more generic one - here is the check that failed later in the same function: ``` / * There can not be a removal in progress. */ if (spa->spa_removing_phys.sr_state == DSS_SCANNING) return (SET_ERROR(EBUSY)); ``` Changing the error code returned from that function changed the behavior of the removal's library interface exposed to the userland - `spa_vdev_remove()` now returns `EZFS_UNKNOWN` instead of `EZFS_EBUSY` that was returning before. This patch adds logic to make `spa_vdev_remove()` mindful of the new EALREADY code and propagating `EZFS_EBUSY` reverting to the previously established semantics of that function. Reviewed-by: Mark Maybee <mark.maybee@delphix.com> Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Signed-off-by: Serapheim Dimitropoulos <serapheim@delphix.com> Closes #15013 Closes #15129	2023-08-01 14:47:00 -07:00
Rob N	ead3eea3e0	linux/copy_file_range: properly request a fallback copy on Linux <5.3 Before Linux 5.3, the filesystem's copy_file_range handler had to signal back to the kernel that we can't fulfill the request and it should fallback to a content copy. This is done by returning -EOPNOTSUPP. This commit converts the EXDEV return from zfs_clone_range to EOPNOTSUPP, to force the kernel to fallback for all the valid reasons it might be unable to clone. Without it the copy_file_range() syscall will return EXDEV to userspace, breaking its semantics. Add test for copy_file_range fallbacks. copy_file_range should always fallback to a content copy whenever ZFS can't service the request with cloning. Reviewed-by: Brian Atkinson <batkinson@lanl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Kay Pedersen <mail@mkwg.de> Signed-off-by: Rob Norris <robn@despairlabs.com> Closes #15131	2023-08-01 11:31:11 -07:00
Zach Dykstra	fcd61d937f	readmmap.c: fix building with MUSL libc glibc includes sys/types.h from stdlib.h. This is not the case for MUSL, so explicitly include it. Fixes usage of uint_t. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Zach Dykstra <dykstra.zachary@gmail.com> Closes #15130	2023-08-01 09:01:32 -07:00
Rob N	114a39964f	zdb: include cloned blocks in block statistics This gives `zdb -b` support for clone blocks. Previously, it didn't know what clones were, so would count their space allocation multiple times and then report leaked space (or, in debug, would assert trying to claim blocks a second time). This commit fixes those bugs, and reports the number of clones and the space "used" (saved) by them. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Kay Pedersen <mail@mkwg.de> Signed-off-by: Rob Norris <rob.norris@klarasystems.com> Sponsored-By: OpenDrives Inc. Sponsored-By: Klara Inc. Closes #15123	2023-08-01 08:56:30 -07:00
наб	a21ca18d4d	linux: zfs: ctldir: set [amc]time to snapshot's creation property If looking up a snapdir inode failed, hold pool config – hold the snapshot – get its creation property – release it – release it, then use that as the [amc]time in the allocated inode. If that fails then fall back to current time. No performance impact since this is only done when allocating a new snapdir inode. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> Closes #15110 Closes #15117	2023-08-01 08:50:17 -07:00
Coleman Kane	6751634d77	Linux 4.20 compat: wrapper function for iov_iter type access An iov_iter_type() function to access the "type" member of the struct iov_iter was added at one point. Move the conditional logic to decide which method to use for accessing it into a macro and simplify the zpl_uio_init code. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Brian Atkinson <batkinson@lanl.gov> Signed-off-by: Coleman Kane <ckane@colemankane.org> Closes #15100	2023-08-01 08:42:33 -07:00
Coleman Kane	325505e5c4	Linux 6.4 compat: iter_iov() function now used to get old iov member The iov_iter->iov member is now iov_iter->__iov and must be accessed via the accessor function iter_iov(). Create a wrapper that is conditionally compiled to use the access method appropriate for the target kernel version. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Brian Atkinson <batkinson@lanl.gov> Signed-off-by: Coleman Kane <ckane@colemankane.org> Closes #15100	2023-08-01 08:42:26 -07:00
Coleman Kane	43e8f6e37f	Linux 6.5 compat: blkdev changes Multiple changes to the blkdev API were introduced in Linux 6.5. This includes passing (void* holder) to blkdev_put, adding a new blk_holder_ops* arg to blkdev_get_by_path, adding a new blk_mode_t type that replaces uses of fmode_t, and removing an argument from the release handler on block_device_operations that we weren't using. The open function definition has also changed to take gendisk* and blk_mode_t, so update it accordingly, too. Implement local wrappers for blkdev_get_by_path() and vdev_blkdev_put() so that the in-line calls are cleaner, and place the conditionally-compiled implementation details inside of both of these local wrappers. Both calls are exclusively used within vdev_disk.c, at this time. Add blk_mode_is_open_write() to test FMODE_WRITE / BLK_OPEN_WRITE The wrapper function is now used for testing using the appropriate method for the kernel, whether the open mode is writable or not. Emphasize fmode_t arg in zvol_release is not used Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Coleman Kane <ckane@colemankane.org> Closes #15099	2023-08-01 08:37:20 -07:00
Coleman Kane	3b8e318b77	Linux 6.5 compat: use disk_check_media_change when it exists When disk_check_media_change() exists, then define zfs_check_media_change() to simply call disk_check_media_change() on the bd_disk member of its argument. Since disk_check_media_change() is newer than when revalidate_disk was present in bops, we should be able to safely do this via a macro, instead of recreating a new implementation of the inline function that forces revalidation. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Brian Atkinson <batkinson@lanl.gov> Signed-off-by: Coleman Kane <ckane@colemankane.org> Closes #15101	2023-08-01 08:32:38 -07:00
Coleman Kane	b35374fd64	Linux 6.5 compat: register_sysctl_table removed Additionally, the .child element of ctl_table has been removed in 6.5. This change adds a new test for the pre-6.5 register_sysctl_table() function, and uses the old code in that case. If it isn't found, then the parentage entries in the tables are removed, and the register_sysctl call is provided the paths of "kernel/spl", "kernel/spl/kmem", and "kernel/spl/kstat" directly, to populate each subdirectory over three calls, as is the new API. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Coleman Kane <ckane@colemankane.org> Closes #15098	2023-08-01 08:27:58 -07:00
oromenahar	e9c59310f7	Check the return value in clonefile test Reviewed-by: Brian Atkinson <batkinson@lanl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Rob Norris <rob.norris@klarasystems.com> Signed-off-by: Kay Pedersen <mail@mkwg.de> Closes #15128	2023-08-01 08:26:12 -07:00
Alexander Motin	b22bab2547	Remove fastwrite mechanism. Fastwrite was introduced many years ago to improve ZIL writes spread between multiple top-level vdevs by tracking number of allocated but not written blocks and choosing vdev with smaller count. It suposed to reduce ZIL knowledge about allocation, but actually made ZIL to even more actively report allocation code about the allocations, complicating both ZIL and metaslabs code. On top of that, it seems ZIO_FLAG_FASTWRITE setting in dmu_sync() was lost many years ago, that was one of the declared benefits. Plus introduction of embedded log metaslab class solved another problem with allocation rotor accounting both normal and log allocations, since in most cases those are now in different metaslab classes. After all that, I'd prefer to simplify already too complicated ZIL, ZIO and metaslab code if the benefit of complexity is not obvious. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: George Wilson <george.wilson@delphix.com> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Closes #15107	2023-07-28 13:30:33 -07:00
oromenahar	5bdfff5cfc	BRT should return EOPNOTSUPP Return the more descriptive EOPNOTSUPP instead of EXDEV when the storage pool doesn't support block cloning. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Rob Norris <rob.norris@klarasystems.com> Signed-off-by: Kay Pedersen <mail@mkwg.de> Closes #15097	2023-07-27 11:32:34 -07:00
Alexander Motin	704c80f048	Avoid waiting in dmu_sync_late_arrival(). The transaction there does not produce any dirty data or log blocks, so it should not be throttled. All other cases wait for TXG sync, by which time the log block we are writing will be obsolete, so we can skip waiting and just return error here instead. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Closes #15096	2023-07-27 09:07:09 -07:00
Brian Behlendorf	782312c612	zed: Reduce log noise for large JBODs For large JBODs the log message "zfs_iter_vdev: no match" can account for the bulk of the log messages (over 70%). Since this message is purely informational and not that useful we remove it. Reviewed-by: Olaf Faaland <faaland1@llnl.gov> Reviewed-by: Brian Atkinson <batkinson@lanl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #15086 Closes #15094	2023-07-25 13:55:29 -07:00
Alexander Motin	2848de11e5	Remove zl_issuer_lock from zil_suspend(). This locking was recently added as part of #14979. But appears it is illegal to take zl_issuer_lock while holding dp_config_rwlock, taken by dsl_pool_hold(). It causes deadlock with sync thread in spa_sync_upgrades(). On a second thought, we should not need this locking, since zil_commit_impl() we call below takes zl_issuer_lock, that should sufficiently protect zl_suspend reads, combined with other logic from #14979. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Closes #15103	2023-07-25 09:08:36 -07:00
Rob Norris	48d0e9465d	zts: block cloning tests Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Kay Pedersen <mail@mkwg.de> Signed-off-by: Rob Norris <rob.norris@klarasystems.com> Sponsored-By: OpenDrives Inc. Sponsored-By: Klara Inc. Closes #15050 Closes #405 Closes #13349	2023-07-24 16:37:58 -07:00
Rob Norris	6b0a4be5fe	linux: implement filesystem-side copy/clone functions for EL7 Redhat have backported copy_file_range and clone_file_range to the EL7 kernel using an "extended file operations" wrapper structure. This connects all that up to let cloning work there too. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Kay Pedersen <mail@mkwg.de> Signed-off-by: Rob Norris <rob.norris@klarasystems.com> Sponsored-By: OpenDrives Inc. Sponsored-By: Klara Inc. Closes #15050	2023-07-24 16:37:04 -07:00
Rob Norris	9927f219f1	linux: implement filesystem-side clone ioctls Prior to Linux 4.5, the FICLONE etc ioctls were specific to BTRFS, and were implemented as regular filesystem-specific ioctls. This implements those ioctls directly in OpenZFS, allowing cloning to work on older kernels. There's no need to gate these behind version checks; on later kernels Linux will simply never deliver these ioctls, instead calling the approprate VFS op. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Kay Pedersen <mail@mkwg.de> Signed-off-by: Rob Norris <rob.norris@klarasystems.com> Sponsored-By: OpenDrives Inc. Sponsored-By: Klara Inc. Closes #15050	2023-07-24 16:36:54 -07:00
Rob Norris	5a35c68b67	linux: implement filesystem-side copy/clone functions This implements the Linux VFS ops required to service the file copy/clone APIs: .copy_file_range (4.5+) .clone_file_range (4.5-4.19) .dedupe_file_range (4.5-4.19) .remap_file_range (4.20+) Note that dedupe_file_range() and remap_file_range(REMAP_FILE_DEDUP) are hooked up here, but are not implemented yet. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Kay Pedersen <mail@mkwg.de> Signed-off-by: Rob Norris <rob.norris@klarasystems.com> Sponsored-By: OpenDrives Inc. Sponsored-By: Klara Inc. Closes #15050	2023-07-24 16:36:38 -07:00
Rob Norris	f6facd2429	dbuf_sync_leaf: check DB_READ in state assertions Block cloning introduced a new state transition from DB_NOFILL to DB_READ. This occurs when a block is cloned and then read on the current txg. In this case, the clone will move the dbuf to DB_NOFILL, and then the read will be issued for the overidden block pointer. If that read is still outstanding when it comes time to write, the dbuf will be in DB_READ, which is not handled by the checks in dbuf_sync_leaf, thus tripping the assertions. This updates those checks to allow DB_READ as a valid state iff the dirty record is for a BRT write and there is a override block pointer. This is a safe situation because the block already exists, so there's nothing that could change from underneath the read. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Kay Pedersen <mail@mkwg.de> Signed-off-by: Rob Norris <rob.norris@klarasystems.com> Original-patch-by: Kay Pedersen <mail@mkwg.de> Sponsored-By: OpenDrives Inc. Sponsored-By: Klara Inc. Closes #15050	2023-07-24 16:36:17 -07:00
Rob Norris	d4edecd1a2	dmu_buf_will_clone: only check that current txg is clean dbuf_undirty() will (correctly) only removed dirty records for the given (open) txg. If there is a dirty record for an earlier closed txg that has not been synced out yet, then db_dirty_records will still have entries on it, tripping the assertion. Instead, change the assertion to only consider the current txg. To some extent this is redundant, as its really just saying "did dbuf_undirty() work?", but it it doesn't hurt and accurately expresses our expectations. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Kay Pedersen <mail@mkwg.de> Signed-off-by: Rob Norris <rob.norris@klarasystems.com> Original-patch-by: Kay Pedersen <mail@mkwg.de> Sponsored-By: OpenDrives Inc. Sponsored-By: Klara Inc. Closes #15050	2023-07-24 16:35:56 -07:00
Rob Norris	87a6e135c5	brt_vdev_realloc: use vmem_alloc for large allocation bv_entcount can be a relatively large allocation (see comment for BRT_RANGESIZE), so get it from the big allocator. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Kay Pedersen <mail@mkwg.de> Signed-off-by: Rob Norris <rob.norris@klarasystems.com> Sponsored-By: OpenDrives Inc. Sponsored-By: Klara Inc. Closes #15050	2023-07-24 16:35:49 -07:00
Rob Norris	8d21c002c6	zfs_clone_range: use vmem_malloc for large allocation Just silencing the warning about large allocations. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Kay Pedersen <mail@mkwg.de> Signed-off-by: Rob Norris <rob.norris@klarasystems.com> Sponsored-By: OpenDrives Inc. Sponsored-By: Klara Inc. Closes #15050	2023-07-24 16:34:11 -07:00
Alexander Motin	2cb992a99c	ZIL: Fix config lock deadlock. When we have some LWBs closed and their ZIOs ready to be issued, we can not afford sleeping on config lock if somebody else try to lock it as writer, or it will cause a deadlock. To solve it, move spa_config_enter() from zil_lwb_write_issue() to zil_lwb_write_close() under zl_issuer_lock to enforce lock ordering with other threads. Now if we can't immediately lock config, issue all previously closed LWBs so that they could drop their config locks after completion, and only then allow sleeping on our lock. Reviewed-by: Mark Maybee <mark.maybee@delphix.com> Reviewed-by: Prakash Surya <prakash.surya@delphix.com> Reviewed-by: George Wilson <george.wilson@delphix.com> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Closes #15078 Closes #15080	2023-07-24 13:41:11 -07:00
Brian Behlendorf	fb344f5aeb	Linux 6.4 compat: META Update the META file to reflect compatibility with the 6.4 kernel. Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Rob Norris <rob.norris@klarasystems.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #15095	2023-07-24 11:20:42 -07:00
Rob N	13ec73a028	shellcheck: disable "unreachable command" check [SC2317] This new check in 0.9.0 appears to have some issues with various forms of "early return", like trap, exit and return. This is tripping up (at least): cmd/zed/zed.d/history_event-zfs-list-cacher.sh /etc/zfs/zfs-functions Its not obvious what its complaining about or what the remedy is, so it seems sensible to disable this check for now. See also: https://www.shellcheck.net/wiki/SC2317 https://github.com/koalaman/shellcheck/issues/2542 https://github.com/koalaman/shellcheck/issues/2613 Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Rob Norris <robn@despairlabs.com> Closes #15089	2023-07-21 11:53:06 -07:00
Rob N	46adb2820a	metaslab: tuneable to better control force ganging metaslab_force_ganging isn't enough to actually force ganging, because it still only forces 3% of the time. This adds metaslab_force_ganging_pct so we can configure how often to force ganging. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Rob Norris <rob.norris@klarasystems.com> Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Closes #15088	2023-07-21 11:52:32 -07:00
Alexander Motin	34b3d498a9	Adjust prefetch parameters. - Reduce maximum prefetch distance for 32bit platforms to 8MB as it was previously. Those systems didn't grow much probably, so better stay conservative there. - Retire array_rd_sz tunable, blocking prefetch for large requests. We should not penalize applications trying to be more efficient. The speculative prefetcher by itself has reasonable distance limits, and 1MB is not much at all these days. Reviewed-by: Allan Jude <allan@klarasystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Closes #15072	2023-07-21 11:51:47 -07:00
Alexander Motin	28430b51e3	Add explicit prefetches to bpobj_iterate(). To simplify error handling bpobj_iterate_blkptrs() iterates through the list of block pointers backwards. Unfortunately speculative prefetcher is currently unable to detect such patterns, that makes each block read there synchronous and very slow on HDD pools. According to my tests, added explicit prefetch reduces time needed to asynchronously delete 8 snapshots of 4 million blocks each from 20 seconds to less than one, that should free sync thread for other useful work, such as async writes, scrub, etc. While there, plug one memory leak in case of bpobj_open() error and harmonize some variable names. Reviewed-by: Allan Jude <allan@klarasystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Closes #15071	2023-07-21 11:50:48 -07:00
Alan Somers	6fd87e1d8d	Don't emit cksum_{actual_expected} in ereport.fs.zfs.checksum events With anything but fletcher-4, even a tiny change in the input will cause the checksum value to change completely. So knowing the actual and expected checksums doesn't provide much more information than "they don't match". The harm in sending them is simply that they bloat the event. In particular, on FreeBSD the event must fit into a 1016 byte buffer. Fixes #14717 for mirrored pools. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Rich Ercolani <rincebrain@gmail.com> Signed-off-by: Alan Somers <asomers@gmail.com> Sponsored-by: Axcient Closes #14717 Closes #15052	2023-07-21 11:49:26 -07:00
Alan Somers	cf2a225b24	Don't emit checksum histograms in ereport.fs.zfs.checksum events The checksum histograms were intended to be used with ATA and parallel SCSI, which are obsolete. With modern storage hardware, they will almost always look like white noise; all bits will be wrong. They only serve to bloat the event. That's a particular problem on FreeBSD, where events must fit into a 1016 byte buffer. This fixes issue #14717 for RAIDZ pools, but not for mirror pools. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Rich Ercolani <rincebrain@gmail.com> Signed-off-by: Alan Somers <asomers@gmail.com> Sponsored-by: Axcient Closes #15052	2023-07-21 11:48:17 -07:00
Tony Hutter	ab0b0393cb	zed: Fix zed ASSERT on slot power cycle We would see zed assert on one of our systems if we powered off a slot. Further examination showed zfs_retire_recv() was reporting a GUID of 0, which in turn would return a NULL nvlist. Add in a check for a zero GUID. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tony Hutter <hutter2@llnl.gov> Closes #15084	2023-07-21 11:46:58 -07:00
Chunwei Chen	2d8a2b51dc	Fix zpl_test_super race with zfs_umount We cannot call zpl_enter in zpl_test_super, because zpl_test_super is under spinlock so we can't sleep, and also because zpl_test_super is called without sb->s_umount taken, so it's possible we would race with zfs_umount and call zpl_enter on freed zfsvfs. Here's an stack trace when this happens: [ 2379.114837] VERIFY(cvp->cv_magic == CV_MAGIC) failed [ 2379.114845] PANIC at spl-condvar.c:497:__cv_broadcast() [ 2379.114854] Kernel panic - not syncing: VERIFY(cvp->cv_magic == CV_MAGIC) failed [ 2379.115012] Call Trace: [ 2379.115019] dump_stack+0x74/0x96 [ 2379.115024] panic+0x114/0x2f6 [ 2379.115035] spl_panic+0xcf/0xfc [spl] [ 2379.115477] __cv_broadcast+0x68/0xa0 [spl] [ 2379.115585] rrw_exit+0xb8/0x310 [zfs] [ 2379.115696] rrm_exit+0x4a/0x80 [zfs] [ 2379.115808] zpl_test_super+0xa9/0xd0 [zfs] [ 2379.115920] sget+0xd1/0x230 [ 2379.116033] zpl_mount+0xdc/0x230 [zfs] [ 2379.116037] legacy_get_tree+0x28/0x50 [ 2379.116039] vfs_get_tree+0x27/0xc0 [ 2379.116045] path_mount+0x2fe/0xa70 [ 2379.116048] do_mount+0x80/0xa0 [ 2379.116050] __x64_sys_mount+0x8b/0xe0 [ 2379.116052] do_syscall_64+0x35/0x50 [ 2379.116054] entry_SYSCALL_64_after_hwframe+0x61/0xc6 [ 2379.116057] RIP: 0033:0x7f9912e8b26a Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Chunwei Chen <david.chen@nutanix.com> Closes #15077	2023-07-20 10:30:21 -07:00
Ameer Hamza	d9bb583c25	spa_min_alloc should be GCD, not min Since spa_min_alloc may not be a power of 2, unlike ashifts, in the case of DRAID, we should not select the minimal value among several vdevs. Rounding to a multiple of it is unlikely to work for other vdevs. Instead, using the greatest common divisor produces smaller yet more reasonable results. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Ameer Hamza <ahamza@ixsystems.com> Closes #15067	2023-07-20 10:23:52 -07:00

1 2 3 4 5 ...

8834 Commits