mirror_zfs

mirror of https://git.proxmox.com/git/mirror_zfs.git synced 2026-01-17 02:26:22 +03:00

Author	SHA1	Message	Date
Rob Norris	8a0e5e8b54	zvol: stop using zvol_state_lock to protect OS-side private data zvol_state_lock is intended to protect access to the global name->zvol lists (zvol_find_by_name()), but has also been used to control access to OS-side private data, accessed through whatever kernel object is used to represent the volume (gendisk, geom, etc). This appears to have been necessary to some degree because the OS-side object is what's used to get a handle on zvol_state_t, so zv_state_lock and zv_suspend_lock can't be used to manage access, but also, with the private object and the zvol_state_t being shutdown and destroyed at the same time in zvol_os_free(), we must ensure that the private object pointer only ever corresponds to a real zvol_state_t, not one in partial destruction. Taking the global lock seems like a convenient way to ensure this. The problem with this is that zvol_state_lock does not actually protect access to the zvol_state_t internals, so we need to take zv_state_lock and/or zv_suspend_lock. If those are contended, this can then cause OS-side operations (eg zvol_open()) to sleep to wait for them while hold zvol_state_lock. This then blocks out all other OS-side operations which want to get the private data, and any ZFS-side control operations that would take the write half of the lock. It's even worse if ZFS-side operations induce OS-side calls back into the zvol (eg creating a zvol triggers a partition probe inside the kernel, and also a userspace access from udev to set up device links). And it gets even works again if anything decides to defer those ops to a task and wait on them, which zvol_remove_minors_impl() will do under high load. However, since the previous commit, we have a guarantee that the private data pointer will always be NULL'd out in zvol_os_remove_minor() _before_ the zvol_state_t is made invalid, but it won't happen until all users are ejected. So, if we make access to the private object pointer atomic, we remove the need to take a global lockout to access it, and so we can remove all acquisitions of zvol_state_lock from the OS side. While here, I've rewritten much of the locking theory comment at the top of zvol.c. It wasn't wrong, but it hadn't been followed exactly, so I've tried to describe the purpose of each lock in a little more detail, and in particular describe where it should and shouldn't be used. Sponsored-by: Klara, Inc. Sponsored-by: Railway Corporation Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Fedor Uporov <fuporov.vstack@gmail.com> Signed-off-by: Rob Norris <rob.norris@klarasystems.com> Closes #17625	2025-08-19 10:06:34 -07:00
Rob Norris	96f9d271ea	zvol: remove the OS-side minor before freeing the zvol When destroying a zvol, it is not "unpublished" from the system (that is, /dev/zd* node removed) until zvol_os_free(). Under Linux, at the time del_gendisk() and put_disk() are called, the device node may still be have an active hold, from a userspace program or something inside the kernel (a partition probe). As it is currently, this can lead to calls to zvol_open() or zvol_release() while the zvol_state_t is partially or fully freed. zvol_open() has some protection against this by checking that private_data is NULL, but zvol_release does not. This implements a better ordering for all of this by adding a new OS-side method, zvol_os_remove_minor(), which is responsible for fully decoupling the "private" (OS-side) objects from the zvol_state_t. For Linux, that means calling put_disk(), nulling private_data, and freeing zv_zso. This takes the place of zvol_os_clear_private(), which was a nod in that direction but did not do enough, and did not do it early enough. Equivalent changes are made on the FreeBSD side to follow the API change. Sponsored-by: Klara, Inc. Sponsored-by: Railway Corporation Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Fedor Uporov <fuporov.vstack@gmail.com> Signed-off-by: Rob Norris <rob.norris@klarasystems.com> Closes #17625	2025-08-19 10:06:21 -07:00
Rob Norris	967b15b888	ZIL: allow zil_commit() to fail with error This changes zil_commit() to have an int return, and updates all callers to check it. There are no corresponding internal changes yet; it will always return 0. Since zil_commit() is an indication that the caller _really_ wants the associated data to be durability stored, I've annotated it with the __warn_unused_result__ compiler attribute (via __must_check), to emit a warning if it's ever ussd without doing something with the return code. I hope this will mean we never misuse it in the future. Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com> Signed-off-by: Rob Norris <rob.norris@klarasystems.com> Closes #17398	2025-08-08 16:43:09 -07:00
Rob Norris	82d6f7b047	Prefer VERIFY0P(n) over VERIFY3P(n, ==, NULL) Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com> Signed-off-by: Rob Norris <robn@despairlabs.com> Sponsored-by: https://despairlabs.com/sponsor/ Closes #17591	2025-08-07 11:41:42 -07:00
Fedor Uporov	0b6fd024a7	ZVOL: Unify zvol minors operations and improve error handling Now zvol minors creation logic is passed thru spa_zvol_taskq, like it is doing for remove/rename zvol minors functions. Appropriate zvol minors creation functions are refactored: - The zvol_create_minor()/zvol_minors_create_recursive() were removed. - The single zvol_create_minors() is added instead. Also, it become possible to collect zvol minors subtasks status, to detect, if some zvol minor subtask is failed in the subtasks chain. The appropriate message is reported to zfs_dbgmsg buffer in this case. Sponsored-by: vStack, Inc. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com> Signed-off-by: Fedor Uporov <fuporov.vstack@gmail.com> Closes #17575	2025-08-06 10:10:52 -04:00
Fedor Uporov	92da9e0e93	ZVOL: Implement zvol_alloc() function on FreeBSD side Implement zvol_alloc() function on FreeBSD side to increase code base compatibility with Linux. Also, fix issue with late returning in case if volmode=none. Sponsored-by: vStack, Inc. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com> Signed-off-by: Fedor Uporov <fuporov.vstack@gmail.com> Closes #17482	2025-07-31 11:02:09 -04:00
Fedor Uporov	dea0fc969b	ZVOL: Return early, if volmode is ZFS_VOLMODE_NONE on FreeBSD side Return from zvol_os_create_minor() function immediately after dsl_prop_get_integer() call if volmode property value is set to 'none', like it is doing on Linux side. Sponsored-by: vStack, Inc. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com> Signed-off-by: Fedor Uporov <fuporov.vstack@gmail.com> Closes #17405	2025-07-30 09:46:34 -07:00
Fedor Uporov	e0edfcbd4e	ZVOL: Make zvol_volmode module parameter platform-independent The module parameter name was not changed in FreeBSD sysctls list: 'vfs.zfs.vol.mode'. Also, on Linux side the name is: /sys/module/zfs/parameters/zvol_volmode. Sponsored-by: vStack, Inc. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Rob Norris <rob.norris@klarasystems.com> Signed-off-by: Fedor Uporov <fuporov.vstack@gmail.com> Closes #17386	2025-05-31 19:09:50 -04:00
Fedor Uporov	e1677d9ee1	ZVOL: Make zvol_prefetch_bytes module parameter platform-independent The module parameter now is represented in FreeBSD sysctls list with name: 'vfs.zfs.vol.prefetch_bytes'. The default value is 131072, same as on Linux side. Sponsored-by: vStack, Inc. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Rob Norris <rob.norris@klarasystems.com> Signed-off-by: Fedor Uporov <fuporov.vstack@gmail.com> Closes #17385	2025-05-31 09:58:54 -04:00
Fedor Uporov	3dfa98d013	ZVOL: Make zvol_inhibit_dev module parameter platform-independent The module parameter now is represented in FreeBSD sysctls list with name: 'vfs.zfs.vol.inhibit_dev'. The default value is '0', same as on Linux side. Sponsored-by: vStack, Inc. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Rob Norris <rob.norris@klarasystems.com> Signed-off-by: Fedor Uporov <fuporov.vstack@gmail.com> Closes #17384	2025-05-29 09:37:41 -04:00
Fedor Uporov	087d7d80c7	ZVOL: Comment platform-specific empty functions bodies on FreeBSD side Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Fedor Uporov <fuporov.vstack@gmail.com> Closes #17383	2025-05-27 20:00:25 -04:00
Alexander Motin	734eba251d	Wire O_DIRECT also to Uncached I/O (#17218 ) Before Direct I/O was implemented, I've implemented lighter version I called Uncached I/O. It uses normal DMU/ARC data path with some optimizations, but evicts data from caches as soon as possible and reasonable. Originally I wired it only to a primarycache property, but now completing the integration all the way up to the VFS. While Direct I/O has the lowest possible memory bandwidth usage, it also has a significant number of limitations. It require I/Os to be page aligned, does not allow speculative prefetch, etc. The Uncached I/O does not have those limitations, but instead require additional memory copy, though still one less than regular cached I/O. As such it should fill the gap in between. Considering this I've disabled annoying EINVAL errors on misaligned requests, adding a tunable for those who wants to test their applications. To pass the information between the layers I had to change a number of APIs. But as side effect upper layers can now control not only the caching, but also speculative prefetch. I haven't wired it to VFS yet, since it require looking on some OS specifics. But while there I've implemented speculative prefetch of indirect blocks for Direct I/O, controllable via all the same mechanisms. Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Fixes #17027 Reviewed-by: Rob Norris <robn@despairlabs.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>	2025-05-13 14:26:55 -07:00
Fedor Uporov	1a8f5ad3b0	zvol: Enable zvol threading functionality on FreeBSD Make zvol I/O requests processing asynchronous on FreeBSD side in some cases. Clone zvol threading logic and required module parameters from Linux side. Make zvol threadpool creation/destruction logic shared for both Linux and FreeBSD. The IO requests are processed asynchronously in next cases: - volmode=geom: if IO request thread is geom thread or cannot sleep. - volmode=cdev: if IO request passed thru struct cdevsw .d_strategy routine, mean is AIO request. In all other cases the IO requests are processed synchronously. The volthreading zvol property is ignored on FreeBSD side. Sponsored-by: vStack, Inc. Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: @ImAwsumm Signed-off-by: Fedor Uporov <fuporov.vstack@gmail.com> Closes #17169	2025-05-08 15:25:40 -04:00
Rob Norris	f69631992d	dmu_tx: rename dmu_tx_assign() flags from TXG_* to DMU_TX_* (#17143 ) This helps to avoids confusion with the similarly-named txg_wait_synced(). Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Signed-off-by: Rob Norris <rob.norris@klarasystems.com> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Mariusz Zaborski <mariusz.zaborski@klarasystems.com> Reviewed-by: Tony Hutter <hutter2@llnl.gov>	2025-03-18 16:04:22 -07:00
Rob Norris	eb9098ed47	SPDX: license tags: CDDL-1.0 Sponsored-by: https://despairlabs.com/sponsor/ Signed-off-by: Rob Norris <robn@despairlabs.com> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>	2025-03-13 17:56:27 -07:00
Brian Atkinson	a10e552b99	Adding Direct IO Support Adding O_DIRECT support to ZFS to bypass the ARC for writes/reads. O_DIRECT support in ZFS will always ensure there is coherency between buffered and O_DIRECT IO requests. This ensures that all IO requests, whether buffered or direct, will see the same file contents at all times. Just as in other FS's , O_DIRECT does not imply O_SYNC. While data is written directly to VDEV disks, metadata will not be synced until the associated TXG is synced. For both O_DIRECT read and write request the offset and request sizes, at a minimum, must be PAGE_SIZE aligned. In the event they are not, then EINVAL is returned unless the direct property is set to always (see below). For O_DIRECT writes: The request also must be block aligned (recordsize) or the write request will take the normal (buffered) write path. In the event that request is block aligned and a cached copy of the buffer in the ARC, then it will be discarded from the ARC forcing all further reads to retrieve the data from disk. For O_DIRECT reads: The only alignment restrictions are PAGE_SIZE alignment. In the event that the requested data is in buffered (in the ARC) it will just be copied from the ARC into the user buffer. For both O_DIRECT writes and reads the O_DIRECT flag will be ignored in the event that file contents are mmap'ed. In this case, all requests that are at least PAGE_SIZE aligned will just fall back to the buffered paths. If the request however is not PAGE_SIZE aligned, EINVAL will be returned as always regardless if the file's contents are mmap'ed. Since O_DIRECT writes go through the normal ZIO pipeline, the following operations are supported just as with normal buffered writes: Checksum Compression Encryption Erasure Coding There is one caveat for the data integrity of O_DIRECT writes that is distinct for each of the OS's supported by ZFS. FreeBSD - FreeBSD is able to place user pages under write protection so any data in the user buffers and written directly down to the VDEV disks is guaranteed to not change. There is no concern with data integrity and O_DIRECT writes. Linux - Linux is not able to place anonymous user pages under write protection. Because of this, if the user decides to manipulate the page contents while the write operation is occurring, data integrity can not be guaranteed. However, there is a module parameter `zfs_vdev_direct_write_verify` that controls the if a O_DIRECT writes that can occur to a top-level VDEV before a checksum verify is run before the contents of the I/O buffer are committed to disk. In the event of a checksum verification failure the write will return EIO. The number of O_DIRECT write checksum verification errors can be observed by doing `zpool status -d`, which will list all verification errors that have occurred on a top-level VDEV. Along with `zpool status`, a ZED event will be issues as `dio_verify` when a checksum verification error occurs. ZVOLs and dedup is not currently supported with Direct I/O. A new dataset property `direct` has been added with the following 3 allowable values: disabled - Accepts O_DIRECT flag, but silently ignores it and treats the request as a buffered IO request. standard - Follows the alignment restrictions outlined above for write/read IO requests when the O_DIRECT flag is used. always - Treats every write/read IO request as though it passed O_DIRECT and will do O_DIRECT if the alignment restrictions are met otherwise will redirect through the ARC. This property will not allow a request to fail. There is also a module parameter zfs_dio_enabled that can be used to force all reads and writes through the ARC. By setting this module parameter to 0, it mimics as if the direct dataset property is set to disabled. Reviewed-by: Brian Behlendorf <behlendorf@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Brian Atkinson <batkinson@lanl.gov> Co-authored-by: Mark Maybee <mark.maybee@delphix.com> Co-authored-by: Matt Macy <mmacy@FreeBSD.org> Co-authored-by: Brian Behlendorf <behlendorf@llnl.gov> Closes #10018	2024-09-14 13:47:59 -07:00
Rob Norris	670147be53	zvol: ensure device minors are properly cleaned up Currently, if a minor is in use when we try to remove it, we'll skip it and never come back to it again. Since the zvol state is hung off the minor in the kernel, this can get us into weird situations if something tries to use it after the removal fails. It's even worse at pool export, as there's now a vestigial zvol state with no pool under it. It's weirder again if the pool is subsequently reimported, as the zvol code (reasonably) assumes the zvol state has been properly setup, when it's actually left over from the previous import of the pool. This commit attempts to tackle that by setting a flag on the zvol if its minor can't be removed, and then checking that flag when a request is made and rejecting it, thus stopping new work coming in. The flag also causes a condvar to be signaled when the last client finishes. For the case where a single minor is being removed (eg changing volmode), it will wait for this signal before proceeding. Meanwhile, when removing all minors, a background task is created for each minor that couldn't be removed on the spot, and those tasks then wake and clean up. Since any new tasks are queued on to the pool's spa_zvol_taskq, spa_export_common() will continue to wait at export until all minors are removed. Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Signed-off-by: Rob Norris <rob.norris@klarasystems.com> Closes #14872 Closes #16364	2024-08-06 12:08:14 -07:00
Rob Norris	6c82951d11	FreeBSD: remove support for FreeBSD < 13.0-RELEASE (#16372 ) This includes the last 12.x release (now EOL) and 13.0 development versions (<1300139). Sponsored-by: https://despairlabs.com/sponsor/ Signed-off-by: Rob Norris <robn@despairlabs.com> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de> Reviewed-by: Tony Hutter <hutter2@llnl.gov>	2024-08-05 16:56:45 -07:00
Mark Johnston	4367312760	zvol: Fix suspend lock leaks (#16270 ) In several functions, we use a flag variable to track whether zv_suspend_lock is held. This flag was not getting reset in a particular case where we need to retry the underlying operation, resulting in a lock leak. Make sure to update the flag where necessary. Signed-off-by: Mark Johnston <markj@FreeBSD.org> Reviewed-by: Allan Jude <allan@klarasystems.com> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de> Reviewed-by: Tony Hutter <hutter2@llnl.gov>	2024-07-10 14:27:44 -07:00
Alan Somers	21bc066ece	Fix updating the zvol_htable when renaming a zvol When renaming a zvol, insert it into zvol_htable using the new name, not the old name. Otherwise some operations won't work. For example, "zfs set volsize" while the zvol is open. Sponsored by: Axcient Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alek Pinchuk <apinchuk@axcient.com> Signed-off-by: Alan Somers <asomers@FreeBSD.org> Closes #16127 Closes #16128	2024-04-25 14:24:52 -07:00
Alan Somers	e36ff84c33	Update the kstat dataset_name when renaming a zvol Add a dataset_kstats_rename function, and call it when renaming a zvol on FreeBSD and Linux. Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Alan Somers <asomers@gmail.com> Sponsored-by: Axcient Closes #15482 Closes #15486	2023-11-07 11:34:50 -08:00
Alexander Motin	c3773de168	ZIL: Cleanup sync and commit handling ZVOL: - Mark all ZVOL ZIL transactions as sync. Since ZVOLs have only one object, it makes no sense to maintain async queue and on each commit merge it into sync. Single sync queue is just cheaper, while it changes nothing until actual commit request arrives. - Remove zsd_sync_cnt and the zil_async_to_sync() calls since we are no longer switching between sync and async queues. ZFS: - Mark write transactions as sync based only on number of sync opens (z_sync_cnt). We can not randomly jump between sync and async unless we want data corruptions due to writes reordering. - When file first opened with O_SYNC (z_sync_cnt incremented to 1) call zil_async_to_sync() for it to preserve correct ordering between past and future writes. - Drop zfs_fsyncer_key logic. Looks like it was an optimization for workloads heavily intermixing async writes with tons of fsyncs. But first it was broken 8 years ago due to Linux tsd implementation not allowing data storage between syscalls, and second, I doubt it is safe to switch from async to sync so often and without calling zil_async_to_sync(). - Rename sync argument of *_log_write() into commit, now only signalling caller's intent to call zil_commit() soon after. It allows WR_COPIED optimizations without extra other meanings. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: George Wilson <george.wilson@delphix.com> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Closes #15366	2023-10-30 14:51:56 -07:00
Ameer Hamza	14ba8ab97d	Prevent panic during concurrent snapshot rollback and zvol read Protect zvol_cdev_read with zv_suspend_lock to prevent concurrent release of the dnode, avoiding panic when a snapshot is rolled back in parallel during ongoing zvol read operation. Reviewed-by: Chunwei Chen <tuxoko@gmail.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Ameer Hamza <ahamza@ixsystems.com> Closes #14839	2023-05-09 17:56:35 -07:00
Brian Behlendorf	64bfa6bae3	Additional limits on hole reporting Holding the zp->z_rangelock as a RL_READER over the range 0-UINT64_MAX is sufficient to prevent the dnode from being re-dirtied by concurrent writers. To avoid potentially looping multiple times for external caller which do not take the rangelock holes are not reported after the first sync. While not optimal this is always functionally correct. This change adds the missing rangelock calls on FreeBSD to zvol_cdev_ioctl(). Reviewed-by: Brian Atkinson <batkinson@lanl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #14512 Closes #14641	2023-03-28 08:19:03 -07:00
Alan Somers	e197bb24f1	Optionally skip zil_close during zvol_create_minor_impl If there were no zil entries to replay, skip zil_close. zil_close waits for a transaction to sync. That can take several seconds, for example during pool import of a resilvering pool. Skipping zil_close can cut the time for "zpool import" from 2 hours to 45 seconds on a resilvering pool with a thousand zvols. Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Sponsored-by: Axcient Closes #13999 Closes #14015	2022-11-08 12:38:08 -08:00
Rob Wing	983096a1b4	FreeBSD: add kqfilter support for zvol cdev The only event hooked up is NOTE_ATTRIB, which is triggered when the device is resized. Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Rob Wing <rew@FreeBSD.org> Closes #13773	2022-09-06 09:49:33 -07:00
ixhamza	fb087146de	Add support for per dataset zil stats and use wmsum counters ZIL kstats are reported in an inclusive way, i.e., same counters are shared to capture all the activities happening in zil. Added support to report zil stats for every datset individually by combining them with already exposed dataset kstats. Wmsum uses per cpu counters and provide less overhead as compared to atomic operations. Updated zil kstats to replace wmsum counters to avoid atomic operations. Reviewed-by: Christian Schwarz <christian.schwarz@nutanix.com> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Ameer Hamza <ahamza@ixsystems.com> Closes #13636	2022-07-20 17:14:06 -07:00
Tino Reichardt	1d3ba0bf01	Replace dead opensolaris.org license link The commit replaces all findings of the link: http://www.opensolaris.org/os/licensing with this one: https://opensource.org/licenses/CDDL-1.0 Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de> Closes #13619	2022-07-11 14:16:13 -07:00
Pawel Jakub Dawidek	a64d757aa4	FreeBSD: Clean up the use of ioflags - Prefer O_* flags over F* flags that mostly mirror O_* flags anyway, but O_* flags seem to be preferred. - Simplify the code as all the F*SYNC flags were defined as FFSYNC flag. - Don't define FRSYNC flag, so we don't generate unnecessary ZIL commits. - Remove EXCL define, FreeBSD ignores the excl argument for zfs_create() anyway. Reviewed-by: Allan Jude <allan@klarasystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Pawel Jakub Dawidek <pawel@dawidek.net> Closes #13400	2022-05-02 16:26:28 -07:00
наб	ef70eff198	module: mark arguments used Reviewed-by: Alejandro Colomar <alx.manpages@gmail.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> Closes #13110	2022-02-18 09:34:03 -08:00
Christian Schwarz	1dccfd7a38	zvol: make calls to platform ops static There's no need to make the platform ops dynamic dispatch. This change replaces the dynamic dispatch with static calls to the platform-specific functions. To avoid name collisions, prefix all platform-specific functions with `zvol_os_`. I actually find `zvol_..._os` slightly nicer to read in the calling code, but having it as a prefix is useful. Advantage: - easier jump-to-definition / grepping - potential benefits to static analysis - better legibility Future work: also prefix remaining `static` functions in zvol_os.c. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Christian Schwarz <christian.schwarz@nutanix.com> Closes #12965	2022-02-07 10:24:38 -08:00
Ryan Moeller	3158c2e3cb	FreeBSD: Fix zvol_cdev_open locking First open locking changes were correctly applied to zvol_geom_open but incorrectly applied to zvol_cdev_open, causing spa_namespace_lock to be held indefinitely. Make the first open locking in zvol_cdev_open match zvol_geom_open. Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #13016	2022-01-26 11:23:39 -08:00
Ryan Moeller	5a57d6f73b	FreeBSD: Touch up comments in zvol_os Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #12934	2022-01-14 12:43:31 -08:00
Ryan Moeller	020545a95d	FreeBSD: Fix zvol_*_open() locking These are the changes for FreeBSD corresponding to the changes made for Linux in #12863, see that PR for details. Changes from #12863 are applied for zvol_geom_open and zvol_cdev_open on FreeBSD. This also adds a check for the zvol dying which we had in zvol_geom_open but was missing in zvol_cdev_open. The check causes the open to fail early with ENXIO when we are in the middle of changing volmode. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #12934	2022-01-14 12:43:05 -08:00
наб	7c41df4c77	zvol: remove unused variable Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> Closes #12917	2022-01-06 11:20:06 -08:00
Alexander Motin	46197dc858	FreeBSD: Ignore make_dev_s() errors Since errors returned by zvol_create_minor_impl() are ignored by the common code, it is more convenient to ignore make_dev_s() errors there. It allows, for example, to get device created for the zvol after later rename instead of having it further stuck in half-created state. zvol_rename_minor() already ignores those errors. While there, switch from MAXPHYS to maxphys in FreeBSD 13+. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored-By: iXsystems, Inc. Closes #12375	2021-07-22 10:22:14 -06:00
Alexander Motin	c71a2bb52f	FreeBSD: Update dataset_kstats for zvols in dev mode Previous commit added accounting for geom mode, but not for dev. In geom mode we actually have GEOM statistics, while in dev mode additional accounting actually makes more sense. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Closes #12097	2021-05-26 12:14:26 -06:00
Ryan Moeller	e4efb70950	FreeBSD: Clean up ASSERT/VERIFY use in module Convert use of ASSERT() to ASSERT0(), ASSERT3U(), ASSERT3S(), ASSERT3P(), and likewise for VERIFY(). In some cases it ended up making more sense to change the code, such as VERIFY on nvlist operations that I have converted to use fnvlist instead. In one place I changed an internal struct member from int to boolean_t to match its use. Some asserts that combined multiple checks with && in a single assert have been split to separate asserts, to make it apparent which check fails. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #11971	2021-04-30 16:36:10 -07:00
Andrea Gelmini	bf169e9f15	Fix various typos Correct an assortment of typos throughout the code base. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Andrea Gelmini <andrea.gelmini@gelma.net> Closes #11774	2021-04-02 18:52:15 -07:00
Christian Schwarz	93e3658035	zvol: call zil_replaying() during replay zil_replaying(zil, tx) has the side-effect of informing the ZIL that an entry has been replayed in the (still open) tx. The ZIL uses that information to record the replay progress in the ZIL header when that tx's txg syncs. ZPL log entries are not idempotent and logically dependent and thus calling zil_replaying() is necessary for correctness. For ZVOLs the question of correctness is more nuanced: ZVOL logs only TX_WRITE and TX_TRUNCATE, both of which are idempotent. Logical dependencies between two records exist only if the write or discard request had sync semantics or if the ranges affected by the records overlap. Thus, at a first glance, it would be correct to restart replay from the beginning if we crash before replay completes. But this does not address the following scenario: Assume one log record per LWB. The chain on disk is HDR -> 1:W(1, "A") -> 2:W(1, "B") -> 3:W(2, "X") -> 4:W(3, "Z") where N:W(O, C) represents log entry number N which is a TX_WRITE of C to offset A. We replay 1, 2 and 3 in one txg, sync that txg, then crash. Bit flips corrupt 2, 3, and 4. We come up again and restart replay from the beginning because we did not call zil_replaying() during replay. We replay 1 again, then interpret 2's invalid checksum as the end of the ZIL chain and call replay done. The replayed zvol content is "AX". If we had called zil_replaying() the HDR would have pointed to 3 and our resumed replay would not have replayed anything because 3 was corrupted, resulting in zvol content "BX". If 3 logically depends on 2 then the replay corrupted the ZVOL_OBJ's contents. This patch adds the zil_replaying() calls to the replay functions. Since the callbacks in the replay function need the zilog_t* pointer so that they can call zil_replaying() we open the ZIL while replaying in zvol_create_minor(). We also verify that replay has been done when on-demand-opening the ZIL on the first modifying bio. Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Christian Schwarz <me@cschwarz.com> Closes #11667	2021-03-07 09:49:58 -08:00
Brian Atkinson	d0cd9a5cc6	Extending FreeBSD UIO Struct In FreeBSD the struct uio was just a typedef to uio_t. In order to extend this struct, outside of the definition for the struct uio, the struct uio has been embedded inside of a uio_t struct. Also renamed all the uio_* interfaces to be zfs_uio_* to make it clear this is a ZFS interface. Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Jorgen Lundman <lundman@lundman.net> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Brian Atkinson <batkinson@lanl.gov> Closes #11438	2021-01-20 21:27:30 -08:00
Matthew Macy	0ca45cb310	Fix problems in zvol_set_volmode_impl - Don't leave fstrans set when passed a snapshot - Don't remove minor if volmode already matches new value - (FreeBSD) Wait for GEOM ops to complete before trying remove (at create time GEOM will be "tasting" in parallel) - (FreeBSD) Don't leak zvol_state_lock on open if zv == NULL - (FreeBSD) Don't try to unlock zv->zv_state lock if zv == NULL Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Matt Macy <mmacy@FreeBSD.org> Closes #11199	2020-11-17 09:50:52 -08:00
Ryan Moeller	7b42f09049	FreeBSD: Simplify zvol_geom_open and zvol_cdev_open We can consolidate the unlocking procedure into one place by starting with drop_suspend set to B_FALSE and moving the open count check up. While here, a little code cleanup. Match the out labels between zvol_geom_open and zvol_cdev_open, and add a missing period in some comments. Reviewed-by: Matt Macy <mmacy@FreeBSD.org> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #11175	2020-11-10 11:08:10 -08:00
Ryan Moeller	2186ed33f1	FreeBSD: Avoid spurious EINTR in zvol_cdev_open zvol_first_open can fail with EINTR if spa_namespace_lock is not held and cannot be taken without waiting. Apply the same logic that was done for zvol_geom_open to take spa_namespace_lock if not already held on first open in zvol_cdev_open. Reviewed-by: Matt Macy <mmacy@FreeBSD.org> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #11175	2020-11-10 11:07:25 -08:00
Mariusz Zaborski	ae37ceadaa	FreeBSD: Prevent a NULL reference in zvol_cdev_open Check if the ZVOL has been written before calling zil_async_to_sync. The ZIL will be opened on the first write, not earlier. Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Mariusz Zaborski <oshogbo@vexillium.org> Closes #11152	2020-11-05 17:02:19 -08:00
Ryan Moeller	181b2adc2a	FreeBSD: zvol_os: Use SET_ERROR more judiciously SET_ERROR is useful to trace errors, so use it where the errors occur rather than factored out to the end of a function. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #11146	2020-11-03 09:21:09 -08:00
Ryan Moeller	65a343bbd3	zvol_os: Fix handling of zvol private data zvol private data is supposed to be nulled by zvol_clear_private before zvol_free is called as an indicator that the zvol is going away. Implement zvol_clear_private for volmode=dev. Assert that zvol_clear_private has been called before zvol_free. Check that zvol_clear_private has not been called when updating volsize. If it has, fail with ENXIO. Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Matt Macy <mmacy@FreeBSD.org> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #11117	2020-10-30 15:34:49 -07:00
Ryan Moeller	277884ab42	zvol_os: Don't leak doi in cdev error path Make sure to free doi in zvol_create_minor impl when make_dev_s fails. Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Matt Macy <mmacy@FreeBSD.org> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #11117	2020-10-30 15:34:43 -07:00
Ryan Moeller	9a0ef216e5	zvol_os: Properly ignore error in volmode lookup We fall back to a default volmode and continue when looking up a zvol's volmode property fails. After this we should set the error to 0 to ensure we take the success paths in the out section. While here, make sure we only log that the zvol was created on success. Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Matt Macy <mmacy@FreeBSD.org> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #11117	2020-10-30 15:34:36 -07:00
Ryan Moeller	1a6a75ac07	zvol_os: Code cleanup in zvol_create_minor_impl Nonfunctional changes for readability and consistency. Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Matt Macy <mmacy@FreeBSD.org> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #11117	2020-10-30 15:34:30 -07:00

1 2

61 Commits