mirror_zfs

mirror of https://git.proxmox.com/git/mirror_zfs.git synced 2026-01-25 10:12:13 +03:00

Author	SHA1	Message	Date
Brian Behlendorf	4352edaafb	Linux: Fix ZFS_ENTER/ZFS_EXIT/ZFS_VERIFY_ZP usage The ZFS_ENTER/ZFS_EXIT/ZFS_VERIFY_ZP macros should not be used in the Linux zpl_*.c source files. They return a positive error value, which is correct for the common code but not for the Linux-specific kernel code, which expects a negative return value. The ZPL_ENTER/ZPL_EXIT/ZPL_VERIFY_ZP macros should be used instead. Furthermore, the ZPL_EXIT macro has been updated to not call the zfs_exit_fs() function. This prevents a possible deadlock which can occur when a snapshot is automatically unmounted, because zpl_show_devname() must never wait on in-progress automatic snapshot unmounts. Reviewed-by: Adam Moss <c@yotes.com> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #11169 Closes #11201	2020-11-14 10:19:00 -08:00
Matthew Ahrens	d66aab7c08	Assertion failure when logging large output of channel program The output of ZFS channel programs is logged on-disk in the zpool history, and printed by `zpool history -i`. Channel programs can use 10MB of memory by default, and up to 100MB by using the `zfs program -m` flag. Therefore their output can be up to some fraction of 100MB. In addition to being somewhat wasteful of the limited space reserved for the pool history (which for large pools is 1GB), in extreme cases this can result in a failure of `ASSERT(length <= DMU_MAX_ACCESS);` in `dmu_buf_hold_array_by_dnode()`. This commit limits the output size that will be logged to 1MB. Larger outputs will not be logged; instead, an entry will be logged indicating the size of the omitted output. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Matthew Ahrens <mahrens@delphix.com> Closes #11194	2020-11-14 10:17:16 -08:00
Ryan Moeller	7e3617de35	Return EFAULT at the end of zfs_write() when set FreeBSD's VFS expects EFAULT from zfs_write() if we didn't complete the full write so it can retry the operation. Add some missing SET_ERRORs in zfs_write(). Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #11193	2020-11-14 10:16:26 -08:00
Brian Behlendorf	b2255edcc0	Distributed Spare (dRAID) Feature This patch adds a new top-level vdev type called dRAID, which stands for Distributed parity RAID. This pool configuration allows all dRAID vdevs to participate when rebuilding to a distributed hot spare device. This can substantially reduce the total time required to restore full parity to a pool with a failed device.

A dRAID pool can be created using the new top-level `draid` type. Like `raidz`, the desired redundancy is specified after the type: `draid[1,2,3]`. No additional information is required to create the pool and reasonable default values will be chosen based on the number of child vdevs in the dRAID vdev.

    zpool create <pool> draid[1,2,3] <vdevs...>

Unlike raidz, additional optional dRAID configuration values can be provided as part of the draid type as colon-separated values. This allows administrators to fully specify a layout for either performance or capacity reasons. The supported options include:

    zpool create <pool> \
        draid[<parity>][:<data>d][:<children>c][:<spares>s] \
        <vdevs...>

- draid[parity]       - Parity level (default 1)
- draid[:<data>d]     - Data devices per group (default 8)
- draid[:<children>c] - Expected number of child vdevs
- draid[:<spares>s]   - Distributed hot spares (default 0)

Abbreviated example `zpool status` output for a 68-disk dRAID pool with two distributed spares using special allocation classes.

```
  pool: tank
 state: ONLINE
config:

        NAME                  STATE     READ WRITE CKSUM
        slag7                 ONLINE       0     0     0
          draid2:8d:68c:2s-0  ONLINE       0     0     0
            L0                ONLINE       0     0     0
            L1                ONLINE       0     0     0
            ...
            U25               ONLINE       0     0     0
            U26               ONLINE       0     0     0
            spare-53          ONLINE       0     0     0
              U27             ONLINE       0     0     0
              draid2-0-0      ONLINE       0     0     0
            U28               ONLINE       0     0     0
            U29               ONLINE       0     0     0
            ...
            U42               ONLINE       0     0     0
            U43               ONLINE       0     0     0
        special
          mirror-1            ONLINE       0     0     0
            L5                ONLINE       0     0     0
            U5                ONLINE       0     0     0
          mirror-2            ONLINE       0     0     0
            L6                ONLINE       0     0     0
            U6                ONLINE       0     0     0
        spares
          draid2-0-0          INUSE     currently in use
          draid2-0-1          AVAIL
```

When adding test coverage for the new dRAID vdev type, the following options were added to the ztest command. These options are leveraged by zloop.sh to test a wide range of dRAID configurations.

    -K draid|raidz|random - kind of RAID to test
    -D <value>            - dRAID data drives per group
    -S <value>            - dRAID distributed hot spares
    -R <value>            - RAID parity (raidz or dRAID)

The zpool_create, zpool_import, redundancy, replacement and fault test groups have all been updated to provide test coverage for the dRAID feature. Co-authored-by: Isaac Huang <he.huang@intel.com> Co-authored-by: Mark Maybee <mmaybee@cray.com> Co-authored-by: Don Brady <don.brady@delphix.com> Co-authored-by: Matthew Ahrens <mahrens@delphix.com> Co-authored-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Mark Maybee <mmaybee@cray.com> Reviewed-by: Matt Ahrens <matt@delphix.com> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #10102	2020-11-13 13:51:51 -08:00
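As an illustration of the type-string grammar in the dRAID commit, here is a minimal C sketch of a parser for `draid[<parity>][:<data>d][:<children>c][:<spares>s]`. `struct draid_spec` and `draid_parse_spec()` are hypothetical names, not the parser used by zpool. The sketch applies the defaults listed in the commit message (parity 1, 8 data devices per group, 0 spares), uses 0 to mean that no child count was given, accepts the colon options in any order, and does not validate the values against the vdevs actually supplied.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical container for a parsed dRAID type string. */
struct draid_spec {
	unsigned long parity;   /* parity level, default 1 */
	unsigned long data;     /* data devices per group, default 8 */
	unsigned long children; /* expected child vdevs, 0 = not given */
	unsigned long spares;   /* distributed hot spares, default 0 */
};

/*
 * Parse "draid[<parity>][:<data>d][:<children>c][:<spares>s]".
 * Returns 0 on success, -1 on malformed input.
 */
static int
draid_parse_spec(const char *type, struct draid_spec *spec)
{
	assert(type != NULL && spec != NULL);
	spec->parity = 1;
	spec->data = 8;
	spec->children = 0;
	spec->spares = 0;

	if (strncmp(type, "draid", 5) != 0)
		return (-1);
	const char *p = type + 5;

	/* Optional parity digits directly after "draid". */
	if (*p >= '0' && *p <= '9') {
		char *end;
		spec->parity = strtoul(p, &end, 10);
		p = end;
	}
	if (spec->parity < 1 || spec->parity > 3)
		return (-1);

	/* Zero or more ":<number><suffix>" options. */
	while (*p == ':') {
		char *end;
		p++;
		if (*p < '0' || *p > '9')
			return (-1);
		unsigned long v = strtoul(p, &end, 10);
		switch (*end) {
		case 'd': spec->data = v; break;
		case 'c': spec->children = v; break;
		case 's': spec->spares = v; break;
		default: return (-1);   /* unknown suffix or missing suffix */
		}
		p = end + 1;
	}
	/* Anything left over is a syntax error. */
	return (*p == '\0' ? 0 : -1);
}
```

For example, `draid2:8d:68c:2s` (the layout shown in the `zpool status` output of that commit) parses to parity 2, 8 data devices, 68 children and 2 spares.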
Matthew Ahrens	a724db0374	Channel program may spuriously fail with "memory limit exhausted" ZFS channel programs (invoked by `zfs program`) are executed in a Lua sandbox with a limit on the amount of memory they can consume. The limit is 10MB by default, and can be raised to 100MB with the `-m` flag. If the memory limit is exceeded, the Lua program exits and the command fails with a message like `Channel program execution failed: Memory limit exhausted.` The Lua sandbox allocates memory with `vmem_alloc(KM_NOSLEEP)`, which will fail if the requested memory is not immediately available. In this case, the program fails with the same message, `Memory limit exhausted`. However, in this case the specified memory limit has not been reached, and the memory may only be temporarily unavailable. This commit changes the Lua memory allocator `zcp_lua_alloc()` to use `vmem_alloc(KM_SLEEP)`, so that we won't spuriously fail when memory is temporarily low. Instead, we rely on the system to be able to free up memory (e.g. by evicting from the ARC), and we assume that even at the highest memory limit of 100MB, the channel program will not truly exhaust the system's memory. External-issue: DLPX-71924 Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Matthew Ahrens <mahrens@delphix.com> Closes #11190	2020-11-11 17:16:15 -08:00
Brian Behlendorf	c08d442e45	Linux: Fix mount/unmount when dataset name has a space The custom zpl_show_devname() helper should translate spaces into the octal escape sequence \040. The getmntent(3) function is aware of this convention and properly translates the escape sequence back to a space when reading the fsname. Without this change, the `zfs mount` and `zfs unmount` commands fail to correctly detect whether a dataset whose name contains spaces is mounted. Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #11182 Closes #11187	2020-11-11 17:14:24 -08:00
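The space-escaping convention in the mount/unmount commit (a space written as the four-character octal sequence `\040`, which getmntent translates back when reading the mount table) can be sketched in user-space C. `escape_spaces()` is a hypothetical helper written for illustration only; it is not the kernel's zpl_show_devname(), and it handles spaces only, not the other characters that mount tables also escape.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copy `name` into `out`, replacing every space
 * with the octal escape "\040". Returns 0 on success, or -1 if `out`
 * (of size `outlen`) cannot hold the escaped string plus its NUL.
 */
static int
escape_spaces(const char *name, char *out, size_t outlen)
{
	assert(name != NULL && out != NULL);
	size_t j = 0;

	for (size_t i = 0; name[i] != '\0'; i++) {
		if (name[i] == ' ') {
			/* Need 4 bytes for "\040" and 1 for the final NUL. */
			if (j + 4 >= outlen)
				return (-1);
			memcpy(out + j, "\\040", 4);
			j += 4;
		} else {
			if (j + 1 >= outlen)
				return (-1);
			out[j++] = name[i];
		}
	}
	if (j >= outlen)
		return (-1);
	out[j] = '\0';
	return (0);
}
```

With this encoding, a dataset named `tank/my data` appears in the mount table as `tank/my\040data`, and a reader that decodes `\040` recovers the original name.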
Mateusz Guzik	18ca574f0a	G/C data_alloc_arena It is a leftover from illumos, is always set to NULL, and introduces a spurious difference between zio_buf and zio_data_buf. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Closes #11188	2020-11-11 17:11:32 -08:00
Tony Perkins	9bd14b8724	Start snapdir_iterate traversals to begin with the value of zero. The microzap hash can sometimes be zero for single-digit snapshot names. The ZAP cursor can then have a serialized value of two (for . and ..), causing it to skip the first entry in the AVL tree for the .zfs/snapshot directory listing, so not all snapshots are returned. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Cedric Berger <cedric@precidata.com> Signed-off-by: Tony Perkins <tperkins@datto.com> Closes #11039	2020-11-11 17:06:16 -08:00
Mateusz Guzik	1a0b4f566c	G/C struct znode -> z_moved The field is yet another leftover from unsupported zfs_znode_move. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Closes #11186	2020-11-10 12:42:47 -08:00
Ryan Moeller	7b42f09049	FreeBSD: Simplify zvol_geom_open and zvol_cdev_open We can consolidate the unlocking procedure into one place by starting with drop_suspend set to B_FALSE and moving the open count check up. While here, a little code cleanup. Match the out labels between zvol_geom_open and zvol_cdev_open, and add a missing period in some comments. Reviewed-by: Matt Macy <mmacy@FreeBSD.org> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #11175	2020-11-10 11:08:10 -08:00
Ryan Moeller	2186ed33f1	FreeBSD: Avoid spurious EINTR in zvol_cdev_open zvol_first_open can fail with EINTR if spa_namespace_lock is not held and cannot be taken without waiting. Apply the same logic that was done for zvol_geom_open to take spa_namespace_lock if not already held on first open in zvol_cdev_open. Reviewed-by: Matt Macy <mmacy@FreeBSD.org> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #11175	2020-11-10 11:07:25 -08:00
Ryan Moeller	d1dd72a2c5	Simplify offset and length limit in zfs_write Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Matt Macy <mmacy@FreeBSD.org> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #11176	2020-11-10 10:58:59 -08:00
Ryan Moeller	9a764716fc	Const some unchanging variables in zfs_write Show that these values will not be changing later. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Matt Macy <mmacy@FreeBSD.org> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #11176	2020-11-10 10:58:59 -08:00
Ryan Moeller	8a9634e2f3	Remove redundant oid parameter to update_pages The oid comes from the znode we are already passing. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Matt Macy <mmacy@FreeBSD.org> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #11176	2020-11-10 10:54:30 -08:00
Ryan Moeller	eec6646ea9	Factor uid, gid, and projid out of loop in zfs_write Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Matt Macy <mmacy@FreeBSD.org> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #11176	2020-11-10 10:53:19 -08:00
Alexander Motin	daabddaac1	Fix dmu_tx_dirty_throttle after arc_c reduction After the initial value of arc_c was reduced to arc_c_min, it became possible on datasets with primarycache=metadata or none for dirty data to make up most of the ARC capacity, easily exceeding the configured 50% of the initial arc_c. This causes forced txg commits by arc_tempreserve_space() and periodic, very long write delays. This patch makes arc_tempreserve_space() use arc_c only after the ARC has warmed up once and arc_c is meaningful, and use arc_c_max before that. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Matt Macy <mmacy@FreeBSD.org> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored-By: iXsystems, Inc. Closes #11178	2020-11-10 10:39:26 -08:00
Matthew Macy	570d7038d0	Fix dnode refcount tracking Fix a couple of places where the wrong tag is passed to dnode_{hold, rele} Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Matt Macy <mmacy@FreeBSD.org> Closes #11184	2020-11-10 10:37:10 -08:00
Mariusz Zaborski	ae37ceadaa	FreeBSD: Prevent a NULL reference in zvol_cdev_open Check if the ZVOL has been written before calling zil_async_to_sync. The ZIL will be opened on the first write, not earlier. Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Mariusz Zaborski <oshogbo@vexillium.org> Closes #11152	2020-11-05 17:02:19 -08:00
khng300	a4246bce50	FreeBSD: Prevent NULL pointer dereference of resid spa_config_load() passes NULL into resid when doing zfs_file_read(). This would cause a NULL pointer dereference when vfs.zfs.autoimport_disable=0. Sponsored by: The FreeBSD Foundation Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Allan Jude <allan@klarasystems.com> Signed-off-by: Ka Ho Ng <khng@freebsdfoundation.org> Closes #11149	2020-11-04 16:50:08 -08:00
Ryan Moeller	181b2adc2a	FreeBSD: zvol_os: Use SET_ERROR more judiciously SET_ERROR is useful to trace errors, so use it where the errors occur rather than factored out to the end of a function. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #11146	2020-11-03 09:21:09 -08:00
Coleman Kane	59b6872327	Linux 5.10 compat: revalidate_disk_size() added A new function named revalidate_disk_size() was added, and the old revalidate_disk() appears to have been deprecated. As the only ZFS code that calls this function is zvol_update_volsize, swapping the old function call out for the new one should be all that is required. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Coleman Kane <ckane@colemankane.org> Closes #11085	2020-11-02 22:01:19 +00:00
Coleman Kane	ae15f1c1d8	Linux 5.10 compat: check_disk_change() removed Kernel 5.10 removed check_disk_change() in favor of callers using the faster bdev_check_media_change() instead, and explicitly forcing bdev revalidation when they desire that behavior. To preserve prior behavior, I have wrapped this into a zfs_check_media_change() macro that calls an inline function for the new API that mimics the old behavior when check_disk_change() doesn't exist, and just calls check_disk_change() if it exists. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Coleman Kane <ckane@colemankane.org> Closes #11085	2020-11-02 22:01:19 +00:00
Coleman Kane	838a249012	Linux 5.10 compat: percpu_ref added data member Kernel commit 2b0d3d3e4fcfb brought in some changes to the struct percpu_ref structure that move most of its fields into a member struct named "data" of type struct percpu_ref_data. This includes the "count" member, which is updated by vdev_blkg_tryget(), so update this function to chase the API change, and detect it via configure. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Coleman Kane <ckane@colemankane.org> Closes #11085	2020-11-02 22:01:19 +00:00
Sebastian Gottschall	7eefaf0ca0	Optimize locking checks in mempool allocator Avoid checking the whole array of objects each time by removing the self-organized memory reaping. This can be managed by the global memory reap callback, which is called every 60 seconds. This will significantly reduce the number of locking operations. Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl> Reviewed-by: Mateusz Guzik <mjguzik@gmail.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Sebastian Gottschall <s.gottschall@dd-wrt.com> Closes #11126	2020-11-02 12:10:07 -08:00
Christian Schwarz	ab8c935ea6	zfs_vnops: make zfs_get_data OS-independent Move zfs_get_data() into platform-independent code. The only platform-specific aspect of it is the way we release an inode (Linux) / vnode_t (FreeBSD). I am not aware of a platform that could be supported by ZFS that couldn't implement zfs_rele_async itself. Its sibling, zvol_get_data, is already platform-independent. Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Christian Schwarz <me@cschwarz.com> Closes #10979	2020-11-02 12:07:07 -08:00
Mateusz Guzik	09eb36ce3d	Introduce CPU_SEQID_UNSTABLE Current CPU_SEQID users don't care about possibly changing CPU ID, but enclose it within kpreempt disable/enable in order to fend off warnings from Linux's CONFIG_DEBUG_PREEMPT. There is no need to do it. The expected way to get CPU ID while allowing for migration is to use raw_smp_processor_id. In order to make this future-proof this patch keeps CPU_SEQID as is and introduces CPU_SEQID_UNSTABLE instead, to make it clear that consumers explicitly want this behavior. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Matt Macy <mmacy@FreeBSD.org> Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Closes #11142	2020-11-02 11:51:12 -08:00
Matthew Macy	8583540c6e	Consolidate zfs_holey and zfs_access The zfs_holey() and zfs_access() functions can be made common to both FreeBSD and Linux. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Matt Macy <mmacy@FreeBSD.org> Closes #11125	2020-10-31 09:40:08 -07:00
Ryan Moeller	65a343bbd3	zvol_os: Fix handling of zvol private data zvol private data is supposed to be nulled by zvol_clear_private before zvol_free is called as an indicator that the zvol is going away. Implement zvol_clear_private for volmode=dev. Assert that zvol_clear_private has been called before zvol_free. Check that zvol_clear_private has not been called when updating volsize. If it has, fail with ENXIO. Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Matt Macy <mmacy@FreeBSD.org> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #11117	2020-10-30 15:34:49 -07:00
Ryan Moeller	277884ab42	zvol_os: Don't leak doi in cdev error path Make sure to free doi in zvol_create_minor impl when make_dev_s fails. Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Matt Macy <mmacy@FreeBSD.org> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #11117	2020-10-30 15:34:43 -07:00
Ryan Moeller	9a0ef216e5	zvol_os: Properly ignore error in volmode lookup We fall back to a default volmode and continue when looking up a zvol's volmode property fails. After this we should set the error to 0 to ensure we take the success paths in the out section. While here, make sure we only log that the zvol was created on success. Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Matt Macy <mmacy@FreeBSD.org> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #11117	2020-10-30 15:34:36 -07:00
Ryan Moeller	1a6a75ac07	zvol_os: Code cleanup in zvol_create_minor_impl Nonfunctional changes for readability and consistency. Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Matt Macy <mmacy@FreeBSD.org> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #11117	2020-10-30 15:34:30 -07:00
Ryan Moeller	260f6a28af	zvol_os: Keep better track of open count in close zvol_geom_close gets a count of the number of close operations to do. Make sure we're always using this count to check if this will be the last close operation performed on the zvol. Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Matt Macy <mmacy@FreeBSD.org> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #11117	2020-10-30 15:34:23 -07:00
Ryan Moeller	0b32d81783	zvol_os: Tidy up asserts Using more specific assert variants gives better messages on failure. No functional change. Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Matt Macy <mmacy@FreeBSD.org> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #11117	2020-10-30 15:34:15 -07:00
Mateusz Guzik	c4ede65bdf	zstd: track allocator statistics Note that this only tracks sizes as requested by the caller. Actual allocated space will almost always be bigger (e.g., rounded up to the next power of 2 or page size). Additionally the allocated buffer may be holding other areas hostage. Nonetheless, this is a starting point for tracking memory usage in zstd. Reviewed-by: Allan Jude <allan@klarasystems.com> Reviewed-by: Ryan Moeller <ryan@ixsystems.com> Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl> Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Closes #11129	2020-10-30 15:26:10 -07:00
Attila Fülöp	e8beeaa111	ICP: gcm: Allocate hash subkey table separately While evaluating other assembler implementations, it turned out that the precomputed hash subkey tables vary in size, from 816 bytes (avx2/avx512) up to 4816 bytes (avx512-vaes), depending on the implementation. To be able to handle the size differences later, allocate `gcm_Htable` dynamically rather than having a fixed size array, and adapt consumers. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Attila Fülöp <attila@fueloep.org> Closes #11102	2020-10-30 15:24:21 -07:00
Attila Fülöp	d9655c5b37	Add some missing cfi frame info in aesni-gcm-x86_64.S While preparing #9749 some .cfi_{start,end}proc directives were missed. Add the missing ones. See upstream https://github.com/openssl/openssl/commit/275a048f Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Attila Fülöp <attila@fueloep.org> Closes #11101	2020-10-30 15:23:18 -07:00
Mateusz Guzik	115216cc92	FreeBSD: catch up with 1300124 version bump - use cache_vop_mkdir - cache_rename -> cache_vop_rename Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Allan Jude <allan@klarasystems.com> Reviewed-by: Matt Macy <mmacy@FreeBSD.org> Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Closes #11136	2020-10-30 15:22:04 -07:00
Ryan Moeller	d1e4ded7bc	FreeBSD: Fix 12.2-STABLE after AT_BENEATH MFC AT_BENEATH was merged to stable/12, where kern_unlinkat takes a non-const path. DECONST the path passed to kern_unlinkat in the case where AT_BENEATH is defined. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #11139	2020-10-30 15:19:02 -07:00
Matthew Macy	5fa356ea44	Remove UIO_ZEROCOPY functions and structures The original xuio zero copy functionality has always been unused on Linux and FreeBSD. Remove this disabled code to avoid any confusion and improve readability. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Matt Macy <mmacy@FreeBSD.org> Closes #11124	2020-10-30 10:00:33 -07:00
Alexander Motin	1199c3e8fb	Yield periodically when rebuilding L2ARC L2ARC devices of several terabytes filled with 4KB blocks may take 15 minutes to rebuild. Due to the way L2ARC log reading is implemented, it is quite likely that the rebuild thread will never sleep during that entire time. At least on FreeBSD, kernel threads have absolute priority and cannot be preempted by threads with lower priorities. If some other thread is also bound to that specific CPU, it may not get any CPU time for the whole 15 minutes. Reviewed-by: Cedric Berger <cedric@precidata.com> Reviewed-by: Ryan Moeller <freqlabs@FreeBSD.org> Reviewed-by: George Amanakis <gamanakis@gmail.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Closes #11116	2020-10-30 08:57:54 -07:00
Ryan Moeller	76d04993a6	Update references to nonexistent man pages in code Refer to the correct section or alternative for FreeBSD and Linux. Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #11132	2020-10-30 08:55:59 -07:00
Alexander Motin	e3a6ac8d06	FreeBSD: Remove BIO_ORDERED flag from BIO_FLUSH ZFS always waits for the write completion before flushing the cache. That is why it does not require explicit ordering fences around it, which are pretty difficult to implement for NVMe, since NVMe has no internal concept of strict request ordering. This was already removed from FreeBSD once, but got resurrected by mistake during the OpenZFS merge. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Closes #11130	2020-10-30 08:50:57 -07:00
Mateusz Guzik	973ba682f5	Linux: g/c leftover fence in zfs_znode_alloc The port removed provisions for zfs_znode_move but the cleanup missed this bit. To quote the original:
[snip]
    list_insert_tail(&zfsvfs->z_all_znodes, zp);
    membar_producer();
    /*
     * Everything else must be valid before assigning z_zfsvfs makes the
     * znode eligible for zfs_znode_move().
     */
    zp->z_zfsvfs = zfsvfs;
[/snip]
In the current code it is immediately followed by an unlock, which issues the same fence, so it plays no role in correctness. Reviewed-by: Matt Macy <mmacy@FreeBSD.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Closes #11115	2020-10-29 09:54:20 -07:00
Mateusz Guzik	082ff328f2	FreeBSD: g/c unused zfs_znode_move support The allocator does not provide the functionality to begin with. Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Matt Macy <mmacy@FreeBSD.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Closes #11114	2020-10-29 09:52:50 -07:00
Brian Behlendorf	4ce728d028	Use known license string for zlua The Linux kernel MODULE_LICENSE macro only recognizes a handful of license strings and "MIT" is not one of them. Update the macro to use "Dual MIT/GPL", which is recognized and is what the kernel expects MIT licensed modules to use. Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #11112 Closes #11113	2020-10-27 09:43:36 -07:00
Ryan Moeller	5c810ac499	FreeBSD: Skip RAW kstat sysctls by default These kstats are often expensive to compute so we want to avoid them unless specifically requested. The following kstats are affected by this change:
    kstat.zfs.${pool}.multihost
    kstat.zfs.${pool}.misc.state
    kstat.zfs.${pool}.txgs
    kstat.zfs.misc.fletcher_4_bench
    kstat.zfs.misc.vdev_raidz_bench
    kstat.zfs.misc.dbufs
    kstat.zfs.misc.dbgmsg
In FreeBSD 13, sysctl(8) has been updated to still list the names/description/type of skipped sysctls so they are still discoverable. Reviewed-by: Allan Jude <allan@klarasystems.com> Reviewed-by: Mateusz Guzik <mjguzik@gmail.com> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #11099	2020-10-26 14:34:28 -07:00
Mateusz Guzik	01a65c5861	FreeBSD: catch up with 1300123 version bump - removed thread argument from VOP_INACTIVE - removed cred argument from VOP_VPTOCNP Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Matt Macy <mmacy@FreeBSD.org> Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Closes #11104	2020-10-26 14:32:17 -07:00
Ryan Moeller	eb02a4c6fb	Add missing zfs_arc_evict_batch_limit tunable It's even documented already. Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #11094	2020-10-22 10:18:26 -07:00
Matthew Macy	e53d678d4a	Share zfs_fsync, zfs_read, zfs_write, et al between Linux and FreeBSD The zfs_fsync, zfs_read, and zfs_write functions are almost identical between Linux and FreeBSD. With a little refactoring they can be moved to the common code, which is what is done by this commit. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Matt Macy <mmacy@FreeBSD.org> Closes #11078	2020-10-21 14:08:06 -07:00
Adam D. Moss	666aa69f32	Non-l2arc pool reads shouldn't be l2arc misses The current l2_misses accounting behavior treats all reads to pools without a configured l2arc as an l2arc miss, IFF there is at least one other pool on the system which does have an l2arc configured. This makes it extremely hard to tune for an improved l2arc hit/miss ratio because this ratio will be modulated by reads from pools which do not (and should not) have l2arc devices; its upper limit will depend on the ratio of reads from l2arc'd pools and non-l2arc'd pools. This PR prevents ARC reads from affecting l2arc stats (n.b. l2_misses is the only relevant one) where the target spa doesn't have an l2arc. Includes new test - l2arc_l2miss_pos.ksh Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: George Amanakis <gamanakis@gmail.com> Signed-off-by: Adam Moss <c@yotes.com> Closes #10921	2020-10-20 11:39:52 -07:00
Kyle Evans	241c62bdd7	Makefile.bsd: remove directory that no longer exists This was removed in a reorganization of directories preparing for the merge of FreeBSD support, `006e9a4088` by mmacy. While llvm is perfectly happy with the nonexistent -I directory, the gcc6 and gcc9 we can elect to use as cross-toolchains both trip over it. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Kyle Evans <kevans@FreeBSD.org> Closes #11077	2020-10-20 11:34:59 -07:00
Matthew Macy	ff2f54246d	FreeBSD: delete unreferenced file zfs_onexit_os.c was not deleted when it was removed from the build Reviewed-by: Matt Ahrens <matt@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Matt Macy <mmacy@FreeBSD.org> Closes #11079	2020-10-20 08:53:16 -07:00
Mateusz Guzik	41e2b3de13	FreeBSD: add missing fplookup_vexec handler to special vop vectors Otherwise lookup can fail with EOPNOTSUPP or panic. Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Matt Macy <mmacy@FreeBSD.org> Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Closes #11066	2020-10-15 14:49:06 -07:00
Mateusz Guzik	34cda44af6	FreeBSD: g/c unused vop vector zfsctl_ops_shares_dir Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Matt Macy <mmacy@FreeBSD.org> Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Closes #11066	2020-10-15 14:48:20 -07:00
Don Brady	dff71c7936	Ignore special vdev ashift for spa ashift min/max The removal of a vdev in the normal class would fail if there was a special or dedup vdev that had a different ashift than the vdevs in the normal class. Moved the initialization of spa_min_ashift / spa_max_ashift from vdev_open so that it occurs after the vdev allocation bias was initialized (i.e. after vdev_load). Caveat -- In order to remove a special/dedup vdev it must have the same ashift as the normal pool vdevs. This could perhaps be lifted in the future (i.e. for the case where there is ample space in any surviving special class vdevs) Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Don Brady <don.brady@delphix.com> Closes #9363 Closes #9364 Closes #11053	2020-10-15 14:45:16 -07:00
Christian Schwarz	15a4ca4620	Fix crash caused by invalid snapshot names in redactnvl This is a follow up fix for commit `0fdd6106bb`. The VERIFY is only true when we haven't hit an error code path. See added test case for a reproducer. Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Christian Schwarz <me@cschwarz.com> Closes #11048	2020-10-14 14:04:19 -07:00
Paul Dagnelie	6a60ef80e2	Fix incorrect deletion order in range_tree_add_impl gap case After a side-effectful call like add or remove, references to range segs stored in btrees can no longer be used safely. We move the remove call to just before the reinsertion call so that the seg remains valid for as long as we need it. Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Paul Dagnelie <pcd@delphix.com> Closes #11044 Closes #11056	2020-10-14 08:59:54 -07:00
Mateusz Guzik	47a7e99939	FreeBSD: fix panic due to tqid overflow The 32-bit counter eventually wraps to 0, which is a sentinel for an invalid id. Make it 64-bit on LP64 platforms and 0-check otherwise. Note: the Linux counterpart uses an id stored per queue instead of a global. I did not check whether going that way is feasible, since the goal was a minimal fix that does the job. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Closes #11059	2020-10-14 08:57:03 -07:00
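The tqid fix (never hand out the sentinel 0, and use a wider counter where the platform allows it) can be sketched in user-space C. `taskqid_t`, `tqidnext` and `next_taskqid()` here are hypothetical stand-ins rather than the FreeBSD SPL code, and C11 atomics are used in place of the kernel's atomic primitives. `uintptr_t` is chosen as an assumption because it is 64 bits on LP64 and 32 bits on ILP32, mirroring the "64-bit on LP64, 0-check otherwise" split.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical id type: 64 bits on LP64, 32 bits on ILP32. */
typedef uintptr_t taskqid_t;

/* Global id counter; 0 is reserved as the "invalid id" sentinel. */
static _Atomic taskqid_t tqidnext = 0;

/*
 * Return a new, nonzero task id. If the counter wraps around, the
 * value 0 is skipped (in practice this only matters on 32-bit).
 */
static taskqid_t
next_taskqid(void)
{
	taskqid_t id;

	do {
		/* atomic_fetch_add returns the old value; +1 is the new one. */
		id = atomic_fetch_add(&tqidnext, 1) + 1;
	} while (id == 0);
	assert(id != 0);
	return (id);
}
```

On LP64 the loop body practically never repeats, so the 0-check costs almost nothing there while still keeping the sentinel safe on 32-bit platforms.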
Ryan Moeller	485b50bb9e	Cross-platform acltype The acltype property is currently hidden on FreeBSD and does not reflect the NFSv4 style ZFS ACLs used on the platform. This makes it difficult to observe that a pool imported from FreeBSD on Linux has a different type of ACL that is being ignored, and vice versa. Add an nfsv4 acltype and expose the property on FreeBSD. Make the default acltype nfsv4 on FreeBSD. Setting acltype to an unhandled style is treated the same as setting it to off. The ACLs will not be removed, but they will be ignored. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #10520	2020-10-13 21:25:48 -07:00
Warner Losh	b302185a92	FreeBSD: make adjustments for the standalone environment In FreeBSD, there are three compile environments that are supported: user land, the kernel and the bootloader / standalone. Adjust the headers to compile in the standalone environment. Limit kernel-only items from view when _STANDALONE is defined. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Warner Losh <imp@FreeBSD.org> Closes #10998	2020-10-13 21:05:49 -07:00
Matthew Macy	57dc5d42b1	dmu_zfetch: don't leak unreferenced stream when zfetch is freed Currently streams are only freed when: - They have no referencing zfetch and their I/O references go to zero. - They are more than 2s old and a new I/O request comes in on the same zfetch. This means that we will leak unreferenced streams when their zfetch structure is freed. This change checks the reference count on a stream at zfetch free time. If it is zero we free it immediately. If it has remaining references we allow the prefetch callback to free it at I/O completion time. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Adam Moss <c@yotes.com> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Matt Macy <mmacy@FreeBSD.org> Closes #11052	2020-10-13 21:03:36 -07:00
Warner Losh	6ba2e72b78	aarch64: Use proper guards for NEON instructions The zstd code assumes that if you are on aarch64, you have NEON instructions. This is not necessarily true. In a boot loader, where you might not have the VFP properly initialized, these instructions may not be available. It's also an error to include arm_neon.h when the NEON instructions aren't enabled. Change the guards for using the NEON instructions from __aarch64__ to __ARM_NEON which is the standard symbol for knowing if they are available. __ARM_NEON is the proper symbol, defined in ARM C Language Extensions Release 2.1 (https://developer.arm.com/documentation/ihi0053/d/). Some sources suggest __ARM_NEON__, but that's the obsolete spelling from prior versions of the standard. Updated based on zstd pull request https://github.com/facebook/zstd/pull/2356 Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Warner Losh <imp@bsdimp.com> Closes #11055	2020-10-13 21:01:40 -07:00
Mateusz Guzik	2ce14cdf68	FreeBSD: use cache_rename if available Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Matt Macy <mmacy@FreeBSD.org> Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Closes #11045	2020-10-13 16:41:26 -07:00
Ryan Moeller	7dfc56d866	Expose zfetch_max_idistance tunable FreeBSD had this value tunable before the switch to the new OpenZFS. The tunable name has changed, breaking legacy compat. Restore legacy compat for this tunable, properly expose the tunable with the new name on all platforms, and document it in zfs-module-parameters(5). While here, clean up the documentation for zfetch_max_distance a bit. Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #11038	2020-10-13 09:32:34 -07:00
Christian Schwarz	61868bb14d	zil_parse: make callback parameters const Code cleanup, a follow up commit to `4d55ea81`. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Co-authored-by: Ryan Moeller <ryan@freqlabs.com> Signed-off-by: Christian Schwarz <me@cschwarz.com> Closes #11020	2020-10-09 09:34:54 -07:00
Ryan Moeller	b7ab7ae241	Linux: Initialize zp in zfs_setattr_dir The value of zp is used without having been initialized under some conditions. Initialize the pointer to NULL. Add a regression test case using chown in acl/posix. However, this is not enough because the setup sets xattr=sa, which means zfs_setattr_dir will not be called. Create a second group of acl tests in acl/posix-sa duplicating the acl/posix tests with symlinks, and remove xattr=sa from the original acl/posix tests. This provides more coverage for the default xattr=on code. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #10043 Closes #11025	2020-10-09 09:27:14 -07:00
Brian Behlendorf	d0249a4bd0	Replace ZFS on Linux references with OpenZFS This change updates the documentation to refer to the project as OpenZFS instead of ZFS on Linux. Web links have been updated to refer to https://github.com/openzfs/zfs. The extraneous zfsonlinux.org web links in the ZED and SPL sources have been dropped. Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Richard Laager <rlaager@wiktel.com> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #11007	2020-10-08 20:10:13 -07:00
Jacob Adams	07f5d4d663	Fix Linux modules uninstall A missing semicolon between kmoddir variable declaration and the uninstall for loop caused modules_uninstall-Linux to fail with: Syntax error: "do" unexpected Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Jacob Adams <jacob@tookmund.com> Closes #11032	2020-10-08 20:07:10 -07:00
Chuck Tuffli	a8fc1b8743	Fix ubsan: shift exponent is too large When running libzpool with the Undefined Behavior Sanitizer (ubsan) enabled, a zpool create causes a run-time error: module/zfs/vdev_label.c:600:14: runtime error: shift exponent 64 is too large for 64-bit type 'long long unsigned int' in vdev_config_generate() Fix is to convert vdev_removal_max_span to its base-2 logarithm, using highbit64(), and then compare the "shifts". Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Chuck Tuffli <ctuffli@gmail.com> Closes #9744 Closes #11024	2020-10-08 16:37:27 -07:00
Ryan Moeller	73989f4b9e	Make dbufstat work on FreeBSD With procfs_list kstats implemented for FreeBSD, dbufs are now exposed as kstat.zfs.misc.dbufs. On FreeBSD, dbufstats can use the sysctl instead of procfs when no input file has been given. Enable the dbufstats tests on FreeBSD. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org> Closes #11008	2020-10-08 09:40:23 -07:00
Ryan Moeller	82b81a2acd	FreeBSD: Sort and dedup includes in kmod_core Code cleanup. Sort includes, remove duplicates, and drop some extra blank lines in kmod_core.c. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org> Closes #11000	2020-10-08 09:37:56 -07:00
George Amanakis	a76e4e6761	Make L2ARC tests more robust Instead of relying on arbitrary timers after pool export/import or cache device off/online, rely on arcstats. This makes the L2ARC tests more robust. Also clean up some functions related to persistent L2ARC. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Adam Moss <c@yotes.com> Signed-off-by: George Amanakis <gamanakis@gmail.com> Closes #10983	2020-10-05 15:29:05 -07:00
Ryan Moeller	79f0935fab	FreeBSD: Sort out kernel FPU headers for 12.1-REL We were missing an include for kernel FPU functions, breaking the build on FreeBSD 12.1-RELEASE. This was apparently being pulled in from elsewhere on stable/12 and head. Sorted the other includes in these files while here. Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #11005	2020-10-02 17:48:45 -07:00
Ryan Moeller	4d55ea811d	Throw const on some strings In C, const indicates to the reader that mutation will not occur. It can also serve as a hint about ownership. Add const in a few places where it makes sense. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org> Closes #10997	2020-10-02 17:44:10 -07:00
John Poduska	5b525165e9	Mismatched nvlist names in zfs_keys_send_space This causes "zfs send -vt ..." to fail with: cannot resume send: Unknown error 1030 It turns out that some of the name/value pairs in the verification list for zfs_ioc_send_space(), zfs_keys_send_space, had the wrong name, so the ioctl got kicked out in zfs_check_input_nvpairs(). Update the names accordingly. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: John Poduska <jpoduska@datto.com> Closes #10978	2020-10-02 17:40:46 -07:00
Brian Behlendorf	266d1121c3	Fix buggy procfs_list_seq_next warning The kernel seq_read() helper function expects ->next() to update the passed position even when there are no more entries. Failure to do so results in the following warning being logged: seq_file: buggy .next function procfs_list_seq_next [spl] did not update position index. Functionally there is no issue with the way procfs_list_seq_next() is implemented and the warning is harmless. However, we want to silence this somewhat scary, incorrect warning. This commit updates the Linux procfs code to advance the position even for the last entry. Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #10984 Closes #10996	2020-09-30 13:27:51 -07:00
Ryan Moeller	d688beb191	FreeBSD: Fix legacy compat for platform IOCs The request number is out of bounds of the platform table. Subtract the starting offset to get the correct subscript. Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #10994	2020-09-30 13:25:50 -07:00
Matthew Macy	1cb8202b1b	Eliminate gratuitous bzeroing in dbuf_stats_hash_table_data `dbuf_stats_hash_table_data` can take much longer than it needs to by repeatedly bzeroing its buffer when in fact the buffer only needs to be NULL terminated. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Matt Macy <mmacy@FreeBSD.org> Closes #10993	2020-09-30 13:24:38 -07:00
Sebastian Gottschall	8a171ccd92	do a cyclic seek for unused memory objects in pool In non-regular use cases, allocated memory might stay persistent in the memory pool. This small patch checks every minute whether there are old objects that can be released from the memory pool. Right now, with regular use, the pool is checked for old objects on each allocation attempt from the pool, so it is effectively polled through its use. Now consider what happens if someone writes a lot of files and then stops using the volume, or even unmounts it: the code will no longer check whether objects can be released from the pool, and already allocated objects will stay in the pool cache. This is not a big issue for common use, but someone discovered it while doing tests. Personally, I know this behavior and am aware of it. It's not a big issue, just an enhancement. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl> Signed-off-by: Sebastian Gottschall <s.gottschall@dd-wrt.com> Closes #10938 Closes #10969	2020-09-30 13:22:34 -07:00
Ryan Moeller	c0bd2e0fe2	Drop references when skipping dmu_send due to EXDEV When an invalid incremental send is requested where the "to" ds is before the "from" ds, make sure to drop the reference to the pool and the dataset before returning the error. Add an assert on FreeBSD to make sure we don't hold any locks after returning from an ioctl. Add some test coverage. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #10919	2020-09-30 13:19:49 -07:00
Brian Behlendorf	5aa3e3d3be	Use known license string for zzstd The Linux kernel MODULE_LICENSE macro only recognizes a handful of license strings and "BSD" is not one of them. Update the macro to use "Dual BSD/GPL" which is recognized and is what the kernel expects BSD-licensed modules to use. Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #10982 Closes #10992	2020-09-28 18:43:27 -07:00
Matthew Macy	af20b97078	zfetch: Don't issue new streams when old have not completed The current dmu_zfetch code implicitly assumes that I/Os complete within min_sec_reap seconds. With async dmu and a readonly workload (and thus no exponential backoff in operations from the "write throttle") such as L2ARC rebuild, it is possible to saturate the drives with I/O requests. These are then effectively compounded with prefetch requests. This change reference counts streams and prevents them from being recycled after their min_sec_reap timeout if they still have outstanding I/Os. Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Matt Macy <mmacy@FreeBSD.org> Closes #10900	2020-09-27 17:08:38 -07:00
Adam D. Moss	acfd2d4641	Add DB_RF_NOPREFETCH to dbuf_read()s in dnode.c Prefetching of dnodes in dbuf_read() can cause significant mutex contention for some workloads and isn't very helpful. This is because we already get 32 dnodes for each block read, and when iterating over a directory we prefetch the dnodes in the directory. Disable this prefetching to prevent the lock contention. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Submitted-by: Adam Moss <c@yotes.com> Submitted-by: Matthew Ahrens <mahrens@delphix.com> Signed-off-by: Adam Moss <c@yotes.com> Closes #10877 Closes #10953	2020-09-25 13:49:22 -07:00
Brian Behlendorf	2e407941a2	Fix PREEMPTION=y and BLK_CGROUP=y config on arm64 With PREEMPTION=y and BLK_CGROUP=y preempt_schedule_notrace() is being used on arm64 which is a GPL-only function and hence the build of the DKMS kernel module fails. Fix that by redefining preempt_schedule_notrace() to preempt_schedule() which should be safe as long as tracing is not used. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Juerg Haefliger <juergh@canonical.com> Closes #8545 Closes #9948 Closes #10416 Closes #10973	2020-09-25 13:28:35 -07:00
Mateusz Guzik	f6bb7c029c	FreeBSD: update cache_purgevfs usage after 1300117 version bump Reviewed-by: Ryan Moeller <freqlabs@FreeBSD.org> Reviewed-by: Nick Wolff <darkfiberiru@gmail.com> Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Closes #10970	2020-09-25 13:23:43 -07:00
Ryan Moeller	fa1912e80f	FreeBSD: Code cleanup in zio_crypt Address some unused value and control flow issues flagged by Coverity. Unreachable code is pruned and unused values are avoided. Some scattered sections are reordered for coherence. We can assume kmem_alloc(n, KM_SLEEP) doesn't fail, so there is no need to check if it returned NULL. The allocated memory doesn't need to be zeroed, other than the last iovec (the MAC). Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #10884	2020-09-25 13:12:35 -07:00
Ryan Moeller	863e38453e	Prune dead branch reported by Coverity wkey is NULL at every `goto error;`. dcp is never NULL. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #10884	2020-09-25 13:11:53 -07:00
Christian Schwarz	a5c77dc4d5	zfs_log_write: simplify data copying code for WR_COPIED records lr_write_t records that are WR_COPIED have the record data directly appended to them (see lr_write_t type definition). The data is copied from the dbuf using dmu_read_by_dnode. This function was called, only for WR_COPIED records, as part of a short-circuiting if-statement's if-expression. I found this side-effectful call to dmu_read_by_dnode pretty hard to spot. This patch improves readability by moving the call to its own line. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: George Wilson <gwilson@delphix.com> Signed-off-by: Christian Schwarz <me@cschwarz.com> Closes #10956	2020-09-25 13:06:34 -07:00
Matthew Macy	7b8363d7f0	FreeBSD: Add support for procfs_list The procfs_list interface is required by several kstats. Implement this functionality for FreeBSD to provide access to these kstats. Reviewed-by: Allan Jude <allan@klarasystems.com> Reviewed-by: Ryan Moeller <ryan@ixsystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Matt Macy <mmacy@FreeBSD.org> Closes #10890	2020-09-23 16:43:51 -07:00
Matthew Macy	3dad29fb4b	FreeBSD: Don't save user FPU context in kernel threads Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Ryan Moeller <freqlabs@FreeBSD.org> Signed-off-by: Matt Macy <mmacy@FreeBSD.org> Closes #10899	2020-09-23 11:09:48 -07:00
Paul Dagnelie	20dfe8cd3b	Don't set numobjs to UINT64_MAX or near it Resolves an issue with `zfs send` streams from 0.8.4 which prevents them from being received by versions < 0.7. Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Paul Zuchowski <pzuchowski@datto.com> Signed-off-by: Paul Dagnelie <pcd@delphix.com> Closes #10911 Closes #10916	2020-09-22 16:16:07 -07:00
George Amanakis	c6f5e9d92f	Restore clearing of L2CACHE flag in arc_read_done() Commit 45152dc removed clearing of L2CACHE flag in arc_read_done() and moved related code to l2arc_write_eligible(). Careful code inspection shows that arc_read_done() is not bypassed in the case of prefetches. Thus restore the old behavior. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: adam moss <c@yotes.com> Signed-off-by: George Amanakis <gamanakis@gmail.com> Closes #10951	2020-09-22 16:08:05 -07:00
Mark Johnston	0daa0320e9	Fix a logic bug in the FreeBSD getpages VOP In commit `cd32b4f5b7` ("Fix a deadlock in the FreeBSD getpages VOP") I introduced a bug while porting the patch originally committed to FreeBSD: the rangelock pointer may be NULL if the try operation failed, so we must avoid calling zfs_rangelock_unlock() in that case. Reviewed-by: Allan Jude <allan@klarasystems.com> Reviewed-by: Ryan Moeller <ryan@ixsystems.com> Reviewed-by: Matt Macy <mmacy@FreeBSD.org> Reported-by: Steve Wills <swills@FreeBSD.org> Signed-off-by: Mark Johnston <markj@FreeBSD.org> Closes #10519 Closes #10960	2020-09-22 16:05:52 -07:00
Ryan Moeller	5f8a9e6a02	FreeBSD: Reduce stack usage of Lua Use the same reduced buffer size for lauxlib that is used on Linux. Fixes panic on HEAD in lua gsub test designed to exhaust stack space. With this we can remove the special case to reserve more stack space on FreeBSD. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Kyle Evans <kevans@FreeBSD.org> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #10959	2020-09-22 16:03:11 -07:00
Mark Johnston	6bdb09510b	Annotate FreeBSD sysctls with CTLFLAG_MPSAFE Without this, the sysctl system calls will acquire a global lock before invoking the handler. This is noticeable in some situations when running top(1). The global lock is mostly vestigial but continues to see some use and so contention is still a problem; until the default sense of the MPSAFE flag changes, we have to annotate each and every handler. Reviewed-by: Allan Jude <allan@klarasystems.com> Reviewed-by: Ryan Moeller <ryan@ixsystems.com> Signed-off-by: Mark Johnston <markj@FreeBSD.org> Closes #10836	2020-09-21 09:49:50 -07:00
Mark Johnston	c50f3c902f	Fix switch statement indentation in the FreeBSD kstat code This is in preparation for some functional changes. Reviewed-by: Allan Jude <allan@klarasystems.com> Reviewed-by: Ryan Moeller <ryan@ixsystems.com> Signed-off-by: Mark Johnston <markj@FreeBSD.org> Closes #10950	2020-09-21 09:49:12 -07:00
George Wilson	c494aa7f57	vdev_ashift should only be set once == Motivation and Context The new vdev ashift optimization prevents the removal of devices when a zfs configuration is comprised of disks which have different logical and physical block sizes. This is caused because we set 'spa_min_ashift' in vdev_open and then later call 'vdev_ashift_optimize'. This would result in an inconsistency between spa's ashift calculations and that of the top-level vdev. In addition, the optimization logic ignores the overridden ashift value that would be provided by '-o ashift=<val>'. == Description This change reworks the vdev ashift optimization so that it's only set the first time the device is configured. It still allows the physical and logical ashift values to be set every time the device is opened but those values are only consulted on first open. Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Cedric Berger <cedric@precidata.com> Signed-off-by: George Wilson <gwilson@delphix.com> External-Issue: DLPX-71831 Closes #10932	2020-09-18 12:13:47 -07:00
George Wilson	8e82ffba7b	pool may become suspended during device expansion When expanding a device zfs needs to rescan the partition table to get the correct size. This can only happen when we're in the kernel and requires the device to be closed. As part of the rescan, udev is notified and the device links are removed and recreated. This leaves a window where the vdev code may try to reopen the device before udev has recreated the link. If that happens, then the pool may end up in a suspended state. To correct this, we leverage the BLKPG_RESIZE_PARTITION ioctl which allows the partition information to be modified even while it's in use. This ioctl also does not remove the device link associated with the zfs data partition so it eliminates the race condition that can occur in the kernel. Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: George Wilson <gwilson@delphix.com> Closes #10897	2020-09-17 20:03:10 -07:00
Ryan Moeller	3c7566cb0d	FreeBSD: Do not copy vp into f_data for DTYPE_VNODE files https://reviews.freebsd.org/D26346 Do not copy vp into f_data for DTYPE_VNODE files. The vnode pointer is already stored in f_vnode. Use that so f_data can be reused. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #10929	2020-09-17 10:54:14 -07:00
John Poduska	5bed68bdc4	Need a long hold in zpl_mount_impl In zpl_mount_impl, there is the sequence: dmu_objset_hold (returns with pool & ds held), dsl_pool_rele, sget, dsl_dataset_rele. As spelled out in the "DSL Pool Configuration Lock" comment in dsl_pool.c, this requires a long hold. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Paul Zuchowski <pzuchowski@datto.com> Signed-off-by: John Poduska <jpoduska@datto.com> Closes #10936	2020-09-17 10:53:02 -07:00
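
Commit 47a7e99939 describes a taskq id counter that wraps to 0, the value reserved as the invalid-id sentinel. The following is a minimal, single-threaded sketch of the idea behind that fix, assuming a hypothetical `tqid_t` typedef and `tqid_next()` helper (this is not the FreeBSD SPL code, which would also need an atomic increment): widen the counter on LP64 platforms and skip 0 when the counter wraps.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical id type: 64 bits on LP64 platforms, 32 bits otherwise,
 * following the approach described in the commit message.
 */
#if defined(_LP64) || defined(__LP64__)
typedef uint64_t tqid_t;
#else
typedef uint32_t tqid_t;
#endif

/* 0 is reserved to mean "invalid task id". */
#define	TQID_INVALID	((tqid_t)0)

/*
 * Return the next task id from *counter, never handing out the sentinel.
 * Unsigned overflow is well defined in C, so the counter wraps to 0 and
 * the check below skips it. Not thread-safe: a kernel version would use
 * an atomic fetch-and-add instead of ++.
 */
static tqid_t
tqid_next(tqid_t *counter)
{
	tqid_t id = ++*counter;

	if (id == TQID_INVALID)
		id = ++*counter;
	assert(id != TQID_INVALID);
	return (id);
}
```

On LP64 the 64-bit counter makes wraparound practically unreachable; the 0-check is what keeps the 32-bit case correct.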

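Commit 6ba2e72b78 switches the zstd guards from `__aarch64__` to `__ARM_NEON`. The pattern can be sketched with a hypothetical `sum_u32()` helper (not zstd code): `<arm_neon.h>` and the NEON intrinsics are only touched when the compiler defines `__ARM_NEON`, and a scalar loop covers every other build, including aarch64 targets where NEON is unavailable, such as a boot loader.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_NEON)	/* ACLE symbol, not __aarch64__ or __ARM_NEON__ */
#include <arm_neon.h>
#endif

/* Hypothetical example: sum 32-bit values, wrapping modulo 2^32. */
static uint32_t
sum_u32(const uint32_t *p, size_t n)
{
	uint32_t total = 0;
	size_t i = 0;

	assert(p != NULL || n == 0);

#if defined(__ARM_NEON)
	/* Vector path: accumulate four lanes at a time. */
	uint32x4_t acc = vdupq_n_u32(0);

	for (; i + 4 <= n; i += 4)
		acc = vaddq_u32(acc, vld1q_u32(p + i));
	total = vgetq_lane_u32(acc, 0) + vgetq_lane_u32(acc, 1) +
	    vgetq_lane_u32(acc, 2) + vgetq_lane_u32(acc, 3);
#endif

	/* Scalar path: the whole array without NEON, or the tail with it. */
	for (; i < n; i++)
		total += p[i];
	return (total);
}
```

Keying on the feature macro rather than the architecture macro means the header is never included when the compiler has NEON disabled.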

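Commit a8fc1b8743 replaces a comparison that could evaluate `1ULL << 64`, which is undefined behavior in C, with a comparison of base-2 logarithms obtained via `highbit64()`. The sketch below uses a local stand-in, `highbit64_sketch()`, with the same semantics as the ZFS helper (1-based index of the highest set bit, 0 for 0) and a hypothetical `shift_fits()` predicate. It illustrates the technique only and is not the actual `vdev_config_generate()` code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Stand-in for ZFS's highbit64(): returns the 1-based position of the
 * highest set bit of i, or 0 when i == 0.
 */
static int
highbit64_sketch(uint64_t i)
{
	int h = 0;

	while (i != 0) {
		h++;
		i >>= 1;
	}
	assert(h <= 64);	/* a uint64_t has at most 64 bits */
	return (h);
}

/*
 * Hypothetical predicate: does a region of 2^shift bytes fit within
 * max_span bytes? Computing (1ULL << shift) is undefined once
 * shift >= 64, so compare shifts instead. With h = highbit64(max_span),
 * 2^(h-1) <= max_span < 2^h, hence 2^shift <= max_span iff shift < h.
 */
static bool
shift_fits(uint64_t shift, uint64_t max_span)
{
	return (shift < (uint64_t)highbit64_sketch(max_span));
}
```

No shift amount in this version depends on the caller's input, so a shift of 64 is simply answered "does not fit" instead of triggering undefined behavior.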
3264 Commits