mirror_zfs

mirror of https://git.proxmox.com/git/mirror_zfs.git synced 2026-05-23 10:54:35 +03:00

Author	SHA1	Message	Date
Brian Atkinson	f52124dce8	Removing old code for k(un)map_atomic It used to be required to pass a enum km_type to kmap_atomic() and kunmap_atomic(), however this is no longer necessary and the wrappers zfs_k(un)map_atomic removed these. This is confusing in the ABD code as the struct abd_iter member iter_km no longer exists and the wrapper macros simply compile them out. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Adam Moss <c@yotes.com> Signed-off-by: Brian Atkinson <batkinson@lanl.gov> Closes #11768	2021-03-19 22:38:44 -07:00
Coleman Kane	e2a8296131	Linux 5.12 compat: idmapped mounts In Linux 5.12, the filesystem API was modified to support ipmapped mounts by adding a "struct user_namespace *" parameter to a number functions and VFS handlers. This change adds the needed autoconf macros to detect the new interfaces and updates the code appropriately. This change does not add support for idmapped mounts, instead it preserves the existing behavior by passing the initial user namespace where needed. A subsequent commit will be required to add support for idmapped mounted. Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Co-authored-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Coleman Kane <ckane@colemankane.org> Closes #11712	2021-03-19 21:00:59 -07:00
Adam D. Moss	1daad98176	Linux: always check or verify return of igrab() zhold() wraps igrab() on Linux, and igrab() may fail when the inode is in the process of being deleted. This means zhold() must only be called when a reference exists and therefore it cannot be deleted. This is the case for all existing consumers so add a VERIFY and a comment explaining this requirement. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Adam Moss <c@yotes.com> Closes #11704	2021-03-16 16:33:34 -07:00
Mateusz Guzik	5ebe425a5b	Macroify teardown lock handling This will allow platforms to implement it as they see fit, in particular in a different manner than rrm locks. Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Matt Macy <mmacy@FreeBSD.org> Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Closes #11153	2021-03-12 15:51:39 -08:00
Mateusz Guzik	446400346d	Add branch prediction to ZFS_ENTER and ZFS_VERIFY_ZP macros They are expected to fail only in corner cases. Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Matt Macy <mmacy@FreeBSD.org> Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Closes #11153	2021-03-12 15:51:03 -08:00
Adam D. Moss	c94d648b1c	Microoptimizations for VERIFY() and friends Add branch hints and constify the intermediate evaluations of left/right params in VERIFY3*(). Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Adam Moss <c@yotes.com> Closes #11708	2021-03-11 17:16:09 -08:00
Coleman Kane	778fa36ee7	Linux 5.12 compat: replace bio__io_acct with disk__io_acct The bio__acct functions became GPL exports, which causes the kernel modules to refuse to compile. This replaces code with alternate function calls to the disk__io_acct interfaces, which are not GPL exports. This change was added in kernel commit 99dfc43ecbf67f12a06512918aaba61d55863efc. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Coleman Kane <ckane@colemankane.org> Closes #11639	2021-02-24 10:06:05 -08:00
Brian Atkinson	c0801bf35a	Cleaning up uio headers Making uio_impl.h the common header interface between Linux and FreeBSD so both OS's can share a common header file. This also helps reduce code duplication for zfs_uio_t for each OS. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Brian Atkinson <batkinson@lanl.gov> Closes #11622	2021-02-20 20:16:50 -08:00
khng300	fc273894d2	Rename zfs_inode_update to zfs_znode_update_vfs zfs_znode_update_vfs is a more platform-agnostic name than zfs_inode_update. Besides that, the function's prototype is moved to include/sys/zfs_znode.h as the function is also used in common code. Reviewed-by: Ryan Moeller <ryan@ixsystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ka Ho Ng <khng300@gmail.com> Sponsored by: The FreeBSD Foundation Closes #11580	2021-02-09 11:17:29 -08:00
Brian Behlendorf	96a6629872	Remove unused iov_iter_init_compat() wrapper This compatibility code is no longer needed. For it a while iov_iter_init_compat() was used by zfs_uio_prefaultpages() but this code should have been dropped as part of commit `83b91ae1`. Take care of that oversight and remove it. Reviewed-by: Brian Atkinson <batkinson@lanl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #11543	2021-01-30 10:06:14 -08:00
Brian Atkinson	d0cd9a5cc6	Extending FreeBSD UIO Struct In FreeBSD the struct uio was just a typedef to uio_t. In order to extend this struct, outside of the definition for the struct uio, the struct uio has been embedded inside of a uio_t struct. Also renamed all the uio_* interfaces to be zfs_uio_* to make it clear this is a ZFS interface. Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Jorgen Lundman <lundman@lundman.net> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Brian Atkinson <batkinson@lanl.gov> Closes #11438	2021-01-20 21:27:30 -08:00
Brian Behlendorf	c449d4b06d	Linux 5.11 compat: blk_{un}register_region() As of 5.11 the blk_register_region() and blk_unregister_region() functions have been retired. This isn't a problem since add_disk() has implicitly allocated minor numbers for a very long time. Reviewed-by: Rafael Kitover <rkitover@gmail.com> Reviewed-by: Coleman Kane <ckane@colemankane.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #11387 Closes #11390	2020-12-27 16:20:46 -08:00
Brian Behlendorf	a970f0594e	Linux 5.11 compat: bio_start_io_acct() / bio_end_io_acct() The generic IO accounting functions have been removed in favor of the bio_start_io_acct() and bio_end_io_acct() functions which provide a better interface. These new functions were introduced in the 5.8 kernels but it wasn't until the 5.11 kernel that the previous generic IO accounting interfaces were removed. This commit updates the blk_generic_*_io_acct() wrappers to provide and interface similar to the updated kernel interface. It's slightly different because for older kernels we need to pass the request queue as well as the bio. Reviewed-by: Rafael Kitover <rkitover@gmail.com> Reviewed-by: Coleman Kane <ckane@colemankane.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #11387 Closes #11390	2020-12-27 16:20:24 -08:00
Brian Behlendorf	b7281c88bc	Linux 5.11 compat: lookup_bdev() The lookup_bdev() function has been updated to require a dev_t be passed as the second argument. This is actually pretty nice since the major number stored in the dev_t was the only part we were interested in. This allows to us avoid handling the bdev entirely. The vdev_lookup_bdev() wrapper was updated to emulate the behavior of the new lookup_bdev() for all supported kernels. Reviewed-by: Rafael Kitover <rkitover@gmail.com> Reviewed-by: Coleman Kane <ckane@colemankane.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #11387 Closes #11390	2020-12-27 16:20:08 -08:00
Brian Behlendorf	6152014d1a	Linux 4.18.0-257.el8 compat: blk_alloc_queue() The CentOS stream 4.18.0-257 kernel appears to have backported the Linux 5.9 change to make_request_fn and the associated API. To maintain weak modules compatibility the original symbol was retained and the new interface blk_alloc_queue_rh() was added. Unfortunately, blk_alloc_queue() was replaced in the blkdev.h header by blk_alloc_queue_bh() so there doesn't seem to be a way to build new kmods against the old interfces. Even though they appear to still be available for weak module binding. To accommodate this a configure check is added for the new _rh() variant of the function and used if available. If compatibility code gets added to the kernel for the original blk_alloc_queue() interface this should be fine. OpenZFS will simply continue to prefer the new interface and only fallback to blk_alloc_queue() when blk_alloc_queue_rh() isn't available. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #11374	2020-12-21 10:11:56 -08:00
Brian Behlendorf	1c2358c12a	Linux 5.10 compat: use iov_iter in uio structure As of the 5.10 kernel the generic splice compatibility code has been removed. All filesystems are now responsible for registering a ->splice_read and ->splice_write callback to support this operation. The good news is the VFS provided generic_file_splice_read() and iter_file_splice_write() callbacks can be used provided the ->iter_read and ->iter_write callback support pipes. However, this is currently not the case and only iovecs and bvecs (not pipes) are ever attached to the uio structure. This commit changes that by allowing full iov_iter structures to be attached to uios. Ever since the 4.9 kernel the iov_iter structure has supported iovecs, kvecs, bvevs, and pipes so it's desirable to pass the entire thing when possible. In conjunction with this the uio helper functions (i.e uiomove(), uiocopy(), etc) have been updated to understand the new UIO_ITER type. Note that using the kernel provided uio_iter interfaces allowed the existing Linux specific uio handling code to be simplified. When there's no longer a need to support kernel's older than 4.9, then it will be possible to remove the iovec and bvec members from the uio structure and always use a uio_iter. Until then we need to maintain all of the existing types for older kernels. Some additional refactoring and cleanup was included in this change: - Added checks to configure to detect available iov_iter interfaces. Some are available all the way back to the 3.10 kernel and are used when available. In particular, uio_prefaultpages() now always uses iov_iter_fault_in_readable() which is available for all supported kernels. - The unused UIO_USERISPACE type has been removed. It is no longer needed now that the uio_seg enum is platform specific. - Moved zfs_uio.c from the zcommon.ko module to the Linux specific platform code for the zfs.ko module. This gets it out of libzfs where it was never needed and keeps this Linux specific code out of the common sources. - Removed unnecessary O_APPEND handling from zfs_iter_write(), this is redundant and O_APPEND is already handled in zfs_write(); Reviewed-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #11351	2020-12-18 08:48:26 -08:00
Ryan Moeller	439dc034e9	FreeBSD: Implement sysctl for fletcher4 impl There is a tunable to select the fletcher 4 checksum implementation on Linux but it was not present in FreeBSD. Implement the sysctl handler for FreeBSD and use ZFS_MODULE_PARAM_CALL to provide the tunable on both platforms. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #11270	2020-12-11 10:29:01 -08:00
Paul Dagnelie	60a4c7d2a2	Implement memory and CPU hotplug ZFS currently doesn't react to hotplugging cpu or memory into the system in any way. This patch changes that by adding logic to the ARC that allows the system to take advantage of new memory that is added for caching purposes. It also adds logic to the taskq infrastructure to support dynamically expanding the number of threads allocated to a taskq. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Co-authored-by: Matthew Ahrens <matthew.ahrens@delphix.com> Co-authored-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Paul Dagnelie <pcd@delphix.com> Closes #11212	2020-12-10 14:09:23 -08:00
Ryan Moeller	e0716250bf	FreeBSD: Do zcommon_init sooner to avoid FPU panic There has been a panic affecting some system configurations where the thread FPU context is disturbed during the fletcher 4 benchmarks, leading to a panic at boot. module_init() registers zcommon_init to run in the last subsystem (SI_SUB_LAST). Running it as soon as interrupts have been configured (SI_SUB_INT_CONFIG_HOOKS) makes sure we have finished the benchmarks before we start doing other things. While it's not clear how the FPU context was being disturbed, this does seem to avoid it. Add a module_init_early() macro to run zcommon_init() at this earlier point on FreeBSD. On Linux this is defined as module_init(). Authored by: Konstantin Belousov <kib@FreeBSD.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #11302	2020-12-09 21:29:00 -08:00
Brian Behlendorf	4352edaafb	Linux: Fix ZFS_ENTER/ZFS_EXIT/ZFS_VERFY_ZP usage The ZFS_ENTER/ZFS_EXIT/ZFS_VERFY_ZP macros should not be used in the Linux zpl_*.c source files. They return a positive error value which is correct for the common code, but not for the Linux specific kernel code which expects a negative return value. The ZPL_ENTER/ZPL_EXIT/ZPL_VERFY_ZP macros should be used instead. Furthermore, the ZPL_EXIT macro has been updated to not call the zfs_exit_fs() function. This prevents a possible deadlock which can occur when a snapshot is automatically unmounted because the zpl_show_devname() must never wait on in progress automatic snapshot unmounts. Reviewed-by: Adam Moss <c@yotes.com> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #11169 Closes #11201	2020-11-14 10:19:00 -08:00
Mateusz Guzik	1a0b4f566c	G/C struct znode -> z_moved The field is yet another leftover from unsupported zfs_znode_move. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Closes #11186	2020-11-10 12:42:47 -08:00
Coleman Kane	ae15f1c1d8	Linux 5.10 compat: check_disk_change() removed Kernel 5.10 removed check_disk_change() in favor of callers using the faster bdev_check_media_change() instead, and explicitly forcing bdev revalidation when they desire that behavior. To preserve prior behavior, I have wrapped this into a zfs_check_media_change() macro that calls an inline function for the new API that mimics the old behavior when check_disk_change() doesn't exist, and just calls check_disk_change() if it exists. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Coleman Kane <ckane@colemankane.org> Closes #11085	2020-11-02 22:01:19 +00:00
Christian Schwarz	ab8c935ea6	zfs_vnops: make zfs_get_data OS-independent Move zfs_get_data() in to platform-independent code. The only platform-specific aspect of it is the way we release an inode (Linux) / vnode_t (FreeBSD). I am not aware of a platform that could be supported by ZFS that couldn't implement zfs_rele_async itself. It's sibling zvol_get_data already is platform-independent. Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Christian Schwarz <me@cschwarz.com> Closes #10979	2020-11-02 12:07:07 -08:00
Mateusz Guzik	09eb36ce3d	Introduce CPU_SEQID_UNSTABLE Current CPU_SEQID users don't care about possibly changing CPU ID, but enclose it within kpreempt disable/enable in order to fend off warnings from Linux's CONFIG_DEBUG_PREEMPT. There is no need to do it. The expected way to get CPU ID while allowing for migration is to use raw_smp_processor_id. In order to make this future-proof this patch keeps CPU_SEQID as is and introduces CPU_SEQID_UNSTABLE instead, to make it clear that consumers explicitly want this behavior. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Matt Macy <mmacy@FreeBSD.org> Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Closes #11142	2020-11-02 11:51:12 -08:00
Matthew Macy	8583540c6e	Consolidate zfs_holey and zfs_access The zfs_holey() and zfs_access() functions can be made common to both FreeBSD and Linux. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Matt Macy <mmacy@FreeBSD.org> Closes #11125	2020-10-31 09:40:08 -07:00
Matthew Macy	5fa356ea44	Remove UIO_ZEROCOPY functions structures The original xuio zero copy functionality has always been unused on Linux and FreeBSD. Remove this disabled code to avoid any confusion and improve readability. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Matt Macy <mmacy@FreeBSD.org> Closes #11124	2020-10-30 10:00:33 -07:00
Matthew Macy	e53d678d4a	Share zfs_fsync, zfs_read, zfs_write, et al between Linux and FreeBSD The zfs_fsync, zfs_read, and zfs_write function are almost identical between Linux and FreeBSD. With a little refactoring they can be moved to the common code which is what is done by this commit. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Matt Macy <mmacy@FreeBSD.org> Closes #11078	2020-10-21 14:08:06 -07:00
Brian Behlendorf	d0249a4bd0	Replace ZFS on Linux references with OpenZFS This change updates the documentation to refer to the project as OpenZFS instead ZFS on Linux. Web links have been updated to refer to https://github.com/openzfs/zfs. The extraneous zfsonlinux.org web links in the ZED and SPL sources have been dropped. Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Richard Laager <rlaager@wiktel.com> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #11007	2020-10-08 20:10:13 -07:00
Ryan Moeller	4d55ea811d	Throw const on some strings In C, const indicates to the reader that mutation will not occur. It can also serve as a hint about ownership. Add const in a few places where it makes sense. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org> Closes #10997	2020-10-02 17:44:10 -07:00
Matthew Macy	7b8363d7f0	FreeBSD: Add support for procfs_list The procfs_list interface is required by several kstats. Implement this functionality for FreeBSD to provide access to these kstats. Reviewed-by: Allan Jude <allan@klarasystems.com> Reviewed-by: Ryan Moeller <ryan@ixsystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Matt Macy <mmacy@FreeBSD.org> Closes #10890	2020-09-23 16:43:51 -07:00
George Wilson	c494aa7f57	vdev_ashift should only be set once == Motivation and Context The new vdev ashift optimization prevents the removal of devices when a zfs configuration is comprised of disks which have different logical and physical block sizes. This is caused because we set 'spa_min_ashift' in vdev_open and then later call 'vdev_ashift_optimize'. This would result in an inconsistency between spa's ashift calculations and that of the top-level vdev. In addition, the optimization logical ignores the overridden ashift value that would be provided by '-o ashift=<val>'. == Description This change reworks the vdev ashift optimization so that it's only set the first time the device is configured. It still allows the physical and logical ahsift values to be set every time the device is opened but those values are only consulted on first open. Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Cedric Berger <cedric@precidata.com> Signed-off-by: George Wilson <gwilson@delphix.com> External-Issue: DLPX-71831 Closes #10932	2020-09-18 12:13:47 -07:00
Toomas Soome	1db9e6e4e4	zfs label bootenv should store data as nvlist nvlist does allow us to support different data types and systems. To encapsulate user data to/from nvlist, the libzfsbootenv library is provided. Reviewed-by: Arvind Sankar <nivedita@alum.mit.edu> Reviewed-by: Allan Jude <allan@klarasystems.com> Reviewed-by: Paul Dagnelie <pcd@delphix.com> Reviewed-by: Igor Kozhukhov <igor@dilos.org> Signed-off-by: Toomas Soome <tsoome@me.com> Closes #10774	2020-09-15 15:42:27 -07:00
Matthew Macy	ac6e5fb202	Replace cv_{timed}wait_sig with cv_{timed}wait_idle where appropriate There are a number of places where cv_?_sig is used simply for accounting purposes but the surrounding code has no ability to cope with actually receiving a signal. On FreeBSD it is possible to send signals to individual kernel threads so this could enable undesirable behavior. This patch adds routines on Linux that will do the same idle accounting as _sig without making the task interruptible. On FreeBSD cv__idle are all aliases for cv_ Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Matt Macy <mmacy@FreeBSD.org> Closes #10843	2020-09-03 20:04:09 -07:00
Ryan Moeller	7b4e27232d	Add 'zfs rename -u' to rename without remounting Allow to rename file systems without remounting if it is possible. It is possible for file systems with 'mountpoint' property set to 'legacy' or 'none' - we don't have to change mount directory for them. Currently such file systems are unmounted on rename and not even mounted back. This introduces layering violation, as we need to update 'f_mntfromname' field in statfs structure related to mountpoint (for the dataset we are renaming and all its children). In my opinion it is worth it, as it allow to update FreeBSD in even cleaner way - in ZFS-only configuration root file system is ZFS file system with 'mountpoint' property set to 'legacy'. If root dataset is named system/rootfs, we can snapshot it (system/rootfs@upgrade), clone it (system/oldrootfs), update FreeBSD and if it doesn't boot we can boot back from system/oldrootfs and rename it back to system/rootfs while it is mounted as /. Before it was not possible, because unmounting / was not possible. Authored by: Pawel Jakub Dawidek <pjd@FreeBSD.org> Reviewed-by: Allan Jude <allan@klarasystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Ported by: Matt Macy <mmacy@freebsd.org> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #10839	2020-09-01 16:14:16 -07:00
Brian Behlendorf	3e29e1971b	Linux 5.9 compat: NR_SLAB_RECLAIMABLE Commit `dcdc12e` added compatibility code to treat NR_SLAB_RECLAIMABLE_B as if it were the same as NR_SLAB_RECLAIMABLE. However, the new value is in bytes while the old value was in pages which means they are not interchangeable. The only place the reclaimable slab size is used is as a component of the calculation done by arc_free_memory(). This function returns the amount of memory the ARC considers to be free or reclaimable at little cost. Rather than switch to a new interface to get this value it has been removed it from the calculation. It is normally a minor component compared to the number of inactive or free pages, and removing it aligns the behavior with the FreeBSD version of arc_free_memory(). Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Coleman Kane <ckane@colemankane.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #10834	2020-08-29 20:57:45 -07:00
Pavel Snajdr	772c69d230	Linux 5.7 compat: Include linux/sched.h in spl/sys/mutex.h struct task_struct is needed for lockdep_off() in mutex.h This has popped up after e616cb8daadf (in linux-5.7-rc7). Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Pavel Snajdr <snajpa@snajpa.net> Closes #10741	2020-08-19 21:37:38 -07:00
Matthew Ahrens	994de7e4b7	Remove KMC_KMEM and KMC_VMEM `KMC_KMEM` and `KMC_VMEM` are now unused since all SPL-implemented caches are `KMC_KVMEM`. KMC_KMEM: Given the default value of `spl_kmem_cache_kmem_limit`, we don't use kmalloc to back the SPL caches, instead we use kvmalloc (KMC_KVMEM). The flag, module parameter, /proc entries, and associated code are removed. KMC_VMEM: This flag is not used, and kvmalloc() is always preferable to vmalloc(). The flag, /proc entries, and associated code are removed. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Matthew Ahrens <mahrens@delphix.com> Closes #10673	2020-08-17 16:04:28 -07:00
Coleman Kane	d817c17100	Linux 5.9 compat: make_request_fn replaced with submit_bio interface The make_request_fn and associated API was replaced recently in a Linux 5.9 merge, to replace its functionality with a new submit_bio member in struct block_device_operations. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Coleman Kane <ckane@colemankane.org> Closes #10696	2020-08-11 13:37:33 -07:00
Coleman Kane	dcdc12e8ba	Linux 5.9 compat: Update NR_SLAB_RECLAIMABLE to NR_SLAB_RECLAIMABLE_B This change appears to primarily be a name change for the enum. Had to update the test logic so that it works so long as either one of these is present (favoring the newer one). Additionally, as this is newer, it only shows up in node_page_item, so this commit doesn't test zone_page_item for the same enum. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Coleman Kane <ckane@colemankane.org> Closes #10696	2020-08-11 13:35:20 -07:00
Matthew Ahrens	c6f2b942be	Remove KMC_NOMAGAZINE Remove dead code to make the implementation easier to understand. Reviewed-by: Ryan Moeller <ryan@ixsystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Matt Ahrens <matt@delphix.com> Closes #10650	2020-08-05 10:28:07 -07:00
Matthew Ahrens	3c09f6949a	Remove KMC_QCACHE Remove dead code to make the implementation easier to understand. Reviewed-by: Ryan Moeller <ryan@ixsystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Matt Ahrens <matt@delphix.com> Closes #10650	2020-08-05 10:28:01 -07:00
Matthew Ahrens	d519c10575	Remove KMC_NOHASH Remove dead code to make the implementation easier to understand. Reviewed-by: Ryan Moeller <ryan@ixsystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Matt Ahrens <matt@delphix.com> Closes #10650	2020-08-05 10:27:56 -07:00
Matthew Ahrens	f68af67a0c	Remove KMC_NOTOUCH Remove dead code to make the implementation easier to understand. Reviewed-by: Ryan Moeller <ryan@ixsystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Matt Ahrens <matt@delphix.com> Closes #10650	2020-08-05 10:27:46 -07:00
Matthew Ahrens	492db125dc	Remove KMC_OFFSLAB Remove dead code to make the implementation easier to understand. Reviewed-by: Ryan Moeller <ryan@ixsystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Matt Ahrens <matt@delphix.com> Closes #10650	2020-08-05 10:25:37 -07:00
Pavel Snajdr	309d20882f	Fix arc__wait__for__eviction tracepoint `3442c2a02d` added new `arc_wait_for_eviction` tracepoint, which fails to compile, when tracepoints are enabled. The tracepoint definition begins with `DEFINE_ARC_WAIT_FOR_EVICTION_EVENT` and is a multi-line definition, so this fixes the backslash and parenthesis accordingly. Reviewed-by: Matt Ahrens <matt@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Pavel Snajdr <snajpa@snajpa.net> Closes #10669	2020-08-04 10:04:00 -07:00
Matthew Ahrens	3442c2a02d	Revise ARC shrinker algorithm The ARC shrinker callback `arc_shrinker_count/_scan()` is invoked by the kernel's shrinker mechanism when the system is running low on free pages. This happens via 2 code paths: 1. "direct reclaim": The system is attempting to allocate a page, but we are low on memory. The ARC shrinker callback is invoked from the page-allocation code path. 2. "indirect reclaim": kswapd notices that there aren't many free pages, so it invokes the ARC shrinker callback. In both cases, the kernel's shrinker code requests that the ARC shrinker callback release some of its cache, and then it measures how many pages were released. However, it's measurement of released pages does not include pages that are freed via `__free_pages()`, which is how the ARC releases memory (via `abd_free_chunks()`). Rather, the kernel shrinker code is looking for pages to be placed on the lists of reclaimable pages (which is separate from actually-free pages). Because the kernel shrinker code doesn't detect that the ARC has released pages, it may call the ARC shrinker callback many times, resulting in the ARC "collapsing" down to `arc_c_min`. This has several negative impacts: 1. ZFS doesn't use RAM to cache data effectively. 2. In the direct reclaim case, a single page allocation may wait a long time (e.g. more than a minute) while we evict the entire ARC. 3. Even with the improvements made in `67c0f0dedc` ("ARC shrinking blocks reads/writes"), occasionally `arc_size` may stay above `arc_c` for the entire time of the ARC collapse, thus blocking ZFS read/write operations in `arc_get_data_impl()`. To address these issues, this commit limits the ways that the ARC shrinker callback can be used by the kernel shrinker code, and mitigates the impact of arc_is_overflowing() on ZFS read/write operations. With this commit: 1. We limit the amount of data that can be reclaimed from the ARC via the "direct reclaim" shrinker. This limits the amount of time it takes to allocate a single page. 2. We do not allow the ARC to shrink via kswapd (indirect reclaim). Instead we rely on `arc_evict_zthr` to monitor free memory and reduce the ARC target size to keep sufficient free memory in the system. Note that we can't simply rely on limiting the amount that we reclaim at once (as for the direct reclaim case), because kswapd's "boosted" logic can invoke the callback an unlimited number of times (see `balance_pgdat()`). 3. When `arc_is_overflowing()` and we want to allocate memory, `arc_get_data_impl()` will wait only for a multiple of the requested amount of data to be evicted, rather than waiting for the ARC to no longer be overflowing. This allows ZFS reads/writes to make progress even while the ARC is overflowing, while also ensuring that the eviction thread makes progress towards reducing the total amount of memory used by the ARC. 4. The amount of memory that the ARC always tries to keep free for the rest of the system, `arc_sys_free` is increased. 5. Now that the shrinker callback is able to provide feedback to the kernel's shrinker code about our progress, we can safely enable the kswapd hook. This will allow the arc to receive notifications when memory pressure is first detected by the kernel. We also re-enable the appropriate kstats to track these callbacks. Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Co-authored-by: George Wilson <george.wilson@delphix.com> Signed-off-by: Matthew Ahrens <mahrens@delphix.com> Closes #10600	2020-07-31 21:10:52 -07:00
Serapheim Dimitropoulos	843e9ca2e1	Introduce names for ZTHRs When debugging issues or generally analyzing the runtime of a system it would be nice to be able to tell the different ZTHRs running by name rather than having to analyze their stack. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Co-authored-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Serapheim Dimitropoulos <serapheim@delphix.com> Closes #10630	2020-07-29 09:43:33 -07:00
Matthew Macy	5678d3f593	Prefix zfs internal endian checks with _ZFS FreeBSD defines _BIG_ENDIAN BIG_ENDIAN _LITTLE_ENDIAN LITTLE_ENDIAN on every architecture. Trying to do cross builds whilst hiding this from ZFS has proven extremely cumbersome. Reviewed-by: Ryan Moeller <ryan@ixsystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Matt Macy <mmacy@FreeBSD.org> Closes #10621	2020-07-28 13:02:49 -07:00
Matthew Ahrens	4fbdb10c7b	remove kmem_cache module parameter KMC_EXPIRE_AGE By default, `spl_kmem_cache_expire` is `KMC_EXPIRE_MEM`, meaning that objects will be removed from kmem cache magazines by `spl_kmem_cache_reap_now()`. There is also a module parameter to change this to `KMC_EXPIRE_AGE`, which establishes a maximum lifetime for objects to stay in the magazine. This setting has rarely, if ever, been used, and is not regularly tested. This commit removes the code for `KMC_EXPIRE_AGE`, and associated module parameters. Additionally, the unused module parameter `spl_kmem_cache_obj_per_slab_min` is removed. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Matthew Ahrens <mahrens@delphix.com> Closes #10608	2020-07-24 09:39:26 -07:00
Matthew Ahrens	026e529cb3	Remove skc_reclaim, hdr_recl, kmem_cache shrinker The SPL kmem_cache implementation provides a mechanism, `skc_reclaim`, whereby individual caches can register a callback to be invoked when there is memory pressure. This mechanism is used in only one place: the ARC registers the `hdr_recl()` reclaim function. This function wakes up the `arc_reap_zthr`, whose job is to call `kmem_cache_reap()` and `arc_reduce_target_size()`. The `skc_reclaim` callbacks are invoked only by shrinker callbacks and `arc_reap_zthr`, and only callback only wakes up `arc_reap_zthr`. When called from `arc_reap_zthr`, waking `arc_reap_zthr` is a no-op. When called from shrinker callbacks, we are already aware of memory pressure and responding to it. Therefore there is little benefit to ever calling the `hdr_recl()` `skc_reclaim` callback. The `arc_reap_zthr` also wakes once a second, and if memory is low when allocating an ARC buffer. Therefore, additionally waking it from the shrinker calbacks has little benefit. The shrinker callbacks can be invoked very frequently, e.g. 10,000 times per second. Additionally, for invocation of the shrinker callback, skc_reclaim is invoked many times. Therefore, this mechanism consumes significant amounts of CPU time. The kmem_cache shrinker calls `spl_kmem_cache_reap_now()`, which, in addition to invoking `skc_reclaim()`, does two things to attempt to free pages for use by the system: 1. Return free objects from the magazine layer to the slab layer 2. Return entirely-free slabs to the page layer (i.e. free pages) These actions apply only to caches implemented by the SPL, not those that use the underlying kernel SLAB/SLUB caches. The SPL caches are used for objects >=32KB, which are primarily linear ABD's cached in the DBUF cache. These actions (freeing objects from the magazine layer and returning entirely-free slabs) are also taken whenever a `kmem_cache_free()` call finds a full magazine. So there would typically be zero entirely-free slabs, and the number of objects in magazines is limited (typically no more than 64 objects per magazine, and there's one magazine per CPU). Therefore the benefit of `spl_kmem_cache_reap_now()`, while nonzero, is modest. We also call `spl_kmem_cache_reap_now()` from the `arc_reap_zthr`, when memory pressure is detected. Therefore, calling `spl_kmem_cache_reap_now()` from the kmem_cache shrinker is not needed. This commit removes the `skc_reclaim` mechanism, its only callback `hdr_recl()`, and the kmem_cache shrinker callback. Reviewed-By: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: George Wilson <gwilson@delphix.com> Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com> Signed-off-by: Matthew Ahrens <mahrens@delphix.com> Closes #10576	2020-07-19 09:58:30 -07:00

1 2 3

120 Commits