mirror_zfs

mirror of https://git.proxmox.com/git/mirror_zfs.git synced 2026-05-22 18:40:43 +03:00

Author	SHA1	Message	Date
Matthew Ahrens	b3212d2fa6	Improve performance of zio_taskq_member __zio_execute() calls zio_taskq_member() to determine if we are running in a zio interrupt taskq, in which case we may need to switch to processing this zio in a zio issue taskq. The call to zio_taskq_member() can become a performance bottleneck when we are processing a high rate of zio's. zio_taskq_member() calls taskq_member() on each of the zio interrupt taskqs, of which there are 21. This is slow because each call to taskq_member() does tsd_get(taskq_tsd), which on Linux is relatively slow. This commit improves the performance of zio_taskq_member() by having it cache the value of tsd_get(taskq_tsd), reducing the number of those calls to 1/21th of the current behavior. In a test case running `zfs send -c >/dev/null` of a filesystem with small blocks (average 2.5KB/block), zio_taskq_member() was using 6.7% of one CPU, and with this change it is reduced to 1.3%. Overall time to perform the `zfs send` reduced by 10% (~150,000 block/sec to ~165,000 blocks/sec). Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com> Signed-off-by: Matthew Ahrens <mahrens@delphix.com> Closes #10070	2020-03-03 10:29:38 -08:00
Ryan Moeller	9bb907bc3f	Make spa_history_zone platform-dependent in kernel This function should only return "linux" on Linux. Move the kernel part of the function out of common code. Fix the tests for FreeBSD. Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #10079	2020-03-02 09:43:30 -08:00
Matthew Macy	ae9f92f6f3	Re-share zfsdev_getminor and zfs_onexit_fd_hold By adding a zfs_file_private accessor to the common interfaces and some extensions to FreeBSD platform code it is now possible to share the implementations for the aforementioned functions. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Matt Macy <mmacy@FreeBSD.org> Closes #10073	2020-02-28 14:50:32 -08:00
Brian Behlendorf	bd0d24e09b	Linux 5.5 compat: blkg_tryget() Commit https://github.com/torvalds/linux/commit/9e8d42a0f accidentally converted the static inline function blkg_tryget() to GPL-only for kernels built with CONFIG_PREEMPT_RCU=y and CONFIG_BLK_CGROUP=y. Resolve the build issue by providing our own equivalent functionality when needed which uses rcu_read_lock_sched() internally as before. Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #9745 Closes #10072	2020-02-28 08:58:39 -08:00
Brian Behlendorf	2c3a83701d	Linux 5.6 compat: time_t As part of the Linux kernel's y2038 changes the time_t type has been fully retired. Callers are now required to use the time64_t type. Rather than move to the new type, I've removed the few remaining places where a time_t is used in the kernel code. They've been replaced with a uint64_t which is already how ZFS internally handled these values. Going forward we should work towards updating the remaining user space time_t consumers to the 64-bit interfaces. Reviewed-by: Matthew Macy <mmacy@freebsd.org> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #10052 Closes #10064	2020-02-27 09:31:02 -08:00
Matthew Macy	28caa74b19	Refactor dnode dirty context from dbuf_dirty * Add dedicated donde_set_dirtyctx routine. * Add empty dirty record on destroy assertion. * Make much more extensive use of the SET_ERROR macro. Reviewed-by: Will Andrews <wca@FreeBSD.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Signed-off-by: Matt Macy <mmacy@FreeBSD.org> Closes #9924	2020-02-26 16:09:17 -08:00
Dirkjan Bussink	327000ce04	Remove zfs_getattr and convoff dead code The `convoff` function is called only in one code path in `zfs_space`. Each caller of `zfs_space` is called with a `flock64_t` that has `l_whence` set to `SEEK_SET`. This means that `convoff` always results in a no-op as the `bfp` parameter has `l_whence` set to `SEEK_SET` and `int whence` is `SEEK_SET` as well. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Dirkjan Bussink <d.bussink@gmail.com> Closes #10006	2020-02-24 15:38:22 -08:00
DeHackEd	d09dc5980c	Honour sync=disabled when relinking tpmfiles Unlinked files don't respect synchronous flush commands, but when they get relinked their state is unknown. Previously we force flushed all such files even when sync=disabled. Correct this case. Reviewed-by: Chunwei Chen <tuxoko@gmail.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: DHE <git@dehacked.net> Closes #10005	2020-02-16 12:44:08 -08:00
Matthew Ahrens	2adc6b35ae	Missed wakeup when growing kmem cache When growing the size of a (VMEM or KVMEM) kmem cache, spl_cache_grow() always does taskq_dispatch(spl_cache_grow_work), and then waits for the KMC_BIT_GROWING to be cleared by the taskq thread. The taskq thread (spl_cache_grow_work()) does: 1. allocate new slab and add to list 2. wake_up_all(skc_waitq) 3. clear_bit(KMC_BIT_GROWING) Therefore, the waiting thread can wake up before GROWING has been cleared. It will see that the growing has not yet completed, and go back to sleep until it hits the 100ms timeout. This can have an extreme performance impact on workloads that alloc/free more than fits in the (statically-sized) magazines. These workloads allocate and free slabs with high frequency. The problem can be observed with `funclatency spl_cache_grow`, which on some workloads shows that 99.5% of the time it takes <64us to allocate slabs, but we spend ~70% of our time in outliers, waiting for the 100ms timeout. The fix is to do `clear_bit(KMC_BIT_GROWING)` before `wake_up_all(skc_waitq)`. A future investigation should evaluate if we still actually need to taskq_dispatch() at all, and if so on which kernel versions. Reviewed-by: Paul Dagnelie <pcd@delphix.com> Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: George Wilson <gwilson@delphix.com> Signed-off-by: Matthew Ahrens <mahrens@delphix.com> Closes #9989	2020-02-13 11:23:02 -08:00
Ryan Moeller	57940b435c	Share some code for spa deadman tunables We need to do the same thing to update all spas on any OS for these tunables, so let's share the code. While here let's match the types of the literals initializing the variables with the type of the variable. Reviewed-by: Allan Jude <allanjude@freebsd.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Olaf Faaland <faaland1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #9964	2020-02-10 13:11:30 -08:00
Brian Behlendorf	795699a6cc	Linux 5.6 compat: timestamp_truncate() The timestamp_truncate() function was added, it replaces the existing timespec64_trunc() function. This change renames our wrapper function to be consistent with the upstream name and updates the compatibility code for older kernels accordingly. Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #9956 Closes #9961	2020-02-07 11:04:32 -08:00
Brian Behlendorf	0dd7364853	Linux 5.6 compat: struct proc_ops The proc_ops structure was introduced to replace the use of of the file_operations structure when registering proc handlers. This change creates a new kstat_proc_op_t typedef for compatibility which can be used to pass around the correct structure. This change additionally adds the 'const' keyword to all of the existing proc operations structures. Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #9961	2020-02-07 11:03:53 -08:00
Romain Dolbeau	77122f9d68	Replace static per-cpu with dynamic per-cpu data This solves the issue of loading the spl module on RISC-V. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Romain Dolbeau <romain.dolbeau@european-processor-initiative.eu> Closes #9942	2020-02-06 09:26:13 -08:00
Alexander Motin	741db5a346	Prepare ks_data before calling kstat_install() It violated sequence described in kstat.h, and at least on FreeBSD kstat_install() uses provided names to create the sysctls. If the names are not available at the time, it ends up bad. Reviewed-by: Igor Kozhukhov <igor@dilos.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored-By: iXsystems, Inc. Closes #9933	2020-02-04 08:49:12 -08:00
Matthew Ahrens	ec21397127	async zvol minor node creation interferes with receive When we finish a zfs receive, dmu_recv_end_sync() calls zvol_create_minors(async=TRUE). This kicks off some other threads that create the minor device nodes (in /dev/zvol/poolname/...). These async threads call zvol_prefetch_minors_impl() and zvol_create_minor(), which both call dmu_objset_own(), which puts a "long hold" on the dataset. Since the zvol minor node creation is asynchronous, this can happen after the `ZFS_IOC_RECV[_NEW]` ioctl and `zfs receive` process have completed. After the first receive ioctl has completed, userland may attempt to do another receive into the same dataset (e.g. the next incremental stream). This second receive and the asynchronous minor node creation can interfere with one another in several different ways, because they both require exclusive access to the dataset: 1. When the second receive is finishing up, dmu_recv_end_check() does dsl_dataset_handoff_check(), which can fail with EBUSY if the async minor node creation already has a "long hold" on this dataset. This causes the 2nd receive to fail. 2. The async udev rule can fail if zvol_id and/or systemd-udevd try to open the device while the the second receive's async attempt at minor node creation owns the dataset (via zvol_prefetch_minors_impl). This causes the minor node (/dev/zd*) to exist, but the udev-generated /dev/zvol/... to not exist. 3. The async minor node creation can silently fail with EBUSY if the first receive's zvol_create_minor() trys to own the dataset while the second receive's zvol_prefetch_minors_impl already owns the dataset. To address these problems, this change synchronously creates the minor node. To avoid the lock ordering problems that the asynchrony was introduced to fix (see #3681), we create the minor nodes from open context, with no locks held, rather than from syncing contex as was originally done. Implementation notes: We generally do not need to traverse children or prefetch anything (e.g. when running the recv, snapshot, create, or clone subcommands of zfs). We only need recursion when importing/opening a pool and when loading encryption keys. The existing recursive, asynchronous, prefetching code is preserved for use in these cases. Channel programs may need to create zvol minor nodes, when creating a snapshot of a zvol with the snapdev property set. We figure out what snapshots are created when running the LUA program in syncing context. In this case we need to remember what snapshots were created, and then try to create their minor nodes from open context, after the LUA code has completed. There are additional zvol use cases that asynchronously own the dataset, which can cause similar problems. E.g. changing the volmode or snapdev properties. These are less problematic because they are not recursive and don't touch datasets that are not involved in the operation, there is still potential for interference with subsequent operations. In the future, these cases should be similarly converted to create the zvol minor node synchronously from open context. The async tasks of removing and renaming minors do not own the objset, so they do not have this problem. However, it may make sense to also convert these operations to happen synchronously from open context, in the future. Reviewed-by: Paul Dagnelie <pcd@delphix.com> Reviewed-by: Prakash Surya <prakash.surya@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Matthew Ahrens <mahrens@delphix.com> External-issue: DLPX-65948 Closes #7863 Closes #9885	2020-02-03 09:33:14 -08:00
Matthew Macy	d3c1e45b7a	Re-consolidate zio_delay_interrupt With recent SPL changes there is no longer any need for a per platform version. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Matt Macy <mmacy@FreeBSD.org> Closes #9860	2020-01-21 15:04:13 -08:00
Brian Behlendorf	70835c5b75	Unify target_cpu handling Over the years several slightly different approaches were used in the Makefiles to determine the target architecture. This change updates both the build system and Makefile to handle this in a consistent fashion. TARGET_CPU is set to i386, x86_64, powerpc, aarch6 or sparc64 and made available in the Makefiles to be used as appropriate. Reviewed-by: Ryan Moeller <ryan@ixsystems.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #9848	2020-01-17 12:40:09 -08:00
loli10K	7e2da7786e	KMC_KVMEM disrupts kv_alloc() memory alignment expectations On kernels with KASAN enabled the following failure can be observed as soon as the zfs module is loaded: VERIFY(IS_P2ALIGNED(ptr, PAGE_SIZE)) failed PANIC at spl-kmem-cache.c:228:kv_alloc() The problem is kmalloc() has never guaranteed aligned allocations; this requirement resulted in zfsonlinux/spl@8b45dda which removed all kmalloc() usage in kv_alloc(). Until a GFP_ALIGNED flag (or equivalent functionality) is provided by the kernel this commit partially reverts `66955885` and `6d948c35` to prevent k(v)malloc() allocations in kv_alloc(). Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl> Reviewed-by: Michael Niewöhner <foss@mniewoehner.de> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: loli10K <ezomori.nozomu@gmail.com> Closes #9813	2020-01-14 09:09:59 -08:00
Brian Behlendorf	e458fcca75	Change http://zfsonlinux.org links to https://zfsonlinux.org Update the project website links contained in to repository to reference the secure https://zfsonlinux.org address. Reviewed-By: Richard Laager <rlaager@wiktel.com> Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Garrett Fields <ghfields@gmail.com> Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #9837	2020-01-13 16:43:59 -08:00
Brian Behlendorf	bc9cef11fd	Fix QAT allocation failure return value When qat_compress() fails to allocate the required contiguous memory it mistakenly returns success. This prevents the fallback software compression from taking over and (un)compressing the block. Resolve the issue by correctly setting the local 'status' variable on all exit paths. Furthermore, initialize it to CPA_STATUS_FAIL to ensure qat_compress() always fails safe to guard against any similar bugs in the future. Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #9784 Closes #9788	2020-01-06 11:17:53 -08:00
Ubuntu	abfdb83607	cppcheck: (error) Shifting signed 64-bit value by 63 bits As of cppcheck 1.82 surpress the warning regarding shifting too many bits for __divdi3() implemention. The algorithm used here is correct. Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #9732	2019-12-18 17:24:42 -08:00
Ubuntu	7cf1fe6331	cppcheck: (error) Uninitialized variable As of cppcheck 1.82 warnings are issued when using the list_for_each_* functions with an uninitialized variable. Functionally, this is fine but to resolve the warning initialize these variables. Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #9732	2019-12-18 17:24:29 -08:00
Tomohiro Kusumi	ddb4e69db5	Don't fail to apply umask for O_TMPFILE files Apply umask to `mode` which will eventually be applied to inode. This is needed since VFS doesn't apply umask for O_TMPFILE files. (Note that zpl_init_acl() applies `ip->i_mode &= ~current_umask();` only when POSIX ACL is used.) Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@gmail.com> Closes #8997 Closes #8998	2019-12-13 15:02:23 -08:00
Matthew Macy	13a9a6f5e8	Make zfs_replay.c work on FreeBSD FreeBSD's vfs currently doesn't permit file systems to do their own locking. To avoid having to have duplicate zfs functions with and without locking add locking here. With luck these changes can be removed in the future. Reviewed-by: Sean Eric Fagan <sef@ixsystems.com> Reviewed-by: Jorgen Lundman <lundman@lundman.net> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Matt Macy <mmacy@FreeBSD.org> Closes #9715	2019-12-13 07:54:10 -08:00
Ryan Moeller	957c7aa23c	Relocate common quota functions to shared code The quota functions are common to all implementations and can be moved to common code. As a simplification they were moved to the Linux platform code in the initial refactoring. Reviewed-by: Jorgen Lundman <lundman@lundman.net> Reviewed-by: Igor Kozhukhov <igor@dilos.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@ixsystems.com> Closes #9710	2019-12-11 12:12:08 -08:00
Matthew Macy	657ce25357	Eliminate Linux specific inode usage from common code Change many of the znops routines to take a znode rather than an inode so that zfs_replay code can be largely shared and in the future the much of the znops code may be shared. Reviewed-by: Jorgen Lundman <lundman@lundman.net> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Matt Macy <mmacy@FreeBSD.org> Closes #9708	2019-12-11 11:53:57 -08:00
Matthew Macy	362ae8d11f	Abstract away platform specific superblock references The zfsvfs->z_sb field is Linux specified and should be abstracted. Reviewed-by: Richard Laager <rlaager@wiktel.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Matt Macy <mmacy@FreeBSD.org> Closes #9697	2019-12-10 09:21:07 -08:00
Brian Behlendorf	a25861dcae	ZTS: Fix zpool_reopen_001_pos Update the vdev_disk_open() retry logic to use a specified number of milliseconds to be more robust. Additionally, on failure log both the time waited and requested timeout to the internal log. The default maximum allowed open retry time has been increased from 500ms to 1000ms. Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #9680	2019-12-09 11:09:14 -08:00
Matthew Macy	e64e84eca5	Refactor deadman set failmode to be cross platform Update zfs_deadman_failmode to use the ZFS_MODULE_PARAM_CALL wrapper, and split the common and platform specific portions. Reviewed-by: Jorgen Lundman <lundman@lundman.net> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Matt Macy <mmacy@FreeBSD.org> Closes #9670	2019-12-05 12:40:45 -08:00
Matthew Macy	2a8ba608d3	Replace ASSERTV macro with compiler annotation Remove the ASSERTV macro and handle suppressing unused compiler warnings for variables only in ASSERTs using the __attribute__((unused)) compiler annotation. The annotation is understood by both gcc and clang. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Jorgen Lundman <lundman@lundman.net> Signed-off-by: Matt Macy <mmacy@FreeBSD.org> Closes #9671	2019-12-05 12:37:00 -08:00
Matthew Macy	5142032106	Move zfs_cmd_t copyin/copyout to platform code FreeBSD needs to cope with multiple version of the zfs_cmd_t structure. Allowing the platform code to pre and post process the cmd structure makes it possible to work with legacy tooling. Reviewed-by: Jorgen Lundman <lundman@lundman.net> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Matt Macy <mmacy@FreeBSD.org> Closes #9624	2019-12-02 10:08:27 -08:00
Matthew Macy	f348c78f97	Mark Linux fallocate extensions as specific to Linux fallocate(2) is a Linux-specific system call which in unavailable on other platforms. Reviewed-by: Jorgen Lundman <lundman@lundman.net> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Matt Macy <mmacy@FreeBSD.org> Closes #9633	2019-11-30 15:40:22 -08:00
Matthew Macy	a5b762ab1d	Resolve ZoF differences in zfs_ioctl.h FreeBSD needs to be able to pass the jail id to the jail/unjail ioctls and the struct file in the device structure is unused. Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl> Reviewed-by: Jorgen Lundman <lundman@lundman.net> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Matt Macy <mmacy@FreeBSD.org> Closes #9625	2019-11-30 15:35:54 -08:00
Brian Behlendorf	9e17e6f254	Remove zfs_vdev_elevator module option As described in commit `f81d5ef6` the zfs_vdev_elevator module option is being removed. Users who require this functionality should update their systems to set the disk scheduler using a udev rule. Reviewed-by: Richard Laager <rlaager@wiktel.com> Reviewed-by: loli10K <ezomori.nozomu@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #8664 Closes #9417 Closes #9609	2019-11-27 10:35:49 -08:00
Mauricio Faria de Oliveira	0c46813805	Check for unlinked znodes after igrab() The changes in commit `41e1aa2a` / PR #9583 introduced a regression on tmpfile_001_pos: fsetxattr() on a O_TMPFILE file descriptor started to fail with errno ENODATA: openat(AT_FDCWD, "/test", O_RDWR\|O_TMPFILE, 0666) = 3 <...> fsetxattr(3, "user.test", <...>, 64, 0) = -1 ENODATA The originally proposed change on PR #9583 is not susceptible to it, so just move the code/if-checks around back in that way, to fix it. Reviewed-by: Pavel Snajdr <snajpa@snajpa.net> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Original-patch-by: Heitor Alves de Siqueira <halves@canonical.com> Signed-off-by: Mauricio Faria de Oliveira <mfo@canonical.com> Closes #9602	2019-11-21 12:24:03 -08:00
Matthew Macy	da92d5cbb3	Add zfs_file_* interface, remove vnodes Provide a common zfs_file_* interface which can be implemented on all platforms to perform normal file access from either the kernel module or the libzpool library. This allows all non-portable vnode_t usage in the common code to be replaced by the new portable zfs_file_t. The associated vnode and kobj compatibility functions, types, and macros have been removed from the SPL. Moving forward, vnodes should only be used in platform specific code when provided by the native operating system. Reviewed-by: Sean Eric Fagan <sef@ixsystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Igor Kozhukhov <igor@dilos.org> Reviewed-by: Jorgen Lundman <lundman@lundman.net> Signed-off-by: Matt Macy <mmacy@FreeBSD.org> Closes #9556	2019-11-21 09:32:57 -08:00
Brian Behlendorf	7ae3f8dc8f	Partially revert `5a6ac4c` Reinstate the zpl_revalidate() functionality to resolve a regression where dentries for open files during a rollback are not invalidated. The unrelated functionality for automatically unmounting .zfs/snapshots was not reverted. Nor was the addition of shrink_dcache_sb() to the zfs_resume_fs() function. This issue was not immediately caught by the CI because the test case intended to catch it was included in the list of ZTS tests which may occasionally fail for unrelated reasons. Remove all of the rollback tests from this list to help identify the frequency of any spurious failures. The rollback_003_pos.ksh test case exposes a real issue with the long standing code which needs to be investigated. Regardless, it has been enable with a small workaround in the test case itself. Reviewed-by: Matt Ahrens <matt@delphix.com> Reviewed-by: Pavel Snajdr <snajpa@snajpa.net> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #9587 Closes #9592	2019-11-18 13:05:56 -08:00
Heitor Alves de Siqueira	41e1aa2a06	Break out of zfs_zget early if unlinked znode If zp->z_unlinked is set, we're working with a znode that has been marked for deletion. If that's the case, we can skip the "goto again" loop and return ENOENT, as the znode should not be discovered. Reviewed-by: Richard Yao <ryao@gentoo.org> Reviewed-by: Matt Ahrens <mahrens@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Heitor Alves de Siqueira <halves@canonical.com> Closes #9583	2019-11-15 09:56:05 -08:00
loli10K	7ba964cc3f	Prevent NULL pointer dereference in blkg_tryget() on EL8 kernels blkg_tryget() as shipped in EL8 kernels does not seem to handle NULL @blkg as input; this is different from its mainline counterpart where NULL is accepted. To prevent dereferencing a NULL pointer when dealing with block devices which do not set a root_blkg on the request queue perform the NULL check in vdev_bio_associate_blkg(). Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: loli10K <ezomori.nozomu@gmail.com> Closes #9546 Closes #9577	2019-11-13 10:19:06 -08:00
Michael Niewöhner	8aaa10a9a0	Check for __GFP_RECLAIM instead of GFP_KERNEL Check for __GFP_RECLAIM instead of GFP_KERNEL because zfs modifies IO and FS flags which breaks the check for GFP_KERNEL. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Matt Ahrens <matt@delphix.com> Reviewed-by: Sebastian Gottschall <s.gottschall@dd-wrt.com> Signed-off-by: Michael Niewöhner <foss@mniewoehner.de> Closes #9034	2019-11-13 10:05:23 -08:00
Michael Niewöhner	6d948c3519	Add kmem_cache flag for forcing kvmalloc This adds a new KMC_KVMEM flag was added to enforce use of the kvmalloc allocator in kmem_cache_create even for large blocks, which may also increase performance in some specific cases (e.g. zstd), too. Default to KVMEM instead of VMEM in spl_kmem_cache_create. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Matt Ahrens <matt@delphix.com> Signed-off-by: Sebastian Gottschall <s.gottschall@dd-wrt.com> Signed-off-by: Michael Niewöhner <foss@mniewoehner.de> Closes #9034	2019-11-13 10:05:23 -08:00
Michael Niewöhner	66955885e2	Make use of kvmalloc if available and fix vmem_alloc implementation This patch implements use of kvmalloc for GFP_KERNEL allocations, which may increase performance if the allocator is able to allocate physical memory, if kvmalloc is available as a public kernel interface (since v4.12). Otherwise it will simply fall back to virtual memory (vmalloc). Also fix vmem_alloc implementation which can lead to slow allocations since the first attempt with kmalloc does not make use of the noretry flag but tells the linux kernel to retry several times before it fails. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Matt Ahrens <matt@delphix.com> Signed-off-by: Sebastian Gottschall <s.gottschall@dd-wrt.com> Signed-off-by: Michael Niewöhner <foss@mniewoehner.de> Closes #9034	2019-11-13 10:05:10 -08:00
Michael Niewöhner	c025008df5	Add missing documentation for some KMC flags Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Matt Ahrens <matt@delphix.com> Signed-off-by: Sebastian Gottschall <s.gottschall@dd-wrt.com> Signed-off-by: Michael Niewöhner <foss@mniewoehner.de> Closes #9034	2019-11-13 09:34:51 -08:00
Brian Behlendorf	066e825221	Linux compat: Minimum kernel version 3.10 Increase the minimum supported kernel version from 2.6.32 to 3.10. This removes support for the following Linux enterprise distributions. Distribution \| Kernel \| End of Life ---------------- \| ------ \| ------------- Ubuntu 12.04 LTS \| 3.2 \| Apr 28, 2017 SLES 11 \| 3.0 \| Mar 32, 2019 RHEL / CentOS 6 \| 2.6.32 \| Nov 30, 2020 The following changes were made as part of removing support. * Updated `configure` to enforce a minimum kernel version as specified in the META file (Linux-Minimum: 3.10). configure: error: * Cannot build against kernel version 2.6.32. * The minimum supported kernel version is 3.10. * Removed all `configure` kABI checks and matching C code for interfaces which solely predate the Linux 3.10 kernel. * Updated all `configure` kABI checks to fail when an interface is missing which was in the 3.10 kernel up to the latest 5.1 kernel. Removed the HAVE_* preprocessor defines for these checks and updated the code to unconditionally use the verified interface. * Inverted the detection logic in several kABI checks to match the new interface as it appears in 3.10 and newer and not the legacy interface. * Consolidated the following checks in to individual files. Due the large number of changes in the checks it made sense to handle this now. It would be desirable to group other related checks in the same fashion, but this as left as future work. - config/kernel-blkdev.m4 - Block device kABI checks - config/kernel-blk-queue.m4 - Block queue kABI checks - config/kernel-bio.m4 - Bio interface kABI checks * Removed the kABI checks for sops->nr_cached_objects() and sops->free_cached_objects(). These interfaces are currently unused. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #9566	2019-11-12 08:59:06 -08:00
Pavel Snajdr	5a6ac4cffc	Remove zpl_revalidate This patch removes the need for zpl_revalidate altogether. There were 3 main reasons why we used d_revalidate: 1. periodic automounted snapshots umount deferral 2. negative dentries created before snapshot rollback 3. stale inodes referenced by dentry cache after snapshot rollback Periodic snapshots deferral solution introduces zfs_exit_fs function, which is called as a part of ZFS_EXIT(zfsvfs_t) macro. Negative dentries and stale inodes are solved by flushing the dcache for the particular dataset on zfs_resume_fs call. This patch also removes now unused HAVE_S_D_OP configure test. Reviewed-by: Aleksa Sarai <cyphar@cyphar.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Pavel Snajdr <snajpa@snajpa.net> Closes #8774 Closes #9549	2019-11-11 09:34:21 -08:00
Matthew Macy	27ece2ee4d	Move platform specific parts of zfs_znode.h to platform code Some of the znode fields are different and functions consuming an inode don't exist on FreeBSD. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Jorgen Lundman <lundman@lundman.net> Signed-off-by: Matt Macy <mmacy@FreeBSD.org> Closes #9536	2019-11-06 10:54:25 -08:00
Prakash Surya	ae38e00968	Add tracepoints for taskq entry lifetime events This adds some new DTRACE_PROBE* endpoints so that we can observe taskq latencies on a system. Additionally, a new "taskqlatency.bt" script is added to do this observation via "bpftrace". Lastly, a "zfs-trace.sh" script is added to wrap "bpftrace" with the proper options required to run and use "taskqlatency.bt". For example, with these changes in place, a user can run the following: $ cd ./contrib/bpftrace $ sudo ./zfs-trace.sh taskqlatency.bt Attaching 6 probes... ^C Here's some example output, showing latency information for time spent executing the taskq entry's function: @exec_lat_us[dp_sync_taskq, userquota_updates_task]: [2, 4) 5 \|@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@\| [4, 8) 0 \| \| [8, 16) 1 \|@@@@@@@@@@ \| [16, 32) 2 \|@@@@@@@@@@@@@@@@@@@@ \| @exec_lat_us[z_wr_int_h, zio_execute]: [8, 16) 16 \|@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@\| [16, 32) 2 \|@@@@@@ \| @exec_lat_us[z_wr_iss_h, zio_execute]: [16, 32) 4 \|@@@@@@@@@@@@@@@@ \| [32, 64) 13 \|@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@\| [64, 128) 1 \|@@@@ \| @exec_lat_us[z_ioctl_int, zio_execute]: [2, 4) 1 \|@@@@ \| [4, 8) 11 \|@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@\| [8, 16) 8 \|@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ \| @exec_lat_us[dp_sync_taskq, sync_dnodes_task]: [2, 4) 1 \|@@@@@@ \| [4, 8) 7 \|@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ \| [8, 16) 8 \|@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@\| [16, 32) 2 \|@@@@@@@@@@@@@ \| [32, 64) 4 \|@@@@@@@@@@@@@@@@@@@@@@@@@@ \| [64, 128) 1 \|@@@@@@ \| [128, 256) 0 \| \| [256, 512) 1 \|@@@@@@ Here's some example output, showing latency information for time spent waiting on the taskq, prior to starting execution of entry's function: @queue_lat_us[dp_sync_taskq]: [2, 4) 1 \|@@@@ \| [4, 8) 7 \|@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ \| [8, 16) 2 \|@@@@@@@@ \| [16, 32) 3 \|@@@@@@@@@@@@@ \| [32, 64) 12 \|@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@\| [64, 128) 6 \|@@@@@@@@@@@@@@@@@@@@@@@@@@ \| [128, 256) 0 \| \| [256, 512) 1 \|@@@@ \| @queue_lat_us[z_wr_iss]: [4, 8) 4 \|@@@@ \| [8, 16) 13 \|@@@@@@@@@@@@@@@ \| [16, 32) 6 \|@@@@@@@ \| [32, 64) 2 \|@@ \| [64, 128) 12 \|@@@@@@@@@@@@@@ \| [128, 256) 15 \|@@@@@@@@@@@@@@@@@@ \| [256, 512) 33 \|@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ \| [512, 1K) 27 \|@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ \| [1K, 2K) 7 \|@@@@@@@@ \| [2K, 4K) 14 \|@@@@@@@@@@@@@@@@ \| [4K, 8K) 14 \|@@@@@@@@@@@@@@@@ \| [8K, 16K) 23 \|@@@@@@@@@@@@@@@@@@@@@@@@@@@ \| [16K, 32K) 43 \|@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@\| @queue_lat_us[z_wr_int]: [2, 4) 10 \|@@@@@ \| [4, 8) 71 \|@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ \| [8, 16) 88 \|@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@\| [16, 32) 50 \|@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ \| [32, 64) 65 \|@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ \| [64, 128) 43 \|@@@@@@@@@@@@@@@@@@@@@@@@@ \| [128, 256) 19 \|@@@@@@@@@@@ \| [256, 512) 3 \|@ \| [512, 1K) 1 \| \| Reviewed by: Brad Lewis <brad.lewis@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Prakash Surya <prakash.surya@delphix.com> Closes #9525	2019-11-01 13:14:54 -07:00
Prakash Surya	e5d1c27e30	Enable use of DTRACE_PROBE* macros in "spl" module This change modifies some of the infrastructure for enabling the use of the DTRACE_PROBE* macros, such that we can use tehm in the "spl" module. Currently, when the DTRACE_PROBE* macros are used, they get expanded to create new functions, and these dynamically generated functions become part of the "zfs" module. Since the "spl" module does not depend on the "zfs" module, the use of DTRACE_PROBE* in the "spl" module would result in undefined symbols being used in the "spl" module. Specifically, DTRACE_PROBE* would turn into a function call, and the function being called would be a symbol only contained in the "zfs" module; which results in a linker and/or runtime error. Thus, this change adds the necessary logic to the "spl" module, to mirror the tracing functionality available to the "zfs" module. After this change, we'll have a "trace_zfs.h" header file which defines the probes available only to the "zfs" module, and a "trace_spl.h" header file which defines the probes available only to the "spl" module. Reviewed by: Brad Lewis <brad.lewis@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Prakash Surya <prakash.surya@delphix.com> Closes #9525	2019-11-01 13:13:43 -07:00
Matthew Macy	4a2ed90013	Wrap Linux module macros MODULE_VERSION is already defined on FreeBSD. Wrap all of the used MODULE_* macros for the sake of consistency and portability. Add a user space noop version to reduce the need for _KERNEL ifdefs. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Matt Macy <mmacy@FreeBSD.org> Closes #9542	2019-11-01 10:41:03 -07:00
Matthew Macy	bd4dde8ef7	Prefix struct rangelock A struct rangelock already exists on FreeBSD. Add a zfs_ prefix as per our convention to prevent any conflict with existing symbols. This change is a follow up to `2cc479d0`. Reviewed-by: Matt Ahrens <matt@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Matt Macy <mmacy@FreeBSD.org> Closes #9534	2019-11-01 10:37:33 -07:00

1 2

78 Commits