mirror_zfs

mirror of https://git.proxmox.com/git/mirror_zfs.git synced 2024-12-27 03:19:35 +03:00

Author	SHA1	Message	Date
Rafael Kitover	8b8b44d06f	kernel timer API rework In `config/kernel-timer.m4` refactor slightly to check more generally for the new `timer_setup()` APIs, but also check the callback signature because some kernels (notably 4.14) have the new `timer_setup()` API but use the old callback signature. Also add a check for a `flags` member in `struct timer_list`, which was added in 4.1-rc8. Add compatibility shims to `include/spl/sys/timer.h` to allow using the new timer APIs with the only two caveats being that the callback argument type must be declared as `spl_timer_list_t` and an explicit assignment is required to get the timer variable for the `timer_of()` macro. So the callback would look like this: ```c __cv_wakeup(spl_timer_list_t t) { struct timer_list tmr = (struct timer_list )t; struct thing parent = from_timer(parent, tmr, parent_timer_field); ... / do stuff with parent */ ``` Make some minor changes to `spl-condvar.c` and `spl-taskq.c` to use the new timer APIs instead of conditional code. Reviewed-by: Tomohiro Kusumi <kusumi.tomohiro@gmail.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Rafael Kitover <rkitover@gmail.com> Closes #8647	2019-05-23 14:40:28 -07:00
Brian Behlendorf	93ce2b4ca5	Update build system and packaging Minimal changes required to integrate the SPL sources in to the ZFS repository build infrastructure and packaging. Build system and packaging: * Renamed SPL_* autoconf m4 macros to ZFS_. Removed redundant SPL_* autoconf m4 macros. * Updated the RPM spec files to remove SPL package dependency. * The zfs package obsoletes the spl package, and the zfs-kmod package obsoletes the spl-kmod package. * The zfs-kmod-devel* packages were updated to add compatibility symlinks under /usr/src/spl-x.y.z until all dependent packages can be updated. They will be removed in a future release. * Updated copy-builtin script for in-kernel builds. * Updated DKMS package to include the spl.ko. * Updated stale AUTHORS file to include all contributors. * Updated stale COPYRIGHT and included the SPL as an exception. * Renamed README.markdown to README.md * Renamed OPENSOLARIS.LICENSE to LICENSE. * Renamed DISCLAIMER to NOTICE. Required code changes: * Removed redundant HAVE_SPL macro. * Removed _BOOT from nvpairs since it doesn't apply for Linux. * Initial header cleanup (removal of empty headers, refactoring). * Remove SPL repository clone/build from zimport.sh. * Use of DEFINE_RATELIMIT_STATE and DEFINE_SPINLOCK removed due to build issues when forcing C99 compilation. * Replaced legacy ACCESS_ONCE with READ_ONCE. * Include needed headers for `current` and `EXPORT_SYMBOL`. Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Olaf Faaland <faaland1@llnl.gov> Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> TEST_ZIMPORT_SKIP="yes" Closes #7556	2018-05-29 16:00:33 -07:00
Brian Behlendorf	5461eefe50	Fix cstyle warnings This patch contains no functional changes. It is solely intended to resolve cstyle warnings in order to facilitate moving the spl source code in to the zfs repository. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #681	2018-02-07 11:49:38 -08:00
Tony Hutter	c9821f1ccc	Linux 4.15 compat: timer updates Use timer_setup() macro and new timeout function definition. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tony Hutter <hutter2@llnl.gov> Closes #670 Closes #671	2017-12-21 10:56:32 -08:00
James Cowgill	35a44fcb8d	Remove all spin_is_locked calls On systems with CONFIG_SMP turned off, spin_is_locked always returns false causing these assertions to fail. Remove them as suggested in zfsonlinux/zfs#6558. Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: James Cowgill <james.cowgill@mips.com> Closes #665	2017-10-30 11:16:56 -07:00
Boris Protopopov	9243b0fb47	Add assert under lock to detect cases of dispach of a preallocated taskq work item to more than one queue concurrently. Also, please see discussion in zfsonlinux/zfs#3840. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Boris Protopopov <boris.protopopov@actifio.com> Closes #609	2017-08-08 08:31:52 -07:00
Chunwei Chen	cce83ba0ec	Fix use-after-free in taskq_seq_show_impl taskq_seq_show_impl walks the tq_active_list to show the tqent_func and tqent_arg. However for taskq_dispatch_ent, it's very likely that the task entry will be freed during the function call, and causes a use-after-free bug. To fix this, we duplicate the task entry to an on-stack struct, and assign it instead to tqt_task. This way, the tq_lock alone will guarantee its safety. Reviewed-by: Tim Chase <tim@chase2k.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Chunwei Chen <david.chen@osnexus.com> Closes #638 Closes #640	2017-08-04 09:57:58 -07:00
Chunwei Chen	f200b83673	Add system_delay_taskq for long delay Add a dedicated system_delay_taskq for long delay like spa_deadman and zpl_posix_acl_free. This will allow us to use system_taskq in the manner of dispatch multiple tasks and call taskq_wait_outstanding. Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Chunwei Chen <david.chen@osnexus.com> Closes #588	2016-12-08 14:00:20 -07:00
Ubuntu	cbba714667	Add TASKQID_INVALID and TASKQID_INITIAL macros Add the TASKQID_INVALID and TASKQID_INITIAL macros and update the taskq implementation and test cases to use them. This is solely for the purposes of readability and introduces no functional change. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2016-11-02 10:34:19 -07:00
Chunwei Chen	b3a22a0a00	Fix taskq_wait_outstanding re-evaluate tq_next_id wait_event is a macro, so the current implementation will cause re- evaluation of tq_next_id every time it wakes up. This would cause taskq_wait_outstanding(tq, 0) to be equivalent to taskq_wait(tq) Signed-off-by: Chunwei Chen <david.chen@osnexus.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tim Chase <tim@chase2k.com> Issue #553	2016-05-24 13:02:10 -07:00
Chunwei Chen	5ce028b0d4	Fix race between taskq_destroy and dynamic spawning thread While taskq_destroy would wait for dynamic_taskq to finish its tasks, but it does not implies the thread being spawned is up and running. This will cause taskq to be freed before the thread can exit. We fix this by using tq_nspawn to indicate how many threads are being spawned before they are inserted to the thread list. And have taskq_destroy to wait for it to drop to zero. Signed-off-by: Chunwei Chen <david.chen@osnexus.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tim Chase <tim@chase2k.com> Issue #553 Closes #550	2016-05-24 13:00:17 -07:00
Tim Chase	7bb5d92de8	Allow spawning a new thread for TQ_NOQUEUE dispatch with dynamic taskq When a TQ_NOQUEUE dispatch is done on a dynamic taskq, allow another thread to be spawned. This will cause TQ_NOQUEUE to behave similarly as it does with non-dynamic taskqs. Add support for TQ_NOQUEUE to taskq_dispatch_ent(). Signed-off-by: Tim Chase <tim@onlight.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #530	2016-03-17 09:52:35 -07:00
Chunwei Chen	8f3b403a73	Allow kicking a taskq to spawn more threads This patch add a module parameter spl_taskq_kick. When writing non-zero value to it, it will scan all the taskq, if a taskq contains a task pending for more than 5 seconds, it will be forced to spawn a new thread. This is use as an emergency recovery from deadlock, not a general solution. Signed-off-by: Chunwei Chen <david.chen@osnexus.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #529	2016-02-05 14:08:31 -08:00
Chunwei Chen	16522ac290	Use tsd to store tq for taskq_member To prevent taskq_member holding tq_lock and doing linear search, thus causing contention. We store the taskq pointer to which the thread belongs in tsd. This way taskq_member will not need to touch tq_lock, and tsd has per slot spinlock. So the contention should be reduced greatly. Signed-off-by: Chunwei Chen <david.chen@osnexus.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #500 Closes #504 Closes #505	2016-01-20 13:07:45 -08:00
Chunwei Chen	b4ad50ac5f	Use spl_fstrans_mark instead of memalloc_noio_save For earlier versions of the kernel with memalloc_noio_save, it only turns off __GFP_IO but leaves __GFP_FS untouched during direct reclaim. This would cause threads to direct reclaim into ZFS and cause deadlock. Instead, we should stick to using spl_fstrans_mark. Since we would explicitly turn off both __GFP_IO and __GFP_FS before allocation, it will work on every version of the kernel. This impacts kernel versions 3.9-3.17, see upstream kernel commit torvalds/linux@934f307 for reference. Signed-off-by: Chunwei Chen <david.chen@osnexus.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tim Chase <tim@chase2k.com> Closes #515 Issue zfsonlinux/zfs#4111	2015-12-18 13:24:52 -08:00
Tim Chase	200366f23f	Provide kstat for taskqs This patch provides 2 new kstats to display task queues: /proc/spl/taskqs-all - Display all task queues /proc/spl/taskqs - Display only "active" task queues A task queue is considered to be "active" if it currently has active (running) threads or if any of its pending, priority, delay or waitq lists are not empty. If the task queue has running threads, displays each thread function's address (symbolically, if possibly) and its argument. If the task queue has a non-empty list of pending, priority or delayed task queue entries (taskq_ent_t), displays each entry's thread function address and arguemnt. If the task queue has any waiters, displays each waiting task's pid. Note: This patch also updates some comments in taskq.h which referred to "taskq_t" when they should have referred to "taskq_ent_t". Signed-off-by: Tim Chase <tim@chase2k.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #491	2015-12-16 09:35:22 -08:00
Brian Behlendorf	2c4332cf79	Fix cstyle issues in spl-taskq.c and taskq.h This patch only addresses the issues identified by the style checker. It contains no functional changes. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2015-12-11 16:20:22 -08:00
Chunwei Chen	066b89e685	Don't use tq->tq_lock_flags The flags argument in spin_lock_irqsave is modified out side of spin_lock context. We cannot use a shared variable like tq->tq_lock_flags for them. This patch removes it and uses local variable for the flags. Signed-off-by: Chunwei Chen <david.chen@osnexus.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #506	2015-12-11 16:20:03 -08:00
Olaf Faaland	326172d854	Subclass tq_lock to eliminate a lockdep warning When taskq_dispatch() calls taskq_thread_spawn() to create a new thread for a taskq, linux lockdep warns of possible recursive locking. This is a false positive. One such call chain is as follows, when a taskq needs more threads: taskq_dispatch->taskq_thread_spawn->taskq_dispatch The initial taskq_dispatch() holds tq_lock on the taskq that needed more worker threads. The later call into taskq_dispatch() takes dynamic_taskq->tq_lock. Without subclassing, lockdep believes these could potentially be the same lock and complains. A similar case occurs when taskq_dispatch() then calls task_alloc(). This patch uses spin_lock_irqsave_nested() when taking tq_lock, with one of two new lock subclasses: subclass taskq TQ_LOCK_DYNAMIC dynamic_taskq TQ_LOCK_GENERAL any other Signed-off-by: Olaf Faaland <faaland1@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #480	2015-12-11 16:19:56 -08:00
Brian Behlendorf	c5a8b1e163	Revert "Make taskq_member() use ->journal_info" This reverts commit `a430c11f0b`. Using journal_info like this can cause a BUG at kernel fs/jbd2/transaction.c:425! Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #500	2015-12-08 17:12:36 -08:00
Richard Yao	a430c11f0b	Make taskq_member() use ->journal_info The ->journal_info pointer in the task_struct is reserved for use by filesystems and because the kernel can have multiple file systems on the same stack due to direct reclaim, each filesystem that touches ->journal_info in a callback function will save the value at the start of its frame and restore it at the end of its frame. This allows us to safely use ->journal_info to store a pointer to the taskq's struct in taskq threads so that ZFS code paths can detect the presence of a taskq. This could break if the ZFS code were to use taskq_member from the context of direct reclaim. However, there are no such uses of it in that manner, so this is safe. This eliminates an O(N) list traversal under a spinlock with an O(1) unlocked pointer comparison. Signed-off-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: tuxoko <tuxoko@gmail.com> Signed-off-by: Tim Chase <tim@chase2k.com> Closes #500	2015-12-08 13:24:47 -08:00
tuxoko	f5f2b87df0	Fix taskq dynamic spawning Currently taskq_dispatch() will spawn new task with a condition that the caller is also a member of the taskq. However, under this condition, it will still cause deadlock where a task on tq1 is waiting another thread, who is trying to dispatch a task on tq1. So this patch removes the check. For example when you do: zfs send pp/fs0@001 \| zfs recv pp/fs0_copy This will easily deadlock before this patch. Also, move the seq_task check from taskq_thread_spawn() to taskq_thread() because it's not used by the caller from taskq_dispatch(). Signed-off-by: Chunwei Chen <david.chen@osnexus.com> Signed-off-by: Tim Chase <tim@chase2k.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #496	2015-11-13 15:02:55 -08:00
Richard Yao	d4bf6d8429	Disable direct reclaim in taskq worker threads on Linux 3.9+ Illumos does not have direct reclaim and code run inside taskq worker threads is not designed to deal with it. Allowing direct reclaim inside a worker thread can therefore deadlock. We set PF_MEMALLOC_NOIO through memalloc_noio_save() to indicate to the kernel's reclaim code that we are inside a context where memory allocations cannot be allowed to block on filesystem activity. Signed-off-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue zfsonlinux/zfs#1274 Issue zfsonlinux/zfs#2390 Closes #474	2015-09-09 14:30:47 -07:00
Tim Chase	a64e55752f	Create a new thread during recursive taskq dispatch if necessary When dynamic taskq is enabled and all threads for a taskq are occupied, a recursive dispatch can cause a deadlock if calling thread depends on the recursively-dispatched thread for its return condition. This patch attempts to create a new thread for recursive dispatch when none are available. Signed-off-by: Tim Chase <tim@chase2k.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #472	2015-09-01 08:46:41 -07:00
Brian Behlendorf	801b56090b	Revert "Create a new thread during recursive taskq dispatch if necessary" This reverts commit `076821e` due to a locking issue uncovered in subsequent testing. An ASSERT is hit due to tq->tq_nspawn being updated outside the lock. The patch will need to be reworked. VERIFY3(0 == tq->tq_nspawn) failed (0 == -1) Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #472	2015-08-31 17:03:01 -07:00
Tim Chase	076821eaff	Create a new thread during recursive taskq dispatch if necessary When dynamic taskq is enabled and all threads for a taskq are occupied, a recursive dispatch can cause a deadlock if calling thread depends on the recursively-dispatched thread for its return condition. This patch attempts to create a new thread for recursive dispatch when none are available. Signed-off-by: Tim Chase <tim@chase2k.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #472	2015-08-31 15:52:06 -07:00
Brian Behlendorf	9dc5ffbec8	Invert minclsyspri and maxclsyspri On Linux the meaning of a processes priority is inverted with respect to illumos. High values on Linux indicate a _low_ priority while high value on illumos indicate a _high_ priority. In order to preserve the logical meaning of the minclsyspri and maxclsyspri macros when they are used by the illumos wrapper functions their values have been inverted. This way when changes are merged from upstream illumos we won't need to remember to invert the macro. It could also lead to confusion. Note this change also reverts some of the priorities changes in prior commit `62aa81a`. The rational is as follows: spl_kmem_cache - High priority may result in blocked memory allocs spl_system_taskq - May perform I/O for file backed VDEVs spl_dynamic_taskq - New taskq threads should be spawned promptly Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ned Bass <bass6@llnl.gov> Issue zfsonlinux/zfs#3607	2015-07-28 13:59:03 -07:00
Brian Behlendorf	62aa81a577	Add defclsyspri macro Add a new defclsyspri macro which can be used to request the default Linux scheduler priority. Neither the minclsyspri or maxclsyspri map to the default Linux kernel thread priority. This makes it awkward to create taskqs which run with the same priority as the rest of the kernel threads on the system which can lead to performance issues. All SPL callers which previously used minclsyspri or maxclsyspri have been changed to use defclsyspri. The vast majority of callers were part of the test suite which won't have an external impact. The few places where it could impact performance the change was from maxclsyspri to defclsyspri. This makes it more likely the process will be scheduled which may help performance. To facilitate further performance analysis the spl_taskq_thread_priority module option has been added. When disabled (0) all newly created kernel threads will use the default kernel thread priority. When enabled (1) the specified taskq priority will be used. By default this value is enabled (1). Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2015-07-23 13:25:49 -07:00
Brian Behlendorf	3c82160ff2	Set TASKQ_DYNAMIC for kmem and system taskqs Add the TASKQ_DYNAMIC flag to the kmem_cache and system taskqs to reduce the number of idle threads on the system. Additional threads will be created on demand up to the previous maximum thread counts. This should have minimal, if any, impact on performance. This makes the system taskq consistent with illumos which is always created as a dynamic taskq with up to 64 threads. The task limits for the kmem_cache have been increased to avoid any unnessisary throttling and to keep a larger reserve of task_t structures on the free list. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tim Chase <tim@chase2k.com> Closes #458	2015-06-24 15:14:25 -07:00
Brian Behlendorf	f7a973d99b	Add TASKQ_DYNAMIC feature Setting the TASKQ_DYNAMIC flag will create a taskq with dynamic semantics. Initially only a single worker thread will be created to service tasks dispatched to the queue. As additional threads are needed they will be dynamically spawned up to the max number specified by 'nthreads'. When the threads are no longer needed, because the taskq is empty, they will automatically terminate. Due to the low cost of creating and destroying threads under Linux by default new threads and spawned and terminated aggressively. There are two modules options which can be tuned to adjust this behavior if needed. * spl_taskq_thread_sequential - The number of sequential tasks, without interruption, which needed to be handled by a worker thread before a new worker thread is spawned. Default 4. * spl_taskq_thread_dynamic - Provides the ability to completely disable the use of dynamic taskqs on the system. This is provided for the purposes of debugging and troubleshooting. Default 1 (enabled). This behavior is fundamentally consistent with the dynamic taskq implementation found in both illumos and FreeBSD. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tim Chase <tim@chase2k.com> Closes #458	2015-06-24 15:14:18 -07:00
Chris Dunlop	a876b0305e	Make taskq_wait() block until the queue is empty Under Illumos taskq_wait() returns when there are no more tasks in the queue. This behavior differs from ZoL and FreeBSD where taskq_wait() returns when all the tasks in the queue at the beginning of the taskq_wait() call are complete. New tasks added whilst taskq_wait() is running will be ignored. This difference in semantics makes it possible that new subtle issues could be introduced when porting changes from Illumos. To avoid that possibility the taskq_wait() function is being updated such that it blocks until the queue in empty. The previous behavior remains available through the taskq_wait_outstanding() interface. Note that this function was previously called taskq_wait_all() but has been renamed to avoid confusion. Signed-off-by: Chris Dunlop <chris@onthe.net.au> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #455	2015-06-09 12:20:12 -07:00
Brian Behlendorf	8d9a23e82c	Retire legacy debugging infrastructure When the SPL was originally written Linux tracepoints were still in their infancy. Therefore, an entire debugging subsystem was added to facilite tracing which served us well for many years. Now that Linux tracepoints have matured they provide all the functionality of the previous tracing subsystem. Rather than maintain parallel functionality it makes sense to fully adopt tracepoints. Therefore, this patch retires the legacy debugging infrastructure. See zfsonlinux/zfs@bc9f413 for the tracepoint changes. Signed-off-by: Ned Bass <bass6@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #408	2014-11-19 10:35:07 -08:00
Andrey Vesnovaty	703371d8c7	Evenly distribute the taskq threads across available CPUs The problem is described in commit `aeeb4e0c0a`. However, instead of disabling the binding to CPU altogether we just keep the last CPU index across calls to taskq_create() and thus achieve even distribution of the taskq threads across all available CPUs. The implementation based on assumption that task queues initialization performed in serial manner. Signed-off-by: Andrey Vesnovaty <andrey.vesnovaty@gmail.com> Signed-off-by: Andrey Vesnovaty <andreyv@infinidat.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #336	2014-04-25 15:29:18 -07:00
Tim Chase	17a527cb0f	Support post-3.13 kthread_create() semantics. Provide spl_kthread_create() as a wrapper to the kernel's kthread_create() to provide pre-3.13 semantics. Re-try if the call is interrupted or if it would have returned -ENOMEM. Otherwise return NULL. Signed-off-by: Chunwei Chen <tuxoko@gmail.com> Signed-off-by: Tim Chase <tim@chase2k.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #339	2014-04-08 12:44:42 -07:00
Brian Behlendorf	aeeb4e0c0a	Remove default taskq thread to CPU bindings When this code was written it appears to have been assumed that every taskq would have a large number of threads. In this case it would make sense to attempt to evenly bind the threads over all available CPUs. However, it failed to consider that creating taskqs with a small number of threads will cause the CPUs with lower ids become over-subscribed. For this reason the kthread_bind() call is being removed and we're leaving the kernel to schedule these threads as it sees fit. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #325	2014-01-07 10:46:24 -08:00
Brian Behlendorf	99c452bbba	Fix taskq_wait_id() The existing taskq_wait_id() function can incorrectly block indefinitely. Reimplement it more simply using wait_event() in a similar fashion to taskq_wait_all(). This flaw was uncovered in the context of moving vn_rdwr() to a taskq. Previously taskq_wait_id() had no consumers outside the SPLAT task framework which is why the issue went unnoticed. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2013-05-03 14:32:29 -07:00
Ned Bass	3d6af2dd6d	Refresh links to web site Update links to refer to the official ZFS on Linux website instead of @behlendorf's personal fork on github. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2013-03-04 19:09:34 -08:00
Brian Behlendorf	d9acd930b5	taskq delay/cancel functionality Add the ability to dispatch a delayed task to a taskq. The desired behavior is for the task to be queued but not executed by a worker thread until the expiration time is reached. To achieve this two new functions were added. * taskq_dispatch_delay() - This function behaves exactly like taskq_dispatch() however it takes a third 'expire_time' argument. The caller should pass the desired time the task should be executed as an absolute value in jiffies. The task is guarenteed not to run before this time, it may run slightly latter if all the worker threads are busy. * taskq_cancel_id() - Given a task id attempt to cancel the task before it gets executed. This is primarily useful for canceling delay tasks but can be used for canceling any previously dispatched task. There are three possible return values. 0 - The task was found and canceled before it was executed. ENOENT - The task was not found, either it was already run or an invalid task id was supplied by the caller. EBUSY - The task is currently executing any may not be canceled. This function will block until the task has been completed. * taskq_wait_all() - The taskq_wait_id() function was renamed taskq_wait_all() to more clearly reflect its actual behavior. It is only curreny used by the splat taskq regression tests. * taskq_wait_id() - Historically, the only difference between this function and taskq_wait() was that you passed the task id. In both functions you would block until ALL lower task ids which executed. This was semantically correct but could be very slow particularly if there were delay tasks submitted. To better accomidate the delay tasks this function was reimplemnted. It will now only block until the passed task id has been completed. This is actually a fairly low risk change for a few reasons. * Only new ZFS callers will make use of the new interfaces and very little common code was changed to support the new functions. * The existing taskq_wait() implementation was not changed just slightly refactored. * The newly optimized taskq_wait_id() implementation was never used by ZFS we can't accidentally introduce a new bug there. NOTE: This functionality does not exist in the Illumos taskqs. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2012-12-12 09:54:07 -08:00
Brian Behlendorf	aed8671cb0	taskq style, remove #define wrappers When the taskq implementation was originally written I wrapped all the API functions in #define's. This was done as a preventative measure to ensure that a taskq symbol never conflicted with an existing kernel symbol. However, in practice the taskq symbols never conflicted. The only major conflicts occured with the kmem cache API. Since this added layer of obfuscation never bought us anything for the taskq's I'm removing it. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2012-12-12 09:54:07 -08:00
Brian Behlendorf	472a34caff	taskq style, convert spaces to soft tabs Update the taskq implementation to conform with the style used throughout the rest of the code. There are no functional changes in this commit. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2012-12-12 09:54:07 -08:00
Brian Behlendorf	9b51f21841	Remove TQ_SLEEP -> KM_SLEEP mapping When the taskq code was originally written it seemed like a good idea to simply map TQ_SLEEP to KM_SLEEP. Unfortunately, this assumed that the TQ_* flags would never confict with any of the Linux GFP_* flags. When adding the TQ_PUSHPAGE support in commit `cd5ca4b` this invariant was accidentally broken. Therefore to support TQ_PUSHPAGE, which is needed for Linux, and prevent any further confusion I have removed this direct mapping. The TQ_SLEEP, TQ_NOSLEEP, and TQ_PUSHPAGE are no longer defined in terms of their KM_* counterparts. Instead a simple mapping function is introduce to convert TQ_* -> KM_* where needed. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #171	2012-09-12 11:41:42 -07:00
Brian Behlendorf	330fe010e4	Revert "Switch KM_SLEEP to KM_PUSHPAGE" This reverts commit `cd5ca4b2f8` due to conflicts in the higher TQ_ bits which caused incorrect behavior. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2012-09-12 10:07:48 -07:00
Brian Behlendorf	cd5ca4b2f8	Switch KM_SLEEP to KM_PUSHPAGE Under certain circumstances the following functions may be called in a context where KM_SLEEP is unsafe and can result in a deadlocked system. To avoid this problem the unconditional KM_SLEEPs are converted to KM_PUSHPAGEs. This will prevent them from attempting to initiate any I/O during direct reclaim. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2012-08-27 12:00:55 -07:00
Brian Behlendorf	d47e664ad4	Revert "Add TASKQ_NORECLAIM flag" This reverts commit `372c257233`. The use of the PF_MEMALLOC flag was always a hack to work around memory reclaim deadlocks. Those issues are believed to be resolved so this workaround can be safely reverted. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2012-08-27 12:00:42 -07:00
Prakash Surya	05b8f50c33	Update a comment to reflect new taskq internals As of the removal of the taskq work list made in commit: commit `2c02b71b14` Author: Prakash Surya <surya1@llnl.gov> Date: Mon Dec 5 17:32:48 2011 -0800 Replace tq_work_list and tq_threads in taskq_t To lay the ground work for introducing the taskq_dispatch_prealloc() interface, the tq_work_list and tq_threads fields had to be replaced with new alternatives in the taskq_t structure. the comment above taskq_wait_check has been incorrect. This change is an attempt at bringing that description more in line with the current implementation. Essentially, references to the old task work list had to be updated to reference the new taskq thread active list. Signed-off-by: Prakash Surya <surya1@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #65	2012-04-30 10:49:15 -07:00
Ned Bass	3c6ed5410b	Taskq locking optimizations Testing has shown that tq->tq_lock can be highly contended when a large number of small work items are dispatched. The lock hold time is reduced by the following changes: 1) Use exclusive threads in the work_waitq When a single work item is dispatched we only need to wake a single thread to service it. The current implementation uses non-exclusive threads so all threads are woken when the dispatcher calls wake_up(). If a large number of threads are in the queue this overhead can become non-negligible. 2) Conditionally add/remove threads from work waitq Taskq threads need only add themselves to the work wait queue if there are no pending work items. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #32	2012-01-19 14:42:49 -08:00
Ned Bass	0bb43ca282	Revert "Taskq locking optimizations" This reverts commit `ec2b41049f`. A race condition was introduced by which a wake_up() call can be lost after the taskq thread determines there is no pending work items, leading to deadlock: 1. taksq thread enables interrupts 2. dispatcher thread runs, queues work item, call wake_up() 3. taskq thread runs, adds self to waitq, sleeps This could easily happen if an interrupt for an IO completion was outstanding at the point where the taskq thread reenables interrupts, just before the call to add_wait_queue_exclusive(). The handler would run immediately within the race window. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #32	2012-01-19 14:42:39 -08:00
Ned Bass	ec2b41049f	Taskq locking optimizations Testing has shown that tq->tq_lock can be highly contended when a large number of small work items are dispatched. The lock hold time is reduced by the following changes: 1) Use exclusive threads in the work_waitq When a single work item is dispatched we only need to wake a single thread to service it. The current implementation uses non-exclusive threads so all threads are woken when the dispatcher calls wake_up(). If a large number of threads are in the queue this overhead can become non-negligible. 2) Conditionally add/remove threads from work waitq outside of tq_lock Taskq threads need only add themselves to the work wait queue if there are no pending work items. Furthermore, the add and remove function calls can be made outside of the taskq lock since the wait queues are protected from concurrent access by their own spinlocks. 3) Call wake_up() outside of tq->tq_lock Again, the wait queues are protected by their own spinlock, so the dispatcher functions can drop tq->tq_lock before calling wake_up(). A new splat test taskq:contention was added in a prior commit to measure the impact of these changes. The following table summarizes the results using data from the kernel lock profiler. tq_lock time %diff Wall clock (s) %diff original: 39117614.10 0 41.72 0 exclusive threads: 31871483.61 18.5 34.2 18.0 unlocked add/rm waitq: 13794303.90 64.7 16.17 61.2 unlocked wake_up(): 1589172.08 95.9 16.61 60.2 Each row reflects the average result over 5 test runs. /proc/lock_stats was zeroed out before and collected after each run. Column 1 is the cumulative hold time in microseconds for tq->tq_lock. The tests are cumulative; each row reflects the code changes of the previous rows. %diff is calculated with respect to "original" as 100*(orig-new)/orig. Although calling wake_up() outside of the taskq lock dramatically reduced the taskq lock hold time, the test actually took slightly more wall clock time. This is because the point of contention shifts from the taskq lock to the wait queue lock. But the change still seems worthwhile since it removes our taskq implementation as a bottleneck, assuming the small increase in wall clock time to be statistical noise. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #32	2012-01-18 10:36:57 -08:00
Prakash Surya	8f2503e0af	Store copy of tqent_flags prior to servicing task A preallocated taskq_ent_t's tqent_flags must be checked prior to servicing the taskq_ent_t. Once a preallocated taskq entry is serviced, the ownership of the entry is handed back to the caller of taskq_dispatch, thus the entry's contents can potentially be mangled. In particular, this is a problem in the case where a preallocated taskq entry is serviced, and the caller clears it's tqent_flags field. Thus, when the function returns and task_done is called, it looks as though the entry is not a preallocated task (when in fact it is a preallocated task). In this situation, task_done will place the preallocated taskq_ent_t structure onto the taskq_t's free list. This is a huge mistake. If the taskq_ent_t is then freed by the caller of taskq_dispatch, the taskq_t's free list will hold a pointer to garbage data. Even worse, if nothing has over written the freed memory before the pointer is dereferenced, it may still look as though it points to a valid list_head belonging to a taskq_ent_t structure. Thus, the task entry's flags are now copied prior to servicing the task. This copy is then checked to see if it is a preallocated task, and determine if the entry needs to be passed down to the task_done function. Signed-off-by: Prakash Surya <surya1@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #71	2011-12-16 16:54:00 -08:00
Prakash Surya	e7e5f78e7b	Swap taskq_ent_t with taskqid_t in taskq_thread_t The taskq_t's active thread list is sorted based on its tqt_ent->tqent_id field. The list is kept sorted solely by inserting new taskq_thread_t's in their correct sorted location; no other means is used. This means that once inserted, if a taskq_thread_t's tqt_ent->tqent_id field changes, the list runs the risk of no longer being sorted. Prior to the introduction of the taskq_dispatch_prealloc() interface, this was not a problem as a taskq_ent_t actively being serviced under the old interface should always have a static tqent_id field. Thus, once the taskq_thread_t is added to the taskq_t's active thread list, the taskq_thread_t's tqt_ent->tqent_id field would remain constant. Now, this is no longer the case. Currently, if using the taskq_dispatch_prealloc() interface, any given taskq_ent_t actively being serviced _may_ have its tqent_id value incremented. This happens when the preallocated taskq_ent_t structure is recursively dispatched. Thus, a taskq_thread_t could potentially have its tqt_ent->tqent_id field silently modified from under its feet. If this were to happen to a taskq_thread_t on a taskq_t's active thread list, this would compromise the integrity of the order of the list (as the list _may_ no longer be sorted). To get around this, the taskq_thread_t's taskq_ent_t pointer was replaced with its own static copy of the tqent_id. So, as a taskq_ent_t is pulled off of the taskq_t's pending list, a static copy of its tqent_id is made and this copy is used to sort the active thread list. Using a static copy is key in ensuring the integrity of the order of the active thread list. Even if the underlying taskq_ent_t is recursively dispatched (as has its tqent_id modified), this static copy stored inside the taskq_thread_t will remain constant. Signed-off-by: Prakash Surya <surya1@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #71	2011-12-16 13:26:54 -08:00

1 2

67 Commits