Setting the TASKQ_DYNAMIC flag will create a taskq with dynamic
semantics. Initially only a single worker thread will be created
to service tasks dispatched to the queue. As additional threads
are needed they will be dynamically spawned up to the max number
specified by 'nthreads'. When the threads are no longer needed,
because the taskq is empty, they will automatically terminate.
Due to the low cost of creating and destroying threads under Linux
by default new threads and spawned and terminated aggressively.
There are two modules options which can be tuned to adjust this
behavior if needed.
* spl_taskq_thread_sequential - The number of sequential tasks,
without interruption, which needed to be handled by a worker
thread before a new worker thread is spawned. Default 4.
* spl_taskq_thread_dynamic - Provides the ability to completely
disable the use of dynamic taskqs on the system. This is provided
for the purposes of debugging and troubleshooting. Default 1
(enabled).
This behavior is fundamentally consistent with the dynamic taskq
implementation found in both illumos and FreeBSD.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tim Chase <tim@chase2k.com>
Closes#458
Under Illumos taskq_wait() returns when there are no more tasks
in the queue. This behavior differs from ZoL and FreeBSD where
taskq_wait() returns when all the tasks in the queue at the
beginning of the taskq_wait() call are complete. New tasks
added whilst taskq_wait() is running will be ignored.
This difference in semantics makes it possible that new subtle
issues could be introduced when porting changes from Illumos.
To avoid that possibility the taskq_wait() function is being
updated such that it blocks until the queue in empty.
The previous behavior remains available through the
taskq_wait_outstanding() interface. Note that this function
was previously called taskq_wait_all() but has been renamed
to avoid confusion.
Signed-off-by: Chris Dunlop <chris@onthe.net.au>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#455
Slightly increasing the size of a kmutex_t has caused us to exceed
the stack frame warning size in splat_taskq_test2_impl(). To address
this the tq_args have been moved to the heap.
cc1: warnings being treated as errors
spl-0.6.3/module/splat/splat-taskq.c:358:
error: the frame size of 1040 bytes is larger than 1024 bytes
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tim Chase <tim@chase2k.com>
Issue #435
This change introduces no functional changes to the memory management
interfaces. It only restructures the existing codes by separating the
kmem, vmem, and kmem cache implementations in the separate source and
header files.
Splitting this functionality in to separate files required the addition
of spl_vmem_{init,fini}() and spl_kmem_cache_{initi,fini}() functions.
Additionally, several minor changes to the #include's were required to
accommodate the removal of extraneous header from kmem.h.
But again, while large this patch introduces no functional changes.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
When running the SPLAT tests on a kernel with CONFIG_DEBUG_OBJECTS=y
enabled the following warning is generated.
ODEBUG: object is on stack, but not annotated
WARNING: at lib/debugobjects.c:300 __debug_object_init+0x221/0x480()
This is caused by the test cases placing a debug object on the stack
rather than the heap. This isn't harmful since they are small objects
but to make CONFIG_DEBUG_OBJECTS=y happy the objects have been relocated
to the heap. This impacted taskq tests 1, 3, and 7.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#424
Don't include the compatibility code in linux/*_compat.h in the public
header sys/types.h. This causes problems when an external code base
includes the ZFS headers and has its own conflicting compatibility code.
Lustre, in particular, defined SHRINK_STOP for compatibility with
pre-3.12 kernels in a way that conflicted with the SPL's definition.
Because Lustre ZFS OSD includes ZFS headers it fails to build due to a
'"SHRINK_STOP" redefined' compiler warning. To avoid such conflicts
only include the compat headers from .c files or private headers.
Also, for consistency, include sys/*.h before linux/*.h then sort by
header name.
Signed-off-by: Ned Bass <bass6@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#411
While running SPLAT on a kernel with CONFIG_DEBUG_ATOMIC_SLEEP
enabled the taskq:front was flagged as a test which might sleep
which in an unsafe context. Specifically, the splat_vprint()
function which internally takes a mutex was being called under
a spin lock. Moving the log function outside the spin lock
cleanly solves this issue.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
When comparing times gotten from ddi_get_lbolt, we have to take account of
wrap around of jiffies. Therefore, we cannot use 't1 < t2'. Instead we should
use 't1 - t2 < 0'.
This patch add ddi_time_after and friends to address this issue. They have
strict type restriction, clock_t for vanilla and int64_t for 64 version, to
prevent type conversion from screwing things.
Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#335
Update links to refer to the official ZFS on Linux website instead of
@behlendorf's personal fork on github.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
The slightly increased size of the taskq_ent_t when debugging is
enabled has pushed the taskq:front splat test over frame size
limit. To resolve this dynamically allocate the taskq_ent_t
structures so they are part of the heap instead of the stack.
In function 'splat_taskq_test6_impl'
error: the frame size of 1648 bytes is larger than 1024 bytes
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
The slightly increased size of the taskq_ent_t when debugging is
enabled has pushed the taskq:order splat test over frame size
limit. To resolve this dynamically allocate the taskq_ent_t
structures so they are part of the heap instead of the stack.
In function 'splat_taskq_test5_impl'
error: the frame size of 1680 bytes is larger than 1024 bytes
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Add a test case for taskq_cancel_id() to verify it is working
properly. Just like taskq:delay we start by dispatching 100
tasks. However this time 1/3 of the tasks use taskq_dispatch()
and will be run immediately, and 2/3 use taskq_dispatch_delay().
The idea is to create a busy taskq with both active, pending,
and delayed tasks.
After all the items have been successfully dispatched the test
begins randomly canceling known task ids. It will do this for
5 seconds randomly canceling a task id and then sleeping for a
few milliseconds. The task being canceled may have already run,
still be on the pending list, or may be currently being executed
by a worker thread. The idea is to ensure we catch any subtle
race conditions.
Once all the non-canceled tasks have completed we cross check
the number of tasks which ran with the number of tasks which
were successfully canceled. Additionally, we verify that the
taskq_cancel_id() function never blocks longer than needed.
This time is bounded by the longest run time of the task which
was dispatched.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Add a test case for taskq_dispatch_delay() to verify it is working
properly. The test dispatchs 100 tasks to a taskq with random
expiration times spread over 5 seconds. As each task expires and
gets executed by a worker thread it verifies that it was run at
the correct time. Once all the delayed tasks have been executed
we double check that all the dispatched tasks were successful.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Add the ability to dispatch a delayed task to a taskq. The desired
behavior is for the task to be queued but not executed by a worker
thread until the expiration time is reached. To achieve this two
new functions were added.
* taskq_dispatch_delay() -
This function behaves exactly like taskq_dispatch() however it
takes a third 'expire_time' argument. The caller should pass the
desired time the task should be executed as an absolute value in
jiffies. The task is guarenteed not to run before this time, it
may run slightly latter if all the worker threads are busy.
* taskq_cancel_id() -
Given a task id attempt to cancel the task before it gets executed.
This is primarily useful for canceling delay tasks but can be used for
canceling any previously dispatched task. There are three possible
return values.
0 - The task was found and canceled before it was executed.
ENOENT - The task was not found, either it was already run or an
invalid task id was supplied by the caller.
EBUSY - The task is currently executing any may not be canceled.
This function will block until the task has been completed.
* taskq_wait_all() -
The taskq_wait_id() function was renamed taskq_wait_all() to more
clearly reflect its actual behavior. It is only curreny used by
the splat taskq regression tests.
* taskq_wait_id() -
Historically, the only difference between this function and
taskq_wait() was that you passed the task id. In both functions you
would block until ALL lower task ids which executed. This was
semantically correct but could be very slow particularly if there
were delay tasks submitted.
To better accomidate the delay tasks this function was reimplemnted.
It will now only block until the passed task id has been completed.
This is actually a fairly low risk change for a few reasons.
* Only new ZFS callers will make use of the new interfaces and
very little common code was changed to support the new functions.
* The existing taskq_wait() implementation was not changed just
slightly refactored.
* The newly optimized taskq_wait_id() implementation was never
used by ZFS we can't accidentally introduce a new bug there.
NOTE: This functionality does not exist in the Illumos taskqs.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
The taskq:front test has a race condition where task 4 and 8
race to complete, due to an incorrectly calculated set of delay
"factors" (T). If task 4 wins and actually finishes first, the
verification of the order of completion will fail.
The delays calculated to order task completion do not take into
account the terminal line in the table, and so are all off by
a factor of 1. This causes all the tasks in all queues to finish
sooner than expected and the accumulated error is the root cause
of tasks 4 and 8 racing to complete first. Before the change the
"actual" table looks like I commented in #130.
I changed:
* the table in the comment to correctly reflect the test and the
factor timings needed.
* the individual task delay factors of T so that ONLY 1 task will
every 2T. (on average)
* 1T was reduced from 100ms to 50ms. This halves the duration of
the test and makes any remaining raciness more likely to cause
failures, but it did not cause the test to fail.
* simplified the delay factor logic by using a table look-up
instead of a switch.
* Added a "task started" message so that with -v it is possible
to see the order tasks are started.
* Moved the "task completed" message inside the spinlock so that
with -v the message truly reflects the absolute order of
completion as guaranteed by the spinlock.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#130
Restructure the the SPLAT headers such that each test only
includes the minimal set of headers it requires.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Add a test designed to generate contention on the taskq spinlock by
using a large number of threads (100) to perform a large number (131072)
of trivial work items from a single queue. This simulates conditions
that may occur with the zio free taskq when a 1TB file is removed from a
ZFS filesystem, for example. This test should always pass. Its purpose
is to provide a benchmark to easily measure the effectiveness of taskq
optimizations using statistics from the kernel lock profiler.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #32
The splat-taskq test functions were slightly modified to exercise
the new taskq interface in addition to the old interface. If the
old interface passes each of its tests, the new interface is
exercised. Both sub tests (old interface and new interface) must
pass for each test as a whole to pass.
Signed-off-by: Prakash Surya <surya1@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#65
Added another splat taskq test to ensure tasks can be recursively
submitted to a single task queue without issue. When the
taskq_dispatch_prealloc() interface is introduced, this use case
can potentially cause a deadlock if a taskq_ent_t is dispatched
while its tqent_list field is not empty. This _should_ never be
a problem with the existing taskq_dispatch() interface.
Signed-off-by: Prakash Surya <surya1@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #65
This change adds the neglected SPLAT_TEST_FINI call for the
SPLAT_TASKQ_TEST6_ID, just as is done for the other 5 SPLAT_TASKQ_*
tests.
Signed-off-by: Prakash Surya <surya1@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#64
The splat_taskq_test4_common function was incorrectly referencing
the splat_taskq-test13_func symbol, when it meant to be using the
splat_taskq_test4_func symbol.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#61
When TQ_SLEEP is used, taskq_dispatch() should always succeed even if the
number of pending tasks is above tq->tq_maxalloc. This semantic is similar
to KM_SLEEP in kmem allocations, which also always succeed.
However, we cannot block forever otherwise there is a risk of deadlock.
Therefore, we still allow the number of pending tasks to go above
tq->tq_maxalloc with TQ_SLEEP, but we may sleep up to 1 second per task
dispatch, thereby throttling the task dispatch rate.
One of the existing splat tests was also augmented to test for this scenario.
The test would fail with the previous implementation but now it succeeds.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Use 3 threads and 8 tasks. Dispatch the final 3 tasks with TQ_FRONT.
The first three tasks keep the worker threads busy while we stuff the
queues. Use msleep() to force a known execution order, assuming
TQ_FRONT is properly honored. Verify that the expected completion
order occurs.
The splat_taskq_test5_order() function may be useful in more than
one test. This commit generalizes it by renaming the function to
splat_taskq_test_order() and adding a name argument instead of
assuming SPLAT_TASKQ_TEST5_NAME as the test name.
The documentation for splat taskq regression test #5 swaps the two required
completion orders in the diagram. This commit corrects the error.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Updated AUTHORS, COPYING, DISCLAIMER, and INSTALL files. Added
standardized headers to all source file to clearly indicate the
copyright, license, and to give credit where credit is due.
This test case verifies the correct behavior of taskq_wait_id().
In particular it ensure the the following two cases are handled
properly:
1) Task ids larger than the waited for task id can run and
complete as long as there is an available worker thread.
2) All task ids lower than the waited one must complete before
unblocking even if the waited task id itself has completed.
- Proper ioctl() 32/64-bit binary compatibility. We need to ensure the
ioctl data itself is always packed the same for 32/64-bit binaries.
Additionally, the correct thing to do is encode this size in bytes
as part of the command using _IOC_SIZE().
- Minor formatting changes to respect the 80 character limit.
- Move all SPLAT_SUBSYSTEM_* defines in to splat-ctl.h.
- Increase SPLAT_SUBSYSTEM_UNKNOWN because we were getting close
to accidentally using it for a real registered subsystem.
I'm very surprised this has not surfaced until now. But the taskq_wait()
implementation work only wait successfully the first time it was called.
Subsequent usage of taskq_wait() on the taskq would not wait.
The issue was caused by tq->tq_lowest_id being set to MAX_INT after the
first wait completed. This caused subsequent waits which check that the
waiting id is less than the lowest taskq id to always succeed. The fix
is to ensure that tq->tq_lowest_id is never set larger than tq->tq_next.id.
Additional fixes which were added to this patch include:
1) Fix a race by placing the taskq_wait_check() in the tq->tq_lock spinlock.
2) taskq_wait() should wait for the largest outstanding id.
3) Multiple spelling corrections.
4) Added taskq wait regression test to validate correct behavior.