The taskq:front test has a race condition where task 4 and 8
race to complete, due to an incorrectly calculated set of delay
"factors" (T). If task 4 wins and actually finishes first, the
verification of the order of completion will fail.
The delays calculated to order task completion do not take into
account the terminal line in the table, and so are all off by
a factor of 1. This causes all the tasks in all queues to finish
sooner than expected and the accumulated error is the root cause
of tasks 4 and 8 racing to complete first. Before the change the
"actual" table looks like I commented in #130.
I changed:
* the table in the comment to correctly reflect the test and the
factor timings needed.
* the individual task delay factors of T so that ONLY 1 task will
every 2T. (on average)
* 1T was reduced from 100ms to 50ms. This halves the duration of
the test and makes any remaining raciness more likely to cause
failures, but it did not cause the test to fail.
* simplified the delay factor logic by using a table look-up
instead of a switch.
* Added a "task started" message so that with -v it is possible
to see the order tasks are started.
* Moved the "task completed" message inside the spinlock so that
with -v the message truly reflects the absolute order of
completion as guaranteed by the spinlock.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#130
Restructure the the SPLAT headers such that each test only
includes the minimal set of headers it requires.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Add a test designed to generate contention on the taskq spinlock by
using a large number of threads (100) to perform a large number (131072)
of trivial work items from a single queue. This simulates conditions
that may occur with the zio free taskq when a 1TB file is removed from a
ZFS filesystem, for example. This test should always pass. Its purpose
is to provide a benchmark to easily measure the effectiveness of taskq
optimizations using statistics from the kernel lock profiler.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #32
The splat-taskq test functions were slightly modified to exercise
the new taskq interface in addition to the old interface. If the
old interface passes each of its tests, the new interface is
exercised. Both sub tests (old interface and new interface) must
pass for each test as a whole to pass.
Signed-off-by: Prakash Surya <surya1@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#65
Added another splat taskq test to ensure tasks can be recursively
submitted to a single task queue without issue. When the
taskq_dispatch_prealloc() interface is introduced, this use case
can potentially cause a deadlock if a taskq_ent_t is dispatched
while its tqent_list field is not empty. This _should_ never be
a problem with the existing taskq_dispatch() interface.
Signed-off-by: Prakash Surya <surya1@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #65
This change adds the neglected SPLAT_TEST_FINI call for the
SPLAT_TASKQ_TEST6_ID, just as is done for the other 5 SPLAT_TASKQ_*
tests.
Signed-off-by: Prakash Surya <surya1@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#64
The splat_taskq_test4_common function was incorrectly referencing
the splat_taskq-test13_func symbol, when it meant to be using the
splat_taskq_test4_func symbol.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#61
When TQ_SLEEP is used, taskq_dispatch() should always succeed even if the
number of pending tasks is above tq->tq_maxalloc. This semantic is similar
to KM_SLEEP in kmem allocations, which also always succeed.
However, we cannot block forever otherwise there is a risk of deadlock.
Therefore, we still allow the number of pending tasks to go above
tq->tq_maxalloc with TQ_SLEEP, but we may sleep up to 1 second per task
dispatch, thereby throttling the task dispatch rate.
One of the existing splat tests was also augmented to test for this scenario.
The test would fail with the previous implementation but now it succeeds.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Use 3 threads and 8 tasks. Dispatch the final 3 tasks with TQ_FRONT.
The first three tasks keep the worker threads busy while we stuff the
queues. Use msleep() to force a known execution order, assuming
TQ_FRONT is properly honored. Verify that the expected completion
order occurs.
The splat_taskq_test5_order() function may be useful in more than
one test. This commit generalizes it by renaming the function to
splat_taskq_test_order() and adding a name argument instead of
assuming SPLAT_TASKQ_TEST5_NAME as the test name.
The documentation for splat taskq regression test #5 swaps the two required
completion orders in the diagram. This commit corrects the error.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Updated AUTHORS, COPYING, DISCLAIMER, and INSTALL files. Added
standardized headers to all source file to clearly indicate the
copyright, license, and to give credit where credit is due.
This test case verifies the correct behavior of taskq_wait_id().
In particular it ensure the the following two cases are handled
properly:
1) Task ids larger than the waited for task id can run and
complete as long as there is an available worker thread.
2) All task ids lower than the waited one must complete before
unblocking even if the waited task id itself has completed.
- Proper ioctl() 32/64-bit binary compatibility. We need to ensure the
ioctl data itself is always packed the same for 32/64-bit binaries.
Additionally, the correct thing to do is encode this size in bytes
as part of the command using _IOC_SIZE().
- Minor formatting changes to respect the 80 character limit.
- Move all SPLAT_SUBSYSTEM_* defines in to splat-ctl.h.
- Increase SPLAT_SUBSYSTEM_UNKNOWN because we were getting close
to accidentally using it for a real registered subsystem.
I'm very surprised this has not surfaced until now. But the taskq_wait()
implementation work only wait successfully the first time it was called.
Subsequent usage of taskq_wait() on the taskq would not wait.
The issue was caused by tq->tq_lowest_id being set to MAX_INT after the
first wait completed. This caused subsequent waits which check that the
waiting id is less than the lowest taskq id to always succeed. The fix
is to ensure that tq->tq_lowest_id is never set larger than tq->tq_next.id.
Additional fixes which were added to this patch include:
1) Fix a race by placing the taskq_wait_check() in the tq->tq_lock spinlock.
2) taskq_wait() should wait for the largest outstanding id.
3) Multiple spelling corrections.
4) Added taskq wait regression test to validate correct behavior.