Fix taskq NULL pointer dereference on timer race

Remove unsafe timer_pending() check in taskq_cancel_id() that created a
race where:
- Timer expires and timer_pending() returns FALSE
- task_done() frees task with tqent_func = NULL
- Timer callback executes and queues freed task
- Worker thread crashes executing NULL function

Call timer_delete_sync() unconditionally so that the timer callback
completes before the task is freed.

Reliably reproducible by injecting mdelay(10) after setting CANCEL flag
to widen the race window, combined with frequent task cancellations
(e.g., snapshot automount expiry).

Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #17942
Ameer Hamza 2025-11-19 21:21:10 +05:00 committed by Brian Behlendorf
parent 145c606c60
commit 663dc86de2

@@ -635,14 +635,31 @@ taskq_cancel_id(taskq_t *tq, taskqid_t id)
 /*
  * The task_expire() function takes the tq->tq_lock so drop
- * drop the lock before synchronously cancelling the timer.
+ * the lock before synchronously cancelling the timer.
+ *
+ * Always call timer_delete_sync() unconditionally. A
+ * timer_pending() check would be insufficient and unsafe.
+ * When a timer expires, it is immediately dequeued from the
+ * timer wheel (timer_pending() returns FALSE), but the
+ * callback (task_expire) may not run until later.
+ *
+ * The race window:
+ * 1) Timer expires and is dequeued - timer_pending() now
+ *    returns FALSE
+ * 2) task_done() is called below, freeing the task, sets
+ *    tqent_func = NULL and clears flags including CANCEL
+ * 3) Timer callback finally runs, sees no CANCEL flag,
+ *    queues task to prio_list
+ * 4) Worker thread attempts to execute NULL tqent_func
+ *    and panics
+ *
+ * timer_delete_sync() prevents this by ensuring the timer
+ * callback completes before the task is freed.
  */
-if (timer_pending(&t->tqent_timer)) {
-	spin_unlock_irqrestore(&tq->tq_lock, flags);
-	timer_delete_sync(&t->tqent_timer);
-	spin_lock_irqsave_nested(&tq->tq_lock, flags,
-	    tq->tq_lock_class);
-}
+spin_unlock_irqrestore(&tq->tq_lock, flags);
+timer_delete_sync(&t->tqent_timer);
+spin_lock_irqsave_nested(&tq->tq_lock, flags,
+    tq->tq_lock_class);
 if (!(t->tqent_flags & TQENT_FLAG_PREALLOC))
 	task_done(tq, t);
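
The four-step race described in the comment can be replayed in a small
single-threaded userspace model. This is only an illustrative sketch, not
SPL or kernel code: every model_* name is hypothetical, the timer is
reduced to three states, and "waiting for the callback" is modelled by
running the callback to completion inside the sync-delete stand-in.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names are kernel or SPL APIs. */
enum model_timer_state {
	MODEL_TIMER_IDLE,	/* not armed, or callback already finished */
	MODEL_TIMER_PENDING,	/* armed and still on the timer wheel */
	MODEL_TIMER_FIRING	/* dequeued on expiry, callback not yet finished */
};

struct model_task {
	enum model_timer_state timer;
	bool cancel;		/* stands in for the CANCEL flag */
	bool queued;		/* stands in for being on the prio_list */
	void (*func)(void);	/* stands in for tqent_func */
};

static void model_work(void) { }

/* Expiry: the timer leaves the wheel before its callback has run. */
static void model_timer_expire(struct model_task *t)
{
	if (t->timer == MODEL_TIMER_PENDING)
		t->timer = MODEL_TIMER_FIRING;
}

/* Stand-in for task_expire(): queue the task unless it was cancelled. */
static void model_timer_callback(struct model_task *t)
{
	assert(t->timer == MODEL_TIMER_FIRING);
	if (!t->cancel)
		t->queued = true;
	t->timer = MODEL_TIMER_IDLE;
}

/* Stand-in for timer_pending(): true only while still on the wheel. */
static bool model_timer_pending(const struct model_task *t)
{
	return (t->timer == MODEL_TIMER_PENDING);
}

/*
 * Stand-in for timer_delete_sync(): dequeue the timer, or wait for an
 * in-flight callback. In this model "waiting" runs the callback here.
 */
static void model_timer_delete_sync(struct model_task *t)
{
	if (t->timer == MODEL_TIMER_FIRING)
		model_timer_callback(t);
	t->timer = MODEL_TIMER_IDLE;
}

/* Stand-in for task_done(): clear the function pointer and the flags. */
static void model_task_done(struct model_task *t)
{
	t->func = NULL;
	t->cancel = false;
}

/* Returns true if a freed task (func == NULL) ended up queued. */
bool model_cancel_races(bool check_pending)
{
	struct model_task t = {
		MODEL_TIMER_PENDING, false, false, model_work
	};

	model_timer_expire(&t);		/* step 1: dequeued, callback delayed */
	t.cancel = true;		/* canceller sets the CANCEL flag */
	if (!check_pending || model_timer_pending(&t))
		model_timer_delete_sync(&t);
	model_task_done(&t);		/* step 2: task freed */
	if (t.timer == MODEL_TIMER_FIRING)
		model_timer_callback(&t);	/* step 3: late callback */
	return (t.queued && t.func == NULL);	/* step 4 would call NULL */
}
```

In this model, model_cancel_races(true) follows the old code path (sync
delete guarded by the pending check) and leaves a freed task queued, while
model_cancel_races(false) follows the new unconditional path and does not.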