Missed wakeup when growing kmem cache

When growing the size of a (VMEM or KVMEM) kmem cache, spl_cache_grow()
always does taskq_dispatch(spl_cache_grow_work), and then waits for the
KMC_BIT_GROWING to be cleared by the taskq thread.

The taskq thread (spl_cache_grow_work()) does:
1. allocate new slab and add to list
2. wake_up_all(skc_waitq)
3. clear_bit(KMC_BIT_GROWING)

Because the wake-up is issued before GROWING is cleared, the waiting
thread can wake up, see that the grow has not yet completed, and go back
to sleep.  Nothing wakes it again when the bit is finally cleared, so it
sleeps until it hits the 100ms timeout.

This can have an extreme performance impact on workloads that alloc/free
more than fits in the (statically-sized) magazines.  These workloads
allocate and free slabs with high frequency.

The problem can be observed with `funclatency spl_cache_grow`, which on
some workloads shows that 99.5% of the time it takes <64us to allocate
slabs, but we spend ~70% of our time in outliers, waiting for the 100ms
timeout.

The fix is to do `clear_bit(KMC_BIT_GROWING)` before
`wake_up_all(skc_waitq)`.
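
The effect of the ordering can be illustrated with a small single-threaded
model.  This is only a sketch, not SPL code: `growing`, `waiter_woken()`,
`simulate()`, and `TIMEOUT_MS` are hypothetical names, and the model assumes
the worst-case interleaving in which the waiter runs immediately when it is
woken.

```c
#include <stdbool.h>

/* Assumed timeout, taken from the 100ms figure in the description. */
#define	TIMEOUT_MS	100

/* Hypothetical stand-in for KMC_BIT_GROWING in skc_flags. */
static bool growing;

enum order { WAKE_THEN_CLEAR, CLEAR_THEN_WAKE };

/*
 * Models one wake-up of the waiting thread.  The waiter re-checks the
 * flag; it proceeds only if the grow no longer looks in progress.
 */
static bool
waiter_woken(void)
{
	return (!growing);
}

/*
 * Replays the worker's last two steps in the given order.  Returns how
 * long (in ms) the waiter stays blocked after the slab is ready: 0 if
 * the wake-up lets it proceed, TIMEOUT_MS if it has to wait for the
 * timeout because no second wake-up is sent.
 */
static int
simulate(enum order o)
{
	bool proceeded;

	growing = true;			/* grow is in progress */
	if (o == WAKE_THEN_CLEAR) {
		proceeded = waiter_woken();	/* models wake_up_all() */
		growing = false;		/* models clear_bit() */
	} else {
		growing = false;		/* models clear_bit() */
		proceeded = waiter_woken();	/* models wake_up_all() */
	}
	return (proceeded ? 0 : TIMEOUT_MS);
}
```

In this model `simulate(WAKE_THEN_CLEAR)` yields the full timeout, while
`simulate(CLEAR_THEN_WAKE)` yields 0.  In the real code the old ordering
only loses the wake-up when the waiter happens to run between the two
steps, which is why the stall shows up as outliers rather than on every
call.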

A future investigation should evaluate whether we still need to call
taskq_dispatch() at all, and if so on which kernel versions.

Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Wilson <gwilson@delphix.com>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #9989
Matthew Ahrens 2020-02-13 11:23:02 -08:00 committed by Tony Hutter
parent f3bf67d04d
commit d4e04cc145

@@ -1176,7 +1176,6 @@ __spl_cache_grow(spl_kmem_cache_t *skc, int flags)
 		smp_mb__before_atomic();
 		clear_bit(KMC_BIT_DEADLOCKED, &skc->skc_flags);
 		smp_mb__after_atomic();
-		wake_up_all(&skc->skc_waitq);
 	}
 	spin_unlock(&skc->skc_lock);
@@ -1189,12 +1188,14 @@ spl_cache_grow_work(void *data)
 	spl_kmem_alloc_t *ska = (spl_kmem_alloc_t *)data;
 	spl_kmem_cache_t *skc = ska->ska_cache;
-	(void) __spl_cache_grow(skc, ska->ska_flags);
+	int error = __spl_cache_grow(skc, ska->ska_flags);
 	atomic_dec(&skc->skc_ref);
 	smp_mb__before_atomic();
 	clear_bit(KMC_BIT_GROWING, &skc->skc_flags);
 	smp_mb__after_atomic();
+	if (error == 0)
+		wake_up_all(&skc->skc_waitq);
 	kfree(ska);
 }
@@ -1245,9 +1246,11 @@ spl_cache_grow(spl_kmem_cache_t *skc, int flags, void **obj)
 	 */
 	if (!(skc->skc_flags & KMC_VMEM)) {
 		rc = __spl_cache_grow(skc, flags | KM_NOSLEEP);
-		if (rc == 0)
+		if (rc == 0) {
+			wake_up_all(&skc->skc_waitq);
 			return (0);
+		}
 	}
 	/*
 	 * This is handled by dispatching a work request to the global work