The first locking issue was due to the semaphore I used. I was trying
to be overly clever, and the context switch taken whenever the semaphore
was busy was destroying performance.  Converting to a simple spin lock
bought me a factor of 50 or so.  That said, it's still not good enough:
tests show poor performance and we are still CPU bound.  The logical
fix is to implement per-CPU hot caches to minimize the SMP contention.
Linux and Solaris both have this.  I was hoping to do without, but it
looks like that's not to be.
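
For illustration only, here is a minimal userspace sketch of the per-CPU hot cache idea, not the SPL implementation. Each CPU keeps a small private array of free objects and only takes the shared spin lock to refill or flush it. All names (`struct cache`, `struct magazine`, `cache_alloc`, `cache_free`, `NCPU`, `MAG_SIZE`, `POOL_SIZE`) are hypothetical, the CPU id is passed in explicitly, and a C11 `atomic_flag` stands in for a kernel spin lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define NCPU      4   /* hypothetical CPU count */
#define MAG_SIZE  8   /* objects a CPU may hold before touching the lock */
#define POOL_SIZE 64  /* objects in the shared pool */
#define OBJ_SIZE  256 /* object size, matching kcp_size in the test */

struct magazine {
	void  *objs[MAG_SIZE];
	size_t avail;
};

struct cache {
	atomic_flag     lock;          /* stand-in for the cache-wide spin lock */
	void           *pool[POOL_SIZE];
	size_t          pool_avail;
	unsigned long   lock_acquires; /* trips through the contended path */
	struct magazine mag[NCPU];     /* one hot cache per CPU */
	char            storage[POOL_SIZE][OBJ_SIZE];
};

static void cache_lock(struct cache *c)
{
	while (atomic_flag_test_and_set_explicit(&c->lock, memory_order_acquire))
		; /* spin */
	c->lock_acquires++;
}

static void cache_unlock(struct cache *c)
{
	atomic_flag_clear_explicit(&c->lock, memory_order_release);
}

static void cache_init(struct cache *c)
{
	size_t i;

	atomic_flag_clear(&c->lock);
	c->pool_avail = 0;
	c->lock_acquires = 0;
	for (i = 0; i < POOL_SIZE; i++)
		c->pool[c->pool_avail++] = c->storage[i];
	for (i = 0; i < NCPU; i++)
		c->mag[i].avail = 0;
}

/* Fast path is lock-free. In a kernel, preemption would have to be
 * disabled while the per-CPU magazine is being touched. */
static void *cache_alloc(struct cache *c, unsigned cpu)
{
	struct magazine *m;

	assert(cpu < NCPU);
	m = &c->mag[cpu];
	if (m->avail == 0) {
		/* Slow path: refill the whole magazine under the lock. */
		cache_lock(c);
		while (m->avail < MAG_SIZE && c->pool_avail > 0)
			m->objs[m->avail++] = c->pool[--c->pool_avail];
		cache_unlock(c);
		if (m->avail == 0)
			return NULL; /* shared pool exhausted */
	}
	return m->objs[--m->avail];
}

static void cache_free(struct cache *c, unsigned cpu, void *obj)
{
	struct magazine *m;

	assert(cpu < NCPU);
	m = &c->mag[cpu];
	if (m->avail == MAG_SIZE) {
		/* Slow path: flush half the magazine back to the pool. */
		cache_lock(c);
		while (m->avail > MAG_SIZE / 2)
			c->pool[c->pool_avail++] = m->objs[--m->avail];
		cache_unlock(c);
	}
	m->objs[m->avail++] = obj;
}
```

With these sizes, 16 back-to-back allocations on one CPU take the lock twice instead of 16 times, which is the contention reduction the hot cache is meant to buy.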

   kmem_lock: time (sec)    slabs           objs              hash
   kmem_lock:               tot/max/calc    tot/max/calc      size/depth
   kmem_lock:  0.022000000  7/6/64          224/177/2048      32768/1
   kmem_lock:  0.039000000  13/13/128       416/404/4096      32768/1
   kmem_lock:  0.079000000  23/21/256       736/672/8192      32768/1
   kmem_lock:  0.158000000  48/47/512       1536/1504/16384   32768/1
   kmem_lock:  0.345000000  105/105/1024    3360/3358/32768   32768/2
   kmem_lock:  0.760000000  202/200/2048    6464/6400/65536   32768/3
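
As a reading aid, the "calc" columns can be reproduced from the expressions the test prints, `kcp.kcp_alloc * 32` for objects and `kcp.kcp_alloc * 32 / SPL_KMEM_CACHE_OBJ_PER_SLAB` for slabs. The sketch below assumes `SPL_KMEM_CACHE_OBJ_PER_SLAB` is 32, which is inferred from the table (2048 objects over 64 slabs) rather than taken from the header. The helper names are hypothetical.

```c
#include <assert.h>

#define OBJ_PER_ALLOC 32UL /* the literal 32 in kcp.kcp_alloc * 32 */
#define OBJ_PER_SLAB  32UL /* assumed value of SPL_KMEM_CACHE_OBJ_PER_SLAB */

/* Upper bound on objects for a given alloc count (third "objs" field). */
static unsigned long calc_objs(unsigned long alloc)
{
	assert(alloc > 0);
	return alloc * OBJ_PER_ALLOC;
}

/* Upper bound on slabs for a given alloc count (third "slabs" field). */
static unsigned long calc_slabs(unsigned long alloc)
{
	assert(alloc > 0);
	return alloc * OBJ_PER_ALLOC / OBJ_PER_SLAB;
}
```

For the first row, an alloc count of 64 gives 2048 objects and 64 slabs, against observed totals of 224 objects in 7 slabs.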



git-svn-id: https://outreach.scidac.gov/svn/spl/trunk@135 7e1ea52c-4ff2-0310-8f11-9dd32ca42a1c
behlendo
2008-06-24 17:18:15 +00:00
parent 44b8f1769f
commit d46630e0f3
3 changed files with 41 additions and 35 deletions
+7 -5
@@ -584,11 +584,11 @@ splat_kmem_test8(struct file *file, void *arg)
 kcp.kcp_file = file;
 splat_vprint(file, SPLAT_KMEM_TEST8_NAME, "%s",
-"time (sec)\tslabs \tobjs\n");
+"time (sec)\tslabs \tobjs \thash\n");
 splat_vprint(file, SPLAT_KMEM_TEST8_NAME, "%s",
-" \ttot/max/calc\ttot/max/calc\n");
+" \ttot/max/calc\ttot/max/calc\tsize/depth\n");
-for (alloc = 64; alloc <= 1024; alloc *= 2) {
+for (alloc = 64; alloc <= 4096; alloc *= 2) {
 kcp.kcp_size = 256;
 kcp.kcp_count = 0;
 kcp.kcp_threads = 0;
@@ -625,14 +625,16 @@ splat_kmem_test8(struct file *file, void *arg)
 delta = timespec_sub(stop, start);
 splat_vprint(file, SPLAT_KMEM_TEST8_NAME, "%2ld.%09ld\t"
-"%lu/%lu/%lu\t%lu/%lu/%lu\n",
+"%lu/%lu/%lu\t%lu/%lu/%lu\t%lu/%lu\n",
 delta.tv_sec, delta.tv_nsec,
 (unsigned long)kcp.kcp_cache->skc_slab_total,
 (unsigned long)kcp.kcp_cache->skc_slab_max,
 (unsigned long)(kcp.kcp_alloc * 32 / SPL_KMEM_CACHE_OBJ_PER_SLAB),
 (unsigned long)kcp.kcp_cache->skc_obj_total,
 (unsigned long)kcp.kcp_cache->skc_obj_max,
-(unsigned long)(kcp.kcp_alloc * 32));
+(unsigned long)(kcp.kcp_alloc * 32),
+(unsigned long)kcp.kcp_cache->skc_hash_size,
+(unsigned long)kcp.kcp_cache->skc_hash_depth);
 kmem_cache_destroy(kcp.kcp_cache);