based of the spl_kmem_obj_t tacked on the end of each object.
This actually isn't so back because we are now allocing large
chunks for the slab and partitioning it ourselves. So there's
not a ton of wasted space. We may suffer a performance hit
however due to alignment issues.
- Remove remaining depenancies on the linux slab implementation.
We're standing on our own now for better or worse.
- Rework slabs to be either kmem or vmem based. If neither
KMC_VMEM of KMC_KMEM are specified we make a decent guess
about what will work best for their based on the object
size. Additionally we provide a kmem_virt() function caller
can use to see if they have a virtual or physical address.
- Minor fixups in the test suite.
git-svn-id: https://outreach.scidac.gov/svn/spl/trunk@141 7e1ea52c-4ff2-0310-8f11-9dd32ca42a1c
based by vmalloc()'ed memory. I now alloc a slab which is
roughly 32*spl_obj_size and in this block of memory I place
the slab descriptor, slab object descriptors, and objects
themselves. This greatly reduces vmalloc lock contention.
Still some minor cleanup remains and fine tuning but
it's working pretty well.
git-svn-id: https://outreach.scidac.gov/svn/spl/trunk@139 7e1ea52c-4ff2-0310-8f11-9dd32ca42a1c
well for the expected workloads. Improvement in this commit include:
- Added DEBUG_KMEM_TRACKING #define which can optionally be set
when DEBUG_KMEM is defined to do per allocation tracking. This
allows us to get all the lightweight kmem debugging enabled by
default which is pretty light weight, and only when looking
for a memory leak we can briefly enable the per alloc tracking.
- Added set_normalized_timespec() in to SPL to simply using
the timespec() primatives from within a module.
- Added per-spinlock cycle counters to the slab in an attempt
to run down a lock contention issue. The contended lock
was in vmalloc() but I'm going to leave the cycle counters
in place for a little while until I'm convinced there arn't
other locking improvement possible in the slab.
- Added a proc interface to the slab to export per slab
cache statistics to /proc/spl/kmem/slab for analysis.
- Reworked spl_slab_alloc() function to allocate from kmem for
small allocation and vmem for large allocations. This improved
things considerably but futher work is needed.
git-svn-id: https://outreach.scidac.gov/svn/spl/trunk@138 7e1ea52c-4ff2-0310-8f11-9dd32ca42a1c
when repopulating it. Plus I fixed a few more suble races in
that part of the code which were catching me. Finally I fixed
a small race in kmem_test8.
git-svn-id: https://outreach.scidac.gov/svn/spl/trunk@137 7e1ea52c-4ff2-0310-8f11-9dd32ca42a1c
factor of 10x improvement on SMP system due to reduced lock contention.
This may put me in the ballpark of what is needed. We can still further
improve things on NUMA systems by creating an additional L3 cache per
memory node instead of the current global pool. With luck this won't
be needed. I should also take another look at the locking now that
everything is working. There's a good chance I can tighten it up a
little bit and improve things a little more.
kmem_lock: time (sec) slabs objs hash
kmem_lock: tot/max/calc tot/max/calc size/depth
kmem_lock: 0.000999926 6/6/1 192/192/32 32768/0
kmem_lock: 0.000999926 4/4/2 128/128/64 32768/0
kmem_lock: 0.000999926 4/4/4 128/128/128 32768/0
kmem_lock: 0.000999926 4/4/8 128/128/256 32768/0
kmem_lock: 0.000999926 4/4/16 128/128/512 32768/0
kmem_lock: 0.000999926 4/4/32 128/128/1024 32768/0
kmem_lock: 0.000999926 4/4/64 128/128/2048 32768/0
kmem_lock: 0.000999926 8/8/128 256/256/4096 32768/0
kmem_lock: 0.003999704 24/23/256 768/736/8192 32768/1
kmem_lock: 0.012999038 44/41/512 1408/1312/16384 32768/1
kmem_lock: 0.051996153 96/93/1024 3072/2976/32768 32768/2
kmem_lock: 0.181986536 187/184/2048 5984/5888/65536 32768/3
kmem_lock: 0.655951469 342/339/4096 10944/10848/131072 32768/4
git-svn-id: https://outreach.scidac.gov/svn/spl/trunk@136 7e1ea52c-4ff2-0310-8f11-9dd32ca42a1c
to be overly clever and the context switch when the semaphore was busy
was destroying performance. Converting to a simple spin lock bough me
a factor of 50 or so. That said it's still not good enough. Tests
show bad performance and we are still CPU bound. The logical fix is
I need to implement per-cpu hot caches to minimize the SMP contention.
Linux and Solaris both have this, I was hoping to do without but it
looks like that's not to be.
kmem_lock: time (sec) slabs objs hash
kmem_lock: tot/max/calc tot/max/calc size/depth
kmem_lock: 0.022000000 7/6/64 224/177/2048 32768/1
kmem_lock: 0.039000000 13/13/128 416/404/4096 32768/1
kmem_lock: 0.079000000 23/21/256 736/672/8192 32768/1
kmem_lock: 0.158000000 48/47/512 1536/1504/16384 32768/1
kmem_lock: 0.345000000 105/105/1024 3360/3358/32768 32768/2
kmem_lock: 0.760000000 202/200/2048 6464/6400/65536 32768/3
git-svn-id: https://outreach.scidac.gov/svn/spl/trunk@135 7e1ea52c-4ff2-0310-8f11-9dd32ca42a1c
allocator. I have serious contention issues here and I needed
a way to easily measure how much the following batch of changes
will improve things. Currently things are quite bad when the
allocator is highly contended, and interestingly it seems to
get worse in a non-linear fashion... I'm not sure why yet.
I'll figure it out tomorrow.
kmem:kmem_lock Pass
kmem_lock: time (sec) slabs objs
kmem_lock: tot/max/calc tot/max/calc
kmem_lock: 0.061000000 75/60/64 2400/1894/2048
kmem_lock: 0.157000000 134/125/128 4288/3974/4096
kmem_lock: 0.471000000 263/249/256 8416/7962/8192
kmem_lock: 2.526000000 518/499/512 16576/15957/16384
kmem_lock: 14.393000000 990/978/1024 31680/31270/32768
git-svn-id: https://outreach.scidac.gov/svn/spl/trunk@134 7e1ea52c-4ff2-0310-8f11-9dd32ca42a1c
longer be based on the linux slab but to be its own complete
implementation. The new slab behaves much more like the
Solaris slab than the Linux slab.
git-svn-id: https://outreach.scidac.gov/svn/spl/trunk@132 7e1ea52c-4ff2-0310-8f11-9dd32ca42a1c
- Detailed kmem memory allocation tracking. We can now get on
spl module unload a list of all memory allocations which were
not free'd and where the original alloc was. E.g.
SPL: 15554:632:(spl-kmem.c:442:kmem_fini()) kmem leaked 90/319332 bytes
SPL: 15554:648:(spl-kmem.c:451:kmem_fini()) address size data func:line
SPL: 15554:648:(spl-kmem.c:457:kmem_fini()) ffff8100734b68b8 32 0100000001005a5a __spl_mutex_init:70
SPL: 15554:648:(spl-kmem.c:457:kmem_fini()) ffff8100734b6148 13 &tl->tl_lock __spl_mutex_init:74
SPL: 15554:648:(spl-kmem.c:457:kmem_fini()) ffff81007ac43730 32 0100000001005a5a __spl_mutex_init:70
SPL: 15554:648:(spl-kmem.c:457:kmem_fini()) ffff81007ac437d8 13 &tl->tl_lock __spl_mutex_init:74
- Shift to using rwsems in kmem implmentation, to simply locking and
improve concurency.
- Shift to using rwsems in mutex implementation, additionally ensure we
never sleep in the init function if non-zero preempt_count or
interrupts are disabled as can happen in a slab cache ctor/dtor.
- Other minor formating fixes and such.
TODO:
- Finish the vmem memory allocation tracking
- Vet all other SPL primatives for potential sleeping during *_init. I
suspect the rwlock implemenation does this and should be fixes just
like the mutex implemenation.
git-svn-id: https://outreach.scidac.gov/svn/spl/trunk@95 7e1ea52c-4ff2-0310-8f11-9dd32ca42a1c
Adjust kmem slab interface to make a copy of the slab name before
passing it on to the linux slab (we free it latter too)
git-svn-id: https://outreach.scidac.gov/svn/spl/trunk@47 7e1ea52c-4ff2-0310-8f11-9dd32ca42a1c
- Removed all references to kzt and replaced with splat
- Moved portions of include files which do not need to be
available to all source files in to local.h files in
proper source subdirs.
git-svn-id: https://outreach.scidac.gov/svn/spl/trunk@14 7e1ea52c-4ff2-0310-8f11-9dd32ca42a1c
the directories at the top level but that proved troublesome. The
kernel buildsystem and autoconf were conflicting too much. To
resolve the issue I moved the kernel bits in to a modules directory
which can then only use the kernel build system. We just pass
along the likely make targets to the kernel build system.
git-svn-id: https://outreach.scidac.gov/svn/spl/trunk@11 7e1ea52c-4ff2-0310-8f11-9dd32ca42a1c