1) Ensure mutex_init() never fails in the case of ENOMEM by retrying
forever. I don't think I've ever seen this happen but it was clear
after code inspection that if it did we would immediately crash.
2) Enable full debugging in check.sh for sanity tests. Might as well
get as much debug as we can in the case of a failure.
3) Reworked list of kmem caches tracked by SPL in to a hash with the
key based on the address of the kmem_cache_t. This should speed
up the constructor/destructor/shrinker lookup needed now for newer
kernel which removed the destructor support.
4) Updated kmem_cache_create to handle the case where CONFIG_SLUB
is defined. The slub would occasionally merge slab caches which
resulted in non-unique keys for our hash lookup in 3). To fix this
we detect if the slub is enabled and then set the needed flag
to prevent this merging from ever occuring.
5) New kernels removed the proc_dir_entry pointer from items
registered by sysctl. This means we can no long be sneaky and
manually insert things in to the sysctl tree simply by walking
the proc tree. So I'm forced to create a seperate tree for
all the things I can't easily support via sysctl interface.
I don't like it but it will do for now.
git-svn-id: https://outreach.scidac.gov/svn/spl/trunk@124 7e1ea52c-4ff2-0310-8f11-9dd32ca42a1c
and a hang on subsequent sys_close. I'm not quite sure why the Fedora
kernel caught this bug the Chaos kernel did not, but I'm glad!
Convert remaining BUG_ON's to ASSERTs
git-svn-id: https://outreach.scidac.gov/svn/spl/trunk@122 7e1ea52c-4ff2-0310-8f11-9dd32ca42a1c
working on this branch for the next few days I suggested you work
off of the 0.3.1 tag. The following changes are fairly extensive
and are designed to make the SPL compatible with all kernels in
the range of 2.6.18-2.6.25. There were 13 relevant API changes
between these releases and I have added the needed autoconf tests
to check for them. However, this has not all been tested extensively.
I'll sort of the breakage on Fedora Core 9 and RHEL5 this week.
SPL_AC_TYPE_UINTPTR_T
SPL_AC_TYPE_KMEM_CACHE_T
SPL_AC_KMEM_CACHE_DESTROY_INT
SPL_AC_ATOMIC_PANIC_NOTIFIER
SPL_AC_3ARGS_INIT_WORK
SPL_AC_2ARGS_REGISTER_SYSCTL
SPL_AC_KMEM_CACHE_T
SPL_AC_KMEM_CACHE_CREATE_DTOR
SPL_AC_3ARG_KMEM_CACHE_CREATE_CTOR
SPL_AC_SET_SHRINKER
SPL_AC_PATH_IN_NAMEIDATA
SPL_AC_TASK_CURR
SPL_AC_CTL_UNNUMBERED
git-svn-id: https://outreach.scidac.gov/svn/spl/trunk@119 7e1ea52c-4ff2-0310-8f11-9dd32ca42a1c
compiled out when doing performance runs.
- Bite the bullet and fully autoconfize the debug options in the configure
time parameters. By default all the debug support is disable in the core
SPL build, but available to modules which enable it when building against
the SPL. To enable particular SPL debug support use the follow configure
options:
--enable-debug Internal ASSERTs
--enable-debug-kmem Detailed memory accounting
--enable-debug-mutex Detailed mutex tracking
--enable-debug_kstat Kstat info exported to /proc
--enable-debug-callb Additional callb debug
git-svn-id: https://outreach.scidac.gov/svn/spl/trunk@111 7e1ea52c-4ff2-0310-8f11-9dd32ca42a1c
other primitive implementations. Additionally ensure that GFP_ATOMIC
is use for allocations when in interrupt context.
git-svn-id: https://outreach.scidac.gov/svn/spl/trunk@108 7e1ea52c-4ff2-0310-8f11-9dd32ca42a1c
with the function name passed to be used as a thread name. Leaving
the trailing _thread is just redundant so just strip it this
make the thread names far more readable.
Use a strncpy in spl-mutex just to be safe.
git-svn-id: https://outreach.scidac.gov/svn/spl/trunk@107 7e1ea52c-4ff2-0310-8f11-9dd32ca42a1c
may not fail. To get this behavior I'd added a retry to the shim layer
even though it is abusive to the VM, at least it should prevent the crash.
Additionally I added a proc counter so I can easily check how often this
is happening. It should be fairly rare, but likely will get worse and
worse the longer the machine has been up.
git-svn-id: https://outreach.scidac.gov/svn/spl/trunk@104 7e1ea52c-4ff2-0310-8f11-9dd32ca42a1c
not to support a few flags (we assert if they are used), and I
did not add the libkstat interface and instead exported everything
to proc for easy access.
git-svn-id: https://outreach.scidac.gov/svn/spl/trunk@103 7e1ea52c-4ff2-0310-8f11-9dd32ca42a1c
- Ensure the mutex_stats_sem and mutex_stats_list are initialized
- Only spin if you have to in mutex_init
git-svn-id: https://outreach.scidac.gov/svn/spl/trunk@97 7e1ea52c-4ff2-0310-8f11-9dd32ca42a1c
- Detailed kmem memory allocation tracking. We can now get on
spl module unload a list of all memory allocations which were
not free'd and where the original alloc was. E.g.
SPL: 15554:632:(spl-kmem.c:442:kmem_fini()) kmem leaked 90/319332 bytes
SPL: 15554:648:(spl-kmem.c:451:kmem_fini()) address size data func:line
SPL: 15554:648:(spl-kmem.c:457:kmem_fini()) ffff8100734b68b8 32 0100000001005a5a __spl_mutex_init:70
SPL: 15554:648:(spl-kmem.c:457:kmem_fini()) ffff8100734b6148 13 &tl->tl_lock __spl_mutex_init:74
SPL: 15554:648:(spl-kmem.c:457:kmem_fini()) ffff81007ac43730 32 0100000001005a5a __spl_mutex_init:70
SPL: 15554:648:(spl-kmem.c:457:kmem_fini()) ffff81007ac437d8 13 &tl->tl_lock __spl_mutex_init:74
- Shift to using rwsems in kmem implmentation, to simply locking and
improve concurency.
- Shift to using rwsems in mutex implementation, additionally ensure we
never sleep in the init function if non-zero preempt_count or
interrupts are disabled as can happen in a slab cache ctor/dtor.
- Other minor formating fixes and such.
TODO:
- Finish the vmem memory allocation tracking
- Vet all other SPL primatives for potential sleeping during *_init. I
suspect the rwlock implemenation does this and should be fixes just
like the mutex implemenation.
git-svn-id: https://outreach.scidac.gov/svn/spl/trunk@95 7e1ea52c-4ff2-0310-8f11-9dd32ca42a1c
crashes but it's not clear to me yet if these are a problem with
the mutex implementation or ZFSs usage of it.
Minor taskq fixes to add new tasks to the end of the pending list.
Minor enhansements to the debug infrastructure.
git-svn-id: https://outreach.scidac.gov/svn/spl/trunk@94 7e1ea52c-4ff2-0310-8f11-9dd32ca42a1c
configurable number of threads like the Solaris version and almost
all of the options are supported. Unfortunately, it appears to have
made absolutely no difference to our performance numbers. I need
to keep looking for where we are bottle necking.
git-svn-id: https://outreach.scidac.gov/svn/spl/trunk@93 7e1ea52c-4ff2-0310-8f11-9dd32ca42a1c
do not have __GFP_ZERO set. Once the memory is allocated
then zero out the memory if __GFP_ZERO is passed to
__vmem_alloc.
git-svn-id: https://outreach.scidac.gov/svn/spl/trunk@88 7e1ea52c-4ff2-0310-8f11-9dd32ca42a1c
before the debug subsystem is fully set up, or after the debug
subsystem has been torn down.
git-svn-id: https://outreach.scidac.gov/svn/spl/trunk@86 7e1ea52c-4ff2-0310-8f11-9dd32ca42a1c
ensure I never add anything to the stack I don't absolutely need.
All this debug code could be removed from a production build
anyway so I'm not so worried about the performance impact. We
may also consider revisting the mutex and condvar implementation
to ensure no additional stack is used there.
Initial indications are I have reduced the worst case stack
usage to 9080 bytes. Still to large for the default 8k stacks
so I have been forced to run with 16k stacks until I can
reduce the worst offenders.
git-svn-id: https://outreach.scidac.gov/svn/spl/trunk@83 7e1ea52c-4ff2-0310-8f11-9dd32ca42a1c
process of destroying the stacks. Threshhold set fairly aggressively
top 80% of stack usage.
git-svn-id: https://outreach.scidac.gov/svn/spl/trunk@82 7e1ea52c-4ff2-0310-8f11-9dd32ca42a1c
- Replacing all BUG_ON()'s with proper ASSERT()'s
- Using ENTRY,EXIT,GOTO, and RETURN macro to instument call paths
git-svn-id: https://outreach.scidac.gov/svn/spl/trunk@78 7e1ea52c-4ff2-0310-8f11-9dd32ca42a1c
changes bring over everything lustre had for debugging with
two exceptions. I dropped by the debug daemon and upcalls
just because it made things a little easier. They can be
readded easily enough if we feel they are needed.
Everything compiles and seems to work on first inspection
but I suspect there are a handful of issues still lingering
which I'll be sorting out right away. I just wanted to get
all these changes commited and safe. I'm getting a little
paranoid about losing them.
git-svn-id: https://outreach.scidac.gov/svn/spl/trunk@75 7e1ea52c-4ff2-0310-8f11-9dd32ca42a1c
when necessary to avoid deadlocks. We were seeing the deadlock
when calling kmem_cache_generic_constructor() and then an interrupt
forced us to end up calling kmem_cache_generic_destructor()
which caused our deadlock.
git-svn-id: https://outreach.scidac.gov/svn/spl/trunk@74 7e1ea52c-4ff2-0310-8f11-9dd32ca42a1c
think this should fix anything but it's a good idea regardless.
- Drop the lock before calling the construct/destructor for the slab
otherwise we can't sleep in a constructor/destructor and for long running
functions we may NMI.
- Do something braindead, but safe for the console debug logs for now.
git-svn-id: https://outreach.scidac.gov/svn/spl/trunk@73 7e1ea52c-4ff2-0310-8f11-9dd32ca42a1c