mirror_zfs

mirror of https://git.proxmox.com/git/mirror_zfs.git synced 2026-04-17 08:54:52 +03:00

Author	SHA1	Message	Date
Brian Behlendorf	5652e7b497	When using x86 specific rwsem correctly interpret rwsem->count.	2009-12-01 15:47:27 -08:00
Brian Behlendorf	1273cf284b	Always use the generic mutex_destroy().	2009-11-15 15:04:02 -08:00
Brian Behlendorf	05b48408fb	Add mutex_enter_nested() as wrapper for mutex_lock_nested(). This symbol can be used by GPL modules which use the SPL to handle cases where a call path takes two different locks by the same name. This is needed to avoid a false positive in the lock checker.	2009-11-15 14:27:15 -08:00
Brian Behlendorf	8b45dda2bc	Linux 2.6.31 kmem cache alignment fixes and cleanup. The big fix here is the removal of kmalloc() in kv_alloc(). It used to be true in previous kernels that kmallocs over PAGE_SIZE would always be page aligned. This is no longer true; at least in 2.6.31 there are no longer any alignment expectations. Since kv_alloc() requires the resulting address to be page aligned we now either directly allocate pages in the KMC_KMEM case, or directly call __vmalloc(), both of which will always return a page aligned address. Additionally, to avoid wasting memory size is always a power of two. As for cleanup several helper functions were introduced to calculate the aligned sizes of various data structures. This helps ensure no case is accidentally missed where the alignment needs to be taken into account. The helpers now use P2ROUNDUP_TYPE instead of P2ROUNDUP which is safer since the type will be explicit and we no longer count on the compiler to auto promote types hopefully as we expected. Always enforce minimum (SPL_KMEM_CACHE_ALIGN) and maximum (PAGE_SIZE) alignment restrictions at cache creation time. Use SPL_KMEM_CACHE_ALIGN in splat alignment test.	2009-11-13 11:12:43 -08:00
Brian Behlendorf	c89fdee4d3	Remove __GFP_NOFAIL in kmem and retry internally. As of 2.6.31 it's clear __GFP_NOFAIL should no longer be used and it may disappear from the kernel at any time. To handle this I have simply added _nofail wrappers in the kmem implementation which perform the retry for non-atomic allocations. From linux-2.6.31 mm/page_alloc.c:1166 /* __GFP_NOFAIL is not to be used in new code. * * All __GFP_NOFAIL callers should be fixed so that they * properly detect and handle allocation failures. * * We most definitely don't want callers attempting to * allocate greater than order-1 page units with * __GFP_NOFAIL. */ WARN_ON_ONCE(order > 1);	2009-11-12 15:11:24 -08:00
Brian Behlendorf	baf2979ed3	Linux 2.6.31 Compatibility Updates SPL_AC_2ARGS_SET_FS_PWD macro updated to explicitly include linux/fs_struct.h which was dropped from linux/sched.h. min_wmark_pages, low_wmark_pages, high_wmark_pages macros introduced in newer kernels. For older kernels mm_compat.h was introduced to define them as needed as direct mappings to per zone min_pages, low_pages, max_pages.	2009-11-10 14:06:57 -08:00
Brian Behlendorf	055ffd98cf	Autoconf --enable-debug-* cleanup Cleanup the --enable-debug-* configure options, this has been pending for quite some time and I am glad I finally got to it. To summarize: 1) All SPL_AC_DEBUG_* macros were updated to be more autoconf friendly. This mainly involved a shift to the GNU approved usage of AC_ARG_ENABLE and ensuring AS_IF is used rather than directly using an if [ test ] construct. 2) --enable-debug-kmem=yes by default. This simply enables keeping a running tally of total memory allocated and freed and reporting a memory leak if there was one at module unload. Additionally, it ensures /proc/spl/kmem/slab will exist by default which is handy. The overhead is low for this and it should not impact performance. 3) --enable-debug-kmem-tracking=no by default. This option was added to provide a configure option to enable detailed memory allocation tracking. This support was always there but you had to know where to turn it on. By default this support is disabled because it is known to badly hurt performance, however it is invaluable when chasing a memory leak. 4) --enable-debug-kstat removed. After further reflection I can't see why you would ever really want to turn this support off. It is now always on which had the nice side effect of simplifying the proc handling code in spl-proc.c. We can now always assume the top level directory will be there. 5) --enable-debug-callb removed. This never really did anything, it was put in provisionally because it might have been needed. It turns out it was not so I am just removing it to prevent confusion.	2009-10-30 13:58:51 -07:00
Brian Behlendorf	302b88e6ab	Add autoconf checks for atomic64_cmpxchg + atomic64_xchg These functions didn't exist for all archs prior to 2.6.24. This patch adds an autoconf test to detect this and add them when needed. The autoconf check is needed instead of just an #ifndef because in the most modern kernels atomic64_{cmp}xchg are implemented as an inline function and not a #define.	2009-10-30 13:53:17 -07:00
Brian Behlendorf	5e9b5d832b	Use Linux atomic primitives by default. Previously Solaris style atomic primitives were implemented simply by wrapping the desired operation in a global spinlock. This was easy to implement at the time when I wasn't 100% sure I could safely layer the Solaris atomic primitives on the Linux counterparts. It however was likely not good for performance. After more investigation however it does appear the Solaris primitives can be layered on Linux's fairly safely. The Linux atomic_t type really just wraps a long so we can simply cast the Solaris unsigned value to either an atomic_t or atomic64_t. The only lingering problem for both implementations is that Solaris provides no atomic read function. This means reading a 64-bit value on a 32-bit arch can (and will) result in word breaking. I was very concerned about this initially, but upon further reflection it is a limitation of the Solaris API. So really we are just being bug-for-bug compatible here. With this change the default implementation is layered on top of Linux atomic types. However, because we're assuming a lot about the internal implementation of those types I've made it easy to fall back to the generic approach. Simply build with --enable-atomic_spinlocks if issues are encountered with the new implementation.	2009-10-30 10:55:25 -07:00
Brian Behlendorf	51a727e90f	Set cwd to '/' for the process executing insmod. Ricardo has pointed out that under Solaris the cwd is set to '/' during module load, while under Linux it is set to the caller's cwd. To handle this cleanly I've reworked the module _init()/_exit() macros so they call a _setup()/_cleanup() function when any SPL dependent module is loaded or unloaded. This gives us a chance to perform any needed modification of the process, in this case changing the cwd. It also handily provides a way to avoid creating wrapper init()/exit() functions because the Solaris and Linux prototypes differ slightly. All dependent modules should now call the spl helper macros spl_module_{init,exit}() instead of the native linux versions. Unfortunately, it appears that under Linux there has been no consistent API in the kernel to set the cwd in a module. Because of this I have had to add more autoconf magic than I'd like. However, what I have done is correct and has been tested on RHEL5, SLES11, FC11, and CHAOS kernels. In addition, I have changed the rootdir type from a 'void *' to the correct 'vnode_t *' type. And I've set rootdir to a non-NULL value.	2009-10-01 16:06:15 -07:00
Brian Behlendorf	0e77fc118e	Expand SEM() outside init_rwsem and directly call __init_rwsem(). We need to directly call __init_rwsem() or the name gets expanded to SEM(lock-name). This is safe and correct for the supported arches x86/x86_64/ppc/ppc64.	2009-09-29 03:19:09 -07:00
Brian Behlendorf	4d54fdee1d	Reimplement mutexes for Linux lock profiling/analysis For a generic explanation of why mutexes needed to be reimplemented to work with the kernel lock profiling see commits: `e811949a57` and `d28db80fd0` The specific changes made to the mutex implementation are as follows. The Linux mutex structure is now directly embedded in the kmutex_t. This allows a kmutex_t to be directly cast to a mutex struct and passed directly to the Linux primitive. Just like with the rwlocks it is critical that these functions be implemented as #defines to ensure the location information is preserved. The preprocessor can then do a direct replacement of the Solaris primitive with the Linux primitive. Just as with the rwlocks we need to track the lock owner. Here things get a little more interesting because depending on your kernel version, and how you've built your kernel, Linux may already do this for you. If you're running a 2.6.29 or newer kernel on an SMP system the lock owner will be tracked. This was added to Linux to support adaptive mutexes, more on that shortly. Alternately, your kernel might track the lock owner if you've set CONFIG_DEBUG_MUTEXES in the kernel build. If neither of the above things is true for your kernel the kmutex_t type will include and track the lock owner to ensure correct behavior. This is all handled by a new autoconf check called SPL_AC_MUTEX_OWNER. Concerning adaptive mutexes these are a very recent development and they did not make it in to either the latest FC11 or SLES11 kernels. Ideally, I'd love to see this kernel change appear in one of these distros because it does help performance. From Linux kernel commit: 0d66bf6d3514b35eb6897629059443132992dbd7 "Testing with Ingo's test-mutex application... gave a 345% boost for VFS scalability on my testbox" However, if you don't want to backport this change yourself you can still simply export the task_curr() symbol. 
The kmutex_t implementation will use this symbol when it's available to provide its own adaptive mutexes. Finally, DEBUG_MUTEX support was removed including the proc handlers. This was done because now that we are cleanly integrated with the kernel profiling all this information and much much more is available in debug kernel builds. This code was now redundant. Updated mutexes validated on: - SLES10 (ppc64) - SLES11 (x86_64) - CHAOS4.2 (x86_64) - RHEL5.3 (x86_64) - RHEL6 (x86_64) - FC11 (x86_64)	2009-09-25 14:47:01 -07:00
Brian Behlendorf	d28db80fd0	Update rwlocks to track owner to ensure correct semantics The behavior of RW_*_HELD was updated because it was not quite right. It is not sufficient to return non-zero when the lock is held, we must only do this when the current task is the holder. This means we need to track the lock owner which is not something tracked in a Linux semaphore. After some experimentation the solution I settled on was to embed the Linux semaphore at the start of a larger krwlock_t structure which includes the owner field. This maintains good performance and allows us to cleanly integrate with the kernel lock analysis tools. My reasons: 1) By placing the Linux semaphore at the start of krwlock_t we can then simply cast krwlock_t to a rw_semaphore and pass that on to the linux kernel. This allows us to use #defines so the preprocessor can do direct replacement of the Solaris primitive with the linux equivalent. This is important because it then maintains the location information for each rw_* call point. 2) Additionally, by adding the owner to krwlock_t we can keep this needed extra information adjacent to the lock itself. This removes the need for a fancy lookup to get the owner which is optimal for performance. We can also leverage the existing spin lock in the semaphore to ensure owner is updated correctly. 3) All helper functions which do not need to strictly be implemented as a define to preserve location information can be done as a static inline function. 4) Adding the owner to krwlock_t allows us to remove all memory allocations done during lock initialization. This is good for all the obvious reasons, but we do give up the ability to specify the lock name. The Linux profiling tools will stringify the lock name used in the code via the preprocessor and use that. Updated rwlocks validated on: - SLES10 (ppc64) - SLES11 (x86_64) - CHAOS4.2 (x86_64) - RHEL5.3 (x86_64) - RHEL6 (x86_64) - FC11 (x86_64)	2009-09-25 14:14:35 -07:00
Brian Behlendorf	e811949a57	Reimplement rwlocks for Linux lock profiling/analysis. It turns out that the previous rwlock implementation worked well but did not integrate properly with the upstream kernel lock profiling/analysis tools. This is a major problem since it would be awfully nice to be able to use the automatic lock checker and profiler. The problem is that the upstream lock tools use the pre-processor to create a lock class for each uniquely named lock. Since the rwsem was embedded in a wrapper structure the name was always the same. The effect was that we only ended up with one lock class for the entire SPL which caused the lock dependency checker to flag nearly everything as a possible deadlock. The solution was to directly map a krwlock to a Linux rwsem using a typedef thereby eliminating the wrapper structure. This was not done initially because the rwsem implementation is specific to the arch. To fully implement the Solaris krwlock API using only the provided rwsem API is not possible. It can only be done by directly accessing some of the internal data members of the rwsem structure. For example, the Linux API provides a different function for dropping a reader vs writer lock. Whereas the Solaris API uses the same function and the caller does not pass in what type of lock it is. This means to properly drop the lock we need to determine if the lock is currently a reader or writer lock. Then we need to call the proper Linux API function. Unfortunately, there is no provided API for this so we must extract this information directly from the arch specific lock implementation. This is all doable, and what I did, but it does complicate things considerably. The good news is that in addition to the profiling benefits of this change, we may see performance improvements due to slightly reduced overhead when creating rwlocks and manipulating them. 
The only function I was forced to sacrifice was rw_owner() because this information is simply not stored anywhere in the rwsem. Luckily this appears not to be a commonly used function on Solaris, and it is my understanding it is mainly used for debugging anyway. In addition to the core rwlock changes, extensive updates were made to the rwlock regression tests. Each class of test was extended to provide more API coverage and to be more rigorous in checking for misbehavior. This is a pretty significant change and with that in mind I have been careful to validate it on several platforms before committing. The full SPLAT regression test suite was run numerous times on all of the following platforms. This includes various kernels ranging from 2.6.16 to 2.6.29. - SLES10 (ppc64) - SLES11 (x86_64) - CHAOS4.2 (x86_64) - RHEL5.3 (x86_64) - RHEL6 (x86_64) - FC11 (x86_64)	2009-09-18 16:09:47 -07:00
Brian Behlendorf	c65d62d8bf	Disable stack overflow checking by default. The run time stack overflow checking is being disabled by default because it is not safe for use with 2.6.29 and later kernels. These kernels do now have their own stack overflow checking so this support has become redundant anyway. It can be re-enabled for older kernels or arches without stack overflow checking by redefining CHECK_STACK().	2009-07-30 13:52:11 -07:00
Brian Behlendorf	6ae7fef5b9	Update global_page_state() support for 2.6.29 kernels. Basically everything we need to monitor the global memory state of the system is now cleanly available via global_page_state(). The problem is that this interface is still fairly recent, and there has been one change in the page state enum which we need to handle. These changes basically boil down to the following: - If global_page_state() is available we should use it. Several autoconf checks have been added to detect the correct enum names. - If global_page_state() is not available check to see if the get_zone_counts() symbol is available and use that. - If the get_zone_counts() symbol is not exported we have no choice but to dynamically acquire it at load time. This is an absolute last resort for old kernels which we don't want to patch to cleanly export the symbol.	2009-07-28 15:06:42 -07:00
Brian Behlendorf	ec7d53e99a	Add basic credential support and splat tests. The previous credential implementation simply provided the needed types and a couple of dummy functions. This update correctly ties the basic Solaris credential API in to one of two Linux kernel APIs. Prior to 2.6.29 the linux kernel embedded all credentials in the task structure. For these kernels, we pass around the entire task struct as if it were the credential, then we use the helper functions to extract the credential related bits. As of 2.6.29 a new credential type was added which we can and do fairly cleanly layer on top of. Once again the helper functions nicely hide the implementation details from all callers. Three tests were added to the splat test framework to verify basic correctness. They should be extended as needed when new credential functions are added.	2009-07-27 17:18:59 -07:00
Ricardo M. Correia	ac95d0974b	Fixed NULL dereference by tcd_for_each() when the kmalloc() call in module/spl/spl-debug.c:1163 returns NULL. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2009-07-14 15:24:59 -07:00
Brian Behlendorf	b11b08ed64	Add a little paranoia here to ensure endianness is set correctly.	2009-07-14 14:28:04 -07:00
Brian Behlendorf	06dea10380	Add basic groupmember() function, not sup groups.	2009-07-10 10:58:06 -07:00
Brian Behlendorf	d3126abe75	Add ddi_copyin/ddi_copyout support for fake kernel originated ioctls.	2009-07-10 10:56:32 -07:00
Brian Behlendorf	2a734e9c26	Define ACE_ALL_PERMS for use by ACLs	2009-07-09 15:00:25 -07:00
Brian Behlendorf	c18cbcfe66	Define FKIOCTL which is used on Solaris to mark an in-kernel ioctl.	2009-07-09 14:59:41 -07:00
Brian Behlendorf	3a68dc5374	Add ASSERTV macro to simplify removing variables (the V in ASSERTV) which are only used in ASSERT().	2009-07-09 12:15:23 -07:00
Brian Behlendorf	915404bd50	Add basic support for TASKQ_THREADS_CPU_PCT taskq flag which is used to scale the number of threads based on the number of online CPUs. As CPUs are added/removed we should rescale the thread count appropriately, but currently this is only done at create.	2009-07-09 10:07:52 -07:00
Brian Behlendorf	124ca8a5a9	SLES10 Fixes (part 7) - Initial SLES testing uncovered a long standing bug in the debug tracing. The tcd_for_each() macro expected a NULL to terminate the trace_data[i] array but this was only ever true due to luck. All trace_data[] iterators are now properly capped by TCD_TYPE_MAX. - SPLAT_MAJOR 229 conflicted with a 'hvc' device on my SLES system. Since this was always an arbitrary choice I picked something else. - The HAVE_PGDAT_LIST case should set pgdat_list_addr to the value stored at the address of the memory location returned by kallsyms_lookup_name().	2009-05-20 15:30:13 -07:00
Brian Behlendorf	5232d256b4	SLES10 Fixes (part 6) - Prior to 2.6.17 there were no *_pgdat helper functions in mm/mmzone.c. Instead for_each_zone() operated directly on pgdat_list which may or may not have been exported depending on how your kernel was compiled. Now new configure checks determine if you have the helpers or not, and if the needed symbols are exported. If they are not exported then they are dynamically acquired at runtime by kallsyms_lookup_name().	2009-05-20 14:23:13 -07:00
Brian Behlendorf	3731931529	Powerpc Fixes (part 1): - Enable builds for powerpc ISA type. - Add DIV_ROUND_UP and roundup macros if unavailable. - Cast 64-bit values for %lld format string to (long long) to quiet compile warning.	2009-05-20 12:23:24 -07:00
Brian Behlendorf	6c9433c150	SLES10 Fixes (part 3): - Configure check for mutex_lock_nested(). This function was introduced as part of the mutex validator in 2.6.18, but if it's unavailable then it's safe to fall back to a plain mutex_lock().	2009-05-20 11:00:39 -07:00
Brian Behlendorf	96dded3844	SLES10 Fixes (part 2): - Configure check, the div64_64() function was renamed to div64_u64() as of 2.6.26. - Configure check, the global_page_state() function was introduced in 2.6.18 kernels. The earlier 2.6.16 based SLES10 must not try and use it, thankfully get_zone_counts() is still available. - To simplify debugging poison all symbols acquired dynamically using spl_kallsyms_lookup_name() with SYMBOL_POISON. - Add console messages when the user mode helpers fail. - spl_kmem_init_globals() use bit shifts instead of division. - When the monotonic clock is unavailable __gethrtime() must perform the HZ division as an 'unsigned long long' because the SPL only implements __udivdi3(), and not __divdi3() for 'long long' division on 32-bit arches.	2009-05-20 10:08:37 -07:00
Brian Behlendorf	759dfe7d43	Add list_move_tail() function.	2009-03-19 21:40:07 -07:00
Brian Behlendorf	0cbaeb117a	Allow spl_config.h to be included by dependent packages We need dependent packages to be able to include spl_config.h so they can leverage the configure checks the SPL has done. This is important because several of the spl headers need the results of these checks to work properly. Unfortunately, the autoheader build product is always private to a particular build and defines certain common things (PACKAGE, VERSION, etc). This prevents other packages which also use autoheader from being included because the definitions conflict. To avoid this problem the SPL build system leverages AH_BOTTOM to include a spl_unconfig.h at the bottom of the autoheader build product. This custom include undefs all known shared symbols to prevent the conflict. This does however mean that those definitions are also not available to the SPL package either. The SPL package therefore uses the equivalent SPL_META_* definitions.	2009-03-17 14:55:59 -07:00
Brian Behlendorf	e11d6c5f50	FC10/i686 Compatibility Update (2.6.27.19-170.2.35.fc10.i686) In the interests of portability I have added a FC10/i686 box to my list of development platforms. The hope is this will allow me to keep current with upstream kernel API changes, and at the same time ensure I don't accidentally break x86 support. This patch resolves all remaining issues observed under that environment. 1) SPL_AC_ZONE_STAT_ITEM_FIA autoconf check added. As of 2.6.21 the kernel added a clean API for modules to get the global count for free, inactive, and active pages. The SPL attempts to detect if this API is available and directly map spl_global_page_state() to global_page_state(). If the full API is not available then spl_global_page_state() is implemented as a thin layer to get these values via get_zone_counts() if that symbol is available. 2) New kmem:vmem_size regression test added to validate correct vmem_size() functionality. The test case acquires the current global vmem state, allocates from the vmem region, then verifies the allocation is correctly reflected in the vmem_size() stats. 3) Change splat_kmem_cache_thread_test() to always use KMC_KMEM based memory. On x86 systems with limited virtual address space failures resulted due to exhausting the address space. The tests really need to problem exhausting all memory on the system thus we need to use the physical address space. 4) Change kmem:slab_lock to cap its memory usage at availrmem instead of using the native linux nr_free_pages(). This provides additional test coverage of the SPL Linux VM integration. 5) Change kmem:slab_overcommit to perform allocation of 256K instead of 1M. On x86 based systems it is not possible to create a kmem backed slab with entries of that size. To compensate for this the number of allocations performed is increased by 4x. 6) Additional autoconf documentation for proposed upstream API changes to make additional symbols available to modules. 
7) Console error messages added when spl_kallsyms_lookup_name() fails to locate an expected symbol. This causes the module to fail to load and we need to know exactly which symbol was not available.	2009-03-17 12:16:31 -07:00
Brian Behlendorf	7257ec4185	Fix taskq_wait() not waiting bug I'm very surprised this has not surfaced until now. But the taskq_wait() implementation would only wait successfully the first time it was called. Subsequent usage of taskq_wait() on the taskq would not wait. The issue was caused by tq->tq_lowest_id being set to MAX_INT after the first wait completed. This caused subsequent waits which check that the waiting id is less than the lowest taskq id to always succeed. The fix is to ensure that tq->tq_lowest_id is never set larger than tq->tq_next.id. Additional fixes which were added to this patch include: 1) Fix a race by placing the taskq_wait_check() in the tq->tq_lock spinlock. 2) taskq_wait() should wait for the largest outstanding id. 3) Multiple spelling corrections. 4) Added taskq wait regression test to validate correct behavior.	2009-03-15 15:13:49 -07:00
Brian Behlendorf	c5f704607b	Build system and packaging (RPM support) An update to the build system to properly support all commonly used Makefile targets, these include: make all # Build everything make install # Install everything make clean # Clean up build products make distclean # Clean up everything make dist # Create package tarball make srpm # Create package source RPM make rpm # Create package binary RPMs make tags # Create ctags and etags for everything Extra care was taken to ensure that the source RPMs are fully rebuildable against Fedora/RHEL/Chaos kernels. To build binary RPMs from the source RPM for your system simply run: rpmbuild --rebuild spl-x.y.z-1.src.rpm This will produce two binary RPMs with correct 'requires' dependencies for your kernel. One will contain all spl modules and support utilities, the other is a devel package for compiling additional kernel modules which are dependent on the spl. spl-x.y.z-1_<kernel version>.x86_64.rpm spl-devel-x.y.z-1_<kernel version>.x86_64.rpm	2009-03-09 15:56:55 -07:00
Ricardo M. Correia	32f74c5280	XXX: Temporarily disable vmem_size().	2009-03-05 10:13:59 -08:00
Brian Behlendorf	04fa349d69	Merge branch 'kallsyms'	2009-03-04 10:19:41 -08:00
Brian Behlendorf	d1ff2312b0	Linux VM Integration Cleanup Remove all instances of functions being reimplemented in the SPL. When the prototypes are available in the linux headers but the function address itself is not exported use kallsyms_lookup_name() to find the address. The function name itself can then become a define which calls a function pointer. This is preferable to reimplementing the function in the SPL because it ensures we get the correct version of the function for the running kernel. This is actually pretty safe because the prototype is defined in the headers so we know we are calling the function properly. This patch also includes a rhel5 kernel patch which exports the needed symbols so we don't need to use kallsyms_lookup_name(). There are autoconf checks to detect if the symbol is exported and if so to use it directly. We should add patches for stock upstream kernels as needed if for no other reason than so we can easily track which additional symbols we needed exported. Those patches can also be used by anyone willing to rebuild their kernel, but this should not be a requirement. The rhel5 version of the export-symbols patch has been applied to the chaos kernel. Additional fixes: 1) Implement vmem_size() function using get_vmalloc_info() 2) SPL_CHECK_SYMBOL_EXPORT macro updated to use $LINUX_OBJ instead of $LINUX because Module.symvers is a build product. When $LINUX_OBJ != $LINUX we will not properly detect exported symbols. 3) SPL_LINUX_COMPILE_IFELSE macro updated to add include2 and $LINUX/include search paths to allow proper compilation when the kernel target build directory is not the source directory.	2009-03-04 10:04:15 -08:00
Ricardo M. Correia	eb7c7f44e8	Changed ptob()/btop() mult/div into bit shifts. Added necessary include for PAGE_SHIFT.	2009-02-25 15:50:58 -08:00
Ricardo M. Correia	7819a92a9b	Added btop() and moved ptob() to include/sys/param.h.	2009-02-25 15:50:50 -08:00
Ricardo M. Correia	4327ac3ff9	Changed z_compress_level() and z_uncompress() prototypes to match the ones in Solaris. Fixes compilation warning.	2009-02-23 11:45:59 -08:00
Brian Behlendorf	a1cf80b493	Matching kmem_free() fix for use after free case. See commit `bb01879ebe` for a full description. This issue should have been addressed in the same commit but it slipped my mind.	2009-02-19 12:28:10 -08:00
Brian Behlendorf	99639e4a13	Add zone_get_hostid() function Minimal support added for the zone_get_hostid() function. Only global zones are supported therefore this function must be called with a NULL argument. Additionally, I've added the HW_HOSTID_LEN define and updated all instances where a hard coded magic value of 11 was used; "A good riddance of bad rubbish!"	2009-02-19 11:26:17 -08:00
Brian Behlendorf	bb01879ebe	Coverity 9654, 9654: Use After Free Because vmem_free() was implemented as a macro using the ',' operator to evaluate both arguments and we performed the free before evaluating size we would dereference the free'd pointer. To resolve the problem we just invert the ordering and evaluate size first just as if it was evaluated by the caller when being passed to this function. This ensures that if the caller is doing something reckless like performing an assignment as part of the size argument we still perform it and it simply doesn't get removed by the macro. Of course nobody should be doing this sort of thing, but just in case.	2009-02-17 16:51:19 -08:00
Brian Behlendorf	15dc8b072e	Coverity 9652, 9653: No Effect Removed 2 ASSERT()s which had no effect because by definition size_t is always an unsigned type thus is always >= 0.	2009-02-17 16:30:58 -08:00
Brian Behlendorf	9b1b8e4c24	kmem slab magazine ageing deadlock - The previous magazine ageing scheme relied on the on_each_cpu() function to call spl_magazine_age() on each cpu. It turns out this could deadlock with do_flush_tlb_all() which also relies on the IPI based on_each_cpu(). To avoid this problem a per-magazine delayed work item is created and independently scheduled to the correct cpu removing the need for on_each_cpu(). - Additionally two unused fields were removed from the type spl_kmem_cache_t, they were holdovers from previous cleanup. - struct work_struct work - struct timer_list timer	2009-02-17 15:52:18 -08:00
Brian Behlendorf	f6c5d4ff88	Build system update - Added default build flags: -Wall -Wstrict-prototypes -Werror -Wshadow - Added missing Makefile's for include/ subdirectories.	2009-02-12 14:45:22 -08:00
Brian Behlendorf	37db7d8cf9	kmem slab fixes - Default SPL_KMEM_CACHE_DELAY changed to 15 to match Solaris. - Aged out slab checking occurs every SPL_KMEM_CACHE_DELAY / 3. - skc->skc_reap tunable added which allows callers of spl_slab_reclaim() to cap the number of slabs reclaimed. On Solaris all eligible slabs are always reclaimed, and this is still the default behavior. However, I suspect that is not always wise for reasons such as in the next comment. - spl_slab_reclaim() added cond_resched() while walking the slab/object free lists. Soft lockups were observed when freeing large numbers of vmalloc'd slabs/objects. - spl_slab_reclaim() 'sks->sks_ref > 0' check changes from incorrect 'break' to 'continue' to ensure all slabs are checked. - spl_cache_age() reworked to avoid a deadlock with do_flush_tlb_all() which occurred because we slept waiting for completion in spl_cache_age(). Waiting for magazine reclamation to finish is not required so we no longer wait. - spl_magazine_create() and spl_magazine_destroy() shifted back to using for_each_online_cpu() instead of the spl_on_each_cpu() approach which was of course a bad idea due to memory allocations which Ricardo pointed out.	2009-02-12 13:32:10 -08:00
Ricardo M. Correia	f500ccff35	Minor bug fix due to MAXOFFSET_T constant being too large on 32-bit systems.	2009-02-07 00:53:39 +00:00
Brian Behlendorf	4ab13d3b5c	Additional Linux VM integration Added support for Solaris swapfs_minfree, and swapfs_reserve tunables. In addition availrmem is now available and returns a reasonable value which is reasonably analogous to the Solaris meaning. On linux we return the sum of free and inactive pages since these are all easily reclaimable. All tunables are available in /proc/sys/kernel/spl/vm/* and they may need a little adjusting once we observe the real behavior. Some of the defaults are mapped to similar linux counterparts, others are straight from the OpenSolaris defaults.	2009-02-05 12:26:34 -08:00
Brian Behlendorf	36b313dacf	Linux VM integration / device special files Support added to provide reasonable values for the global Solaris VM variables: minfree, desfree, lotsfree, needfree. These values are set to the sum of their per-zone linux counterparts which should be close enough for Solaris consumers. When a non-GPL app links against the SPL we cannot use the udev interfaces, which means none of the device special files are created. Because of this I had added a poor man's udev which causes the SPL to invoke an upcall and create the basic devices when a minor is registered. When a minor is unregistered we use the vnode interface to unlink the special file.	2009-02-04 15:15:41 -08:00
Brian Behlendorf	31a033ecd4	2.6.27+ portability changes - Added SPL_AC_3ARGS_ON_EACH_CPU configure check to determine if the older 4 argument version of on_each_cpu() should be used or the new 3 argument version. The retry argument was dropped in the new API which was never used anyway. - Updated work queue compatibility wrappers. The old way this worked was to pass a data pointer when initializing the workqueue. The new API assumes the work item is embedded in a structure and we use container_of() to find that data pointer. - Updated skc->skc_flags to be an unsigned long which is now type checked in the bit operations. This silences the warnings. - Updated autogen products and splat tests accordingly	2009-02-02 15:12:30 -08:00
Brian Behlendorf	416bae036b	Add new workqueue header	2009-01-30 21:11:42 -08:00
Brian Behlendorf	ea3e6ca9e5	kmem_cache hardening and performance improvements - Added slab work queue task which gradually ages and frees slabs from the cache which have not been used recently. - Optimized slab packing algorithm to ensure each slab contains the maximum number of objects without creating too large a slab. - Fix deadlock, we can never call kv_free() under the skc_lock. We now unlink the objects and slabs from the cache itself and attach them to a private work list. The contents of the list are then subsequently freed outside the spin lock. - Move magazine create/destroy operation on to local cpu. - Further performance optimizations by minimizing the usage of the large per-cache skc_lock. This includes the addition of KMC_BIT_REAPING bit mask which is used to prevent concurrent reaping, and to defer new slab creation when reaping is occurring. - Add KMC_BIT_DESTROYING bit mask which is set when the cache is being destroyed, this is used to catch any task accessing the cache while it is being destroyed. - Add comments to all the functions and additional comments to try and make everything as clear as possible. - Major cleanup and additions to the SPLAT kmem tests to more rigorously stress the cache implementation and look for any problems. This includes correctness and performance tests. - Updated portable work queue interfaces	2009-01-30 20:54:49 -08:00
Brian Behlendorf	0f233eac33	Pull the blkdev header in to the sunldi for some useful structure definitions and helper functions	2009-01-26 16:47:49 -08:00
Brian Behlendorf	48e0606a52	Implement kmem cache alignment argument	2009-01-26 09:02:04 -08:00
Brian Behlendorf	e4f3ea278e	Remove stray ` from macro	2009-01-23 08:59:11 -08:00
Brian Behlendorf	511176398c	Update debug.h to standardize VERIFY3_IMPL error messages in debug and non-debug mode	2009-01-22 09:41:47 -08:00
Brian Behlendorf	1e4ed6c990	Add missing stub headers	2009-01-09 16:04:44 -08:00
Brian Behlendorf	121d48c97d	Add basic ksid_lookupdomain and ksiddomain_rele support, just allocations	2009-01-09 15:30:53 -08:00
Brian Behlendorf	0e41414946	Add two new stub headers	2009-01-09 14:04:13 -08:00
Brian Behlendorf	97735c39e3	Add VOP_SEEK	2009-01-09 13:59:39 -08:00
Brian Behlendorf	d83ba26e18	Add missing policy includes, add missing sun ddi bits	2009-01-09 10:49:47 -08:00
Brian Behlendorf	71c8ab9c68	Drat fix missing ;	2009-01-09 10:05:03 -08:00
Brian Behlendorf	23f5c4c281	Add missing callback_context_t and fid_t types	2009-01-09 10:03:37 -08:00
Brian Behlendorf	703e7a3cf4	Add stubs for three more includes	2009-01-09 09:47:27 -08:00
Brian Behlendorf	d702c04ff1	Add 5 splat tests for list handling	2009-01-07 12:54:03 -08:00
Brian Behlendorf	4c18c39ecb	Add include/sys/compress.h header	2009-01-06 09:47:00 -08:00
Brian Behlendorf	160c63ab76	Add P2BOUNDARY macro	2009-01-06 09:23:13 -08:00
Brian Behlendorf	7adbea4141	Pull in some default page typedefs	2009-01-05 16:14:38 -08:00
Brian Behlendorf	0f37204417	Add DTRACE_PROBE(a)	2009-01-05 16:09:21 -08:00
Brian Behlendorf	b53c565e65	Stub u8_textprep.h for inclusion purposes	2009-01-05 15:37:07 -08:00
Brian Behlendorf	e9cb2b4f64	Add system taskq support	2009-01-05 15:08:03 -08:00
Brian Behlendorf	8a2b328b18	Remove u8_textprep, we will not be implementing this nightmare yet	2009-01-05 11:32:08 -08:00
Brian Behlendorf	925ca8cc01	Add sys/thread.h	2008-12-23 16:27:36 -08:00
Brian Behlendorf	bb9cfc6cc3	Define needfree	2008-12-23 15:59:36 -08:00
Brian Behlendorf	2b88beb74f	Add timer.h header	2008-12-23 15:40:20 -08:00
Brian Behlendorf	bbdec3be06	Add u8 stub	2008-12-23 15:38:15 -08:00
Brian Behlendorf	de79fdd3a8	Move sunddi include	2008-12-23 13:32:07 -08:00
Brian Behlendorf	9d457afd1b	Add sunddi to uio	2008-12-23 13:30:04 -08:00
Brian Behlendorf	dc0f920710	Minor updates	2008-12-23 13:25:52 -08:00
Brian Behlendorf	926e2b6058	Pull in lock types	2008-12-23 13:18:39 -08:00
Brian Behlendorf	f5b92a66ad	Add a few more missing headers which the upstream stock kernel context expects	2008-12-23 13:03:09 -08:00
Brian Behlendorf	2ee63a549a	Add struct ddi_strtox functions	2008-12-05 16:23:57 -08:00
Brian Behlendorf	72e7de6026	Prefix META_ALIAS with SPL_	2008-11-26 13:26:05 -08:00
Brian Behlendorf	abc3ca149d	Prefix all META_* #defines with SPL to prevent collisions which include our spl_config.h. Dependent packages may do this to leverage the autoconf check we have already run against the kernel.	2008-11-26 13:09:37 -08:00
behlendo	7212e2cd27	Add missing autogen products git-svn-id: https://outreach.scidac.gov/svn/spl/trunk@182 7e1ea52c-4ff2-0310-8f11-9dd32ca42a1c	2008-11-26 17:07:59 +00:00
behlendo	6a1c3d418a	* include/sys/sunddi.h, modules/spl/spl-module.c : Removed default udev support from sunddi implementation because it uses GPL-only symbols. This support is optionally available for SPL consumers if they define HAVE_GPL_ONLY_SYMBOLS and license their module as GPL using the MODULE_LICENSE("GPL") macro. git-svn-id: https://outreach.scidac.gov/svn/spl/trunk@179 7e1ea52c-4ff2-0310-8f11-9dd32ca42a1c	2008-11-13 21:43:30 +00:00
behlendo	0498e6c585	Removed useless check Fix forward NULL in splat kmem_cache test ctors/dtors git-svn-id: https://outreach.scidac.gov/svn/spl/trunk@171 7e1ea52c-4ff2-0310-8f11-9dd32ca42a1c	2008-11-04 23:18:31 +00:00
behlendo	c8e60837b7	* spl-09-fix-kmem-track-oops.patch This fixes an oops when unloading the modules, in the case where memory tracking was enabled and there were memory leaks. The comment in the code explains what was the problem. * spl-10-fix-assert-verify-ndebug.patch This fixes ASSERT() and VERIFY() macros in non-debug builds. VERIFY() macros are supposed to check the condition and panic even in production builds, and ASSERT() macros don't need to evaluate the arguments. Also some 32-bit fixes. git-svn-id: https://outreach.scidac.gov/svn/spl/trunk@165 7e1ea52c-4ff2-0310-8f11-9dd32ca42a1c	2008-11-03 22:02:15 +00:00
behlendo	c22e7a427b	Under Solaris KM_SLEEP ensures success (or at least you hang forever). That said when working with a finite resource like memory failure really is always a possibility. It would be far better longer term if the ZFS code could be weaned off this assumption and properly handle the cases where an allocation fails. Still I've applied the patch to spl-0.3.4 since this layer is supposed to emulate Solaris as closely as possible. git-svn-id: https://outreach.scidac.gov/svn/spl/trunk@164 7e1ea52c-4ff2-0310-8f11-9dd32ca42a1c	2008-11-03 21:51:33 +00:00
behlendo	a0f6da3d95	Add a SPL_AC_TYPE_ATOMIC64_T test to configure for systems which do already support atomic64_t types. * spl-07-kmem-cleanup.patch This moves all the debugging code from sys/kmem.h to spl-kmem.c, because the huge macros were hard to debug and were bloating functions that allocated memory. I also fixed some other minor problems, including 32-bit fixes and a reported memory leak which was just due to using the wrong free function. git-svn-id: https://outreach.scidac.gov/svn/spl/trunk@163 7e1ea52c-4ff2-0310-8f11-9dd32ca42a1c	2008-11-03 21:06:04 +00:00
behlendo	550f170525	Apply two nice improvements caught by Ricardo, spl-05-div64.patch This is a much less intrusive fix for undefined 64-bit division symbols when compiling the DMU in 32-bit kernels. * spl-06-atomic64.patch This is a workaround for 32-bit kernels that don't have atomic64_t. git-svn-id: https://outreach.scidac.gov/svn/spl/trunk@162 7e1ea52c-4ff2-0310-8f11-9dd32ca42a1c	2008-11-03 20:34:17 +00:00
behlendo	749045bbfa	Apply a nice fix caught by Ricardo, * spl-04-fix-taskq-spinlock-lockup.patch Fixes a deadlock in the BIO completion handler, due to the taskq code prematurely re-enabling interrupts when another spinlock had disabled them in the IDE IRQ handler. git-svn-id: https://outreach.scidac.gov/svn/spl/trunk@161 7e1ea52c-4ff2-0310-8f11-9dd32ca42a1c	2008-11-03 20:21:08 +00:00
behlendo	f6c81c5ea7	Reviewed and applied spl-01-rm-gpl-symbol-set_cpus_allowed.patch from Ricardo which removes a dependency on the GPL-only symbol set_cpus_allowed(). Using this symbol is simpler but in the name of portability we are adopting a spinlock based solution here to remove this dependency. git-svn-id: https://outreach.scidac.gov/svn/spl/trunk@160 7e1ea52c-4ff2-0310-8f11-9dd32ca42a1c	2008-11-03 20:07:20 +00:00
behlendo	25557fd884	Sigh more compat fixes, this is almost right for 2.6.9 - 2.6.26 kernels. git-svn-id: https://outreach.scidac.gov/svn/spl/trunk@157 7e1ea52c-4ff2-0310-8f11-9dd32ca42a1c	2008-08-11 23:47:44 +00:00
behlendo	b61a6e8bdc	Pull in initial 32-bit support patches. git-svn-id: https://outreach.scidac.gov/svn/spl/trunk@156 7e1ea52c-4ff2-0310-8f11-9dd32ca42a1c	2008-08-11 22:42:04 +00:00
behlendo	3d061e9d10	Commit bulk of remaining 2.6.9 and 2.6.26 compat changes. git-svn-id: https://outreach.scidac.gov/svn/spl/trunk@155 7e1ea52c-4ff2-0310-8f11-9dd32ca42a1c	2008-08-11 22:13:47 +00:00
behlendo	322640b7b5	Include linux/uaccess.h compat changes. git-svn-id: https://outreach.scidac.gov/svn/spl/trunk@154 7e1ea52c-4ff2-0310-8f11-9dd32ca42a1c	2008-08-11 19:10:14 +00:00
behlendo	6a6cafbe8d	Pull in timespec, list, and type compat changes to support building against a wider range of kernels. git-svn-id: https://outreach.scidac.gov/svn/spl/trunk@152 7e1ea52c-4ff2-0310-8f11-9dd32ca42a1c	2008-08-11 17:20:11 +00:00
behlendo	46c685d0c4	Add class / device portability code. Two autoconf tests were added to cover the 3 possible APIs from 2.6.9 to 2.6.26. We attempt to use the newest interfaces and if not available fallback to the oldest. This is a rework of some changes proposed by Ricardo for RHEL4. git-svn-id: https://outreach.scidac.gov/svn/spl/trunk@150 7e1ea52c-4ff2-0310-8f11-9dd32ca42a1c	2008-08-10 03:50:36 +00:00
behlendo	7afde631f6	Start bringing in Ricardo's spl-00-rhel4-compat.patch, a few chunks at a time as I audit it. This chunk finishes moving the SPL entirely off the linux slab on to the SPL implementation. It differs slightly from the proposed version in that the spl continues to export all the Solaris types and functions. These do conflict with the Linux slab so a module using these interfaces must not include the SPL slab if they also intend to use the linux slab. Or they must explicitly #undef the macros which remap the functions to their spl_* equivalents. A nice side effect of dropping the entire linux slab is we don't need the autoconf checks anymore. They kept messing with the slab API endlessly! git-svn-id: https://outreach.scidac.gov/svn/spl/trunk@148 7e1ea52c-4ff2-0310-8f11-9dd32ca42a1c	2008-08-05 04:16:09 +00:00
behlendo	a1502d76ae	- Remove hash functionality from slab in favor of direct lookups based off the spl_kmem_obj_t tacked on the end of each object. This actually isn't so bad because we are now allocing large chunks for the slab and partitioning it ourselves. So there's not a ton of wasted space. We may suffer a performance hit however due to alignment issues. - Remove remaining dependencies on the linux slab implementation. We're standing on our own now for better or worse. - Rework slabs to be either kmem or vmem based. If neither KMC_VMEM or KMC_KMEM are specified we make a decent guess about what will work best based on the object size. Additionally we provide a kmem_virt() function callers can use to see if they have a virtual or physical address. - Minor fixups in the test suite. git-svn-id: https://outreach.scidac.gov/svn/spl/trunk@141 7e1ea52c-4ff2-0310-8f11-9dd32ca42a1c	2008-07-01 03:28:54 +00:00
behlendo	1c3832576d	Remove stray call to spl_cache_free() and remove all the cycle count which was costing me overhead. It was hurting performance pretty badly for heavily used caches. I'm also thinking the hash may be hurting me as well and it might be worth sticking a pointer in to a little space after the alloced object. git-svn-id: https://outreach.scidac.gov/svn/spl/trunk@140 7e1ea52c-4ff2-0310-8f11-9dd32ca42a1c	2008-06-28 20:03:11 +00:00
behlendo	fece7c99bf	Victory! I've reworked caches with large objects which are backed by vmalloc()'ed memory. I now alloc a slab which is roughly 32*spl_obj_size and in this block of memory I place the slab descriptor, slab object descriptors, and objects themselves. This greatly reduces vmalloc lock contention. Still some minor cleanup remains and fine tuning but it's working pretty well. git-svn-id: https://outreach.scidac.gov/svn/spl/trunk@139 7e1ea52c-4ff2-0310-8f11-9dd32ca42a1c	2008-06-28 05:04:46 +00:00
behlendo	ff449ac406	Further slab improvements, I'm getting close to something which works well for the expected workloads. Improvements in this commit include: - Added DEBUG_KMEM_TRACKING #define which can optionally be set when DEBUG_KMEM is defined to do per allocation tracking. This allows us to get all the lightweight kmem debugging enabled by default which is pretty light weight, and only when looking for a memory leak we can briefly enable the per alloc tracking. - Added set_normalized_timespec() in to SPL to simplify using the timespec() primitives from within a module. - Added per-spinlock cycle counters to the slab in an attempt to run down a lock contention issue. The contended lock was in vmalloc() but I'm going to leave the cycle counters in place for a little while until I'm convinced there aren't other locking improvements possible in the slab. - Added a proc interface to the slab to export per slab cache statistics to /proc/spl/kmem/slab for analysis. - Reworked spl_slab_alloc() function to allocate from kmem for small allocations and vmem for large allocations. This improved things considerably but further work is needed. git-svn-id: https://outreach.scidac.gov/svn/spl/trunk@138 7e1ea52c-4ff2-0310-8f11-9dd32ca42a1c	2008-06-27 21:40:11 +00:00
behlendo	e9d7a2bef5	Fix for memory corruption caused by overrunning the magazine when repopulating it. Plus I fixed a few more subtle races in that part of the code which were catching me. Finally I fixed a small race in kmem_test8. git-svn-id: https://outreach.scidac.gov/svn/spl/trunk@137 7e1ea52c-4ff2-0310-8f11-9dd32ca42a1c	2008-06-26 19:49:42 +00:00
behlendo	4afaaefa05	Implement per-cpu local caches. This seems to have bought me another factor of 10x improvement on SMP systems due to reduced lock contention. This may put me in the ballpark of what is needed. We can still further improve things on NUMA systems by creating an additional L3 cache per memory node instead of the current global pool. With luck this won't be needed. I should also take another look at the locking now that everything is working. There's a good chance I can tighten it up a little bit and improve things a little more. kmem_lock: time (sec) slabs objs hash kmem_lock: tot/max/calc tot/max/calc size/depth kmem_lock: 0.000999926 6/6/1 192/192/32 32768/0 kmem_lock: 0.000999926 4/4/2 128/128/64 32768/0 kmem_lock: 0.000999926 4/4/4 128/128/128 32768/0 kmem_lock: 0.000999926 4/4/8 128/128/256 32768/0 kmem_lock: 0.000999926 4/4/16 128/128/512 32768/0 kmem_lock: 0.000999926 4/4/32 128/128/1024 32768/0 kmem_lock: 0.000999926 4/4/64 128/128/2048 32768/0 kmem_lock: 0.000999926 8/8/128 256/256/4096 32768/0 kmem_lock: 0.003999704 24/23/256 768/736/8192 32768/1 kmem_lock: 0.012999038 44/41/512 1408/1312/16384 32768/1 kmem_lock: 0.051996153 96/93/1024 3072/2976/32768 32768/2 kmem_lock: 0.181986536 187/184/2048 5984/5888/65536 32768/3 kmem_lock: 0.655951469 342/339/4096 10944/10848/131072 32768/4 git-svn-id: https://outreach.scidac.gov/svn/spl/trunk@136 7e1ea52c-4ff2-0310-8f11-9dd32ca42a1c	2008-06-25 20:57:45 +00:00
behlendo	d46630e0f3	The first locking issue was due to the semaphore I used. I was trying to be overly clever and the context switch when the semaphore was busy was destroying performance. Converting to a simple spin lock bought me a factor of 50 or so. That said it's still not good enough. Tests show bad performance and we are still CPU bound. The logical fix is I need to implement per-cpu hot caches to minimize the SMP contention. Linux and Solaris both have this, I was hoping to do without but it looks like that's not to be. kmem_lock: time (sec) slabs objs hash kmem_lock: tot/max/calc tot/max/calc size/depth kmem_lock: 0.022000000 7/6/64 224/177/2048 32768/1 kmem_lock: 0.039000000 13/13/128 416/404/4096 32768/1 kmem_lock: 0.079000000 23/21/256 736/672/8192 32768/1 kmem_lock: 0.158000000 48/47/512 1536/1504/16384 32768/1 kmem_lock: 0.345000000 105/105/1024 3360/3358/32768 32768/2 kmem_lock: 0.760000000 202/200/2048 6464/6400/65536 32768/3 git-svn-id: https://outreach.scidac.gov/svn/spl/trunk@135 7e1ea52c-4ff2-0310-8f11-9dd32ca42a1c	2008-06-24 17:18:15 +00:00
behlendo	5cbd57fa91	Fix minor chaos/fc9 kernel discrepancies in build git-svn-id: https://outreach.scidac.gov/svn/spl/trunk@133 7e1ea52c-4ff2-0310-8f11-9dd32ca42a1c	2008-06-13 23:56:26 +00:00
behlendo	2fb9b26a85	* : modules/sys/kmem-slab.c : Re-implemented the slab to no longer be based on the linux slab but to be its own complete implementation. The new slab behaves much more like the Solaris slab than the Linux slab. git-svn-id: https://outreach.scidac.gov/svn/spl/trunk@132 7e1ea52c-4ff2-0310-8f11-9dd32ca42a1c	2008-06-13 23:41:06 +00:00
behlendo	41cf38df92	Add missing () to quiet warnings in NDEBUG case git-svn-id: https://outreach.scidac.gov/svn/spl/trunk@128 7e1ea52c-4ff2-0310-8f11-9dd32ca42a1c	2008-06-04 22:52:13 +00:00
behlendo	475cdc788e	Just use CONFIG_SLUB to detect SLUB use Add ASSERTF to the NDEBUG build Fix minor issue with various debug build flags git-svn-id: https://outreach.scidac.gov/svn/spl/trunk@126 7e1ea52c-4ff2-0310-8f11-9dd32ca42a1c	2008-06-04 21:09:25 +00:00
behlendo	c30df9c863	Fixes: 1) Ensure mutex_init() never fails in the case of ENOMEM by retrying forever. I don't think I've ever seen this happen but it was clear after code inspection that if it did we would immediately crash. 2) Enable full debugging in check.sh for sanity tests. Might as well get as much debug as we can in the case of a failure. 3) Reworked list of kmem caches tracked by SPL in to a hash with the key based on the address of the kmem_cache_t. This should speed up the constructor/destructor/shrinker lookup needed now for newer kernels which removed the destructor support. 4) Updated kmem_cache_create to handle the case where CONFIG_SLUB is defined. The slub would occasionally merge slab caches which resulted in non-unique keys for our hash lookup in 3). To fix this we detect if the slub is enabled and then set the needed flag to prevent this merging from ever occurring. 5) New kernels removed the proc_dir_entry pointer from items registered by sysctl. This means we can no longer be sneaky and manually insert things in to the sysctl tree simply by walking the proc tree. So I'm forced to create a separate tree for all the things I can't easily support via sysctl interface. I don't like it but it will do for now. git-svn-id: https://outreach.scidac.gov/svn/spl/trunk@124 7e1ea52c-4ff2-0310-8f11-9dd32ca42a1c	2008-06-04 06:00:46 +00:00
behlendo	691d2bd733	Update utsname to use proper compatible interface to avoid API issues. git-svn-id: https://outreach.scidac.gov/svn/spl/trunk@123 7e1ea52c-4ff2-0310-8f11-9dd32ca42a1c	2008-06-03 21:20:18 +00:00
behlendo	57d862349b	Breaking the world for a little bit. If anyone is going to continue working on this branch for the next few days I suggest you work off of the 0.3.1 tag. The following changes are fairly extensive and are designed to make the SPL compatible with all kernels in the range of 2.6.18-2.6.25. There were 13 relevant API changes between these releases and I have added the needed autoconf tests to check for them. However, this has not all been tested extensively. I'll sort out the breakage on Fedora Core 9 and RHEL5 this week. SPL_AC_TYPE_UINTPTR_T SPL_AC_TYPE_KMEM_CACHE_T SPL_AC_KMEM_CACHE_DESTROY_INT SPL_AC_ATOMIC_PANIC_NOTIFIER SPL_AC_3ARGS_INIT_WORK SPL_AC_2ARGS_REGISTER_SYSCTL SPL_AC_KMEM_CACHE_T SPL_AC_KMEM_CACHE_CREATE_DTOR SPL_AC_3ARG_KMEM_CACHE_CREATE_CTOR SPL_AC_SET_SHRINKER SPL_AC_PATH_IN_NAMEIDATA SPL_AC_TASK_CURR SPL_AC_CTL_UNNUMBERED git-svn-id: https://outreach.scidac.gov/svn/spl/trunk@119 7e1ea52c-4ff2-0310-8f11-9dd32ca42a1c	2008-06-02 17:28:49 +00:00
behlendo	715f625146	Go through and add a header with the proper UCRL number. git-svn-id: https://outreach.scidac.gov/svn/spl/trunk@114 7e1ea52c-4ff2-0310-8f11-9dd32ca42a1c	2008-05-26 04:38:26 +00:00
behlendo	cc7449ccd6	- Properly fix the debug support so all the ASSERTs, VERIFYs, etc. can be compiled out when doing performance runs. - Bite the bullet and fully autoconfize the debug options in the configure time parameters. By default all the debug support is disabled in the core SPL build, but available to modules which enable it when building against the SPL. To enable particular SPL debug support use the following configure options: --enable-debug Internal ASSERTs --enable-debug-kmem Detailed memory accounting --enable-debug-mutex Detailed mutex tracking --enable-debug_kstat Kstat info exported to /proc --enable-debug-callb Additional callb debug git-svn-id: https://outreach.scidac.gov/svn/spl/trunk@111 7e1ea52c-4ff2-0310-8f11-9dd32ca42a1c	2008-05-19 02:49:12 +00:00
behlendo	6ab69573ff	SPL additions to increase support for updated ZFS build git-svn-id: https://outreach.scidac.gov/svn/spl/trunk@110 7e1ea52c-4ff2-0310-8f11-9dd32ca42a1c	2008-05-15 23:39:19 +00:00
behlendo	4efd41189a	Rework condition variable implementation to be consistent with other primitive implementations. Additionally ensure that GFP_ATOMIC is used for allocations when in interrupt context. git-svn-id: https://outreach.scidac.gov/svn/spl/trunk@108 7e1ea52c-4ff2-0310-8f11-9dd32ca42a1c	2008-05-15 17:10:30 +00:00
behlendo	8464443f8d	Add a comment so I remember to fix this. git-svn-id: https://outreach.scidac.gov/svn/spl/trunk@106 7e1ea52c-4ff2-0310-8f11-9dd32ca42a1c	2008-05-12 16:53:41 +00:00
behlendo	c6dc93d6a8	By default disable extra KMEM and MUTEX debugging to aid performance. They can easily be re-enabled when new stability issues are uncovered. git-svn-id: https://outreach.scidac.gov/svn/spl/trunk@105 7e1ea52c-4ff2-0310-8f11-9dd32ca42a1c	2008-05-09 22:53:20 +00:00
behlendo	5c2bb9b2c3	Stability hack. Under Solaris when KM_SLEEP is set kmem_cache_alloc() may not fail. To get this behavior I'd added a retry to the shim layer even though it is abusive to the VM, at least it should prevent the crash. Additionally I added a proc counter so I can easily check how often this is happening. It should be fairly rare, but likely will get worse and worse the longer the machine has been up. git-svn-id: https://outreach.scidac.gov/svn/spl/trunk@104 7e1ea52c-4ff2-0310-8f11-9dd32ca42a1c	2008-05-09 21:21:33 +00:00
behlendo	04a479f706	Add an almost feature complete implementation of kstat. I chose not to support a few flags (we assert if they are used), and I did not add the libkstat interface and instead exported everything to proc for easy access. git-svn-id: https://outreach.scidac.gov/svn/spl/trunk@103 7e1ea52c-4ff2-0310-8f11-9dd32ca42a1c	2008-05-08 23:21:47 +00:00
behlendo	427a782d7d	Decrease of kmem warning threshold back to 2 pages, no worse than a stack. git-svn-id: https://outreach.scidac.gov/svn/spl/trunk@100 7e1ea52c-4ff2-0310-8f11-9dd32ca42a1c	2008-05-07 19:33:01 +00:00
behlendo	13cdca65ec	Add vmem memory accounting git-svn-id: https://outreach.scidac.gov/svn/spl/trunk@99 7e1ea52c-4ff2-0310-8f11-9dd32ca42a1c	2008-05-07 18:54:32 +00:00
behlendo	404992e31a	- Relocate 'stats_per' in to proper /proc/sys/spl/mutex/ directory - Shift to spinlock for mutex list addition and removal git-svn-id: https://outreach.scidac.gov/svn/spl/trunk@98 7e1ea52c-4ff2-0310-8f11-9dd32ca42a1c	2008-05-07 17:58:22 +00:00
behlendo	4f86a887d8	Remaining issues fixed after reenabling mutex debugging. - Ensure the mutex_stats_sem and mutex_stats_list are initialized - Only spin if you have to in mutex_init git-svn-id: https://outreach.scidac.gov/svn/spl/trunk@97 7e1ea52c-4ff2-0310-8f11-9dd32ca42a1c	2008-05-06 23:19:27 +00:00
behlendo	e8b31e8482	- Updated rwlock's to reside in a .c file instead of a static inline - Updated rwlock's so they can be safely initialized in ctors. git-svn-id: https://outreach.scidac.gov/svn/spl/trunk@96 7e1ea52c-4ff2-0310-8f11-9dd32ca42a1c	2008-05-06 23:00:49 +00:00
behlendo	d6a26c6a32	Lots of fixes here: - Detailed kmem memory allocation tracking. We can now get on spl module unload a list of all memory allocations which were not freed and where the original alloc was. E.g. SPL: 15554:632:(spl-kmem.c:442:kmem_fini()) kmem leaked 90/319332 bytes SPL: 15554:648:(spl-kmem.c:451:kmem_fini()) address size data func:line SPL: 15554:648:(spl-kmem.c:457:kmem_fini()) ffff8100734b68b8 32 0100000001005a5a __spl_mutex_init:70 SPL: 15554:648:(spl-kmem.c:457:kmem_fini()) ffff8100734b6148 13 &tl->tl_lock __spl_mutex_init:74 SPL: 15554:648:(spl-kmem.c:457:kmem_fini()) ffff81007ac43730 32 0100000001005a5a __spl_mutex_init:70 SPL: 15554:648:(spl-kmem.c:457:kmem_fini()) ffff81007ac437d8 13 &tl->tl_lock __spl_mutex_init:74 - Shift to using rwsems in kmem implementation, to simplify locking and improve concurrency. - Shift to using rwsems in mutex implementation, additionally ensure we never sleep in the init function if non-zero preempt_count or interrupts are disabled as can happen in a slab cache ctor/dtor. - Other minor formatting fixes and such. TODO: - Finish the vmem memory allocation tracking - Vet all other SPL primitives for potential sleeping during *_init. I suspect the rwlock implementation does this and should be fixed just like the mutex implementation. git-svn-id: https://outreach.scidac.gov/svn/spl/trunk@95 7e1ea52c-4ff2-0310-8f11-9dd32ca42a1c	2008-05-06 20:38:28 +00:00
behlendo	9ab1ac14ad	Commit adaptive mutexes. This seems to have introduced some new crashes but it's not clear to me yet if these are a problem with the mutex implementation or ZFS's usage of it. Minor taskq fixes to add new tasks to the end of the pending list. Minor enhancements to the debug infrastructure. git-svn-id: https://outreach.scidac.gov/svn/spl/trunk@94 7e1ea52c-4ff2-0310-8f11-9dd32ca42a1c	2008-05-05 20:18:49 +00:00
behlendo	bcd68186d8	New and improved taskq implementation for the SPL. It allows a configurable number of threads like the Solaris version and almost all of the options are supported. Unfortunately, it appears to have made absolutely no difference to our performance numbers. I need to keep looking for where we are bottlenecking. git-svn-id: https://outreach.scidac.gov/svn/spl/trunk@93 7e1ea52c-4ff2-0310-8f11-9dd32ca42a1c	2008-04-25 22:10:47 +00:00
behlendo	839d8b438e	Update kmem.h to properly use new debug subsystem. git-svn-id: https://outreach.scidac.gov/svn/spl/trunk@92 7e1ea52c-4ff2-0310-8f11-9dd32ca42a1c	2008-04-24 20:21:07 +00:00
behlendo	3561541c24	Prep for 0.2.1 tag Minor fixes to headers to use debug macros Added /proc/sys/spl/version git-svn-id: https://outreach.scidac.gov/svn/spl/trunk@90 7e1ea52c-4ff2-0310-8f11-9dd32ca42a1c	2008-04-24 17:41:23 +00:00
wartens2	8100fe56f1	Make sure that when calling __vmem_alloc that we do not have __GFP_ZERO set. Once the memory is allocated then zero out the memory if __GFP_ZERO is passed to __vmem_alloc. git-svn-id: https://outreach.scidac.gov/svn/spl/trunk@88 7e1ea52c-4ff2-0310-8f11-9dd32ca42a1c	2008-04-24 17:07:56 +00:00
behlendo	6e605b6e58	Minor improvement to taskq handling. This is a small step towards dynamic taskqs which still need to be fully implemented. git-svn-id: https://outreach.scidac.gov/svn/spl/trunk@87 7e1ea52c-4ff2-0310-8f11-9dd32ca42a1c	2008-04-23 21:19:47 +00:00
behlendo	b831734a43	Stack usage is my enemy. Trade cpu cycles in the debug code to ensure I never add anything to the stack I don't absolutely need. All this debug code could be removed from a production build anyway so I'm not so worried about the performance impact. We may also consider revisiting the mutex and condvar implementation to ensure no additional stack is used there. Initial indications are I have reduced the worst case stack usage to 9080 bytes. Still too large for the default 8k stacks so I have been forced to run with 16k stacks until I can reduce the worst offenders. git-svn-id: https://outreach.scidac.gov/svn/spl/trunk@83 7e1ea52c-4ff2-0310-8f11-9dd32ca42a1c	2008-04-22 16:55:26 +00:00
behlendo	7fea96c04f	More fixes to ensure we get good debug logs even if we're in the process of destroying the stacks. Threshold set fairly aggressively to 80% of stack usage. git-svn-id: https://outreach.scidac.gov/svn/spl/trunk@82 7e1ea52c-4ff2-0310-8f11-9dd32ca42a1c	2008-04-21 22:44:11 +00:00
behlendo	892d51061e	Handful of minor stack checking fixes git-svn-id: https://outreach.scidac.gov/svn/spl/trunk@79 7e1ea52c-4ff2-0310-8f11-9dd32ca42a1c	2008-04-21 18:08:33 +00:00
behlendo	937879f11d	Update SPL to use new debug infrastructure. This means: - Replacing all BUG_ON()'s with proper ASSERT()'s - Using ENTRY,EXIT,GOTO, and RETURN macro to instrument call paths git-svn-id: https://outreach.scidac.gov/svn/spl/trunk@78 7e1ea52c-4ff2-0310-8f11-9dd32ca42a1c	2008-04-21 17:29:47 +00:00
behlendo	2fae1b3d0a	First minor batch of fixes. Catch a dropped ;, and use SBUG instead of BUG. git-svn-id: https://outreach.scidac.gov/svn/spl/trunk@77 7e1ea52c-4ff2-0310-8f11-9dd32ca42a1c	2008-04-19 00:02:11 +00:00
behlendo	57d1b18858	First commit of lustre style internal debug support. These changes bring over everything lustre had for debugging with two exceptions. I dropped the debug daemon and upcalls just because it made things a little easier. They can be readded easily enough if we feel they are needed. Everything compiles and seems to work on first inspection but I suspect there are a handful of issues still lingering which I'll be sorting out right away. I just wanted to get all these changes committed and safe. I'm getting a little paranoid about losing them. git-svn-id: https://outreach.scidac.gov/svn/spl/trunk@75 7e1ea52c-4ff2-0310-8f11-9dd32ca42a1c	2008-04-18 23:39:58 +00:00
behlendo	d61e12af5a	- Add some spinlocks to cover all the private data in the mutex. I don't think this should fix anything but it's a good idea regardless. - Drop the lock before calling the constructor/destructor for the slab, otherwise we can't sleep in a constructor/destructor and for long running functions we may NMI. - Do something braindead, but safe, for the console debug logs for now. git-svn-id: https://outreach.scidac.gov/svn/spl/trunk@73 7e1ea52c-4ff2-0310-8f11-9dd32ca42a1c	2008-04-15 20:53:36 +00:00
behlendo	c5fd77fcbf	Just clean up an error case to avoid overspamming the console. We get the stack once from the BUG(), so there is no reason to dump it twice. git-svn-id: https://outreach.scidac.gov/svn/spl/trunk@72 7e1ea52c-4ff2-0310-8f11-9dd32ca42a1c	2008-04-14 18:37:20 +00:00
behlendo	12ea923056	Adjust the condition variables to simply sleep uninterruptibly. This way we don't have to contend with spurious wakeups which it appears ZFS is not so careful to handle anyway. So this is probably for the best. git-svn-id: https://outreach.scidac.gov/svn/spl/trunk@70 7e1ea52c-4ff2-0310-8f11-9dd32ca42a1c	2008-04-11 22:49:48 +00:00
behlendo	115aed0dd8	- Add more strict in_atomic() checking to the mutex entry function just to be extra safe and paranoid. - Rewrite the thread shim to take full advantage of the new kernel kthread API. This greatly simplifies things. - Add a new regression test for thread_exit() to ensure it properly terminates a thread immediately without allowing further execution of the thread. git-svn-id: https://outreach.scidac.gov/svn/spl/trunk@69 7e1ea52c-4ff2-0310-8f11-9dd32ca42a1c	2008-04-11 17:03:57 +00:00
behlendo	79f92663e3	Fix race in rwlock implementation which can occur when your task is rescheduled to a different cpu after you've taken the lock but before RW_LOCK_HELD is called. We need the spinlock to ensure there is a wmb() there. git-svn-id: https://outreach.scidac.gov/svn/spl/trunk@68 7e1ea52c-4ff2-0310-8f11-9dd32ca42a1c	2008-04-07 23:54:34 +00:00
behlendo	968eccd1d1	Update the thread shim to use the current kernel threading API. We need to use kthread_create() here for a few reasons. First off, the old kernel_thread() API function will be going away. Secondly, and more importantly, if I use kthread_create() we can then properly implement a thread_exit() function which terminates the kernel thread at any point with do_exit(). This fixes our cleanup bug which was caused by dropping a mutex twice after thread_exit() didn't really exit. git-svn-id: https://outreach.scidac.gov/svn/spl/trunk@66 7e1ea52c-4ff2-0310-8f11-9dd32ca42a1c	2008-04-04 04:44:16 +00:00
behlendo	996faa6869	Correctly implement atomic_cas_ptr() function. Ideally all of these atomic operations will be rewritten anyway with the correct arch specific assembly. But not today. git-svn-id: https://outreach.scidac.gov/svn/spl/trunk@65 7e1ea52c-4ff2-0310-8f11-9dd32ca42a1c	2008-04-03 21:48:57 +00:00
behlendo	0a6fd143fd	- Remapped ldi_handle_t to struct block_device * which is much more useful - Added Linux block device headers to sunldi.h - Made __taskq_dispatch safe for interrupt context where it turns out we need to be using it. - Fixed NULL const/dest bug for kmem slab caches - Placed __dprintf debugging messages under a spin_lock_irqsave so it's safe to use them in interrupt handlers. For debugging only! git-svn-id: https://outreach.scidac.gov/svn/spl/trunk@64 7e1ea52c-4ff2-0310-8f11-9dd32ca42a1c	2008-04-03 16:33:31 +00:00

291 Commits