mirror_zfs

mirror of https://git.proxmox.com/git/mirror_zfs.git synced 2026-03-22 08:51:30 +03:00

Author	SHA1	Message	Date
Brian Behlendorf	647fa73cf3	Remove VN_HOLD/VN_RELE/VOP_PUTPAGE Previously these were defined to noops but rather than give the misleading impression that these are actually implemented I'm removing the type entirely for clarity.	2011-01-12 11:38:05 -08:00
Brian Behlendorf	a5b40eed17	Make vn_cache|vn_file_cache kmem caches Both of these caches were previously allowed to be either a vmem or kmem cache based on the size of the object involved. Since we know the object won't be too large and performance is much better for a kmem cache, force them to be kmem backed.	2011-01-12 11:38:05 -08:00
Brian Behlendorf	dcd9cb5a17	Clean vattr_t and vsecattr_t types Minor cleanup for the vattr_t and vsecattr_t types.	2011-01-12 11:38:04 -08:00
Brian Behlendorf	4295b530ee	Add vn_mode_to_vtype/vn_vtype_to_mode helpers Add simple helpers to convert a vnode->v_type to an inode->i_mode. These should be used sparingly but they are handy to have.	2011-01-12 11:38:04 -08:00
Neependra Khare	3f688a8c38	Add cv_timedwait_interruptible() function The cv_timedwait() function by definition must wait unconditionally for cv_signal()/cv_broadcast() before waking. This causes processes to enter the D state, which increases the load average. The load average is the summation of processes in D state and run queue. To avoid this it can be desirable to sleep interruptibly. These processes do not count against the load average but may be woken by a signal. It is up to the caller to determine why the process was woken; it may be for one of three reasons: 1) cv_signal()/cv_broadcast(), 2) the timeout expired, 3) a signal was received. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2011-01-11 12:14:48 -08:00
Brian Behlendorf	6bf4d76f47	Linux Compat: inode->i_mutex/i_sem Create spl_inode_lock/spl_inode_unlock compatibility macros to simplify access to the inode mutex/sem. This avoids the need to have to ugly up the code with the required #define's at every call site. At the moment the SPL only uses this in one place but higher layers can benefit from the macro.	2011-01-11 12:14:48 -08:00
Brian Behlendorf	b7dc313837	Add Thread Specific Data (TSD) Regression Test To validate the correct behavior of the TSD interfaces it's important that we add a regression test. This test is designed to minimally exercise the fundamental TSD behavior, it does not attempt to validate all potential corner cases. The test will first create 32 keys via tsd_create() and register a common destructor. Next 16 wait threads will be created each of which set/verify a random value for all 32 keys, then block waiting to be released by the control thread. Meanwhile the control thread verifies that none of the destructors have been run prematurely. The next phase of the test is to create 16 exit threads which set/verify a random value for all 32 keys. They then immediately exit. This is designed to verify tsd_exit() which will be called via thread_exit(). This must result in all registered destructors being run and the memory for the tsd being free'd. After this tsd_destroy() is verified by destroying all 32 keys. Once again we must see the expected number of destructors run and the tsd memory free'd. At this point the blocked threads are released and they exit calling tsd_exit() which should do very little since all the tsd has already been destroyed. If this all goes off without a hitch the test passes. To ensure no memory has been leaked, I have manually verified that after spl module unload no memory is reported leaked.	2010-12-07 10:02:44 -08:00
Brian Behlendorf	9fe45dc1ac	Add Thread Specific Data (TSD) Implementation Thread specific data has been implemented using a hash table, this avoids the need to add a member to the task structure and allows maximum portability between kernels. This implementation has been optimized to keep the tsd_set() and tsd_get() times as small as possible. The majority of the entries in the hash table are for specific tsd entries. These entries are hashed by the product of their key and pid because by design the key and pid are guaranteed to be unique. Their product also has the desirable property that it will be uniformly distributed over the hash bins providing neither the pid nor key is zero. Under linux the zero pid is always the init process and thus won't be used, and this implementation is careful never to assign a zero key. By default the hash table is sized to 512 bins which is expected to be sufficient for light to moderate usage of thread specific data. The hash table contains two additional types of entries. The first type of entry is called a 'key' entry and it is added to the hash during tsd_create(). It is used to store the address of the destructor function and it is used as an anchor point. All tsd entries which use the same key will be linked to this entry. This is used during tsd_destroy() to quickly call the destructor function for all tsd associated with the key. The 'key' entry may be looked up with tsd_hash_search() by passing the key you wish to lookup and DTOR_PID constant as the pid. The second type of entry is called a 'pid' entry and it is added to the hash the first time a process sets a key. The 'pid' entry is also used as an anchor and all tsd for the process will be linked to it. This list is used during tsd_exit() to ensure all registered destructors are run for the process. The 'pid' entry may be looked up with tsd_hash_search() by passing the PID_KEY constant as the key, and the process pid. Note that tsd_exit() is called by thread_exit() so if you're using the Solaris thread API you should not need to call tsd_exit() directly.	2010-12-07 10:02:32 -08:00
Brian Behlendorf	058de03caa	Clear cv->cv_mutex when not in use For debugging purposes the condition variables keep track of the mutex used during a wait. The idea is to validate that all callers always use the same mutex. Unfortunately, we have seen cases where the caller reuses the condition variable with a different mutex but in a way which is known to be safe. My reading of the man pages suggests you should not do this and always cv_destroy()/cv_init() a new mutex. However, there is overhead in doing this and it does appear to be allowed under Solaris. To accommodate this behavior cv_wait_common() and __cv_timedwait() have been modified to clear the associated mutex when the last waiter is dropped. This ensures that while the condition variable is in use the incorrect mutex case is detected. It also allows the condition variable to be safely recycled without requiring the overhead of a cv_destroy()/cv_init() as long as it isn't currently in use. Finally, spin lock cv->cv_lock was removed because it is not required. When the condition variable is used properly the caller will always be holding the mutex so the spin lock is redundant. The lock was originally added because I expected to need to protect more than just the cv->cv_mutex. It turns out that was not the case. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2010-11-29 11:02:34 -08:00
Brian Behlendorf	8655ce492f	Linux 2.6.36 compat, use fops->unlocked_ioctl() As of linux-2.6.36 the last in-tree consumer of fops->ioctl() has been removed and thus fops->ioctl() has also been removed. The replacement hook is fops->unlocked_ioctl() which has existed in the kernel since 2.6.12. Since the SPL only contains support back to 2.6.18 vintage kernels, I'm not adding an autoconf check for this and simply moving everything to use fops->unlocked_ioctl().	2010-11-10 13:16:12 -08:00
Brian Behlendorf	9b2048c26b	Linux 2.6.36 compat, fs_struct->lock type change In the linux-2.6.36 kernel the fs_struct lock was changed from a rwlock_t to a spinlock_t. If the kernel would export the set_fs_pwd() symbol by default this would not have caused us any issues, but they don't. So we're forced to add a new autoconf check which sets the HAVE_FS_STRUCT_SPINLOCK define when a spinlock_t is used. We can then correctly use either spin_lock or write_lock in our custom set_fs_pwd() implementation.	2010-11-09 13:29:47 -08:00
Brian Behlendorf	1e18307b61	Fix incorrect krw_type_t type Flagged by the default compile options on archlinux 2010.05, we should be using the krw_t type not the krw_type_t type in the private data. module/splat/splat-rwlock.c: In function ‘splat_rwlock_test4_func’: module/splat/splat-rwlock.c:432:6: warning: case value ‘1’ not in enumerated type ‘krw_type_t’	2010-11-09 10:18:01 -08:00
Brian Behlendorf	23aa63cbf5	Fix 2.6.35 shrinker callback API change As of linux-2.6.35 the shrinker callback API now takes an additional argument. The shrinker struct is passed to the callback so that users can embed the shrinker structure in private data and use container_of() to access it. This removes the need to always use global state for the shrinker. To handle this we add the SPL_AC_3ARGS_SHRINKER_CALLBACK autoconf check to properly detect the API. Then we simply setup a callback function with the correct number of arguments. For now we do not make use of the new 3rd argument.	2010-10-22 14:51:26 -07:00
Brian Behlendorf	a7958f7eef	Support custom build directories One of the neat tricks an autoconf style project is capable of is allowing configuration/building in a directory other than the source directory. The major advantage to this is that you can build the project various different ways while making changes in a single source tree. For example, this project is designed to work on various different Linux distributions each of which work slightly differently. This means that changes need to be verified on each of those supported distributions, preferably before the change is committed to the public git repo. Using nfs and custom build directories makes this much easier. I now have a single source tree in nfs mounted on several different systems each running a supported distribution. When I make a change to the source base I suspect may break things I can concurrently build from the same source on all the systems each in their own subdirectory. wget -c http://github.com/downloads/behlendorf/spl/spl-x.y.z.tar.gz tar -xzf spl-x.y.z.tar.gz cd spl-x-y-z ------------------------- run concurrently ---------------------- <ubuntu system> <fedora system> <debian system> <rhel6 system> mkdir ubuntu mkdir fedora mkdir debian mkdir rhel6 cd ubuntu cd fedora cd debian cd rhel6 ../configure ../configure ../configure ../configure make make make make make check make check make check make check This is something the project has almost supported for a long time but finishing this support should save me lots of time.	2010-09-05 21:49:05 -07:00
Brian Behlendorf	2b3543025c	Stub out kmem cache defrag API At some point we are going to need to implement the kmem cache move callbacks to allow for kmem cache defragmentation. This commit simply lays a small part of the API ground work, it does not actually implement any of this feature. This is safe for now because the move callbacks are just an optimization. Even if they are registered we don't ever really have to call them.	2010-08-27 14:23:42 -07:00
Li Wei	4be55565fe	Fix stack overflow in vn_rdwr() due to memory reclaim Unless __GFP_IO and __GFP_FS are removed from the file mapping gfp mask we may enter memory reclaim during IO. In this case shrink_slab() entered another file system which is notoriously hungry for stack. This additional stack usage may cause a stack overflow. This patch removes __GFP_IO and __GFP_FS from the mapping gfp mask of each file during vn_open() to avoid any reclaim in the vn_rdwr() IO path. The original mask is then restored at vn_close() time. Hats off to the loop driver which does something similar for the same reason. [...] shrink_slab+0xdc/0x153 try_to_free_pages+0x1da/0x2d7 __alloc_pages+0x1d7/0x2da do_generic_mapping_read+0x2c9/0x36f file_read_actor+0x0/0x145 __generic_file_aio_read+0x14f/0x19b generic_file_aio_read+0x34/0x39 do_sync_read+0xc7/0x104 vfs_read+0xcb/0x171 :spl:vn_rdwr+0x2b8/0x402 :zfs:vdev_file_io_start+0xad/0xe1 [...] Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2010-08-12 09:34:33 -07:00
Ricardo M. Correia	26f7245c7c	Fix taskq code to not drop tasks when TQ_SLEEP is used. When TQ_SLEEP is used, taskq_dispatch() should always succeed even if the number of pending tasks is above tq->tq_maxalloc. This semantic is similar to KM_SLEEP in kmem allocations, which also always succeed. However, we cannot block forever otherwise there is a risk of deadlock. Therefore, we still allow the number of pending tasks to go above tq->tq_maxalloc with TQ_SLEEP, but we may sleep up to 1 second per task dispatch, thereby throttling the task dispatch rate. One of the existing splat tests was also augmented to test for this scenario. The test would fail with the previous implementation but now it succeeds. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2010-08-02 11:20:31 -07:00
Brian Behlendorf	41f84a8d56	Strfree() should call kfree() not kmem_free() Using kmem_free() results in deducting X bytes from the memory accounting when --enable-debug is set. Unfortunately, currently the counterpart kmem_asprintf() and friends do not properly account for memory allocated, so we must do the same on free. If we don't then we end up with a negative number of lost bytes reported when the module is unloaded. A better long term fix would be to add the accounting in to the allocation side but that's a project for another day.	2010-07-30 22:20:58 -07:00
Brian Behlendorf	099dc9c2d2	Add uninstall Makefile targets Extend the Makefiles with an uninstall target to cleanly remove a package which was installed with 'make install'. Additionally, ensure a 'depmod -a' is run as part of the install to update the module dependency information.	2010-07-28 14:55:32 -07:00
Brian Behlendorf	10129680f8	Ensure kmem_alloc() and vmem_alloc() never fail The Solaris semantics for kmem_alloc() and vmem_alloc() are that they must never fail when called with KM_SLEEP. They may only fail if called with KM_NOSLEEP otherwise they must block until memory is available. This is quite different from how the Linux memory allocators work, under Linux a memory allocation failure is always possible and must be dealt with. At one point in the past the kmem code did properly implement this behavior, however as the code evolved this behavior was overlooked in places. This patch goes through all three implementations of the kmem/vmem allocation functions and ensures that they will all block in the KM_SLEEP case when memory is not available. They may still fail in the KM_NOSLEEP case in which case the caller is responsible for handling the failure. Special care is taken in vmalloc_nofail() to avoid thrashing the system on the virtual address space spin lock. The down side of course is if you do see a failure here, which is unlikely for 64-bit systems, your allocation will delay for an entire second. Still this is preferable to locking up your system and it is the best we can do given the constraints. Additionally, the code was cleaned up to be much more readable and comments were added to describe the various kmem-debug-* configure options. The default configure options remain: "--enable-debug-kmem --disable-debug-kmem-tracking"	2010-07-26 15:47:55 -07:00
Brian Behlendorf	849c50e7f2	Fix two minor compiler warnings In cmd/splat.c there was a comparison between an __u32 and an int. To resolve the issue simply use a __u32 and strtoul() when converting the provided user string. In module/spl/spl-vnode.c we should explicitly cast nd->last.name to a const char * which is what is expected by the prototype.	2010-07-26 10:24:26 -07:00
Brian Behlendorf	8b0eb3f0dc	Remove deadcode caused by removal of format1 arg Commit `55abb0929e` removed the never used format1 argument of spl_debug_msg(). That in turn resulted in some deadcode which should be removed since it's now useless.	2010-07-21 16:31:42 -07:00
Ricardo M. Correia	81672c0122	Display DEBUG keyword during module load when --enable-debug is used. Signed-off-by: Ricardo M. Correia <ricardo.correia@oracle.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2010-07-20 15:31:03 -07:00
Ricardo M. Correia	2c762de830	Fix buggy kmem_{v}asprintf() functions When the kvasprintf() call fails they should reset the arguments by calling va_start()/va_copy() and va_end() inside the loop, otherwise they'll try to read more arguments rather than starting over and reading them from the beginning. Signed-off-by: Ricardo M. Correia <ricardo.correia@oracle.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2010-07-20 13:51:46 -07:00
Brian Behlendorf	b17edc10a9	Prefix all SPL debug macros with 'S' To avoid conflicts with symbols defined by dependent packages all debugging symbols have been prefixed with an 'S' for SPL. Any dependent package needing to integrate with the SPL debug should include the spl-debug.h header and use the 'S' prefixed macros. They must also build with DEBUG defined.	2010-07-20 13:30:40 -07:00
Brian Behlendorf	55abb0929e	Split <sys/debug.h> header To avoid symbol conflicts with dependent packages the debug header must be split in to several parts. The <sys/debug.h> header now only contains the Solaris macros such as ASSERT and VERIFY. The spl-debug.h header contains the spl specific debugging infrastructure and should be included by any package which needs to use the spl logging. Finally the spl-trace.h header contains internal data structures only used for the log facility and should not be included by anything but spl-debug.c. This way dependent packages can include the standard Solaris headers without picking up any SPL debug macros. However, if the dependent package wants to integrate with the SPL debugging subsystem they can then explicitly include spl-debug.h. Along with this change I have dropped the CHECK_STACK macros because the upstream Linux kernel now has much better stack depth checking built in and we don't need this complexity. Additionally SBUG has been replaced with PANIC and provided as part of the Solaris macro set. While the Solaris version is really panic() that conflicts with the Linux kernel so we'll just have to make do with PANIC. It should rarely be called directly, the preferred usage would be an ASSERT or VERIFY. There's lots of change here but this cleanup was overdue.	2010-07-20 13:29:35 -07:00
Ned Bass	8f813bb168	Proposed fix for oops on SIGINT in splat atomic:64-bit test. The threads in the splat atomic:64-bit test share the data structure atomic_priv_t ap, which lives on the kernel stack of the splat user-space utility. If splat terminates before the threads, accesses to that memory location by the other threads become invalid. Splat synchronizes with the threads with the call: wait_event_interruptible(ap.ap_waitq, splat_atomic_test1_cond(&ap, i)); Apparently, the SIGINT wakes and terminates splat prematurely, so that GPFs or other bad things happen when the threads subsequently access ap. This commit prevents this by using the uninterruptible form: wait_event(ap.ap_waitq, splat_atomic_test1_cond(&ap, i));	2010-07-15 12:50:15 -07:00
Brian Behlendorf	d0bd694ca9	Fix -Werror=format-security compiler option Noticed under Ubuntu kernel builds we should be passing a format specifier and the string, not just the string.	2010-07-14 11:53:57 -07:00
Brian Behlendorf	f0ff89fc86	Linux 2.6.35 compat: filp_fsync() dropped 'struct dentry *' The prototype for filp_fsync() dropped the unused argument 'struct dentry *'. I've fixed this by adding the needed autoconf check and moving all of those filp related functions to file_compat.h. This will simplify handling any further API changes in the future.	2010-07-14 11:40:55 -07:00
Brian Behlendorf	a4bfd8ea1b	Add __divdi3(), remove __udivdi3() kernel dependency Up until now no SPL consumer attempted to perform signed 64-bit division so there was no need to support this. That has now changed so I am adding 64-bit division support for 32-bit platforms. The signed implementation is based on the unsigned version. Since there have been several bug reports in the past concerning correct 64-bit division on 32-bit platforms I added some long overdue regression tests. Much to my surprise the unsigned 64-bit division regression tests failed. This was surprising because __udivdi3() was implemented by simply calling div64_u64() which is provided by the kernel. This meant that the Linux kernel's 64-bit division algorithm on 32-bit platforms was flawed. After some investigation this turned out to be exactly the case. Because of this I was forced to abandon the kernel helper and instead to fully implement 64-bit division in the spl. There are several published implementations out there on how to do this properly and I settled on one proposed in the book Hacker's Delight. Their proposed algorithm is freely available without restriction and I have just modified it to be linux kernel friendly. The updated implementation now passes all the unsigned and signed regression tests. This should be functional, but not fast, which is good enough for our purposes. If you want fast too I'd strongly suggest you upgrade to a 64-bit platform. I have also reported the kernel bug and we'll see if we can't get it fixed upstream.	2010-07-13 16:44:02 -07:00
Brian Behlendorf	1814251453	Require gawk, the usermode helper fails with awk For some reason when awk is invoked by the usermode helper the command always fails. Interestingly gawk does not suffer from this problem, which is why I never observed this failure since the distros I tested with all had gawk installed instead of awk. Anyway, the simplest thing to do here is to just make gawk mandatory. I've added a configure check for gawk specifically and have updated the command to call gawk not awk.	2010-07-01 16:38:08 -07:00
Brian Behlendorf	7119bf7044	Add configure check for user_path_dir() I didn't notice at the time but user_path_dir() was not introduced at the same time as the set_fs_pwd() change. I had lumped the two together but in fact user_path_dir() was introduced in 2.6.27 and set_fs_pwd() taking 2 args was introduced in 2.6.25. This means builds against 2.6.25-2.6.26 kernels were broken. To fix this I've added a check for user_path_dir() and no longer assume that if set_fs_pwd() takes 2 args then user_path_dir() is also available.	2010-07-01 13:53:26 -07:00
Ned Bass	55f10ae5e9	Implementation of a regression test for TQ_FRONT. Use 3 threads and 8 tasks. Dispatch the final 3 tasks with TQ_FRONT. The first three tasks keep the worker threads busy while we stuff the queues. Use msleep() to force a known execution order, assuming TQ_FRONT is properly honored. Verify that the expected completion order occurs. The splat_taskq_test5_order() function may be useful in more than one test. This commit generalizes it by renaming the function to splat_taskq_test_order() and adding a name argument instead of assuming SPLAT_TASKQ_TEST5_NAME as the test name. The documentation for splat taskq regression test #5 swaps the two required completion orders in the diagram. This commit corrects the error. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2010-07-01 10:59:52 -07:00
Ned Bass	1a73940d39	Initialize the /dev/splatctl device buffer On open(), initialize the buffer with the SPL version string. The user space splat utility expects to find the SPL version string when it opens and reads from /dev/splatctl. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2010-07-01 10:59:46 -07:00
Ned Bass	f0d8bb26b4	Implementation of the TQ_FRONT flag. Adds a task queue to receive tasks dispatched with TQ_FRONT. Worker threads pull tasks from this high priority queue before the default pending queue. Executing tasks out of FIFO order potentially breaks taskq_lowest_id() if we do not preserve the ordering of the work list by taskqid. Therefore, instead of always appending to the work list, we search for the appropriate place to insert a task. The common case is to append to the list, so we make this operation efficient by searching the work list in reverse order. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2010-07-01 10:59:38 -07:00
Brian Behlendorf	79a3bf130b	Linux-2.6.33 compat, .ctl_name removed from struct ctl_table As of linux-2.6.33 the ctl_name member of the ctl_table struct has been entirely removed. The upstream code has been updated to depend entirely on the procname member. To handle this all references to ctl_name are wrapped in a CTL_NAME macro which simply expands to nothing for newer kernels. Older kernels are supported by having it expand to .ctl_name = X just as before.	2010-06-30 12:49:12 -07:00
Brian Behlendorf	ede0bdffb6	Treat mutex->owner as volatile When HAVE_MUTEX_OWNER is defined and we are directly accessing mutex->owner, treat it as volatile with the ACCESS_ONCE() helper. Without this you may get a stale cached value when accessing it from different cpus. This can result in incorrect behavior from mutex_owned() and mutex_owner(). This is not a problem for the !HAVE_MUTEX_OWNER case because in this case all the accesses are covered by a spin lock which similarly guarantees we will not be accessing stale data. Secondly, check CONFIG_SMP before allowing access to mutex->owner. I see that for non-SMP setups the kernel does not track the owner so we cannot rely on it. Thirdly, check CONFIG_MUTEX_DEBUG. When this is defined and HAVE_MUTEX_OWNER is defined, surprisingly the mutex->owner will not be cleared on mutex_exit(). When this is the case the SPL needs to make sure to do it to ensure MUTEX_HELD() behaves as expected or you will certainly assert in mutex_destroy(). Finally, improve the mutex regression tests. For mutex_owned() we now minimally check that it behaves correctly when checked from the owner thread or the non-owner thread. This subtle behaviour has bit me before and I'd like to catch it early next time if it reappears. As for the mutex_owned() regression test, additionally verify that mutex->owner is always cleared on mutex_exit().	2010-06-28 16:02:57 -07:00
Brian Behlendorf	616df2dd8b	Fix subtle race in threads test case The call to wake_up() must be moved under the spin lock because once we drop the lock 'tp' may no longer be valid because the creating thread has exited. This basic thread implementation was correct, this was simply a flaw in the test case.	2010-06-28 12:34:20 -07:00
Brian Behlendorf	e6de04b73c	Add kmem_vasprintf function We might as well have both asprintf() variants. This allows us to safely pass a va_list through several levels of the stack using va_copy() instead of va_start().	2010-06-24 09:41:59 -07:00
Brian Behlendorf	438683c0a9	Revert "Support TQ_FRONT flag used by taskq_dispatch()" This reverts commit `eb12b3782c`.	2010-06-21 10:19:44 -07:00
Brian Behlendorf	3cb77549d1	Update warnings in kmem debug code This fix was long overdue. Most of the ground work was laid long ago to include the exact function and line number in the error message when there was an issue with a memory allocation call. However, probably due to lack of time at the moment, that information never made it in to the error message. This patch fixes that and tries to standardize the kmem debug messages as well.	2010-06-16 16:01:16 -07:00
Brian Behlendorf	eb12b3782c	Support TQ_FRONT flag used by taskq_dispatch() Allow taskq_dispatch() to insert work items at the head of the queue instead of just the tail by passing the TQ_FRONT flag.	2010-06-11 15:57:25 -07:00
Brian Behlendorf	b868e22f05	Add kmem_asprintf(), strfree(), strdup(), and minor cleanup. This patch adds three missing Solaris functions: kmem_asprintf(), strfree(), and strdup(). They are all implemented as a thin layer which just calls their Linux counterparts. As part of this an autoconf check for kvasprintf was added because it does not appear in older kernels. If the kernel does not provide it then spl-generic implements it. Additionally the dead DEBUG_KMEM_UNIMPLEMENTED code was removed to clean things up and make the kmem.h a little more readable.	2010-06-11 15:57:25 -07:00
Brian Behlendorf	ae4c36adce	Cleanly split Linux proc.h (fs) from conflicting Solaris proc.h (process) Under linux the proc.h header is for the /proc filesystem, and under Solaris the proc.h header is for processes. This patch correctly moves the Linux proc functionality in a linux/proc_compat.h header and leaves the sys/proc.h for use by Solaris. Minor updates were required to all the call sites where it was included of course.	2010-06-11 15:57:25 -07:00
Alex Zhuravlev	1b4ad25e2f	Stack overflow on 64-bit modulus operations on 32-bit architectures. Running 'zpool create' on a 32-bit machine with an SPL compiled with gcc 4.4.4 led to a stack overflow. This turned out to be due to some sort of 'optimization' by gcc: uint64_t __umoddi3(uint64_t dividend, uint64_t divisor) { return dividend - divisor * (dividend / divisor); } This code was supposed to be using __udivdi3 to implement /, but gcc instead implemented it via __umoddi3 itself. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2010-06-03 09:06:55 -07:00
Brian Behlendorf	8a1c9a02fb	Minor 32-bit fix cast to hrtime_t before the multiply. It's important to cast to hrtime_t before doing the multiply because the ts.tv_sec type is only 32-bits and we need to promote it to 64-bits.	2010-05-23 09:51:17 -07:00
Brian Behlendorf	32f5faff69	Simplify rwlock implementation. Remove RW_COUNT() from the rwlock implementation. The idea was that it could be used as a generic wrapper for getting at the internal state of a rwlock. While a good idea it's proven problematic to keep it correct for multiple archs and internal implementation changes. In short it hasn't been worth the trouble. With that and simplicity in mind things have been updated to use the rwsem_is_locked() function instead of RW_COUNT for the RW_*_HELD() functions. As for rw_upgrade() it remains only implemented for the generic rwsem implementation. It remains to be determined if it's worth the effort of adding a custom implementation for each arch.	2010-05-20 14:20:34 -07:00
Brian Behlendorf	23d91792ef	Use KM_NODEBUG macro in preference to __GFP_NOWARN.	2010-05-20 14:16:59 -07:00
Brian Behlendorf	3626ae6a70	Disable spl_debug_panic_on_bug by default. While I may prefer to have the system panic on an SBUG and to get a crash dump for analysis, I suspect most people's systems are not configured for crash dumps and the best thing to do is to simply halt the thread and print an error to the console. This way they have a good chance of actually saving the stack trace and debug log.	2010-05-20 10:15:51 -07:00
Brian Behlendorf	e0dcb22e4e	Adjust 'large' object sizes in kmem:slab_large test. 64K objects are large for a kmem based slab (2M slabs) 1M objects are large for a vmem based slab (32M slabs)	2010-05-20 09:52:37 -07:00
Brian Behlendorf	5198ea0e71	Remove kmem_set_warning() interface replace with __GFP_NOWARN flag. Replace the kmem_set_warning() hack used by the kmem-splat regression tests with a per-allocation flag called __GFP_NOWARN. This matches the lower level linux flag of similar but slightly different function. The idea is you can then explicitly set this flag on requests where you know you're breaking the max 8k rule but you need/want to do it anyway. This is currently used by the regression tests where we intentionally push things to the limit but don't want the log noise. Additionally, we are forced to use it in spl_kmem_cache_create() because by default NR_CPUS is very large and there's no easy way to handle that. Finally, I've added a stack_dump() call to the warning when it is triggered to make it clear exactly where the allocation is taking place.	2010-05-19 16:53:13 -07:00
Brian Behlendorf	627a74972c	Set default debug log path to /tmp/spl-log. Using /tmp/ is a preferable default, it can always be overridden using the module option on a case-by-case basis. Additionally standardize some log messages based on the same default log level used by the kernel.	2010-05-19 16:17:06 -07:00
Brian Behlendorf	716154c592	Public Release Prep Updated AUTHORS, COPYING, DISCLAIMER, and INSTALL files. Added standardized headers to all source file to clearly indicate the copyright, license, and to give credit where credit is due.	2010-05-17 15:18:00 -07:00
Brian Behlendorf	6020190e8f	Use do_posix_clock_monotonic_gettime() as described by comment. While this does incur slightly more overhead we should be using do_posix_clock_monotonic_gettime() for gethrtime() as described by the existing comment.	2010-05-14 09:31:22 -07:00
Brian Behlendorf	f752b46eb3	Add cv_wait_interruptible() function. This is a minor extension to the condition variable API to allow for reasonable signal handling on Linux. The cv_wait() function by definition must wait unconditionally for cv_signal()/cv_broadcast() before waking. This makes it impossible to be woken by a signal such as SIGTERM. The cv_wait_interruptible() function was added to handle this case. It behaves identically to cv_wait() with the exception that it waits interruptibly allowing a signal to wake it up. This means you do need to be careful and check issig() after waking.	2010-05-14 09:24:51 -07:00
Brian Behlendorf	97f8f6d789	Dump log from current process when required When dumping a debug log first check that it is safe to create a new thread and block waiting for it. If we are in an atomic context or irqs are disabled it is not safe to sleep and we must write out the debug log from the current process.	2010-04-23 15:55:02 -07:00
Brian Behlendorf	d05ec4b45f	Assume TQ_SLEEP when not explicitly specified.	2010-04-23 14:39:47 -07:00
Ricardo Correia	663e02a135	Handle the FAPPEND option in vn_rdwr(). Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2010-04-23 14:39:42 -07:00
Brian Behlendorf	82a358d9c0	Update vn_set_pwd() to allow user\|kernel address for filename During module init spl_setup()->vn_set_pwd("/") was failing with -EFAULT because user_path_dir() and __user_walk() both expect 'filename' to be a user space address and it's not in this case. To handle this the data segment size is increased to ensure strncpy_from_user() does not fail with -EFAULT. Additionally, I've added a printk() warning to catch this and log it to the console if it ever reoccurs. I thought everything was working properly here because the consequences of this failing are subtle and usually non-critical.	2010-04-22 12:53:58 -07:00
Brian Behlendorf	16b719f006	Allow spl_config.h to be included by dependant packages (updated) We need dependent packages to be able to include spl_config.h to build properly. This was partially solved in commit `0cbaeb1` by using AH_BOTTOM to #undef common #defines (PACKAGE, VERSION, etc) which autoconf always adds and cannot be easily removed. This solution works as long as the spl_config.h is included before your projects config.h. That turns out to be easier said than done. In particular, this is a problem when your package includes its config.h using the -include gcc option which ensures the first thing included is your config.h. To handle all cases cleanly I have removed the AH_BOTTOM hack and replaced it with an AC_CONFIG_HEADERS command. This command runs immediately after spl_config.h is written and with a little awk-foo it strips the offending #defines from the file. This eliminates the problem entirely and makes header safe for inclusion. Also in this change I have removed the few places in the code where spl_config.h is included. It is now added to the gcc compile line to ensure the config results are always available. Finally, I have also disabled the verbose kernel builds. If you want them back you can always build with 'make V=1'. Since things are working now they don't need to be on by default.	2010-03-22 14:45:33 -07:00
Brian Behlendorf	aa600d8a38	Reduce max kmem based slab size Allowing MAX_ORDER-1 sized allocations for kmem based slabs has been observed to result in deadlocks. To help prevent this limit max kmem based slab size to MAX_ORDER-3. Just for the record callers should not be creating slabs like this, but if they do we should still handle it as safely as we can.	2010-03-18 13:39:51 -07:00
Brian Behlendorf	21006d08af	Remove Module.markers and Module.symver{s} in clean target Split 'modules' and 'clean' Makefile targets to allow us to cleanly remove the Module.* build products with a 'make clean'.	2010-03-08 13:39:57 -08:00
Brian Behlendorf	3977f8370f	Linux 2.6.32 compat, proc_handler() API change As of linux-2.6.32 the 'struct file *filp' argument was dropped from the proc_handler() prototype. It was apparently unused _almost_ everywhere in the kernel and this was simply cleanup. I've added a new SPL_AC_5ARGS_PROC_HANDLER autoconf check for this and the proper compat macros to correctly define the prototypes and some helper functions. It's not pretty but API compat changes rarely are.	2010-03-04 12:14:56 -08:00
Ricardo M. Correia	694921bc49	sun-misc-gitignore Add .gitignore files. Signed-off-by: Ricardo M. Correia <Ricardo.M.Correia@Sun.COM>	2010-01-08 09:37:54 -08:00
Ricardo M. Correia	f7e8739c94	sun-fix-whitespace Whitespace fixes. Signed-off-by: Ricardo M. Correia <Ricardo.M.Correia@Sun.COM>	2010-01-08 09:37:54 -08:00
Ricardo M. Correia	b520b14305	sun-fix-panic-str Fix panic() string, which was being used as a format string, instead of an already-formatted string. Signed-off-by: Ricardo M. Correia <Ricardo.M.Correia@Sun.COM>	2010-01-08 09:37:54 -08:00
Brian Behlendorf	5562e5d105	Added splat taskq task ordering test case. This test case verifies the correct behavior of taskq_wait_id(). In particular it ensures that the following two cases are handled properly: 1) Task ids larger than the waited for task id can run and complete as long as there is an available worker thread. 2) All task ids lower than the waited one must complete before unblocking even if the waited task id itself has completed.	2010-01-05 13:34:09 -08:00
Brian Behlendorf	82387586af	Optimize lowest outstanding taskqid calculation in taskq_lowest_id() In the initial version of taskq_lowest_id() the entire pending and work list was locked under the tq->tq_lock to determine the lowest outstanding taskqid. At the time this was done because I was rushed and wanted to make sure it was right... fast was secondary. Well now fast is important too so I carefully thought through the pending and work list management and convinced myself it is safe and correct to simply check the first entry. I added a large comment to the source to explain this. But basically as long as we are careful to ensure the pending and work list stay sorted this is safe and fast. The motivation for this change was that I was observing as much as 10% of the total CPU time go to waiting on the tq->tq_lock when the pending list was long. This resolves that problem and frees up that CPU time for something useful.	2010-01-04 15:52:26 -08:00
Brian Behlendorf	ef1c7a0691	Strip __GFP_ZERO from kmalloc it is not available for older kernels. This is needed to avoid a BUG_ON() on RHEL5.4 kernel 2.6.18-164.6.1, since __GFP_ZERO is not a valid flag for kmalloc().	2009-12-23 12:57:10 -08:00
Brian Behlendorf	641bebe35f	Fix kmem:slab_overcommit regression test locking This regression test could crash in splat_kmem_cache_test_reclaim() due to a race between the slab reclaim and the normal exiting of the thread. Specifically, the kct structure could be free'd by the thread performing the allocations while the reclaim function was also working on that thread's kct structure. The simplest fix is to extend the kcp->kcp_lock over the reclaim to prevent the kct from being freed. A better fix would be to ref count these structures, but since this is just a regression test this locking change is enough. Surprisingly this was only observed commonly under RHEL5.4 but all platforms could have hit this.	2009-12-23 12:46:11 -08:00
Brian Behlendorf	242f539a2e	Add skc_flags and full header to /proc/spl/kmem/slab.	2009-12-11 11:20:08 -08:00
Brian Behlendorf	f60a5f5221	Splat vnode tests must return negative error codes. I must have been in a hurry when I wrote the vnode regression tests because the error code handling is not correct. The Solaris vnode API returns positive errno's, these need to be converted to negative errno's for Linux before being passed back to user space. Otherwise the test harness will report the failure but errno will not be set with the correct error code. Additionally tests 3, 4, 6, and 7 may fail if the test file already exists. To avoid false positives a user mode helper has been added to remove the test files in /tmp/ before running the actual test.	2009-12-10 15:06:07 -08:00
Brian Behlendorf	d04c8a563c	Atomic64 compatibility for 32-bit systems without kernel support. This patch is another step towards updating the code to handle the 32-bit kernels which I have not been regularly testing. These changes do not really impact the common case I'm expecting which is the latest kernel running on an x86_64 arch. Until the linux-2.6.31 kernel the x86 arch did not have support for 64-bit atomic operations. Additionally, the new atomic_compat.h support for this case was wrong because it embedded a spinlock in the atomic variable which must always and only be 64-bits total. To handle these 32-bit issues we now simply fall back to the --enable-atomic-spinlock implementation if the kernel does not provide the 64-bit atomic funcs. The second issue this patch addresses is the DEBUG_KMEM assumption that there will always be atomic64 funcs available. On 32-bit archs this may not be true, and actually that's just fine. In that case the kernel will never be able to allocate more than 32-bits worth anyway. So just check if atomic64 funcs are available, if they are not it means this is a 32-bit machine and we can safely use atomic_t's instead.	2009-12-04 15:54:12 -08:00
Brian Behlendorf	db1aa22297	Correctly handle division on 32-bit RHEL5 systems by returning dividend.	2009-12-01 15:53:28 -08:00
Brian Behlendorf	4e5691faf6	Only run the kmem overcommit test on 64-bit systems.	2009-12-01 11:40:47 -08:00
Brian Behlendorf	6ff686c44d	Type long expected explicitly cast for 32-bit systems.	2009-12-01 10:14:01 -08:00
Brian Behlendorf	0a6c005959	Ensure spl_config.h is include in spl-generic.c	2009-11-15 15:04:33 -08:00
Brian Behlendorf	8b45dda2bc	Linux 2.6.31 kmem cache alignment fixes and cleanup. The big fix here is the removal of kmalloc() in kv_alloc(). It used to be true in previous kernels that kmallocs over PAGE_SIZE would always be page aligned. This is no longer true; at least in 2.6.31 there are no longer any alignment expectations. Since kv_alloc() requires the resulting address to be page aligned we now only either directly allocate pages in the KMC_KMEM case, or directly call __vmalloc() both of which will always return a page aligned address. Additionally, to avoid wasting memory size is always a power of two. As for cleanup several helper functions were introduced to calculate the aligned sizes of various data structures. This helps ensure no case is accidentally missed where the alignment needs to be taken in to account. The helpers now use P2ROUNDUP_TYPE instead of P2ROUNDUP which is safer since the type will be explicit and we no longer count on the compiler to auto promote types hopefully as we expected. Always enforce minimum (SPL_KMEM_CACHE_ALIGN) and maximum (PAGE_SIZE) alignment restrictions at cache creation time. Use SPL_KMEM_CACHE_ALIGN in splat alignment test.	2009-11-13 11:12:43 -08:00
Brian Behlendorf	c89fdee4d3	Remove __GFP_NOFAIL in kmem and retry internally. As of 2.6.31 it's clear __GFP_NOFAIL should no longer be used and it may disappear from the kernel at any time. To handle this I have simply added _nofail wrappers in the kmem implementation which perform the retry for non-atomic allocations. From linux-2.6.31 mm/page_alloc.c:1166 / * __GFP_NOFAIL is not to be used in new code. * * All __GFP_NOFAIL callers should be fixed so that they * properly detect and handle allocation failures. * * We most definitely don't want callers attempting to * allocate greater than order-1 page units with * __GFP_NOFAIL. */ WARN_ON_ONCE(order > 1);	2009-11-12 15:11:24 -08:00
Brian Behlendorf	baf2979ed3	Linux 2.6.31 Compatibility Updates SPL_AC_2ARGS_SET_FS_PWD macro updated to explicitly include linux/fs_struct.h which was dropped from linux/sched.h. min_wmark_pages, low_wmark_pages, high_wmark_pages macros introduced in newer kernels. For older kernels mm_compat.h was introduced to define them as needed as direct mappings to per zone min_pages, low_pages, max_pages.	2009-11-10 14:06:57 -08:00
Brian Behlendorf	055ffd98cf	Autoconf --enable-debug-* cleanup Cleanup the --enable-debug-* configure options, this has been pending for quite some time and I am glad I finally got to it. To summarize: 1) All SPL_AC_DEBUG_* macros were updated to be more autoconf friendly. This mainly involved a shift to the GNU approved usage of AC_ARG_ENABLE and ensuring AS_IF is used rather than directly using an if [ test ] construct. 2) --enable-debug-kmem=yes by default. This simply enables keeping a running tally of total memory allocated and freed and reporting a memory leak if there was one at module unload. Additionally, it ensures /proc/spl/kmem/slab will exist by default which is handy. The overhead is low for this and it should not impact performance. 3) --enable-debug-kmem-tracking=no by default. This option was added to provide a configure option to enable detailed memory allocation tracking. This support was always there but you had to know where to turn it on. By default this support is disabled because it is known to badly hurt performance, however it is invaluable when chasing a memory leak. 4) --enable-debug-kstat removed. After further reflection I can't see why you would ever really want to turn this support off. It is now always on which had the nice side effect of simplifying the proc handling code in spl-proc.c. We can now always assume the top level directory will be there. 5) --enable-debug-callb removed. This never really did anything, it was put in provisionally because it might have been needed. It turns out it was not so I am just removing it to prevent confusion.	2009-10-30 13:58:51 -07:00
Brian Behlendorf	5e9b5d832b	Use Linux atomic primitives by default. Previously Solaris style atomic primitives were implemented simply by wrapping the desired operation in a global spinlock. This was easy to implement at the time when I wasn't 100% sure I could safely layer the Solaris atomic primatives on the Linux counterparts. It however was likely not good for performance. After more investigation however it does appear the Solaris primitives can be layered on Linux's fairly safely. The Linux atomic_t type really just wraps a long so we can simply cast the Solaris unsigned value to either a atomic_t or atomic64_t. The only lingering problem for both implementations is that Solaris provides no atomic read function. This means reading a 64-bit value on a 32-bit arch can (and will) result in word breaking. I was very concerned about this initially, but upon further reflection it is a limitation of the Solaris API. So really we are just being bug-for-bug compatible here. With this change the default implementation is layered on top of Linux atomic types. However, because we're assuming a lot about the internal implementation of those types I've made it easy to fall-back to the generic approach. Simply build with --enable-atomic_spinlocks if issues are encountered with the new implementation.	2009-10-30 10:55:25 -07:00
Brian Behlendorf	2b5adaf18f	I should not have removed these, they are important.	2009-10-27 16:17:06 -07:00
Brian Behlendorf	4bd577d069	Rebase cmn_err on vcmn_err and don't warn about missing \n The cmn_err/vcmn_err functions are layered on top of the debug system which usually expects a newline at the end. However, there really doesn't need to be a newline there and there in fact should not be for the CE_CONT case so let's just drop the warning. Also we make a half-hearted attempt to handle a leading ! which means only send it to the syslog not the console. In this case we just send to the debug logs and not the console.	2009-10-27 16:13:35 -07:00
Brian Behlendorf	39ab544079	Use kobject_set_name() for increased portability. As of 2.6.25 kobj->k_name was replaced with kobj->name. Some distros such as RHEL5 (2.6.18) add a patch to prevent this from being a problem but other older distros such as SLES10 (2.6.16) have not. To avoid the whole issue I'm updating the code to use kobject_set_name() which does what I want and has existed all the way back to 2.6.11.	2009-10-02 16:21:59 -07:00
Brian Behlendorf	51a727e90f	Set cwd to '/' for the process executing insmod. Ricardo has pointed out that under Solaris the cwd is set to '/' during module load, while under Linux it is set to the caller's cwd. To handle this cleanly I've reworked the module _init()/_exit() macros so they call a _setup()/_cleanup() function when any SPL dependent module is loaded or unloaded. This gives us a chance to perform any needed modification of the process, in this case changing the cwd. It also handily provides a way to avoid creating wrapper init()/exit() functions because the Solaris and Linux prototypes differ slightly. All dependent modules should now call the spl helper macros spl_module_{init,exit}() instead of the native linux versions. Unfortunately, it appears that under Linux there has been no consistent API in the kernel to set the cwd in a module. Because of this I have had to add more autoconf magic than I'd like. However, what I have done is correct and has been tested on RHEL5, SLES11, FC11, and CHAOS kernels. In addition, I have changed the rootdir type from a 'void *' to the correct 'vnode_t *' type. And I've set rootdir to a non-NULL value.	2009-10-01 16:06:15 -07:00
Brian Behlendorf	4d54fdee1d	Reimplement mutexes for Linux lock profiling/analysis For a generic explanation of why mutexes needed to be reimplemented to work with the kernel lock profiling see commits: `e811949a57` and `d28db80fd0` The specific changes made to the mutex implementation are as follows. The Linux mutex structure is now directly embedded in the kmutex_t. This allows a kmutex_t to be directly cast to a mutex struct and passed directly to the Linux primitive. Just like with the rwlocks it is critical that these functions be implemented as '#defines to ensure the location information is preserved. The preprocessor can then do a direct replacement of the Solaris primitive with the linux primitive. Just as with the rwlocks we need to track the lock owner. Here things get a little more interesting because depending on your kernel version, and how you've built your kernel Linux may already do this for you. If you're running a 2.6.29 or newer kernel on a SMP system the lock owner will be tracked. This was added to Linux to support adaptive mutexes, more on that shortly. Alternately, your kernel might track the lock owner if you've set CONFIG_DEBUG_MUTEXES in the kernel build. If neither of the above things is true for your kernel the kmutex_t type will include and track the lock owner to ensure correct behavior. This is all handled by a new autoconf check called SPL_AC_MUTEX_OWNER. Concerning adaptive mutexes these are a very recent development and they did not make it in to either the latest FC11 or SLES11 kernels. Ideally, I'd love to see this kernel change appear in one of these distros because it does help performance. From Linux kernel commit: 0d66bf6d3514b35eb6897629059443132992dbd7 "Testing with Ingo's test-mutex application... gave a 345% boost for VFS scalability on my testbox" However, if you don't want to backport this change yourself you can still simply export the task_curr() symbol. 
The kmutex_t implementation will use this symbol when it's available to provide its own adaptive mutexes. Finally, DEBUG_MUTEX support was removed including the proc handlers. This was done because now that we are cleanly integrated with the kernel profiling all this information and much much more is available in debug kernel builds. This code was now redundant. Updated mutexes validated on: - SLES10 (ppc64) - SLES11 (x86_64) - CHAOS4.2 (x86_64) - RHEL5.3 (x86_64) - RHEL6 (x86_64) - FC11 (x86_64)	2009-09-25 14:47:01 -07:00
Brian Behlendorf	d28db80fd0	Update rwlocks to track owner to ensure correct semantics The behavior of RW_*_HELD was updated because it was not quite right. It is not sufficient to return non-zero when the lock is held, we must only do this when the current task is the holder. This means we need to track the lock owner which is not something tracked in a Linux semaphore. After some experimentation the solution I settled on was to embed the Linux semaphore at the start of a larger krwlock_t structure which includes the owner field. This maintains good performance and allows us to cleanly integrate with the kernel lock analysis tools. My reasons: 1) By placing the Linux semaphore at the start of krwlock_t we can then simply cast krwlock_t to a rw_semaphore and pass that on to the linux kernel. This allows us to use '#defines so the preprocessor can do direct replacement of the Solaris primitive with the linux equivalent. This is important because it then maintains the location information for each rw_* call point. 2) Additionally, by adding the owner to krwlock_t we can keep this needed extra information adjacent to the lock itself. This removes the need for a fancy lookup to get the owner which is optimal for performance. We can also leverage the existing spin lock in the semaphore to ensure owner is updated correctly. 3) All helper functions which do not need to strictly be implemented as a define to preserve location information can be done as a static inline function. 4) Adding the owner to krwlock_t allows us to remove all memory allocations done during lock initialization. This is good for all the obvious reasons, we do give up the ability to specify the lock name. The Linux profiling tools will stringify the lock name used in the code via the preprocessor and use that. Updated rwlocks validated on: - SLES10 (ppc64) - SLES11 (x86_64) - CHAOS4.2 (x86_64) - RHEL5.3 (x86_64) - RHEL6 (x86_64) - FC11 (x86_64)	2009-09-25 14:14:35 -07:00
Brian Behlendorf	e811949a57	Reimplement rwlocks for Linux lock profiling/analysis. It turns out that the previous rwlock implementation worked well but did not integrate properly with the upstream kernel lock profiling/ analysis tools. This is a major problem since it would be awfully nice to be able to use the automatic lock checker and profiler. The problem is that the upstream lock tools use the pre-processor to create a lock class for each uniquely named locked. Since the rwsem was embedded in a wrapper structure the name was always the same. The effect was that we only ended up with one lock class for the entire SPL which caused the lock dependency checker to flag nearly everything as a possible deadlock. The solution was to directly map a krwlock to a Linux rwsem using a typedef there by eliminating the wrapper structure. This was not done initially because the rwsem implementation is specific to the arch. To fully implement the Solaris krwlock API using only the provided rwsem API is not possible. It can only be done by directly accessing some of the internal data member of the rwsem structure. For example, the Linux API provides a different function for dropping a reader vs writer lock. Whereas the Solaris API uses the same function and the caller does not pass in what type of lock it is. This means to properly drop the lock we need to determine if the lock is currently a reader or writer lock. Then we need to call the proper Linux API function. Unfortunately, there is no provided API for this so we must extracted this information directly from arch specific lock implementation. This is all do able, and what I did, but it does complicate things considerably. The good news is that in addition to the profiling benefits of this change. We may see performance improvements due to slightly reduced overhead when creating rwlocks and manipulating them. 
The only function I was forced to sacrifice was rw_owner() because this information is simply not stored anywhere in the rwsem. Luckily this appears not to be a commonly used function on Solaris, and it is my understanding it is mainly used for debugging anyway. In addition to the core rwlock changes, extensive updates were made to the rwlock regression tests. Each class of test was extended to provide more API coverage and to be more rigorous in checking for misbehavior. This is a pretty significant change and with that in mind I have been careful to validate it on several platforms before committing. The full SPLAT regression test suite was run numerous times on all of the following platforms. This includes various kernels ranging from 2.6.16 to 2.6.29. - SLES10 (ppc64) - SLES11 (x86_64) - CHAOS4.2 (x86_64) - RHEL5.3 (x86_64) - RHEL6 (x86_64) - FC11 (x86_64)	2009-09-18 16:09:47 -07:00
Brian Behlendorf	6ae7fef5b9	Update global_page_state() support for 2.6.29 kernels. Basically everything we need to monitor the global memory state of the system is now cleanly available via global_page_state(). The problem is that this interface is still fairly recent, and there has been one change in the page state enum which we need to handle. These changes basically boil down to the following: - If global_page_state() is available we should use it. Several autoconf checks have been added to detect the correct enum names. - If global_page_state() is not available check to see if get_zone_counts() symbol is available and use that. - If the get_zone_counts() symbol is not exported we have no choice but to dynamically acquire it at load time. This is an absolute last resort for old kernels which we don't want to patch to cleanly export the symbol.	2009-07-28 15:06:42 -07:00
Brian Behlendorf	6b09f73939	Remove get/put_task_struct as they are not available for SLES11 This interface is going away, and it's not as if most callers actually use crhold/crfree when working with credentials. So it'll be okay that we're not taking a reference on the task structure; the odds of it going away while working with a credential are pretty small.	2009-07-28 15:04:21 -07:00
Brian Behlendorf	ec7d53e99a	Add basic credential support and splat tests. The previous credential implementation simply provided the needed types and a couple of dummy functions needed. This update correctly ties the basic Solaris credential API in to one of two Linux kernel APIs. Prior to 2.6.29 the linux kernel embedded all credentials in the task structure. For these kernels, we pass around the entire task struct as if it were the credential, then we use the helper functions to extract the credential related bits. As of 2.6.29 a new credential type was added which we can and do fairly cleanly layer on top of. Once again the helper functions nicely hide the implementation details from all callers. Three tests were added to the splat test framework to verify basic correctness. They should be extended as needed when new credential functions are added.	2009-07-27 17:18:59 -07:00
Brian Behlendorf	7064b767c2	Positive Solaris ioctl return codes need to be negated for use by libc	2009-07-23 16:14:52 -07:00
Brian Behlendorf	3c9ce2bf69	Allow kmem or vmem based slab for slab_lock and slab_overcommit tests. The slab_overcommit test case could hang on a system with fragmented memory because it was creating a kmem based slab with 256K objects. To avoid this I've removed the KMC_KMEM flag which allows the slab to decide if it should be kmem or vmem backed based on the object size. The slab_lock test shares this code and will also be affected. But the point of these two tests is to stress cache locking and memory overcommit, the type of slab is not critical. In fact, allowing the slab to do the default smart thing is preferable.	2009-07-23 13:50:53 -07:00
Brian Behlendorf	2141116167	The HAVE_PATH_IN_NAMEIDATA compat macros should have been used here.	2009-07-22 14:28:19 -07:00
Brian Behlendorf	78d6de97bd	Register a basic compat ioctl handler (32 vs 64 bit compat) Simply pass the ioctl on to the normal handler. If the ioctl helper macros are used correctly this should be safe as they will handle the packing/unpacking of the data encoded in the ioctl command. And actually, if the caller does not use the IO* macros at all, and just passes small values, it will probably be OK as well. We only get in to trouble if they try and use the upper 32-bits. Endianness is not really a concern here, we are pretty much assuming the user and kernel will match.	2009-07-21 10:13:58 -07:00
Ricardo M. Correia	e004f04c8b	Prevent integer overflow after ~164 days of uptime. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2009-07-14 15:23:25 -07:00
Brian Behlendorf	d3126abe75	Add ddi_copyin/ddi_copyout support for fake kernel originated ioctls.	2009-07-10 10:56:32 -07:00
Brian Behlendorf	915404bd50	Add basic support for TASKQ_THREADS_CPU_PCT taskq flag which is used to scale the number of threads based on the number of online CPUs. As CPUs are added/removed we should rescale the thread count appropriately, but currently this is only done at create.	2009-07-09 10:07:52 -07:00
Brian Behlendorf	c0517c35d2	Use do_div on older kernel where do_div64 doesn't exist.	2009-06-26 13:10:52 -07:00

191 Commits