mirror_zfs

mirror of https://git.proxmox.com/git/mirror_zfs.git synced 2026-01-25 10:12:13 +03:00

Author	SHA1	Message	Date
Brian Behlendorf	68a829b29d	Remove credential configure checks. The groups_search() function was never exported by a mainline kernel therefore we drop this compatibility code and always provide our own implementation. Additionally, the cred_t structure has been available since 2.6.29 so there is no longer a need to maintain compatibility code. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2014-10-17 15:11:51 -07:00
Brian Behlendorf	44778f4110	Remove kallsyms_lookup_name() wrapper After the removable of get_vmalloc_info(), the unused global memory variables, and the optional dcache/icache shrinkers there is no longer a need for the kallsyms compatibility code. This allows us to eliminate another brittle area of the code by removing the kernel upcall this functionality depended on for older kernels. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2014-10-17 15:11:51 -07:00
Brian Behlendorf	89a461e70c	Remove shrink_{i,d}node_cache() wrappers This is optional functionality which may or may not be useful to ZFS when using older kernels. It is never a hard requirement. Therefore this functionality is being removed from the SPL and a simpler slimmed down version will be added to ZFS. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2014-10-17 15:11:51 -07:00
Brian Behlendorf	8bbbe46f86	Remove global memory variables Platforms such as Illumos and FreeBSD have historically provided global variables which summerize the memory state of a system. Linux on the otherhand doesn't expose any of this information to kernel modules and uses entirely different mechanisms for memory management. In order to simplify the original ZFS port to Linux these global variables were emulated by the SPL for the benefit of ZFS. As ZoL has matured over the years it has moved steadily away from these interfaces and now no longer depends on them at all. Therefore, this patch completely removes the global variables availrmem, minfree, desfree, lotsfree, needfree, swapfs_minfree, and swapfs_reserve. This greatly simplifies the memory management code and eliminates a common area of confusion. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2014-10-17 15:11:51 -07:00
Brian Behlendorf	e1310afae3	Remove get_vmalloc_info() wrapper The get_vmalloc_info() function was used to back the vmem_size() function. This was always problematic and resulted in brittle code because the kernel never provided a clean interface for modules. However, it turns out that the only caller of this function in ZFS uses it to determine the total virtual address space size. This can be determined easily without get_vmalloc_info() so vmem_size() has been updated to take this approach which allows us to shed the get_vmalloc_info() dependency. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2014-10-17 15:11:51 -07:00
Brian Behlendorf	50e41ab1e1	Remove on_each_cpu() wrapper The on_each_cpu() function has been available since Linux 2.6.27. There is no longer a need to maintain this compatibility code. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2014-10-17 15:11:51 -07:00
Brian Behlendorf	b652d169b0	Remove mutex_lock_nested() wrapper The mutex_lock_nested() function has been available since Linux 2.6.18. There is no longer a need to maintain this compatibility code. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2014-10-17 15:11:51 -07:00
Brian Behlendorf	9f36cace41	Remove kmalloc_node() compatibility code The kmalloc_node() function has been available since Linux 2.6.12. There is no longer a need to maintain this compatibility code. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2014-10-17 15:11:51 -07:00
Brian Behlendorf	d227e114ed	Remove linux/uaccess.h header check The uaccess header has been available in the same location since Linux 2.6.18. There is no longer a need to maintain this compatibility code. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2014-10-17 15:11:51 -07:00
Brian Behlendorf	e5b65e3179	Remove uintptr_t typedef The uintptr_t typedef has been available since Linux 2.6.24. There is no longer a need to maintain this compatibility code. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2014-10-17 15:11:50 -07:00
Brian Behlendorf	ff0582cb39	Remove atomic64_xchg() wrappers The atomic64_xchg() and atomic64_cmpxchg() functions have been available since Linux 2.6.24. There is no longer a need to maintain this compatibility code. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2014-10-17 15:11:50 -07:00
Brian Behlendorf	82f2f1a3af	Simplify the time compatibility wrappers Many of the time functions had grown overly complex in order to handle kernel compatibility issues. However, as of Linux 2.6.26 all the required functionality is available. This allows us to retire numerous configure checks and greatly simplify the time compatibility wrappers. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2014-10-17 15:11:50 -07:00
Brian Behlendorf	87f8055a91	Map highbit64() to fls64() The fls64() function has been available since Linux 2.6.16 and it should be used to implemented highbit64(). This allows us to provide an optimized implementation and simplify the code. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2014-10-17 15:11:50 -07:00
Brian Behlendorf	bb4dee3df2	Remove utsname() wrapper There is no longer a need to wrap this because utsname() is provided by the kernel and can be called directly. This will require a small change in the ZFS code because utsname is expected to be a global structure and not a function. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2014-10-17 15:11:41 -07:00
Brian Behlendorf	a80d69caf0	Remove adaptive mutex implementation Since the Linux 2.6.29 kernel all mutexes have been adaptive mutexs. There is no longer any point in keeping this code so it is being removed to simplify the code. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2014-10-17 15:07:28 -07:00
Brian Behlendorf	3a92530563	Update code to use misc_register()/misc_deregister() When the SPL was originally written it was designed to use the device_create() and device_destroy() functions. Unfortunately, these functions changed considerably over the years making them difficult to rely on. As it turns out a better choice would have been to use the misc_register()/misc_deregister() functions. This interface for registering character devices has remained stable, is simple, and provides everything we need. Therefore the code has been reworked to use this interface. The higher level ZFS code has always depended on these same interfaces so this is also as a step towards minimizing our kernel dependencies. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2014-10-17 15:07:28 -07:00
Brian Behlendorf	6203295438	Make license compatibility checks consistent Apply the license specified in the META file to ensure the compatibility checks are all performed consistently. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2014-10-17 15:07:28 -07:00
stf	f9bde4f74b	Avoid PAGESIZE redefinition Add #ifndef PAGESIZE to avoid redefinition warning on platforms where this value is already provided. Signed-off-by: stf <s@ctrlc.hu> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #382	2014-08-18 08:55:41 -07:00
Richard Yao	ec18fe3ce8	Cleanup vn_rename() and vn_remove() zfsonlinux/spl#bcb15891ab394e11615eee08bba1fd85ac32e158 implemented Linux 3.6+ support by adding duplicate vn_rename and vn_remove functions. The new ones were cleaner, but the duplicate functions made the codebase less maintainable. This adds some compatibility shims that allow us to retire the older vn_rename and vn_remove in favor of the new ones on old kernels. The result is a net 143 line reduction in lines of code and a cleaner codebase. Signed-off-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #370	2014-08-13 16:25:44 -07:00
Ned Bass	2fc44f66ec	Linux 3.17 compat: remove wait_on_bit action function Linux kernel 3.17 removes the action function argument from wait_on_bit(). Add autoconf test and compatibility macro to support the new interface. The former "wait_on_bit" interface required an 'action' function to be provided which does the actual waiting. There were over 20 such functions in the kernel, many of them identical, though most cases can be satisfied by one of just two functions: one which uses io_schedule() and one which just uses schedule(). This API change was made to consolidate all of those redundant wait functions. References: torvalds/linux@7431620 Signed-off-by: Ned Bass <bass6@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #378	2014-08-11 14:17:00 -07:00
Tim Chase	2bf35fb754	Add atomic_swap_32() and atomic_swap_64() The atomic_swap_32() function maps to atomic_xchg(), and the atomic_swap_64() function maps to atomic64_xchg(). Signed-off-by: Tim Chase <tim@chase2k.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #377	2014-07-28 14:19:24 -07:00
Tim Chase	7f23e00109	Add functions and macros as used upstream. Added highbit64() and howmany() which are used in recent upstream code. Both highbit() and highbit64() should at some point be re-factored to use the optimized fls() and fls64() functions. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Prakash Surya <surya1@llnl.gov> Signed-off-by: Tim Chase <tim@chase2k.com> Closes #363	2014-07-22 09:47:48 -07:00
Tim Chase	f6a869614e	Safer debugging and assertion macros. Spl's debugging and assertion macros macro used the typical do/while(0) form for if/else friendliness, however, this limits their use in contexts where a do loop is not valid; such as within another multi-statement style macro. The following macros have been converted to not use do/while(0): PANIC, ASSERT, ASSERTF, VERIFY, VERIFY3_IMPL PANIC has been converted to a wrapper around the new spl_PANIC() function. The other macros have been converted to use the "&&" operator for the branch-predicition conditional and also to use spl_PANIC(). The __ASSERT() macro was not touched. It is only used by the debugging infrastructure and that code, including this macro, will be retired when the tracepoint patches are merged. Signed-off-by: Tim Chase <tim@chase2k.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #367	2014-07-01 15:14:43 -07:00
Brian Behlendorf	376dc35e22	Add spl_kmem_cache_reclaim module option The correct behavior for all registered shrinkers is to return the number of objects in their cache. In theory this allows the Linux VM to balance memory reclaim across all registered caches. In commit `b9b3715` this behavior was disabled in favor of returning -1 which notifies the VM that no additional objects are available for reclaim. This was done as a workaround to resolve thrashing in shrink_slabs() which could occur when memory was low and numerous core where in reclaim. Unfortunately, this has been observed to increase the likelihood of OOM events when SPL slab consumers are responsible for consuming the majority of memory. Therefore, this patch makes this behavior tunable. Setting the spl_kmem_cache_reclaim module option to 0x1 will result in the shrinker only being called once. This is the default behavior. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Prakash Surya <surya1@llnl.gov> Closes #358	2014-05-22 10:30:12 -07:00
Brian Behlendorf	a073aeb060	Add KMC_SLAB cache type For small objects the Linux slab allocator has several advantages over its counterpart in the SPL. These include: 1) It is more memory-efficient and packs objects more tightly. 2) It is continually tuned to maximize performance. Therefore it makes sense to layer the SPLs slab allocator on top of the Linux slab allocator. This allows us to leverage the advantages above while preserving the Illumos semantics we depend on. However, there are some things we need to be careful of: 1) The Linux slab allocator was never designed to work well with large objects. Because the SPL slab must still handle this use case a cut off limit was added to transition from Linux slab backed objects to kmem or vmem backed slabs. spl_kmem_cache_slab_limit - Objects less than or equal to this size in bytes will be backed by the Linux slab. By default this value is zero which disables the Linux slab functionality. Reasonable values for this cut off limit are in the range of 4096-16386 bytes. spl_kmem_cache_kmem_limit - Objects less than or equal to this size in bytes will be backed by a kmem slab. Objects over this size will be vmem backed instead. This value defaults to 1/8 a page, or 512 bytes on an x86_64 architecture. 2) Be aware that using the Linux slab may inadvertently introduce new deadlocks. Care has been taken previously to ensure that all allocations which occur in the write path use GFP_NOIO. However, there may be internal allocations performed in the Linux slab which do not honor these flags. If this is the case a deadlock may occur. The path forward is definitely to start relying on the Linux slab. But for that to happen we need to start building confidence that there aren't any unexpected surprises lurking for us. And ideally need to move completely away from using the SPLs slab for large memory allocations. This patch is a first step. NOTES: 1) The KMC_NOMAGAZINE flag was leveraged to support the Linux slab backed caches but it is not supported for kmem/vmem backed caches. 2) Regardless of the spl_kmem_cache_*_limit settings a cache may be explicitly set to a given type by passed the KMC_KMEM, KMC_VMEM, or KMC_SLAB flags during cache creation. 3) The constructors, destructors, and reclaim callbacks are all functional and will be called regardless of the cache type. 4) KMC_SLAB caches will not appear in /proc/spl/kmem/slab due to the issues involved in presenting correct object accounting. Instead they will appear in /proc/slabinfo under the same names. 5) Several kmem SPLAT tests needed to be fixed because they relied incorrectly on internal kmem slab accounting. With the updated test cases all the SPLAT tests pass as expected. 6) An autoconf test was added to ensure that the __GFP_COMP flag was correctly added to the default flags used when allocating a slab. This is required to ensure all pages in higher order slabs are properly refcounted, see `ae16ed9`. 7) When using the SLUB allocator there is no need to attempt to set the __GFP_COMP flag. This has been the default behavior for the SLUB since Linux 2.6.25. 8) When using the SLUB it may be desirable to set the slub_nomerge kernel parameter to prevent caches from being merged. Original-patch-by: DHE <git@dehacked.net> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Prakash Surya <surya1@llnl.gov> Signed-off-by: Tim Chase <tim@chase2k.com> Signed-off-by: DHE <git@dehacked.net> Signed-off-by: Chunwei Chen <tuxoko@gmail.com> Closes #356	2014-05-22 10:28:01 -07:00
Chunwei Chen	1538f4b6e3	Linux 3.15 compat: NICE_TO_PRIO and PRIO_TO_NICE These macro's were exposed to make them available to other parts of the kernel and modules. References: torvalds/linux@6b6350f Signed-off-by: Chunwei Chen <tuxoko@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #355	2014-05-07 13:38:03 -07:00
Jorgen Lundman	d6e6e4a98e	Add support for aarch64 (ARMv8) Using the ARM reference simulation (fast model foundation v8) I cross compiled spl and zfs, to confirm it works on ARMv8 (64 bit arm architecture, called aarch64 in Linux). As it is based on previous ARM porting, the resulting patch is disappointingly small, there was very little to do. The code fixes the compile issues and has light testing done. Signed-off-by: Jorgen Lundman <lundman@lundman.net> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #351	2014-04-25 15:25:32 -07:00
Chunwei Chen	545e9ac00a	Add ddi_time_after and friends When comparing times gotten from ddi_get_lbolt, we have to take account of wrap around of jiffies. Therefore, we cannot use 't1 < t2'. Instead we should use 't1 - t2 < 0'. This patch add ddi_time_after and friends to address this issue. They have strict type restriction, clock_t for vanilla and int64_t for 64 version, to prevent type conversion from screwing things. Signed-off-by: Chunwei Chen <tuxoko@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #335	2014-04-14 09:32:01 -07:00
Yuxuan Shui	6c48cd8ac2	This patch add a CTASSERT macro for compile time assertion. This macro makes the compile to spit "mixed definition and code" warning, I can't find a way to avoid it. This patch lays some groundwork for the persistent l2arc feature. See https://www.illumos.org/issues/3525. Signed-off-by: Yuxuan Shui <yshuiv7@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #303	2014-04-14 09:28:53 -07:00
Richard Yao	acf0ade362	Simplify hostid logic There is plenty of compatibility code for a hw_hostid that isn't used by anything. At the same time, there are apparently issues with the current hostid logic. coredumb in #zfsonlinux on freenode reported that Fedora 17 changes its hostid on every boot, which required force importing his pool. A suggestion by wca was to adopt FreeBSD's behavior, where it treats hostid as zero if /etc/hostid does not exist Adopting FreeBSD's behavior permits us to eliminate plenty of code, including a userland helper that invokes the system's hostid as a fallback. Signed-off-by: Richard Yao <ryao@cs.stonybrook.edu> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #224	2014-04-14 09:04:41 -07:00
Tim Chase	ed650dee76	De-inline spl_kthread_create(). The function was defined as a static inline with variable arguments which causes gcc to generate errors on some distros. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tim Chase <tim@chase2k.com> Closes #346	2014-04-09 19:17:12 -07:00
Tim Chase	17a527cb0f	Support post-3.13 kthread_create() semantics. Provide spl_kthread_create() as a wrapper to the kernel's kthread_create() to provide pre-3.13 semantics. Re-try if the call is interrupted or if it would have returned -ENOMEM. Otherwise return NULL. Signed-off-by: Chunwei Chen <tuxoko@gmail.com> Signed-off-by: Tim Chase <tim@chase2k.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #339	2014-04-08 12:44:42 -07:00
marku89	d58a99af2f	Define the needed ISA types for Sparc Add the minimum required ISA types to support the Sparc architecture. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ned Bass <bass6@llnl.gov> Signed-off-by: marku89 <mar42@kola.li> Closes #317	2014-01-09 15:55:32 -08:00
Brian Behlendorf	2f117de8be	Include linux/vmalloc.h for ARM and Sparc Related to issue #257 which added Linux 3.10 compatibility. For ARM and Sparc architectures we must explicitly include the <linux/vmalloc.h> header to ensure the vmalloc_info structure is always defined when available. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #257 Closes #291	2014-01-07 10:45:39 -08:00
Ned Bass	184c687387	Emulate illumos interface cv_timedwait_hires() Needed for Illumos #3582. This interface is supposed to support a variable-resolution timeout with nanosecond granularity. This implementation rounds up to microsecond resolution, as nanosecond- precision timing is rarely needed for real-world performance tuning and may incur unnecessary busy-waiting. usleep_range() is used if available, otherwise udelay() or msleep() are used depending on the length of the delay interval. Add flags from sys/callo.h as these are used to control the behavior of cv_timedwait_hires(). Specifically, CALLOUT_FLAG_ABSOLUTE Normally, the expiration passed to the timeout API functions is an expiration interval. If this flag is specified, then it is interpreted as the expiration time itself. CALLOUT_FLAG_ROUNDUP Roundup the expiration time to the next resolution boundary. If this flag is not specified, the expiration time is rounded down. References: https://www.illumos.org/issues/3582 illumos/illumos-gate@0689f76 Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #304	2013-11-04 09:49:24 -08:00
Ned Bass	f483a97a41	3537 add kstat_waitq_enter and friends These kstat interfaces are required to port "Illumos #3537 want pool io kstats" to ZFS on Linux. kstat_waitq_enter() kstat_waitq_exit() kstat_runq_enter() kstat_runq_exit() Additionally, zero out the ks_data buffer in __kstat_create() so that the kstat_io_t counters are initialized to zero. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2013-10-25 13:41:52 -07:00
Cyril Plisko	ffbf0e57c2	Kstat to use private lock by default While porting Illumos #3537 I found that ks_lock member of kstat_t structure is different between Illumos and SPL. It is a pointer to the kmutex_t in Illumos, but the mutex lock itself in SPL. Apparently Illumos kstat API allows consumer to override the lock if required. With SPL implementation it is not possible anymore. Things were alright until the first attempt to actually override the lock. Porting of Illumos #3537 introduced such code for the first time. In order to provide the Solaris/Illumos like functionality we: 1. convert ks_lock to "kmutex_t *ks_lock" 2. create a new field "kmutex_t ks_private_lock" 3. On kstat_create() ks_lock = &ks_private_lock Thus if consumer doesn't care we still have our internal lock in use. If, however, consumer does care she has a chance to set ks_lock to anything else before calling kstat_install(). The rest of the code will use ks_lock regardless of its origin. Signed-off-by: Ned Bass <bass6@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #286	2013-10-25 13:41:30 -07:00
Brian Behlendorf	ce07767f79	Revert "Add KSTAT_TYPE_TXG type" This reverts commit `dba79fcbf2` in favor of using the generic KSTAT_TYPE_RAW callbacks. The advantage of this approach is that arbitrary types can be added without the need to add them to the SPL. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #296	2013-10-16 14:48:35 -07:00
Prakash Surya	09f38b7e60	Add wrappers for accessing PID and command info This change adds simple wrappers for accessing a thread's PID and command character string. Signed-off-by: Prakash Surya <surya1@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #296	2013-10-16 14:48:35 -07:00
Prakash Surya	56d40a686b	Add callbacks for displaying KSTAT_TYPE_RAW kstats The current implementation for displaying kstats of type KSTAT_TYPE_RAW is rather crude. This patch attempts to enhance this handling by allowing a kstat user to register formatting callbacks which can optionally be used. The callbacks allow the user to implement functions for interpreting their data and transposing it into a character buffer. This buffer, containing a string representation of the raw data, is then be displayed through the current /proc textual interface. Additionally the kstats are made writable because it's now possible to provide a useful handler via the existing ks_update() interface. Signed-off-by: Prakash Surya <surya1@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #296	2013-10-16 14:48:35 -07:00
Richard Yao	4768c0d0a6	Define SET_ERROR() Signed-off-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2013-10-09 14:20:46 -07:00
Ned Bass	3ecf2d2bb6	Add kpreempt() compatibility macro This is needed for the Illumos #4045 write throttle patch. It is used in the arc eviction code to avoid blocking all arc activity by sitting on arcs_mtx too long. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #286	2013-10-09 13:52:55 -07:00
Richard Yao	f7fd6ddd96	Linux 3.8 compat: Use kuid_t/kgid_t when required When CONFIG_UIDGID_STRICT_TYPE_CHECKS is enabled uid_t/git_t are replaced by kuid_t/kgid_t, which are structures instead of integral types. This causes any code that uses an integral type to fail to build. The User Namespace functionality introduced in Linux 3.8 requires CONFIG_UIDGID_STRICT_TYPE_CHECKS, so we could not build against any kernel that supported it. We resolve this by converting between the new kuid_t/kgid_t structures and the original uid_t/gid_t types. Original-patch-by: DHE Rewrite-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #260	2013-08-09 10:09:29 -07:00
Richard Yao	ba06298072	Linux 3.11 compat: Replace num_physpages with totalram_pages num_physpages was removed by torvalds/linux@cfa11e08ed, so lets replace it with totalram_pages. This is a bug fix as much as it is a compatibility fix because num_physpages did not reflect the number of pages actually available to the kernel: http://lkml.indiana.edu/hypermail/linux/kernel/0908.2/01001.html Also, there are known issues with memory calculations when ZFS is in a Xen dom0. There is a chance that using totalram_pages could resolve them. This conjecture is untested at the time of writing. Signed-off-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #273	2013-08-08 09:14:29 -07:00
Richard Yao	f2a745c41d	Linux 3.10 compat: Do not rely on struct proc_dir_entry definition Linux kernel commit torvalds/linux#59d8053f moved the definition of struct proc_dir_entry from include/linux/proc_fs.h to the private header fs/proc/internal.h. The SPL relied on that to map Solaris' kstat to entries in /proc/spl/kstat. Since the proc_dir_entry structure is now private the only safe thing to do is wrap the opaque proc handle with our own structure. This actually ends up simplify the code and is good because it moves us away from depending on implementation details of /proc. Signed-off-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #257	2013-07-08 15:25:18 -07:00
Yuxuan Shui	c02ab72fb9	Linux 3.10 compat: struct vmalloc_info moved Linux kernel commmit torvalds/linux@db3808c1 moved the vmalloc_info structure from a private to a public header. Now that it's available for kernel modules use it. Signed-off-by: Yuxuan Shui <yshuiv7@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #257	2013-07-08 15:09:20 -07:00
Brian Behlendorf	ab0fdfef52	Fix ASSERT0 and VERIFY0 macro typo Ensure the value is cast to a 'long long' for printing purposes. The expectation is that ASSERT0/VERIFY0 are mostly used for validating return values and thus may commonly be negative. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #246	2013-06-21 15:38:46 -07:00
Brian Behlendorf	1c6d149feb	Add ASSERT0 and VERIFY0 macros The Illumos code introduced the ASSERT0 and VERIFY0 macros which are to be used instead of ASSERT3S(x, ==, 0) and VERIFY3S(x, ==, 0). Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tim Chase <tim@chase2k.com> Signed-off-by: Madhav Suresh <madhav.suresh@delphix.com> Closes #246	2013-06-18 11:41:55 -07:00
Brian Behlendorf	ab59be7bc7	Fix delay() Somewhat amazingly it went unnoticed that the delay() function doesn't actually cause the task to block. Since the task state is never changed from TASK_RUNNING before schedule_timeout() the scheduler allows to task to continue running without any delay. Using schedule_timeout_interruptible() resolves the issue by correctly setting TASK_UNINTERRUPTIBLE. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2013-05-01 16:35:47 -07:00
Brian Behlendorf	f6437b60c2	Add msec/usec/nsec to tick convertors Add wrappers for the Solaris MSEC_TO_TICK, USEC_TO_TICK, and NSEC_TO_TICK conversion functions. They are mapped directly to their Linux counterparts with the exception of NSEC_TO_TICK can cannot use usecs_to_jiffies() because it is not exported by the kernel. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2013-05-01 12:07:56 -07:00
Brian Behlendorf	4a6d8d2c3e	Change spl-kmod-devel install path Install the common spl kernel development headers under /usr/src/spl-<version>/ rather than in a kernel specific directory. The kernel specific build products such as spl_config.h and Modules.symvers are left installed under /usr/src/spl-<version>/<kernel>. This was done to be consistent with where dkms expects kernel module source to be packaged. It also allows for a common spl-kmod-devel package which includes the headers, and per-kernel spl-kmod-devel-<kernel> packages. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2013-03-14 12:01:05 -07:00
Richard Yao	10087fe1fa	Linux 3.9 compat: Include linux/sched/rt.h Linux 3.9 reorganized sched.h, splitting it into numerous files. torvalds/linux@8bd75c77b7 moved MAX_PRIO and MAX_RT_PRIO to linux/sched/rt.h. Signed-off-by: Richard Yao <ryao@cs.stonybrook.edu> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2013-03-14 10:43:19 -07:00
Ned Bass	3d6af2dd6d	Refresh links to web site Update links to refer to the official ZFS on Linux website instead of @behlendorf's personal fork on github. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2013-03-04 19:09:34 -08:00
Brian Behlendorf	d1142fbffe	Remove custom install-data-local for headers Rather than use a custom install target it is cleaner to define a 'kerneldir' and set 'kernel_HEADERS' appropriately. This allows us to leverage the standing configure install support. Additionally, I took this opertunity add the missing make files to the include subdirectories. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2013-03-01 16:55:06 -08:00
Eric Dillmann	3cbfd259b7	Define BE_IN16 & BE_IN32 for lz4 compression The new lz4 compression algorithm, zfsonlinux/zfs@9759c60, requires the generic BE_IN16 and BE_IN32 functions. These are added to the SPL for other consumers to take advantage of. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2013-01-29 09:30:23 -08:00
Brian Behlendorf	0936c3449f	Add spl_kmem_cache_expire module option Cache aging was implemented because it was part of the default Solaris kmem_cache behavior. The idea is that per-cpu objects which haven't been accessed in several seconds should be returned to the cache. On the other hand Linux slabs never move objects back to the slabs unless there is memory pressure on the system. This behavior is now configurable through the 'spl_kmem_cache_expire' module option. The value is a bit mask with the following meaning. 0x1 - Solaris style cache aging eviction is enabled. 0x2 - Linux style low memory eviction is enabled. Both methods may be safely enabled simultaneously, but by default both are disabled. It has never been clear if the kmem cache aging (which has been around from day one) actually does any good. It has however been the source of numerous bugs so I wouldn't mind retiring it entirely. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes zfsonlinux/zfs#1227 Closes #210	2013-01-28 09:34:12 -08:00
Matt Johnston	46a75aadb7	Add cv_wait_io() to account I/O time Under Linux when a task is waiting on I/O it should call the io_schedule() function for proper accounting. The Solaris cv_wait() function provides no way to specify what the cv is waiting on therefore cv_wait_io() is introduced. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #206	2013-01-07 10:29:26 -08:00
Brian Behlendorf	eb0be2ed46	Removed SPL_AC_3ARGS_INIT_WORK check All consumers of the kernel delayed work queues have been shifted over to rely on the taskq implementation. This compatibility code can now be removed. Any new callers which need this functionality should use the taskq interfaces for delayed work items. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2012-12-12 09:57:10 -08:00
Brian Behlendorf	33e94ef1dd	kmem-cache: Use a taskq for async allocations Shift the asynchronous allocations over to use the taskq interfaces. This allows us to abandon the kernels delayed work queue interface and all the compatibility code it requires. This code never actually used the delay functionality it was just done this way to leverage the existing compatibility code. All that is required is a thread context to perform the allocation in. The only thing clever in this change is that we take advantage of the preallocated task queue entries to avoid a memory allocation. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2012-12-12 09:56:54 -08:00
Brian Behlendorf	a10287e00d	kmem-cache: Use taskqs for ageing Shift the cache and magazine ageing functionality over to the new delayed taskq interfaces. This allows us to abandon the kernels delayed work queue interface and all the compatibility code it requires. However, the delayed taskq interface does not allow us to schedule a task for a specfic cpu so the ageing code was slightly reworked. The magazine ageing delay has been directly linked to the cache ageing function. The spl_cache_age() function invokes on_each_cpu() in order to run spl_magazine_age() on each cpu. It then blocks waiting for them to complete and promptly reclaims any free slabs. When restructing the code wasn't the primary goal I think the new code is far more understable and maintainable. It also should help minimize magazine thrashing because free slabs are immediately released after the magazine is aged. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2012-12-12 09:56:54 -08:00
Brian Behlendorf	d9acd930b5	taskq delay/cancel functionality Add the ability to dispatch a delayed task to a taskq. The desired behavior is for the task to be queued but not executed by a worker thread until the expiration time is reached. To achieve this two new functions were added. * taskq_dispatch_delay() - This function behaves exactly like taskq_dispatch() however it takes a third 'expire_time' argument. The caller should pass the desired time the task should be executed as an absolute value in jiffies. The task is guarenteed not to run before this time, it may run slightly latter if all the worker threads are busy. * taskq_cancel_id() - Given a task id attempt to cancel the task before it gets executed. This is primarily useful for canceling delay tasks but can be used for canceling any previously dispatched task. There are three possible return values. 0 - The task was found and canceled before it was executed. ENOENT - The task was not found, either it was already run or an invalid task id was supplied by the caller. EBUSY - The task is currently executing any may not be canceled. This function will block until the task has been completed. * taskq_wait_all() - The taskq_wait_id() function was renamed taskq_wait_all() to more clearly reflect its actual behavior. It is only curreny used by the splat taskq regression tests. * taskq_wait_id() - Historically, the only difference between this function and taskq_wait() was that you passed the task id. In both functions you would block until ALL lower task ids which executed. This was semantically correct but could be very slow particularly if there were delay tasks submitted. To better accomidate the delay tasks this function was reimplemnted. It will now only block until the passed task id has been completed. This is actually a fairly low risk change for a few reasons. * Only new ZFS callers will make use of the new interfaces and very little common code was changed to support the new functions. * The existing taskq_wait() implementation was not changed just slightly refactored. * The newly optimized taskq_wait_id() implementation was never used by ZFS we can't accidentally introduce a new bug there. NOTE: This functionality does not exist in the Illumos taskqs. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2012-12-12 09:54:07 -08:00
Brian Behlendorf	aed8671cb0	taskq style, remove #define wrappers When the taskq implementation was originally written I wrapped all the API functions in #define's. This was done as a preventative measure to ensure that a taskq symbol never conflicted with an existing kernel symbol. However, in practice the taskq symbols never conflicted. The only major conflicts occured with the kmem cache API. Since this added layer of obfuscation never bought us anything for the taskq's I'm removing it. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2012-12-12 09:54:07 -08:00
Brian Behlendorf	472a34caff	taskq style, convert spaces to soft tabs Update the taskq implementation to conform with the style used throughout the rest of the code. There are no functional changes in this commit. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2012-12-12 09:54:07 -08:00
Brian Behlendorf	ed3163484d	Track emergency object in rbtree In the initial implementation emergency objects were tracked on a per-cache list. The assumption was that under normal operation we would never allocate more than a handful of these objects. So the cost of walking the list during free was expected to be negligible. However real world usage has shown that emergency objects tend to be allocated in batches. A deadlock will be detected and several thousand emergency objects will be allocated before the original blocked slab allocation can complete. Therefore the original list has been replaced by a red black tree which is sorted by the memory address of each allocated object. This bounds the worst case insertion and removal time to O(log n) which minimize contention on the assoicated spin lock. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2012-11-06 14:54:19 -08:00
Brian Behlendorf	165f13c33a	Improved vmem cached deadlock detection The entire goal of performing the slab allocations asynchronously is to be able to detect when a vmalloc() deadlocks. In this case, and only this case, do we want to start allocating emergency objects. The trick here is to minimize false positives because the overhead of tracking emergency objects is far higher than normal slab objects. With that goal in mind the code was reworked to be less sensitive to slow allocations by increasing the wait time. Once a cache is is marked deadlocked all subsequent allocations which can not be satisfied with existing cache objects will immediately allocate new emergency objects. This behavior persists until the asynchronous allocation completes and clears the deadlocked flag. The result of these tweaks is that far fewer emergency objects get created which is important because this minimizes the cost of releasing them latter in kmem_cache_free(). Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2012-11-06 14:54:15 -08:00
Brian Behlendorf	df870a697f	splat: Cleanup headers Restructure the the SPLAT headers such that each test only includes the minimal set of headers it requires. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2012-11-06 14:48:56 -08:00
Brian Behlendorf	d2733258d0	Condition variable reference counts Reference count every entry and exit from the condition variable functions: cv_wait(), cv_wait_timeout(), cv_signal(), cv_broadcast(). This allows us to safely block in cv_destroy() until all consumers have been scheduled and are no longer accessing the condition variable memory. In addition poison the magic value at the start of cv_destroy() to ensure there are never any new callers after cv_destroy() is called. The consumer is responsible for ensuring this never occurs. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2012-11-06 14:48:55 -08:00
Brian Behlendorf	dba79fcbf2	Add KSTAT_TYPE_TXG type Add a new kstat type for tracking useful statistics about a TXG. The new KSTAT_TYPE_TXG type can be used to tracks the following statistics per-txg. txg - Unique txg number state - State (O)pen/(Q)uiescing/(S)yncing/(C)ommitted birth; - Creation time nread - Bytes read nwritten; - Bytes written reads - IOPs read writes - IOPs write open_time; - Length in nanoseconds the txg was open quiesce_time - Length in nanoseconds the txg was quiescing sync_time; - Length in nanoseconds the txg was syncing Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2012-11-02 15:17:40 -07:00
Brian Behlendorf	71c9f0b003	Make kstat.ks_update() callback atomic Move the kstat ks_update() callback under the ks_lock. This enables dynamically sized kstats without modification to the kstat API. * Create a kstat with the KSTAT_FLAG_VIRTUAL flag. * Register a ->ks_update() callback which does: o Frees any existing ks_data buffer. o Set ks_data_size to the kstat array size. o Set ks_data to an allocated buffer of size ks_data_size o Populate the array of buffers with the required data. The buffer allocated in the ks_update() callback is guaranteed to remain allocated and valid while the proc sequence handler iterates over the buffer. The lock will not be dropped until kstat_seq_stop() function is run making it safe for concurrent access. To allow the ks_update() callback to perform memory allocations the lock was changed to a mutex. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2012-10-23 09:36:19 -07:00
Etienne Dechamps	bbdc6ae495	Add interface for file hole punching. This adds an interface to "punch holes" (deallocate space) in VFS files. The interface is identical to the Solaris VOP_SPACE interface. This interface is necessary for TRIM support on file vdevs. This is implemented using Linux fallocate(FALLOC_FL_PUNCH_HOLE), which was introduced in 2.6.38. For a brief time before 2.6.38 this was done using the truncate_range inode operation, which was quickly deprecated. This patch only supports FALLOC_FL_PUNCH_HOLE. This adds support for the truncate_range() inode operation to VOP_SPACE() for file hole punching. This API is deprecated and removed in 3.5, so it's only useful for old kernels. On tmpfs, the truncate_range() inode operation translates to shmem_truncate_range(). Unfortunately, this function expects the end offset to be inclusive and aligned to the end of a page. If it is not, the kernel will stop with a BUG_ON(). This patch fixes the issue by adapting to the constraints set forth by shmem_truncate_range(). Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #168	2012-10-04 16:22:07 -07:00
Brian Behlendorf	9b51f21841	Remove TQ_SLEEP -> KM_SLEEP mapping When the taskq code was originally written it seemed like a good idea to simply map TQ_SLEEP to KM_SLEEP. Unfortunately, this assumed that the TQ_* flags would never confict with any of the Linux GFP_* flags. When adding the TQ_PUSHPAGE support in commit `cd5ca4b` this invariant was accidentally broken. Therefore to support TQ_PUSHPAGE, which is needed for Linux, and prevent any further confusion I have removed this direct mapping. The TQ_SLEEP, TQ_NOSLEEP, and TQ_PUSHPAGE are no longer defined in terms of their KM_* counterparts. Instead a simple mapping function is introduce to convert TQ_* -> KM_* where needed. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #171	2012-09-12 11:41:42 -07:00
Brian Behlendorf	330fe010e4	Revert "Switch KM_SLEEP to KM_PUSHPAGE" This reverts commit `cd5ca4b2f8` due to conflicts in the higher TQ_ bits which caused incorrect behavior. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2012-09-12 10:07:48 -07:00
Brian Behlendorf	cb5c2acebb	Add KMC_NOEMERGENCY slab flag Provide a flag to disable the use of emergency objects for a specific kmem cache. There may be instances where under no circumstances should you kmalloc() an emergency object. For example, when you cache contains very large objects (>128k). Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2012-09-07 14:27:03 -07:00
Etienne Dechamps	ac8ca67a88	Add DKIOCTRIM for TRIM support. See dechamps/zfs@cc6cd40ad7 for details. This harmless addition was merged to simplify testing the ZFS TRIM support patches. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #167	2012-09-02 14:22:01 -07:00
Brian Behlendorf	cd5ca4b2f8	Switch KM_SLEEP to KM_PUSHPAGE Under certain circumstances the following functions may be called in a context where KM_SLEEP is unsafe and can result in a deadlocked system. To avoid this problem the unconditional KM_SLEEPs are converted to KM_PUSHPAGEs. This will prevent them from attempting to initiate any I/O during direct reclaim. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2012-08-27 12:00:55 -07:00
Brian Behlendorf	3e904f40b4	Mutex ASSERT on self deadlock Generate an assertion if we're going to deadlock the system by attempting to acquire a mutex the process is already holding. There are currently no known instances of this under normal operation, but it _might_ be possible when using a ZVOL as a swap device. I want to ensure we catch this immediately if it were to occur. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2012-08-27 12:00:55 -07:00
Brian Behlendorf	eb0f407a2b	Add PF_NOFS debugging flag PF_NOFS is a per-process debug flag which is set in current->flags to detect when a process is performing an unsafe allocation. All tasks with PF_NOFS set must strictly use KM_PUSHPAGE for allocations because if they enter direct reclaim and initiate I/O they may deadlock. When debugging is disabled, any incorrect usage will be detected and a call stack with a warning will be printed to the console. The flags will then be automatically corrected to allow for safe execution. If debugging is enabled this will be treated as a fatal condition. To avoid any risk of conflicting with the existing PF_ flags. The PF_NOFS bit shadows the rarely used PF_MUTEX_TESTER bit. Only when CONFIG_RT_MUTEX_TESTER is not set, and we know this bit is unused, will the PF_NOFS bit be valid. Happily, most existing distributions ship a kernel with CONFIG_RT_MUTEX_TESTER disabled. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2012-08-27 12:00:55 -07:00
Brian Behlendorf	d47e664ad4	Revert "Add TASKQ_NORECLAIM flag" This reverts commit `372c257233`. The use of the PF_MEMALLOC flag was always a hack to work around memory reclaim deadlocks. Those issues are believed to be resolved so this workaround can be safely reverted. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2012-08-27 12:00:42 -07:00
Brian Behlendorf	e2dcc6e2b8	Emergency slab objects This patch is designed to resolve a deadlock which can occur with __vmalloc() based slabs. The issue is that the Linux kernel does not honor the flags passed to __vmalloc(). This makes it unsafe to use in a writeback context. Unfortunately, this is a use case ZFS depends on for correct operation. Fixing this issue in the upstream kernel was pursued and patches are available which resolve the issue. https://bugs.gentoo.org/show_bug.cgi?id=416685 However, these changes were rejected because upstream felt that using __vmalloc() in the context of writeback should never be done. Their solution was for us to rewrite parts of ZFS to accomidate the Linux VM. While that is probably the right long term solution, and it is something we want to pursue, it is not a trivial task and will likely destabilize the existing code. This work has been planned for the 0.7.0 release but in the meanwhile we want to improve the SPL slab implementation to accomidate this expected ZFS usage. This is accomplished by performing the __vmalloc() asynchronously in the context of a work queue. This doesn't prevent the posibility of the worker thread from deadlocking. However, the caller can now safely block on a wait queue for the slab allocation to complete. Normally this will occur in a reasonable amount of time and the caller will be woken up when the new slab is available,. The objects will then get cached in the per-cpu magazines and everything will proceed as usual. However, if the __vmalloc() deadlocks for the reasons described above, or is just very slow, then the callers on the wait queues will timeout out. When this rare situation occurs they will attempt to kmalloc() a single minimally sized object using the GFP_NOIO flags. This allocation will not deadlock because kmalloc() will honor the passed flags and the caller will be able to make forward progress. As long as forward progress can be maintained then even if the worker thread is deadlocked the critical thread will make progress. This will eventually allow the deadlocked worker thread to complete and normal operation will resume. These emergency allocations will likely be slow since they require contiguous pages. However, their use should be rare so the impact is expected to be minimal. If that turns out not to be the case in practice further optimizations are possible. One additional concern is if these emergency objects are long lived. Right now they are simply tracked on a list which must be walked when an object is freed. Is they accumulate on a system and the list grows freeing objects will become more expensive. This could be handled relatively easily by using a hash instead of a list, but that optimization (if needed) is left for a follow up patch. Additionally, these emeregency objects could be repacked in to existing slabs as objects are freed if the kmem_cache_set_move() functionality was implemented. See issue https://github.com/zfsonlinux/spl/issues/26 for full details. This work would also help reduce ZFS's memory fragmentation problems. The /proc/spl/kmem/slab file has had two new columns added at the end. The 'emerg' column reports the current number of these emergency objects in use for the cache, and the following 'max' column shows the historical worst case. These value should give us a good idea of how often these objects are needed. Based on these values under real use cases we can tune the default behavior. Lastly, as a side benefit using a single work queue for the slab allocations should reduce cpu contention on the global virtual address space lock. This should manifest itself as reduced cpu usage for the system. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2012-08-27 12:00:42 -07:00
Brian Behlendorf	c638e9ad04	Remove autotools products Remove all of the generated autotools products from the repository and update the .gitignore files accordingly. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue zfsonlinux/zfs#718	2012-08-27 11:46:23 -07:00
Prakash Surya	45324c7c41	Add kpreempt_[dis\|en]able macros in <sys/disp.h> To support preempt enabled kernels in ZFS on Linux, there are a couple places where the ZFS code needs to disable interrupts. This change adds the Solaris preempt functions and maps them to the equivalent ZFS functions, allowing the ZFS to make use of them. Signed-off-by: Prakash Surya <surya1@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #98	2012-08-24 15:18:38 -07:00
Prakash Surya	08850eddcb	Avoid calling smp_processor_id in spl_magazine_age The spl_magazine_age function had the implied assumption that it will remain on its current cpu through its execution. In order to support preempt enabled kernels, this assumption had to be removed. The spl_kmem_magazine structure now holds the cpu id of the cpu it is local to. This allows spl_magazine_age to use this field when scheduling work to be done by the magazine's local cpu. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #98	2012-08-24 09:43:22 -07:00
Richard Yao	6576a1a70d	Fix incorrect type in spl_kmem_cache_set_move() parameter A preprocessor definition renders this harmless. However, it is a good idea to change this to be consistent. Signed-off-by: Richard Yao <ryao@cs.stonybrook.edu>	2012-08-01 16:35:18 -07:00
Richard Yao	973e8269bd	Constify memory management functions This prevents warnings in ZFS that were caused by changes necessary to support PaX patched kernels. When debugging is enabled, these warnings become build failures. Signed-off-by: Richard Yao <ryao@cs.stonybrook.edu> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #131	2012-07-03 16:07:27 -07:00
Brian Behlendorf	44e406d712	PowerPC Compatibility Usage of get_current() is not supported across all architectures. The correct interface to use is the '#define current' which will map to the appropriate function, usually current_thread_info(). Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #119	2012-07-02 09:33:09 -07:00
Brian Behlendorf	38d31a1e57	Remove Solaris module emulation Originally I believed that these interfaces would be needed. However, in practice it turned out that it was more straight forward and maintainable to use the native Linux interfaces. As such, this is all dead code and can be safely removed. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #109	2012-05-18 13:57:44 -07:00
Richard Yao	f90096c905	Modify KM_PUSHPAGE to use GFP_NOIO instead of GFP_NOFS The resolution of issue #31 made KM_PUSHPAGE imply GFP_NOFS. This was done to prevent situations where filesystem operations which are holding locks enter direct reclaim and attempt to reaquire those same locks. This clearly will result in a deadlock. This works for datasets which are implemented in terms for filesystem operations. But unfortunately, swapping to a zvol will encounter many of the same deadlocks and GFP_NOFS will not prevent this. As such, it is appropriate to extend KM_PUSHPAGE to use the broader GFP_NOIO mask to handle these non-filesystem cases. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue zfsonlinux/zfs#342 Closes #105	2012-05-07 12:05:27 -07:00
Prakash Surya	cef7605c34	Throttle number of freed slabs based on nr_to_scan Previously, the SPL tried to maintain Solaris semantics by freeing all available (empty) slabs from its slab caches when the shrinker was called. This is not desirable when running on Linux. To make the SPL shrinker more Linux friendly, the actual number of freed slabs from each of the slab caches is now derived from nr_to_scan and skc_slab_objs. Additionally, an accounting bug was fixed in spl_slab_reclaim() which could cause us to reclaim one more slab than requested. Signed-off-by: Prakash Surya <surya1@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #101	2012-05-07 11:46:15 -07:00
Jorgen Lundman	cb75844e85	Define the needed ISA types for ARM Add the minimum required ISA types to support the ARM architecture. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2012-05-03 09:56:15 -07:00
Brian Behlendorf	b29012b999	Remove condition variable names Long ago I added support to the spl for condition variable names because I thought they might be needed. It turns out they aren't. In fact the official Solaris cv_init(9F) man page discourages their use in the kernel. cv_init(9F) Parameters name - Descriptive string. This is obsolete and should be NULL. (Non-NULL strings are legal, but they're a waste of kernel memory.) Therefore, I'm removing them from the spl to reclaim this memory and adding an ASSERT() to ensure no new consumers are added which make use of the name. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2012-04-06 12:06:19 -07:00
Brian Behlendorf	0835057ee7	Add SPL_META_RELEASE to module load/unload messages Include the ZFS_META_RELEASE in the module load/unload messages to more clearly indicate exactly what version of the SPL has been loaded. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2012-03-23 12:11:50 -07:00
Brian Behlendorf	9a8b7a7458	Add basic dynamic kstat support Add the bare minimum functionality to support dynamic kstats. A complete kstat implementation should be done as part of issue #84. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #84	2012-02-02 11:28:00 -08:00
Brian Behlendorf	4b2220f0b9	Add --enable-debug-log configure option Until now the notion of an internal debug logging infrastructure was conflated with enabling ASSERT()s. This patch clarifies things by cleanly breaking the two subsystem apart. The result of this is the following behavior. --enable-debug - Enable/disable code wrapped in ASSERT()s. --disable-debug ASSERT()s are used to check invariants and are never required for correct operation. They are disabled by default because they may impact performance. --enable-debug-log - Enable/disable the debug log infrastructure. --disable-debug-log This infrastructure allows the spl code and its consumer to log messages to an in-kernel log. The granularity of the logging can be controlled by a debug mask. By default the mask disables most debug messages resulting in a negligible performance impact. Because of this the debug log is enabled by default. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2012-02-02 11:27:54 -08:00
Darik Horn	588d900433	Linux 3.2 compat: rw_semaphore.wait_lock is raw The wait_lock member of the rw_semaphore struct became a raw_spinlock_t in Linux 3.2 at torvalds/linux@ddb6c9b58a. Wrap spin_lock_* function calls in a new spl_rwsem_* interface to ensure type safety if raw_spinlock_t becomes architecture specific, and to satisfy these compiler warnings: warning: passing argument 1 of ‘spinlock_check’ from incompatible pointer type [enabled by default] note: expected ‘struct spinlock_t ’ but argument is of type ‘struct raw_spinlock_t ’ Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes: #76 Closes: zfsonlinux/zfs#463	2012-01-11 16:28:05 -08:00
Prakash Surya	8f2503e0af	Store copy of tqent_flags prior to servicing task A preallocated taskq_ent_t's tqent_flags must be checked prior to servicing the taskq_ent_t. Once a preallocated taskq entry is serviced, the ownership of the entry is handed back to the caller of taskq_dispatch, thus the entry's contents can potentially be mangled. In particular, this is a problem in the case where a preallocated taskq entry is serviced, and the caller clears it's tqent_flags field. Thus, when the function returns and task_done is called, it looks as though the entry is not a preallocated task (when in fact it is a preallocated task). In this situation, task_done will place the preallocated taskq_ent_t structure onto the taskq_t's free list. This is a huge mistake. If the taskq_ent_t is then freed by the caller of taskq_dispatch, the taskq_t's free list will hold a pointer to garbage data. Even worse, if nothing has over written the freed memory before the pointer is dereferenced, it may still look as though it points to a valid list_head belonging to a taskq_ent_t structure. Thus, the task entry's flags are now copied prior to servicing the task. This copy is then checked to see if it is a preallocated task, and determine if the entry needs to be passed down to the task_done function. Signed-off-by: Prakash Surya <surya1@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #71	2011-12-16 16:54:00 -08:00
Prakash Surya	e7e5f78e7b	Swap taskq_ent_t with taskqid_t in taskq_thread_t The taskq_t's active thread list is sorted based on its tqt_ent->tqent_id field. The list is kept sorted solely by inserting new taskq_thread_t's in their correct sorted location; no other means is used. This means that once inserted, if a taskq_thread_t's tqt_ent->tqent_id field changes, the list runs the risk of no longer being sorted. Prior to the introduction of the taskq_dispatch_prealloc() interface, this was not a problem as a taskq_ent_t actively being serviced under the old interface should always have a static tqent_id field. Thus, once the taskq_thread_t is added to the taskq_t's active thread list, the taskq_thread_t's tqt_ent->tqent_id field would remain constant. Now, this is no longer the case. Currently, if using the taskq_dispatch_prealloc() interface, any given taskq_ent_t actively being serviced _may_ have its tqent_id value incremented. This happens when the preallocated taskq_ent_t structure is recursively dispatched. Thus, a taskq_thread_t could potentially have its tqt_ent->tqent_id field silently modified from under its feet. If this were to happen to a taskq_thread_t on a taskq_t's active thread list, this would compromise the integrity of the order of the list (as the list _may_ no longer be sorted). To get around this, the taskq_thread_t's taskq_ent_t pointer was replaced with its own static copy of the tqent_id. So, as a taskq_ent_t is pulled off of the taskq_t's pending list, a static copy of its tqent_id is made and this copy is used to sort the active thread list. Using a static copy is key in ensuring the integrity of the order of the active thread list. Even if the underlying taskq_ent_t is recursively dispatched (as has its tqent_id modified), this static copy stored inside the taskq_thread_t will remain constant. Signed-off-by: Prakash Surya <surya1@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #71	2011-12-16 13:26:54 -08:00
Prakash Surya	44217f7aad	Implement taskq_dispatch_prealloc() interface This patch implements the taskq_dispatch_prealloc() interface which was introduced by the following illumos-gate commit. It allows for a preallocated taskq_ent_t to be used when dispatching items to a taskq. This eliminates a memory allocation which helps minimize lock contention in the taskq when dispatching functions. commit 5aeb94743e3be0c51e86f73096334611ae3a058e Author: Garrett D'Amore <garrett@nexenta.com> Date: Wed Jul 27 07:13:44 2011 -0700 734 taskq_dispatch_prealloc() desired 943 zio_interrupt ends up calling taskq_dispatch with TQ_SLEEP Signed-off-by: Prakash Surya <surya1@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #65	2011-12-13 16:10:57 -08:00
Prakash Surya	2c02b71b14	Replace tq_work_list and tq_threads in taskq_t To lay the ground work for introducing the taskq_dispatch_prealloc() interface, the tq_work_list and tq_threads fields had to be replaced with new alternatives in the taskq_t structure. The tq_threads field was replaced with tq_thread_list. Rather than storing the pointers to the taskq's kernel threads in an array, they are now stored as a list. In addition to laying the ground work for the taskq_dispatch_prealloc() interface, this change could also enable taskq threads to be dynamically created and destroyed as threads can now be added and removed to this list relatively easily. The tq_work_list field was replaced with tq_active_list. Instead of keeping a list of taskq_ent_t's which are currently being serviced, a list of taskq_threads currently servicing a taskq_ent_t is kept. This frees up the taskq_ent_t's tqent_list field when it is being serviced (i.e. now when a taskq_ent_t is being serviced, it's tqent_list field will be empty). Signed-off-by: Prakash Surya <surya1@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #65	2011-12-13 16:10:50 -08:00
Prakash Surya	93806f58a6	Fix usage of MUTEX macro in mutex_enter_nested A call site of the MUTEX macro had incorrectly placed its closing parenthesis, causing two parameters to be passed rather than one. This change moves the misplaced parenthesis to fix the typographical error. Signed-off-by: Prakash Surya <surya1@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #70	2011-12-13 11:04:21 -08:00
Chris Dunlop	791dc876eb	Allow 64-bit timestamps to be set on 64-bit kernels ZFS and 64-bit linux are perfectly capable of dealing with 64-bit timestamps, but ZFS deliberately prevents setting them. Adjust the SPL such that TIMESPEC_OVERFLOW will not always assume 32-bit values and instead use the correct values for your kernel build. This effectively allows 64-bit timestamps on 64-bit systems. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes ZFS issue #487	2011-12-12 11:06:03 -08:00

1 2 3 4 5 ...

426 Commits