mirror_zfs

mirror of https://git.proxmox.com/git/mirror_zfs.git synced 2026-05-26 04:07:45 +03:00

Author	SHA1	Message	Date
David Quigley	43b857fddb	Add a PAGESHIFT definition Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: David Quigley <david.quigley@intel.com> Closes #598	2017-01-31 10:36:18 -08:00
Clemens Fruhwirth	6d064f7a07	Remove stale comment from rw_tryupgrade() Commit `f58040c0fc` should have removed this comment which is no longer relevant. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Clemens Fruhwirth <clemens@endorphin.org> Issue #589	2016-12-19 11:27:27 -08:00
Chunwei Chen	f200b83673	Add system_delay_taskq for long delay Add a dedicated system_delay_taskq for long delay like spa_deadman and zpl_posix_acl_free. This will allow us to use system_taskq in the manner of dispatch multiple tasks and call taskq_wait_outstanding. Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Chunwei Chen <david.chen@osnexus.com> Closes #588	2016-12-08 14:00:20 -07:00
Ubuntu	cbba714667	Add TASKQID_INVALID and TASKQID_INITIAL macros Add the TASKQID_INVALID and TASKQID_INITIAL macros and update the taskq implementation and test cases to use them. This is solely for the purposes of readability and introduces no functional change. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2016-11-02 10:34:19 -07:00
Chunwei Chen	ae7eda1dde	Linux 4.9 compat: group_info changes In Linux 4.9, torvalds/linux@81243ea, group_info changed from 2d array via ->blocks to 1d array via ->gid. We change the spl cred functions accordingly. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Chunwei Chen <david.chen@osnexus.com> Closes #581	2016-10-20 09:33:28 -07:00
tuxoko	2529b3a80e	Linux 4.8 compat: Fix RW_READ_HELD Linux 4.8, starting from torvalds/linux@19c5d690e, will set owner to 1 when read held instead of leave it NULL. So we change the condition to `rw_owner(rwp) <= 1` in RW_READ_HELD. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Chunwei Chen <david.chen@osnexus.com> Closes zfsonlinux/zfs#5233 Closes #577	2016-10-07 20:53:58 -07:00
tuxoko	4329bd5b73	Cleanup in cred.h Remove the code that doesn't make any sense. Reviewed-by: Brian Behlendorf <behlendorf@llnl.gov> Signed-off-by: Chunwei Chen <david.chen@osnexus.com> Closes #569	2016-09-14 16:59:31 -07:00
Tom Caputi	d2f97b2a26	Added highbit() and lowbit() macros Signed-off-by: Tom Caputi <tcaputi@datto.com> Signed-off-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #562	2016-07-20 10:28:46 -07:00
Tony Hutter	5ad98ad097	Add _ALIGNMENT_REQUIRED to isa_defs.h for checksums _ALIGNMENT_REQUIRED needs to be #defined in isa_defs.h in order to port the Illumos checksum code to ZoL: 4185 add new cryptographic checksums to ZFS: SHA-512, Skein, Edon-R OpenZFS-issue: https://www.illumos.org/issues/4185 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/45818ee Signed-off-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #561	2016-06-21 13:37:04 -07:00
Chunwei Chen	f58040c0fc	Implement a proper rw_tryupgrade Current rw_tryupgrade does rw_exit and then rw_tryenter(RW_RWITER), and then does rw_enter(RW_READER) if it fails. This violate the assumption that rw_tryupgrade should be atomic and could cause extra contention or even lock inversion. This patch we implement a proper rw_tryupgrade. For rwsem-spinlock, we take the spinlock to check rwsem->count and rwsem->wait_list. For normal rwsem, we use cmpxchg on rwsem->count to change the value from single reader to single writer. Signed-off-by: Chunwei Chen <david.chen@osnexus.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tim Chase <tim@chase2k.com> Closes zfsonlinux/zfs#4692 Closes #554	2016-05-31 11:44:15 -07:00
YunQiang Su	c60a51b640	Add isa_defs for MIPS GCC for MIPS only defines _LP64 when 64bit, while no _ILP32 defined when 32bit. Signed-off-by: YunQiang Su <syq@debian.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #558	2016-05-31 09:05:56 -07:00
Chunwei Chen	39cd90ef08	Add cv_timedwait_sig_hires to allow interruptible sleep Signed-off-by: Chunwei Chen <david.chen@osnexus.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #548	2016-05-12 14:54:15 -07:00
David Quigley	5e39e4f0b2	Add a macro to convert seconds to nanoseconds and vice-versa Required infrastructure for zfsonlinux/zfs#4600. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #546	2016-05-05 16:10:46 -07:00
Tim Chase	3bf657b90c	Use vmem_free() in dfl_free() and add dfl_alloc() This change was lost, somehow, in `e5f9a9a`. Since the arrays can be rather large, they need to be allocated with vmem_zalloc() via dfl_alloc() and freed with vmem_free() via dfl_free(). The new dfl_alloc() function should be used to allocate object of type dkioc_free_list_t in order that they're allocated from vmem. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tim Chase <tim@chase2k.com> Signed-off-by: Nikolay Borisov <kernel@kyup.com> Closes #543	2016-04-26 11:20:14 -07:00
Chunwei Chen	cdd39dd245	Use kernel provided mutex owner To reduce mutex footprint, we detect the existence of owner in kernel mutex, and rely on it if it exists. Note that before Linux 3.0, mutex owner is of type thread_info. Also note that, in Linux 3.18, the condition for owner is changed from CONFIG_DEBUG_MUTEXES \|\| CONFIG_SMP to CONFIG_DEBUG_MUTEXES \|\| CONFIG_MUTEX_SPIN_ON_OWNER Signed-off-by: Chunwei Chen <david.chen@osnexus.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #540	2016-04-25 17:04:07 -07:00
Dimitri John Ledkov	224817e2a8	Add support for s390[x]. Signed-off-by: Dimitri John Ledkov <xnox@ubuntu.com> Signed-off-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #537	2016-03-17 09:54:49 -07:00
Brian Behlendorf	a6ae97caed	Add rw_tryupgrade() This implementation of rw_tryupgrade() behaves slightly differently from its counterparts on other platforms. It drops the RW_READER lock and then acquires the RW_WRITER lock leaving a small window where no lock is held. On other platforms the lock is never released during the upgrade process. This is necessary under Linux because the kernel does not provide an upgrade function. There are currently no callers in the ZFS code where this change in behavior is a problem. In fact, in most cases the code is already written such that if the upgrade fails the RW_READER lock is dropped and the caller blocks waiting to acquire the lock as RW_WRITER. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tim Chase <tim@chase2k.com> Signed-off-by: Matthew Thode <prometheanfire@gentoo.org> Closes zfsonlinux/zfs#4388 Closes #534	2016-03-10 13:05:25 -08:00
Tom Caputi	18d2f56176	Changes to support zfs encryption Unused modlinkage struct removed and ntohll functions added. Signed-off-by: Tom Caputi <tcaputi@datto.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #533	2016-02-25 11:42:46 -08:00
Richard Yao	0b43696e66	random_get_pseudo_bytes() need not provide cryptographic strength entropy Perf profiling of dd on a zvol revealed that my system spent 3.16% of its time in random_get_pseudo_bytes(). No SPL consumers need cryptographic strength entropy, so we can reduce our overhead by changing the implementation to utilize a fast PRNG. The Linux kernel did not export a suitable PRNG function until it exported get_random_int() in Linux 3.10. While we could implement an autotools check so that we use it when it is available or even try to access the symbol on older kernels where it is not exported using the fact that it is exported on newer ones as justification, we can instead implement our own pseudo-random data generator. For this purpose, I have written one based on a 128-bit pseudo-random number generator proposed in a paper by Sebastiano Vigna that itself was based on work by the late George Marsaglia. http://vigna.di.unimi.it/ftp/papers/xorshiftplus.pdf Profiling the same benchmark with an earlier variant of this patch that used a slightly different generator (roughly same number of instructions) by the same author showed that time spent in random_get_pseudo_bytes() dropped to 0.06%. That is a factor of 50 improvement. This particular generator algorithm is also well known to be fast: http://xorshift.di.unimi.it/#speed The benchmark numbers there state that it runs at 1.12ns/64-bits or 7.14 GBps of throughput on an Intel Core i7-4770 in what is presumably a single-threaded context. Using it in `random_get_pseudo_bytes()` in the manner I have will probably not reach that level of performance, but it should be fairly high and many times higher than the Linux `get_random_bytes()` function that we use now, which runs at 16.3 MB/s on my Intel Xeon E3-1276v3 processor when measured by using dd on /dev/urandom. Also, putting this generator's seed into per-CPU variables allows us to eliminate overhead from both spin locks and CPU memory barriers, which is NUMA friendly. We could have alternatively modified consumers to use something like `gethrtime() % 3` as suggested by both Matthew Ahrens and Tim Chase, but that has a few potential problems that this approach avoids: 1. Switching to `gethrtime() % 3` in hot code paths today requires diverging from illumos-gate and does nothing about potential future patches from illumos-gate that call our slow `random_get_pseudo_bytes()` in different hot code paths. Reimplementing `random_get_pseudo_bytes()` with a per-CPU PRNG avoids both of those things entirely, which means less work for us in the future. 2. Looking at the code that implements `gethrtime()`, I think it is unlikely to be faster than this per-CPU PRNG implementation of `random_get_pseudo_bytes()`. It would be best to go with something fast now so that there is no point in revisiting this from a performance perspective. 3. `gethrtime() % 3` can vary in behavior from system to system based on kernel version, architecture and clock source. In comparison, this per-CPU PRNG is about ~40 lines of code in `random_get_pseudo_bytes()` that should behave consistently across all systems regardless of kernel version, system architecture or machine clock source. It is unlikely that we would ever need to revisit this per-CPU PRNG while the same cannot be said for `gethrtime() % 3`. 4. `gethrtime()` uses CPU memory barriers and maybe atomic instructions depending on the clock source, so replacing `random_get_pseudo_bytes()` with `gethrtime()` in hot code paths could still require a future person working on NUMA scalability to reimplement it anyway while this per-CPU PRNG would not by virtue of using neither CPU memory barriers nor atomic instructions. Note that I did not check various clock sources for the presence of atomic instructions. There is simply too much code to read and given the drawbacks versus this per-cpu PRNG, there is no point in being certain. 5. I have heard of instances where poor quality pseudo-random numbers caused problems for HPC code in ways that took more than a year to identify and were remedied by switching to a higher quality source of pseudo-random numbers. While filesystems are different than HPC code, I do not think it is impossible for us to have instances where poor quality pseudo-random numbers can cause problems. Opting for a well studied PRNG algorithm that passes tests for statistical randomness over changing callers to use `gethrtime() % 3` bypasses the need to think about both whether poor quality pseudo-random numbers can cause problems and the statistical quality of numbers from `gethrtime() % 3`. 6. `gethrtime()` calls `getrawmonotonic()`, which uses seqlocks. This is probably not a huge issue, but anyone using kgdb would never be able to step through a seqlock critical section, which is not a problem either now or with the per-CPU PRNG: https://en.wikipedia.org/wiki/Seqlock The only downside that I can see is that this code's memory requirement is O(N) where N is NR_CPUS, versus the current code and `gethrtime() % 3`, which are O(1), but that should not be a problem. The seeds will use 64KB of memory at the high end (i.e `NR_CPU == 4096`) and 16 bytes of memory at the low end (i.e. `NR_CPU == 1`). In either case, we should only use a few hundred bytes of code for text, especially since `spl_rand_jump()` should be inlined into `spl_random_init()`, which should be removed during early boot as part of "Freeing unused kernel memory". In either case, the memory requirements are minuscule. Signed-off-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tim Chase <tim@chase2k.com> Closes #372	2016-02-17 09:49:09 -08:00
Chunwei Chen	8f3b403a73	Allow kicking a taskq to spawn more threads This patch add a module parameter spl_taskq_kick. When writing non-zero value to it, it will scan all the taskq, if a taskq contains a task pending for more than 5 seconds, it will be forced to spawn a new thread. This is use as an emergency recovery from deadlock, not a general solution. Signed-off-by: Chunwei Chen <david.chen@osnexus.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #529	2016-02-05 14:08:31 -08:00
Richard Yao	be29e6a6e6	kobj_read_file: Return -1 on vn_rdwr() error I noticed that the SPL implementation of kobj_read_file is not correct after comparing it with the userland implementation of kobj_read_file() in zfsonlinux/zfs#4104. Note that we no longer pass RLIM64_INFINITY with this, but our vn_rdwr implementation did not support it anyway, so there is no difference. Signed-off-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #513	2016-01-23 10:10:44 -08:00
Chunwei Chen	16522ac290	Use tsd to store tq for taskq_member To prevent taskq_member holding tq_lock and doing linear search, thus causing contention. We store the taskq pointer to which the thread belongs in tsd. This way taskq_member will not need to touch tq_lock, and tsd has per slot spinlock. So the contention should be reduced greatly. Signed-off-by: Chunwei Chen <david.chen@osnexus.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #500 Closes #504 Closes #505	2016-01-20 13:07:45 -08:00
Brian Behlendorf	de77e24590	Linux 4.5 compat: pfn_t typedef The pfn_t typedef was inherited from Illumos but never directly used by any SPL consumers. This didn't cause any issues until the Linux 4.5 kernel introduced a typedef of the same name. See torvalds/linux/commit/34c0fd54, this patch removes the unused Illumos version to prevent a conflict. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tim Chase <tim@chase2k.com> Signed-off-by: Chunwei Chen <tuxoko@gmail.com> Closes #524	2016-01-20 11:39:18 -08:00
Chunwei Chen	d348f23a6a	Turn on both PF_FSTRANS and PF_MEMALLOC_NOIO in spl_fstrans_mark In `b4ad50a`, we abandoned memalloc_noio_save in favor of spl_fstrans_mark because earlier kernel with it doesn't turn off __GFP_FS. However, for newer kernel, we would prefer PF_MEMALLOC_NOIO because it would work for allocation in kernel which we cannot control otherwise. So in this patch, we turn on both PF_FSTRANS and PF_MEMALLOC_NOIO in spl_fstrans_mark. Signed-off-by: Chunwei Chen <david.chen@osnexus.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #523	2016-01-20 11:38:31 -08:00
Alex McWhirter	466bcf3be5	_ILP32 is always defined on SPARC Signed-off-by: Alex McWhirter <alexmcwhirter@triadic.us> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #520	2016-01-08 11:59:38 -08:00
Chunwei Chen	b4ad50ac5f	Use spl_fstrans_mark instead of memalloc_noio_save For earlier versions of the kernel with memalloc_noio_save, it only turns off __GFP_IO but leaves __GFP_FS untouched during direct reclaim. This would cause threads to direct reclaim into ZFS and cause deadlock. Instead, we should stick to using spl_fstrans_mark. Since we would explicitly turn off both __GFP_IO and __GFP_FS before allocation, it will work on every version of the kernel. This impacts kernel versions 3.9-3.17, see upstream kernel commit torvalds/linux@934f307 for reference. Signed-off-by: Chunwei Chen <david.chen@osnexus.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tim Chase <tim@chase2k.com> Closes #515 Issue zfsonlinux/zfs#4111	2015-12-18 13:24:52 -08:00
Tim Chase	200366f23f	Provide kstat for taskqs This patch provides 2 new kstats to display task queues: /proc/spl/taskqs-all - Display all task queues /proc/spl/taskqs - Display only "active" task queues A task queue is considered to be "active" if it currently has active (running) threads or if any of its pending, priority, delay or waitq lists are not empty. If the task queue has running threads, displays each thread function's address (symbolically, if possibly) and its argument. If the task queue has a non-empty list of pending, priority or delayed task queue entries (taskq_ent_t), displays each entry's thread function address and arguemnt. If the task queue has any waiters, displays each waiting task's pid. Note: This patch also updates some comments in taskq.h which referred to "taskq_t" when they should have referred to "taskq_ent_t". Signed-off-by: Tim Chase <tim@chase2k.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #491	2015-12-16 09:35:22 -08:00
Brian Behlendorf	2c4332cf79	Fix cstyle issues in spl-taskq.c and taskq.h This patch only addresses the issues identified by the style checker. It contains no functional changes. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2015-12-11 16:20:22 -08:00
Chunwei Chen	066b89e685	Don't use tq->tq_lock_flags The flags argument in spin_lock_irqsave is modified out side of spin_lock context. We cannot use a shared variable like tq->tq_lock_flags for them. This patch removes it and uses local variable for the flags. Signed-off-by: Chunwei Chen <david.chen@osnexus.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #506	2015-12-11 16:20:03 -08:00
Olaf Faaland	326172d854	Subclass tq_lock to eliminate a lockdep warning When taskq_dispatch() calls taskq_thread_spawn() to create a new thread for a taskq, linux lockdep warns of possible recursive locking. This is a false positive. One such call chain is as follows, when a taskq needs more threads: taskq_dispatch->taskq_thread_spawn->taskq_dispatch The initial taskq_dispatch() holds tq_lock on the taskq that needed more worker threads. The later call into taskq_dispatch() takes dynamic_taskq->tq_lock. Without subclassing, lockdep believes these could potentially be the same lock and complains. A similar case occurs when taskq_dispatch() then calls task_alloc(). This patch uses spin_lock_irqsave_nested() when taking tq_lock, with one of two new lock subclasses: subclass taskq TQ_LOCK_DYNAMIC dynamic_taskq TQ_LOCK_GENERAL any other Signed-off-by: Olaf Faaland <faaland1@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #480	2015-12-11 16:19:56 -08:00
Olaf Faaland	692ae8d398	Add new lock types MUTEX_NOLOCKDEP, and RW_NOLOCKDEP When running a kernel with CONFIG_LOCKDEP=y, lockdep reports possible recursive locking in some cases and possible circular locking dependency in others, within the SPL and ZFS modules. When lockdep detects these conditions, it disables further lock analysis for all locks. This causes /proc/lock_stats not to reflect full information about lock contention, even in locks without dependency issues. This commit creates a new type of mutex, MUTEX_NOLOCKDEP. This mutex type causes subsequent attempts to take or release those locks to be wrapped in lockdep_off() and lockdep_on(). This commit also creates an RW_NOLOCKDEP type analagous to MUTEX_NOLOCKDEP. MUTEX_NOLOCKDEP and RW_NOLOCKDEP are also defined in zfs, in a commit to that repo, for userspace builds. Signed-off-by: Olaf Faaland <faaland1@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #480	2015-12-11 16:18:54 -08:00
zgock	0da84d1574	Fix build issue on some configured kernels The SPL fails to build with some "Configured" kernels (ex. openSUSE xen Kernel) this change should make same binaries with C compiler optimization. Signed-off-by: zgock <zgock@nuc.base.zgock-lab.net> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #510	2015-12-11 15:27:53 -08:00
Brian Behlendorf	225c110675	Either _ILP32 or _LP64 must be defined For some arm, powerpc, and sparc platforms it was possible that neither _ILP32 of _LP64 would be defined. Update the isa_defs.h header to explicitly set these macros and generate a compile error in the case neither are defined. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: tuxoko <tuxoko@gmail.com> Issue zfsonlinux/zfs#4048	2015-12-10 11:53:29 -08:00
Brian Behlendorf	c5a8b1e163	Revert "Make taskq_member() use ->journal_info" This reverts commit `a430c11f0b`. Using journal_info like this can cause a BUG at kernel fs/jbd2/transaction.c:425! Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #500	2015-12-08 17:12:36 -08:00
Richard Yao	a430c11f0b	Make taskq_member() use ->journal_info The ->journal_info pointer in the task_struct is reserved for use by filesystems and because the kernel can have multiple file systems on the same stack due to direct reclaim, each filesystem that touches ->journal_info in a callback function will save the value at the start of its frame and restore it at the end of its frame. This allows us to safely use ->journal_info to store a pointer to the taskq's struct in taskq threads so that ZFS code paths can detect the presence of a taskq. This could break if the ZFS code were to use taskq_member from the context of direct reclaim. However, there are no such uses of it in that manner, so this is safe. This eliminates an O(N) list traversal under a spinlock with an O(1) unlocked pointer comparison. Signed-off-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: tuxoko <tuxoko@gmail.com> Signed-off-by: Tim Chase <tim@chase2k.com> Closes #500	2015-12-08 13:24:47 -08:00
Richard Yao	1683e75edc	Fix race between getf() and areleasef() If a vnode is released asynchronously through areleasef(), it is possible for the user process to reuse the file descriptor before areleasef is called. When this happens, getf() will return a stale reference, any operations in the kernel on that file descriptor will fail (as it is closed) and the operations meant for that fd will never occur from userspace's perspective. We correct this by detecting this condition in getf(), doing a putf on the old file handle, updating the file descriptor and proceeding as if everything was fine. When the areleasef() is done, it will harmlessly decrement the reference counter on the Illumos file handle. Signed-off-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #492	2015-12-03 15:44:47 -08:00
Tim Chase	e5f9a9afd2	Additional dkio support for TRIM/Discard Replace DKIOCTRIM with DKIOCFREE and add additional support required for Nextenta's TRIM support. Signed-off-by: Tim Chase <tim@chase2k.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #469	2015-12-02 13:44:35 -08:00
Jason Zaman	8fc851b7b5	sysmacros: Make P2ROUNDUP not trigger int overflow The original P2ROUNDUP and P2ROUNDUP_TYPED macros contain -x which triggers PaX's integer overflow detection for unsigned integers. Replace the macros with an equivalent version that does not trigger the overflow. Axioms: A. (-(x)) === (~((x) - 1)) === (~(x) + 1) under two's complement. B. ~(x & y) === ((~(x)) \| (~(y))) under De Morgan's law. C. ~(~x) === x under the law of excluded middle. Proof: 0. (-(-(x) & -(align))) original 1. (~(-(x) & -(align)) + 1) by A 2. (((~(-(x))) \| (~(-(align)))) + 1) by B 3. (((~(~((x) - 1))) \| (~(~((align) - 1)))) + 1) by A 4. (((((x) - 1)) \| (((align) - 1))) + 1) by C Q.E.D. Signed-off-by: Jason Zaman <jason@perfinion.com> Reviewed-by: Chris Dunlop <chris@onthe.net.au> Reviewed-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes zfsonlinux/zfs#2505 Closes #488	2015-11-13 15:21:52 -08:00
Brian Behlendorf	9b13f65d28	Fix CPU hotplug Allocate a kmem cache magazine for every possible CPU which might be added to the system. This ensures that when one of these CPUs is enabled it can be safely used immediately. For many systems the number of online CPUs is identical to the number of present CPUs so this does imply an increased memory footprint. In fact, dynamically allocating the array of magazine pointers instead of using the worst case NR_CPUS can end up decreasing our memory footprint. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ned Bass <bass6@llnl.gov> Closes #482	2015-10-13 09:50:40 -07:00
Chunwei Chen	374303a3c9	Use tab indent in rwlock.h Signed-off-by: Chunwei Chen <tuxoko@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #473	2015-10-02 11:21:35 -07:00
Chunwei Chen	a00b3eb58f	rwsem use kernel provided owner when possible If CONFIG_RWSEM_SPIN_ON_OWNER is defined, rw_semaphore will have an owner field, so we don't need our own. Signed-off-by: Chunwei Chen <tuxoko@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #473	2015-10-02 11:21:32 -07:00
Chunwei Chen	4f8e643afe	Don't take spin lock on rwlock owner The spin lock around rw_owner is completely unnecessary. The reason is that it is only modified in the down_write context. If you race against another thread modifying it, that means that you aren't holding the rwlock, so taking the spin lock don't eliminate the race. Also, we only check rw_owner in RW_WRITE_HELD because spl_rwsem_is_locked is unnecessary and might need to take spin lock. Signed-off-by: Chunwei Chen <tuxoko@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #473	2015-10-02 11:20:55 -07:00
Chunwei Chen	ae89cf0f34	Restructure uio to accommodate bio_vec Starting from Linux 4.1, bio_vec will be allowed to pass into filesystem via iter_read/iter_write, so we add a bio_vec field in uio_t to hold it, and use UIO_BVEC in segflg to determine which "vec". Also, to be consistent to newer kernel, we make iovec and bio_vec immutable, and make uio act as an iterator with the new uio_skip field indicating number of bytes to skip in the first segment. Signed-off-by: Chunwei Chen <tuxoko@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue zfsonlinux/zfs#3511 Issue zfsonlinux/zfs#3640 Closes #468	2015-08-24 10:10:21 -07:00
Tim Chase	851a549305	Include other sources of freeable memory in the freemem calculation Prevents ARC collapse when non-ZFS filesystems, the block layer or other memory consumers use a lot of reclaimable memory in the page cache. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tim Chase <tim@chase2k.com> Closes zfsonlinux/zfs#3680 Closes #471	2015-08-19 09:25:30 -07:00
Brian Behlendorf	8ac6ffecaf	Remove needfree, desfree, lotsfree #defines This patch reverts `77ab5dd`. This is now possible because upstream has refactored the ARC in such a way that these values are only used in a few key places. Those places have subsequently been updated to use the Linux equivalent Linux functionality. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue zfsonlinux/zfs#3637	2015-07-30 11:45:24 -07:00
Brian Behlendorf	9dc5ffbec8	Invert minclsyspri and maxclsyspri On Linux the meaning of a processes priority is inverted with respect to illumos. High values on Linux indicate a _low_ priority while high value on illumos indicate a _high_ priority. In order to preserve the logical meaning of the minclsyspri and maxclsyspri macros when they are used by the illumos wrapper functions their values have been inverted. This way when changes are merged from upstream illumos we won't need to remember to invert the macro. It could also lead to confusion. Note this change also reverts some of the priorities changes in prior commit `62aa81a`. The rational is as follows: spl_kmem_cache - High priority may result in blocked memory allocs spl_system_taskq - May perform I/O for file backed VDEVs spl_dynamic_taskq - New taskq threads should be spawned promptly Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ned Bass <bass6@llnl.gov> Issue zfsonlinux/zfs#3607	2015-07-28 13:59:03 -07:00
Brian Behlendorf	62aa81a577	Add defclsyspri macro Add a new defclsyspri macro which can be used to request the default Linux scheduler priority. Neither the minclsyspri or maxclsyspri map to the default Linux kernel thread priority. This makes it awkward to create taskqs which run with the same priority as the rest of the kernel threads on the system which can lead to performance issues. All SPL callers which previously used minclsyspri or maxclsyspri have been changed to use defclsyspri. The vast majority of callers were part of the test suite which won't have an external impact. The few places where it could impact performance the change was from maxclsyspri to defclsyspri. This makes it more likely the process will be scheduled which may help performance. To facilitate further performance analysis the spl_taskq_thread_priority module option has been added. When disabled (0) all newly created kernel threads will use the default kernel thread priority. When enabled (1) the specified taskq priority will be used. By default this value is enabled (1). Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2015-07-23 13:25:49 -07:00
Brian Behlendorf	77ab5dd33a	Add memory compatibility wrappers The function vmem_qcache_reap() and global variables 'needfree', 'desfree', and 'lotsfree' are all used in the upstream. While these variables have no meaning under Linux they're being defined as 0's to avoid needing to make additional changes to the ARC code. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2015-06-29 09:26:29 -07:00
Brian Behlendorf	f7a973d99b	Add TASKQ_DYNAMIC feature Setting the TASKQ_DYNAMIC flag will create a taskq with dynamic semantics. Initially only a single worker thread will be created to service tasks dispatched to the queue. As additional threads are needed they will be dynamically spawned up to the max number specified by 'nthreads'. When the threads are no longer needed, because the taskq is empty, they will automatically terminate. Due to the low cost of creating and destroying threads under Linux by default new threads and spawned and terminated aggressively. There are two modules options which can be tuned to adjust this behavior if needed. * spl_taskq_thread_sequential - The number of sequential tasks, without interruption, which needed to be handled by a worker thread before a new worker thread is spawned. Default 4. * spl_taskq_thread_dynamic - Provides the ability to completely disable the use of dynamic taskqs on the system. This is provided for the purposes of debugging and troubleshooting. Default 1 (enabled). This behavior is fundamentally consistent with the dynamic taskq implementation found in both illumos and FreeBSD. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tim Chase <tim@chase2k.com> Closes #458	2015-06-24 15:14:18 -07:00
Brian Behlendorf	5acb2307b2	Add IMPLY() and EQUIV() macros Added for upstream compatibility, they are of the form: * IMPLY(a, b) - if (a) then (b) * EQUIV(a, b) - if (a) then (b) AND if (b) then (a) Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2015-06-24 14:44:47 -07:00

1 2 3 4 5 ...

451 Commits