mirror_zfs

mirror of https://git.proxmox.com/git/mirror_zfs.git synced 2026-05-22 02:27:36 +03:00

Author	SHA1	Message	Date
Brian Behlendorf	0720116d4d	Add zfs_object_mutex_size module option Add a zfs_object_mutex_size module option to facilitate resizing the the per-dataset znode mutex array. Increasing this value may help make the deadlock described in #4106 less common, but this is not a proper fix. This patch is primarily to aid debugging and analysis. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tim Chase <tim@chase2k.com> Issue #4106	2016-01-15 15:33:44 -08:00
Brian Behlendorf	d21f279a94	Illumos 3465, 3466, 3467, 3468, 3470, 3473 3465 ::walk ... \| ::<dcmd> misinterprets input as symbol names 3466 ::tsd should handle missing/NULL values better 3467 mdb_ctf_vread() could be more useful 3468 mdb enhancements for zfs development 3470 ::whatis does not print callers from KMF_LITE 3473 mdb_get_module() returns wrong module Reviewed by: Adam Leventhal <ahl@delphix.com> Reviewed by: Eric Schrock <eric.schrock@delphix.com> Reviewed by: Dan Kimmel <dan.kimmel@delphix.com> Reviewed by: Robert Mustacchi <rm@joyent.com> Approved by: Dan McDonald <danmcd@nexenta.com> References: https://www.illumos.org/issues/3468 https://github.com/illumos/illumos-gate/commit/28e4da2 Porting notes: - The only portion of this patch which applies to ZoL is a small change to types used in the refcount structure. Ported-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #4216	2016-01-13 13:56:27 -08:00
Brian Behlendorf	89666a8e1c	Increase default user space stack size Under RHEL6/CentOS6 the default stack size must be increased to 32K to prevent overflowing the stack when running ztest. This isn't an issue for other distributions due to either the version of pthreads or perhaps the compiler. Doubling the stack size resolves the issue safely for all distribution and leaves us some headroom. $ sudo -E ztest -V -T 300 -f /var/tmp/ 5 vdevs, 7 datasets, 23 threads, 300 seconds... loading space map for vdev 0 of 1, metaslab 0 of 30 ... ... loading space map for vdev 0 of 1, metaslab 14 of 30 ... child died with signal 11 Exited ztest with error 3 Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #4215	2016-01-13 13:55:12 -08:00
Will Andrews	e6cfd633be	Illumos 3749 - zfs event processing should work on R/O root filesystems 3749 zfs event processing should work on R/O root filesystems Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Eric Schrock <eric.schrock@delphix.com> Approved by: Christopher Siden <christopher.siden@delphix.com> References: https://www.illumos.org/issues/3749 https://github.com/illumos/illumos-gate/commit/3cb69f7 Porting notes: - [include/sys/spa_impl.h] - `ffe9d38` Add generic errata infrastructure - `1421c89` Add visibility in to arc_read - [include/sys/fm/fs/zfs.h] - `2668527` Add linux events - `6283f55` Support custom build directories and move includes - [module/zfs/spa_config.c] - Updated spa_config_sync() to match illumos with the exception of a Linux specific block. Ported-by: kernelOfTruth kerneloftruth@gmail.com Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2016-01-12 14:42:32 -08:00
Matthew Ahrens	d25b4497b7	Illumos 5141 - zfs minimum indirect block size is 4K 5141 zfs minimum indirect block size is 4K Reviewed by: Christopher Siden <christopher.siden@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Richard Elling <richard.elling@gmail.com> Approved by: Dan McDonald <danmcd@omniti.com> References: https://www.illumos.org/issues/5141 https://github.com/illumos/illumos-gate/commit/e94f268 Porting notes: - GRUB -- GRand Unified Bootloader change wasn't merged (not applicable) Ported-by: kernelOfTruth kerneloftruth@gmail.com Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2016-01-12 14:11:31 -08:00
Justin T. Gibbs	0eb21616fa	Illumos 6171 - dsl_prop_unregister() slows down dataset eviction. 6171 dsl_prop_unregister() slows down dataset eviction. Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Prakash Surya <prakash.surya@delphix.com> Approved by: Dan McDonald <danmcd@omniti.com> References: https://www.illumos.org/issues/6171 https://github.com/illumos/illumos-gate/commit/03bad06 Porting notes: - Conflicts - `3558fd7` Prototype/structure update for Linux - `2cf7f52` Linux compat 2.6.39: mount_nodev() - `13fe019` Illumos #3464 - `241b541` Illumos 5959 - clean up per-dataset feature count code - dsl_prop_unregister() preserved until out of tree consumers like Lustre can transition to dsl_prop_unregister_all(). - Fixing 'space or tab at end of line' in include/sys/dsl_dataset.h Ported-by: kernelOfTruth kerneloftruth@gmail.com Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2016-01-12 10:53:12 -08:00
Matthew Ahrens	7f60329a26	Illumos 5987 - zfs prefetch code needs work 5987 zfs prefetch code needs work Reviewed by: Adam Leventhal <ahl@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Paul Dagnelie <pcd@delphix.com> Approved by: Gordon Ross <gordon.ross@nexenta.com> References: https://www.illumos.org/issues/5987 zfs prefetch code needs work illumos/illumos-gate@cf6106c 5987 zfs prefetch code needs work Porting notes: - [module/zfs/dbuf.c] - `5f6d0b6` Handle block pointers with a corrupt logical size - [module/zfs/dmu_zfetch.c] - `c65aa5b` Fix gcc missing parenthesis warnings - `428870f` Update core ZFS code from build 121 to build 141. - `79c76d5` Change KM_PUSHPAGE -> KM_SLEEP - `b8d06fc` Switch KM_SLEEP to KM_PUSHPAGE - Account for ISO C90 - mixed declarations and code - warnings - Module parameters (new/changed): - Replaced zfetch_block_cap with zfetch_max_distance (Max bytes to prefetch per stream (default 8MB; 8 * 1024 * 1024)) - Preserved zfs_prefetch_disable as 'int' for consistency with existing Linux module options. - [include/sys/trace_arc.h] - Added new tracepoints - DEFINE_ARC_BUF_HDR_EVENT(zfs_arc__sync__wait__for__async); - DEFINE_ARC_BUF_HDR_EVENT(zfs_arc__demand__hit__predictive__prefetch); - [man/man5/zfs-module-parameters.5] - Updated man page Ported-by: kernelOfTruth kerneloftruth@gmail.com Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2016-01-12 09:02:33 -08:00
Brian Behlendorf	b870c7e5f4	Revert "Illumos 3749 - zfs event processing should work on R/O root filesystems" This reverts commit `b47637ecdc` which introduced a regression in ztest. $ ./cmd/ztest/ztest -V 5 vdevs, 7 datasets, 23 threads, 300 seconds... * Error in `/rpool/home/behlendo/src/git/zfs/cmd/ztest/.libs/lt-ztest': double free or corruption (fasttop): 0x0000000000d339f0 *	2016-01-11 14:10:30 -08:00
Matthew Ahrens	1715493f38	Illumos 4929 - want prevsnap property 4929 want prevsnap property Reviewed by: Adam Leventhal <adam.leventhal@delphix.com> Reviewed by: Matt Amdur <matt.amdur@delphix.com> Reviewed by: Saso Kiselkov <skiselkov.ml@gmail.com> Reviewed by: Boris Protopopov <bprotopopov@hotmail.com> Reviewed by: Richard Lowe <richlowe@richlowe.net> Approved by: Dan McDonald <danmcd@omniti.com> References: https://www.illumos.org/issues/4929 https://github.com/illumos/illumos-gate/commit/b461c74 Porting notes: - [include/sys/fs/zfs.h] - f67d70 Create an 'overlay' property - 11b9ec Add full SELinux support - [fs/zfs/dsl_dataset.c] - This increases the stack size of dsl_dataset_stats() but nothing has been changed until this is shown to be an issue. Ported-by: kernelOfTruth kerneloftruth@gmail.com Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2016-01-11 11:58:26 -08:00
Matthew Ahrens	9867e8be2a	Illumos 4891 - want zdb option to dump all metadata 4891 want zdb option to dump all metadata Reviewed by: Sonu Pillai <sonu.pillai@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Christopher Siden <christopher.siden@delphix.com> Reviewed by: Dan McDonald <danmcd@omniti.com> Reviewed by: Richard Lowe <richlowe@richlowe.net> Approved by: Garrett D'Amore <garrett@damore.org> We'd like a way for zdb to dump metadata in a machine-readable format, so that we can bring that back from a customer site for in-house diagnosis. Think of it as a crash dump for zpools, which can be used for post-mortem analysis of a malfunctioning pool References: https://www.illumos.org/issues/4891 https://github.com/illumos/illumos-gate/commit/df15e41 Porting notes: - [cmd/zdb/zdb.c] - `a5778ea` zdb: Introduce -V for verbatim import - In main() getopt 'opt' variable removed and the code was brought back in line with illumos. - [lib/libzpool/kernel.c] - `1e33ac1` Fix Solaris thread dependency by using pthreads - `f0e324f` Update utsname support - `4d58b69` Fix vn_open/vn_rdwr error handling - In vn_open() allocate 'dumppath' on heap instead of stack - Properly handle 'dump_fd == -1' error path - Free 'realpath' after added vn_dumpdir_code block Ported-by: kernelOfTruth kerneloftruth@gmail.com Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2016-01-11 11:36:54 -08:00
Will Andrews	b47637ecdc	Illumos 3749 - zfs event processing should work on R/O root filesystems 3749 zfs event processing should work on R/O root filesystems Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Eric Schrock <eric.schrock@delphix.com> Approved by: Christopher Siden <christopher.siden@delphix.com> References: https://www.illumos.org/issues/3749 https://github.com/illumos/illumos-gate/commit/3cb69f7 Porting notes: - [include/sys/spa_impl.h] - `ffe9d38` Add generic errata infrastructure - `1421c89` Add visibility in to arc_read - [include/sys/fm/fs/zfs.h] - `2668527` Add linux events - `6283f55` Support custom build directories and move includes - [module/zfs/spa_config.c] - Updated spa_config_sync() to match illumos with the exception of a Linux specific block. Ported-by: kernelOfTruth kerneloftruth@gmail.com Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2016-01-11 09:23:37 -08:00
Paul Dagnelie	fcff0f35bd	Illumos 5960, 5925 5960 zfs recv should prefetch indirect blocks 5925 zfs receive -o origin= Reviewed by: Prakash Surya <prakash.surya@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> References: https://www.illumos.org/issues/5960 https://www.illumos.org/issues/5925 https://github.com/illumos/illumos-gate/commit/a2cdcdd Porting notes: - [lib/libzfs/libzfs_sendrecv.c] - `b8864a2` Fix gcc cast warnings - `325f023` Add linux kernel device support - `5c3f61e` Increase Linux pipe buffer size on 'zfs receive' - [module/zfs/zfs_vnops.c] - `3558fd7` Prototype/structure update for Linux - `c12e3a5` Restructure zfs_readdir() to fix regressions - [module/zfs/zvol.c] - Function @zvol_map_block() isn't needed in ZoL - `9965059` Prefetch start and end of volumes - [module/zfs/dmu.c] - Fixed ISO C90 - mixed declarations and code - Function dmu_prefetch() 'int i' is initialized before the following code block (c90 vs. c99) - [module/zfs/dbuf.c] - `fc5bb51` Fix stack dbuf_hold_impl() - `9b67f60` Illumos 4757, 4913 - 34229a2 Reduce stack usage for recursive traverse_visitbp() - [module/zfs/dmu_send.c] - Fixed ISO C90 - mixed declarations and code - `b58986e` Use large stacks when available - `241b541` Illumos 5959 - clean up per-dataset feature count code - `77aef6f` Use vmem_alloc() for nvlists - `00b4602` Add linux kernel memory support Ported-by: kernelOfTruth kerneloftruth@gmail.com Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2016-01-08 15:08:19 -08:00
Alex McWhirter	466bcf3be5	_ILP32 is always defined on SPARC Signed-off-by: Alex McWhirter <alexmcwhirter@triadic.us> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #520	2016-01-08 11:59:38 -08:00
Matthew Ahrens	37f8a8835a	Illumos 5746 - more checksumming in zfs send 5746 more checksumming in zfs send Reviewed by: Christopher Siden <christopher.siden@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Bayard Bell <buffer.g.overflow@gmail.com> Approved by: Albert Lee <trisk@omniti.com> References: https://www.illumos.org/issues/5746 https://github.com/illumos/illumos-gate/commit/98110f0 https://github.com/zfsonlinux/zfs/issues/905 Porting notes: - Minor conflicts due to: - https://github.com/zfsonlinux/zfs/commit/2024041 - https://github.com/zfsonlinux/zfs/commit/044baf0 - https://github.com/zfsonlinux/zfs/commit/88904bb - Fix ISO C90 warnings (-Werror=declaration-after-statement) - arc_buf_t abuf; - dmu_buf_t bonus; - zio_cksum_t cksum_orig; - zio_cksum_t *cksump; - Fix format '%llx' format specifier warning - Align message in zstreamdump safe_malloc() with upstream Ported-by: kernelOfTruth kerneloftruth@gmail.com Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #3611	2015-12-30 14:24:14 -08:00
Ned Bass	43b4935e53	Prevent SA length overflow The function sa_update() accepts a 32-bit length parameter and assigns it to a 16-bit field in sa_bulk_attr_t, potentially truncating the passed-in value. This could lead to corrupt system attribute (SA) records getting written to the pool. Add a VERIFY to sa_update() to detect cases where overflow would occur. The SA length is limited to 16-bit values by the on-disk format defined by sa_hdr_phys_t. The function zfs_sa_set_xattr() is vulnerable to this bug if the unpacked nvlist of xattrs is less than 64k in size but the packed size is greater than 64k. Fix this by appropriately checking the size of the packed nvlist before calling sa_update(). Add error handling to zpl_xattr_set_sa() to keep the cached list of SA-based xattrs consistent with the data on disk. Lastly, zfs_sa_set_xattr() calls dmu_tx_abort() on an assigned transaction if sa_update() returns an error, but the DMU only allows unassigned transactions to be aborted. Wrap the sa_update() call in a VERIFY0, remove the transaction abort, and call dmu_tx_commit() unconditionally. This is consistent practice with other callers of sa_update(). Signed-off-by: Ned Bass <bass6@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Richard Yao <ryao@gentoo.org> Closes #4150	2015-12-30 13:20:12 -08:00
Chris Williamson	23de906c72	Illumos 5745 - zfs set allows only one dataset property to be set at a time 5745 zfs set allows only one dataset property to be set at a time Reviewed by: Christopher Siden <christopher.siden@delphix.com> Reviewed by: George Wilson <george@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Bayard Bell <buffer.g.overflow@gmail.com> Reviewed by: Richard PALO <richard@NetBSD.org> Reviewed by: Steven Hartland <killing@multiplay.co.uk> Approved by: Rich Lowe <richlowe@richlowe.net> References: https://www.illumos.org/issues/5745 https://github.com/illumos/illumos-gate/commit/3092556 Porting notes: - Fix the missing braces around initializer, zfs_cmd_t zc = {"\0"}; - Remove extra format argument in zfs_do_set() - Declare at the top: - zfs_prop_t prop; - nvpair_t elem; - nvpair_t next; - int i; - Additionally initialize: - int added_resv = 0; - zfs_prop_t prop = 0; - Assign 0 install of NULL for uint64_t types. - zc->zc_nvlist_conf = '\0'; - zc->zc_nvlist_src = '\0'; - zc->zc_nvlist_dst = '\0'; Ported-by: kernelOfTruth kerneloftruth@gmail.com Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #3574	2015-12-29 16:59:26 -08:00
Olaf Faaland	e0553a74ad	Add lock types RW_NOLOCKDEP and MUTEX_NOLOCKDEP Both lock types were introduced in SPL to allow some locks to be taken/released with linux lockdep turned off. See SPL commit for details. Add the new lock types to zfs_context.h to allow user space compilation. Depends on SPL commit `692ae8d` SPL pull request refs/pull/480/head Signed-off-by: Olaf Faaland <faaland1@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #3895	2015-12-22 10:20:29 -08:00
Brian Behlendorf	6fe53787f3	Fix vdev_queue_aggregate() deadlock This deadlock may manifest itself in slightly different ways but at the core it is caused by a memory allocation blocking on file- system reclaim in the zio pipeline. This is normally impossible because zio_execute() disables filesystem reclaim by setting PF_FSTRANS on the thread. However, kmem cache allocations may still indirectly block on file system reclaim while holding the critical vq->vq_lock as shown below. To resolve this issue zio_buf_alloc_flags() is introduced which allocation flags to be passed. This can then be used in vdev_queue_aggregate() with KM_NOSLEEP when allocating the aggregate IO buffer. Since aggregating the IO is purely a performance optimization we want this to either succeed or fail quickly. Trying too hard to allocate this memory under the vq->vq_lock can negatively impact performance and result in this deadlock. * z_wr_iss zio_vdev_io_start vdev_queue_io -> Takes vq->vq_lock vdev_queue_io_to_issue vdev_queue_aggregate zio_buf_alloc -> Waiting on spl_kmem_cache process * z_wr_int zio_vdev_io_done vdev_queue_io_done mutex_lock -> Waiting on vq->vq_lock held by z_wr_iss * txg_sync spa_sync dsl_pool_sync zio_wait -> Waiting on zio being handled by z_wr_int * spl_kmem_cache spl_cache_grow_work kv_alloc spl_vmalloc ... evict zpl_evict_inode zfs_inactive dmu_tx_wait txg_wait_open -> Waiting on txg_sync Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Chunwei Chen <david.chen@osnexus.com> Signed-off-by: Tim Chase <tim@chase2k.com> Closes #3808 Closes #3867	2015-12-18 13:27:12 -08:00
Chunwei Chen	b4ad50ac5f	Use spl_fstrans_mark instead of memalloc_noio_save For earlier versions of the kernel with memalloc_noio_save, it only turns off __GFP_IO but leaves __GFP_FS untouched during direct reclaim. This would cause threads to direct reclaim into ZFS and cause deadlock. Instead, we should stick to using spl_fstrans_mark. Since we would explicitly turn off both __GFP_IO and __GFP_FS before allocation, it will work on every version of the kernel. This impacts kernel versions 3.9-3.17, see upstream kernel commit torvalds/linux@934f307 for reference. Signed-off-by: Chunwei Chen <david.chen@osnexus.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tim Chase <tim@chase2k.com> Closes #515 Issue zfsonlinux/zfs#4111	2015-12-18 13:24:52 -08:00
Tim Chase	200366f23f	Provide kstat for taskqs This patch provides 2 new kstats to display task queues: /proc/spl/taskqs-all - Display all task queues /proc/spl/taskqs - Display only "active" task queues A task queue is considered to be "active" if it currently has active (running) threads or if any of its pending, priority, delay or waitq lists are not empty. If the task queue has running threads, displays each thread function's address (symbolically, if possibly) and its argument. If the task queue has a non-empty list of pending, priority or delayed task queue entries (taskq_ent_t), displays each entry's thread function address and arguemnt. If the task queue has any waiters, displays each waiting task's pid. Note: This patch also updates some comments in taskq.h which referred to "taskq_t" when they should have referred to "taskq_ent_t". Signed-off-by: Tim Chase <tim@chase2k.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #491	2015-12-16 09:35:22 -08:00
Chunwei Chen	2727b9d3b6	Use uio for zvol_{read,write} Since uio now supports bvec, we can convert bio into uio and reuse dmu_{read,write}_uio. This way, we can remove some duplicate code. Signed-off-by: Chunwei Chen <david.chen@osnexus.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #4078	2015-12-15 16:21:43 -08:00
Brian Behlendorf	2c4332cf79	Fix cstyle issues in spl-taskq.c and taskq.h This patch only addresses the issues identified by the style checker. It contains no functional changes. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2015-12-11 16:20:22 -08:00
Chunwei Chen	066b89e685	Don't use tq->tq_lock_flags The flags argument in spin_lock_irqsave is modified out side of spin_lock context. We cannot use a shared variable like tq->tq_lock_flags for them. This patch removes it and uses local variable for the flags. Signed-off-by: Chunwei Chen <david.chen@osnexus.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #506	2015-12-11 16:20:03 -08:00
Olaf Faaland	326172d854	Subclass tq_lock to eliminate a lockdep warning When taskq_dispatch() calls taskq_thread_spawn() to create a new thread for a taskq, linux lockdep warns of possible recursive locking. This is a false positive. One such call chain is as follows, when a taskq needs more threads: taskq_dispatch->taskq_thread_spawn->taskq_dispatch The initial taskq_dispatch() holds tq_lock on the taskq that needed more worker threads. The later call into taskq_dispatch() takes dynamic_taskq->tq_lock. Without subclassing, lockdep believes these could potentially be the same lock and complains. A similar case occurs when taskq_dispatch() then calls task_alloc(). This patch uses spin_lock_irqsave_nested() when taking tq_lock, with one of two new lock subclasses: subclass taskq TQ_LOCK_DYNAMIC dynamic_taskq TQ_LOCK_GENERAL any other Signed-off-by: Olaf Faaland <faaland1@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #480	2015-12-11 16:19:56 -08:00
Olaf Faaland	628fc52137	Fix lockdep warning in spl_inode_{lock,unlock} spl_inode_{lock,unlock} are triggering possible recursive locking warnings from lockdep. The warning is a false positive. The lock is used to protect a parent directory during delete/add operations, used in zfs when writing/removing the cache file. The inode lock is taken on both the parent inode and the file inode. VFS provides an enum to subclass the lock. This patch changes the spin_lock call to _nested version and uses the provided enum. Signed-off-by: Olaf Faaland <faaland1@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #480	2015-12-11 16:19:47 -08:00
Olaf Faaland	692ae8d398	Add new lock types MUTEX_NOLOCKDEP, and RW_NOLOCKDEP When running a kernel with CONFIG_LOCKDEP=y, lockdep reports possible recursive locking in some cases and possible circular locking dependency in others, within the SPL and ZFS modules. When lockdep detects these conditions, it disables further lock analysis for all locks. This causes /proc/lock_stats not to reflect full information about lock contention, even in locks without dependency issues. This commit creates a new type of mutex, MUTEX_NOLOCKDEP. This mutex type causes subsequent attempts to take or release those locks to be wrapped in lockdep_off() and lockdep_on(). This commit also creates an RW_NOLOCKDEP type analagous to MUTEX_NOLOCKDEP. MUTEX_NOLOCKDEP and RW_NOLOCKDEP are also defined in zfs, in a commit to that repo, for userspace builds. Signed-off-by: Olaf Faaland <faaland1@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #480	2015-12-11 16:18:54 -08:00
zgock	0da84d1574	Fix build issue on some configured kernels The SPL fails to build with some "Configured" kernels (ex. openSUSE xen Kernel) this change should make same binaries with C compiler optimization. Signed-off-by: zgock <zgock@nuc.base.zgock-lab.net> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #510	2015-12-11 15:27:53 -08:00
Brian Behlendorf	225c110675	Either _ILP32 or _LP64 must be defined For some arm, powerpc, and sparc platforms it was possible that neither _ILP32 of _LP64 would be defined. Update the isa_defs.h header to explicitly set these macros and generate a compile error in the case neither are defined. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: tuxoko <tuxoko@gmail.com> Issue zfsonlinux/zfs#4048	2015-12-10 11:53:29 -08:00
Brian Behlendorf	c5a8b1e163	Revert "Make taskq_member() use ->journal_info" This reverts commit `a430c11f0b`. Using journal_info like this can cause a BUG at kernel fs/jbd2/transaction.c:425! Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #500	2015-12-08 17:12:36 -08:00
Chunwei Chen	24ef51f660	Use spa as key besides objsetid for snapentry objsetid is not unique across pool, so using it solely as key would cause panic when automounting two snapshot on different pools with the same objsetid. We fix this by adding spa pointer as additional key. Signed-off-by: Chunwei Chen <david.chen@osnexus.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Richard Yao <ryao@gentoo.org> Issue #3948 Issue #3786 Issue #3887	2015-12-08 16:38:56 -08:00
Richard Yao	a430c11f0b	Make taskq_member() use ->journal_info The ->journal_info pointer in the task_struct is reserved for use by filesystems and because the kernel can have multiple file systems on the same stack due to direct reclaim, each filesystem that touches ->journal_info in a callback function will save the value at the start of its frame and restore it at the end of its frame. This allows us to safely use ->journal_info to store a pointer to the taskq's struct in taskq threads so that ZFS code paths can detect the presence of a taskq. This could break if the ZFS code were to use taskq_member from the context of direct reclaim. However, there are no such uses of it in that manner, so this is safe. This eliminates an O(N) list traversal under a spinlock with an O(1) unlocked pointer comparison. Signed-off-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: tuxoko <tuxoko@gmail.com> Signed-off-by: Tim Chase <tim@chase2k.com> Closes #500	2015-12-08 13:24:47 -08:00
Matthew Ahrens	241b541574	Illumos 5959 - clean up per-dataset feature count code 5959 clean up per-dataset feature count code Reviewed by: Toomas Soome <tsoome@me.com> Reviewed by: George Wilson <george@delphix.com> Reviewed by: Alex Reece <alex@delphix.com> Approved by: Richard Lowe <richlowe@richlowe.net> References: https://www.illumos.org/issues/5959 https://github.com/illumos/illumos-gate/commit/ca0cc39 Porting notes: illumos code doesn't check for feature_get_refcount() returning ENOTSUP (which means feature is disabled) in zdb. zfsonlinux added a check in https://github.com/zfsonlinux/zfs/commit/784652c due to #3468. The check was reintroduced here. Ported-by: Witaut Bajaryn <vitaut.bayaryn@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #3965	2015-12-04 14:20:20 -08:00
Brian Behlendorf	072484504f	Add zap_prefetch() interface Provide a generic interface to prefetch ZAP entries by name. This functionality is being added for external consumers such as Lustre. It is based of the existing zap_prefetch_uint64() version which is used by the deduplication code. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Richard Yao <ryao@gentoo.org> Closes #4061	2015-12-04 09:39:20 -08:00
ilovezfs	917b8c5cec	Ext4's typical GPT partition type not recognized Adding additional entries to the efi conversion array will help prevent the overwriting of the GPTs of disks with in-use file systems in more cases. Most notably, this adds partition type 8300 "Linux filesystem" (0FC63DAF-8483-4772-8E79-3D69D8477DE4), which is often used for ext4 and btrfs, among others. This commit itself does nothing to address the underlying problematic behavior that check_slice() isn't called on partitions of an unrecognized type, even when they contain a currently mounted file system. The additional entries were derived from these two resources: https://en.wikipedia.org/wiki/GUID_Partition_Table http://sourceforge.net/p/gptfdisk/code/ci/master/tree/parttypes.cc Signed-off-by: ilovezfs <ilovezfs@icloud.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #4016	2015-12-04 09:27:00 -08:00
Yuri Pankov	fc80384923	Illumos 934 - FreeBSD's GPT not recognized Reviewed by: Alexander Eremin <alexander.r.eremin@gmail.com> Reviewed by: Garrett D'Amore <garrett@damore.org> Reviewed by: Andrew Stormont <Andrew.Stormont@nexenta.com> Reviewed by: Richard Elling <richard.elling@richardelling.com> Approved by: Gordon Ross <gwr@nexenta.com> References: https://www.illumos.org/issues/934 https://github.com/illumos/illumos-gate/commit/e21ea67 Ported-by: ilovezfs <ilovezfs@icloud.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #4016	2015-12-04 09:19:40 -08:00
Richard Yao	1683e75edc	Fix race between getf() and areleasef() If a vnode is released asynchronously through areleasef(), it is possible for the user process to reuse the file descriptor before areleasef is called. When this happens, getf() will return a stale reference, any operations in the kernel on that file descriptor will fail (as it is closed) and the operations meant for that fd will never occur from userspace's perspective. We correct this by detecting this condition in getf(), doing a putf on the old file handle, updating the file descriptor and proceeding as if everything was fine. When the areleasef() is done, it will harmlessly decrement the reference counter on the Illumos file handle. Signed-off-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #492	2015-12-03 15:44:47 -08:00
Richard Yao	b22e279797	Only trigger SET_ERROR tracepoint event on error Currently, the SET_ERROR tracepoint triggers regardless of whether there is an error or not. On Illumos, SET_ERROR only triggers on an actual error, which is avoids irrelevant noise. Linux 2.6.38 added support for conditional tracepoints, so we modify SET_ERROR to use them when they are avaliable for functionality equivalent to the Illumos functionality. Signed-off-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #4043	2015-12-02 17:16:41 -08:00
Tim Chase	e5f9a9afd2	Additional dkio support for TRIM/Discard Replace DKIOCTRIM with DKIOCFREE and add additional support required for Nextenta's TRIM support. Signed-off-by: Tim Chase <tim@chase2k.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #469	2015-12-02 13:44:35 -08:00
Chunwei Chen	61d482f7cd	Linux 4.4 compat: xattr operations takes xattr_handler The xattr_hander->{list,get,set} were changed to take a xattr_handler, and handler_flags argument was removed and should be accessed by handler->flags. Signed-off-by: Chunwei Chen <david.chen@osnexus.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #4021	2015-12-01 16:48:25 -08:00
Jason Zaman	8fc851b7b5	sysmacros: Make P2ROUNDUP not trigger int overflow The original P2ROUNDUP and P2ROUNDUP_TYPED macros contain -x which triggers PaX's integer overflow detection for unsigned integers. Replace the macros with an equivalent version that does not trigger the overflow. Axioms: A. (-(x)) === (~((x) - 1)) === (~(x) + 1) under two's complement. B. ~(x & y) === ((~(x)) \| (~(y))) under De Morgan's law. C. ~(~x) === x under the law of excluded middle. Proof: 0. (-(-(x) & -(align))) original 1. (~(-(x) & -(align)) + 1) by A 2. (((~(-(x))) \| (~(-(align)))) + 1) by B 3. (((~(~((x) - 1))) \| (~(~((align) - 1)))) + 1) by A 4. (((((x) - 1)) \| (((align) - 1))) + 1) by C Q.E.D. Signed-off-by: Jason Zaman <jason@perfinion.com> Reviewed-by: Chris Dunlop <chris@onthe.net.au> Reviewed-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes zfsonlinux/zfs#2505 Closes #488	2015-11-13 15:21:52 -08:00
Justin T. Gibbs	bc4501f75a	Illumos 6267 - dn_bonus evicted too early 6267 dn_bonus evicted too early Reviewed by: Richard Yao <ryao@gentoo.org> Reviewed by: Xin LI <delphij@freebsd.org> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Approved by: Richard Lowe <richlowe@richlowe.net> References: https://www.illumos.org/issues/6267 https://github.com/illumos/illumos-gate/commit/d205810 Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Ported-by: Ned Bass bass6@llnl.gov Issue #3865 Issue #3443	2015-10-13 14:12:02 -07:00
Brian Behlendorf	9b13f65d28	Fix CPU hotplug Allocate a kmem cache magazine for every possible CPU which might be added to the system. This ensures that when one of these CPUs is enabled it can be safely used immediately. For many systems the number of online CPUs is identical to the number of present CPUs so this does imply an increased memory footprint. In fact, dynamically allocating the array of magazine pointers instead of using the worst case NR_CPUS can end up decreasing our memory footprint. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ned Bass <bass6@llnl.gov> Closes #482	2015-10-13 09:50:40 -07:00
Chunwei Chen	374303a3c9	Use tab indent in rwlock.h Signed-off-by: Chunwei Chen <tuxoko@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #473	2015-10-02 11:21:35 -07:00
Chunwei Chen	a00b3eb58f	rwsem use kernel provided owner when possible If CONFIG_RWSEM_SPIN_ON_OWNER is defined, rw_semaphore will have an owner field, so we don't need our own. Signed-off-by: Chunwei Chen <tuxoko@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #473	2015-10-02 11:21:32 -07:00
Chunwei Chen	4f8e643afe	Don't take spin lock on rwlock owner The spin lock around rw_owner is completely unnecessary. The reason is that it is only modified in the down_write context. If you race against another thread modifying it, that means that you aren't holding the rwlock, so taking the spin lock don't eliminate the race. Also, we only check rw_owner in RW_WRITE_HELD because spl_rwsem_is_locked is unnecessary and might need to take spin lock. Signed-off-by: Chunwei Chen <tuxoko@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #473	2015-10-02 11:20:55 -07:00
Lukas Wunner	784a7fe5d9	Linux 4.3 compat: bio_end_io_t / BIO_UPTODATE Commit torvalds/linux@4246a0b63b ("block: add a bi_error field to struct bio") dropped the error argument from bio_endio in favor of newly introduced bio->bi_error. This also replaces bio->bi_flags value BIO_UPTODATE. bio_endio was a 3 argument function until Linux 2.6.24, which made it a 2 argument function, and now the prototype has changed yet again to a 1 argument function. Support for pre 2.6.24 kernels was already dropped with `37f9dac592` ("zvol processing should use struct bio") which assumed the 2 argument version in zvol_request(). Remaining code to support the 3 argument version is hereby removed. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Lukas Wunner <lukas@wunner.de> Issue #3799	2015-09-25 12:44:54 -07:00
Don Brady	56b3986316	Add large block support to zpios(1) benchmark As part of the large block support effort, it makes sense to add support for large blocks to zpios(1). The specifying of a zfs block size for zpios is optional and will default to 128K if the block size is not specified. `zpios ... -S size \| --blocksize size ...` This will use size ZFS blocks for each test, specified as a comma delimited list with an optional unit suffix. The supported range is powers of two from 128K through 16M. A range of block sizes can be tested as follows: `-S 128K,256K,512K,1M` Example run below (non realistic results from a VM and output abbreviated for space) ``` --regioncount=750 --regionsize=8M --chunksize=1M --offset=4K --threaddelay=0 --cleanup --human-readable --verbose --cleanup --blocksize=128K,256K,512K,1M th-cnt rg-cnt rg-sz ch-sz blksz wr-data wr-bw rd-data rd-bw --------------------------------------------------------------------- 4 750 8m 1m 128k 5g 90.06m 5g 93.37m 4 750 8m 1m 256k 5g 79.71m 5g 99.81m 4 750 8m 1m 512k 5g 42.20m 5g 93.14m 4 750 8m 1m 1m 5g 35.51m 5g 89.36m 8 750 8m 1m 128k 5g 85.49m 5g 90.81m 8 750 8m 1m 256k 5g 61.42m 5g 99.24m 8 750 8m 1m 512k 5g 49.09m 5g 108.78m 16 750 8m 1m 128k 5g 86.28m 5g 88.73m 16 750 8m 1m 256k 5g 64.34m 5g 93.47m 16 750 8m 1m 512k 5g 68.84m 5g 124.47m 16 750 8m 1m 1m 5g 53.97m 5g 97.20m --------------------------------------------------------------------- ``` Signed-off-by: Don Brady <don.brady@intel.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #3795 Closes #2071	2015-09-22 09:13:20 -07:00
Ned Bass	3af56fd95f	Honor xattr=sa dataset property ZFS incorrectly uses directory-based extended attributes even when xattr=sa is specified as a dataset property or mount option. Support to honor temporary mount options including "xattr" was added in commit `0282c4137e`. There are two issues with the mount option handling: * Libzfs has historically included "xattr" in its list of default mount options. This overrides the dataset property, so the dataset is always configured to use directory-based xattrs even when the xattr dataset property is set to off or sa. Address this by removing "xattr" from the set of default mount options in libzfs. * There was no way to enable system attribute-based extended attributes using temporary mount options. Add the mount options "saxattr" and "dirxattr" which enable the xattr behavior their names suggest. This approach has the advantages of mirroring the valid xattr dataset property values and following existing conventions for mount option names. Signed-off-by: Ned Bass <bass6@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #3787	2015-09-19 14:04:14 -07:00
Arne Jansen	4e0f33ffe0	Illumos 6214 - zpools going south 6214 zpools going south Reviewed by: Igor Kozhukhov <ikozhukhov@gmail.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Dan McDonald <danmcd@omniti.com> Reviewed by: Saso Kiselkov <skiselkov.ml@gmail.com> References: https://www.illumos.org/issues/6214 http://cr.illumos.org/~webrev/sensille/6214_zpools_going_south/ Porting Notes: Reintroduce b_compress to the l2arc_buf_hdr_t. In commit `b9541d6` the compression flags were moved to the generic b_flags in the arc_buf_hdr_t. This is a problem because l2arc_compress_buf() may manipulate the compression flags and this can only be done safely under the hash lock which is not held. See Illumos 6214 for a detailed analysis of the race. HDR_GET_COMPRESS() macro was removed from arc_buf_info(). Ported-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #3757	2015-09-11 11:14:38 -07:00
Richard Yao	8198d18ca7	Reintroduce IO accounting on zvols on Linux 3.19+ zfsonlinux/zfs@e20cd6f7a8 caused us to lose IO accounting on zvols. When I originally wrote that last year, the symbols we needed to maintain IO accounting were GPL exported, but torvalds/linux@394ffa503b provided suitable symbols for restoring this functionality 4 months later. We can call them to restore the IO accounting on Linux 3.19 and later as well as any older kernels where that patch is backported. Signed-off-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #3741	2015-09-09 09:29:24 -07:00

... 26 27 28 29 30 ...

2268 Commits