mirror_zfs

mirror of https://git.proxmox.com/git/mirror_zfs.git synced 2026-05-22 02:27:36 +03:00

Author	SHA1	Message	Date
Brian Behlendorf	6fe53787f3	Fix vdev_queue_aggregate() deadlock This deadlock may manifest itself in slightly different ways but at the core it is caused by a memory allocation blocking on filesystem reclaim in the zio pipeline. This is normally impossible because zio_execute() disables filesystem reclaim by setting PF_FSTRANS on the thread. However, kmem cache allocations may still indirectly block on file system reclaim while holding the critical vq->vq_lock as shown below. To resolve this issue zio_buf_alloc_flags() is introduced which allows allocation flags to be passed. This can then be used in vdev_queue_aggregate() with KM_NOSLEEP when allocating the aggregate IO buffer. Since aggregating the IO is purely a performance optimization we want this to either succeed or fail quickly. Trying too hard to allocate this memory under the vq->vq_lock can negatively impact performance and result in this deadlock. * z_wr_iss zio_vdev_io_start vdev_queue_io -> Takes vq->vq_lock vdev_queue_io_to_issue vdev_queue_aggregate zio_buf_alloc -> Waiting on spl_kmem_cache process * z_wr_int zio_vdev_io_done vdev_queue_io_done mutex_lock -> Waiting on vq->vq_lock held by z_wr_iss * txg_sync spa_sync dsl_pool_sync zio_wait -> Waiting on zio being handled by z_wr_int * spl_kmem_cache spl_cache_grow_work kv_alloc spl_vmalloc ... evict zpl_evict_inode zfs_inactive dmu_tx_wait txg_wait_open -> Waiting on txg_sync Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Chunwei Chen <david.chen@osnexus.com> Signed-off-by: Tim Chase <tim@chase2k.com> Closes #3808 Closes #3867	2015-12-18 13:27:12 -08:00
Chunwei Chen	b4ad50ac5f	Use spl_fstrans_mark instead of memalloc_noio_save For earlier versions of the kernel with memalloc_noio_save, it only turns off __GFP_IO but leaves __GFP_FS untouched during direct reclaim. This would cause threads to direct reclaim into ZFS and cause deadlock. Instead, we should stick to using spl_fstrans_mark. Since we would explicitly turn off both __GFP_IO and __GFP_FS before allocation, it will work on every version of the kernel. This impacts kernel versions 3.9-3.17, see upstream kernel commit torvalds/linux@934f307 for reference. Signed-off-by: Chunwei Chen <david.chen@osnexus.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tim Chase <tim@chase2k.com> Closes #515 Issue zfsonlinux/zfs#4111	2015-12-18 13:24:52 -08:00
Brian Behlendorf	a8ad3bf02c	Fix z_xattr_lock/z_teardown_lock lock inversion There exists a lock inversion between the z_xattr_lock and the z_teardown_lock. Detect this case and return EBUSY so zfs_resume_fs() will mark the inode stale and it can be safely revalidated on next access. * process-1 zpl_xattr_get -> Takes zp->z_xattr_lock __zpl_xattr_get zfs_lookup -> Takes zsb->z_teardown_lock in ZFS_ENTER macro * process-2 zfs_ioc_recv -> Takes zsb->z_teardown_lock in zfs_suspend_fs() zfs_resume_fs zfs_rezget -> Takes zp->z_xattr_lock Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Chunwei Chen <david.chen@osnexus.com> Closes #3969	2015-12-18 13:17:44 -08:00
Tim Chase	200366f23f	Provide kstat for taskqs This patch provides 2 new kstats to display task queues: /proc/spl/taskqs-all - Display all task queues /proc/spl/taskqs - Display only "active" task queues A task queue is considered to be "active" if it currently has active (running) threads or if any of its pending, priority, delay or waitq lists are not empty. If the task queue has running threads, displays each thread function's address (symbolically, if possible) and its argument. If the task queue has a non-empty list of pending, priority or delayed task queue entries (taskq_ent_t), displays each entry's thread function address and argument. If the task queue has any waiters, displays each waiting task's pid. Note: This patch also updates some comments in taskq.h which referred to "taskq_t" when they should have referred to "taskq_ent_t". Signed-off-by: Tim Chase <tim@chase2k.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #491	2015-12-16 09:35:22 -08:00
Chunwei Chen	2727b9d3b6	Use uio for zvol_{read,write} Since uio now supports bvec, we can convert bio into uio and reuse dmu_{read,write}_uio. This way, we can remove some duplicate code. Signed-off-by: Chunwei Chen <david.chen@osnexus.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #4078	2015-12-15 16:21:43 -08:00
Chunwei Chen	502923bb44	Fix uio_prefaultpages for 0 length iovec Userspace can freely pass in whatever iovec it feels like, and it's perfectly legal to pass an iovec which contains a zero length segment. In the current implementation, uio_prefaultpages would touch an out of bound byte in the "last byte" logic. While this probably wouldn't cause any critical error, we would like uio_prefaultpages to be able to continue gracefully. Signed-off-by: Chunwei Chen <david.chen@osnexus.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #4078	2015-12-15 16:19:55 -08:00
Brian Behlendorf	eba9e745dc	Handle damaged blk_birth in dsl_deadlist_insert() If a bit were cleared in `bp->blk_birth` such that the txg birth was now lower than any other txg_birth in the deadlist, then there will be no entry before this in the tree. This should be impossible but regardless error handling code has been added for this case. By default this is left as a fatal case and the blk_birth is logged. However, setting `zfs_recover=1` will cause the bp to be placed at the start of the deadlist even though it contains an invalid blk_birth. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Chunwei Chen <tuxoko@gmail.com> Closes #4086 Closes #4089	2015-12-15 16:12:31 -08:00
Brian Behlendorf	1cdb86cba2	Handle block pointers with a corrupt logical size Commit `5f6d0b6` was originally added to gracefully handle block pointers with a damaged logical size. However, it incorrectly assumed that all passed arc_done_func_t could handle a NULL arc_buf_t. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #4069 Closes #4080	2015-12-15 16:11:44 -08:00
Brian Behlendorf	245b7ab3d1	Hold the zfs_snapentry_t before dispatch While exceptionally unlikely to cause a problem the zfs_snapentry_t hold should be taken before the dispatch to prevent any possibility of the task being processed before the hold. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Chunwei Chen <david.chen@osnexus.com>	2015-12-14 12:06:31 -08:00
Chunwei Chen	1997660170	Fix snapshot automount race cause EREMOTE When a concurrent mount finishes just before calling to zfsctl_snapshot_ismounted, if we return EISDIR, the VFS will return with EREMOTE. We should instead just return 0, so VFS may retry and would likely notice the dentry is already mounted. This will be in line with when usermode helper returns EBUSY. Signed-off-by: Chunwei Chen <david.chen@osnexus.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2015-12-14 12:06:31 -08:00
Brian Behlendorf	5ed27c572c	Change zfs_snapshot_lock from mutex to rw lock By changing the zfs_snapshot_lock from a mutex to a rw lock the zfsctl_lookup_objset() function can be allowed to run concurrently. This should reduce the latency of fh_to_dentry lookups in ZFS snapshots which are being accessed over NFS. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Chunwei Chen <david.chen@osnexus.com>	2015-12-14 12:06:31 -08:00
Brian Behlendorf	f22f900f15	Fix zfsctl_lookup_objset() deadlock The zfsctl_snapshot_unmount_delay() function must not be called from zfsctl_lookup_objset() while it is currently holding the zfs_snapshot_lock. This will result in a deadlock. It is safe to call zfsctl_snapshot_unmount_delay_impl() directly because the function already has a reference on the zfs_snapentry_t. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Chunwei Chen <david.chen@osnexus.com> Closes #3997	2015-12-14 12:05:52 -08:00
Brian Behlendorf	5e94284fe5	Set 'zfs_expire_snapshot=0' to disable auto-unmount There are cases where it's desirable that auto-mounted snapshots not expire after a fixed duration. They should be unmounted only when the filesystem they are a snapshot of is unmounted. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Chunwei Chen <david.chen@osnexus.com>	2015-12-14 11:02:32 -08:00
Brian Behlendorf	2c4332cf79	Fix cstyle issues in spl-taskq.c and taskq.h This patch only addresses the issues identified by the style checker. It contains no functional changes. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2015-12-11 16:20:22 -08:00
Chunwei Chen	066b89e685	Don't use tq->tq_lock_flags The flags argument in spin_lock_irqsave is modified out side of spin_lock context. We cannot use a shared variable like tq->tq_lock_flags for them. This patch removes it and uses local variable for the flags. Signed-off-by: Chunwei Chen <david.chen@osnexus.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #506	2015-12-11 16:20:03 -08:00
Olaf Faaland	326172d854	Subclass tq_lock to eliminate a lockdep warning When taskq_dispatch() calls taskq_thread_spawn() to create a new thread for a taskq, linux lockdep warns of possible recursive locking. This is a false positive. One such call chain is as follows, when a taskq needs more threads: taskq_dispatch->taskq_thread_spawn->taskq_dispatch The initial taskq_dispatch() holds tq_lock on the taskq that needed more worker threads. The later call into taskq_dispatch() takes dynamic_taskq->tq_lock. Without subclassing, lockdep believes these could potentially be the same lock and complains. A similar case occurs when taskq_dispatch() then calls task_alloc(). This patch uses spin_lock_irqsave_nested() when taking tq_lock, with one of two new lock subclasses: subclass taskq TQ_LOCK_DYNAMIC dynamic_taskq TQ_LOCK_GENERAL any other Signed-off-by: Olaf Faaland <faaland1@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #480	2015-12-11 16:19:56 -08:00
Brian Behlendorf	c5a8b1e163	Revert "Make taskq_member() use ->journal_info" This reverts commit `a430c11f0b`. Using journal_info like this can cause a BUG at kernel fs/jbd2/transaction.c:425! Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #500	2015-12-08 17:12:36 -08:00
Chunwei Chen	24ef51f660	Use spa as key besides objsetid for snapentry objsetid is not unique across pool, so using it solely as key would cause panic when automounting two snapshot on different pools with the same objsetid. We fix this by adding spa pointer as additional key. Signed-off-by: Chunwei Chen <david.chen@osnexus.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Richard Yao <ryao@gentoo.org> Issue #3948 Issue #3786 Issue #3887	2015-12-08 16:38:56 -08:00
Richard Yao	a430c11f0b	Make taskq_member() use ->journal_info The ->journal_info pointer in the task_struct is reserved for use by filesystems and because the kernel can have multiple file systems on the same stack due to direct reclaim, each filesystem that touches ->journal_info in a callback function will save the value at the start of its frame and restore it at the end of its frame. This allows us to safely use ->journal_info to store a pointer to the taskq's struct in taskq threads so that ZFS code paths can detect the presence of a taskq. This could break if the ZFS code were to use taskq_member from the context of direct reclaim. However, there are no such uses of it in that manner, so this is safe. This eliminates an O(N) list traversal under a spinlock with an O(1) unlocked pointer comparison. Signed-off-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: tuxoko <tuxoko@gmail.com> Signed-off-by: Tim Chase <tim@chase2k.com> Closes #500	2015-12-08 13:24:47 -08:00
Brian Behlendorf	b58986eebf	Use large stacks when available While stack size will vary by architecture it has historically defaulted to 8K on x86_64 systems. However, as of Linux 3.15 the default thread stack size was increased to 16K. These kernels are now the default in most non-enterprise distributions which means we no longer need to assume 8K stacks. This patch takes advantage of that fact by appropriately reverting stack conservation changes which were made to ensure stability. Changes which may have had a negative impact on performance for certain workloads. This also has the side effect of bringing the code slightly more in line with upstream. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Richard Yao <ryao@gentoo.org> Closes #4059	2015-12-07 12:20:43 -08:00
Matthew Ahrens	241b541574	Illumos 5959 - clean up per-dataset feature count code 5959 clean up per-dataset feature count code Reviewed by: Toomas Soome <tsoome@me.com> Reviewed by: George Wilson <george@delphix.com> Reviewed by: Alex Reece <alex@delphix.com> Approved by: Richard Lowe <richlowe@richlowe.net> References: https://www.illumos.org/issues/5959 https://github.com/illumos/illumos-gate/commit/ca0cc39 Porting notes: illumos code doesn't check for feature_get_refcount() returning ENOTSUP (which means feature is disabled) in zdb. zfsonlinux added a check in https://github.com/zfsonlinux/zfs/commit/784652c due to #3468. The check was reintroduced here. Ported-by: Witaut Bajaryn <vitaut.bayaryn@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #3965	2015-12-04 14:20:20 -08:00
Brian Behlendorf	072484504f	Add zap_prefetch() interface Provide a generic interface to prefetch ZAP entries by name. This functionality is being added for external consumers such as Lustre. It is based on the existing zap_prefetch_uint64() version which is used by the deduplication code. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Richard Yao <ryao@gentoo.org> Closes #4061	2015-12-04 09:39:20 -08:00
Richard Yao	1683e75edc	Fix race between getf() and areleasef() If a vnode is released asynchronously through areleasef(), it is possible for the user process to reuse the file descriptor before areleasef is called. When this happens, getf() will return a stale reference, any operations in the kernel on that file descriptor will fail (as it is closed) and the operations meant for that fd will never occur from userspace's perspective. We correct this by detecting this condition in getf(), doing a putf on the old file handle, updating the file descriptor and proceeding as if everything was fine. When the areleasef() is done, it will harmlessly decrement the reference counter on the Illumos file handle. Signed-off-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #492	2015-12-03 15:44:47 -08:00
tuxoko	b0fe1adeb1	Prevent rm modules.* when make install This was originally in `fe0ed8f910`, but somehow was changed and not working anymore. And it will cause the following error: modprobe: ERROR: ../libkmod/libkmod.c:506 lookup_builtin_file() could not open builtin file '/lib/modules/4.2.0-18-generic/modules.builtin.bin' Signed-off-by: Chunwei Chen <david.chen@osnexus.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #4027	2015-12-02 14:39:12 -08:00
tuxoko	d28c5c4f04	Prevent rm modules.* when make install This was originally in `e80cd06b8e`, but somehow was changed and not working anymore. And it will cause the following error: modprobe: ERROR: ../libkmod/libkmod.c:506 lookup_builtin_file() could not open builtin file '/lib/modules/4.2.0-18-generic/modules.builtin.bin' Signed-off-by: Chunwei Chen <david.chen@osnexus.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #501	2015-12-02 14:38:20 -08:00
Dimitri John Ledkov	9f456111ab	spl-kmem-cache: include linux/prefetch.h for prefetchw() This is needed for architectures that do not have a builtin prefetchw() Signed-off-by: Dimitri John Ledkov <xnox@ubuntu.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #502	2015-12-02 12:45:06 -08:00
Chunwei Chen	61d482f7cd	Linux 4.4 compat: xattr operations takes xattr_handler The xattr_handler->{list,get,set} were changed to take a xattr_handler, and handler_flags argument was removed and should be accessed by handler->flags. Signed-off-by: Chunwei Chen <david.chen@osnexus.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #4021	2015-12-01 16:48:25 -08:00
Chunwei Chen	1a09371678	Linux 4.4 compat: make_request_fn returns blk_qc_t As part of block polling support in Linux 4.4, make_request_fn should return a cookie value of type blk_qc_t. For now, we make zvol_request always return BLK_QC_T_NONE until we assess whether and how we want to support block polling. Signed-off-by: Chunwei Chen <david.chen@osnexus.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #4021	2015-12-01 16:48:08 -08:00
tuxoko	43518d92fd	Fix zfs_dirty_data_max overflow on 32-bit On 32 bit, the calculation of zfs_dirty_data_max from phymem will overflow, causing it to be smaller than zfs_dirty_data_sync, and will cause txg being delayed while no one write to disk. The end result is horrendous write speed. On 4G ram 32-bit VM, before this patch, simple dd results in ~7MB/s. Now it can reach speed on par with 64-bit VM. Signed-off-by: Chunwei Chen <tuxoko@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #3973	2015-11-19 16:02:47 -08:00
tuxoko	d0c614ecf9	Fix null pointer in arc_kmem_reap_now on 32-bit On 32 bit system, zio_buf_cache is limit to 1M. Larger than that is all NULL. So we need to avoid reaping them. Signed-off-by: Chunwei Chen <tuxoko@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #3973	2015-11-19 16:01:47 -08:00
Chunwei Chen	d287880afd	Fix snapshot automount behavior when concurrent or fail When concurrent threads accessing the snapdir, one will succeed the user helper mount while others will get EBUSY. However, the original code treats those EBUSY threads as success and goes on to do zfsctl_snapshot_add, which causes repeated avl_add and thus panic. Also, if the snapshot is already mounted somewhere else, a thread accessing the snapdir will also get EBUSY from user helper mount. And it will cause strange things as doing follow_down_one will fail and then follow_up will jump up to the mountpoint of the filesystem and confuse the hell out of VFS. The patch fix both behavior by returning 0 immediately for the EBUSY threads. Note, this will have a side effect for the second case where the VFS will retry several times before returning ELOOP. Signed-off-by: Chunwei Chen <david.chen@osnexus.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #4018	2015-11-19 15:36:59 -08:00
Brian Behlendorf	3d8d245fb3	Follow 0/-E convention for module load errors Because errors during module load are so rare it went unnoticed that it was possible that a positive errno was returned. This would result in the module being loaded, nothing being initialized, and a system panic shortly thereafter. This is what was causing the hard failures in the automated testing. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2015-11-16 16:10:06 -08:00
Brian Behlendorf	e7b75d9b46	Limit maximum object size in kmem tests Limit the maximum object size to 1/128 of total system memory for the kmem cache tests. Large values can result in out of memory errors for systems with less than 512M of memory. Additionally, use the known number of objects per-slab for calculating the number of objects to use for a test. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2015-11-16 15:02:24 -08:00
AndCycle	256fa983f4	Obey arc_meta_limit default size when changing arc_max When decreasing the maximum ARC size preserve the 3/4 default ratio for the arc_meta_limit. Otherwise, the arc_meta_limit may be set the same as arc_max. Signed-off-by: AndCycle <andcycle@andcycle.idv.tw> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #4001	2015-11-13 15:45:22 -08:00
loli10K	31f24932a4	Remove superfluous `newline` character Remove superfluous `newline` character from spl_kmem_cache_magazine_size module parameter description. Signed-off-by: loli10K <ezomori.nozomu@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #499	2015-11-13 15:27:45 -08:00
tuxoko	f5f2b87df0	Fix taskq dynamic spawning Currently taskq_dispatch() will spawn new task with a condition that the caller is also a member of the taskq. However, under this condition, it will still cause deadlock where a task on tq1 is waiting another thread, who is trying to dispatch a task on tq1. So this patch removes the check. For example when you do: zfs send pp/fs0@001 | zfs recv pp/fs0_copy This will easily deadlock before this patch. Also, move the seq_task check from taskq_thread_spawn() to taskq_thread() because it's not used by the caller from taskq_dispatch(). Signed-off-by: Chunwei Chen <david.chen@osnexus.com> Signed-off-by: Tim Chase <tim@chase2k.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #496	2015-11-13 15:02:55 -08:00
Chunwei Chen	3e7e6f34d0	Don't call kmem_cache_shrink from shrinker Linux slab will automatically free empty slab when number of partial slab is over min_partial, so we don't need to explicitly shrink it. In fact, calling kmem_cache_shrink from shrinker will cause heavy contention on kmem_cache_node->list_lock, to the point that it might cause __slab_free to livelock (see zfsonlinux/zfs#3936) Signed-off-by: Chunwei Chen <david.chen@osnexus.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes zfsonlinux/zfs#3936 Closes #487	2015-11-11 13:48:31 -08:00
Chunwei Chen	07d63f0cb9	Fix fail path in zfs_znode_alloc When sa_bulk_lookup() fails, unlock_new_inode() will spit out a WARNING. It will also recursive deadlock on ZFS_OBJ_HOLD_ENTER in zfs_zinactive(). Since we never call insert_inode_locked in fail path, I_NEW is never set, the inode is never hashed. So unlock_new_inode() can be safely removed. We set z_sa_hdl to NULL in fail path so that iput path will stop at zfs_inactive() without entering zfs_zinactive(). This way we can avoid the deadlock and prevent double sa_handle_destroy(). Signed-off-by: Chunwei Chen <david.chen@osnexus.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #3899	2015-10-13 15:57:17 -07:00
Chunwei Chen	aa159afb56	Fix use-after-free in vdev_disk_physio_completion Currently, vdev_disk_physio_completion will try to wake up an waiter without first checking the existence. This creates a race window in which complete is called after dr is freed. We add dr_wait in dio_request to indicate the existence of waiter. Also, remove dr_rw since no one is using it, and reorder dr_ref to make the struct more compact in 64bit. Signed-off-by: Chunwei Chen <david.chen@osnexus.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #3917 Issue #3880	2015-10-13 15:25:33 -07:00
Justin T. Gibbs	bc4501f75a	Illumos 6267 - dn_bonus evicted too early 6267 dn_bonus evicted too early Reviewed by: Richard Yao <ryao@gentoo.org> Reviewed by: Xin LI <delphij@freebsd.org> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Approved by: Richard Lowe <richlowe@richlowe.net> References: https://www.illumos.org/issues/6267 https://github.com/illumos/illumos-gate/commit/d205810 Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Ported-by: Ned Bass <bass6@llnl.gov> Issue #3865 Issue #3443	2015-10-13 14:12:02 -07:00
Brian Behlendorf	9b13f65d28	Fix CPU hotplug Allocate a kmem cache magazine for every possible CPU which might be added to the system. This ensures that when one of these CPUs is enabled it can be safely used immediately. For many systems the number of online CPUs is identical to the number of present CPUs so this does not imply an increased memory footprint. In fact, dynamically allocating the array of magazine pointers instead of using the worst case NR_CPUS can end up decreasing our memory footprint. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ned Bass <bass6@llnl.gov> Closes #482	2015-10-13 09:50:40 -07:00
Brian Behlendorf	935434ef01	Fix 'arc_c < arc_c_min' panic Strictly enforce keeping 'arc_c >= arc_c_min'. The ASSERTs are left in place to catch this in a debug build but logic has been added to gracefully handle in a production build. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #3904	2015-10-13 09:23:35 -07:00
Richard Yao	919efe93cb	zfs_inode_update should not call dmu_object_size_from_db under spinlock We should never block when holding a spin lock, but zfs_inode_update can block in the critical section of a spin lock in zfs_inode_update: zfs_inode_update -> dmu_object_size_from_db -> zrl_add -> mutex_enter Signed-off-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #3858	2015-09-30 10:47:40 -07:00
Richard Yao	bc8ffb2d08	Remove obsolete zv_lock All users of zv_lock were removed by `37f9dac`, but we forgot to remove it. Lets remove it as clean up. Signed-off-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #3858	2015-09-30 10:43:19 -07:00
Chunwei Chen	45838e3a41	Fix uioskip crash when skip to end When doing uioskip to skip an iovec to the very end, the current loop condition will falsely check past the end of iovec. We fix this by checking uio_iovcnt first. Signed-off-by: Chunwei Chen <tuxoko@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #3806 Closes #3850	2015-09-29 10:06:58 -07:00
Brian Behlendorf	2ebe396046	Fix PAX Patch/Grsec SLAB_USERCOPY panic Support grsecurity/PaX kernel configurations where CONFIG_PAX_USERCOPY_SLABS are enabled. When this kernel option is enabled slabs which are used to copy between user and kernel space must be created with SLAB_USERCOPY. Stock Linux kernels do not have a SLAB_USERCOPY definition so this causes no change in behavior for non-PAX-enabled kernels. Verified-by: Wuffleton <null@wuffleton.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #2977 Issue #3796	2015-09-28 09:18:29 -07:00
Richard Yao	b815ec32b3	Userspace can pass zero length segments via writev/readv Userspace can trigger an assertion by passing a zero-length segment when assertions are enabled: [27961.614792] VERIFY3(skip < iov->iov_len) failed (0 < 0) [27961.614795] PANIC at zfs_uio.c:187:uio_prefaultpages() [27961.614805] Call Trace: [27961.614811] dump_stack+0x45/0x57 [27961.614830] spl_dumpstack+0x44/0x50 [spl] [27961.614834] spl_panic+0xbb/0x100 [spl] [27961.614908] uio_prefaultpages+0x134/0x140 [zcommon] [27961.614930] zfs_write+0x1fd/0xe80 [zfs] [27961.615014] zpl_write_common_iovec+0x7f/0x110 [zfs] [27961.615035] zpl_iter_write+0xa0/0xd0 [zfs] [27961.615037] do_iter_readv_writev+0x59/0x80 [27961.615063] do_readv_writev+0x11b/0x260 [27961.615098] vfs_writev+0x39/0x50 [27961.615100] SyS_writev+0x4a/0xe0 [27961.615103] system_call_fastpath+0x16/0x6e The solution is to delete the assertion. This could potentially occur in uiomove as well, which contains analogous assertions that appear similarly unnecessary, so we remove those as well. Reported-by: Jonathan Vasquez <jvasquez1011@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Richard Yao <ryao@gentoo.org> Issue #3792	2015-09-25 12:51:16 -07:00
Brian Behlendorf	a3000f9358	Revert "dmu_objset_userquota_get_ids uses dn_bonus unsafely" This reverts commit `5f8e1e8505`. It was determined that this patch introduced the quota regression described in #3789. Signed-off-by: Tim Chase <tim@chase2k.com> Signed-off-by: Ned Bass <bass6@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #3443 Issue #3789	2015-09-25 12:50:24 -07:00
Brian Behlendorf	5592404784	Fix synchronous behavior in __vdev_disk_physio() Commit `b39c22b` set the READ_SYNC and WRITE_SYNC flags for a bio based on the ZIO_PRIORITY_* flag passed in. This had the unnoticed side-effect of making the vdev_disk_io_start() synchronous for certain I/Os. This in turn resulted in vdev_disk_io_start() being able to re-dispatch zio's which would result in RCU stalls when a disk was removed from the system. Additionally, this could negatively impact performance and explains the performance regressions reported in both #3829 and #3780. This patch resolves the issue by making the blocking behavior dependent on a 'wait' flag being passed rather than overloading the passed bio flags. Finally, the WRITE_SYNC and READ_SYNC behavior is restricted to non-rotational devices where there is no benefit to queuing to aggregate the I/O. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #3652 Issue #3780 Issue #3785 Issue #3817 Issue #3821 Issue #3829 Issue #3832 Issue #3870	2015-09-25 12:47:31 -07:00
Brian Behlendorf	ef5b2e1048	Avoid blocking in arc_reclaim_thread() As described in the comment above arc_reclaim_thread() it's critical that the reclaim thread be careful about blocking. Just like it must never wait on a hash lock, it must never wait on a task which can in turn wait on the CV in arc_get_data_buf(). This will deadlock, see issue #3822 for full backtraces showing the problem. To resolve this issue arc_kmem_reap_now() has been updated to use the asynchronous arc prune function. This means that arc_prune_async() may now be called while there are still outstanding arc_prune_tasks. However, this isn't a problem because arc_prune_async() already keeps a reference count preventing multiple outstanding tasks per registered consumer. Functionally, this behavior is the same as the counterpart illumos function dnlc_reduce_cache(). Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tim Chase <tim@chase2k.com> Issue #3808 Issue #3834 Issue #3822	2015-09-25 12:45:47 -07:00
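
Commits 502923bb44 and b815ec32b3 both concern zero-length iovec segments passed in from userspace. As a rough illustration of that input, and not code from this repository, the sketch below issues a `writev(2)` whose middle segment has `iov_len == 0`. The helper name `writev_with_empty_segment` is hypothetical, and a pipe stands in for a file on a ZFS dataset, so this only shows that the call is legal at the syscall level and does not exercise `uio_prefaultpages()` itself.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of ZFS or the SPL): writes "hello " and
 * "world" through a pipe with a zero-length segment in between, then reads
 * the data back into 'out' (which must hold at least 12 bytes).
 * Returns the byte count reported by writev(), or -1 on failure.
 */
static ssize_t
writev_with_empty_segment(char *out, size_t outlen)
{
	int fds[2];
	char head[] = "hello ";
	char tail[] = "world";
	struct iovec iov[3];
	ssize_t written, got;

	if (outlen == 0 || pipe(fds) != 0)
		return (-1);

	iov[0].iov_base = head;
	iov[0].iov_len = strlen(head);
	iov[1].iov_base = head;		/* valid pointer, but ... */
	iov[1].iov_len = 0;		/* ... a zero-length segment */
	iov[2].iov_base = tail;
	iov[2].iov_len = strlen(tail);

	/* The empty segment contributes nothing to the returned total. */
	written = writev(fds[1], iov, 3);
	(void) close(fds[1]);

	got = (written > 0) ? read(fds[0], out, outlen - 1) : -1;
	(void) close(fds[0]);
	if (got < 0)
		return (-1);

	out[got] = '\0';
	return (written);
}
```

Calling the helper with a 32-byte buffer should leave the two non-empty segments concatenated in the buffer. Reproducing the behavior the commits describe would additionally need the target file descriptor to refer to a file on a ZFS filesystem.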


4850 Commits