mirror_zfs

mirror of https://git.proxmox.com/git/mirror_zfs.git synced 2026-05-22 18:40:43 +03:00

Author	SHA1	Message	Date
Brian Behlendorf	4bf3909e51	Disable automatic log dumping Long ago infrastructure was added to the SPL to keep an internal debug log of the last few seconds of activity. This was helpful during the early development, but these days it is no longer needed. I haven't had to resort to this debug buffer to resolve an issue for several years now. Today better more generic tools like systemtap and ftrace have evolved to the point where they can be used for this purpose. Along with the stack trace dumped to the system console, and in rare cases a crash dump we almost always have the debug we need. Therefore, I'm disabling the code which automatically dumps this log to disk during an assertion except for the case where spl_debug_panic_on_bug is set (disabled by default). This should be viewed as a first step towards either. a) Retiring this infrastructure and complexity entirely, or b) Integrating this logging more properly with ftrace. As part of this change I'm also removing from the packages the undocumented spl utility which is used to decode the binary logs. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2013-02-05 16:13:27 -08:00
Brian Behlendorf	dd26aa535b	Cast 'zfs bad bloc' to ULL for x86 Explicitly cast this value to an unsigned long long for 32-bit systems to inform the compiler that a long type should not be used. Otherwise we get the following compiler error: dmu_send.c:376: error: integer constant is too large for ‘long’ type Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2013-02-04 16:39:08 -08:00
Brian Behlendorf	0c5493d470	Add zfs_arc_memory_throttle_disable module option The way in which virtual box ab(uses) memory can throw off the free memory calculation in arc_memory_throttle(). The result is the txg_sync thread will effectively spin waiting for memory to be released even though there's lots of memory on the system. To handle this case I'm adding a zfs_arc_memory_throttle_disable module option largely for virtual box users. Setting this option disables free memory checks which allows the txg_sync thread to make progress. By default this option is disabled to preserve the current behavior. However, because Linux supports direct memory reclaim it's doubtful throttling due to perceived memory pressure is ever a good idea. We should enable this option by default once we've done enough real world testing to convince ourselves there aren't any unexpected side effects. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #938	2013-02-01 11:17:14 -08:00
Brian Behlendorf	1f7c30df8f	Add zfs_disable_dup_eviction module option Commit `1eb5bfa` introduced a new zfs_disable_dup_eviction tunable. It should have been made available as a module option in the original patch but was overlooked. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2013-02-01 09:57:57 -08:00
Brian Behlendorf	6ef94aa67a	Fix tsd_get/set() race with tsd_exit/destroy() The tsd_exit() and tsd_destroy() functions remove entries from hash bins without taking the hash bin lock. They do take the table lock, but tsd_get() and tsd_set() only take the hash bin lock to allow for maximum concurrency. The result is that while tsd_get() and tsd_set() are traversing the hash bin list it can be modified by another thread which happens to hash to the same value. To avoid this add the needed locking to tsd_exit() and tsd_destroy(). Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #174	2013-01-31 13:54:59 -08:00
Ned Bass	36f86f73f6	Fix mismatch between SA header size and layout When a system attribute layout is created an inconsistency may occur between the system attribute header (sa_hdr_phys_t) size and the variable-sized attribute count stored in the layout. The inconsistency results in the following failed assertion when SA_HDR_SIZE_MATCH_LAYOUT returns false: SPLError: 11315:0:(sa.c:1541:sa_find_idx_tab()) ASSERTION((IS_SA_BONUSTYPE(bonustype) && SA_HDR_SIZE_MATCH_LAYOUT(hdr, tb)) || !IS_SA_BONUSTYPE(bonustype) || (IS_SA_BONUSTYPE(bonustype) && hdr->sa_layout_info == 0)) failed The bug originates in this snippet from sa_find_sizes(). if (is_var_sz && var_size > 1) { if (P2ROUNDUP(hdrsize + sizeof (uint16_t), *total < full_space) { hdrsize += sizeof (uint16_t); This assumes that the current variable-sized attribute will be stored in the current buffer and accounts for the space needed to store its size in the sa_hdr_phys_t. However if the next attribute spills over we need to store a blkptr_t at the end of the bonus buffer to point to the spill block. If the current attribute is in the way of the blkptr_t then it too will be relocated into the spill block. But since we've already accounted for it in the header size we get the inconsistency described above. To avoid this, record the index of the last variable-sized attribute that prompted a hdrsize increase, and reverse the increase if we later determine that that attribute will be relocated to the spill block. Signed-off-by: Matthew Ahrens <mahrens@delphix.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1250	2013-01-31 10:31:19 -08:00
Ned Bass	67629d0f08	Fix rounding discrepancy in sa_find_sizes() A rounding discrepancy exists between how sa_build_layouts() and sa_find_sizes() calculate when the spill block needs to be kicked in. This results in a narrow size range where sa_build_layouts() believes there must be a spill block allocated but due to the discrepancy there isn't. A panic then occurs when the hdl->sa_spill NULL pointer is dereferenced. The following reproducer for this bug was isolated: truncate -s 128m /tmp/tank zpool create tank /tmp/tank zfs create -o xattr=sa tank/fish ln -s `perl -e 'print "z" x 41'` /tank/fish/z setfattr -hn trusted.foo -v`perl -e 'print "z"x45'` /tank/fish/z This test results in roughly the following system attribute (SA) layout: 176 bytes - "standard" SA's 41 bytes - name of symbolic link target 100 bytes - XDR encoded nvlist for xattr --- 317 bytes - total Because 317 is less than DN_MAX_BONUSLEN (320), sa_find_sizes() decides no spill block is needed. But sa_build_layouts() rounds 41 up to 48 when computing the space requirements so it tries to switch to the spill block. Note that we were only able to reproduce this bug using a combination of symbolic links and the Linux-specific xattr=sa dataset property. So while this issue is not technically Linux-specific, it may be difficult or impossible to hit the narrow size range needed to reproduce it on other platforms. To fix the discrepancy, round the running total in sa_find_sizes() up to an 8-byte boundary before accounting for each SA, since this is how they will be stored in the bonus and (possibly) spill buffers. To make the intent of the code more clear, explicitly assert key assumptions about expected alignment of data and whether spill-over will occur. Signed-off-by: Matthew Ahrens <mahrens@delphix.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1240	2013-01-31 10:31:13 -08:00
Adam H. Leventhal	89103a2643	Illumos #3447 improve the comment in txg.c 3447 improve the comment in txg.c Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Richard Lowe <richlowe@richlowe.net> Reviewed by: Garrett D'Amore <garrett@damore.org> Reviewed by: Richard Elling <richard.elling@dey-sys.com> Approved by: Dan McDonald <danmcd@nexenta.com> References: illumos/illumos-gate@adbbcfface https://www.illumos.org/issues/3447 Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>	2013-01-30 08:55:20 -08:00
Eric Dillmann	9759c60f1a	Illumos #3035 LZ4 compression support in ZFS and GRUB 3035 LZ4 compression support in ZFS and GRUB Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Christopher Siden <christopher.siden@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Approved by: Christopher Siden <csiden@delphix.com> References: illumos/illumos-gate@a6f561b4ae https://www.illumos.org/issues/3035 http://wiki.illumos.org/display/illumos/LZ4+Compression+In+ZFS This patch has been slightly modified from the upstream Illumos version to be compatible with Linux. Due to the very limited stack space in the kernel a lz4 workspace kmem cache is used. Since we are using gcc we are also able to take advantage of the gcc optimized __builtin_ctz functions. Support for GRUB has been dropped from this patch. That code is available but those changes will need to be made to the upstream GRUB package. Lastly, several hunks of dead code were dropped for clarity. They include the functions real_LZ4_uncompress(), LZ4_compressBound() and the Visual Studio specific hunks wrapped in _MSC_VER. Ported-by: Eric Dillmann <eric@jave.fr> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1217	2013-01-29 09:28:20 -08:00
Brian Behlendorf	0936c3449f	Add spl_kmem_cache_expire module option Cache aging was implemented because it was part of the default Solaris kmem_cache behavior. The idea is that per-cpu objects which haven't been accessed in several seconds should be returned to the cache. On the other hand Linux slabs never move objects back to the slabs unless there is memory pressure on the system. This behavior is now configurable through the 'spl_kmem_cache_expire' module option. The value is a bit mask with the following meaning. 0x1 - Solaris style cache aging eviction is enabled. 0x2 - Linux style low memory eviction is enabled. Both methods may be safely enabled simultaneously, but by default both are disabled. It has never been clear if the kmem cache aging (which has been around from day one) actually does any good. It has however been the source of numerous bugs so I wouldn't mind retiring it entirely. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes zfsonlinux/zfs#1227 Closes #210	2013-01-28 09:34:12 -08:00
Chris Wedgwood	ddc07fa57a	Avoid gcc -Werror=maybe-uninitialized warnings Explicitly set acl details to zero to silence gcc (zfs_acl_node_read can't be sure zfs_acl_znode_info will set acl_count and aclsize). Normally suppressing these warnings by setting this to zero at declaration time is a bad idea but in this instance it's hard to avoid and should be fairly safe. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1244	2013-01-28 09:10:29 -08:00
Brian Behlendorf	6772fb679a	Use dsl_dataset_snap_lookup() Retire the dmu_snapshot_id() function which was introduced in the initial .zfs control directory implementation. There is already an existing dsl_dataset_snap_lookup() which does exactly what we need, and the dmu_snapshot_id() function as implemented is racy. https://github.com/zfsonlinux/zfs/issues/1215#issuecomment-12579879 Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1238	2013-01-25 15:07:40 -08:00
Brian Behlendorf	bf01b5e616	Add d_clear_d_op() compatibility Added d_clear_d_op() helper function which clears some flags and the registered dentry->d_op table. This is required because d_set_d_op() issues a warning when the dentry operations table is already set. For the .zfs control directory to work properly we must be able to override the default operations table and register custom .d_automount and .d_revalidate callbacks. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ned Bass <bass6@llnl.gov> Closes #1230	2013-01-23 16:33:29 -08:00
Ned Bass	1305d33a4b	fzap_cursor_move_to_key() should drop l_rwlock Callers of zap_deref_leaf() must be careful to drop leaf->l_rwlock since that function returns with the lock held on success. All other callers drop the lock correctly but it seems fzap_cursor_move_to_key() does not. This may block writers or cause VERIFY failures when the lock is freed. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1215 Closes zfsonlinux/spl#143 Closes zfsonlinux/spl#97	2013-01-23 16:31:16 -08:00
Brian Behlendorf	09a661e960	Fix zpl_revalidate() NULL deref In zpl_revalidate() it's possible for the nameidata to be NULL for kernels which still accept the parameter. In particular, lookup_one_len() calls d_revalidate() with a NULL nameidata. Resolve the issue by checking for a NULL nameidata in which case just set the flags to 0. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1226	2013-01-22 09:38:17 -08:00
Brian Behlendorf	ee93035378	Use sb->s_d_op default dentry operations As of Linux 2.6.37 the right way to register custom dentry operations is to use the super block's ->s_d_op field. For older kernels they should be registered as part of the lookup operation. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1223	2013-01-18 15:04:23 -08:00
Massimo Maggi	babf3f9b6d	Fix zpool on zvol deadlock Commit `65d56083b4` fixes the lock inversion between spa_namespace_lock and bdev->bd_mutex but only for the first user of spa_namespace_lock: dmu_objset_own(). Later spa_namespace_lock gets acquired by dsl_prop_get_integer() though dsl_prop_get()->dsl_dataset_hold()->dsl_dir_open_spa()-> spa_open()->spa_open_common() without this "protection". By moving the mutex release after this second use, even this acquisition of the lock is "protected" by the ERESTARTSYS trick. Signed-off-by: Massimo Maggi <me@massimo-maggi.eu> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1220	2013-01-18 09:44:55 -08:00
Brian Behlendorf	84dd1f4f15	Remove spl_invalidate_inodes() This functionality is no longer required by ZFS, see commit zfsonlinux/zfs@7b3e34ba5a. Since there are no other consumers, and because it adds additional autoconf complexity which must be maintained the spl_invalidate_inodes() function has been removed. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue zfsonlinux/zfs#795	2013-01-17 11:40:47 -08:00
Brian Behlendorf	7973e464de	Revert "Revert "Fix unlink/xattr deadlock"" This reverts commit `53c7411919` effectively reinstating the asynchronous xattr cleanup code. These Linux changes were reverted because after testing and careful contemplation I was convinced that due to the 89260a1c8851ce05ea04b23606ba438b271d890 commit they were no longer required. Unfortunately, the deadlock described in #1176 was a case which wasn't considered. At mount zfs_unlinked_drain() can occur which will unlink a list of znodes in effectively a random order which isn't safe. The only reason it was safe to originally revert this change was that we could guarantee that the VFS would always prune the xattr leaves before the parents. Therefore, until we can cleanly resolve this deadlock for all cases we need to keep this change in spite of the xattr unlink performance penalty associated with it. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1176 Issue #457	2013-01-17 11:24:20 -08:00
Brian Behlendorf	7b3e34ba5a	Fix 'zfs rollback' on mounted file systems Rolling back a mounted filesystem with open file handles and cached dentries+inodes never worked properly in ZoL. The major issue was that Linux provides no easy mechanism for modules to invalidate the inode cache for a file system. Because of this it was possible that an inode from the previous filesystem would not get properly dropped from the cache during rollback. Then a new inode with the same inode number would be created and collide with the existing cached inode. Ideally this would trigger a VERIFY() but in practice the error wasn't handled and it would just NULL dereference. Luckily, this issue can be resolved by sprucing up the existing Solaris zfs_rezget() functionality for the Linux VFS. The way it works now is that when a file system is rolled back all the cached inodes will be traversed and refetched from disk. If a version of the cached inode exists on disk the in-core copy will be updated accordingly. If there is no match for that object on disk it will be unhashed from the inode cache and marked as stale. This will effectively make the inode unfindable for lookups allowing the inode number to be immediately recycled. The inode will then only be accessible from the cached dentries. Subsequent dentry lookups which reference a stale inode will result in the dentry being invalidated. Once invalidated the dentry will drop its reference on the inode allowing it to be safely pruned from the cache. Special care is taken for negative dentries since they do not reference any inode. These dentries will be invalidated based on when they were added to the dentry cache. Entries added before the last rollback will be invalidated to prevent them from masking real files in the dataset. Two nice side effects of this fix are: * Removes the dependency on spl_invalidate_inodes(), it can now be safely removed from the SPL when we choose to do so. * zfs_znode_alloc() no longer requires a dentry to be passed. This effectively reverts this portion of the code to its upstream counterpart. The dentry is now instantiated more correctly in the Linux ZPL layer. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ned Bass <bass6@llnl.gov> Closes #795	2013-01-17 09:51:20 -08:00
Ned Bass	f1a05fa114	Fix false ENOENT on snapshot control dentries Lookups in the snapshot control directory for an existing snapshot fail with ENOENT if an earlier lookup failed before the snapshot was created. This is because the earlier lookup causes a negative dentry to be cached which is never invalidated. The bug can be reproduced as follows (the second ls should succeed): $ ls /tank/.zfs/snapshot/s ls: cannot access /tank/.zfs/snapshot/s: No such file or directory $ zfs snap tank@s $ ls /tank/.zfs/snapshot/s ls: cannot access /tank/.zfs/snapshot/s: No such file or directory To remedy this, always invalidate cached dentries in the snapshot control directory. Since these entries never exist on disk there is no significant performance penalty for the extra lookups. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1192	2013-01-16 16:28:54 -08:00
Ned Bass	94a9bb4709	Fix quoting error in unmount command A misplaced single quote caused the umount command to fail with a syntax error when unmounting snapshots under the .zfs/snapshot control directory. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1210	2013-01-16 15:30:47 -08:00
Christopher Siden	b077fd4c4e	Illumos #3189 kernel panic in test hotspare_onoffline_004_neg 3189 kernel panic in ZFS test suite during hotspare_onoffline_004_neg Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Arne Jansen <sensille@gmx.net> Approved by: Dan McDonald <danmcd@nexenta.com> References: illumos/illumos-gate@8f0b538d1d changeset: 13818:e9ad0a945d45 https://www.illumos.org/issues/3189 Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>	2013-01-14 10:34:53 -08:00
Arne Jansen	ff80d9b142	Illumos #1862 incremental zfs receive fails for sparse file > 8PB 1862 incremental zfs receive fails for sparse file > 8PB Reviewed by: Matt Ahrens <matthew.ahrens@delphix.com> Reviewed by: Simon Klinkert <klinkert@webgods.de> Approved by: Eric Schrock <eric.schrock@delphix.com> References: illumos/illumos-gate@31495a1e56 illumos changeset: 13789:f0c17d471b7a https://www.illumos.org/issues/1862 Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>	2013-01-14 10:34:41 -08:00
Brian Behlendorf	d4899f4747	kmem-cache: Fix slab ageing soft lockup Commit `a10287e00d` slightly reworked the slab ageing code such that it is no longer dependent on the Linux delayed work queue interfaces. This was good for portability and performance, but it requires us to use the on_each_cpu() function to execute the spl_magazine_age() function. That means that the function is now executing in interrupt context whereas before it was scheduled in normal process context. And that means we need to be slightly more careful about the locking in the interrupt handler. With the reworked code it's possible that we'll be holding the skc->skc_lock and be interrupted to handle the spl_magazine_age() IRQ. This will result in a deadlock and soft lockup errors unless we're careful to detect the contention and avoid taking the lock in the interrupt handler. So that's what this patch does. Alternately, (and slightly more conventionally) we could have used spin_lock_irqsave() to prevent this race entirely but I'd prefer to avoid disabling interrupts as much as possible due to performance concerns. There is absolutely no penalty for us not aging objects out of the magazine due to contention. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Prakash Surya <surya1@llnl.gov> Closes zfsonlinux/zfs#1193	2013-01-14 10:07:58 -08:00
Matthew Ahrens	a94addd974	Illumos #3208 cross-endian incorrect user/group accounting 3208 moving zpool cross-endian results in incorrect user/group accounting Reviewed by: Adam Leventhal <ahl@delphix.com> Reviewed by: Christopher Siden <chris.siden@delphix.com> Approved by: Richard Lowe <richlowe@richlowe.net> References: illumos/illumos-gate@e828a46d29 illumos changeset: 13835:eea81edc4f14 https://www.illumos.org/issues/3208 Ported-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #627 Closes #1136	2013-01-14 09:32:22 -08:00
Bart Coddens	5c83989071	Illumos #2618 arc.c mistypes in the comments 2618 arc.c mistypes in the comments Reviewed by: Jason King <jason.brian.king@gmail.com> Reviewed by: Josef Sipek <jeffpc@josefsipek.net> Approved by: Richard Lowe <richlowe@richlowe.net> References: illumos/illumos-gate@fc98fea58e illumos changeset: 13721:5b51a16a186f https://www.illumos.org/issues/2618 Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>	2013-01-11 09:16:59 -08:00
Ned Bass	761394b3af	call_usermodehelper() should wait for process As of Linux 3.4 the UMH_WAIT_* constants were renumbered. In particular, the meaning of "1" changed from UMH_WAIT_PROC (wait for process to complete), to UMH_WAIT_EXEC (wait for the exec, but not the process). A number of call sites used the number 1 instead of the constant name, so the behavior was not as expected on kernels with this change. One visible consequence of this change was that processes accessing automounted snapshots received an ELOOP error because they failed to wait for zfs.mount to complete. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #816	2013-01-09 16:54:52 -08:00
Ned Bass	8842263bd0	call_usermodehelper() should wait for process As of Linux 3.4 the UMH_WAIT_* constants were renumbered. In particular, the meaning of "1" changed from UMH_WAIT_PROC (wait for process to complete), to UMH_WAIT_EXEC (wait for the exec, but not the process). A number of call sites used the number 1 instead of the constant name, so the behavior was not as expected on kernels with this change. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2013-01-09 16:54:19 -08:00
Brian Behlendorf	1c50c992ba	Revert "Avoid ELOOP on auto-mounted snapshots" This reverts commit `7afcf5b1da` which accidentally introduced a regression with the .zfs snapshot directory. While the updated code still does correctly mount the requested snapshot, it updates the vfsmount such that it references the original dataset vfsmount. The result is that the snapshot itself isn't visible. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #816	2013-01-09 11:24:47 -08:00
Brian Behlendorf	4cec9b2dc7	Only reduce __zio_execute() stack usage in kernel space Related to `91579709fc` we need to be very careful about not overrunning the stack in kernel space. However, in user space we're already allowing slightly larger stacks so this stack usage optimization is not required there. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2013-01-09 10:34:35 -08:00
George Wilson	1eb5bfa3dc	Illumos #3145 , #3212 3145 single-copy arc 3212 ztest: race condition between vdev_online() and spa_vdev_remove() Reviewed by: Matt Ahrens <matthew.ahrens@delphix.com> Reviewed by: Adam Leventhal <ahl@delphix.com> Reviewed by: Eric Schrock <eric.schrock@delphix.com> Reviewed by: Justin T. Gibbs <gibbs@scsiguy.com> Approved by: Eric Schrock <eric.schrock@delphix.com> References: illumos-gate/commit/9253d63df408bb48584e0b1abfcc24ef2472382e illumos changeset: 13840:97fd5cdf328a https://www.illumos.org/issues/3145 https://www.illumos.org/issues/3212 Ported-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #989 Closes #1137	2013-01-08 10:35:44 -08:00
Matthew Ahrens	753c38392d	Illumos #3104 : eliminate empty bpobjs 3104 eliminate empty bpobjs Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Adam Leventhal <ahl@delphix.com> Reviewed by: Christopher Siden <chris.siden@delphix.com> Reviewed by: Garrett D'Amore <garrett@damore.org> Approved by: Eric Schrock <eric.schrock@delphix.com> References: illumos/illumos-gate@f174573681 illumos changeset: 13782:8f78aae28a63 https://www.illumos.org/issues/3104 Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>	2013-01-08 10:35:43 -08:00
Brian Behlendorf	91579709fc	Fix __zio_execute() asynchronous dispatch To save valuable stack all zio's were made asynchronous when in the txg_sync_thread context or during pool initialization. See commit `2fac4c2` for the original patch and motivation. Unfortunately, the changes to dsl_pool_sync_context() made by the feature flags broke this logic causing __zio_execute() to dispatch itself infinitely when called during pool initialization. This commit refines the existing logic to specifically target only the two cases we care about. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2013-01-08 10:35:43 -08:00
George Wilson	ea0b2538cd	Illumos #3349 : zpool upgrade -V bumps the on disk version number 3349 zpool upgrade -V bumps the on disk version number, but leaves the in core version Reviewed by: Adam Leventhal <ahl@delphix.com> Reviewed by: Christopher Siden <chris.siden@delphix.com> Reviewed by: Matt Ahrens <matthew.ahrens@delphix.com> Reviewed by: Richard Lowe <richlowe@richlowe.net> Approved by: Dan McDonald <danmcd@nexenta.com> References: illumos/illumos-gate@25345e4666 https://www.illumos.org/issues/3349 Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>	2013-01-08 10:35:43 -08:00
Matthew Ahrens	29809a6cba	Illumos #3086 : unnecessarily setting DS_FLAG_INCONSISTENT on async 3086 unnecessarily setting DS_FLAG_INCONSISTENT on async destroyed datasets Reviewed by: Christopher Siden <chris.siden@delphix.com> Approved by: Eric Schrock <Eric.Schrock@delphix.com> References: illumos/illumos-gate@ce636f8b38 illumos changeset: 13776:cd512c80fd75 https://www.illumos.org/issues/3086 Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>	2013-01-08 10:35:43 -08:00
Christopher Siden	b9b24bb4ca	Illumos #2762 : zpool command should have better support for feature flags 2762 zpool command should have better support for feature flags Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Approved by: Eric Schrock <Eric.Schrock@delphix.com> References: illumos/illumos-gate@57221772c3 https://www.illumos.org/issues/2762 Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>	2013-01-08 10:35:43 -08:00
George Wilson	3bc7e0fb0f	Illumos #3090 and #3102 3090 vdev_reopen() during reguid causes vdev to be treated as corrupt 3102 vdev_uberblock_load() and vdev_validate() may read the wrong label Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Christopher Siden <chris.siden@delphix.com> Reviewed by: Garrett D'Amore <garrett@damore.org> Approved by: Eric Schrock <Eric.Schrock@delphix.com> References: illumos/illumos-gate@dfbb943217 illumos changeset: 13777:b1e53580146d https://www.illumos.org/issues/3090 https://www.illumos.org/issues/3102 Ported-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #939	2013-01-08 10:35:42 -08:00
Christopher Siden	9ae529ec5d	Illumos #2619 and #2747 2619 asynchronous destruction of ZFS file systems 2747 SPA versioning with zfs feature flags Reviewed by: Matt Ahrens <mahrens@delphix.com> Reviewed by: George Wilson <gwilson@delphix.com> Reviewed by: Richard Lowe <richlowe@richlowe.net> Reviewed by: Dan Kruchinin <dan.kruchinin@gmail.com> Approved by: Eric Schrock <Eric.Schrock@delphix.com> References: illumos/illumos-gate@53089ab7c8 illumos/illumos-gate@ad135b5d64 illumos changeset: 13700:2889e2596bd6 https://www.illumos.org/issues/2619 https://www.illumos.org/issues/2747 NOTE: The grub specific changes were not ported. This change must be made to the Linux grub packages. Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>	2013-01-08 10:35:35 -08:00
Brian Behlendorf	1c7b3eaf87	RHEL 6.4 compat, fallocate() In the upstream kernel the FALLOC_FL_PUNCH_HOLE #define was introduced after the fallocate() function was moved from the inode_operations to the file_operations structure. Therefore, the SPL code assumed that if FALLOC_FL_PUNCH_HOLE was defined it was safe to use f_ops->fallocate(). Unfortunately, the RHEL6.4 kernel has only backported the FALLOC_FL_PUNCH_HOLE #define and not the fallocate() change. To address this compatibility issue the spl_filp_fallocate() helper function was added to properly detect which interface is available. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2013-01-08 09:53:13 -08:00
Ned Bass	37f000c5aa	Fix gcc array subscript above bounds warning In a debug build, certain GCC versions flag an array bounds warning in the below code from dnode_sync.c } else { int i; ASSERT(dn->dn_next_nblkptr[txgoff] < dnp->dn_nblkptr); /* the blkptrs we are losing better be unallocated */ for (i = dn->dn_next_nblkptr[txgoff]; i < dnp->dn_nblkptr; i++) ASSERT(BP_IS_HOLE(&dnp->dn_blkptr[i])); This usage is in fact safe, since the ASSERT ensures the index does not exceed the maximum possible number of block pointers. However gcc can't determine that the assignment 'i = dn->dn_next_nblkptr[txgoff];' falls within the array bounds so it issues a warning. To avoid this, initialize i to zero to make gcc happy but skip the elements before dn->dn_next_nblkptr[txgoff] in the loop body. Since a dnode contains at most 3 block pointers this overhead should be negligible. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #950	2013-01-07 11:21:52 -08:00
Matt Johnston	72938d6905	Use cv_wait_io() which will account for iowait Update zio_wait() to use cv_wait_io() to ensure the iowait time is properly accounted for. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2013-01-07 10:52:52 -08:00
Matt Johnston	72f53c5694	Revert part of "Log I/Os longer than zio_delay_max (30s default)" This reverts commit `9dcb971983` which was originally introduced to debug occasional slow I/Os. These I/Os would complete eventually but were observed to take several 100 seconds. The root cause of this issue was the CFQ scheduler which can, under certain conditions, excessively delay an I/O from being issued to the device. This issue was mitigated somewhat by commit `84daaddedb` which ensures the I/O elevator gets changed even for DM style devices. This change isn't in any way harmful but it does conflict with a required change to properly account for I/O wait time. Because Linux does not export the io_schedule_timeout() function we must instead rely on io_schedule() via cv_wait_io(). The additional debugging information which was added to the delay event has been intentionally left in place. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2013-01-07 10:51:04 -08:00
Matt Johnston	46a75aadb7	Add cv_wait_io() to account I/O time Under Linux when a task is waiting on I/O it should call the io_schedule() function for proper accounting. The Solaris cv_wait() function provides no way to specify what the cv is waiting on therefore cv_wait_io() is introduced. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #206	2013-01-07 10:29:26 -08:00
Brian Behlendorf	65d56083b4	Fix zpool on zvol lock inversion deadlock In all but one case the spa_namespace_lock is taken before the bdev->bd_mutex lock. But Linux __blkdev_get() function calls fops->open() with the bdev->bd_mutex lock held and we must somehow still safely acquire the spa_namespace_lock. To avoid a potential lock inversion deadlock we preemptively try to take the spa_namespace_lock(). Normally it will not be contended and this is safe because spa_open_common() handles the case where the caller already holds the spa_namespace_lock. When it is contended we risk a lock inversion if we were to block waiting for the lock. Luckily, the __blkdev_get() function allows us to return -ERESTARTSYS which will result in bdev->bd_mutex being dropped, reacquired, and fops->open() being called again. This process can be repeated safely until both locks are acquired. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Jorgen Lundman <lundman@lundman.net> Closes #612	2012-12-20 09:57:39 -08:00
Brian Behlendorf	d5446cfc52	Revert "Remove TSD zfs_fsyncer_key" This reverts commit `31f2b5abdf` back to the original code until the fsync(2) performance regression can be addressed. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2012-12-20 09:56:28 -08:00
Brian Behlendorf	31f2b5abdf	Remove TSD zfs_fsyncer_key It's my understanding that the zfs_fsyncer_key TSD was added as a performance optimization to reduce contention on the zl_lock from zil_commit(). This issue manifested itself as very long (100+ms) fsync() system call times for fsync() heavy workloads. However, under Linux I'm not seeing the same contention that was originally described. Therefore, I'm removing this code in order to wean ourselves off any dependence on TSD. If the original performance issue reappears on Linux we can revisit fixing it without resorting to TSD. This just leaves one small ZFS TSD consumer. If it can be cleanly removed from the code we'll be able to shed the SPL TSD implementation entirely. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes zfsonlinux/spl#174	2012-12-19 09:08:01 -08:00
Brian Behlendorf	034f1b331e	Fix spl_kmem_init_kallsyms_lookup() panic Due to I/O buffering the helper may return successfully before the proc handler has a chance to execute. To catch this case wait up to 1 second to verify spl_kallsyms_lookup_name_fn was updated to a non SYMBOL_POISON value. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes zfsonlinux/zfs#699 Closes zfsonlinux/zfs#859	2012-12-19 09:06:35 -08:00
Prakash Surya	84daaddedb	Set elevator for DM devices despite vdev_wholedisk The current state of udev and device-mapper devices makes it difficult to construct a mapping of DM partitions and their underlying DM device. For example, with a /dev directory with the following contents: $ ls -d /dev/dm-* /dev/dm-0 /dev/dm-1 /dev/dm-2 /dev/dm-3 it is not immediately apparent if these are completely separate devices, or partitions and real devices intermixed. In contrast, SCSI devices would appear as so: $ ls -d /dev/sd* /dev/sda /dev/sda1 /dev/sdb /dev/sdb1 Here, one can immediately determine that there are two devices (sda and sdb), each containing a single partition. The lack of a predictable and consistent mapping from DM devices to DM device partitions makes it difficult for user space to process these devices the same way it does SCSI devices. As a result, the ZFS utilities do not partition DM devices, and instead set the "vdev_wholedisk" label to 0 and treat them as partitions. This has the side effect that, even if ZFS has sole ownership of the device, the IO scheduler will not be modified because it is treated as a partition. This change adds an exception for DM devices in vdev_elevator_switch, allowing the elevator to be modified even though the "vdev_wholedisk" property is not set. Signed-off-by: Prakash Surya <surya1@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1149	2012-12-18 15:12:40 -08:00
Jorgen Lundman	6c2856726f	Fix using zvol as slog device During the original ZoL port the vdev_uses_zvols() function was disabled until it could be properly implemented. This prevented a zpool from using a zvol for its slog device. This patch implements that missing functionality by adding a zvol_is_zvol() function to zvol.c. Given the full path to a device it will look up the device and verify its major number against the registered zvol major number for the system. If they match we know the device is a zvol. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1131	2012-12-18 11:02:28 -08:00
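
The try-lock-and-restart approach described in commit 65d56083b4 ("Fix zpool on zvol lock inversion deadlock") can be illustrated with a small single-threaded user-space model. This is only a sketch under stated assumptions, not the code from that commit: every `sketch_*` name and the `SKETCH_ERESTARTSYS` constant are hypothetical stand-ins, the locks are plain flags rather than kernel mutexes, and contention is simulated by a counter.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the kernel-internal ERESTARTSYS value. */
#define SKETCH_ERESTARTSYS 512

/* A toy lock: just a flag, enough to model "contended" vs. "free". */
struct sketch_lock { bool held; };

static bool sketch_trylock(struct sketch_lock *l)
{
    if (l->held)
        return false;   /* contended: do not block */
    l->held = true;
    return true;
}

static void sketch_unlock(struct sketch_lock *l) { l->held = false; }

static struct sketch_lock namespace_lock; /* models spa_namespace_lock */
static struct sketch_lock bd_mutex;       /* models bdev->bd_mutex */

/*
 * Models fops->open(): it is entered with bd_mutex already held, so it
 * must never block on namespace_lock. If the lock is contended it asks
 * the caller to restart instead of waiting.
 */
static int sketch_open(void)
{
    assert(bd_mutex.held);
    if (!sketch_trylock(&namespace_lock))
        return -SKETCH_ERESTARTSYS;
    /* Both locks are held here; the real open work would happen now. */
    sketch_unlock(&namespace_lock);
    return 0;
}

/*
 * Models the caller's restart loop: take bd_mutex, call open, drop
 * bd_mutex, and retry while open asks for a restart. The namespace
 * lock is simulated as held by someone else for the first
 * contended_rounds attempts.
 */
static int sketch_blkdev_get(int contended_rounds, int *attempts)
{
    int rc;

    *attempts = 0;
    for (;;) {
        namespace_lock.held = (*attempts < contended_rounds);
        sketch_trylock(&bd_mutex);
        rc = sketch_open();
        sketch_unlock(&bd_mutex);
        (*attempts)++;
        if (rc != -SKETCH_ERESTARTSYS)
            return rc;
    }
}
```

In this model, `sketch_blkdev_get(3, &attempts)` succeeds on the fourth attempt. The inner lock is never waited on while the outer one is held, which is the property that avoids the inversion.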

4334 Commits