mirror_zfs

mirror of https://git.proxmox.com/git/mirror_zfs.git synced 2026-01-25 10:12:13 +03:00

Author	SHA1	Message	Date
Brian Behlendorf	37c56346cc	Close possible zfs_znode_held() race Check if the lock is held while holding the z_hold_locks() lock. This prevents a possible use-after-free bug for callers which are not holding the lock. There currently are no such callers so this can't cause a problem today but it has been fixed regardless. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Chunwei Chen <tuxoko@gmail.com> Closes #4244 Issue #4124	2016-01-20 13:36:15 -08:00
Brian Behlendorf	4967a3eb9d	Linux 4.5 compat: xattr list handler The registered xattr .list handler was simplified in the 4.5 kernel to only perform a permission check. Given a dentry for the file it must return a boolean indicating if the name is visible. This differs slightly from the previous APIs which also required the function to copy the name into the provided list and return its size. That is now all the responsibility of the caller. This should be a straightforward change to make to ZoL since we've always required the caller to make the copy. However, this was slightly complicated by the need to support 3 older APIs. Yes, between 2.6.32 and 4.5 there are 4 versions of this interface! Therefore, while the functional change in this patch is small it includes significant cleanup to make the code understandable and maintainable. These changes include: - Improved configure checks for .list, .get, and .set interfaces. - Interfaces checked from newest to oldest. - Strict checking for each possible known interface. - Configure fails when no known interface is available. - HAVE__XATTR_LIST renamed HAVE_XATTR_LIST_ for consistency with similar iops and fops configure checks. - POSIX_ACL_XATTR_{DEFAULT|ACCESS} were removed forcing callers to move to their replacements, XATTR_NAME_POSIX_ACL_{DEFAULT|ACCESS}. Compatibility wrappers were added for old kernels. - ZPL_XATTR_LIST_WRAPPER added which behaves the same as the existing ZPL_XATTR_{GET|SET} WRAPPERs. Only the inode is guaranteed to be a valid pointer, passing NULL for the 'list' and 'name' variables is allowed and must be checked for. All .list functions were updated to use the wrapper to aid readability. - zpl_xattr_filldir() updated to use the .list function for its permission check which is consistent with the updated Linux 4.5 interface. If a .list function is registered it should return 0 to indicate a name should be skipped, if there is no registered function the name will be added. - Additional documentation from xattr(7) describing the correct behavior for each namespace was added before the relevant handlers. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tim Chase <tim@chase2k.com> Signed-off-by: Chunwei Chen <tuxoko@gmail.com> Issue #4228	2016-01-20 11:36:56 -08:00
Brian Behlendorf	beeed4596b	Linux 4.5 compat: get_link() / put_link() The follow_link() interface was retired in favor of get_link(). In the process of phasing in get_link() the Linux kernel went through two different versions. The first depended on put_link() and the final version on a delayed done function. - Improved configure checks for .follow_link, .get_link, .put_link. - Interfaces checked from newest to oldest. - Strict checking for each possible known interface. - Configure fails when no known interface is available. - Both versions of .get_link are detected and supported, as well as two previous versions of .follow_link. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tim Chase <tim@chase2k.com> Signed-off-by: Chunwei Chen <tuxoko@gmail.com> Issue #4228	2016-01-20 11:36:00 -08:00
Josef 'Jeff' Sipek	bc89ac8479	Illumos 5045 - use atomic_{inc,dec}_* instead of atomic_add_* 5045 use atomic_{inc,dec}_* instead of atomic_add_* Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Garrett D'Amore <garrett@damore.org> Approved by: Robert Mustacchi <rm@joyent.com> References: https://www.illumos.org/issues/5045 https://github.com/illumos/illumos-gate/commit/1a5e258 Porting notes: - All changes to non-ZFS files dropped. - Changes to zfs_vfsops.c dropped because they were Illumos specific. Ported-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #4220	2016-01-15 15:38:36 -08:00
Marcel Telka	812e91a7e3	Illumos 4039 - zfs_rename()/zfs_link() needs stronger test for XDEV 4039 zfs_rename()/zfs_link() needs stronger test for XDEV Reviewed by: Gordon Ross <gordon.ross@nexenta.com> Reviewed by: Kevin Crowe <kevin.crowe@nexenta.com> Reviewed by: Saso Kiselkov <skiselkov.ml@gmail.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Approved by: Dan McDonald <danmcd@nexenta.com> References: https://www.illumos.org/issues/4039 https://github.com/illumos/illumos-gate/commit/18e6497 Porting notes: - This check was updated in Linux in a similar fashion early on in the port. Therefore, this patch just reorders the function and updates the comment so it flows the same way as the upstream code. Ported-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #4218	2016-01-15 15:38:35 -08:00
George Wilson	59d4c71cca	Illumos 3557, 3558, 3559, 3560 3557 dumpvp_size is not updated correctly when a dump zvol's size is changed 3558 setting the volsize on a dump device does not return back ENOSPC 3559 setting a volsize larger than the space available sometimes succeeds 3560 dumpadm should be able to remove a dump device Reviewed by: Adam Leventhal <ahl@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Christopher Siden <christopher.siden@delphix.com> Approved by: Albert Lee <trisk@nexenta.com> References: https://www.illumos.org/issues/3559 https://github.com/illumos/illumos-gate/commit/c61ea56 Porting notes: - Internal zvol.c changes not applied due to implementation differences. The external interface and behavior was already consistent with the latest upstream code. - Retired 2.6.28 HAVE_CHECK_DISK_SIZE_CHANGE configure check. All supported kernels (2.6.32 and newer) provide this interface. Ported-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #4217	2016-01-15 15:38:35 -08:00
Chunwei Chen	21f604d460	Prevent duplicated xattr between SA and dir When replacing an xattr would overflow the SA, we fall back to the xattr dir. However, the current implementation doesn't clear the one in SA, so we would end up with a duplicated xattr. For example, running the following script on an xattr=sa filesystem would cause duplicated "user.1". -- dup_xattr.sh begin -- randbase64() { dd if=/dev/urandom bs=1 count=$1 2>/dev/null | openssl enc -a -A } file=$1 touch $file setfattr -h -n user.1 -v `randbase64 5000` $file setfattr -h -n user.2 -v `randbase64 20000` $file setfattr -h -n user.3 -v `randbase64 20000` $file setfattr -h -n user.1 -v `randbase64 20000` $file getfattr -m. -d $file -- dup_xattr.sh end -- Also, when a filesystem is switched from xattr=sa to xattr=on, it will never modify those in SA. This would cause strange behavior: you cannot delete an xattr, or setxattr would create a duplicate and the result would not match when you getxattr. For example, the following shell sequence. -- shell begin -- $ sudo zfs set xattr=sa pp/fs0 $ touch zzz $ setfattr -n user.test -v asdf zzz $ sudo zfs set xattr=on pp/fs0 $ setfattr -x user.test zzz setfattr: zzz: No such attribute $ getfattr -d zzz user.test="asdf" $ setfattr -n user.test -v zxcv zzz $ getfattr -d zzz user.test="asdf" user.test="asdf" -- shell end -- We fix this behavior by first finding where the xattr resides before setxattr. Then, after we have successfully updated the xattr in one location, we clear the other location. Note that, because update and clear are not in a single tx, we could still end up with a duplicated xattr. But doing setxattr again fixes it. Signed-off-by: Chunwei Chen <david.chen@osnexus.com> Closes #3472 Closes #4153	2016-01-15 15:38:35 -08:00
Richard Yao	b10695c8f1	Remove fastwrite mutex The fast write mutex is intended to protect accounting, but it is redundant because all accounting is performed through atomic operations. It also serializes all metaslab IO behind a mutex, which introduces a theoretical scaling regression that the Illumos developers did not like when we showed this to them. Removing it makes the selection of the metaslab_group lock free as it is on Illumos. The selection is not quite the same without the lock because the loop races with IO completions, but any imbalances caused by this are likely to be corrected by subsequent metaslab group selections. Signed-off-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #3643	2016-01-15 15:38:35 -08:00
Brian Behlendorf	c96c36fa22	Fix zsb->z_hold_mtx deadlock The zfs_znode_hold_enter() / zfs_znode_hold_exit() functions are used to serialize access to a znode and its SA buffer while the object is being created or destroyed. This kind of locking would normally reside in the znode itself but in this case that's impossible because the znode and SA buffer may not yet exist. Therefore the locking is handled externally with an array of mutexes and AVL trees which contain per-object locks. In zfs_znode_hold_enter() a per-object lock is created as needed, inserted into the correct AVL tree and finally the per-object lock is held. In zfs_znode_hold_exit() the process is reversed. The per-object lock is released, removed from the AVL tree and destroyed if there are no waiters. This scheme has two important properties: 1) No memory allocations are performed while holding one of the z_hold_locks. This ensures evict(), which can be called from direct memory reclaim, will never block waiting on a z_hold_locks which just happens to have hashed to the same index. 2) All locks used to serialize access to an object are per-object and never shared. This minimizes lock contention without creating a large number of dedicated locks. On the downside it does require znode_lock_t structures to be frequently allocated and freed. However, because these are backed by a kmem cache and very short lived this cost is minimal. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #4106	2016-01-15 15:33:45 -08:00
Brian Behlendorf	0720116d4d	Add zfs_object_mutex_size module option Add a zfs_object_mutex_size module option to facilitate resizing the per-dataset znode mutex array. Increasing this value may help make the deadlock described in #4106 less common, but this is not a proper fix. This patch is primarily to aid debugging and analysis. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tim Chase <tim@chase2k.com> Issue #4106	2016-01-15 15:33:44 -08:00
Brian Behlendorf	89666a8e1c	Increase default user space stack size Under RHEL6/CentOS6 the default stack size must be increased to 32K to prevent overflowing the stack when running ztest. This isn't an issue for other distributions due to either the version of pthreads or perhaps the compiler. Doubling the stack size resolves the issue safely for all distributions and leaves us some headroom. $ sudo -E ztest -V -T 300 -f /var/tmp/ 5 vdevs, 7 datasets, 23 threads, 300 seconds... loading space map for vdev 0 of 1, metaslab 0 of 30 ... ... loading space map for vdev 0 of 1, metaslab 14 of 30 ... child died with signal 11 Exited ztest with error 3 Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #4215	2016-01-13 13:55:12 -08:00
Will Andrews	e6cfd633be	Illumos 3749 - zfs event processing should work on R/O root filesystems 3749 zfs event processing should work on R/O root filesystems Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Eric Schrock <eric.schrock@delphix.com> Approved by: Christopher Siden <christopher.siden@delphix.com> References: https://www.illumos.org/issues/3749 https://github.com/illumos/illumos-gate/commit/3cb69f7 Porting notes: - [include/sys/spa_impl.h] - `ffe9d38` Add generic errata infrastructure - `1421c89` Add visibility in to arc_read - [include/sys/fm/fs/zfs.h] - `2668527` Add linux events - `6283f55` Support custom build directories and move includes - [module/zfs/spa_config.c] - Updated spa_config_sync() to match illumos with the exception of a Linux specific block. Ported-by: kernelOfTruth kerneloftruth@gmail.com Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2016-01-12 14:42:32 -08:00
Justin Gibbs	ee3a23b84e	Illumos 5438 - zfs_blkptr_verify should continue after zfs_panic_recover 5438 zfs_blkptr_verify should continue after zfs_panic_recover Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: George Wilson <george@delphix.com> Reviewed by: Xin LI <delphij@freebsd.org> Approved by: Dan McDonald <danmcd@omniti.com> References: https://www.illumos.org/issues/5438 https://github.com/illumos/illumos-gate/commit/5897eb4 Ported-by: kernelOfTruth kerneloftruth@gmail.com Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2016-01-12 13:54:05 -08:00
Josef 'Jeff' Sipek	fc581e0507	Illumos 5515 - dataset user hold doesn't reject empty tags 5515 dataset user hold doesn't reject empty tags Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com> Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com> Approved by: Matthew Ahrens <mahrens@delphix.com> References: https://www.illumos.org/issues/5515 https://github.com/illumos/illumos-gate/commit/752fd8d Ported-by: kernelOfTruth kerneloftruth@gmail.com Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2016-01-12 13:52:26 -08:00
George Wilson	a6fb32b85a	Illumos 6281 - prefetching should apply to 1MB reads 6281 prefetching should apply to 1MB reads Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Paul Dagnelie <pcd@delphix.com> Reviewed by: Alexander Motin <mav@freebsd.org> Reviewed by: Dan McDonald <danmcd@omniti.com> Reviewed by: Justin Gibbs <gibbs@scsiguy.com> Reviewed by: Xin Li <delphij@freebsd.org> Approved by: Gordon Ross <gordon.ross@nexenta.com> References: https://www.illumos.org/issues/6281 https://github.com/illumos/illumos-gate/commit/6328027 Ported-by: kernelOfTruth kerneloftruth@gmail.com Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2016-01-12 13:51:27 -08:00
Saso Kiselkov	adfe9d932b	Illumos 6367 - spa_config_tryenter incorrectly handles the multiple-lock case 6367 spa_config_tryenter incorrectly handles the multiple-lock case Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com> Reviewed by: Josef 'Jeff' Sipek <josef.sipek@nexenta.com> Reviewed by: Prashanth Sreenivasa <prashksp@gmail.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Dan McDonald <danmcd@omniti.com> Reviewed by: Steven Hartland <steven.hartland@multiplay.co.uk> Approved by: Matthew Ahrens <mahrens@delphix.com> References: https://www.illumos.org/issues/6367 https://github.com/illumos/illumos-gate/commit/e495b6e Ported-by: kernelOfTruth kerneloftruth@gmail.com Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2016-01-12 11:05:28 -08:00
Joe Stein	5f3d9c69d1	Illumos 6295 - metaslab_condense's dbgmsg should include vdev id 6295 metaslab_condense's dbgmsg should include vdev id Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Andriy Gapon <avg@freebsd.org> Reviewed by: Xin Li <delphij@freebsd.org> Reviewed by: Justin Gibbs <gibbs@scsiguy.com> Approved by: Richard Lowe <richlowe@richlowe.net> References: https://www.illumos.org/issues/6295 https://github.com/illumos/illumos-gate/commit/daec38e Ported-by: kernelOfTruth kerneloftruth@gmail.com Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2016-01-12 11:02:07 -08:00
Justin T. Gibbs	0eb21616fa	Illumos 6171 - dsl_prop_unregister() slows down dataset eviction. 6171 dsl_prop_unregister() slows down dataset eviction. Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Prakash Surya <prakash.surya@delphix.com> Approved by: Dan McDonald <danmcd@omniti.com> References: https://www.illumos.org/issues/6171 https://github.com/illumos/illumos-gate/commit/03bad06 Porting notes: - Conflicts - `3558fd7` Prototype/structure update for Linux - `2cf7f52` Linux compat 2.6.39: mount_nodev() - `13fe019` Illumos #3464 - `241b541` Illumos 5959 - clean up per-dataset feature count code - dsl_prop_unregister() preserved until out of tree consumers like Lustre can transition to dsl_prop_unregister_all(). - Fixing 'space or tab at end of line' in include/sys/dsl_dataset.h Ported-by: kernelOfTruth kerneloftruth@gmail.com Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2016-01-12 10:53:12 -08:00
Matthew Ahrens	5a28a9737a	Illumos 6288 - dmu_buf_will_dirty could be faster 6288 dmu_buf_will_dirty could be faster Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Paul Dagnelie <pcd@delphix.com> Reviewed by: Justin Gibbs <gibbs@scsiguy.com> Reviewed by: Richard Elling <Richard.Elling@RichardElling.com> Approved by: Robert Mustacchi <rm@joyent.com> References: https://www.illumos.org/issues/6288 https://github.com/illumos/illumos-gate/commit/0f2e7d0 Porting notes: - [module/zfs/dbuf.c] - Fix 'warning: ISO C90 forbids mixed declarations and code' by moving 'dbuf_dirty_record_t *dr' to start of code block. Ported-by: kernelOfTruth kerneloftruth@gmail.com Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2016-01-12 09:13:52 -08:00
George Wilson	2e8efe1bef	Illumos 6292 - exporting a pool while an async destroy 6292 exporting a pool while an async destroy is running can leave entries in the deferred tree Reviewed by: Paul Dagnelie <pcd@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Andriy Gapon <avg@FreeBSD.org> Reviewed by: Fabian Keil <fk@fabiankeil.de> Approved by: Gordon Ross <gordon.ross@nexenta.com> References: https://www.illumos.org/issues/6292 https://github.com/illumos/illumos-gate/commit/a443cc8 Ported-by: kernelOfTruth kerneloftruth@gmail.com Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2016-01-12 09:10:52 -08:00
Matthew Ahrens	5511754b4f	Illumos 6319 - assertion failed in zio_ddt_write: bp->blk_birth == txg 6319 assertion failed in zio_ddt_write: bp->blk_birth == txg Reviewed by: George Wilson <george.wilson@delphix.com> Approved by: Dan McDonald <danmcd@omniti.com> References: https://www.illumos.org/issues/6319 https://github.com/illumos/illumos-gate/commit/b39b744 Porting notes: - Re-enabled ztest for CentOS test slaves. Ported-by: kernelOfTruth kerneloftruth@gmail.com Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #3449	2016-01-12 09:10:52 -08:00
Matthew Ahrens	7f60329a26	Illumos 5987 - zfs prefetch code needs work 5987 zfs prefetch code needs work Reviewed by: Adam Leventhal <ahl@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Paul Dagnelie <pcd@delphix.com> Approved by: Gordon Ross <gordon.ross@nexenta.com> References: https://www.illumos.org/issues/5987 zfs prefetch code needs work illumos/illumos-gate@cf6106c 5987 zfs prefetch code needs work Porting notes: - [module/zfs/dbuf.c] - `5f6d0b6` Handle block pointers with a corrupt logical size - [module/zfs/dmu_zfetch.c] - `c65aa5b` Fix gcc missing parenthesis warnings - `428870f` Update core ZFS code from build 121 to build 141. - `79c76d5` Change KM_PUSHPAGE -> KM_SLEEP - `b8d06fc` Switch KM_SLEEP to KM_PUSHPAGE - Account for ISO C90 - mixed declarations and code - warnings - Module parameters (new/changed): - Replaced zfetch_block_cap with zfetch_max_distance (Max bytes to prefetch per stream (default 8MB; 8 * 1024 * 1024)) - Preserved zfs_prefetch_disable as 'int' for consistency with existing Linux module options. - [include/sys/trace_arc.h] - Added new tracepoints - DEFINE_ARC_BUF_HDR_EVENT(zfs_arc__sync__wait__for__async); - DEFINE_ARC_BUF_HDR_EVENT(zfs_arc__demand__hit__predictive__prefetch); - [man/man5/zfs-module-parameters.5] - Updated man page Ported-by: kernelOfTruth kerneloftruth@gmail.com Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2016-01-12 09:02:33 -08:00
Brian Behlendorf	ab5cbbd107	Illumos 6293 - ztest failure: error == 28 (0xc == 0x1c) in ztest_tx_assign() 6293 ztest failure: error == 28 (0xc == 0x1c) in ztest_tx_assign() Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Prakash Surya <prakash.surya@delphix.com> Reviewed by: Richard Elling <Richard.Elling@RichardElling.com> Approved by: Richard Lowe <richlowe@richlowe.net> References: https://www.illumos.org/issues/6293 https://github.com/illumos/illumos-gate/commit/8fe00bf Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>	2016-01-11 14:10:31 -08:00
Brian Behlendorf	b870c7e5f4	Revert "Illumos 3749 - zfs event processing should work on R/O root filesystems" This reverts commit `b47637ecdc` which introduced a regression in ztest. $ ./cmd/ztest/ztest -V 5 vdevs, 7 datasets, 23 threads, 300 seconds... * Error in `/rpool/home/behlendo/src/git/zfs/cmd/ztest/.libs/lt-ztest': double free or corruption (fasttop): 0x0000000000d339f0 *	2016-01-11 14:10:30 -08:00
Brian Behlendorf	b858767a31	Fix 'prevsnap property' build failure Fix build failure accidentally introduced by `1715493`. This only results in a failure when debugging is disabled. dsl_dataset.c: In function 'dsl_dataset_stats': dsl_dataset.c:1698:45: error: 'dp' undeclared (first use in this function) Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2016-01-11 13:35:40 -08:00
Matthew Ahrens	1715493f38	Illumos 4929 - want prevsnap property 4929 want prevsnap property Reviewed by: Adam Leventhal <adam.leventhal@delphix.com> Reviewed by: Matt Amdur <matt.amdur@delphix.com> Reviewed by: Saso Kiselkov <skiselkov.ml@gmail.com> Reviewed by: Boris Protopopov <bprotopopov@hotmail.com> Reviewed by: Richard Lowe <richlowe@richlowe.net> Approved by: Dan McDonald <danmcd@omniti.com> References: https://www.illumos.org/issues/4929 https://github.com/illumos/illumos-gate/commit/b461c74 Porting notes: - [include/sys/fs/zfs.h] - f67d70 Create an 'overlay' property - 11b9ec Add full SELinux support - [fs/zfs/dsl_dataset.c] - This increases the stack size of dsl_dataset_stats() but nothing has been changed until this is shown to be an issue. Ported-by: kernelOfTruth kerneloftruth@gmail.com Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2016-01-11 11:58:26 -08:00
Marcel Telka	f3c9dca093	Illumos 4638 - Panic in ZFS via rfs3_setattr()/rfs3_write(): dirtying snapshot! 4638 Panic in ZFS via rfs3_setattr()/rfs3_write(): dirtying snapshot! Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com> Reviewed by: Ilya Usvyatsky <ilya.usvyatsky@nexenta.com> Reviewed by: Dan McDonald <danmcd@omniti.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Garrett D'Amore <garrett@damore.org> Approved by: Garrett D'Amore <garrett@damore.org> References: https://www.illumos.org/issues/4638 https://github.com/illumos/illumos-gate/commit/2144b12 Porting notes: - [module/zfs/zfs_vnops.c] - `3558fd7` Prototype/structure update for Linux - `2cf7f52` Linux compat 2.6.39: mount_nodev() - Use zfs_is_readonly() wrapper - Remove first line of comment which doesn't apply Ported-by: kernelOfTruth kerneloftruth@gmail.com Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2016-01-11 10:29:48 -08:00
Will Andrews	b47637ecdc	Illumos 3749 - zfs event processing should work on R/O root filesystems 3749 zfs event processing should work on R/O root filesystems Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Eric Schrock <eric.schrock@delphix.com> Approved by: Christopher Siden <christopher.siden@delphix.com> References: https://www.illumos.org/issues/3749 https://github.com/illumos/illumos-gate/commit/3cb69f7 Porting notes: - [include/sys/spa_impl.h] - `ffe9d38` Add generic errata infrastructure - `1421c89` Add visibility in to arc_read - [include/sys/fm/fs/zfs.h] - `2668527` Add linux events - `6283f55` Support custom build directories and move includes - [module/zfs/spa_config.c] - Updated spa_config_sync() to match illumos with the exception of a Linux specific block. Ported-by: kernelOfTruth kerneloftruth@gmail.com Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2016-01-11 09:23:37 -08:00
Brian Behlendorf	e9e3d31d2c	Allow 16M send/recv blocks Fix an off-by-one error introduced by `fcff0f3` which triggers an assertion when 16M blocks are used with send/recv. This fix was intentionally not folded into the Illumos commit so it can be easily cherry-picked by upstream. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2016-01-08 20:23:23 -05:00
Paul Dagnelie	fcff0f35bd	Illumos 5960, 5925 5960 zfs recv should prefetch indirect blocks 5925 zfs receive -o origin= Reviewed by: Prakash Surya <prakash.surya@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> References: https://www.illumos.org/issues/5960 https://www.illumos.org/issues/5925 https://github.com/illumos/illumos-gate/commit/a2cdcdd Porting notes: - [lib/libzfs/libzfs_sendrecv.c] - `b8864a2` Fix gcc cast warnings - `325f023` Add linux kernel device support - `5c3f61e` Increase Linux pipe buffer size on 'zfs receive' - [module/zfs/zfs_vnops.c] - `3558fd7` Prototype/structure update for Linux - `c12e3a5` Restructure zfs_readdir() to fix regressions - [module/zfs/zvol.c] - Function @zvol_map_block() isn't needed in ZoL - `9965059` Prefetch start and end of volumes - [module/zfs/dmu.c] - Fixed ISO C90 - mixed declarations and code - Function dmu_prefetch() 'int i' is initialized before the following code block (c90 vs. c99) - [module/zfs/dbuf.c] - `fc5bb51` Fix stack dbuf_hold_impl() - `9b67f60` Illumos 4757, 4913 - 34229a2 Reduce stack usage for recursive traverse_visitbp() - [module/zfs/dmu_send.c] - Fixed ISO C90 - mixed declarations and code - `b58986e` Use large stacks when available - `241b541` Illumos 5959 - clean up per-dataset feature count code - `77aef6f` Use vmem_alloc() for nvlists - `00b4602` Add linux kernel memory support Ported-by: kernelOfTruth kerneloftruth@gmail.com Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2016-01-08 15:08:19 -08:00
Richard Sharpe	c5d0287011	Fix casesensitivity=insensitive deadlock When casesensitivity=insensitive is set for the file system, we can deadlock in a rename if the user uses different case for each path. For example rename("A/some-file.txt", "a/some-file.txt"). The simple test for this is: 1. mkdir some-dir in a ZFS file system 2. touch some-dir/some-file.txt 3. mv Some-dir/some-file.txt some-dir/some-other-file.txt This last request deadlocks trying to relock the i_mutex on the inode for the parent directory. The solution is to use d_add_ci in zpl_lookup if we are on a file system that has the casesensitivity=insensitive attribute set. This patch checks if we are working on a case insensitive file system and if so, allocates storage for the case insensitive name and passes it to zfs_lookup and then calls d_add_ci instead of d_splice_alias. The performance impact seems to be minimal even though we have introduced a kmalloc and kfree in the lookup path. The problem was found when running Microsoft's FSCT against Samba on top of ZFS On Linux. Signed-off-by: Richard Sharpe <realrichardsharpe@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #4136	2016-01-08 11:05:07 -08:00
Jeremy Jones	b23ad7f350	Illumos 3139 - zdb dies when it tries to determine path of unlinked file 3139 zdb dies when it tries to determine path of unlinked file Reviewed by: Matt Ahrens <matthew.ahrens@delphix.com> Reviewed by: Christopher Siden <chris.siden@delphix.com> Reviewed by: Eric Schrock <eric.schrock@delphix.com> Approved by: Dan McDonald <danmcd@nexenta.com> References: https://github.com/illumos/illumos-gate/commit/1ce39b5 https://www.illumos.org/issues/3139 Ported-by: kernelOfTruth kerneloftruth@gmail.com Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2016-01-05 11:25:41 -08:00
Matthew Ahrens	37f8a8835a	Illumos 5746 - more checksumming in zfs send 5746 more checksumming in zfs send Reviewed by: Christopher Siden <christopher.siden@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Bayard Bell <buffer.g.overflow@gmail.com> Approved by: Albert Lee <trisk@omniti.com> References: https://www.illumos.org/issues/5746 https://github.com/illumos/illumos-gate/commit/98110f0 https://github.com/zfsonlinux/zfs/issues/905 Porting notes: - Minor conflicts due to: - https://github.com/zfsonlinux/zfs/commit/2024041 - https://github.com/zfsonlinux/zfs/commit/044baf0 - https://github.com/zfsonlinux/zfs/commit/88904bb - Fix ISO C90 warnings (-Werror=declaration-after-statement) - arc_buf_t abuf; - dmu_buf_t bonus; - zio_cksum_t cksum_orig; - zio_cksum_t *cksump; - Fix format '%llx' format specifier warning - Align message in zstreamdump safe_malloc() with upstream Ported-by: kernelOfTruth kerneloftruth@gmail.com Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #3611	2015-12-30 14:24:14 -08:00
Ned Bass	43b4935e53	Prevent SA length overflow The function sa_update() accepts a 32-bit length parameter and assigns it to a 16-bit field in sa_bulk_attr_t, potentially truncating the passed-in value. This could lead to corrupt system attribute (SA) records getting written to the pool. Add a VERIFY to sa_update() to detect cases where overflow would occur. The SA length is limited to 16-bit values by the on-disk format defined by sa_hdr_phys_t. The function zfs_sa_set_xattr() is vulnerable to this bug if the unpacked nvlist of xattrs is less than 64k in size but the packed size is greater than 64k. Fix this by appropriately checking the size of the packed nvlist before calling sa_update(). Add error handling to zpl_xattr_set_sa() to keep the cached list of SA-based xattrs consistent with the data on disk. Lastly, zfs_sa_set_xattr() calls dmu_tx_abort() on an assigned transaction if sa_update() returns an error, but the DMU only allows unassigned transactions to be aborted. Wrap the sa_update() call in a VERIFY0, remove the transaction abort, and call dmu_tx_commit() unconditionally. This is consistent practice with other callers of sa_update(). Signed-off-by: Ned Bass <bass6@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Richard Yao <ryao@gentoo.org> Closes #4150	2015-12-30 13:20:12 -08:00
Chunwei Chen	f5f087eb88	Make xattr dir truncate and remove in one tx We need truncate and remove to be in the same tx when doing zfs_rmnode on an xattr dir. Otherwise, if we truncate and crash, we'll end up with an inconsistent zap object on the delete queue. We do this by skipping dmu_free_long_range and letting zfs_znode_delete do the work. Signed-off-by: Chunwei Chen <david.chen@osnexus.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #4114 Issue #4052 Issue #4006 Issue #3018 Issue #2861	2015-12-28 09:48:26 -08:00
Chunwei Chen	29572ccdef	Fix empty xattr dir causing lockup During zfs_rmnode on an xattr dir, if the system crashes just after dmu_free_long_range, we would get an empty xattr dir in the delete queue. This would cause blkid=0 to be passed into zap_get_leaf_byblk when doing zfs_purgedir during mount, which would try to do rw_enter on a wrong structure and cause a system lockup. We fix this by returning ENOENT when blkid is zero in zap_get_leaf_byblk. Signed-off-by: Chunwei Chen <david.chen@osnexus.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #4114 Closes #4052 Closes #4006 Closes #3018 Closes #2861	2015-12-28 09:41:30 -08:00
Brian Behlendorf	2ebc7b72b3	Fix z_xattr_lock/z_teardown_lock inversion There exists a lock inversion between the z_xattr_lock and the z_teardown_lock. Resolve this by taking the z_teardown_lock in all registered xattr callbacks prior to taking the z_xattr_lock. This ensures the locks are always taken in the same order, thus preventing a deadlock. Note the z_teardown_lock is taken again in zfs_lookup() and this is safe because the z_teardown_lock is a re-entrant reader/writer lock. * process-1 zpl_xattr_get -> Takes zp->z_xattr_lock __zpl_xattr_get zfs_lookup -> Takes zsb->z_teardown_lock in ZFS_ENTER macro * process-2 zfs_ioc_recv -> Takes zsb->z_teardown_lock in zfs_suspend_fs() zfs_resume_fs zfs_rezget -> Takes zp->z_xattr_lock Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Chunwei Chen <tuxoko@gmail.com> Closes #3943 Closes #3969 Closes #4121	2015-12-22 16:59:20 -08:00
Brian Behlendorf	228b461b56	Revert "Fix z_xattr_lock/z_teardown_lock lock inversion" This reverts commit 6b32ef572f754efc3f9edb20d022450f8e6b02d9.	2015-12-22 16:58:43 -08:00
Brian Behlendorf	151f84e2c3	Fix ztest truncated cache file Commit `efc412b` updated spa_config_write() for Linux 4.2 kernels to truncate and overwrite rather than rename the cache file. This is the correct fix but it should have only been applied for the kernel build. In user space rename(2) is needed because ztest depends on the cache file. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #4129	2015-12-22 10:40:40 -08:00
Olaf Faaland	448d7aaabc	Identify locks flagged by lockdep When running a kernel with CONFIG_LOCKDEP=y, lockdep reports possible recursive locking in some cases and possible circular locking dependency in others, within the SPL and ZFS modules. This patch uses a mutex type defined in SPL, MUTEX_NOLOCKDEP, to mark such mutexes when they are initialized. This mutex type causes attempts to take or release those locks to be wrapped in lockdep_off() and lockdep_on() calls to silence the dependency checker and allow the use of lock_stats to examine contention. For RW locks, it uses an analogous lock type, RW_NOLOCKDEP. The goal is that these locks are ultimately changed back to type MUTEX_DEFAULT or RW_DEFAULT, after the locks are annotated to reflect their relationship (e.g. z_name_lock below) or any real problem with the lock dependencies are fixed. Some of the affected locks are: tc_open_lock: ============= This is an array of locks, all with same name, which txg_quiesce must take all of in order to move txg to next state. All default to the same lockdep class, and so to lockdep appears recursive. zp->z_name_lock: ================ In zfs_rmdir, dzp = znode for the directory (input to zfs_dirent_lock) zp = znode for the entry being removed (output of zfs_dirent_lock) zfs_rmdir()->zfs_dirent_lock() takes z_name_lock in dzp zfs_rmdir() takes z_name_lock in zp Since both dzp and zp are type znode_t, the locks have the same default class, and lockdep considers it a possible recursive lock attempt. 
l->l_rwlock: ============ zap_expand_leaf() sometimes creates two new zap leaf structures, via these call paths: zap_deref_leaf()->zap_get_leaf_byblk()->zap_leaf_open() zap_expand_leaf()->zap_create_leaf()->zap_expand_leaf()->zap_create_leaf() Because both zap_leaf_open() and zap_create_leaf() initialize l->l_rwlock in their (separate) leaf structures, the lockdep class is the same, and the linux kernel believes these might both be the same lock, and emits a possible recursive lock warning. Signed-off-by: Olaf Faaland <faaland1@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #3895	2015-12-22 10:21:33 -08:00
DHE	dcb6bed1df	Make zio_taskq_batch_pct user configurable Adds zio_taskq_batch_pct as an exported module parameter, allowing users to modify it at module load time. Signed-off-by: DHE <git@dehacked.net> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #4110	2015-12-18 13:46:23 -08:00
Brian Behlendorf	a58df6f536	Fix zfs_vdev_aggregation_limit bounds checking Update the bounds checking for zfs_vdev_aggregation_limit so that it has a floor of zero and a maximum value of the supported block size for the pool. Additionally add an early return when zfs_vdev_aggregation_limit equals zero to disable aggregation. For very fast solid state or memory devices it may be more expensive to perform the aggregation than to issue the IO immediately. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2015-12-18 13:32:06 -08:00
Brian Behlendorf	6fe53787f3	Fix vdev_queue_aggregate() deadlock This deadlock may manifest itself in slightly different ways but at the core it is caused by a memory allocation blocking on filesystem reclaim in the zio pipeline. This is normally impossible because zio_execute() disables filesystem reclaim by setting PF_FSTRANS on the thread. However, kmem cache allocations may still indirectly block on file system reclaim while holding the critical vq->vq_lock as shown below. To resolve this issue zio_buf_alloc_flags() is introduced which allows allocation flags to be passed. This can then be used in vdev_queue_aggregate() with KM_NOSLEEP when allocating the aggregate IO buffer. Since aggregating the IO is purely a performance optimization we want this to either succeed or fail quickly. Trying too hard to allocate this memory under the vq->vq_lock can negatively impact performance and result in this deadlock. * z_wr_iss zio_vdev_io_start vdev_queue_io -> Takes vq->vq_lock vdev_queue_io_to_issue vdev_queue_aggregate zio_buf_alloc -> Waiting on spl_kmem_cache process * z_wr_int zio_vdev_io_done vdev_queue_io_done mutex_lock -> Waiting on vq->vq_lock held by z_wr_iss * txg_sync spa_sync dsl_pool_sync zio_wait -> Waiting on zio being handled by z_wr_int * spl_kmem_cache spl_cache_grow_work kv_alloc spl_vmalloc ... evict zpl_evict_inode zfs_inactive dmu_tx_wait txg_wait_open -> Waiting on txg_sync Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Chunwei Chen <david.chen@osnexus.com> Signed-off-by: Tim Chase <tim@chase2k.com> Closes #3808 Closes #3867	2015-12-18 13:27:12 -08:00
Brian Behlendorf	a8ad3bf02c	Fix z_xattr_lock/z_teardown_lock lock inversion There exists a lock inversion between the z_xattr_lock and the z_teardown_lock. Detect this case and return EBUSY so zfs_resume_fs() will mark the inode stale and it can be safely revalidated on next access. * process-1 zpl_xattr_get -> Takes zp->z_xattr_lock __zpl_xattr_get zfs_lookup -> Takes zsb->z_teardown_lock in ZFS_ENTER macro * process-2 zfs_ioc_recv -> Takes zsb->z_teardown_lock in zfs_suspend_fs() zfs_resume_fs zfs_rezget -> Takes zp->z_xattr_lock Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Chunwei Chen <david.chen@osnexus.com> Closes #3969	2015-12-18 13:17:44 -08:00
Chunwei Chen	2727b9d3b6	Use uio for zvol_{read,write} Since uio now supports bvec, we can convert bio into uio and reuse dmu_{read,write}_uio. This way, we can remove some duplicate code. Signed-off-by: Chunwei Chen <david.chen@osnexus.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #4078	2015-12-15 16:21:43 -08:00
Chunwei Chen	502923bb44	Fix uio_prefaultpages for 0 length iovec Userspace can freely pass in whatever iovec it feels like, and it's perfectly legal to pass an iovec which contains a zero length segment. In the current implementation, uio_prefaultpages would touch an out of bound byte in the "last byte" logic. While this probably wouldn't cause any critical error, we would like uio_prefaultpages to be able to continue gracefully. Signed-off-by: Chunwei Chen <david.chen@osnexus.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #4078	2015-12-15 16:19:55 -08:00
Brian Behlendorf	eba9e745dc	Handle damaged blk_birth in dsl_deadlist_insert() If a bit were cleared in `bp->blk_birth` such that the txg birth was now lower than any other txg_birth in the deadlist, then there will be no entry before this in the tree. This should be impossible but regardless error handling code has been added for this case. By default this is left as a fatal case and the blk_birth is logged. However, setting `zfs_recover=1` will cause the bp to be placed at the start of the deadlist even though it contains an invalid blk_birth. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Chunwei Chen <tuxoko@gmail.com> Closes #4086 Closes #4089	2015-12-15 16:12:31 -08:00
Brian Behlendorf	1cdb86cba2	Handle block pointers with a corrupt logical size Commit `5f6d0b6` was originally added to gracefully handle block pointers with a damaged logical size. However, it incorrectly assumed that all passed arc_done_func_t could handle a NULL arc_buf_t. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #4069 Closes #4080	2015-12-15 16:11:44 -08:00
Brian Behlendorf	245b7ab3d1	Hold the zfs_snapentry_t before dispatch While exceptionally unlikely to cause a problem the zfs_snapentry_t hold should be taken before the dispatch to prevent any possibility of the task being processed before the hold. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Chunwei Chen <david.chen@osnexus.com>	2015-12-14 12:06:31 -08:00
Chunwei Chen	1997660170	Fix snapshot automount race cause EREMOTE When a concurrent mount finishes just before the call to zfsctl_snapshot_ismounted, if we return EISDIR, the VFS will return with EREMOTE. We should instead just return 0, so VFS may retry and would likely notice the dentry is already mounted. This will be in line with when the usermode helper returns EBUSY. Signed-off-by: Chunwei Chen <david.chen@osnexus.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2015-12-14 12:06:31 -08:00
Brian Behlendorf	5ed27c572c	Change zfs_snapshot_lock from mutex to rw lock By changing the zfs_snapshot_lock from a mutex to a rw lock the zfsctl_lookup_objset() function can be allowed to run concurrently. This should reduce the latency of fh_to_dentry lookups in ZFS snapshots which are being accessed over NFS. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Chunwei Chen <david.chen@osnexus.com>	2015-12-14 12:06:31 -08:00
Brian Behlendorf	f22f900f15	Fix zfsctl_lookup_objset() deadlock The zfsctl_snapshot_unmount_delay() function must not be called from zfsctl_lookup_objset() while it is currently holding the zfs_snapshot_lock. This will result in a deadlock. It is safe to call zfsctl_snapshot_unmount_delay_impl() directly because the function already has a reference on the zfs_snapentry_t. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Chunwei Chen <david.chen@osnexus.com> Closes #3997	2015-12-14 12:05:52 -08:00
Brian Behlendorf	5e94284fe5	Set 'zfs_expire_snapshot=0' to disable auto-unmount There are cases where it's desirable that auto-mounted snapshots not expire after a fixed duration. They should be unmounted only when the filesystem they are a snapshot of is unmounted. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Chunwei Chen <david.chen@osnexus.com>	2015-12-14 11:02:32 -08:00
Chunwei Chen	24ef51f660	Use spa as key besides objsetid for snapentry objsetid is not unique across pool, so using it solely as key would cause panic when automounting two snapshot on different pools with the same objsetid. We fix this by adding spa pointer as additional key. Signed-off-by: Chunwei Chen <david.chen@osnexus.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Richard Yao <ryao@gentoo.org> Issue #3948 Issue #3786 Issue #3887	2015-12-08 16:38:56 -08:00
Brian Behlendorf	b58986eebf	Use large stacks when available While stack size will vary by architecture it has historically defaulted to 8K on x86_64 systems. However, as of Linux 3.15 the default thread stack size was increased to 16K. These kernels are now the default in most non-enterprise distributions which means we no longer need to assume 8K stacks. This patch takes advantage of that fact by appropriately reverting stack conservation changes which were made to ensure stability. Changes which may have had a negative impact on performance for certain workloads. This also has the side effect of bringing the code slightly more in line with upstream. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Richard Yao <ryao@gentoo.org> Closes #4059	2015-12-07 12:20:43 -08:00
Matthew Ahrens	241b541574	Illumos 5959 - clean up per-dataset feature count code 5959 clean up per-dataset feature count code Reviewed by: Toomas Soome <tsoome@me.com> Reviewed by: George Wilson <george@delphix.com> Reviewed by: Alex Reece <alex@delphix.com> Approved by: Richard Lowe <richlowe@richlowe.net> References: https://www.illumos.org/issues/5959 https://github.com/illumos/illumos-gate/commit/ca0cc39 Porting notes: illumos code doesn't check for feature_get_refcount() returning ENOTSUP (which means feature is disabled) in zdb. zfsonlinux added a check in https://github.com/zfsonlinux/zfs/commit/784652c due to #3468. The check was reintroduced here. Ported-by: Witaut Bajaryn <vitaut.bayaryn@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #3965	2015-12-04 14:20:20 -08:00
Brian Behlendorf	072484504f	Add zap_prefetch() interface Provide a generic interface to prefetch ZAP entries by name. This functionality is being added for external consumers such as Lustre. It is based off the existing zap_prefetch_uint64() version which is used by the deduplication code. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Richard Yao <ryao@gentoo.org> Closes #4061	2015-12-04 09:39:20 -08:00
tuxoko	b0fe1adeb1	Prevent rm modules.* when make install This was originally in `fe0ed8f910`, but somehow was changed and not working anymore. And it will cause the following error: modprobe: ERROR: ../libkmod/libkmod.c:506 lookup_builtin_file() could not open builtin file '/lib/modules/4.2.0-18-generic/modules.builtin.bin' Signed-off-by: Chunwei Chen <david.chen@osnexus.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #4027	2015-12-02 14:39:12 -08:00
Chunwei Chen	61d482f7cd	Linux 4.4 compat: xattr operations takes xattr_handler The xattr_handler->{list,get,set} were changed to take a xattr_handler, and handler_flags argument was removed and should be accessed by handler->flags. Signed-off-by: Chunwei Chen <david.chen@osnexus.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #4021	2015-12-01 16:48:25 -08:00
Chunwei Chen	1a09371678	Linux 4.4 compat: make_request_fn returns blk_qc_t As part of block polling support in Linux 4.4, make_request_fn should return a cookie value of type blk_qc_t. For now, we make zvol_request always return BLK_QC_T_NONE until we assess whether and how we want to support block polling. Signed-off-by: Chunwei Chen <david.chen@osnexus.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #4021	2015-12-01 16:48:08 -08:00
tuxoko	43518d92fd	Fix zfs_dirty_data_max overflow on 32-bit On 32-bit, the calculation of zfs_dirty_data_max from phymem will overflow, causing it to be smaller than zfs_dirty_data_sync, which causes txgs to be delayed while no one writes to disk. The end result is horrendous write speed. On a 4G RAM 32-bit VM, before this patch, a simple dd results in ~7MB/s. Now it can reach speeds on par with a 64-bit VM. Signed-off-by: Chunwei Chen <tuxoko@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #3973	2015-11-19 16:02:47 -08:00
tuxoko	d0c614ecf9	Fix null pointer in arc_kmem_reap_now on 32-bit On 32-bit systems, zio_buf_cache is limited to 1M. Entries for larger sizes are all NULL, so we need to avoid reaping them. Signed-off-by: Chunwei Chen <tuxoko@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #3973	2015-11-19 16:01:47 -08:00
Chunwei Chen	d287880afd	Fix snapshot automount behavior when concurrent or fail When concurrent threads access the snapdir, one will succeed in the user helper mount while the others will get EBUSY. However, the original code treats those EBUSY threads as success and goes on to do zfsctl_snapshot_add, which causes repeated avl_add and thus panic. Also, if the snapshot is already mounted somewhere else, a thread accessing the snapdir will also get EBUSY from user helper mount. And it will cause strange things as doing follow_down_one will fail and then follow_up will jump up to the mountpoint of the filesystem and confuse the hell out of VFS. The patch fixes both behaviors by returning 0 immediately for the EBUSY threads. Note, this will have a side effect for the second case where the VFS will retry several times before returning ELOOP. Signed-off-by: Chunwei Chen <david.chen@osnexus.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #4018	2015-11-19 15:36:59 -08:00
Brian Behlendorf	3d8d245fb3	Follow 0/-E convention for module load errors Because errors during module load are so rare it went unnoticed that it was possible that a positive errno was returned. This would result in the module being loaded, nothing being initialized, and a system panic shortly thereafter. This is what was causing the hard failures in the automated testing. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2015-11-16 16:10:06 -08:00
AndCycle	256fa983f4	Obey arc_meta_limit default size when changing arc_max When decreasing the maximum ARC size preserve the 3/4 default ratio for the arc_meta_limit. Otherwise, the arc_meta_limit may be set the same as arc_max. Signed-off-by: AndCycle <andcycle@andcycle.idv.tw> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #4001	2015-11-13 15:45:22 -08:00
Chunwei Chen	07d63f0cb9	Fix fail path in zfs_znode_alloc When sa_bulk_lookup() fails, unlock_new_inode() will spit out a WARNING. It will also recursively deadlock on ZFS_OBJ_HOLD_ENTER in zfs_zinactive(). Since we never call insert_inode_locked in fail path, I_NEW is never set, the inode is never hashed. So unlock_new_inode() can be safely removed. We set z_sa_hdl to NULL in fail path so that iput path will stop at zfs_inactive() without entering zfs_zinactive(). This way we can avoid the deadlock and prevent double sa_handle_destroy(). Signed-off-by: Chunwei Chen <david.chen@osnexus.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #3899	2015-10-13 15:57:17 -07:00
Chunwei Chen	aa159afb56	Fix use-after-free in vdev_disk_physio_completion Currently, vdev_disk_physio_completion will try to wake up a waiter without first checking that one exists. This creates a race window in which complete is called after dr is freed. We add dr_wait in dio_request to indicate the existence of a waiter. Also, remove dr_rw since no one is using it, and reorder dr_ref to make the struct more compact in 64bit. Signed-off-by: Chunwei Chen <david.chen@osnexus.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #3917 Issue #3880	2015-10-13 15:25:33 -07:00
Justin T. Gibbs	bc4501f75a	Illumos 6267 - dn_bonus evicted too early 6267 dn_bonus evicted too early Reviewed by: Richard Yao <ryao@gentoo.org> Reviewed by: Xin LI <delphij@freebsd.org> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Approved by: Richard Lowe <richlowe@richlowe.net> References: https://www.illumos.org/issues/6267 https://github.com/illumos/illumos-gate/commit/d205810 Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Ported-by: Ned Bass bass6@llnl.gov Issue #3865 Issue #3443	2015-10-13 14:12:02 -07:00
Brian Behlendorf	935434ef01	Fix 'arc_c < arc_c_min' panic Strictly enforce keeping 'arc_c >= arc_c_min'. The ASSERTs are left in place to catch this in a debug build but logic has been added to gracefully handle in a production build. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #3904	2015-10-13 09:23:35 -07:00
Richard Yao	919efe93cb	zfs_inode_update should not call dmu_object_size_from_db under spinlock We should never block when holding a spin lock, but zfs_inode_update can block in the critical section of a spin lock in zfs_inode_update: zfs_inode_update -> dmu_object_size_from_db -> zrl_add -> mutex_enter Signed-off-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #3858	2015-09-30 10:47:40 -07:00
Richard Yao	bc8ffb2d08	Remove obsolete zv_lock All users of zv_lock were removed by `37f9dac`, but we forgot to remove it. Let's remove it as cleanup. Signed-off-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #3858	2015-09-30 10:43:19 -07:00
Chunwei Chen	45838e3a41	Fix uioskip crash when skip to end When doing uioskip to skip an iovec to the very end, the current loop condition will incorrectly check past the end of the iovec. We fix this by checking uio_iovcnt first. Signed-off-by: Chunwei Chen <tuxoko@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #3806 Closes #3850	2015-09-29 10:06:58 -07:00
Richard Yao	b815ec32b3	Userspace can pass zero length segments via writev/readv Userspace can trigger an assertion by passing a zero-length segment when assertions are enabled: [27961.614792] VERIFY3(skip < iov->iov_len) failed (0 < 0) [27961.614795] PANIC at zfs_uio.c:187:uio_prefaultpages() [27961.614805] Call Trace: [27961.614811] dump_stack+0x45/0x57 [27961.614830] spl_dumpstack+0x44/0x50 [spl] [27961.614834] spl_panic+0xbb/0x100 [spl] [27961.614908] uio_prefaultpages+0x134/0x140 [zcommon] [27961.614930] zfs_write+0x1fd/0xe80 [zfs] [27961.615014] zpl_write_common_iovec+0x7f/0x110 [zfs] [27961.615035] zpl_iter_write+0xa0/0xd0 [zfs] [27961.615037] do_iter_readv_writev+0x59/0x80 [27961.615063] do_readv_writev+0x11b/0x260 [27961.615098] vfs_writev+0x39/0x50 [27961.615100] SyS_writev+0x4a/0xe0 [27961.615103] system_call_fastpath+0x16/0x6e The solution is to delete the assertion. This could potentially occur in uiomove as well, which contains analogous assertions that appear similarly unnecessary, so we remove those as well. Reported-by: Jonathan Vasquez <jvasquez1011@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Richard Yao <ryao@gentoo.org> Issue #3792	2015-09-25 12:51:16 -07:00
Brian Behlendorf	a3000f9358	Revert "dmu_objset_userquota_get_ids uses dn_bonus unsafely" This reverts commit `5f8e1e8505`. It was determined that this patch introduced the quota regression described in #3789. Signed-off-by: Tim Chase <tim@chase2k.com> Signed-off-by: Ned Bass <bass6@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #3443 Issue #3789	2015-09-25 12:50:24 -07:00
Brian Behlendorf	5592404784	Fix synchronous behavior in __vdev_disk_physio() Commit `b39c22b` set the READ_SYNC and WRITE_SYNC flags for a bio based on the ZIO_PRIORITY_* flag passed in. This had the unnoticed side-effect of making the vdev_disk_io_start() synchronous for certain I/Os. This in turn resulted in vdev_disk_io_start() being able to re-dispatch zio's which would result in a RCU stalls when a disk was removed from the system. Additionally, this could negatively impact performance and explains the performance regressions reported in both #3829 and #3780. This patch resolves the issue by making the blocking behavior dependent on a 'wait' flag being passed rather than overloading the passed bio flags. Finally, the WRITE_SYNC and READ_SYNC behavior is restricted to non-rotational devices where there is no benefit to queuing to aggregate the I/O. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #3652 Issue #3780 Issue #3785 Issue #3817 Issue #3821 Issue #3829 Issue #3832 Issue #3870	2015-09-25 12:47:31 -07:00
Brian Behlendorf	ef5b2e1048	Avoid blocking in arc_reclaim_thread() As described in the comment above arc_reclaim_thread() it's critical that the reclaim thread be careful about blocking. Just like it must never wait on a hash lock, it must never wait on a task which can in turn wait on the CV in arc_get_data_buf(). This will deadlock, see issue #3822 for full backtraces showing the problem. To resolve this issue arc_kmem_reap_now() has been updated to use the asynchronous arc prune function. This means that arc_prune_async() may now be called while there are still outstanding arc_prune_tasks. However, this isn't a problem because arc_prune_async() already keeps a reference count preventing multiple outstanding tasks per registered consumer. Functionally, this behavior is the same as the counterpart illumos function dnlc_reduce_cache(). Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tim Chase <tim@chase2k.com> Issue #3808 Issue #3834 Issue #3822	2015-09-25 12:45:47 -07:00
Brian Behlendorf	04870568e6	Disable zpl_nr_cached_objects() callback The zpl_nr_cached_objects() function has been disabled because in the current code it doesn't provide any critical functionality and it may result in a deadlock under certain circumstances. However, because we expect to need these hooks in the future this code has not been entirely removed. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #3719	2015-09-25 12:45:42 -07:00
Brian Behlendorf	d4787d55ad	Allow NFS activity to defer snapshot unmounts Accessing a snapshot via NFS should cause an auto-unmount of that snapshot to be deferred until such time as the snapshot is idle. This is analogous to the zpl_revalidate logic employed by locally mounted snapshots. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #3794	2015-09-25 12:45:38 -07:00
Lukas Wunner	784a7fe5d9	Linux 4.3 compat: bio_end_io_t / BIO_UPTODATE Commit torvalds/linux@4246a0b63b ("block: add a bi_error field to struct bio") dropped the error argument from bio_endio in favor of newly introduced bio->bi_error. This also replaces bio->bi_flags value BIO_UPTODATE. bio_endio was a 3 argument function until Linux 2.6.24, which made it a 2 argument function, and now the prototype has changed yet again to a 1 argument function. Support for pre 2.6.24 kernels was already dropped with `37f9dac592` ("zvol processing should use struct bio") which assumed the 2 argument version in zvol_request(). Remaining code to support the 3 argument version is hereby removed. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Lukas Wunner <lukas@wunner.de> Issue #3799	2015-09-25 12:44:54 -07:00
Don Brady	56b3986316	Add large block support to zpios(1) benchmark As part of the large block support effort, it makes sense to add support for large blocks to zpios(1). The specifying of a zfs block size for zpios is optional and will default to 128K if the block size is not specified. `zpios ... -S size \| --blocksize size ...` This will use size ZFS blocks for each test, specified as a comma delimited list with an optional unit suffix. The supported range is powers of two from 128K through 16M. A range of block sizes can be tested as follows: `-S 128K,256K,512K,1M` Example run below (non realistic results from a VM and output abbreviated for space) ``` --regioncount=750 --regionsize=8M --chunksize=1M --offset=4K --threaddelay=0 --cleanup --human-readable --verbose --cleanup --blocksize=128K,256K,512K,1M th-cnt rg-cnt rg-sz ch-sz blksz wr-data wr-bw rd-data rd-bw --------------------------------------------------------------------- 4 750 8m 1m 128k 5g 90.06m 5g 93.37m 4 750 8m 1m 256k 5g 79.71m 5g 99.81m 4 750 8m 1m 512k 5g 42.20m 5g 93.14m 4 750 8m 1m 1m 5g 35.51m 5g 89.36m 8 750 8m 1m 128k 5g 85.49m 5g 90.81m 8 750 8m 1m 256k 5g 61.42m 5g 99.24m 8 750 8m 1m 512k 5g 49.09m 5g 108.78m 16 750 8m 1m 128k 5g 86.28m 5g 88.73m 16 750 8m 1m 256k 5g 64.34m 5g 93.47m 16 750 8m 1m 512k 5g 68.84m 5g 124.47m 16 750 8m 1m 1m 5g 53.97m 5g 97.20m --------------------------------------------------------------------- ``` Signed-off-by: Don Brady <don.brady@intel.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #3795 Closes #2071	2015-09-22 09:13:20 -07:00
Ned Bass	3af56fd95f	Honor xattr=sa dataset property ZFS incorrectly uses directory-based extended attributes even when xattr=sa is specified as a dataset property or mount option. Support to honor temporary mount options including "xattr" was added in commit `0282c4137e`. There are two issues with the mount option handling: * Libzfs has historically included "xattr" in its list of default mount options. This overrides the dataset property, so the dataset is always configured to use directory-based xattrs even when the xattr dataset property is set to off or sa. Address this by removing "xattr" from the set of default mount options in libzfs. * There was no way to enable system attribute-based extended attributes using temporary mount options. Add the mount options "saxattr" and "dirxattr" which enable the xattr behavior their names suggest. This approach has the advantages of mirroring the valid xattr dataset property values and following existing conventions for mount option names. Signed-off-by: Ned Bass <bass6@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #3787	2015-09-19 14:04:14 -07:00
Brian Behlendorf	66aad10ce8	Fix NULL as mount(2) syscall data parameter Passing NULL for the mount data should not result in EINVAL. It should be treated as if an empty string were passed and succeed. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ned Bass <bass6@llnl.gov> Closes #3771	2015-09-19 14:03:01 -07:00
Richard Yao	f52ebcb3eb	Discard on zvols should not exceed the length of a block `37f9dac592` replaced the end-start calculation with a cached value, but neglected to update it on discard operations. This can cause us to discard data not requested, causing data loss on zvols. Reported-by: Richard Connon <richard.connon@zynstra.com> Signed-off-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #3798	2015-09-19 14:00:14 -07:00
Arne Jansen	4e0f33ffe0	Illumos 6214 - zpools going south 6214 zpools going south Reviewed by: Igor Kozhukhov <ikozhukhov@gmail.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Dan McDonald <danmcd@omniti.com> Reviewed by: Saso Kiselkov <skiselkov.ml@gmail.com> References: https://www.illumos.org/issues/6214 http://cr.illumos.org/~webrev/sensille/6214_zpools_going_south/ Porting Notes: Reintroduce b_compress to the l2arc_buf_hdr_t. In commit `b9541d6` the compression flags were moved to the generic b_flags in the arc_buf_hdr_t. This is a problem because l2arc_compress_buf() may manipulate the compression flags and this can only be done safely under the hash lock which is not held. See Illumos 6214 for a detailed analysis of the race. HDR_GET_COMPRESS() macro was removed from arc_buf_info(). Ported-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #3757	2015-09-11 11:14:38 -07:00
Brian Behlendorf	9965059ab9	Prefetch start and end of volumes When adding a zvol to the system prefetch zvol_prefetch_bytes from the start and end of the volume. Prefetching these regions of the volume is desirable because they are likely to be accessed immediately by blkid(8), the kernel scanning for a partition table, or another task which probes the devices. Signed-off-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #3659	2015-09-09 14:38:29 -07:00
Richard Yao	8198d18ca7	Reintroduce IO accounting on zvols on Linux 3.19+ zfsonlinux/zfs@e20cd6f7a8 caused us to lose IO accounting on zvols. When I originally wrote that last year, the symbols we needed to maintain IO accounting were GPL exported, but torvalds/linux@394ffa503b provided suitable symbols for restoring this functionality 4 months later. We can call them to restore the IO accounting on Linux 3.19 and later as well as any older kernels where that patch is backported. Signed-off-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #3741	2015-09-09 09:29:24 -07:00
Brian Behlendorf	3b36f8319d	Add dbgmsg kstat Internally ZFS keeps a small log to facilitate debugging. By default the log is disabled, to enable it set zfs_dbgmsg_enable=1. The contents of the log can be accessed by reading the /proc/spl/kstat/zfs/dbgmsg file. Writing 0 to this proc file clears the log. $ echo 1 >/sys/module/zfs/parameters/zfs_dbgmsg_enable $ echo 0 >/proc/spl/kstat/zfs/dbgmsg $ zpool import tank $ cat /proc/spl/kstat/zfs/dbgmsg 1 0 0x01 -1 0 2492357525542 2525836565501 timestamp message 1441141408 spa=tank async request task=1 1441141408 txg 70 open pool version 5000; software version 5000/5; ... 1441141409 spa=tank async request task=32 1441141409 txg 72 import pool version 5000; software version 5000/5; ... 1441141414 command: lt-zpool import tank Note the zfs_dbgmsg() and dprintf() functions are both now mapped to the same log. As mentioned above the kernel debug log can be accessed through the /proc/spl/kstat/zfs/dbgmsg kstat. For user space consumers log messages are immediately written to stdout after applying the ZFS_DEBUG environment variable. $ ZFS_DEBUG=on ./cmd/ztest/ztest -V Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ned Bass <bass6@llnl.gov> Closes #3728	2015-09-04 16:08:14 -07:00
Brian Behlendorf	0500e835af	Support accessing .zfs/snapshot via NFS This patch is based on the previous work done by @andrey-ve and @yshui. It triggers the automount by using kern_path() to traverse to the known snapshot mount point. Once the snapshot is mounted NFS can access the contents of the snapshot. Allowing NFS clients to access the .zfs/snapshot directory would normally mean that a root user on a client mounting an export with 'no_root_squash' would be able to use mkdir/rmdir/mv to manipulate snapshots on the server. To prevent configuration mistakes a zfs_admin_snapshot module option was added which disables the mkdir/rmdir/mv functionality. System administrators desiring this functionality must explicitly enable it. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #2797 Closes #1655 Closes #616	2015-09-04 13:23:53 -07:00
Andrey Vesnovaty	aa9b27080b	Fix invalid fileid for snapshot root dentry Prevents NFS client from detection of different fileids of snapshot root dentry before & after snapshot mount. Signed-off-by: Andrey Vesnovaty <andrey.vesnovaty@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2015-09-04 13:23:06 -07:00
Brian Behlendorf	e20cd6f7a8	Merge branch 'zvol' Performance improvements for zvols. Signed-off-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #3720	2015-09-04 13:14:21 -07:00
Richard Yao	fa56567630	Support secure discard on zvols Linux 2.6.36 introduced REQ_SECURE to indicate when discards must be processed, such that we cannot do optimizations like block alignment. Consequently, the discard semantics prior to 2.6.36 require us to always process unaligned discards. Previously, we would do this optimization regardless. This patch changes things to correctly restrict this optimization to situations where REQ_SECURE exists, but is not included in the flags. Signed-off-by: Richard Yao <ryao@gentoo.org>	2015-09-04 15:37:24 -04:00
Richard Yao	37f9dac592	zvol processing should use struct bio Internally, zvols are files exposed through the block device API. This is intended to reduce overhead when things require block devices. However, the ZoL zvol code emulates a traditional block device in that it has a top half and a bottom half. This is an unnecessary source of overhead that does not exist on any other OpenZFS platform. This patch removes it. Early users of this patch reported double digit performance gains in IOPS on zvols in the range of 50% to 80%. Comments in the code suggest that the current implementation was done to obtain IO merging from Linux's IO elevator. However, the DMU already does write merging while arc_read() should implicitly merge read IOs because only 1 thread is permitted to fetch the buffer into ARC. In addition, commercial ZFSOnLinux distributions report that regular files are more performant than zvols under the current implementation, and the main consumers of zvols are VMs and iSCSI targets, which have their own elevators to merge IOs. Some minor refactoring allows us to register zfs_request() as our ->make_request() handler in place of the generic_make_request() function. This eliminates the layer of code that broke IO requests on zvols into a top half and a bottom half. This has several benefits: 1. No per zvol spinlocks. 2. No redundant IO elevator processing. 3. Interrupts are disabled only when actually necessary. 4. No redispatching of IOs when all taskq threads are busy. 5. Linux's page out routines will properly block. 6. Many autotools checks become obsolete. An unfortunate consequence of eliminating the generic_make_request() layer is that we no longer call the instrumentation hooks for block IO accounting. Those hooks are GPL-exported, so we cannot call them ourselves and consequently, we lose the ability to do IO monitoring via iostat. Since zvols are internally files mapped as block devices, this should be okay. 
Anyone who is willing to accept the performance penalty for the block IO layer's accounting could use the loop device in between the zvol and its consumer. Alternatively, perf and ftrace likely could be used. Also, tools like latencytop will still work. Tools such as latencytop sometimes provide a better view of performance bottlenecks than the traditional block IO accounting tools do. Lastly, if direct reclaim occurs during spacemap loading and swap is on a zvol, this code will deadlock. That deadlock could already occur with sync=always on zvols. Given that swap on zvols is not yet production ready, this is not a blocker. Signed-off-by: Richard Yao <ryao@gentoo.org>	2015-09-04 15:30:24 -04:00
Tim Chase	dca8c34da4	Prevent reclaim in the traverse prefetch thread Reclaim in the traverse prefetch thread, which is run on the system taskq, can overrun the stack. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tim Chase <tim@chase2k.com> Closes #3733	2015-09-04 08:43:28 -07:00
Brian Behlendorf	0282c4137e	Add temporary mount options Add the required kernel side infrastructure to parse arbitrary mount options. This enables us to support temporary mount options in largely the same way it is handled on other platforms. See the 'Temporary Mount Point Properties' section of zfs(8) for complete details. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #985 Closes #3351	2015-09-03 14:14:55 -07:00
Tim Chase	69de34219a	Dbuf hash table should be sized as is the arc hash table Commit `49ddb31506` added the zfs_arc_average_blocksize parameter to allow control over the size of the arc hash table. The dbuf hash table's size should be determined similarly. Signed-off-by: Tim Chase <tim@chase2k.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #3721	2015-09-02 09:33:02 -07:00
Brian Behlendorf	6cde64351e	Add spa_slop_shift module option Allow for easy tuning of a pool's reserved free space. Previous versions of ZFS (v0.6.4 and earlier) held 1/64 of the pool's capacity in reserve. Commits `3d45fdd` and `0c60cc3` increased this to 1/32. Setting spa_slop_shift=6 will restore the previous default setting. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #3724	2015-09-02 09:30:18 -07:00
Richard Yao	fb40095f5f	Disable LBA weighting on files and SSDs The LBA weighting makes sense on rotational media where the outer tracks have twice the bandwidth of the inner tracks. However, it is detrimental on nonrotational media such as solid state disks, where the only effect is to ensure that metaslabs enter the best-fit allocation behavior sooner, which is detrimental to performance. It also makes no sense on files where the underlying filesystem can arrange things however it wants. Signed-off-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #3712	2015-09-01 15:22:07 -07:00
tuxoko	cafbd2aca3	Check for RW_WRITE_HELD in zfs_inactive Before read locking z_teardown_inactive_lock, we need to check if we already hold the write lock on it. Otherwise, we would deadlock on ourselves when doing rollback: zfs_ioc_rollback ->zfs_suspend_fs (z_teardown_inactive_lock, RW_WRITER) ->zfs_resume_fs->zfs_rezget->zfs_iput_async->iput-> ... ->zfs_inactive (z_teardown_inactive_lock, RW_READER) Signed-off-by: Chunwei Chen <tuxoko@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #2869	2015-09-01 10:17:57 -07:00
Brian Behlendorf	324dcd3733	Linux 4.2 compat: misc_deregister() The misc_deregister() function was changed to a void return type. Rather than add compatibility code to detect this change simply ignore the return code on all kernels. It was only used to log an informational error message of no real value. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2015-09-01 09:33:18 -07:00
Brian Behlendorf	278bee9319	Linux 3.18 compat: Snapshot auto-mounting Re-factor the .zfs/snapshot auto-mounting code to take into account changes made to the upstream kernels, and to lay the groundwork for enabling access to .zfs snapshots via NFS clients. This patch makes the following core improvements. * All actively auto-mounted snapshots are now tracked in two global trees which are indexed by snapshot name and objset id respectively. This allows for fast lookups of any auto-mounted snapshot without needing access to the parent dataset. * Snapshot entries are added to the tree in zfsctl_snapshot_mount(). However, they are now removed from the tree in the context of the unmount process. This eliminates the need for complicated error logic in zfsctl_snapshot_unmount() to handle unmount failures. * References are now taken on the snapshot entries in the tree to ensure they always remain valid while a task is outstanding. * The MNT_SHRINKABLE flag is set on the snapshot vfsmount_t right after the auto-mount succeeds. This allows the kernel to unmount idle auto-mounted snapshots if needed, removing the need for the zfsctl_unmount_snapshots() function. * Snapshots in active use will not be automatically unmounted. As long as at least one dentry is revalidated every zfs_expire_snapshot/2 seconds the auto-unmount expiration timer will be extended. * Commit torvalds/linux@bafc9b7 caused snapshots auto-mounted by ZFS to be immediately unmounted when the dentry was revalidated. This was a consequence of ZFS invalidating all snapdir dentries to ensure that negative dentries didn't mask new snapshots. This patch modifies the behavior such that only negative dentries are invalidated. This solves the issue and may result in a performance improvement. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #3589 Closes #3344 Closes #3295 Closes #3257 Closes #3243 Closes #3030 Closes #2841	2015-08-31 13:54:39 -07:00
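
The reserve fractions quoted in commit 6cde64351e (1/64 of pool capacity, raised to 1/32, with spa_slop_shift=6 restoring the older value) are consistent with the reserve being the capacity right-shifted by spa_slop_shift. The sketch below only illustrates that arithmetic. The function name `slop_space` is hypothetical, and treating 5 as the shift that yields 1/32 is an inference from the commit message, not a statement about the module's actual code.

```python
def slop_space(pool_bytes: int, slop_shift: int) -> int:
    """Return the reserved space, assuming reserve = capacity >> shift.

    Hypothetical helper for illustration only; it is not a ZFS API.
    """
    if slop_shift < 0:
        raise ValueError("slop_shift must be non-negative")
    # Shifting right by n divides by 2**n, so shift 6 gives 1/64, shift 5 gives 1/32.
    return pool_bytes >> slop_shift


GIB = 2 ** 30
pool = 64 * GIB  # example pool size, chosen so the fractions are whole GiB

reserve_old = slop_space(pool, 6)  # 1/64 of the pool
reserve_new = slop_space(pool, 5)  # 1/32 of the pool (inferred shift)
print(reserve_old // GIB, reserve_new // GIB)
```

Under this assumption, each decrement of the shift doubles the reserved space, which matches the move from 1/64 to 1/32 described in the commit message.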


1130 Commits