mirror_zfs

mirror of https://git.proxmox.com/git/mirror_zfs.git synced 2026-05-28 17:39:23 +03:00

Author	SHA1	Message	Date
Brian Behlendorf	1ce0457348	Fix style A minor style issue was accidentally introduced by `aa7d06a`. This change resolves that style problem. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2014-05-06 10:41:17 -07:00
George Wilson	aa7d06a98a	Illumos #4101 finer-grained control of metaslab_debug Today the metaslab_debug logic performs two tasks: - load all metaslabs on import/open - don't unload metaslabs at the end of spa_sync This change provides knobs for each of these independently. References: https://illumos.org/issues/4101 https://github.com/illumos/illumos-gate/commit/0713e23 Notes: 1) This is a small piece of the metaslab improvement patch from Illumos. It was worth bringing over before the rest, since it's low risk and it can be useful on fragmented pools (e.g. Lustre MDTs). metaslab_debug_unload would give the performance benefit of the old metaslab_debug option without causing unwanted delay during pool import. Ported-by: Ned Bass <bass6@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #2227	2014-05-06 09:46:04 -07:00
Brian Behlendorf	cc79a5c263	Treat spill block dbufs as meta data When the system attributes (SAs) for an object exceed what can be stored in the bonus area of a dnode a spill block is allocated. These spill blocks are currently considered data blocks. However, they should be accounted for as meta data because they are effectively an extension of the dnode. While this may seem like a minor accounting issue it has broader implications. The key thing to be aware of is that each spill block will hold a reference on its parent dnode. The dnode in turn holds a reference on its dbuf in the dnode object. This means that a single 512 byte data buffer for a spill block can pin over 16k of meta data. This is analogous to the small file situation described in `2b13331` where a relatively small number of data buffers can cause the ARC to exceed the meta limit. However, unlike the small file case a spill block can legitimately be considered meta data. By changing the spill block to meta data they will now be dropped from the cache when the meta limit is reached. This then allows the dnodes and dbufs which the spill block was pinning to be released. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Prakash Surya <surya1@llnl.gov> Closes #2294	2014-05-05 13:56:59 -07:00
Brian Behlendorf	51268f31a8	Remove SELinux enforcing check from init scripts The default SELinux policy for RHEL and Fedora has been updated to include ZFS in the list of filesystems which support xattrs. Therefore, there's no longer a need to detect this in the init scripts. References: https://bugzilla.redhat.com/show_bug.cgi?id=811532 https://bugzilla.redhat.com/show_bug.cgi?id=816543 Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #2166	2014-05-02 11:37:46 -07:00
Richard Yao	7809eb8b65	ztest: Switch to LWP rwlock interface ztest is intended to subject the ZFS code in userland to stress that it should be able to withstand. Any failures that occur when running it are failures that likely would occur inside the kernel. However, being in userland, it is much easier to debug them. In practice, this prevents a large number of problems from reaching production code. A design decision was made by the original authors of ztest to make a distinction between userland locking primitives and kernel locking primitives. The ztest code itself calls userland locking primitives while the kernel code being run in userland will call emulated kernel locking primitives that wrap the userland locking primitives. When ztest was first ported to Linux, a decision was made to use the emulated kernel interfaces everywhere. In effect, the userland rw_rdlock()/rw_wrlock() became the kernel rw_enter() and the userland rw_unlock() became the kernel rw_exit(). This caused a regression because of an assertion in rw_enter() to catch recursive locking. That is permitted in userland, but not in the kernel. Consequently, the ztest code itself does recursive read locking. The use of the emulated kernel interfaces consequently caused the following failure: ztest: ../../lib/libzpool/kernel.c:384: Assertion `rwlp->rw_owner != zk_thread_current() (0x1c87150 != 0x1c87150)' failed. That occurs because ztest_dmu_objset_create_destroy() will take a read lock and call ztest_dmu_object_alloc_free(). That will call ztest_io(), which will take a readlock only when asked to do ZTEST_IO_REWRITE. This triggered the assertion. The pthreads rwlock interface was based on the LWP rwlock interface implemented in Illumos libc. Luckily enough, the subset used by ztest is almost identical, so we can solve this problem by switching to the LWP thread rwlock interface in ztest. This eliminates a point of divergence with Illumos and should make code sharing slightly easier. Signed-off-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1970	2014-05-01 15:53:58 -07:00
Richard Yao	3af3df905f	libspl: Implement LWP rwlock interface This implements a subset of the LWP rwlock interface by wrapping the equivalent POSIX thread interface. It is a superset of the features needed by ztest. The missing bits are {,_}rw_read_held() and {,_}rw_write_held(). Signed-off-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #1970	2014-05-01 15:53:52 -07:00
Richard Yao	c6e924fea8	Fix libblkid ZFS detection when making new pools zfsonlinux/zfs@1db7b9be75 should have fixed this, but this particular string was overlooked. Signed-off-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #2288	2014-05-01 13:26:33 -07:00
Brian Behlendorf	12f9a6a3f9	dmu_tx_assign() should not return ENOMEM As described in the comment above dmu_tx_assign() this function must only fail if the pool is out of space. If for some other reason the TX cannot be assigned (such as memory pressure) ERESTART must be returned. Alternately, EAGAIN could be returned to inject a delay but that isn't required because the caller will block on the condition variable waiting for the next TXG. /* * Assign tx to a transaction group. txg_how can be one of: * * (1) TXG_WAIT. If the current open txg is full, waits until there's * a new one. This should be used when you're not holding locks. * It will only fail if we're truly out of space (or over quota). * ... */ Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ned Bass <bass6@llnl.gov> Closes #2287	2014-05-01 12:08:53 -07:00
Richard Yao	9d317793aa	Implement File Attribute Support We add support for lsattr and chattr to resolve a regression caused by `88c283952f` that broke Python's xattr.list(). That change broke Gentoo Portage's FEATURES=xattr, which depended on Python's xattr.list(). Only attributes common to both Solaris and Linux are supported. These are 'a', 'd' and 'i' in Linux's lsattr and chattr commands. File attributes exclusive to Solaris are present in the ZFS code, but cannot be accessed or modified through this method. That was the case prior to this patch. The resolution of issue zfsonlinux/zfs#229 should implement some method to permit access and modification of Solaris-specific attributes. References: https://bugs.gentoo.org/show_bug.cgi?id=483516 Original-patch-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1691	2014-05-01 10:11:18 -07:00
Richard Yao	3b4f425a5a	Refactor inode_owner_or_capable() autotools check We need inode_owner_or_capable() for ZFS file attributes in addition to xattrs, so it should go into its own file. This moves it into its own file and changes it to be more comprehensive. It will now fail if no known good API is detected. Signed-off-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #1691	2014-05-01 10:06:49 -07:00
ilovezfs	78597769b4	Fill in mountpoint buffer before using it in errors zfs_is_mountable() fills in the mountpoint buffer, so, as in upstream, it needs to have been called before the mountpoint buffer can be used in error messages. In particular, return (zfs_error_fmt(hdl, EZFS_MOUNTFAILED, dgettext(TEXT_DOMAIN, "cannot mount '%s'"), mountpoint)); should not come before the call to zfs_is_mountable(). Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: ilovezfs <ilovezfs@icloud.com> Closes #2284	2014-04-30 15:52:01 -07:00
Chunwei Chen	17584980b9	Add assertion to catch 0-count page Some network related block device uses tcp_sendpage, which doesn't behave well when using 0-count page. Add assertion to catch them. This has a runtime dependency on: zfsonlinux/spl@ae16ed9 Fix crash when using ZFS on Ceph rbd Signed-off-by: Chunwei Chen <tuxoko@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #2277	2014-04-25 15:41:19 -07:00
Jorgen Lundman	cdf37f0c59	Add support for aarch64 (ARMv8) Using the ARM reference simulation (fast model foundation v8) I cross compiled spl and zfs, to confirm it works on ARMv8 (64 bit arm architecture, called aarch64 in Linux). As it is based on previous ARM porting, the resulting patch is disappointingly small, there was very little to do. The code fixes the compile issues and has light testing done. Signed-off-by: Jorgen Lundman <lundman@lundman.net> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #2260	2014-04-25 15:35:30 -07:00
Andrey Vesnovaty	703371d8c7	Evenly distribute the taskq threads across available CPUs The problem is described in commit `aeeb4e0c0a`. However, instead of disabling the binding to CPU altogether we just keep the last CPU index across calls to taskq_create() and thus achieve even distribution of the taskq threads across all available CPUs. The implementation is based on the assumption that task queue initialization is performed serially. Signed-off-by: Andrey Vesnovaty <andrey.vesnovaty@gmail.com> Signed-off-by: Andrey Vesnovaty <andreyv@infinidat.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #336	2014-04-25 15:29:18 -07:00
Chunwei Chen	ae16ed992b	Fix crash when using ZFS on Ceph rbd When using __get_free_pages to get high order memory, only the first page's _count will be set to 1; the others will be 0. When an internal page gets passed into rbd, it will eventually go into tcp_sendpage. There, it will be called with get_page and put_page, and gets freed erroneously when _count jumps back to 0. The solution to this problem is to use a compound page. All pages in a high order compound page share a single _count. So get_page and put_page in tcp_sendpage will not cause _count to jump to 0. Signed-off-by: Chunwei Chen <tuxoko@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #251	2014-04-25 15:26:52 -07:00
Jorgen Lundman	d6e6e4a98e	Add support for aarch64 (ARMv8) Using the ARM reference simulation (fast model foundation v8) I cross compiled spl and zfs, to confirm it works on ARMv8 (64 bit arm architecture, called aarch64 in Linux). As it is based on previous ARM porting, the resulting patch is disappointingly small, there was very little to do. The code fixes the compile issues and has light testing done. Signed-off-by: Jorgen Lundman <lundman@lundman.net> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #351	2014-04-25 15:25:32 -07:00
Ned Bass	de39ec11b8	Fix LZ4 endianness autodetection Endianness detection in LZ4 is broken in user-space builds. This bug corrupts compressed data and manifests itself in several ztest failures. When LZ4 was originally ported to Illumos ZFS, the proper checks for Linux were stripped out. The Linux port then inherited the remaining detection code that works on Illumos but not on Linux. The current LZ4 endianness check misuses the condition defined(__BIG_ENDIAN) to indicate a big-endian system. On Linux __BIG_ENDIAN is defined unconditionally in the user-space header /usr/include/endian.h, regardless of the endianness of the system. The kernel does not use this header, so only user-space builds are affected. While we could fix this by restoring the upstream LZ4 endianness detection code, reliable checks already exist in libspl/include/sys/isa_defs.h. This change uses the libspl results to replace the word-size and endianness checks in LZ4, simplifying the code and reducing duplication. Signed-off-by: Ned Bass <bass6@llnl.gov> Signed-off-by: Chunwei Chen <tuxoko@gmail.com> Signed-off-by: DHE <git@dehacked.net> Signed-off-by: Prakash Surya <surya1@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Fixes #1963 Fixes #1964 Fixes #1965	2014-04-20 16:55:42 -07:00
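The pitfall described in the LZ4 endianness commit above can be illustrated with a small user-space sketch. This is not the code from the commit (which relies on libspl's isa_defs.h); it assumes a glibc-style <endian.h> that defines __BYTE_ORDER and __BIG_ENDIAN, and the function names are hypothetical.

```c
#include <endian.h>	/* assumption: glibc-style header providing __BYTE_ORDER */
#include <stdint.h>

/*
 * Compile-time answer. Testing defined(__BIG_ENDIAN) would be wrong here,
 * because the header defines that macro on every system; the value of
 * __BYTE_ORDER has to be compared against it instead.
 */
static int
compile_time_big_endian(void)
{
#if __BYTE_ORDER == __BIG_ENDIAN
	return (1);
#else
	return (0);
#endif
}

/* Run-time cross-check: inspect the first byte of a known 16-bit value. */
static int
runtime_big_endian(void)
{
	const uint16_t probe = 0x0102;

	return (*(const unsigned char *)&probe == 0x01);
}
```

On a correctly configured build both functions should agree; a check written as `#if defined(__BIG_ENDIAN)` would report big-endian everywhere such a header is included.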
Brian Behlendorf	4fd762f8ad	Fix zfsdev_ioctl() kmem leak warning Due to an asymmetry in the kmem accounting a memory leak was being reported when it was only an accounting issue. All memory allocated with kmem_alloc() must be released with kmem_free() or it will not be properly accounted for. In this case the code used strfree() to release the memory allocated by kmem_alloc(). Presumably this was done because the size of the memory region wasn't available when the memory needed to be freed. To resolve this issue the code has been updated to use strdup() instead of kmem_alloc() to allocate the memory. Like strfree(), strdup() is not integrated with the memory accounting. This means we can use strfree() to release it like Illumos. SPL: kmem leaked 10/4368729 bytes address size data func:line ffff880067e9aa40 10 ZZZZZZZZZZ zfsdev_ioctl:5655 Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tim Chase <tim@chase2k.com> Signed-off-by: Chunwei Chen <tuxoko@gmail.com> Closes #2262	2014-04-18 13:30:15 -07:00
Brian Behlendorf	e0b8f62902	Various zimport.sh fixes 1) $SPLSRC and $SRCDIR should be changed to $SRC_DIR. These are vestiges of an earlier version of the script and were missed when it was updated. Additionally ensure the directory is created. 2) The 'fail' function should take an integer argument for the error code to return. Otherwise 0 (success) will be mistakenly returned and errors will be incorrectly suppressed. The error code should be meaningful enough to determine where the script failed. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2014-04-17 09:30:55 -07:00
Tim Chase	b066274a77	Report atime and relatime as the property's actual value. Neither atime nor relatime should be considered to be "temporary mount point properties". Their semantics are enforced completely within ZFS and also they're (correctly) not documented as being temporary mount point properties. Signed-off-by: Tim Chase <tim@chase2k.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #2257	2014-04-16 11:57:17 -07:00
DHE	2dbedf5484	Uninitialized variable spa_autoreplace used Caught by ztest and valgrind. Signed-off-by: DHE <git@dehacked.net> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #2259	2014-04-16 10:59:24 -07:00
Richard Yao	89aa97059d	Change spl_kmem_cache_expire default setting to 2 This behavior is more consistent with the way memory reclaim is expected to work under Linux. Signed-off-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #349	2014-04-14 16:29:01 -07:00
Chunwei Chen	0b75bdb369	Use ddi_time_after and friends to compare time Also, make sure we use clock_t for ddi_get_lbolt to prevent type conversion from screwing things. Signed-off-by: Chunwei Chen <tuxoko@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #2142	2014-04-14 13:27:56 -07:00
Andrey Vesnovaty	bdfbe594a1	Expose max/min objs per slab and max slab size By default maximal number of objects in slab can't exceed (162 - 1) and slab size can't exceed 32M. Today's high end servers having couple hundreds of RAM available for ARC may run into a trouble with virtual memory because of the restriction mentioned above. Problem: Reasons for very high number of virtual memory allocations: Real slab size very small relative to the size of the entire RAM * Slabs allocated on virtual memory and fill entire ARC The result is very high number of allocated virtual memory ranges (hundreds of ranges). When virtual memory subsystem manages high number of ranges its performance become so poor that it freezes from time to time. Solution: Number of objects per slab should be increased taking into account maximal slab size which can also be increased if needed. Signed-off-by: Andrey Vesnovaty <andrey.vesnovaty@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #337	2014-04-14 09:42:04 -07:00
Chunwei Chen	545e9ac00a	Add ddi_time_after and friends When comparing times obtained from ddi_get_lbolt, we have to account for jiffies wraparound. Therefore, we cannot use 't1 < t2'. Instead we should use 't1 - t2 < 0'. This patch adds ddi_time_after and friends to address this issue. They have strict type restrictions, clock_t for vanilla and int64_t for the 64 version, to prevent type conversion from screwing things up. Signed-off-by: Chunwei Chen <tuxoko@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #335	2014-04-14 09:32:01 -07:00
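The 't1 - t2 < 0' idiom from the ddi_time_after commit above can be sketched in plain C. The names tick_t, tick_after and tick_before are hypothetical stand-ins, not the SPL macros, and a 32-bit counter is assumed purely to make the wraparound easy to see.

```c
#include <stdint.h>

/* Hypothetical 32-bit tick counter standing in for a jiffies-style clock. */
typedef int32_t tick_t;

/*
 * Nonzero if tick a is later than tick b, even when the counter has wrapped
 * in between. The subtraction is done on unsigned values, where wraparound
 * is well defined, and the result is then read as a signed difference
 * (this assumes the usual two's-complement conversion).
 */
static int
tick_after(tick_t a, tick_t b)
{
	return ((int32_t)((uint32_t)b - (uint32_t)a) < 0);
}

/* Nonzero if tick a is earlier than tick b. */
static int
tick_before(tick_t a, tick_t b)
{
	return (tick_after(b, a));
}
```

For example, with b = INT32_MAX - 5 taken just before the wrap and a = INT32_MIN + 5 taken just after it, a plain `a > b` is false while tick_after(a, b) is true. The comparison is only meaningful while the two ticks are less than half the counter range apart.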
Yuxuan Shui	6c48cd8ac2	This patch adds a CTASSERT macro for compile-time assertions. This macro makes the compiler emit a "mixed definition and code" warning; I can't find a way to avoid it. This patch lays some groundwork for the persistent l2arc feature. See https://www.illumos.org/issues/3525. Signed-off-by: Yuxuan Shui <yshuiv7@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #303	2014-04-14 09:28:53 -07:00
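For readers unfamiliar with compile-time assertions, one common way to build such a macro is a typedef of an array whose size goes negative when the condition is false. The sketch below uses that technique under hypothetical DEMO_ names; it is not the CTASSERT definition added by the commit above, whose implementation is not shown here.

```c
#include <limits.h>
#include <stdint.h>

/* Two-step paste so that __LINE__ is expanded before concatenation. */
#define	DEMO_CTASSERT_CAT_(a, b)	a##b
#define	DEMO_CTASSERT_CAT(a, b)		DEMO_CTASSERT_CAT_(a, b)

/*
 * Declares an array type of size 1 when cond holds and of size -1 when it
 * does not; a negative array size is a compile error, so a false condition
 * stops the build.
 */
#define	DEMO_CTASSERT(cond) \
	typedef char DEMO_CTASSERT_CAT(demo_ctassert_line_, __LINE__) \
	    [(cond) ? 1 : -1]

/* Both conditions hold on common platforms, so this compiles cleanly. */
DEMO_CTASSERT(sizeof (uint64_t) == 8);
DEMO_CTASSERT(CHAR_BIT == 8);
```

Because the macro expands to a declaration, using it after statements inside a function body is what can trigger a "mixed declarations and code" style warning on older C dialects.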
Richard Yao	acf0ade362	Simplify hostid logic There is plenty of compatibility code for a hw_hostid that isn't used by anything. At the same time, there are apparently issues with the current hostid logic. coredumb in #zfsonlinux on freenode reported that Fedora 17 changes its hostid on every boot, which required force importing his pool. A suggestion by wca was to adopt FreeBSD's behavior, where it treats hostid as zero if /etc/hostid does not exist. Adopting FreeBSD's behavior permits us to eliminate plenty of code, including a userland helper that invokes the system's hostid as a fallback. Signed-off-by: Richard Yao <ryao@cs.stonybrook.edu> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #224	2014-04-14 09:04:41 -07:00
Tim Chase	3ceb71e896	Call kthread_create() correctly with fixed arguments. The kernel's kthread_create() function is defined as "..." and there is no va_list variant at the moment. The task name is pre-formatted into a local buffer and passed to kthread_create() with fixed arguments. Signed-off-by: Chunwei Chen <tuxoko@gmail.com> Signed-off-by: Tim Chase <tim@chase2k.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #347	2014-04-11 09:41:40 -07:00
Brian Behlendorf	888f7141a3	Make zimport.sh bash dependency explicit Unfortunately, the zimport.sh test script really does depend on bash. Moving to /bin/sh should be possible once the shared infrastructure scripts it depends on is made portable. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2014-04-10 16:07:59 -07:00
Chunwei Chen	b761912b34	Linux 3.14 compat: rq_for_each_segment in dmu_req_copy rq_for_each_segment changed from taking bio_vec * to taking bio_vec. We provide rq_for_each_segment4 which takes both. Signed-off-by: Chunwei Chen <tuxoko@gmail.com> Signed-off-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #2124	2014-04-10 14:28:51 -07:00
Chunwei Chen	22760eebef	Revert "Fix zvol+btrfs hang" After the dmu_req_copy change, bi_io_vecs are not touched, so this is no longer needed. This reverts commit `e26ade5101`. Signed-off-by: Chunwei Chen <tuxoko@gmail.com> Signed-off-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #2124	2014-04-10 14:28:47 -07:00
Chunwei Chen	215b4634c7	Refactor dmu_req_copy for immutable biovec changes Originally, dmu_req_copy modifies bv_len and bv_offset in bio_vec so that it can continue in subsequent passes. However, after the immutable biovec changes in Linux 3.14, this is not allowed. So instead, we just tell dmu_req_copy how many bytes are already copied and it will skip to the right spot accordingly. Signed-off-by: Chunwei Chen <tuxoko@gmail.com> Signed-off-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #2124	2014-04-10 14:28:43 -07:00
Chunwei Chen	d4541210f3	Linux 3.14 compat: Immutable biovec changes in vdev_disk.c bi_sector, bi_size and bi_idx are moved from bio to bio->bi_iter. This patch creates BIO_BI_*(bio) macros to hide the differences. Signed-off-by: Chunwei Chen <tuxoko@gmail.com> Signed-off-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #2124	2014-04-10 14:28:38 -07:00
Chunwei Chen	408ec0d2e1	Linux 3.14 compat: posix_acl_{create,chmod} posix_acl_{create,chmod} is changed to __posix_acl_{create_chmod} Signed-off-by: Chunwei Chen <tuxoko@gmail.com> Signed-off-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #2124	2014-04-10 14:27:03 -07:00
Brian Behlendorf	443c3f7332	Improve zfs.sh error messages Ensure an error message is logged when the 'zfs.sh' script fails to either load a module or if udev fails to create the /dev/zfs device. Error messages for missing KERNEL_MODULES are suppressed because that functionality may just be built-in to the kernel. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2014-04-10 14:27:00 -07:00
Tim Chase	ed650dee76	De-inline spl_kthread_create(). The function was defined as a static inline with variable arguments which causes gcc to generate errors on some distros. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tim Chase <tim@chase2k.com> Closes #346	2014-04-09 19:17:12 -07:00
Chris Dunlap	6ac770b196	Replace zed_file_create_dirs() with mkdirp() When processing directory components starting from the root dir, zed_file_create_dirs() contained a bug in checking the return value of mkdir(). A typo was made, and the test for (mkdir_errno != EEXIST) was erroneously written as (mkdir_errno == EEXIST). If some of the leading directory components already existed, this bug would cause the routine to exit before creating the remaining directory components. Instead of fixing the above mkdir_errno test, this commit replaces zed_file_create_dirs() with mkdirp(). This cleanup was already planned, and zed_file_create_dirs() only existed because I didn't realize mkdirp() was already in tree at the time. Signed-off-by: Chris Dunlap <cdunlap@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #2248	2014-04-09 13:32:54 -07:00
Chris Dunlap	7368eb621e	Set errno for mkdirp() called with NULL path ptr If mkdirp() is called with a NULL ptr for the path arg, it will return -1 with errno unchanged. This is unexpected since on error it should return -1 and set errno to one of the error values listed for mkdir(2). This commit sets errno = ENOENT for this NULL ptr case. This is in accordance with the errors specified by mkdir(2): ENOENT A component of the path prefix does not exist or is a null pathname. Signed-off-by: Chris Dunlap <cdunlap@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #2248	2014-04-09 13:32:22 -07:00
Brian Behlendorf	cc9ee13e1a	Dynamically create loop devices Several of the in-tree regression tests depend on the availability of loop devices. If for some reason no loop devices are available the tests will fail. Normally this isn't an issue because most Linux distributions create 8 loop devices by default. This is enough for our purposes. However, recent Fedora releases have only been creating a single loop device and this leads to failures. Alternately, if something else on the system is using the loop devices we may see failures. The fix for this is to update the support scripts to dynamically create loop devices as needed. The scripts need only create a node under /dev/ and the loop driver will create the minor. This behavior has been supported by the loop driver for ages. Additionally this patch updates cleanup_loop_devices() to cleanup loop devices which have already had their file store deleted. This helps prevent stale loop devices from accumulating on the system due to test failures. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Prakash Surya <surya1@llnl.gov> Closes #2249	2014-04-09 13:29:32 -07:00
Richard Yao	787c455ed7	Improve partition detection on lesser used devices The format strings in efi_get_info() are intended to extract both the main device and partition number. However, this is only done correctly for hd, sd and vd devices. The format strings for ram, dm-, md and loop devices misparse the input. This causes the partition device to be incorrectly labelled as the main device with the partition being labelled 0. Reported-by: ilovezfs <ilovezfs@icloud.com> Signed-off-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #2175	2014-04-08 14:45:12 -07:00
Tim Chase	17a527cb0f	Support post-3.13 kthread_create() semantics. Provide spl_kthread_create() as a wrapper to the kernel's kthread_create() to provide pre-3.13 semantics. Re-try if the call is interrupted or if it would have returned -ENOMEM. Otherwise return NULL. Signed-off-by: Chunwei Chen <tuxoko@gmail.com> Signed-off-by: Tim Chase <tim@chase2k.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #339	2014-04-08 12:44:42 -07:00
Brian Behlendorf	e19101e08f	splat cred:groupmember: Fix false positives Due to certain assumptions made in the cred:groupmember test it could result in false positives when run on specific distributions. This was solely a bug in the test case and not in the groupmember() function which the test case was validating. To prevent future false positives the test case has been rewritten to be both more rigorous and to make fewer assumptions about the system. Minor style cleanup was done to cr_groups_search() and groupmember() functions. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2014-04-08 12:44:41 -07:00
Brian Behlendorf	668d2a0da5	splat kmem:slab_reclaim: Test cleanup By setting __GFP_NORETRY the kernel memory reclaim logic was allowed to abort early and dump a failed allocation stack to the console. Since this was done in a tight loop to fill memory it could result in a large number of stacks being dumped to the console. This in turn slowed down the test sufficiently so it exceeded the time limit and failed. To resolve this issue the __GFP_NORETRY flag is being removed. This is how it should have been called originally to ensure we're simulating the behavior of most callers which will use the GFP_KERNEL flag. In addition, the reclaim granularity of 1000 objects was far too coarse for this to be a realistic test. For kmem:slab_reclaim there might only be a few thousand objects total in the cache. Therefore, the SPLAT_KMEM_OBJ_RECLAIM constant for these tests was lowered. This will cause the reclaim callback to run more frequently which makes for a better test case. The frequency of the cache reaping in kmem:slab_reap was increased to accommodate the reduced number of objects released during the reclaim. These changes only impact the test cases and were done to remove false positives caused by the test case itself. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2014-04-08 12:44:41 -07:00
Turbo Fredriksson	b79e1f1f27	Allow specifying '-o <opts>' in defaults/init script. Signed-off-by: Turbo Fredriksson <turbo@bayour.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #2103	2014-04-04 09:49:09 -07:00
Turbo Fredriksson	e37212f9a2	Support using overlay mounts in defaults/init script. Signed-off-by: Turbo Fredriksson <turbo@bayour.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #2103	2014-04-04 09:48:25 -07:00
John M. Layman	cbca6076b3	Fix for re-reading /etc/mtab. This is a continuation of `fb5c53ea65`: When /etc/mtab is updated on Linux it's done atomically with rename(2). A new mtab is written, the existing mtab is unlinked, and the new mtab is renamed to /etc/mtab. This means that we must close the old file and open the new file to get the updated contents. Using rewind(3) will just move the file pointer back to the start of the file, freopen(3) will close and open the file. In this commit, a few more rewind(3) calls were replaced with freopen(3) to allow updated mtab entries to be picked up immediately. Signed-off-by: John M. Layman <jml@frijid.net> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #2215 Issue #1611	2014-04-04 09:46:20 -07:00
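The rewind(3) versus freopen(3) distinction in the /etc/mtab commit above can be shown with a short sketch. The helper name reopen_for_read is hypothetical, and the snippet only illustrates the stream behavior; it is not the libzfs code.

```c
#include <stdio.h>

/*
 * Re-open an already open stream by path. Unlike rewind(), which only moves
 * the file position within the file that is currently open, freopen() closes
 * that file and opens whatever now lives at path, so a file replaced with
 * rename(2) is picked up. Returns NULL on failure, in which case the stream
 * has been closed.
 */
static FILE *
reopen_for_read(const char *path, FILE *fp)
{
	return (freopen(path, "r", fp));
}
```

A caller that holds a FILE pointer for a file that was later replaced atomically would keep reading the old contents after rewind(fp), and would see the new contents after fp = reopen_for_read(path, fp).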
Richard Yao	f3ad9cd67a	Fix locking order in zfs_zget() Signed-off-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2014-04-04 09:12:47 -07:00
Richard Yao	6f9548c487	Fix deadlock in zfs_zget() zfsonlinux/zfs#180 occurred because of a race between inode eviction and zfs_zget(). zfsonlinux/zfs@36df284 tried to address it by making a call to the VFS to learn whether an inode is being evicted. If it was being evicted the operation was retried after dropping and reacquiring the relevant resources. Unfortunately, this introduced another deadlock. INFO: task kworker/u24:6:891 blocked for more than 120 seconds. Tainted: P O 3.13.6 #1 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. kworker/u24:6 D ffff88107fcd2e80 0 891 2 0x00000000 Workqueue: writeback bdi_writeback_workfn (flush-zfs-5) ffff8810370ff950 0000000000000002 ffff88103853d940 0000000000012e80 ffff8810370fffd8 0000000000012e80 ffff88103853d940 ffff880f5c8be098 ffff88107ffb6950 ffff8810370ff980 ffff88103a9a5b78 0000000000000000 Call Trace: [<ffffffff813dd1d4>] schedule+0x24/0x70 [<ffffffff8115fc09>] __wait_on_freeing_inode+0x99/0xc0 [<ffffffff8115fdd8>] find_inode_fast+0x78/0xb0 [<ffffffff811608c5>] ilookup+0x65/0xd0 [<ffffffffa035c5ab>] zfs_zget+0xdb/0x260 [zfs] [<ffffffffa03589d6>] zfs_get_data+0x46/0x340 [zfs] [<ffffffffa035fee1>] zil_add_block+0xa31/0xc00 [zfs] [<ffffffffa0360642>] zil_commit+0x12/0x20 [zfs] [<ffffffffa036a6e4>] zpl_putpage+0x174/0x840 [zfs] [<ffffffff811071ec>] do_writepages+0x1c/0x40 [<ffffffff8116df2b>] __writeback_single_inode+0x3b/0x2b0 [<ffffffff8116ecf7>] writeback_sb_inodes+0x247/0x420 [<ffffffff8116f5f3>] wb_writeback+0xe3/0x320 [<ffffffff81170b8e>] bdi_writeback_workfn+0xfe/0x490 [<ffffffff8106072c>] process_one_work+0x16c/0x490 [<ffffffff810613f3>] worker_thread+0x113/0x390 [<ffffffff81066edf>] kthread+0xdf/0x100 This patch implements the original fix in a slightly different manner in order to avoid both deadlocks. Instead of relying on a call to ilookup() which can block in __wait_on_freeing_inode() the return value from igrab() is used. 
This gives us the information that ilookup() provided without the risk of a deadlock. Alternately, this race could be closed by registering an sops->drop_inode() callback. The callback would need to detect the active SA hold thereby informing the VFS that this inode should not be evicted. Signed-off-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #180	2014-04-04 09:11:54 -07:00
Brian Behlendorf	8ac67298b1	Revert "Fixed a use-after-free bug in zfs_zget()." This reverts commit `36df284366`. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2014-04-03 16:23:28 -07:00
Chris Dunlap	7c05c6185b	Merge branch 'zed-initial' zed monitors ZFS events. When a zevent is posted, zed will run any scripts that have been enabled for the corresponding zevent class. Multiple scripts may be invoked for a given zevent. The zevent nvpairs are passed to the scripts as environment variables. Refer to the zed(8) manpage for details. Events are processed synchronously by the single thread, and there is no maximum timeout for script execution. Consequently, a misbehaving script can delay (or forever block) the processing of subsequent zevents. Plans are to address this in future commits. An EID (Event IDentifier) has been added to each event to uniquely identify it throughout the lifetime of the loaded ZFS kernel module; it is a monotonically increasing integer that resets to 1 each time the module is loaded. Initial scripts have been developed to log zevents to syslog, automatically rebuild to a hot spare device, and send email in response to checksum / data / io / resilver.finish / scrub.finish zevents. To enable email notifications, configure ZED_EMAIL in zed.rc (which is serving as a config file of sorts until a proper configuration file is implemented). To enable hot sparing, uncomment ZED_SPARE_ON_IO_ERRORS and ZED_SPARE_ON_CHECKSUM_ERRORS in zed.rc; note that the autoexpand property is not yet supported. zed is a work-in-progress. Signed-off-by: Chris Dunlap <cdunlap@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #2	2014-04-02 16:03:51 -07:00


4660 Commits