mirror_zfs

mirror of https://git.proxmox.com/git/mirror_zfs.git synced 2025-11-16 18:18:47 +03:00

Author	SHA1	Message	Date
Brian Behlendorf	79aada6105	Restrict release number to META version When creating packages in a git repository the release number can be automatically set by 'git describe'. This normally works well but if your repository has newer tags which match the form NAME-VERSION* the release may be incorrectly calculated. To prevent this the match patten has been restricted to the contents of the META file, NAME-VERSION. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2014-05-29 19:31:50 -07:00
Brian Behlendorf	c4f38ddd80	Restrict release number to META version When creating packages in a git repository the release number can be automatically set by 'git describe'. This normally works well but if your repository has newer tags which match the form NAME-VERSION* the release may be incorrectly calculated. To prevent this the match patten has been restricted to NAME-VERSION. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2014-05-29 19:08:03 -07:00
Brian Behlendorf	6795a698f4	Use default slab types We should not override the default memory type of the kmem cache. This was done previously to force certain objects which were slightly over object size limit cut off in to KMC_KMEM caches for better performance. The zfsonlinux/spl#356 patch slightly increases the default cut off from 511 bytes 1024 bytes for x86_64. This means there is long longer a need to override the default for the caches. And since the default values are now being used the new spl_kmem_cache_slab_limit and spl_kmem_cache_kmem_limit tunables will apply to all kmem caches. The following is a list of caches that will be impacted: \| object size \| forced type \| default type ----------------- \| ------------- \| ------------- \| -------------- dnode_t \| 936 bytes \| KMC_KMEM \| KMC_KMEM zio_cache \| 1104 bytes \| KMC_KMEM \| KMC_VMEM zio_link_cache \| 48 bytes \| KMC_KMEM \| KMC_KMEM zio_vdev_cache \| 131088 bytes \| KMC_VMEM \| KMC_VMEM zio_buf_512 \| 512 bytes \| KMC_KMEM \| KMC_KMEM zio_data_buf_512 \| 512 bytes \| KMC_KMEM \| KMC_KMEM zio_buf_1024 \| 1024 bytes \| KMC_KMEM \| KMC_KMEM zio_data_buf_1024 \| 1024 bytes \| +KMC_VMEM \| +KMC_KMEM * Cache memory type will change from KMC_KMEM to KMC_VMEM. + Cache memory type will change from KMC_VMEM to KMC_KMEM. This patch removes another slight point of divergence between ZoL and Illumos. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Prakash Surya <surya1@llnl.gov> Closes #2337	2014-05-22 10:39:52 -07:00
Brian Behlendorf	376dc35e22	Add spl_kmem_cache_reclaim module option The correct behavior for all registered shrinkers is to return the number of objects in their cache. In theory this allows the Linux VM to balance memory reclaim across all registered caches. In commit `b9b3715` this behavior was disabled in favor of returning -1 which notifies the VM that no additional objects are available for reclaim. This was done as a workaround to resolve thrashing in shrink_slabs() which could occur when memory was low and numerous core where in reclaim. Unfortunately, this has been observed to increase the likelihood of OOM events when SPL slab consumers are responsible for consuming the majority of memory. Therefore, this patch makes this behavior tunable. Setting the spl_kmem_cache_reclaim module option to 0x1 will result in the shrinker only being called once. This is the default behavior. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Prakash Surya <surya1@llnl.gov> Closes #358	2014-05-22 10:30:12 -07:00
Brian Behlendorf	a073aeb060	Add KMC_SLAB cache type For small objects the Linux slab allocator has several advantages over its counterpart in the SPL. These include: 1) It is more memory-efficient and packs objects more tightly. 2) It is continually tuned to maximize performance. Therefore it makes sense to layer the SPLs slab allocator on top of the Linux slab allocator. This allows us to leverage the advantages above while preserving the Illumos semantics we depend on. However, there are some things we need to be careful of: 1) The Linux slab allocator was never designed to work well with large objects. Because the SPL slab must still handle this use case a cut off limit was added to transition from Linux slab backed objects to kmem or vmem backed slabs. spl_kmem_cache_slab_limit - Objects less than or equal to this size in bytes will be backed by the Linux slab. By default this value is zero which disables the Linux slab functionality. Reasonable values for this cut off limit are in the range of 4096-16386 bytes. spl_kmem_cache_kmem_limit - Objects less than or equal to this size in bytes will be backed by a kmem slab. Objects over this size will be vmem backed instead. This value defaults to 1/8 a page, or 512 bytes on an x86_64 architecture. 2) Be aware that using the Linux slab may inadvertently introduce new deadlocks. Care has been taken previously to ensure that all allocations which occur in the write path use GFP_NOIO. However, there may be internal allocations performed in the Linux slab which do not honor these flags. If this is the case a deadlock may occur. The path forward is definitely to start relying on the Linux slab. But for that to happen we need to start building confidence that there aren't any unexpected surprises lurking for us. And ideally need to move completely away from using the SPLs slab for large memory allocations. This patch is a first step. NOTES: 1) The KMC_NOMAGAZINE flag was leveraged to support the Linux slab backed caches but it is not supported for kmem/vmem backed caches. 2) Regardless of the spl_kmem_cache_*_limit settings a cache may be explicitly set to a given type by passed the KMC_KMEM, KMC_VMEM, or KMC_SLAB flags during cache creation. 3) The constructors, destructors, and reclaim callbacks are all functional and will be called regardless of the cache type. 4) KMC_SLAB caches will not appear in /proc/spl/kmem/slab due to the issues involved in presenting correct object accounting. Instead they will appear in /proc/slabinfo under the same names. 5) Several kmem SPLAT tests needed to be fixed because they relied incorrectly on internal kmem slab accounting. With the updated test cases all the SPLAT tests pass as expected. 6) An autoconf test was added to ensure that the __GFP_COMP flag was correctly added to the default flags used when allocating a slab. This is required to ensure all pages in higher order slabs are properly refcounted, see `ae16ed9`. 7) When using the SLUB allocator there is no need to attempt to set the __GFP_COMP flag. This has been the default behavior for the SLUB since Linux 2.6.25. 8) When using the SLUB it may be desirable to set the slub_nomerge kernel parameter to prevent caches from being merged. Original-patch-by: DHE <git@dehacked.net> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Prakash Surya <surya1@llnl.gov> Signed-off-by: Tim Chase <tim@chase2k.com> Signed-off-by: DHE <git@dehacked.net> Signed-off-by: Chunwei Chen <tuxoko@gmail.com> Closes #356	2014-05-22 10:28:01 -07:00
Marcel Huber	58bd7ad060	Omit compiler warning by sticking to RAII Resolve gcc 4.9.0 20140507 warnings about uninitialized 'ptr' when using -Wmaybe-uninitialized. The first two cases appears appear to be legitimate but not the second two. In general this is a good practice so they are all initialized. Signed-off-by: Marcel Huber <marcelhuberfoo@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #2345	2014-05-22 09:44:15 -07:00
John Albietz	5f3c101b8f	Added INTEL SSD 530 Series INTEL SSD 530 Series... SSDSC2BW24 Signed-off-by: John Albietz <inthecloud247@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #2184	2014-05-19 16:57:14 -07:00
HC	f9a1ac4d59	Honor zfs_nocacheflush for file vdevs For consistency with disk vdevs honor the zfs_nocacheflush tunable. This setting is available primarily for debugging and performance analysis. Signed-off-by: HC <mmttdebbcc@yahoo.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #2336	2014-05-19 13:30:48 -07:00
Tim Chase	83021b47c2	Calculate header size correctly in sa_find_sizes() In the case where a variable-sized SA overlaps the spill block pointer and a new variable-sized SA is being added, the header size was improperly calculated to include the to-be-moved SA. This problem could be reproduced when xattr=sa enabled as follows: ln -s $(perl -e 'print "x" x 120') blah setfattr -n security.selinux -v blahblah -h blah The symlink is large enough to interfere with the spill block pointer and has a typical SA registration as follows (shown in modified "zdb -dddd" <SA attr layout obj> format): [ ... ZPL_DACL_COUNT ZPL_DACL_ACES ZPL_SYMLINK ] Adding the SA xattr will attempt to extend the registration to: [ ... ZPL_DACL_COUNT ZPL_DACL_ACES ZPL_SYMLINK ZPL_DXATTR ] but since the ZPL_SYMLINK SA interferes with the spill block pointer, it must also be moved to the spill block which will have a registration of: [ ZPL_SYMLINK ZPL_DXATTR ] This commit updates extra_hdrsize when this condition occurs, allowing hdrsize to be subsequently decreased appropriately. Signed-off-by: Tim Chase <tim@chase2k.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ned Bass <bass6@llnl.gov> Issue #2214 Issue #2228 Issue #2316 Issue #2343	2014-05-19 11:55:50 -07:00
Tim Chase	3937ab20f3	Allow for lock-free reading zfsdev_state_list. Restructure the zfsdev_state_list to allow for lock-free reading by converting to a simple singly-linked list from which items are never deleted and over which only forward iterations are performed. It depends on, among other things, the atomicity of accessing the zs_minor integer and zs_next pointer. This fixes a lock inversion in which the zfsdev_state_lock is used by both the sync task (txg_sync) and indirectly by any user program which uses /dev/zfs; the zfsdev_release method uses the same lock and then blocks on the sync task. The most typical failure scenerio occurs when the sync task is cleaning up a user hold while various concurrent "zfs" commands are in progress. Neither Illumos nor Solaris are affected by this issue because they use DDI interface which provides lock-free reading of device state via the ddi_get_soft_state() function. Signed-off-by: Tim Chase <tim@chase2k.com> Signed-off-by: Chunwei Chen <tuxoko@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #2301	2014-05-19 11:45:11 -07:00
Richard Yao	1cbae971c5	Handle ZPOOL_STATUS_HOSTID_MISMATCH in zpool status Verbatim imports can cause hostid mismatches, but things otherwise work. `zpool status` does not handle this and will fail when assertions are enabled: ``` zpool: ../../cmd/zpool/zpool_main.c:4418: status_callback: Assertion `reason == ZPOOL_STATUS_OK' failed. Program received signal SIGABRT, Aborted. ``` Lets instead add a case to display an informative message such as this: ``` pool: rpool state: ONLINE status: Mismatch between pool hostid and system hostid on imported pool. This pool was previously imported into a system with a different hostid, and then was verbatim imported into this system. action: Export this pool on all systems on which it is imported. Then import it to correct the mismatch. see: http://zfsonlinux.org/msg/ZFS-8000-EY scan: scrub repaired 0 in 0h8m with 0 errors on Thu Apr 17 19:43:57 2014 config: NAME STATE READ WRITE CKSUM rpool ONLINE 0 0 0 sda ONLINE 0 0 0 errors: No known data errors ``` Signed-off-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #2342	2014-05-19 10:16:29 -07:00
Chunwei Chen	bc25c9325b	Use a dedicated taskq for vdev_file Originally, vdev_file used system_taskq. This would cause a deadlock, especially on system with few CPUs. The reason is that the prefetcher threads, which are on system_taskq, will sometimes be blocked waiting for I/O to finish. If the prefetcher threads consume all the tasks in system_taskq, the I/O cannot be served and thus results in a deadlock. We fix this by creating a dedicated vdev_file_taskq for vdev_file I/O. Signed-off-by: Chunwei Chen <tuxoko@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #2270	2014-05-14 16:20:21 -07:00
Chunwei Chen	ad3412efd7	Linux 3.15: vfs_rename() added a flags argument Detect the updated vfs_rename() interface and call it with an extra flags argument. References: torvalds/linux@520c8b1 Signed-off-by: Chunwei Chen <tuxoko@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #355	2014-05-07 13:38:17 -07:00
Chunwei Chen	1538f4b6e3	Linux 3.15 compat: NICE_TO_PRIO and PRIO_TO_NICE These macro's were exposed to make them available to other parts of the kernel and modules. References: torvalds/linux@6b6350f Signed-off-by: Chunwei Chen <tuxoko@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #355	2014-05-07 13:38:03 -07:00
Brian Behlendorf	2c33b91275	Handle vdev_lookup_top() failure in dva_get_dsize_sync() The dva_get_dsize_sync() function incorrectly assumes that the call to vdev_lookup_top() cannot fail. However, the NULL dereference at clearly shows that under certain circumstances it is possible. Note that offset 0x570 (1376) maps as expected to vd->vdev_deflate_ratio. BUG: unable to handle kernel NULL pointer dereference at 00000570 crash> struct -o vdev struct vdev { [0] uint64_t vdev_id; ... ... [1376] uint64_t vdev_deflate_ratio; Given that this can happen this patch add the required error handling. In the case where vdev_lookup_top() fails assume that no deflation will occur for the DVA and use the asize. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ned Bass <bass6@llnl.gov> Signed-off-by: Alexey Zhuravlev <alexey.zhuravlev@intel.com> Closes #1707 Closes #1987 Closes #1891 Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2014-05-06 10:41:48 -07:00
Tim Chase	962d524212	Check the dataset type more rigorously when fetching properties. When fetching property values of snapshots, a check against the head dataset type must be performed. Previously, this additional check was performed only when fetching "version", "normalize", "utf8only" or "case". This caused the ZPL properties "acltype", "exec", "devices", "nbmand", "setuid" and "xattr" to be erroneously displayed with meaningless values for snapshots of volumes. It also did not allow for the display of "volsize" of a snapshot of a volume. This patch adds the headcheck flag paramater to zfs_prop_valid_for_type() and zprop_valid_for_type() to indicate the check is being done against a head dataset's type in order that properties valid only for snapshots are handled correctly. This allows the the head check in get_numeric_property() to be performed when fetching a property for a snapshot. Signed-off-by: Tim Chase <tim@chase2k.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #2265	2014-05-06 10:41:46 -07:00
Brian Behlendorf	1ce0457348	Fix style A minor style issue was accidentally introduced by `aa7d06a`. This change resolves that style problem. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2014-05-06 10:41:17 -07:00
George Wilson	aa7d06a98a	Illumos #4101 finer-grained control of metaslab_debug Today the metaslab_debug logic performs two tasks: - load all metaslabs on import/open - don't unload metaslabs at the end of spa_sync This change provides knobs for each of these independently. References: https://illumos.org/issues/4101 https://github.com/illumos/illumos-gate/commit/0713e23 Notes: 1) This is a small piece of the metaslab improvement patch from Illumos. It was worth bringing over before the rest, since it's low risk and it can be useful on fragmented pools (e.g. Lustre MDTs). metaslab_debug_unload would give the performance benefit of the old metaslab_debug option without causing unwanted delay during pool import. Ported-by: Ned Bass <bass6@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #2227	2014-05-06 09:46:04 -07:00
Brian Behlendorf	cc79a5c263	Treat spill block dbufs as meta data When the system attributes (SAs) for an object exceed what can can be stored in the bonus area of a dnode a spill block is allocated. These spill blocks are currently considered data blocks. However, they should be accounted for as meta data because they are effectively an extension of the dnode. While this may seem like a minor accounting issue it has broader implications. The key thing to be aware of is that each spill block will hold a reference on its parent dnode. The dnode in turn holds a reference on its dbuf in the dnode object. This means that a single 512 byte data buffer for a spill block can pin over 16k of meta data. This is analogous to the small file situation described in `2b13331` where a relatively small number of data buffer can cause the ARC to exceed the meta limit. However, unlike the small file case a spill block can legitimately be considered meta data. By changing the spill block to meta data they will now be dropped from the cache when the meta limit is reached. This then allows the dnodes and dbufs which the spill block was pinning to be released. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Prakash Surya <surya1@llnl.gov> Closes #2294	2014-05-05 13:56:59 -07:00
Brian Behlendorf	51268f31a8	Remove SELinux enforcing check from init scripts The default SELinux policy for RHEL and Fedora has been updated to include ZFS in the list of filesystems which support xattrs. Therefore, there's no longer a need to detect this in the init scripts. References: https://bugzilla.redhat.com/show_bug.cgi?id=811532 https://bugzilla.redhat.com/show_bug.cgi?id=816543 Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #2166	2014-05-02 11:37:46 -07:00
Richard Yao	7809eb8b65	ztest: Switch to LWP rwlock interface ztest is intended to subject the ZFS code in userland to stress that it should be able to withstand. Any failures that occur when running it are failures that likely would occur inside the kernel. However, being in userland, it is much easier to debug them. In practice, this prevents a large number of problems from reaching production code. A design decision was made by the original authors of ztest to make a distinction between userland locking primitives and kernel locking primitives. The ztest code itself calls userland locking primitives while the kernel code being run in userland will call emulated kernel locking primitives that wrap the userland locking primitives. When ztest was first ported to Linux, a decision was made to use the emulated kernel interfaces everywhere. In effect, the userland rw_rdlock()/rw_wrlock() became the kernel rw_enter() and and the userland rw_unlock() became the kernel rw_exit(). This caused a regression because of an assertion in rw_enter() to catch recursive locking. That is permitted in userland, but not in the kernel. Consequently, the ztest code itself does recursive read locking. The use of the emulated kernel interfaces consequently caused the following failure: ztest: ../../lib/libzpool/kernel.c:384: Assertion `rwlp->rw_owner != zk_thread_current() (0x1c87150 != 0x1c87150)' failed. That occurs because ztest_dmu_objset_create_destroy() will take a read lock and call ztest_dmu_object_alloc_free(). That will call ztest_io(), which will take a readlock only when asked to do ZTEST_IO_REWRITE. This triggered the assertion. The pthreads rwlock interface was based on the LWP rwlock interface implemented in Illumos libc. Luckily enough, the subset used by ztest is almost identical, so we can solve this problem by switching to the LWP thread rwlock interface in ztest. This eliminates a point of divergence with Illumos and should make code sharing slightly easier. Signed-off-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1970	2014-05-01 15:53:58 -07:00
Richard Yao	3af3df905f	libspl: Implement LWP rwlock interface This implements a subset of the LWP rwlock interface by wrapping the equivalent POSIX thread interface. It is a superset of the features needed by ztest. The missing bits are {,_}rw_read_held() and {,_}rw_write_held(). Signed-off-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #1970	2014-05-01 15:53:52 -07:00
Richard Yao	c6e924fea8	Fix libblkid ZFS detection when making new pools zfsonlinux/zfs@1db7b9be75 should have fixed this, but this particular string was overlooked. Signed-off-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #2288	2014-05-01 13:26:33 -07:00
Brian Behlendorf	12f9a6a3f9	dmu_tx_assign() should not return ENOMEM As described in the comment above dmu_tx_assign() this function must only fail if the pool is out of space. If for some other reason the TX cannot be assigned (such as memory pressure) ERESTART must be returned. Alternately, EAGAIN could be returned to inject a delay but that isn't required because the caller will block on the condition variable waiting for the next TXG. /* * Assign tx to a transaction group. txg_how can be one of: * * (1) TXG_WAIT. If the current open txg is full, waits until there's * a new one. This should be used when you're not holding locks. * It will only fail if we're truly out of space (or over quota). * ... */ Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ned Bass <bass6@llnl.gov> Closes #2287	2014-05-01 12:08:53 -07:00
Richard Yao	9d317793aa	Implement File Attribute Support We add support for lsattr and chattr to resolve a regression caused by `88c283952f` that broke Python's xattr.list(). That changet broke Gentoo Portage's FEATURES=xattr, which depended on Python's xattr.list(). Only attributes common to both Solaris and Linux are supported. These are 'a', 'd' and 'i' in Linux's lsattr and chattr commands. File attributes exclusive to Solaris are present in the ZFS code, but cannot be accessed or modified through this method. That was the case prior to this patch. The resolution of issue zfsonlinux/zfs#229 should implement some method to permit access and modification of Solaris-specific attributes. References: https://bugs.gentoo.org/show_bug.cgi?id=483516 Original-patch-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1691	2014-05-01 10:11:18 -07:00
Richard Yao	3b4f425a5a	Refactor inode_owner_or_capable() autotools check We need inode_owner_or_capable() for ZFS file attributes in addition to xattrs, so it should go into its own file. This moves it into its own file and changes it to be more comprehensive. It will now fail if no known good API is detected. Signed-off-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #1691	2014-05-01 10:06:49 -07:00
ilovezfs	78597769b4	Fill in mountpoint buffer before using it in errors zfs_is_mountable() fills in the mountpoint buffer, so, as in upstream, it needs to have been called before the mountpoint buffer can be used in error messages. In particular, return (zfs_error_fmt(hdl, EZFS_MOUNTFAILED, dgettext(TEXT_DOMAIN, "cannot mount '%s'"), mountpoint)); should not come before the call to zfs_is_mountable(). Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: ilovezfs <ilovezfs@icloud.com> Closes #2284	2014-04-30 15:52:01 -07:00
Chunwei Chen	17584980b9	Add assertion to catch 0-count page Some network related block device uses tcp_sendpage, which doesn't behave well when using 0-count page. Add assertion to catch them. This has a runtime dependency on: zfsonlinux/spl@ae16ed9 Fix crash when using ZFS on Ceph rbd Signed-off-by: Chunwei Chen <tuxoko@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #2277	2014-04-25 15:41:19 -07:00
Jorgen Lundman	cdf37f0c59	Add support for aarch64 (ARMv8) Using the ARM reference simulation (fast model foundation v8) I cross compiled spl and zfs, to confirm it works on ARMv8 (64 bit arm architecture, called aarch64 in Linux). As it is based on previous ARM porting, the resulting patch is disappointingly small, there was very little to do. The code fixes the compile issues and has light testing done. Signed-off-by: Jorgen Lundman <lundman@lundman.net> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #2260	2014-04-25 15:35:30 -07:00
Andrey Vesnovaty	703371d8c7	Evenly distribute the taskq threads across available CPUs The problem is described in commit `aeeb4e0c0a`. However, instead of disabling the binding to CPU altogether we just keep the last CPU index across calls to taskq_create() and thus achieve even distribution of the taskq threads across all available CPUs. The implementation based on assumption that task queues initialization performed in serial manner. Signed-off-by: Andrey Vesnovaty <andrey.vesnovaty@gmail.com> Signed-off-by: Andrey Vesnovaty <andreyv@infinidat.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #336	2014-04-25 15:29:18 -07:00
Chunwei Chen	ae16ed992b	Fix crash when using ZFS on Ceph rbd When using __get_free_pages to get high order memory, only the first page's _count will set to 1, other's will be 0. When an internal page get passed into rbd, it will eventully go into tcp_sendpage. There, it will be called with get_page and put_page, and get freed erroneously when _count jump back to 0. The solution to this problem is to use compound page. All pages in a high order compound page share a single _count. So get_page and put_page in tcp_sendpage will not cause _count jump to 0. Signed-off-by: Chunwei Chen <tuxoko@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #251	2014-04-25 15:26:52 -07:00
Jorgen Lundman	d6e6e4a98e	Add support for aarch64 (ARMv8) Using the ARM reference simulation (fast model foundation v8) I cross compiled spl and zfs, to confirm it works on ARMv8 (64 bit arm architecture, called aarch64 in Linux). As it is based on previous ARM porting, the resulting patch is disappointingly small, there was very little to do. The code fixes the compile issues and has light testing done. Signed-off-by: Jorgen Lundman <lundman@lundman.net> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #351	2014-04-25 15:25:32 -07:00
Ned Bass	de39ec11b8	Fix LZ4 endianness autodetection Endianness detection in LZ4 is broken in user-space builds. This bug corrupts compressed data and manifests itself in several ztest failures. When LZ4 was originally ported to Illumos ZFS, the proper checks for Linux were stripped out. The Linux port then inherited the remaining detection code that works on Illumos but not on Linux. The current LZ4 endianness check misuses the condition defined(__BIG_ENDIAN) to indicate a big-endian system. On Linux __BIG_ENDIAN is defined uncondtionally in the user-space header /usr/include/endian.h, regardless of the endianness of the system. The kernel does not use this header, so only user-space builds are affected. While we could fix this by restoring the upstream LZ4 endianness detection code, reliable checks already exist in libspl/include/sys/isa_defs.h. This change uses the libspl results to replace the word-size and endianness checks in LZ4, simplifying the code and reducing duplication. Signed-off-by: Ned Bass <bass6@llnl.gov> Signed-off-by: Chunwei Chen <tuxoko@gmail.com> Signed-off-by: DHE <git@dehacked.net> Signed-off-by: Prakash Surya <surya1@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Fixes #1963 Fixes #1964 Fixes #1965	2014-04-20 16:55:42 -07:00
Brian Behlendorf	4fd762f8ad	Fix zfsdev_ioctl() kmem leak warning Due to an asymmetry in the kmem accounting a memory leak was being reported when it was only an accounting issue. All memory allocated with kmem_alloc() must be released with kmem_free() or it will not be properly accounted for. In this case the code used strfree() to release the memory allocated by kmem_alloc(). Presumably this was done because the size of the memory region wasn't available when the memory needed to be freed. To resolve this issue the code has been updated to use strdup() instead of kmem_alloc() to allocate the memory. Like strfree(), strdup() is not integrated with the memory accounting. This means we can use strfree() to release it like Illumos. SPL: kmem leaked 10/4368729 bytes address size data func:line ffff880067e9aa40 10 ZZZZZZZZZZ zfsdev_ioctl:5655 Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tim Chase <tim@chase2k.com> Signed-off-by: Chunwei Chen <tuxoko@gmail.com> Closes #2262	2014-04-18 13:30:15 -07:00
Brian Behlendorf	e0b8f62902	Various zimport.sh fixes 1) $SPLSRC and $SRCDIR should be changed to $SRC_DIR. These are vestiges of an earlier version of the script and were missed when it was updated. Additionally ensure the directory is created. 2) The 'fail' function should take an integer argument for the error code to return. Otherwise 0 (success) will be mistakenly returned and errors will we incorrectly suppressed. The error code should be meaningful enough to determine where the script failed. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2014-04-17 09:30:55 -07:00
Tim Chase	b066274a77	Report atime and relatime as the property's actual value. Neither atime nor relatime should be considered to be "temporary mount point properties". Their semantics are enforced completely within ZFS and also they're (correctly) not documented as being temporary mount point properties. Signed-off-by: Tim Chase <tim@chase2k.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #2257	2014-04-16 11:57:17 -07:00
DHE	2dbedf5484	Uninitialized variable spa_autoreplace used Caught by ztest and valgrind. Signed-off-by: DHE <git@dehacked.net> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #2259	2014-04-16 10:59:24 -07:00
Richard Yao	89aa97059d	Change spl_kmem_cache_expire default setting to 2 This behavior is more consistent with the way memory reclaim is expected to work under Linux. Signed-off-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #349	2014-04-14 16:29:01 -07:00
Chunwei Chen	0b75bdb369	Use ddi_time_after and friends to compare time Also, make sure we use clock_t for ddi_get_lbolt to prevent type conversion from screwing things. Signed-off-by: Chunwei Chen <tuxoko@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #2142	2014-04-14 13:27:56 -07:00
Andrey Vesnovaty	bdfbe594a1	Expose max/min objs per slab and max slab size By default maximal number of objects in slab can't exceed (162 - 1) and slab size can't exceed 32M. Today's high end servers having couple hundreds of RAM available for ARC may run into a trouble with virtual memory because of the restriction mentioned above. Problem: Reasons for very high number of virtual memory allocations: Real slab size very small relative to the size of the entire RAM * Slabs allocated on virtual memory and fill entire ARC The result is very high number of allocated virtual memory ranges (hundreds of ranges). When virtual memory subsystem manages high number of ranges its performance become so poor that it freezes from time to time. Solution: Number of objects per slab should be increased taking into account maximal slab size which can also be increased if needed. Signed-off-by: Andrey Vesnovaty <andrey.vesnovaty@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #337	2014-04-14 09:42:04 -07:00
Chunwei Chen	545e9ac00a	Add ddi_time_after and friends When comparing times gotten from ddi_get_lbolt, we have to take account of wrap around of jiffies. Therefore, we cannot use 't1 < t2'. Instead we should use 't1 - t2 < 0'. This patch add ddi_time_after and friends to address this issue. They have strict type restriction, clock_t for vanilla and int64_t for 64 version, to prevent type conversion from screwing things. Signed-off-by: Chunwei Chen <tuxoko@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #335	2014-04-14 09:32:01 -07:00
Yuxuan Shui	6c48cd8ac2	This patch add a CTASSERT macro for compile time assertion. This macro makes the compile to spit "mixed definition and code" warning, I can't find a way to avoid it. This patch lays some groundwork for the persistent l2arc feature. See https://www.illumos.org/issues/3525. Signed-off-by: Yuxuan Shui <yshuiv7@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #303	2014-04-14 09:28:53 -07:00
Richard Yao	acf0ade362	Simplify hostid logic There is plenty of compatibility code for a hw_hostid that isn't used by anything. At the same time, there are apparently issues with the current hostid logic. coredumb in #zfsonlinux on freenode reported that Fedora 17 changes its hostid on every boot, which required force importing his pool. A suggestion by wca was to adopt FreeBSD's behavior, where it treats hostid as zero if /etc/hostid does not exist Adopting FreeBSD's behavior permits us to eliminate plenty of code, including a userland helper that invokes the system's hostid as a fallback. Signed-off-by: Richard Yao <ryao@cs.stonybrook.edu> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #224	2014-04-14 09:04:41 -07:00
Tim Chase	3ceb71e896	Call kthread_create() correctly with fixed arguments. The kernel's kthread_create() function is defined as "..." and there is no va_list variant at the moment. The task name is pre-formatted into a local buffer and passed to kthread_create() with fixed arguments. Signed-off-by: Chunwei Chen <tuxoko@gmail.com> Signed-off-by: Tim Chase <tim@chase2k.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #347	2014-04-11 09:41:40 -07:00
Brian Behlendorf	888f7141a3	Make zimport.sh bash dependency explicit Unfortunately, the zimport.sh test script really does depend on bash. Moving to /bin/sh should be possible once the shared infrastructure scripts it depends on is made portable. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2014-04-10 16:07:59 -07:00
Chunwei Chen	b761912b34	Linux 3.14 compat: rq_for_each_segment in dmu_req_copy rq_for_each_segment changed from taking bio_vec * to taking bio_vec. We provide rq_for_each_segment4 which takes both. Signed-off-by: Chunwei Chen <tuxoko@gmail.com> Signed-off-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #2124	2014-04-10 14:28:51 -07:00
Chunwei Chen	22760eebef	Revert "Fix zvol+btrfs hang" After the dmu_req_copy change, bi_io_vecs are not touched, so this is no longer needed. This reverts commit `e26ade5101`. Signed-off-by: Chunwei Chen <tuxoko@gmail.com> Signed-off-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #2124	2014-04-10 14:28:47 -07:00
Chunwei Chen	215b4634c7	Refactor dmu_req_copy for immutable biovec changes Originally, dmu_req_copy modifies bv_len and bv_offset in bio_vec so that it can continue in subsequent passes. However, after the immutable biovec changes in Linux 3.14, this is not allowed. So instead, we just tell dmu_req_copy how many bytes are already copied and it will skip to the right spot accordingly. Signed-off-by: Chunwei Chen <tuxoko@gmail.com> Signed-off-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #2124	2014-04-10 14:28:43 -07:00
Chunwei Chen	d4541210f3	Linux 3.14 compat: Immutable biovec changes in vdev_disk.c bi_sector, bi_size and bi_idx are moved from bio to bio->bi_iter. This patch creates BIO_BI_*(bio) macros to hide the differences. Signed-off-by: Chunwei Chen <tuxoko@gmail.com> Signed-off-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #2124	2014-04-10 14:28:38 -07:00
Chunwei Chen	408ec0d2e1	Linux 3.14 compat: posix_acl_{create,chmod} posix_acl_{create,chmod} is changed to __posix_acl_{create_chmod} Signed-off-by: Chunwei Chen <tuxoko@gmail.com> Signed-off-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #2124	2014-04-10 14:27:03 -07:00

... 124 125 126 127 128 ...

8426 Commits