mirror_zfs

mirror of https://git.proxmox.com/git/mirror_zfs.git synced 2026-04-17 08:54:52 +03:00

Author	SHA1	Message	Date
Brian Behlendorf	e9832eb272	Remove zfs-dracut and zfs-test dependencies Remove from the zfs package the dependencies on the zfs-dracut and zfs-test subpackages. Neither of these packages is required for normal operation and they bring in many unnecessary dependencies during installation. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1395	2013-07-03 14:58:42 -07:00
Brian Behlendorf	91604b298c	Open pools asynchronously after module load One of the side effects of calling zvol_create_minors() in zvol_init() is that all pools listed in the cache file will be opened. Depending on the state and contents of your pool this operation can take a considerable length of time. Doing this at load time is undesirable because the kernel is holding a global module lock. This prevents other modules from loading and can serialize an otherwise parallel boot process. Doing this after module initialization also reduces the chances of accidentally introducing a race during module init. To ensure that /dev/zvol/<pool>/<dataset> devices are still automatically created after the module load completes, a udev rule has been added. When udev notices that the /dev/zfs device has been created, the 'zpool list' command will be run. This then will cause all the pools listed in the zpool.cache file to be opened. Because this process is now driven asynchronously by udev there is the risk of problems in downstream distributions. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #756 Issue #1020 Issue #1234	2013-07-03 09:24:38 -07:00
Richard Yao	2a3871d4bc	Cleanup zvol initialization code The following error will occur on some (possibly all) kernels because blk_init_queue() will try to take the spinlock before we initialize it. BUG: spinlock bad magic on CPU#0, zpool/4054 lock: 0xffff88021a73de60, .magic: 00000000, .owner: <none>/-1, .owner_cpu: 0 Pid: 4054, comm: zpool Not tainted 3.9.3 #11 Call Trace: [<ffffffff81478ef8>] spin_dump+0x8c/0x91 [<ffffffff81478f1e>] spin_bug+0x21/0x26 [<ffffffff812da097>] do_raw_spin_lock+0x127/0x130 [<ffffffff8147d851>] _raw_spin_lock_irq+0x21/0x30 [<ffffffff812c2c1e>] cfq_init_queue+0x1fe/0x350 [<ffffffff812aacb8>] elevator_init+0x78/0x140 [<ffffffff812b2677>] blk_init_allocated_queue+0x87/0xb0 [<ffffffff812b26d5>] blk_init_queue_node+0x35/0x70 [<ffffffff812b271e>] blk_init_queue+0xe/0x10 [<ffffffff8125211b>] __zvol_create_minor+0x24b/0x620 [<ffffffff81253264>] zvol_create_minors_cb+0x24/0x30 [<ffffffff811bd9ca>] dmu_objset_find_spa+0xea/0x510 [<ffffffff811bda71>] dmu_objset_find_spa+0x191/0x510 [<ffffffff81253ea2>] zvol_create_minors+0x92/0x180 [<ffffffff811f8d80>] spa_open_common+0x250/0x380 [<ffffffff811f8ece>] spa_open+0xe/0x10 [<ffffffff8122817e>] pool_status_check.part.22+0x1e/0x80 [<ffffffff81228a55>] zfsdev_ioctl+0x155/0x190 [<ffffffff8116a695>] do_vfs_ioctl+0x325/0x5a0 [<ffffffff8116a950>] sys_ioctl+0x40/0x80 [<ffffffff814812c9>] ? do_page_fault+0x9/0x10 [<ffffffff81483929>] system_call_fastpath+0x16/0x1b zd0: unknown partition table We fix this by calling spin_lock_init before blk_init_queue. The manner in which zvol_init() initializes structures is susceptible to a race between initialization and a probe on a zvol. We reorganize zvol_init() to prevent that. Signed-off-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2013-07-03 09:23:35 -07:00
Pawel Jakub Dawidek	526af78550	Call zvol_create_minors() in spa_open_common() when initializing pool There is an extremely odd bug that causes zvols to fail to appear on some systems, but not others. Recently, I was able to consistently reproduce this issue over a period of 1 month. The issue disappeared after I applied this change from FreeBSD. This is from FreeBSD's pool version 28 import, which occurred in revision 219089. Ported-by: Richard Yao <ryao@cs.stonybrook.edu> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #441 Issue #599	2013-07-03 09:22:44 -07:00
Brian Behlendorf	c76955eaa5	Fix parse_dataset error handling A mount failure was accidentally introduced by commit `0c1171d` which reworked the parse_dataset() function to read pool names from devices. The error case where a label is read from the device but the pool name/value pair doesn't exist was not handled properly. In this case we should fall back to the previous behavior. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1560	2013-07-03 09:20:52 -07:00
George Wilson	294f68063b	Illumos #3498 panic in arc_read() 3498 panic in arc_read(): !refcount_is_zero(&pbuf->b_hdr->b_refcnt) Reviewed by: Adam Leventhal <ahl@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Approved by: Richard Lowe <richlowe@richlowe.net> References: illumos/illumos-gate@1b912ec710 https://www.illumos.org/issues/3498 Ported-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1249	2013-07-02 13:34:31 -07:00
Matthew Ahrens	96b89346c0	Illumos #3122 zfs destroy filesystem should prefetch blocks 3122 zfs destroy filesystem should prefetch blocks Reviewed by: Christopher Siden <chris.siden@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Adam Leventhal <ahl@delphix.com> Approved by: Garrett D'Amore <garrett@damore.org> References: illumos/illumos-gate@b4709335aa https://www.illumos.org/issues/3122 Ported-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1565	2013-07-02 13:34:02 -07:00
Richard Yao	3db3ff4a78	Use MAXPATHLEN instead of sizeof in snprintf This silences a GCC 4.8.0 warning by fixing a programming error caught by static analysis: ../../cmd/ztest/ztest.c: In function ‘ztest_vdev_aux_add_remove’: ../../cmd/ztest/ztest.c:2584:33: error: argument to ‘sizeof’ in ‘snprintf’ call is the same expression as the destination; did you mean to provide an explicit length? [-Werror=sizeof-pointer-memaccess] (void) snprintf(path, sizeof (path), ztest_aux_template, ^ Signed-off-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1480	2013-07-02 10:39:24 -07:00
Cyril Plisko	29dee3ee9a	Add zfs_sync_pass_* tunable parameters Commit `55d85d5a8c` (backport of the upstream changes) replaced three hardcoded constants: #define SYNC_PASS_DEFERRED_FREE 2 /* defer frees after this pass */ #define SYNC_PASS_DONT_COMPRESS 4 /* don't compress after this pass */ #define SYNC_PASS_REWRITE 1 /* rewrite new bps after this pass */ with tunable parameters: int zfs_sync_pass_deferred_free = 2; /* defer frees starting in this pass */ int zfs_sync_pass_dont_compress = 5; /* don't compress starting in this pass */ int zfs_sync_pass_rewrite = 2; /* rewrite new bps starting in this pass */ This commit makes these tunables available as module parameters in Linux. They should only be used for performance analysis because changing them can result in subtle and pathological performance problems. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1562	2013-07-02 09:34:18 -07:00
Li Dongyang	802e7b5feb	Add SEEK_DATA/SEEK_HOLE to lseek()/llseek() The approach taken was to rework zfs_holey() as little as possible and then just wrap the code as needed to ensure correct locking and error handling. Tested with xfstests 285 and 286. All tests pass except for 7-9 of 285 which try to reserve blocks first via fallocate(2) and fail because fallocate(2) is not yet supported. Note that the filp->f_lock spinlock did not exist prior to Linux 2.6.30, but we avoid the need for an autotools check by virtue of the fact that SEEK_DATA/SEEK_HOLE support was not added until Linux 3.1. An autoconf check was added for lseek_execute() which is currently a private function but the expectation is that it will be exported perhaps as early as Linux 3.11. Reviewed-by: Richard Laager <rlaager@wiktel.com> Signed-off-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1384	2013-07-02 09:24:43 -07:00
Matthew Ahrens	cf91b2b6b2	Readd zfs_holey() from OpenSolaris This patch restores the zfs_holey() function from OpenSolaris. This was removed by commit `3558fd7` because it wasn't clear we had a use for it in ZoL. However, this functionality is a prerequisite for adding SEEK_DATA/SEEK_HOLE support to the ZPL. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Richard Yao <ryao@gentoo.org> Issue #1384	2013-07-02 09:24:18 -07:00
shenyan1	0a6bef26ec	kmem_zalloc(..., KM_SLEEP) will never fail By definition these allocations will never fail. For consistency with the rest of the code remove this dead error handling code. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1558	2013-07-01 14:51:48 -07:00
Tim Chase	ab68b6e5db	Fix zfs_sb_teardown/zfs_resume_fs NULL dereference Fix a pair of conditions in which a concurrent umount can cause NULL pointer dereferences: * zfs_sb_teardown - prevent a NULL dereference by not calling dmu_objset_pool with a null z_os. * zfs_resume_fs - don't try to unmount with a null z_os. This change makes the ZoL code more consistent with both Illumos and FreeBSD. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1543	2013-07-01 14:51:45 -07:00
Cyril Plisko	64d7b6cf75	Override default SPA config location via environment When using zdb with a non-default SPA config file it is not convenient to add -U <non-default-config-file-path> all the time. This commit introduces support for setting/overriding the SPA config location via the environment variable 'SPA_CONFIG_PATH'. If the -U flag is specified on the command line it will override any other value as usual. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1545	2013-07-01 13:25:44 -07:00
Cyril Plisko	20c17b96c9	Add absent \n at the end of the help text line Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #1545	2013-06-28 11:26:41 -07:00
Steven Burgess	e2e229eb18	Formatting changes for zpool manpage Some of these entries were hidden before. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1553	2013-06-28 11:17:36 -07:00
Aaron Fineman	bbb75c1190	Add error message for missing /etc/mtab The zpool command should not silently fail when the /etc/mtab file does not exist. This can occur in an initramfs environment when the /etc/mtab file hasn't yet been generated. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1541	2013-06-27 14:43:37 -07:00
Ying Zhu	c12936b141	Fix module probe failure on 32-bit systems Previous commit `7ef5e54e2e` caused module probe failure on 32-bit systems, dmesg showed Unknown symbol __moddi3 This was caused by the modulo operation 'gethrtime() % tqs->stqs_count' in the committed code. Instead of implementing __moddi3 for all 32-bit systems, Behlendorf advised we can just cast the return value of gethrtime() into a uint64_t. Since gethrtime() does not return a negative value under any circumstances, we need not care about the potential overflow. Signed-off-by: Ying Zhu <casualfisher@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1551	2013-06-27 10:01:25 -07:00
Brian Behlendorf	88c283952f	Return -EOPNOTSUPP for ZFS_IOC_{GET\|SET}FLAGS Until these hooks are fully implemented return the expected -EOPNOTSUPP error to indicate they are not functional. This allows test suites such as xfstests to cleanly skip testing this functionality until it's implemented. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #229	2013-06-26 15:20:13 -07:00
Brian Behlendorf	0c1171dcb5	Allow fetching the pool from the device at mount To simplify integration with the xfstests test suite the mount.zfs helper has been extended. When passed a block device (/dev/sdX) to mount, instead of a pool/dataset, the pool name will be read from any existing zfs label and used. This allows you to mount the root dataset of a zfs filesystem by specifying any of the member vdevs. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2013-06-26 15:20:09 -07:00
Nathaniel Clark	389cf730ce	Make spl directory settable when building rpms and add --buildroot This adds the ability to set the location of spl via defines when building from the spec files. This is useful for build systems that build spl and zfs together without installing the actual rpms. Signed-off-by: Nathaniel Clark <Nathaniel.Clark@misrule.us> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1486	2013-06-21 16:00:45 -07:00
Brian Behlendorf	81eaf15107	Register correct handlers in nvlist_alloc() The non-blocking allocation handlers in nvlist_alloc() would be mistakenly assigned if any flags other than KM_SLEEP were passed. This meant that nvlists allocated with KM_PUSHPAGE or other KM_* debug flags were effectively always using atomic allocations. While these failures were unlikely it could lead to assertions because KM_PUSHPAGE allocations in particular are guaranteed to succeed or block. They must never fail. Since the existing API does not allow us to pass allocation flags to the private allocators the cleanest thing to do is to add a KM_PUSHPAGE allocator. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes zfsonlinux/spl#249	2013-06-20 09:58:15 -07:00
Matthew Ahrens	df4474f92d	Illumos #3805 arc shouldn't cache freed blocks 3805 arc shouldn't cache freed blocks Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Christopher Siden <christopher.siden@delphix.com> Reviewed by: Richard Elling <richard.elling@dey-sys.com> Reviewed by: Will Andrews <will@firepipe.net> Approved by: Dan McDonald <danmcd@nexenta.com> References: illumos/illumos-gate@6e6d5868f5 https://www.illumos.org/issues/3805 ZFS should proactively evict freed blocks from the cache. On dcenter, we saw that we were caching ~256GB of metadata, while the pool only had <4GB of metadata on disk. We were wasting about half the system's RAM (252GB) on blocks that have been freed. Even though these freed blocks will never be used again, and thus will eventually be evicted, this causes us to use memory inefficiently for 2 reasons: 1. A block that is freed has no chance of being accessed again, but will be kept in memory preferentially to a block that was accessed before it (and is thus older) but has not been freed and thus has at least some chance of being accessed again. 2. We partition the ARC into several buckets: user data that has been accessed only once (MRU) metadata that has been accessed only once (MRU) user data that has been accessed more than once (MFU) metadata that has been accessed more than once (MFU) The user data vs metadata split is somewhat arbitrary, and the primary control on how much memory is used to cache data vs metadata is to simply try to keep the proportion the same as it has been in the past (each bucket "evicts against" itself). The secondary control is to evict data before evicting metadata. Because of this bucketing, we may end up with one bucket mostly containing freed blocks that are very old, while another bucket has more recently accessed, still-allocated blocks. Data in the useful bucket (with still-allocated blocks) may be evicted in preference to data in the useless bucket (with old, freed blocks). 
On dcenter, we saw that the MFU metadata bucket was 230MB, while the MFU data bucket was 27GB and the MRU metadata bucket was 256GB. However, the vast majority of data in the MRU metadata bucket (256GB) was freed blocks, and thus useless. Meanwhile, the MFU metadata bucket (230MB) was constantly evicting useful blocks that will be soon needed. The problem of cache segmentation is a larger problem that needs more investigation. However, if we stop caching freed blocks, it should reduce the impact of this more fundamental issue. Ported-by: Richard Yao <ryao@cs.stonybrook.edu> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1503	2013-06-20 09:55:52 -07:00
Ying Zhu	6822a0d058	Fix compile warning on 32-bit systems The definition of zfs_vdev_holder casts VDEV_HOLDER into a function pointer passing to linux kernel's block layer function blkdev_get_by_path. However current VDEV_HOLDER is defined to be wider than 32 bits and the compiler warns about potential overflows. Instead of specifying different values for 32-bit and 64-bit systems using ifdefs, choose the common factor 32-bit addresses. Redefine VDEV_HOLDER to 0x2401de7("zholder") here. Signed-off-by: Ying Zhu <casualfisher@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1520	2013-06-19 17:11:55 -07:00
George Wilson	e51be06697	Illumos #3552 , #3564 3552 condensing one space map burns 3 seconds of CPU in spa_sync() thread 3564 spa_sync() spends 5-10% of its time in metaslab_sync() (when not condensing) Reviewed by: Adam Leventhal <ahl@delphix.com> Reviewed by: Dan Kimmel <dan.kimmel@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Approved by: Richard Lowe <richlowe@richlowe.net> References: illumos/illumos-gate@16a4a80742 https://www.illumos.org/issues/3552 https://www.illumos.org/issues/3564 Ported-by: Tim Chase <tim@chase2k.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1513	2013-06-19 16:22:39 -07:00
Madhav Suresh	c99c90015e	Illumos #3006 3006 VERIFY[S,U,P] and ASSERT[S,U,P] frequently check if first argument is zero Reviewed by Matt Ahrens <matthew.ahrens@delphix.com> Reviewed by George Wilson <george.wilson@delphix.com> Approved by Eric Schrock <eric.schrock@delphix.com> References: illumos/illumos-gate@fb09f5aad4 https://illumos.org/issues/3006 Requires: zfsonlinux/spl@1c6d149feb Ported-by: Tim Chase <tim@chase2k.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1509	2013-06-19 15:14:10 -07:00
Richard Yao	9eaf0832ad	Improve OpenRC init script The current zfs OpenRC script's dependencies cause OpenRC to attempt to unmount ZFS filesystems at shutdown while things were still using them, which would fail. This is a cosmetic issue, but it should still be addressed. It probably does not affect systems where the rootfs is a legacy filesystem, but any system with the rootfs on ZFS needs to run the ZFS init script after the system is ready to shut down filesystems. OpenRC's shutdown process occurs in the reverse order of the startup process. Therefore running the ZFS shutdown procedure after filesystems are ready to be unmounted requires running the startup procedure before fstab. This patch changes the dependencies of the script to explicitly run before fstab at boot when the rootfs is ZFS and to run after fstab at boot whenever the rootfs is not ZFS. This should cover most use cases. The only cases not covered well by this are systems with legacy root filesystems where people want to configure fstab to mount a non-ZFS filesystem off a zvol and possibly also systems whose pools are stored on network block devices. The former requires that the ZFS script run before fstab, which could cause ZFS datasets to mount too early and appear under the fstab mount points. The latter requires that the ZFS script run after networking starts, which precludes the ability to store any system information on ZFS. An additional OpenRC script could be written to handle non-root pools on network block devices, but that will depend on user demand and developer time. Signed-off-by: Richard Yao <ryao@cs.stonybrook.edu> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1479	2013-06-18 17:03:25 -07:00
Christ Schlacta	fb02fabf9b	Modified arcstat.py to run on linux * Modified kstat_update() to read arcstats from proc. * Fix shebang. * Added Makefile.am entries for arcstat.py Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1506	2013-06-18 15:43:15 -07:00
Christ Schlacta	7634cd54db	Added arcstat.py from FreeNAS Original source: http://support.freenas.org/browser/nanobsd/Files/usr/local/bin/arcstat.py Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #1506	2013-06-18 15:30:08 -07:00
Ned Bass	da29fe63f0	Don't leak mount flags into kernel When calling mount(), care must be taken to avoid passing in flags that are used only by the user space utilities. Otherwise we may stomp on flags that are reserved for other purposes in the kernel. In particular, openSUSE 12.3 kernels have added a new MS_RICHACL super-block flag whose value conflicts with our MS_COMMENT flag. This causes incorrect behavior such as the umask being ignored. The MS_COMMENT flag essentially serves as a placeholder in the option_map data structure of zfs_mount.c, but its value is never used. Therefore we can avoid the conflict by defining it to 0. The MS_USERS, MS_OWNER, and MS_GROUP flags also conflict with reserved flags in the kernel. While this is not known to have caused any problems, it is nevertheless incorrect. For the purposes of the mount.zfs helper, the "users", "owner", and "group" options just serve as hints to set additional implied options. Therefore we now define their associated mount flags in terms of the options that they imply rather than giving them unique values. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1457	2013-06-18 15:30:08 -07:00
Steven Burgess	fb82700616	Adds zpool split to man page Adds zpool split documentation to the zpool man page. I only documented the options that I could get to work. While it is documented on some sun blogs that devices can be specified for split, I was not able to get that to work during my testing. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1456	2013-06-18 15:30:07 -07:00
Brian Behlendorf	0377189b88	Only check directory xattr on ENOENT When SA xattrs are enabled only fall back to checking the directory xattrs when the name is not found as a SA xattr. Otherwise, the SA error which should be returned to the caller is overwritten by the directory xattr errors. Positive return values indicating success will also be immediately returned. In the case of #1437 the ERANGE error was being correctly returned by zpl_xattr_get_sa() only to be overridden with ENOENT which was returned by the subsequent unnecessary call to zpl_xattr_get_dir(). Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1437	2013-05-10 12:24:56 -07:00
Cyril Plisko	4f34b3bdf4	zfs_scrub_limit tunable is not used anywhere As a part of scrub/resilver tuning zfs_scrub_limit fell out of use, but the definition of the variable remained in place. Moreover various guides still (misleadingly) mention it as a way to influence resilver/scrub behavior. This commit finally removes it. Signed-off-by: Cyril Plisko <cyril.plisko@mountall.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1444	2013-05-06 14:14:06 -07:00
Ying Zhu	ee664d4631	Fix incorrect assertions in ddt_phys_decref and ddt_sync_entry The assertions in ddt_phys_decref and ddt_sync_entry cast ddp->ddp_refcnt from uint64_t to int64_t. With a reference count bigger than 2^63, e.g. the reference count of zero blocks commonly available in sparse files, we may mistakenly hit these assertions, so drop the type conversions here. Signed-off-by: Ying Zhu <casualfisher@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1436	2013-05-06 14:10:55 -07:00
Brian Behlendorf	044baf009a	Use taskq for dump_bytes() The vn_rdwr() function performs I/O by calling the vfs_write() or vfs_read() functions. These functions reside just below the system call layer and the expectation is they have almost the entire 8k of stack space to work with. In fact, certain layered configurations such as ext+lvm+md+multipath require the majority of this stack to avoid stack overflows. To avoid this possibility the vn_rdwr() call in dump_bytes() has been moved to the ZIO_TYPE_FREE taskq. This ensures that all I/O will be performed with the majority of the stack space available. This ends up being very similar to the I/O being issued via sys_write() or sys_read(). Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1399 Closes #1423	2013-05-06 14:05:42 -07:00
Adam Leventhal	7ef5e54e2e	Illumos #3581 spa_zio_taskq[ZIO_TYPE_FREE][ZIO_TASKQ_ISSUE]->tq_lock contention 3581 spa_zio_taskq[ZIO_TYPE_FREE][ZIO_TASKQ_ISSUE]->tq_lock is piping hot Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Christopher Siden <christopher.siden@delphix.com> Reviewed by: Gordon Ross <gordon.ross@nexenta.com> Approved by: Richard Lowe <richlowe@richlowe.net> References: illumos/illumos-gate@ec94d32 https://illumos.org/issues/3581 Notes for Linux port: Earlier commit `08d08eb` reduced contention on this taskq lock by simply reducing the number of z_fr_iss threads from 100 to one-per-CPU. We also optimized the taskq implementation in zfsonlinux/spl@3c6ed54. These changes significantly improved unlink performance to acceptable levels. This patch further reduces time spent spinning on this lock by randomly dispatching the work items over multiple independent task queues. The Illumos ZFS developers stated that this lock contention only arose after "3329 spa_sync() spends 10-20% of its time in spa_free_sync_cb()" was landed. It's not clear if 3329 affects the Linux port or not. I didn't see spa_free_sync_cb() show up in oprofile sessions while unlinking large files, but I may just not have used the right test case. I tested unlinking 1 TB of data with and without the patch and didn't observe a meaningful difference in elapsed time. However, oprofile showed that the percent time spent in taskq_thread() was reduced from about 16% to about 5%. Aside from a possible slight performance benefit this may be worth landing if only for the sake of maintaining consistency with upstream. Ported-by: Ned Bass <bass6@llnl.gov> Closes #1327	2013-05-06 14:05:37 -07:00
George Wilson	55d85d5a8c	Illumos #3329 , #3330 , #3331 , #3335 3329 spa_sync() spends 10-20% of its time in spa_free_sync_cb() 3330 space_seg_t should have its own kmem_cache 3331 deferred frees should happen after sync_pass 1 3335 make SYNC_PASS_* constants tunable Reviewed by: Adam Leventhal <ahl@delphix.com> Reviewed by: Matt Ahrens <matthew.ahrens@delphix.com> Reviewed by: Christopher Siden <chris.siden@delphix.com> Reviewed by: Eric Schrock <eric.schrock@delphix.com> Reviewed by: Richard Lowe <richlowe@richlowe.net> Reviewed by: Dan McDonald <danmcd@nexenta.com> Approved by: Eric Schrock <eric.schrock@delphix.com> References: illumos/illumos-gate@01f55e48fb https://www.illumos.org/issues/3329 https://www.illumos.org/issues/3330 https://www.illumos.org/issues/3331 https://www.illumos.org/issues/3335 Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>	2013-05-06 12:39:34 -07:00
George Wilson	5853fe790d	Illumos #3306 , #3321 3306 zdb should be able to issue reads in parallel 3321 'zpool reopen' command should be documented in the man page and help Reviewed by: Adam Leventhal <ahl@delphix.com> Reviewed by: Matt Ahrens <matthew.ahrens@delphix.com> Reviewed by: Christopher Siden <chris.siden@delphix.com> Approved by: Garrett D'Amore <garrett@damore.org> References: illumos/illumos-gate@31d7e8fa33 https://www.illumos.org/issues/3306 https://www.illumos.org/issues/3321 The vdev_file.c implementation in this patch diverges significantly from the upstream version. For consistency with the vdev_disk.c code the upstream version leverages the Illumos bio interfaces. This makes sense for Illumos but not for ZoL for two reasons. 1) The vdev_disk.c code in ZoL has been rewritten to use the Linux block device interfaces which differ significantly from those in Illumos. Therefore, updating the vdev_file.c to use the Illumos interfaces doesn't get you consistency with vdev_disk.c. 2) Using the upstream patch as is would require implementing compatibility code for those Solaris block device interfaces in user and kernel space. That additional complexity could lead to confusion and doesn't buy us anything. For these reasons I've opted to simply move the existing vn_rdwr() as is into the taskq function. This has the advantage of being low risk and easy to understand. Moving the vn_rdwr() function into its own taskq thread also neatly avoids the possibility of a stack overflow. Finally, because of the additional work which is being handled by the free taskq the number of threads has been increased. The thread count under Illumos defaults to 100 but was decreased to 2 in commit 08d08e due to contention. We increase it to 8 until the contention can be addressed by porting Illumos #3581. Ported-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1354	2013-05-03 16:53:52 -07:00
Carlos Alberto Lopez Perez	5165473737	Ensure --with-spl-timeout waits for spl_config.h and symvers The previous code was only waiting for the symvers file. But the postinst target of the DKMS script for SPL will not only create the symvers file, but also the header spl_config.h. If we are waiting in the configure script of ZFS for the SPL symvers file, then we also need to wait for spl_config.h. Otherwise the configure script will abort because spl_config.h is not yet available. On top of that, the function ZFS_AC_SPL_MODULE_SYMVERS is moved to the end of the function ZFS_AC_SPL to allow both checks to share the with-spl-timeout parameter. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1431	2013-05-02 15:40:44 -07:00
Brian Behlendorf	a4914d38a7	Silence 'old_umask' uninit variable warning Recent changes have caused older versions of gcc to mistakenly flag 'old_umask' in vn_open() as an uninitialized variable. To silence the warning initialize it. kernel.c: In function 'vn_open': kernel.c:525:6: error: 'old_umask' may be used uninitialized in this function Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2013-05-01 17:05:58 -07:00
Brian Behlendorf	937210a54b	Fix zinject list handlers The zfs_fd must be opened before calling print_all_handlers() or the ioctl() cannot be issued to the zfs control device. This brings the zinject code back in sync with the Illumos implementation. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2013-05-01 17:05:58 -07:00
George.Wilson	cc92e9d0c3	3246 ZFS I/O deadman thread Reviewed by: Matt Ahrens <matthew.ahrens@delphix.com> Reviewed by: Eric Schrock <eric.schrock@delphix.com> Reviewed by: Christopher Siden <chris.siden@delphix.com> Approved by: Garrett D'Amore <garrett@damore.org> NOTES: This patch has been reworked from the original in the following ways to accommodate the Linux ZFS implementation *) Usage of the cyclic interface was replaced by the delayed taskq interface. This avoids the need to implement new compatibility code and allows us to rely on the existing taskq implementation. *) An extern for zfs_txg_synctime_ms was added to sys/dsl_pool.h because declaring externs in source files as was done in the original patch is just plain wrong. *) Instead of panicking the system when the deadman triggers, a zevent describing the blocked vdev and the first pending I/O is posted. If the panic behavior is desired Linux provides other generic methods to panic the system when threads are observed to hang. *) For reference, to delay zios by 30 seconds for testing you can use zinject as follows: 'zinject -d <vdev> -D30 <pool>' References: illumos/illumos-gate@283b84606b https://www.illumos.org/issues/3246 Ported-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1396	2013-05-01 17:05:52 -07:00
Brian Behlendorf	57f5a2008e	Fix txg_quiesce thread deadlock A deadlock was accidentally introduced by commit `e95853a` which can occur when the system is under memory pressure. What happens is that while the txg_quiesce thread is holding the tx->tx_cpu locks it enters memory reclaim. In the context of this memory reclaim it then issues synchronous I/O to a ZVOL swap device. Because the txg_quiesce thread is holding the tx->tx_cpu locks a new txg cannot be opened to handle the I/O. Deadlock. The fix is straightforward. Move the memory allocation outside the critical region where the tx->tx_cpu locks are held. And for good measure change the offending allocation to KM_PUSHPAGE to ensure it never attempts to issue I/O during reclaim. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #1274	2013-04-26 14:42:36 -07:00
Turbo Fredriksson	0c15bf16f1	Ignore *.{deb,rpm,tar.gz} files in the top directory. These are build products and should be ignored. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Turbo Fredriksson <turbo@bayour.com> Issue #1402	2013-04-24 16:18:59 -07:00
Brian Behlendorf	e013670550	Set RPM_DEFINE_COMMON options When the kmod packaging was introduced the ability to pass the --enable-debug and --enable-dmu-tx options from configure all the way through to `make rpm\|deb` was accidentally lost. Update ZFS_AC_RPM to explicitly set RPM_DEFINE_COMMON with these rpmbuild defines. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #1402	2013-04-24 16:18:55 -07:00
Turbo Fredriksson	1a33036df9	Add --bump=0 to alien Preserve the release field when creating Debian packages. The --keep-version option was not used because it results in a failure when the git '<commit>_<hash>' syntax is used for the release. The '_' is a valid character for RPM packages but not for DEBs. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Turbo Fredriksson <turbo@bayour.com> Issue #1402 Issue #928	2013-04-24 16:18:53 -07:00
Turbo Fredriksson	d012ba3832	Support .nogitrelease file When building a custom release in a git tree provide the ability to prevent the release field from being overwritten by the `git describe` output. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #1402	2013-04-24 16:18:49 -07:00
Turbo Fredriksson	382c4e5184	Possibility to disable (not start) zfs at bootup. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #1402	2013-04-24 16:18:44 -07:00
Etienne Dechamps	c4933aade7	Fix various generic kmod RPM spec issues. There are a number of issues with the generic kmod RPM spec in its current state: - The "%{__id_u}" macro seems to not be available on some systems (e.g. Debian squeeze). It appears it has been deprecated. Use "${__id} -u" instead. - The way the "--with-linux=" configure option is generated in the non-RHEL/Fedora case is completely wrong with various newline and escaping issues (also, $kernel_version is not available in the generator context). The second issue made the generator shell snippet (almost) silently fail, which under specific circumstances can result in broken builds against the wrong kernel sources. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1416	2013-04-24 16:18:42 -07:00
Brian Behlendorf	f706421173	Correctly return ERANGE in getxattr(2) According to the getxattr(2) man page the ERANGE errno should be returned when the size of the value buffer is too small to hold the result. Prior to this patch the implementation would just truncate the value to size bytes. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1408	2013-04-24 12:35:04 -07:00

1030 Commits