mirror_zfs

mirror of https://git.proxmox.com/git/mirror_zfs.git synced 2026-04-17 08:54:52 +03:00

Author	SHA1	Message	Date
ilbsmart	779a6c0bf6	deadlock between mm_sem and tx assign in zfs_write() and page fault The bug time sequence: 1. thread #1, `zfs_write` assign a txg "n". 2. In a same process, thread #2, mmap page fault (which means the `mm_sem` is hold) occurred, `zfs_dirty_inode` open a txg failed, and wait previous txg "n" completed. 3. thread #1 call `uiomove` to write, however page fault is occurred in `uiomove`, which means it need `mm_sem`, but `mm_sem` is hold by thread #2, so it stuck and can't complete, then txg "n" will not complete. So thread #1 and thread #2 are deadlocked. Reviewed-by: Chunwei Chen <tuxoko@gmail.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Signed-off-by: Grady Wong <grady.w@xtaotech.com> Closes #7939	2018-10-16 11:11:24 -07:00
Alek P	50a343d85c	Fix changelist mounted-dataset iteration Commit `0c6d093` caused a regression in the inherit codepath. The fix is to restrict the changelist iteration on mountpoints and add proper handling for 'legacy' mountpoints Reviewed by: Serapheim Dimitropoulos <serapheim.dimitro@delphix.com> Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Alek Pinchuk <apinchuk@datto.com> Closes #7988 Closes #7991	2018-10-10 21:13:13 -07:00
Tony Hutter	2ef0f8c329	Print "(repairing)" in zpool status again Historically, zpool status prints "(repairing)" for any drives that have errors during a scrub: NAME STATE READ WRITE CKSUM mypool ONLINE 0 0 0 mirror-0 ONLINE 0 0 0 /tmp/file1 ONLINE 13 0 0 (repairing) /tmp/file2 ONLINE 0 0 0 /tmp/file3 ONLINE 0 0 0 This was accidentally broken in "OpenZFS 9166 - zfs storage pool checkpoint" (`d2734cc`). This patch adds it back in. Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tony Hutter <hutter2@llnl.gov> Closes #7779 Closes #7978	2018-10-09 20:30:32 -07:00
Prakash Surya	54eb2c410e	Verify 'zfs destroy' will unshare the dataset This change adds a new test case to the zfs-test suite to verify that when 'zfs destroy' is used on a shared dataset, the dataset will be unshared after the destroy operation completes. Reviewed by: loli10K <ezomori.nozomu@gmail.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Prakash Surya <prakash.surya@delphix.com> Closes #7941	2018-10-03 10:17:58 -07:00
Prakash Surya	1bf490ba93	Fix "zfs destroy" when "sharenfs=on" is used When using "zfs destroy" on a dataset that is using "sharenfs=on" and has been automatically exported (by libzfs), the dataset will not be automatically unexported as it should be. This workflow appears to have been broken by this commit: `3fd3e56cfd` In that change, the "zfs_unmount" function was modified to use the "mnt.mnt_special" field when determining the mount point that is being unmounted, rather than "mnt.mnt_mountp". As a result, when "mntpt" is passed into "zfs_unshare_proto", it's value is now the dataset name rather than the mountpoint. Thus, when this value is used with the "is_shared" function (via "zfs_unshare_proto") it will not find a match (since that function assumes it'll be passed the mountpoint) and incorrectly reports that the dataset is not shared. This can be easily reproduced with the following commands: $ sudo zpool create tank xvdb $ sudo zfs create -o sharenfs=on tank/fish $ sudo zfs destroy tank/fish $ sudo zfs list -r tank NAME USED AVAIL REFER MOUNTPOINT tank 97.5K 7.27G 24K /tank $ sudo exportfs /tank/fish <world> $ sudo cat /etc/dfs/sharetab /tank/fish - nfs rw,crossmnt At this point, the "tank/fish" filesystem doesn't exist, but it's still listed as exported when looking at "exportfs" and "/etc/dfs/sharetab". Also note, this change brings us back in-sync with the illumos code, as it pertains to this one line; on illumos, "mnt.mnt_mountp" is used. Reviewed by: loli10K <ezomori.nozomu@gmail.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Co-authored-by: George Wilson <george.wilson@delphix.com> Signed-off-by: Prakash Surya <prakash.surya@delphix.com> Issue #6143 Closes #7941	2018-10-03 10:17:58 -07:00
Alek P	0c6d09361d	changelist should be able to iter on mounts Modified changelist_gather()ing for the mountpoint property. Now instead of iterating on all dataset descendants, we read /proc/self/mounts and iterate on the mounted descendant datasets only. Switched changelist implementation from a uu_list_* to uu_avl_* in order to reduce changlist code-path's worst case time complexity. Reviewed by: Don Brady <don.brady@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Alek Pinchuk <apinchuk@datto.com> Closes #7967	2018-10-02 12:30:58 -07:00
Brian Behlendorf	838bd5ff35	ZTS: Fix snapshot_009_pos, snapshot_010_pos Mitigate the likelihood of the newly created volumes being busy when the 'zfs destroy -r' is issued by waiting for udev to settle. Since this is not a iron clad fix I've added the test case to the known list of possible failures and referenced issue #7961. Finally, in the case this test does fail fix the cleanup logic so subsequent tests won't incorrectly fail. Reviewed-by: Giuseppe Di Natale <guss80@gmail.com> Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: John Kennedy <john.kennedy@delphix.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #7961 Closes #7962	2018-10-01 17:15:57 -07:00
John Gallagher	d12614521a	Fixes for procfs files backed by linked lists There are some issues with the way the seq_file interface is implemented for kstats backed by linked lists (zfs_dbgmsgs and certain per-pool debugging info): * We don't account for the fact that seq_file sometimes visits a node multiple times, which results in missing messages when read through procfs. * We don't keep separate state for each reader of a file, so concurrent readers will receive incorrect results. * We don't account for the fact that entries may have been removed from the list between read syscalls, so reading from these files in procfs can cause the system to crash. This change fixes these issues and adds procfs_list, a wrapper around a linked list which abstracts away the details of implementing the seq_file interface for a list and exposing the contents of the list through procfs. Reviewed by: Don Brady <don.brady@delphix.com> Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com> Reviewed by: Brad Lewis <brad.lewis@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: John Gallagher <john.gallagher@delphix.com> External-issue: LX-1211 Closes #7819	2018-09-26 11:08:12 -07:00
Gregor Kopka	3ed2fbcc1c	Fix flake 8 style warnings Ran zts-report.py and test-runner.py from ./tests/test-runner/bin/ through the 2to3 (https://docs.python.org/2/library/2to3.html). Checked the result, fixed: - 'maxint' -> 'maxsize' that 2to3 missed. - 'cmp=' parameter for a 'sorted()' with a 'key=' version. - try/except wrapping of configparser import as there are still python 2.7 systems that lack a compatibility shim Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Gregor Kopka <gregor@kopka.net> Closes #7925 Closes #7952	2018-09-26 11:02:26 -07:00
Brian Behlendorf	a7165d7255	Revert "Fix flake 8 style warnings" This reverts commit `b8fd4310c5` which accidentally introduced a regression for some versions of python. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #7929	2018-09-24 17:20:42 -07:00
LOLi	36e369ecb8	ZTS: Fix removal_resume_export This change simplify the test case removing part of the logic which was introducing a race condition and thus causing spurious failures: we use attempt_during_removal() from removal.kshlib instead which has been observed to be more stable. Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: John Kennedy <john.kennedy@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: loli10K <ezomori.nozomu@gmail.com> Closes #7894 Closes #7913	2018-09-24 12:58:16 -07:00
Gregor Kopka	b8fd4310c5	Fix flake 8 style warnings Ran zts-report.py and test-runner.py from ./tests/test-runner/bin/ through the 2to3 (https://docs.python.org/2/library/2to3.html). Checked the result, fixed: - 'maxint' -> 'maxsize' that 2to3 missed. - 'cmp=' parameter for a 'sorted()' with a 'key=' version. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed by: John Wren Kennedy <jwk404@gmail.com> Signed-off-by: Gregor Kopka <gregor@kopka.net> Closes #7925 Closes #7929	2018-09-24 10:12:59 -07:00
LOLi	5140a58f3b	zpool should detect invalid fs property on create This change improve the handling of invalid filesystem properties when specified at pool creation: this is useful when 'zpool create -n' (dry run) is executed to detect invalid fs-level options (-O) before the actual command is run. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: loli10K <ezomori.nozomu@gmail.com> Closes #7620 Closes #7878	2018-09-13 13:37:42 -07:00
Brian Behlendorf	92b432139d	Add removal_resume_export to zts-report.py Add the removal_resume_export test case to the possible failure section of the zts-report.py and reference the Github issue. In the CI environment this test has proven to be unreliable due to the way it detects the removal thread. This is a flaw in the test and not device removal so update the result summary accordingly. Additionally, increase the allowed timeout in an effort to reduce the observed rate of false positves. Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #7895 Issue #7894	2018-09-13 13:35:09 -07:00
Roman Strashkin	733b5722b4	zpool split can create a corrupted pool Added vdev_resilver_needed() check to verify VDEVs are fully synced, so that after split the new pool will not be corrupted. Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: loli10K <ezomori.nozomu@gmail.com> Signed-off-by: Roman Strashkin <roman.strashkin@nexenta.com> Closes #7865 Closes #7881	2018-09-12 18:14:42 -07:00
LOLi	0238a9755b	Fix 'zfs allow' for create time permissions When no permission set is defined for a dataset the create time permissions are incorrectly shown as if they were a permission set. This change simply correct how allow permissions are displayed. This commit also fixes a small manpage formatting issue and adds the "zfs_allow_003_pos" test case to the ZFS Test Suite. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: loli10K <ezomori.nozomu@gmail.com> Closes #7519 Closes #7860	2018-09-06 13:11:21 -07:00
Don Brady	cc99f275a2	Pool allocation classes Allocation Classes add the ability to have allocation classes in a pool that are dedicated to serving specific block categories, such as DDT data, metadata, and small file blocks. A pool can opt-in to this feature by adding a 'special' or 'dedup' top-level VDEV. Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com> Reviewed-by: Richard Laager <rlaager@wiktel.com> Reviewed-by: Alek Pinchuk <apinchuk@datto.com> Reviewed-by: Håkan Johansson <f96hajo@chalmers.se> Reviewed-by: Andreas Dilger <andreas.dilger@chamcloud.com> Reviewed-by: DHE <git@dehacked.net> Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com> Reviewed-by: Gregor Kopka <gregor@kopka.net> Reviewed-by: Kash Pande <kash@tripleback.net> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Signed-off-by: Don Brady <don.brady@delphix.com> Closes #5182	2018-09-05 18:33:36 -07:00
Don Brady	b83a0e2dc1	Add basic zfs ioc input nvpair validation We want newer versions of libzfs_core to run against an existing zfs kernel module (i.e. a deferred reboot or module reload after an update). Programmatically document, via a zfs_ioc_key_t, the valid arguments for the ioc commands that rely on nvpair input arguments (i.e. non legacy commands from libzfs_core). Automatically verify the expected pairs before dispatching a command. This initial phase focuses on the non-legacy ioctls. A follow-on change can address the legacy ioctl input from the zfs_cmd_t. The zfs_ioc_key_t for zfs_keys_channel_program looks like: static const zfs_ioc_key_t zfs_keys_channel_program[] = { {"program", DATA_TYPE_STRING, 0}, {"arg", DATA_TYPE_UNKNOWN, 0}, {"sync", DATA_TYPE_BOOLEAN_VALUE, ZK_OPTIONAL}, {"instrlimit", DATA_TYPE_UINT64, ZK_OPTIONAL}, {"memlimit", DATA_TYPE_UINT64, ZK_OPTIONAL}, }; Introduce four input errors to identify specific input failures (in addition to generic argument value errors like EINVAL, ERANGE, EBADF, and E2BIG). ZFS_ERR_IOC_CMD_UNAVAIL the ioctl number is not supported by kernel ZFS_ERR_IOC_ARG_UNAVAIL an input argument is not supported by kernel ZFS_ERR_IOC_ARG_REQUIRED a required input argument is missing ZFS_ERR_IOC_ARG_BADTYPE an input argument has an invalid type Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Don Brady <don.brady@delphix.com> Closes #7780	2018-09-02 12:14:01 -07:00
Don Brady	e8bcb693d6	Add zfs module feature and property info to sysfs This extends our sysfs '/sys/module/zfs' entry to include feature and property attributes. The primary consumer of this information is user processes, like the zfs CLI, that need to know what the current loaded ZFS module supports. The libzfs binary will consult this information when instantiating the zfs and zpool property tables and the pool features table. This introduces 4 kernel objects (dirs) into '/sys/module/zfs' with corresponding attributes (files): features.runtime features.pool properties.dataset properties.pool Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Don Brady <don.brady@delphix.com> Closes #7706	2018-09-02 12:09:53 -07:00
Brian Behlendorf	bb91178e60	ZTS: Fix EBUSY volume destroy failures It's possible for an unrelated process, like blkid, to have the volume open when 'zfs destroy' is run. Switch the cleanup functions to the destroy_dataset() helper which handles this case by retrying the destroy when the dataset is busy. This was done not only for volumes but also for file systems for consistency. Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com> Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #7854	2018-08-31 15:30:44 -07:00
bunder2015	9e7fb6c171	ZTS: pool_checkpoint path cleanup Removing hardcoded paths in pool_checkpoint.kshlib Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: John Kennedy <john.kennedy@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: bunder2015 <omfgbunder@gmail.com> Closes #7840	2018-08-30 14:45:16 -07:00
Richard Elling	6fa1e1e73a	ZTS: Fix DEV_DSKDIR trim from disk Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Richard Elling <Richard.Elling@RichardElling.com> Closes #7848	2018-08-30 14:43:37 -07:00
bunder2015	de61daa597	ZTS: zvol_swap_003 path cleanup Removing hardcoded paths in zvol_swap_003 Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: John Kennedy <john.kennedy@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: bunder2015 <omfgbunder@gmail.com> Closes #7839	2018-08-30 13:53:06 -07:00
bernie1995	0fe7c953b3	ZTS: path cleanup Removing hardcoded paths in many scripts. Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: bernie1995 <bernie.pikes@gmail.com> Issue #7507 Closes #7843	2018-08-30 13:46:55 -07:00
Brian Behlendorf	6c6949acae	ZTS: Fix zfs_create_013_pos It's possible for an unrelated process, like blkid, to have the volume open when 'zfs destroy' is run. Switch the cleanup function to the destroy_dataset() helper which handles this case by retrying the destroy when the dataset is busy. Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #7847	2018-08-30 13:38:09 -07:00
Tom Caputi	c3bd3fb4ac	OpenZFS 9403 - assertion failed in arc_buf_destroy() Assertion failed in arc_buf_destroy() when concurrently reading block with checksum error. Porting notes: * The ability to zinject decompression errors has been added, but this only works at the zio_decompress() level, where we have all of the info we need to match against the user's zinject options. * The decompress_fault test has been added to test the new zinject functionality * We attempted to set zio_decompress_fail_fraction to (1 << 18) in ztest for further test coverage. Although this did uncover a few low priority issues, this unfortuantely also causes ztest to ASSERT in many locations where the code is working correctly since it is designed to fail on IO errors. Developers can manually set this variable with the '-o' option to find and debug issues. Authored by: Matt Ahrens <mahrens@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Paul Dagnelie <pcd@delphix.com> Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com> Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov> Approved by: Matt Ahrens <mahrens@delphix.com> Ported-by: Tom Caputi <tcaputi@datto.com> OpenZFS-issue: https://illumos.org/issues/9403 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/fa98e487a9 Closes #7822	2018-08-29 11:33:33 -07:00
Tom Caputi	47ab01a18f	Always wait for txg sync when umounting dataset Currently, when unmounting a filesystem, ZFS will only wait for a txg sync if the dataset is dirty and not readonly. However, this can be problematic in cases where a dataset is remounted readonly immediately before being unmounted, which often happens when the system is being shut down. Since encrypted datasets require that all I/O is completed before the dataset is disowned, this issue causes problems when write I/Os leak into the txgs after the dataset is disowned, which can happen when sync=disabled. While looking into fixes for this issue, it was discovered that dsl_dataset_is_dirty() does not return B_TRUE when the dataset has been removed from the txg dirty datasets list, but has not actually been processed yet. Furthermore, the implementation is comletely different from dmu_objset_is_dirty(), adding to the confusion. Rather than relying on this function, this patch forces the umount code path (and the remount readonly code path) to always perform a txg sync on read-write datasets and removes the function altogether. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tom Caputi <tcaputi@datto.com> Closes #7753 Closes #7795	2018-08-27 10:16:28 -07:00
Brian Behlendorf	a584ef2605	Direct IO support Direct IO via the O_DIRECT flag was originally introduced in XFS by IRIX for database workloads. Its purpose was to allow the database to bypass the page and buffer caches to prevent unnecessary IO operations (e.g. readahead) while preventing contention for system memory between the database and kernel caches. On Illumos, there is a library function called directio(3C) that allows user space to provide a hint to the file system that Direct IO is useful, but the file system is free to ignore it. The semantics are also entirely a file system decision. Those that do not implement it return ENOTTY. Since the semantics were never defined in any standard, O_DIRECT is implemented such that it conforms to the behavior described in the Linux open(2) man page as follows. 1. Minimize cache effects of the I/O. By design the ARC is already scan-resistant which helps mitigate the need for special O_DIRECT handling. Data which is only accessed once will be the first to be evicted from the cache. This behavior is in consistent with Illumos and FreeBSD. Future performance work may wish to investigate the benefits of immediately evicting data from the cache which has been read or written with the O_DIRECT flag. Functionally this behavior is very similar to applying the 'primarycache=metadata' property per open file. 2. O_DIRECT _MAY_ impose restrictions on IO alignment and length. No additional alignment or length restrictions are imposed. 3. O_DIRECT _MAY_ perform unbuffered IO operations directly between user memory and block device. No unbuffered IO operations are currently supported. In order to support features such as transparent compression, encryption, and checksumming a copy must be made to transform the data. 4. O_DIRECT _MAY_ imply O_DSYNC (XFS). O_DIRECT does not imply O_DSYNC for ZFS. Callers must provide O_DSYNC to request synchronous semantics. 5. O_DIRECT _MAY_ disable file locking that serializes IO operations. Applications should avoid mixing O_DIRECT and normal IO or mmap(2) IO to the same file. This is particularly true for overlapping regions. All I/O in ZFS is locked for correctness and this locking is not disabled by O_DIRECT. However, concurrently mixing O_DIRECT, mmap(2), and normal I/O on the same file is not recommended. This change is implemented by layering the aops->direct_IO operations on the existing AIO operations. Code already existed in ZFS on Linux for bypassing the page cache when O_DIRECT is specified. References: * http://xfs.org/docs/xfsdocs-xml-dev/XFS_User_Guide/tmp/en-US/html/ch02s09.html * https://blogs.oracle.com/roch/entry/zfs_and_directio * https://ext4.wiki.kernel.org/index.php/Clarifying_Direct_IO's_Semantics * https://illumos.org/man/3c/directio Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com> Signed-off-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #224 Closes #7823	2018-08-27 10:04:21 -07:00
Joao Carlos Mendes Luis	5d6ad2442b	Fedora 28: Fix misc bounds check compiler warnings Fix a bunch of truncation compiler warnings that show up on Fedora 28 (GCC 8.0.1). Reviewed-by: Giuseppe Di Natale <guss80@gmail.com> Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #7368 Closes #7826 Closes #7830	2018-08-26 12:55:44 -07:00
LOLi	c434d8806c	Stack overflow when destroying deeply nested clones Destroy operations on deeply nested chains of clones can overflow the stack: Depth Size Location (221 entries) ----- ---- -------- 0) 15664 48 mutex_lock+0x5/0x30 1) 15616 8 mutex_lock+0x5/0x30 ... 26) 13576 72 dsl_dataset_remove_clones_key.isra.4+0x124/0x1e0 [zfs] 27) 13504 72 dsl_dataset_remove_clones_key.isra.4+0x18a/0x1e0 [zfs] 28) 13432 72 dsl_dataset_remove_clones_key.isra.4+0x18a/0x1e0 [zfs] ... 185) 2128 72 dsl_dataset_remove_clones_key.isra.4+0x18a/0x1e0 [zfs] 186) 2056 72 dsl_dataset_remove_clones_key.isra.4+0x18a/0x1e0 [zfs] 187) 1984 72 dsl_dataset_remove_clones_key.isra.4+0x18a/0x1e0 [zfs] 188) 1912 136 dsl_destroy_snapshot_sync_impl+0x4e0/0x1090 [zfs] 189) 1776 16 dsl_destroy_snapshot_check+0x0/0x90 [zfs] ... 218) 304 128 kthread+0xdf/0x100 219) 176 48 ret_from_fork+0x22/0x40 220) 128 128 kthread+0x0/0x100 Fix this issue by converting dsl_dataset_remove_clones_key() from recursive to iterative. Reviewed-by: Paul Zuchowski <pzuchowski@datto.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: loli10K <ezomori.nozomu@gmail.com> Closes #7279 Closes #7810	2018-08-22 11:03:31 -07:00
Tom Caputi	149ce888bb	Fix issues with raw receive_write_byref() This patch fixes 2 issues with raw, deduplicated send streams. The first is that datasets who had been completely received earlier in the stream were not still marked as raw receives. This caused problems when newly received datasets attempted to fetch raw data from these datasets without this flag set. The second problem was that the arc freeze checksum code was not consistent about which locks needed to be held while performing its asserts. The proper locking needed to run these asserts is actually fairly nuanced, since the asserts touch the linked list of buffers (requiring the header lock), the arc_state (requiring the b_evict_lock), and the b_freeze_cksum (requiring the b_freeze_lock). This seems like a large performance sacrifice and a lot of unneeded complexity to verify that this relatively small debug feature is working as intended, so this patch simply removes these asserts instead. Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tom Caputi <tcaputi@datto.com> Closes #7701	2018-08-20 11:03:56 -07:00
Olaf Faaland	34fe773e30	Skip import activity test in more zdb code paths Since zdb opens the pools read-only, it cannot damage the pool in the event the pool is already imported either on the same host or on another one. If the pool vdev structure is changing while zdb is importing the pool, it may cause zdb to crash. However this is unlikely, and in any case it's a user space process and can simply be run again. For this reason, zdb should disable the multihost activity test on import that is normally run. This commit fixes a few zdb code paths where that had been overlooked. It also adds tests to ensure that several common use cases handle this properly in the future. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Gu Zheng <guzheng2331314@163.com> Signed-off-by: Olaf Faaland <faaland1@llnl.gov> Closes #7797 Closes #7801	2018-08-20 10:05:23 -07:00
Serapheim Dimitropoulos	a448a2557e	Introduce read/write kstats per dataset The following patch introduces a few statistics on reads and writes grouped by dataset. These statistics are implemented as kstats (backed by aggregate sums for performance) and can be retrieved by using the dataset objset ID number. The motivation for this change is to provide some preliminary analytics on dataset usage/performance. Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Signed-off-by: Serapheim Dimitropoulos <serapheim@delphix.com> Closes #7705	2018-08-20 09:52:37 -07:00
bunder2015	fa84714abb	ZTS: events path cleanup Removing hardcoded paths in events.cfg Reviewed-by: Giuseppe Di Natale <guss80@gmail.com> Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: bunder2015 <omfgbunder@gmail.com> Closes #7805	2018-08-18 21:19:41 -07:00
bunder2015	80d45e089c	ZTS: largest_pool_001 path cleanup Removing hardcoded paths in largest_pool_001 Reviewed-by: Giuseppe Di Natale <guss80@gmail.com> Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: bunder2015 <omfgbunder@gmail.com> Closes #7804	2018-08-18 21:18:31 -07:00
bunder2015	5468ee7a2f	ZTS: privilege group path cleanup Removing hardcoded paths in privilege group tests Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: bunder2015 <omfgbunder@gmail.com> Closes #7803	2018-08-18 21:17:22 -07:00
Brian Behlendorf	089b16f48d	ZTS: Fix import_cache_device_replaced Allow the 'zpool replace' to run slowly without overwhelming the vdev queues by setting zfs_scan_vdev_limit=128k. This limits the number of concurrent slow IOs which need to be handled. The net effect is the test case runs approximately 3x faster putting it well under the 10 minute per-test time limit. Rename import_cache* test cases to imprt_cachefile*. Originally these were renamed due to a maximum tar name limit, this limit was removed by commit `1dfde3d9b`. Replaced instances of /var/tmp in zpool_import.cfg with $TEST_BASE_DIR. Reviewed-by: bunder2015 <omfgbunder@gmail.com> Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #7765 Closes #7802	2018-08-18 21:16:12 -07:00
Brian Behlendorf	802715b74a	ZTS: Fix reservation_001_pos It's possible for an unrelated process, like blkid, to have the volume open when 'zfs destroy' is run. Switch the cleanup function to the destroy_dataset() helper which handles this case by retrying the destroy when the dataset is busy. Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #7796	2018-08-17 10:01:47 -07:00
Brian Behlendorf	1dfde3d9b2	Use posix format for dist tarballs Traditionally Automake has defaulted to the V7 tar format when creating tarballs for distributions. One of the many limitions of this format is a 99 character maximum path + file name limit. This can cause problems when adding new test cases to the ZTS due to the depth of the sub-tree and descriptive test names. This change switches the build system to the posix (aliased as pax) tar format which conforms to the POSIX.1-2001 specification. This format does not suffer from the V7 limitations, was designed to be compatible, and will become the default format in future versions of GNU tar. https://www.gnu.org/software/tar/manual/html_chapter/tar_8.html As part of this change the blockfiles directories which were originally removed due to this limit have been readded. Reviewed by: Tim Chase <tim@chase2k.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #7767	2018-08-15 09:52:28 -07:00
Tom Caputi	d9c460a0b6	Added encryption support for zfs recv -o / -x One small integration that was absent from b52563 was support for zfs recv -o / -x with regards to encryption parameters. The main use cases of this are as follows: * Receiving an unencrypted stream as encrypted without needing to create a "dummy" encrypted parent so that encryption can be inheritted. * Allowing users to change their keylocation on receive, so long as the receiving dataset is an encryption root. * Allowing users to explicitly exclude or override the encryption property from an unencrypted properties stream, allowing it to be received as encrypted. * Receiving a recursive heirarchy of unencrypted datasets, encrypting the top-level one and forcing all children to inherit the encryption. Reviewed-by: Jorgen Lundman <lundman@lundman.net> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com> Signed-off-by: Tom Caputi <tcaputi@datto.com> Closes #7650	2018-08-15 09:48:49 -07:00
bunder2015	64e96969a8	ZTS: delegate group path cleanup Removing hardcoded paths in delegate group tests Reviewed-by: Giuseppe Di Natale <guss80@gmail.com> Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: bunder2015 <omfgbunder@gmail.com> Closes #7778	2018-08-13 08:24:02 -07:00
bunder2015	d02fa81c78	ZTS: acl group path cleanup Removing hardcoded paths in acl group tests Reviewed-by: Giuseppe Di Natale <guss80@gmail.com> Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: bunder2015 <omfgbunder@gmail.com> Closes #7777	2018-08-13 08:22:41 -07:00
bunder2015	604016054e	ZTS: inuse_004 path cleanup Removing hardcoded path in inuse_004 Reviewed-by: Giuseppe Di Natale <guss80@gmail.com> Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: bunder2015 <omfgbunder@gmail.com> Closes #7775	2018-08-13 08:16:55 -07:00
bunder2015	5fba582887	ZTS: projectquota_002 path cleanup Removing hardcoded path in projectquota_002 Reviewed-by: Giuseppe Di Natale <guss80@gmail.com> Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: bunder2015 <omfgbunder@gmail.com> Closes #7774	2018-08-13 08:15:53 -07:00
Brian Behlendorf	94b197a0a5	ZTS: Test case reliability * Both cli_root/zpool_import/import_cache_device_replaced, and redundancy/redundancy_004_neg have been observed to fail for spurious reasons ~1% of the time. Add them to the exception list and reference the open Github issue. * Speed up replacement/replacement_001_pos to prevent it from exceeding the 10 minute per test limit and getting KILLED. File vdev creation switched to truncate -s, redundant raidz1 testing pass dropped, fixed some minor formating issues. Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #7766	2018-08-12 09:38:53 -07:00
LOLi	c8c308362c	Allow inherited properties in zfs_check_settable() This change modifies how 'checksum' and 'dedup' properties are verified in zfs_check_settable() handling the case where they are explicitly inherited in the dataset hierarchy when receiving a recursive send stream. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Tom Caputi <tcaputi@datto.com> Signed-off-by: loli10K <ezomori.nozomu@gmail.com> Closes #7755 Closes #7576 Closes #7757	2018-08-03 14:56:25 -07:00
Brian Behlendorf	6da0998f59	ZTS: Fix zfs_create_007_pos It's possible for an unrelated process, like blkid, to have the volume open when 'zfs destroy' is run. Switch the cleanup function to the destroy_dataset() helper which handles this case by retrying the destroy when the dataset is busy. Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed by: John Kennedy <john.kennedy@delphix.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #7763	2018-08-03 10:21:50 -07:00
Paul Dagnelie	492f64e941	OpenZFS 9112 - Improve allocation performance on high-end systems Overview ======== We parallelize the allocation process by creating the concept of "allocators". There are a certain number of allocators per metaslab group, defined by the value of a tunable at pool open time. Each allocator for a given metaslab group has up to 2 active metaslabs; one "primary", and one "secondary". The primary and secondary weight mean the same thing they did in in the pre-allocator world; primary metaslabs are used for most allocations, secondary metaslabs are used for ditto blocks being allocated in the same metaslab group. There is also the CLAIM weight, which has been separated out from the other weights, but that is less important to understanding the patch. The active metaslabs for each allocator are moved from their normal place in the metaslab tree for the group to the back of the tree. This way, they will not be selected for use by other allocators searching for new metaslabs unless all the passive metaslabs are unsuitable for allocations. If that does happen, the allocators will "steal" from each other to ensure that IOs don't fail until there is truly no space left to perform allocations. In addition, the alloc queue for each metaslab group has been broken into a separate queue for each allocator. We don't want to dramatically increase the number of inflight IOs on low-end systems, because it can significantly increase txg times. On the other hand, we want to ensure that there are enough IOs for each allocator to allow for good coalescing before sending the IOs to the disk. As a result, we take a compromise path; each allocator's alloc queue max depth starts at a certain value for every txg. Every time an IO completes, we increase the max depth. This should hopefully provide a good balance between the two failure modes, while not dramatically increasing complexity. We also parallelize the spa_alloc_tree and spa_alloc_lock, which cause very similar contention when selecting IOs to allocate. This parallelization uses the same allocator scheme as metaslab selection. Performance Results =================== Performance improvements from this change can vary significantly based on the number of CPUs in the system, whether or not the system has a NUMA architecture, the speed of the drives, the values for the various tunables, and the workload being performed. For an fio async sequential write workload on a 24 core NUMA system with 256 GB of RAM and 8 128 GB SSDs, there is a roughly 25% performance improvement. Future Work =========== Analysis of the performance of the system with this patch applied shows that a significant new bottleneck is the vdev disk queues, which also need to be parallelized. Prototyping of this change has occurred, and there was a performance improvement, but more work needs to be done before its stability has been verified and it is ready to be upstreamed. Authored by: Paul Dagnelie <pcd@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Serapheim Dimitropoulos <serapheim.dimitro@delphix.com> Reviewed by: Alexander Motin <mav@FreeBSD.org> Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov> Approved by: Gordon Ross <gwr@nexenta.com> Ported-by: Paul Dagnelie <pcd@delphix.com> Signed-off-by: Paul Dagnelie <pcd@delphix.com> Porting Notes: * Fix reservation test failures by increasing tolerance. OpenZFS-issue: https://illumos.org/issues/9112 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/3f3cc3c3 Closes #7682	2018-07-31 10:52:33 -07:00
Paul Dagnelie	21d48b5eac	OpenZFS 9438 - Holes can lose birth time info if a block has a mix of birth times As reported by https://github.com/zfsonlinux/zfs/issues/4996, there is yet another hole birth issue. In this one, if a block is entirely holes, but the birth times are not all the same, we lose that information by creating one hole with the current txg as its birth time. The ZoL PR's fix approach is incorrect. Ultimately, the problem here is that when you truncate and write a file in the same transaction group, the dbuf for the indirect block will be zeroed out to deal with the truncation, and then written for the write. During this process, we will lose hole birth time information for any holes in the range. In the case where a dnode is being freed, we need to determine whether the block should be converted to a higher-level hole in the zio pipeline, and if so do it when the dnode is being synced out. Porting Notes: * The DMU_OBJECT_END change in zfs_znode.c was already applied. * Added test cases from #5675 provided by @rincebrain for hole_birth issues. These test cases should be pushed upstream to OpenZFS. * Updated mk_files which is used by several rsend tests so the files created are a little more interesting and may contain holes. Authored by: Paul Dagnelie <pcd@delphix.com> Reviewed by: Matt Ahrens <matt@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Ported-by: Brian Behlendorf <behlendorf1@llnl.gov> OpenZFS-issue: https://www.illumos.org/issues/9438 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/738e2a3c External-issue: DLPX-46861 Closes #7746	2018-07-30 09:27:49 -07:00
Brian Behlendorf	b719768e35	ZTS: Fix reservation_017_pos It's possible for an unrelated process, like blkid, to have the volume open when 'zfs destroy' is run. Switch the cleanup function to the destroy_dataset() helper which handles this case by retrying the destroy when the dataset is busy. Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #7750	2018-07-30 09:23:45 -07:00
Brian Behlendorf	e106a7bacb	ZTS: Add reservation_008_pos exception The reservation_008_pos test case has been observed to fail in a non-dangerous way in approximately 5% of automated test runs. Add the test case to the list of possible expected failures until the test case can be made perfectly reliable. Reviewed by: Giuseppe Di Natale <guss80@gmail.com> Reviewed by: George Melikov <mail@gmelikov.ru> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #7741 Closes #7742	2018-07-25 10:15:39 -07:00
Brian Behlendorf	d441e85dd7	Add support for autoexpand property While the autoexpand property may seem like a small feature it depends on a significant amount of system infrastructure. Enough of that infrastructure is now in place that with a few modifications for Linux it can be supported. Auto-expand works as follows; when a block device is modified (re-sized, closed after being open r/w, etc) a change uevent is generated for udev. The ZED, which is monitoring udev events, passes the change event along to zfs_deliver_dle() if the disk or partition contains a zfs_member as identified by blkid. From here the device is matched against all imported pool vdevs using the vdev_guid which was read from the label by blkid. If a match is found the ZED reopens the pool vdev. This re-opening is important because it allows the vdev to be briefly closed so the disk partition table can be re-read. Otherwise, it wouldn't be possible to report the maximum possible expansion size. Finally, if the property autoexpand=on a vdev expansion will be attempted. After performing some sanity checks on the disk to verify that it is safe to expand, the primary partition (-part1) will be expanded and the partition table updated. The partition is then re-opened (again) to detect the updated size which allows the new capacity to be used. In order to make all of the above possible the following changes were required: * Updated the zpool_expand_001_pos and zpool_expand_003_pos tests. These tests now create a pool which is layered on a loopback, scsi_debug, and file vdev. This allows for testing of non- partitioned block device (loopback), a partition block device (scsi_debug), and a file which does not receive udev change events. This provided for better test coverage, and by removing the layering on ZFS volumes there issues surrounding layering one pool on another are avoided. * zpool_find_vdev_by_physpath() updated to accept a vdev guid. This allows for matching by guid rather than path which is a more reliable way for the ZED to reference a vdev. * Fixed zfs_zevent_wait() signal handling which could result in the ZED spinning when a signal was not handled. * Removed vdev_disk_rrpart() functionality which can be abandoned in favor of kernel provided blkdev_reread_part() function. * Added a rwlock which is held as a writer while a disk is being reopened. This is important to prevent errors from occurring for any configuration related IOs which bypass the SCL_ZIO lock. The zpool_reopen_007_pos.ksh test case was added to verify IO error are never observed when reopening. This is not expected to impact IO performance. Additional fixes which aren't critical but were discovered and resolved in the course of developing this functionality. * Added PHYS_PATH="/dev/zvol/dataset" to the vdev configuration for ZFS volumes. This is as good as a unique physical path, while the volumes are not used in the test cases anymore for other reasons this improvement was included. Reviewed by: Richard Elling <Richard.Elling@RichardElling.com> Signed-off-by: Sara Hartse <sara.hartse@delphix.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #120 Closes #2437 Closes #5771 Closes #7366 Closes #7582 Closes #7629	2018-07-23 15:40:15 -07:00
Matthew Ahrens	2e5dc449c1	OpenZFS 9337 - zfs get all is slow due to uncached metadata This project's goal is to make read-heavy channel programs and zfs(1m) administrative commands faster by caching all the metadata that they will need in the dbuf layer. This will prevent the data from being evicted, so that any future call to i.e. zfs get all won't have to go to disk (very much). There are two parts: The dbuf_metadata_cache. We identify what to put into the cache based on the object type of each dbuf. Caching objset properties os {version,normalization,utf8only,casesensitivity} in the objset_t. The reason these needed to be cached is that although they are queried frequently, they aren't stored in a dbuf type which we can easily recognize and cache in the dbuf layer; instead, we have to explicitly store them. There's already existing infrastructure for maintaining cached properties in the objset setup code, so I simply used that. Performance Testing: - Disabled kmem_flags - Tuned dbuf_cache_max_bytes very low (128K) - Tuned zfs_arc_max very low (64M) Created test pool with 400 filesystems, and 100 snapshots per filesystem. Later on in testing, added 600 more filesystems (with no snapshots) to make sure scaling didn't look different between snapshots and filesystems. Results: \| Test \| Time (trunk / diff) \| I/Os (trunk / diff) \| +------------------------+---------------------+---------------------+ \| zpool import \| 0:05 / 0:06 \| 12.9k / 12.9k \| \| zfs get all (uncached) \| 1:36 / 0:53 \| 16.7k / 5.7k \| \| zfs get all (cached) \| 1:36 / 0:51 \| 16.0k / 6.0k \| Authored by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Prakash Surya <prakash.surya@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Thomas Caputi <tcaputi@datto.com> Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov> Approved by: Richard Lowe <richlowe@richlowe.net> Ported-by: Alek Pinchuk <apinchuk@datto.com> Signed-off-by: Alek Pinchuk <apinchuk@datto.com> OpenZFS-issue: https://illumos.org/issues/9337 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/7dec52f Closes #7668	2018-07-12 10:49:27 -07:00
Serapheim Dimitropoulos	a7ed98d8b5	OpenZFS 9330 - stack overflow when creating a deeply nested dataset Datasets that are deeply nested (~100 levels) are impractical. We just put a limit of 50 levels to newly created datasets. Existing datasets should work without a problem. The problem can be seen by attempting to create a dataset using the -p option with many levels: panic[cpu0]/thread=ffffff01cd282c20: BAD TRAP: type=8 (#df Double fault) rp=ffffffff fffffffffbc3aa60 unix:die+100 () fffffffffbc3ab70 unix:trap+157d () ffffff00083d7020 unix:_patch_xrstorq_rbx+196 () ffffff00083d7050 zfs:dbuf_rele+2e () ... ffffff00083d7080 zfs:dsl_dir_close+32 () ffffff00083d70b0 zfs:dsl_dir_evict+30 () ffffff00083d70d0 zfs:dbuf_evict_user+4a () ffffff00083d7100 zfs:dbuf_rele_and_unlock+87 () ffffff00083d7130 zfs:dbuf_rele+2e () ... The block above repeats once per directory in the ... ... create -p command, working towards the root ... ffffff00083db9f0 zfs:dsl_dataset_drop_ref+19 () ffffff00083dba20 zfs:dsl_dataset_rele+42 () ffffff00083dba70 zfs:dmu_objset_prefetch+e4 () ffffff00083dbaa0 zfs:findfunc+23 () ffffff00083dbb80 zfs:dmu_objset_find_spa+38c () ffffff00083dbbc0 zfs:dmu_objset_find+40 () ffffff00083dbc20 zfs:zfs_ioc_snapshot_list_next+4b () ffffff00083dbcc0 zfs:zfsdev_ioctl+347 () ffffff00083dbd00 genunix:cdev_ioctl+45 () ffffff00083dbd40 specfs:spec_ioctl+5a () ffffff00083dbdc0 genunix:fop_ioctl+7b () ffffff00083dbec0 genunix:ioctl+18e () ffffff00083dbf10 unix:brand_sys_sysenter+1c9 () Porting notes: * Added zfs_max_dataset_nesting module option with documentation. * Updated zfs_rename_014_neg.ksh for Linux. * Increase the zfs.sh stack warning to 15K. Enough time has passed that 16K can be reasonably assumed to be the default value. It was increased in the 3.15 kernel released in June of 2014. Authored by: Serapheim Dimitropoulos <serapheim.dimitro@delphix.com> Reviewed by: John Kennedy <john.kennedy@delphix.com> Reviewed by: Matt Ahrens <matt@delphix.com> Ported-by: Brian Behlendorf <behlendorf1@llnl.gov> Approved by: Garrett D'Amore <garrett@damore.org> OpenZFS-issue: https://www.illumos.org/issues/9330 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/757a75a Closes #7681	2018-07-09 13:02:50 -07:00
Brian Behlendorf	66df02497c	ZTS: clean_mirror and scrub_mirror cleanup Remove the dependency on partitionable devices for the clean_mirror and scrub_mirror test cases. This allows for the setup and cleanup of the test cases to be simplified by removing the need for complex partitioning. This change also resolves a issue where the clean_mirror devices were not being properly damaged since the device name was not a full path. The result being loopX files were being left in the top level test_results directory. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #7434 Closes #7690	2018-07-09 12:46:14 -07:00
Serapheim Dimitropoulos	4d044c4c1d	OpenZFS 9238 - ZFS Spacemap Encoding V2 Motivation ========== The current space map encoding has the following disadvantages: [1] Assuming 512 sector size each entry can represent at most 16MB for a segment. This makes the encoding very inefficient for large regions of space. [2] As vdev-wide space maps have started to be used by new features (i.e. device removal, zpool checkpoint) we've started imposing limits in the vdevs that can be used with them based on the maximum addressable offset (currently 64PB for a top-level vdev). New encoding ============ The layout can be found at space_map.h and it remains backwards compatible with the old one. The introduced two-word entry format, besides extending the limits imposed by the single-entry layout, also includes a vdev field and some extra padding after its prefix. The extra padding after the prefix should is reserved for future usage (e.g. new prefixes for future encodings or new fields for flags). The new vdev field not only makes the space maps more self-descriptive, but also opens the doors for pool-wide space maps (expected to be used in the log spacemap project). One final important note is that the number of bits used for vdevs is reduced to 24 bits for blkptrs. That was decided as we don't know of any setups that use more than 16M vdevs for the time being and we wanted to fit the vdev field in the space map. In addition that gives us some extra bits in dva_t. Other references: ================= The new encoding is also discussed towards the end of the Log Space Map presentation from 2017's OpenZFS summit. Link: https://www.youtube.com/watch?v=jj2IxRkl5bQ Authored by: Serapheim Dimitropoulos <serapheim@delphix.com> Reviewed by: Matt Ahrens <mahrens@delphix.com> Reviewed by: George Wilson <gwilson@zfsmail.com> Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov> Approved by: Gordon Ross <gwr@nexenta.com> Ported-by: Tim Chase <tim@chase2k.com> Signed-off-by: Tim Chase <tim@chase2k.com> OpenZFS-commit: https://github.com/openzfs/openzfs/commit/90a56e6d OpenZFS-issue: https://www.illumos.org/issues/9238 Closes #7665	2018-07-05 12:02:34 -07:00
Ahmed Gahnem	4e82b4be78	OpenZFS 9184 - Add ZFS performance test for fixed blocksize random read/write IO This change introduces a new performance test which does random reads and writes, but instead of using `bssplit` to determine the block size, it uses a fixed blocksize. Additionally, some new IO sizes are added to other tests and timestamp data is recorded with the performance data. Authored by: Ahmed Gahnem <ahmedg@delphix.com> Reviewed by: Dan Kimmel <dan.kimmel@delphix.com> Reviewed by: John Kennedy <john.kennedy@delphix.com> Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com> Ported-by: John Kennedy <john.kennedy@delphix.com> Signed-off-by: John Wren Kennedy <john.kennedy@delphix.com> Requires-builders: perf OpenZFS-issue: https://www.illumos.org/issues/9184 OpenZFS-commit: https://github.com/openzfs/openzfs/pull/659 External-issue: DLPX-46724 Closes #7660	2018-07-02 13:46:06 -07:00
Brian Behlendorf	e03a41a604	ZTS: Improve enospc tests The enospc_002_pos test case would frequently fail due a command succeeding when it was expected to fail due to lack of space. In order to make this far less likely, files are created across multiple transaction groups in order to consume as many unused blocks as possible. The dependency that the tests run on a partitioned block device has been removed. It's simpler to use sparse files. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #7663	2018-06-29 09:40:32 -07:00
Serapheim Dimitropoulos	d2734cce68	OpenZFS 9166 - zfs storage pool checkpoint Details about the motivation of this feature and its usage can be found in this blogpost: https://sdimitro.github.io/post/zpool-checkpoint/ A lightning talk of this feature can be found here: https://www.youtube.com/watch?v=fPQA8K40jAM Implementation details can be found in big block comment of spa_checkpoint.c Side-changes that are relevant to this commit but not explained elsewhere: * renames members of "struct metaslab trees to be shorter without losing meaning * space_map_{alloc,truncate}() accept a block size as a parameter. The reason is that in the current state all space maps that we allocate through the DMU use a global tunable (space_map_blksz) which defauls to 4KB. This is ok for metaslab space maps in terms of bandwirdth since they are scattered all over the disk. But for other space maps this default is probably not what we want. Examples are device removal's vdev_obsolete_sm or vdev_chedkpoint_sm from this review. Both of these have a 1:1 relationship with each vdev and could benefit from a bigger block size. Porting notes: * The part of dsl_scan_sync() which handles async destroys has been moved into the new dsl_process_async_destroys() function. * Remove "VERIFY(!(flags & FWRITE))" in "kernel.c" so zhack can write to block device backed pools. * ZTS: * Fix get_txg() in zpool_sync_001_pos due to "checkpoint_txg". * Don't use large dd block sizes on /dev/urandom under Linux in checkpoint_capacity. * Adopt Delphix-OS's setting of 4 (spa_asize_inflation = SPA_DVAS_PER_BP + 1) for the checkpoint_capacity test to speed its attempts to fill the pool * Create the base and nested pools with sync=disabled to speed up the "setup" phase. * Clear labels in test pool between checkpoint tests to avoid duplicate pool issues. * The import_rewind_device_replaced test has been marked as "known to fail" for the reasons listed in its DISCLAIMER. * New module parameters: zfs_spa_discard_memory_limit, zfs_remove_max_bytes_pause (not documented - debugging only) vdev_max_ms_count (formerly metaslabs_per_vdev) vdev_min_ms_count Authored by: Serapheim Dimitropoulos <serapheim.dimitro@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: John Kennedy <john.kennedy@delphix.com> Reviewed by: Dan Kimmel <dan.kimmel@delphix.com> Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov> Approved by: Richard Lowe <richlowe@richlowe.net> Ported-by: Tim Chase <tim@chase2k.com> Signed-off-by: Tim Chase <tim@chase2k.com> OpenZFS-issue: https://illumos.org/issues/9166 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/7159fdb8 Closes #7570	2018-06-26 10:07:42 -07:00
Tim Chase	88eaf610d9	Use "eval" in history_002_pos for log_must Otherwise the output is consumed by the output redirection. Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tim Chase <tim@chase2k.com> Closes #7570	2018-06-26 09:58:05 -07:00
Serapheim Dimitropoulos	7637ef8d23	OpenZFS 9591 - ms_shift can be incorrectly changed ms_shift can be incorrectly changed changed in MOS config for indirect vdevs that have been historically expanded According to spa_config_update() we expect new vdevs to have vdev_ms_array equal to 0 and then we go ahead and set their metaslab size. The problem is that indirect vdevs also have vdev_ms_array == 0 because their metaslabs are destroyed once their removal is done. As a result, if a vdev was expanded and then removed may have its ms_shift changed if another vdev was added after its removal. Fortunately this behavior does not cause any type of crash or bad behavior in the kernel but it can confuse zdb and anyone doing any kind of analysis of the history of the pools. Authored by: Serapheim Dimitropoulos <serapheim@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: George Wilson <gwilson@zfsmail.com> Reviewed by: John Kennedy <john.kennedy@delphix.com> Reviewed by: Prashanth Sreenivasa <pks@delphix.com> Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tim Chase <tim@chase2k.com> Ported-by: Tim Chase <tim@chase2k.com> OpenZFS-commit: https://github.com/openzfs/openzfs/pull/651 OpenZFS-issue: https://illumos.org/issues/9591a External-issue: DLPX-58879 Closes #7644	2018-06-21 09:35:26 -07:00
Brian Behlendorf	e4a3297a04	ZTS: Adopt OpenZFS test analysis script Adopt and extend the OpenZFS ZTS results analysis script for use with ZFS on Linux. This allows for automatic analysis of tests which may be skipped for a variety or reasons or which are not entirely reliable. In addition to the list of 'known' failures, which have been updated for ZFS on Linux, there in a new 'maybe' section. This mapping include tests which might be correctly skipped depending on the test environment. This may be because of a missing dependency or lack of required kernel support. This list also includes tests which normally pass but might on occasion fail for a harmless reason. The script was also extended include a reason for why a given test might be skipped or may fail. The reason will be included after the test in the "results other than PASS that are expected" section. For failures it is preferable to set the reason to the GitHub issue number and for skipped tests several generic reasons are available. You may also specify a custom reason if needed. All tests were added back in to the linux.run file even if they are expected to failed. There is value in running tests which may not pass, the expected results for these tests has been encoded in the new analysis script. All tests which were disabled because they ran more slowly on a 32-bit system have been re-enabled. Developers working on 32-bit systems should assess what it reasonable for their environment. The unnecessary dependency on physical block devices was removed for the checksum, grow_pool, and grow_replicas test groups so they are no longer skipped. Updated the filetest_001_pos test case to run properly now that it is enabled and moved the grow tests in to a single directory. Reviewed-by: Prakash Surya <prakash.surya@delphix.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #7638	2018-06-20 14:03:13 -07:00
John Gallagher	917f475fba	Add tunables for channel programs This patch adds tunables for modifying the maximum memory limit and maximum instruction limit that can be specified when running a channel program. Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov Reviewed-by: Sara Hartse <sara.hartse@delphix.com> Signed-off-by: John Gallagher <john.gallagher@delphix.com> External-issue: LX-1085 Closes #7618	2018-06-15 15:10:42 -07:00
John Gallagher	ab24877bd3	ZTS: deletes home directories in /export/home In the cleanup for the privilege tests, an empty variable, empty because the corresponding setup is skipped on Linux, results in /export/home being deleted. This patch adds an assertion that the variable is not empty, and causes the cleanup to be skipped on Linux as well. Reviewed by: John Wren Kennedy <jwk404@gmail.com> Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Paul Dagnelie <pcd@delphix.com> Signed-off-by: John Gallagher <john.gallagher@delphix.com> External-issue: LX-1099 Closes #7615	2018-06-12 10:42:26 -07:00
bunder2015	5277571260	ZTS: cleanup user_run user_run leaves two files in /tmp, moving them to $TEST_BASE_DIR and adding them to the default cleanup routine. Reviewed by: John Wren Kennedy <jwk404@gmail.com> Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Giuseppe Di Natale <guss80@gmail.com> Signed-off-by: bunder2015 <omfgbunder@gmail.com> Closes #7614	2018-06-12 10:37:12 -07:00
Antonio Russo	39042f9736	Tunable directory for zfs runtime scripts zpool and zed place scripts in subdirectories of libexecdir. Some distributions locate architecture independent scripts in other locations (e.g. Debian). To avoid these paths getting out of sync, centralize the definitions. Build zfs-test's default.cfg by Makefile. Use the new directory logic building tests/zfs-tests/include/default.cfg.in. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Antonio Russo <antonio.e.russo@gmail.com> Closes #7597	2018-06-07 09:59:59 -07:00
Antonio Russo	7106b23640	Minor documentation, logging, and testing typos This patch collects some minor inconsistencies and typos in the documentation, logging and testing infrastructure. Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Antonio Russo <antonio.e.russo@gmail.com> Closes #7608	2018-06-07 09:38:39 -07:00
Alek P	6969afcefd	Always continue recursive destroy after error Currently, during a recursive zfs destroy the first error that is encountered will stop the destruction of the datasets. Errors may happen for a variety of reasons including competing deletions and busy datasets. This patch switches recursive destroy to always do a best-effort recursive dataset destroy. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Signed-off-by: Alek Pinchuk <apinchuk@datto.com> Closes #7574	2018-06-06 10:14:52 -07:00
bunder2015	62841115bc	ZTS: history path cleanup History tests were hard coded to use /tmp and didn't clean up properly after testing. Reviewed by: John Wren Kennedy <jwk404@gmail.com> Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: bunder2015 <omfgbunder@gmail.com> Issue #7507 Closes #7600	2018-06-06 10:00:22 -07:00
Tony Hutter	f0ed6c7448	Add pool state /proc entry, "SUSPENDED" pools 1. Add a proc entry to display the pool's state: $ cat /proc/spl/kstat/zfs/tank/state ONLINE This is done without using the spa config locks, so it will never hang. 2. Fix 'zpool status' and 'zpool list -o health' output to print "SUSPENDED" instead of "ONLINE" for suspended pools. Reviewed-by: Olaf Faaland <faaland1@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed by: Richard Elling <Richard.Elling@RichardElling.com> Signed-off-by: Tony Hutter <hutter2@llnl.gov> Closes #7331 Closes #7563	2018-06-06 09:33:54 -07:00
John Wren Kennedy	ab44e511e2	OpenZFS 9245 - zfs-test failures: slog_013_pos and slog_014_pos Test 13 would fail because of attempts to zpool destroy -f a pool that was still busy. Changed those calls to destroy_pool which does a retry loop, and the problem is no longer reproducible. Also removed some non functional code in the test which is why it was originally commented out by placing it after the call to log_pass. Test 14 would fail because sometimes the check for a degraded pool would complete before the pool had changed state. Changed the logic to check in a loop with a timeout and the problem is no longer reproducible. Authored by: John Wren Kennedy <john.kennedy@delphix.com> Reviewed by: Matt Ahrens <matt@delphix.com> Reviewed by: Chris Williamson <chris.williamson@delphix.com> Reviewed by: Yuri Pankov <yuripv@yuripv.net> Reviewed-by: George Melikov <mail@gmelikov.ru> Approved by: Dan McDonald <danmcd@joyent.com> Ported-by: Brian Behlendorf <behlendorf1@llnl.gov> Porting Notes: * Re-enabled slog_013_pos.ksh OpenZFS-issue: https://illumos.org/issues/9245 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/8f323b5 Closes #7585	2018-06-04 14:55:02 -07:00
Sara Hartse	74d42600d8	zpool reopen should detect expanded devices Update bdev_capacity to have wholedisk vdevs query the size of the underlying block device (correcting for the size of the efi parition and partition alignment) and therefore detect expanded space. Correct vdev_get_stats_ex so that the expandsize is aligned to metaslab size and new space is only reported if it is large enough for a new metaslab. Reviewed by: Don Brady <don.brady@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: John Wren Kennedy <jwk404@gmail.com> Signed-off-by: sara hartse <sara.hartse@delphix.com> External-issue: LX-165 Closes #7546 Issue #7582	2018-05-31 10:36:37 -07:00
John Wren Kennedy	93491c4bb9	OpenZFS 9082 - Add ZFS performance test targeting ZIL latency This adds a new test to measure ZIL performance. - Adds the ability to induce IO delays with zinject - Adds a new variable (PERF_NTHREADS_PER_FS) to allow fio threads to be distributed to individual file systems as opposed to all IO going to one, as happens elsewhere. - Refactoring of do_fio_run Authored by: Prakash Surya <prakash.surya@delphix.com> Reviewed by: Dan Kimmel <dan.kimmel@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Ported-by: John Wren Kennedy <jwk404@gmail.com> OpenZFS-issue: https://www.illumos.org/issues/9082 OpenZFS-commit: https://github.com/openzfs/openzfs/pull/634 External-issue: DLPX-48625 Closes #7491	2018-05-30 11:59:04 -07:00
Brian Behlendorf	93ce2b4ca5	Update build system and packaging Minimal changes required to integrate the SPL sources in to the ZFS repository build infrastructure and packaging. Build system and packaging: * Renamed SPL_* autoconf m4 macros to ZFS_. Removed redundant SPL_* autoconf m4 macros. * Updated the RPM spec files to remove SPL package dependency. * The zfs package obsoletes the spl package, and the zfs-kmod package obsoletes the spl-kmod package. * The zfs-kmod-devel* packages were updated to add compatibility symlinks under /usr/src/spl-x.y.z until all dependent packages can be updated. They will be removed in a future release. * Updated copy-builtin script for in-kernel builds. * Updated DKMS package to include the spl.ko. * Updated stale AUTHORS file to include all contributors. * Updated stale COPYRIGHT and included the SPL as an exception. * Renamed README.markdown to README.md * Renamed OPENSOLARIS.LICENSE to LICENSE. * Renamed DISCLAIMER to NOTICE. Required code changes: * Removed redundant HAVE_SPL macro. * Removed _BOOT from nvpairs since it doesn't apply for Linux. * Initial header cleanup (removal of empty headers, refactoring). * Remove SPL repository clone/build from zimport.sh. * Use of DEFINE_RATELIMIT_STATE and DEFINE_SPINLOCK removed due to build issues when forcing C99 compilation. * Replaced legacy ACCESS_ONCE with READ_ONCE. * Include needed headers for `current` and `EXPORT_SYMBOL`. Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Olaf Faaland <faaland1@llnl.gov> Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> TEST_ZIMPORT_SKIP="yes" Closes #7556	2018-05-29 16:00:33 -07:00
Tony Nguyen	ba863d0be4	Profiling for perf tests Stack profiling is quite useful and Linux ZFS test suite does not current collect that data. Linux perf is a common tool for this purpose though the perf record data file can be quite large. With this change, Linux ZFS perf tests capture perf record data if perf is installed on the system and PERF_DO_PROFILING environment variable is set. Reviewed by: John Wren Kennedy <jwk404@gmail.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Signed-off-by: Tony Nguyen <tony.nguyen@delphix.com> External-issue: LX-971 Closes #7549	2018-05-22 10:51:46 -07:00
George Melikov	f46209dd6b	Update `tests/README.md` and fix markdown - there are more options now - command examples are more readable in code style Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Giuseppe Di Natale <guss80@gmail.com> Signed-off-by: George Melikov <mail@gmelikov.ru> Closes #7538	2018-05-15 09:01:28 -07:00
Brian Behlendorf	ab6a2b5cd7	ZTS: Improve zpool_scrub_004_pos reliability It's possible for the `zpool attach` portion of this test case to complete before the `zpool scrub` can be issued. Update the test case to force the resilvering phase to take longer. Reviewed-by: Tim Chase <tim@chase2k.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #5444 Closes #7541	2018-05-15 08:58:46 -07:00
Brian Behlendorf	8c64fe0442	ZTS: Update O_TMPFILE support check In CentOS 7.5 the kernel provided a compatibility wrapper to support O_TMPFILE. This results in the test setup script correctly detecting kernel support. But the ZFS module was built without O_TMPFILE support due to the non-standard CentOS kernel interface. Handle this case by updating the setup check to fail either when the kernel or the ZFS module fail to provide support. The reason will be clearly logged in the test results. Reviewed-by: Chunwei Chen <tuxoko@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #7528	2018-05-14 20:36:30 -07:00
Pavel Zakharov	189bd0b670	OpenZFS 9190 - Fix cleanup routine in import_cachefile_device_replaced.ksh Must clear slow-disk zinject injections in test cleanup routine. Otherwise, when this test fails, it causes most subsequent tests to fail. Authored by: Pavel Zakharov <pavel.zakharov@delphix.com> Reviewed by: Dan Kimmel <dan.kimmel@delphix.com> Reviewed by: John Kennedy <john.kennedy@delphix.com> Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Reviewed-by: George Melikov <mail@gmelikov.ru> Approved by: Robert Mustacchi <rm@joyent.com> Ported-by: Brian Behlendorf <behlendorf1@llnl.gov> OpenZFS-issue: https://illumos.org/issues/9190 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/762c6b4 Closes #7530	2018-05-14 14:30:28 -04:00
bunder2015	29badadd4e	Fix shebangs on import tests Incorrect shebangs were used when porting. Reviewed-by: John Kennedy <john.kennedy@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Signed-off-by: bunder2015 <omfgbunder@gmail.com> Closes #7523 Closes #7524	2018-05-11 12:37:44 -07:00
bunder2015	670d74b9ce	ZTS: enospc_002 path cleanup Removing hard-coded path used in enospc_002 Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: bunder2015 <omfgbunder@gmail.com> Closes #7515	2018-05-08 21:42:58 -07:00
Tim Chase	f3d28f0a59	Streamline the zpool_import tests Don't create an ext4 file system atop $DEV_DISKDIR/$DISK2. There's likely to not be sufficient space for it to succeed. Instead, simply create the vdev files in the directory where it would have been mounted. Signed-off-by: Tim Chase <tim@chase2k.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #7459	2018-05-08 21:40:13 -07:00
Pavel Zakharov	6cb8e5306d	OpenZFS 9075 - Improve ZFS pool import/load process and corrupted pool recovery Some work has been done lately to improve the debugability of the ZFS pool load (and import) process. This includes: 7638 Refactor spa_load_impl into several functions 8961 SPA load/import should tell us why it failed 7277 zdb should be able to print zfs_dbgmsg's To iterate on top of that, there's a few changes that were made to make the import process more resilient and crash free. One of the first tasks during the pool load process is to parse a config provided from userland that describes what devices the pool is composed of. A vdev tree is generated from that config, and then all the vdevs are opened. The Meta Object Set (MOS) of the pool is accessed, and several metadata objects that are necessary to load the pool are read. The exact configuration of the pool is also stored inside the MOS. Since the configuration provided from userland is external and might not accurately describe the vdev tree of the pool at the txg that is being loaded, it cannot be relied upon to safely operate the pool. For that reason, the configuration in the MOS is read early on. In the past, the two configurations were compared together and if there was a mismatch then the load process was aborted and an error was returned. The latter was a good way to ensure a pool does not get corrupted, however it made the pool load process needlessly fragile in cases where the vdev configuration changed or the userland configuration was outdated. Since the MOS is stored in 3 copies, the configuration provided by userland doesn't have to be perfect in order to read its contents. Hence, a new approach has been adopted: The pool is first opened with the untrusted userland configuration just so that the real configuration can be read from the MOS. The trusted MOS configuration is then used to generate a new vdev tree and the pool is re-opened. When the pool is opened with an untrusted configuration, writes are disabled to avoid accidentally damaging it. During reads, some sanity checks are performed on block pointers to see if each DVA points to a known vdev; when the configuration is untrusted, instead of panicking the system if those checks fail we simply avoid issuing reads to the invalid DVAs. This new two-step pool load process now allows rewinding pools accross vdev tree changes such as device replacement, addition, etc. Loading a pool from an external config file in a clustering environment also becomes much safer now since the pool will import even if the config is outdated and didn't, for instance, register a recent device addition. With this code in place, it became relatively easy to implement a long-sought-after feature: the ability to import a pool with missing top level (i.e. non-redundant) devices. Note that since this almost guarantees some loss of data, this feature is for now restricted to a read-only import. Porting notes (ZTS): * Fix 'make dist' target in zpool_import * The maximum path length allowed by tar is 99 characters. Several of the new test cases exceeded this limit resulting in them not being included in the tarball. Shorten the names slightly. * Set/get tunables using accessor functions. * Get last synced txg via the "zfs_txg_history" mechanism. * Clear zinject handlers in cleanup for import_cache_device_replaced and import_rewind_device_replaced in order that the zpool can be exported if there is an error. * Increase FILESIZE to 8G in zfs-test.sh to allow for a larger ext4 file system to be created on ZFS_DISK2. Also, there's no need to partition ZFS_DISK2 at all. The partitioning had already been disabled for multipath devices. Among other things, the partitioning steals some space from the ext4 file system, makes it difficult to accurately calculate the paramters to parted and can make some of the tests fail. * Increase FS_SIZE and FILE_SIZE in the zpool_import test configuration now that FILESIZE is larger. * Write more data in order that device evacuation take lonnger in a couple tests. * Use mkdir -p to avoid errors when the directory already exists. * Remove use of sudo in import_rewind_config_changed. Authored by: Pavel Zakharov <pavel.zakharov@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Andrew Stormont <andyjstormont@gmail.com> Approved by: Hans Rosenfeld <rosenfeld@grumpf.hope-2000.org> Ported-by: Tim Chase <tim@chase2k.com> Signed-off-by: Tim Chase <tim@chase2k.com> OpenZFS-issue: https://illumos.org/issues/9075 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/619c0123 Closes #7459	2018-05-08 21:35:27 -07:00
Paul Dagnelie	ca0845d59e	OpenZFS 9256 - zfs send space estimation off by > 10% on some datasets Authored by: Paul Dagnelie <pcd@delphix.com> Reviewed by: Matt Ahrens <matt@delphix.com> Reviewed by: John Kennedy <john.kennedy@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Approved by: Richard Lowe <richlowe@richlowe.net> Ported-by: Giuseppe Di Natale <dinatale2@llnl.gov> Porting Notes: * Added tuning to man page. * Test case changes dropped, default behavior unchanged. OpenZFS-issue: https://www.illumos.org/issues/9256 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/32356b3c56 Closes #7470	2018-05-08 08:59:24 -07:00
LOLi	4ceb8dd6fd	Fix 'zpool create -t <tempname>' Creating a pool with a temporary name fails when we also specify custom dataset properties: this is because we mistakenly call zfs_set_prop_nvlist() on the "real" pool name which, as expected, cannot be found because the SPA is present in the namespace with the temporary name. Fix this by specifying the correct pool name when setting the dataset properties. Reviewed-by: Prakash Surya <prakash.surya@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: loli10K <ezomori.nozomu@gmail.com> Closes #7502 Closes #7509	2018-05-07 21:11:58 -07:00
Brian Behlendorf	c02c1becce	ZTS: Re-enable MMP tests Commit `7fab6361` inadvertently disabled the MMP test cases by creating and not removing an /etc/hostid file in the new zpool_split_props test case. When the file exists the ZTS skips the entire MMP test group rather than modify what may be a system which is already configured. Update the test case to remove the file. Additionally, because the MMP tests were disabled a regression slipped in as part of commit `9eb7b46ed0`. Fix it. Reviewed-by: Tim Chase <tim@chase2k.com> Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Reviewed-by: loli10K <ezomori.nozomu@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #7514	2018-05-07 21:08:33 -07:00
bunder2015	a82a4a15be	ZTS: remove dead cleanup code from snapshot tests Caught during path cleanups, the files referenced do not appear to be created or used anywhere. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: bunder2015 <omfgbunder@gmail.com> Closes #7508	2018-05-06 21:02:10 -07:00
LOLi	fb0da71fd9	ZTS: Fix zfs_diff_timestamp When using mawk instead of gawk zfs_diff_timestamp fails consistently: this is due to a subtle difference in how mawk handles substr(). From awk(1): --- Finally, here is how mawk handles exceptional cases not discussed in the AWK book or the Posix draft. It is unsafe to assume consistency across awks and safe to skip to the next section. substr(s, i, n) returns the characters of s in the intersection of the closed interval [1, length(s)] and the half-open interval [i, i+n). When this intersection is empty, the empty string is returned; so substr("ABC", 1, 0) = "" and substr("ABC", -4, 6) = "A". --- To support running zfs_diff_timestamp with both gawk and mawk change the second parameter passed to substr(). Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: loli10K <ezomori.nozomu@gmail.com> Closes #7503 Closes #7510	2018-05-06 20:42:29 -07:00
Tom Caputi	be9a5c355c	Add support for decryption faults in zinject This patch adds the ability for zinject to trigger decryption and authentication faults in the ZIO and ARC layers. This functionality is exposed via the new "decrypt" error type, which may be provided for "data" object types. This patch also refactors some of the core encryption / decryption functions so that they have consistent prototypes, handle errors consistently, and do not have unused arguments. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tom Caputi <tcaputi@datto.com> Closes #7474	2018-05-02 15:36:20 -07:00
Tom Caputi	2c24b5b148	Fix issues found with zfs diff Two deadlocks / ASSERT failures were introduced in `a2c2ed1b` which would occur whenever arc_buf_fill() failed to decrypt a block of data. This occurred because the call to arc_buf_destroy() which was responsible for cleaning up the newly created buffer would attempt to take out the hdr lock that it was already holding. This was resolved by calling the underlying functions directly without retaking the lock. In addition, the dmu_diff() code did not properly ensure that keys were loaded and mapped before begining dataset traversal. It turns out that this code does not need to look at any encrypted values, so the code was altered to perform raw IO only. Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tom Caputi <tcaputi@datto.com> Closes #7354 Closes #7456	2018-05-01 11:24:20 -07:00
loli10K	85ce3f4fd1	Adopt pyzfs from ClusterHQ This commit introduces several changes: * Update LICENSE and project information * Give a good PEP8 talk to existing Python source code * Add RPM/DEB packaging for pyzfs * Fix some outstanding issues with the existing pyzfs code caused by changes in the ABI since the last time the code was updated * Integrate pyzfs Python unittest with the ZFS Test Suite * Add missing libzfs_core functions: lzc_change_key, lzc_channel_program, lzc_channel_program_nosync, lzc_load_key, lzc_receive_one, lzc_receive_resumable, lzc_receive_with_cmdprops, lzc_receive_with_header, lzc_reopen, lzc_send_resume, lzc_sync, lzc_unload_key, lzc_remap Note: this commit slightly changes zfs_ioc_unload_key() ABI. This allow to differentiate the case where we tried to unload a key on a non-existing dataset (ENOENT) from the situation where a dataset has no key loaded: this is consistent with the "change" case where trying to zfs_ioc_change_key() from a dataset with no key results in EACCES. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: loli10K <ezomori.nozomu@gmail.com> Closes #7230	2018-05-01 10:33:35 -07:00
LOLi	3cbe89b12a	Fix zfs incremental send remove '-o' properties When receiving an incremental send stream with intermediary snapshots zfs_receive_one() does not correctly identify the top-level dataset: consequently we restore said snapshots as if they were children datasets in the hierarchy, forcing inheritance of any property received with 'zfs send -o' and effectively removing any locally set value. The test case did not correctly verify this situation because it uses adjacent snapshots, basically testing 'zfs send -i' instead of 'zfs send -I': this commit adds an additional intermediary snapshot to the test script. Reviewed-by: Paul Dagnelie <pcd@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: loli10K <ezomori.nozomu@gmail.com> Closes #7478	2018-04-30 20:58:29 -07:00
Antonio Russo	c83ccb3e72	Add test with two kinds of file creation orders Data loss was identified in #7401 when many small files were copied. This adds a reproducer for this bug and other similar ones: randomly generate N files. Then, listing M of them by `ls -U` order, produce those same files in a directory of the same name. This triggers the bug consistently, provided N and M are large enough. Here, N=2^16 and M=2^13. Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Antonio Russo <antonio.e.russo@gmail.com> Closes #7411	2018-04-30 12:45:47 -05:00
LOLi	b4555c777a	Fix 'zfs remap <poolname@snapname>' Only filesystems and volumes are valid 'zfs remap' parameters: when passed a snapshot name zfs_remap_indirects() does not handle the EINVAL returned from libzfs_core, which results in failing an assertion and consequently crashing. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Signed-off-by: loli10K <ezomori.nozomu@gmail.com> Closes #7454	2018-04-19 09:45:17 -07:00
Chunwei Chen	599b864813	Fix ENOSPC in "Handle zap_add() failures in ..." Commit `cc63068` caused ENOSPC error when copy a large amount of files between two directories. The reason is that the patch limits zap leaf expansion to 2 retries, and return ENOSPC when failed. The intent for limiting retries is to prevent pointlessly growing table to max size when adding a block full of entries with same name in different case in mixed mode. However, it turns out we cannot use any limit on the retry. When we copy files from one directory in readdir order, we are copying in hash order, one leaf block at a time. Which means that if the leaf block in source directory has expanded 6 times, and you copy those entries in that block, by the time you need to expand the leaf in destination directory, you need to expand it 6 times in one go. So any limit on the retry will result in error where it shouldn't. Note that while we do use different salt for different directories, it seems that the salt/hash function doesn't provide enough randomization to the hash distance to prevent this from happening. Since `cc63068` has already been reverted. This patch adds it back and removes the retry limit. Also, as it turn out, failing on zap_add() has a serious side effect for mzap_upgrade(). When upgrading from micro zap to fat zap, it will call zap_add() to transfer entries one at a time. If it hit any error halfway through, the remaining entries will be lost, causing those files to become orphan. This patch add a VERIFY to catch it. Reviewed-by: Sanjeev Bagewadi <sanjeev.bagewadi@gmail.com> Reviewed-by: Richard Yao <ryao@gentoo.org> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Albert Lee <trisk@forkgnu.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Signed-off-by: Chunwei Chen <david.chen@nutanix.com> Closes #7401 Closes #7421	2018-04-18 14:19:50 -07:00
Tom Caputi	b0ee5946aa	Fix issues with raw sends of spill blocks This patch fixes 2 issues in how spill blocks are processed during raw sends. The first problem is that compressed spill blocks were using the logical length rather than the physical length to determine how much data to dump into the send stream. The second issue is a typo that caused the spill record's object number to be used where the objset's ID number was required. Both issues have been corrected, and the payload_size is now printed in zstreamdump for future debugging. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tom Caputi <tcaputi@datto.com> Closes #7378 Closes #7432	2018-04-17 11:19:03 -07:00
Tom Caputi	e14a32b1c8	Fix object reclaim when using large dnodes Currently, when the receive_object() code wants to reclaim an object, it always assumes that the dnode is the legacy 512 bytes, even when the incoming bonus buffer exceeds this length. This causes a buffer overflow if --enable-debug is not provided and triggers an ASSERT if it is. This patch resolves this issue and adds an ASSERT to ensure this can't happen again. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tom Caputi <tcaputi@datto.com> Closes #7097 Closes #7433	2018-04-17 11:13:57 -07:00
bunder2015	b40d45bc6c	ZTS: fix reservation_013_pos integer overflow When using large disks the integers for calculating sizes can overflow past 2**31. Changing to long integers with typeset should correct this. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: bunder2015 <omfgbunder@gmail.com> Closes #4444 Closes #7451	2018-04-17 10:52:53 -07:00
Matt Ahrens	d830d4795a	OpenZFS 9280 - Assertion failure while running removal_with_ganging test with 4K devices Authored by: Matt Ahrens <Matt.Ahrens@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: John Kennedy <john.kennedy@delphix.com> Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Approved by: Garrett D'Amore <garrett@damore.org> Ported-by: Brian Behlendorf <behlendorf1@llnl.gov> OpenZFS-issue: https://www.illumos.org/issues/9280 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/243952c Closes #7445	2018-04-17 10:44:50 -07:00
bunder2015	3eba666332	ZTS: zpool_create_002 clean up leftover filedisk zpool_create_002_pos did not clean up filedisk files left over from running the test. Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: bunder2015 <omfgbunder@gmail.com> Closes #7435 Closes #7439	2018-04-15 15:17:44 -07:00
Toomas Soome	5e567da987	OpenZFS 9213 - zfs: sytem typo Authored by: Toomas Soome <tsoome@me.com> Reviewed by: C Fraire <cfraire@me.com> Reviewed by: Andy Fiddaman <omnios@citrus-it.co.uk> Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Reviewed-by: George Melikov <mail@gmelikov.ru> Approved by: Joshua M. Clulow <josh@sysmgr.org> Ported-by: Brian Behlendorf <behlendorf1@llnl.gov> Porting Notes: * The additional instances of this typo addressed in the OpenZFS patch were already resolved. OpenZFS-issue: https://illumos.org/issues/9213 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/edc8ef7d92 Closes #7436	2018-04-15 10:59:13 -07:00
Matthew Ahrens	a1d477c24c	OpenZFS 7614, 9064 - zfs device evacuation/removal OpenZFS 7614 - zfs device evacuation/removal OpenZFS 9064 - remove_mirror should wait for device removal to complete This project allows top-level vdevs to be removed from the storage pool with "zpool remove", reducing the total amount of storage in the pool. This operation copies all allocated regions of the device to be removed onto other devices, recording the mapping from old to new location. After the removal is complete, read and free operations to the removed (now "indirect") vdev must be remapped and performed at the new location on disk. The indirect mapping table is kept in memory whenever the pool is loaded, so there is minimal performance overhead when doing operations on the indirect vdev. The size of the in-memory mapping table will be reduced when its entries become "obsolete" because they are no longer used by any block pointers in the pool. An entry becomes obsolete when all the blocks that use it are freed. An entry can also become obsolete when all the snapshots that reference it are deleted, and the block pointers that reference it have been "remapped" in all filesystems/zvols (and clones). Whenever an indirect block is written, all the block pointers in it will be "remapped" to their new (concrete) locations if possible. This process can be accelerated by using the "zfs remap" command to proactively rewrite all indirect blocks that reference indirect (removed) vdevs. Note that when a device is removed, we do not verify the checksum of the data that is copied. This makes the process much faster, but if it were used on redundant vdevs (i.e. mirror or raidz vdevs), it would be possible to copy the wrong data, when we have the correct data on e.g. the other side of the mirror. At the moment, only mirrors and simple top-level vdevs can be removed and no removal is allowed if any of the top-level vdevs are raidz. Porting Notes: * Avoid zero-sized kmem_alloc() in vdev_compact_children(). The device evacuation code adds a dependency that vdev_compact_children() be able to properly empty the vdev_child array by setting it to NULL and zeroing vdev_children. Under Linux, kmem_alloc() and related functions return a sentinel pointer rather than NULL for zero-sized allocations. * Remove comment regarding "mpt" driver where zfs_remove_max_segment is initialized to SPA_MAXBLOCKSIZE. Change zfs_condense_indirect_commit_entry_delay_ticks to zfs_condense_indirect_commit_entry_delay_ms for consistency with most other tunables in which delays are specified in ms. * ZTS changes: Use set_tunable rather than mdb Use zpool sync as appropriate Use sync_pool instead of sync Kill jobs during test_removal_with_operation to allow unmount/export Don't add non-disk names such as "mirror" or "raidz" to $DISKS Use $TEST_BASE_DIR instead of /tmp Increase HZ from 100 to 1000 which is more common on Linux removal_multiple_indirection.ksh Reduce iterations in order to not time out on the code coverage builders. removal_resume_export: Functionally, the test case is correct but there exists a race where the kernel thread hasn't been fully started yet and is not visible. Wait for up to 1 second for the removal thread to be started before giving up on it. Also, increase the amount of data copied in order that the removal not finish before the export has a chance to fail. * MMP compatibility, the concept of concrete versus non-concrete devices has slightly changed the semantics of vdev_writeable(). Update mmp_random_leaf_impl() accordingly. * Updated dbuf_remap() to handle the org.zfsonlinux:large_dnode pool feature which is not supported by OpenZFS. * Added support for new vdev removal tracepoints. * Test cases removal_with_zdb and removal_condense_export have been intentionally disabled. When run manually they pass as intended, but when running in the automated test environment they produce unreliable results on the latest Fedora release. They may work better once the upstream pool import refectoring is merged into ZoL at which point they will be re-enabled. Authored by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Alex Reece <alex@delphix.com> Reviewed-by: George Wilson <george.wilson@delphix.com> Reviewed-by: John Kennedy <john.kennedy@delphix.com> Reviewed-by: Prakash Surya <prakash.surya@delphix.com> Reviewed by: Richard Laager <rlaager@wiktel.com> Reviewed by: Tim Chase <tim@chase2k.com> Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov> Approved by: Garrett D'Amore <garrett@damore.org> Ported-by: Tim Chase <tim@chase2k.com> Signed-off-by: Tim Chase <tim@chase2k.com> OpenZFS-issue: https://www.illumos.org/issues/7614 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/f539f1eb Closes #6900	2018-04-14 12:16:17 -07:00
Tim Chase	4b0f5b2d7b	Wait for resilver after online This test performs a rapid offline/online cycle of each of several mirror vdevs. It can run so quickly that there isn't sufficient pool redundancy to perform an offline. The solution is to wait until the pool is resilvered following the online operation. Also, add a pool sync before the offline operation to help reduce spurious errors. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tim Chase <tim@chase2k.com> Issue #6900	2018-04-13 18:04:10 -07:00
Seth Forshee	93b43af10d	Allow mounting datasets more than once Currently mounting an already mounted zfs dataset results in an error, whereas it is typically allowed with other filesystems. This causes some bad interactions with mount namespaces. Take this sequence for example: - Create a dataset - Create a snapshot of the dataset - Create a clone of the snapshot - Create a new mount namespace - Rename the original dataset The rename results in unmounting and remounting the clone in the original mount namespace, however the remount fails because the dataset is still mounted in the new mount namespace. (Note that this means the mount in the new mount namespace is never being unmounted, so perhaps the unmount/remount of the clone isn't actually necessary.) The problem here is a result of the way mounting is implemented in the kernel module. Since it is not mounting block devices it uses mount_nodev() instead of the usual mount_bdev(). However, mount_nodev() is written for filesystems for which each mount is a new instance (i.e. a new super block), and zfs should be able to detect when a mount request can be satisfied using an existing super block. Change zpl_mount() to call sget() directly with it's own test callback. Passing the objset_t object as the fs data allows checking if a superblock already exists for the dataset, and in that case we just need to return a new reference for the sb's root dentry. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Tom Caputi <tcaputi@datto.com> Signed-off-by: Alek Pinchuk <apinchuk@datto.com> Signed-off-by: Seth Forshee <seth.forshee@canonical.com> Closes #5796 Closes #7207	2018-04-13 10:44:05 -07:00
bunder2015	1e37dee03f	ZTS: clean up leftover ibackup_trunc files zfs_receive_raw_incremental did not clean up ibackup_trunc.* files left over from running the test. Also changing the path of the ibackup files so they can be placed in the correct directories when /var/tmp is not the temporary directory. Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: bunder2015 <omfgbunder@gmail.com> Closes #7430	2018-04-13 10:35:55 -07:00
LOLi	7fab636188	Add 'zpool split' coverage to the ZFS Test Suite This change adds five new tests to the ZTS: * zpool_split_cliargs: verify command line options and arguments * zpool_split_devices: verify zpool split accepts a device list * zpool_split_encryption: verify zpool can split encrypted pools * zpool_split_props: verify zpool split can set property values * zpool_split_vdevs: verify vdev layout when splitting the pool Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: loli10K <ezomori.nozomu@gmail.com> Closes #7409	2018-04-12 10:57:24 -07:00
Tomohiro Kusumi	8111eb4abc	Fix calloc(3) arguments order calloc(3) takes `nelem` (or `nmemb` in glibc) first, and then size of elements. No difference expected for having these in reverse order, however should follow the standard. http://pubs.opengroup.org/onlinepubs/009695399/functions/calloc.html Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@osnexus.com> Closes #7405	2018-04-12 10:50:39 -07:00
Mike Gerdts	d22f3a8244	OpenZFS 9286 - want refreservation=auto Authored by: Mike Gerdts <mike.gerdts@joyent.com> Reviewed by: Allan Jude <allanjude@freebsd.org> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: John Kennedy <john.kennedy@delphix.com> Reviewed by: Andy Stormont <astormont@racktopsystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Approved by: Richard Lowe <richlowe@richlowe.net> Ported-by: Don Brady <don.brady@delphix.com> Porting Notes: * Adopted destroy_dataset in ZTS test cleanup * Use ksh shebang instead of bash for new tests OpenZFS-issue: https://www.illumos.org/issues/9286 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/723d0c85 Closes #7387	2018-04-11 14:52:13 -07:00
LOLi	9966754ac5	Fix zpool set feature@<feature>=disabled Commit `e4010f2` accidentally allows zpool to set pool features to "disabled"; this should only be allowed at pool creation. This commit adds additional checks and test coverage to 'zpool set'. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: loli10K <ezomori.nozomu@gmail.com> Closes #7402	2018-04-11 14:45:58 -07:00
Tony Hutter	4f301661df	Revert "Handle zap_add() failures in mixed ... " This reverts commit `cc63068e95`. Under certain circumstances this change can result in an ENOSPC error when adding new files to a directory. See #7401 for full details. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tony Hutter <hutter2@llnl.gov> Issue #7401 Cloes #7416	2018-04-09 14:24:46 -07:00
Giuseppe Di Natale	7b47628acb	Clean up (k)shlib and cfg file shebangs Most kshlib files are imported by other scripts and do not have a shebang at the top of their files. Make all kshlib follow this convention. Remove shebangs from cfg files as well. Reviewed-by: loli10K <ezomori.nozomu@gmail.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Giuseppe Di Natale <dinatale2@llnl.gov> Close #7406	2018-04-08 19:37:22 -07:00
Tony Hutter	6c9af9e8f4	Fix "file is executable, but no shebang" warnings Fedora 28's RPM build checks warn when executable files don't have a shebang line. These warnings are caused when we (incorrectly) include data & config files in the_SCRIPTS automake lines. Files in _SCRIPTS are marked executable by automake. This patch fixes the issue by including non-executable scripts in a _DATA line instead. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Signed-off-by: Tony Hutter <hutter2@llnl.gov> Closes #7359 Closes #7395	2018-04-06 16:34:21 -07:00
Tom Caputi	1bf9a552bb	Make encrypted "zfs mount -a" failures consistent Currently, "zfs mount -a" will print a warning and fail to mount any encrypted datasets that do not have a key loaded. This patch makes the behavior of this failure consistent with other failure modes ("zfs mount -a" will silently continue, explict "zfs mount" will print a message and return an error code. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tom Caputi <tcaputi@datto.com> Closes #7382	2018-04-06 13:28:15 -07:00
Olaf Faaland	533ea0415b	Update mmp_delay on sync or skipped, failed write When an MMP write is skipped, or fails, and time since mts->mmp_last_write is already greater than mts->mmp_delay, increase mts->mmp_delay. The original code only updated mts->mmp_delay when a write succeeded, but this results in the write(s) after delays and failed write(s) reporting an ub_mmp_delay which is too low. Update mmp_last_write and mmp_delay if a txg sync was successful. At least one uberblock was written, thus extending the time we can be sure the pool will not be imported by another host. Do not allow mmp_delay to go below (MSEC2NSEC(zfs_multihost_interval) / vdev_count_leaves()) so that a period of frequent successful MMP writes, e.g. due to frequent txg syncs, does not result in an import activity check so short it is not reliable based on mmp thread writes alone. Remove unnecessary local variable, start. We do not use the start time of the loop iteration. Add a debug message in spa_activity_check() to allow verification of the import_delay value and to prove the activity check occurred. Alter the tests that import pools and attempt to detect an activity check. Calculate the expected duration of spa_activity_check() based on module parameters at the time the import is performed, rather than a fixed time set in mmp.cfg. The fixed time may be wrong. Also, use the default zfs_multihost_interval value so the activity check is longer and easier to recognize. Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Signed-off-by: Olaf Faaland <faaland1@llnl.gov> Closes #7330	2018-04-04 16:38:44 -07:00
Tony Hutter	21a4f5cc86	Fedora 28: Fix misc bounds check compiler warnings Fix a bunch of (mostly) sprintf/snprintf truncation compiler warnings that show up on Fedora 28 (GCC 8.0.1). Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tony Hutter <hutter2@llnl.gov> Closes #7361 Closes #7368	2018-04-04 10:16:47 -07:00
LOLi	f119d00c1f	Fix add_nested_replacing_spare test case Use 'zpool reopen' instead of 'zpool scrub' to kick in the spare device: this is required to avoid spurious failures caused by a race condition in events processing by the ZFS Event Daemon: P1 (zpool scrub) P2 (zed) --- zfs_ioc_pool_scan() -> dsl_scan() -> vdev_reopen() -> vdev_set_state(VDEV_STATE_CANT_OPEN) zfs_ioc_vdev_attach() -> spa_vdev_attach() -> dsl_resilver_restart() -> dsl_sync_task() -> dsl_scan_setup_check() <- dsl_scan_setup_check(): EBUSY Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: loli10K <ezomori.nozomu@gmail.com> Closes #7247 Closes #7342	2018-04-03 17:30:14 -07:00
Andriy Gapon	5e00213e43	OpenZFS 9164 - assert: newds == os->os_dsl_dataset Authored by: Andriy Gapon <avg@FreeBSD.org> Reviewed by: Matt Ahrens <mahrens@delphix.com> Reviewed by: Don Brady <don.brady@delphix.com> Reviewed-by: loli10K <ezomori.nozomu@gmail.com> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Approved by: Richard Lowe <richlowe@richlowe.net> Ported-by: Giuseppe Di Natale <dinatale2@llnl.gov> Porting Notes: * Re-enabled and tweaked the zpool_upgrade_007_pos test case to successfully run in under 5 minutes. OpenZFS-issue: https://www.illumos.org/issues/9164 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/0e776dc06a Closes #6112 Closes #7336	2018-03-30 12:00:40 -07:00
Brian Behlendorf	b2ab468dde	Fix mmap / libaio deadlock Calling uiomove() in mappedread() under the page lock can result in a deadlock if the user space page needs to be faulted in. Resolve the issue by dropping the page lock before the uiomove(). The inode range lock protects against concurrent updates via zfs_read() and zfs_write(). Reviewed-by: Albert Lee <trisk@forkgnu.org> Reviewed-by: Chunwei Chen <david.chen@nutanix.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #7335 Closes #7339	2018-03-28 10:19:22 -07:00
DeHackEd	668173b576	Remove libattr requirement RHEL/CentOS 6 supports sys/xattr.h eliminating the need for libattr-devel as a dependency. Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: DHE <git@dehacked.net> Closes #7344 Closes #7351	2018-03-27 16:51:32 -07:00
Alek P	272b5d730f	Add JSON output support to channel programs The changes piggyback JSON output support on top of channel programs (#6558). This way the JSON output support is targeted to scripting use cases and is easily maintainable since it really only touches one function (zfs_do_channel_program()). This patch ports Joyent's JSON nvlist library from illumos to enable easy JSON printing of channel program output nvlist. To keep the delta small I also took advantage of the fact that printing in zfs_do_channel_program() was almost always done before exiting the program. Reviewed by: Matt Ahrens <mahrens@delphix.com> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Alek Pinchuk <apinchuk@datto.com> Closes #7281	2018-03-19 12:40:58 -07:00
Stephen Blinick	8a2a9db8df	OpenZFS 9076 - Adjust perf test concurrency settings ZFS Performance test concurrency should be lowered for better latency Work by Stephen Blinick. Nightly performance runs typically consist of two levels of concurrency; and both are fairly high. Since the IO runs are to a ZFS filesystem, within a zpool, which is based on some variable number of vdev's, the amount of IO driven to each device is variable. Additionally, different device types (HDD vs SSD, etc) can generally handle a different amount of concurrent IO before saturating. Nevertheless, in practice, it appears that most tests are well past the concurrency saturation point and therefore both perform with the same throughput, the maximum of the device. Because the queuedepth to the device(s) is so high however, the latency is much higher than the best possible at that throughput, and increases linearly with the increase in concurrency. This means that changes in code that impact latency during normal operation (before saturation) may not be apparent when a large component of the measured latency is from the IO sitting in a queue to be serviced. Therefore, changing the concurrency settings is recommended Authored by: Stephen Blinick <stephen.blinick@delphix.com> Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed by: Dan Kimmel <dan.kimmel@delphix.com> Reviewed by: John Wren Kennedy <jwk404@gmail.com> Ported-by: John Wren Kennedy <jwk404@gmail.com> OpenZFS-issue: https://www.illumos.org/issues/9076 OpenZFS-commit: https://github.com/openzfs/openzfs/pull/562 Upstream bug: DLPX-45477 Closes #7302	2018-03-15 10:51:00 -07:00
Paul Zuchowski	8e5d14844d	zdb and inuse tests don't pass with real disks Due to zpool create auto-partioning in Linux (i.e. sdb1), certain utilities need to use the parition (sdb1) while others use the whole disk name (sdb). Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Paul Zuchowski <pzuchowski@datto.com> Closes #6939 Closes #7261	2018-03-07 17:03:33 -08:00
Wolfgang Bumiller	0e85048f53	Take user namespaces into account in policy checks Change file related checks to use user namespaces and make sure involved uids/gids are mappable in the current namespace. Note that checks without file ownership information will still not take user namespaces into account, as some of these should be handled via 'zfs allow' (otherwise root in a user namespace could issue commands such as `zpool export`). This also adds an initial user namespace regression test for the setgid bit loss, with a user_ns_exec helper usable in further tests. Additionally, configure checks for the required user namespace related features are added for: * ns_capable * kuid/kgid_has_mapping() * user_ns in cred_t Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com> Closes #6800 Closes #7270	2018-03-07 15:40:42 -08:00
Brian Behlendorf	434a3375ce	ZTS: fix send-c_stream_size_estimate The test could fail when attempting to write to a newly created volume which was missing its device node. Resolve the issue by calling block_device_wait() which blocks until udev creates the needed entry. Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #7276 Closes #7277	2018-03-07 09:55:54 -08:00
Giuseppe Di Natale	a07ad58847	Fix dbufstats_001_pos Implement a new helper within_tolerance to test if a value is within range of a target. Because the dbufstats and dbufs kstat file are being read at slightly different times, it is possible for stats to be slightly off. Use within_tolerance to determine if the value is "close enough" to the target. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Giuseppe Di Natale <dinatale2@llnl.gov> Closes #7239 Closes #7266	2018-03-07 09:53:04 -08:00
Tony Hutter	639b18944a	Allow to limit zed's syslog chattiness Some usage patterns like send/recv of replication streams can produce a large number of events. In such a case, the current all-syslog.sh zedlet will hold up to its name, and flood the logs with mostly redundant information. Two mitigate this situation, this changeset introduces to new variables ZED_SYSLOG_SUBCLASS_INCLUDE and ZED_SYSLOG_SUBCLASS_EXCLUDE to zed.rc that give more control over which event classes end up in the syslog. Reviewed-by: loli10K <ezomori.nozomu@gmail.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Signed-off-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Daniel Kobras <d.kobras@science-computing.de> Closes #6886 Closes #7260	2018-03-06 15:41:52 -08:00
Olaf Faaland	d2160d0538	Record skipped MMP writes in multihost_history Once per pass through the MMP thread's loop, the vdev tree is walked to find a suitable leaf to write the next MMP block to. If no such leaf is found, the thread sleeps for a while and resumes at the top of the loop. Add an entry to multihost_history when no leaf can be found, and record the reason in the error column. The error code for such entries is a bitfield, displayed in hex: 0x1 At least one vdev (interior or leaf) was not writeable. 0x2 At least one writeable leaf vdev was found, but it had a pending MMP write. timestamp = the time in seconds since the epoch when no leaf could be found originally. duration = the time (in ns) during which no MMP block was written for this reason. This does not include the preceeding inter-write period nor the following inter-write period. vdev_guid = the number of sequential cycles of the MMP thread looop when this occurred. Sample output, truncated to fit: For records of skipped MMP writes the right-most column, vdev_path, is reported as "-". id txg timestamp error duration mmp_delay vdev_guid ... 936 11 1520036441 0 146264 891422313 1740883117838 ... 937 11 1520036441 0 163956 888356657 7320395061548 ... 938 11 1520036442 0 130690 885314969 7320395061548 ... 939 11 1520036442 0 2001068577 882296582 1740883117838 ... 940 11 1520036443 0 161806 882296582 7320395061548 ... 941 11 1520036443 0x2 0 998020546 1 ... 942 11 1520036444 0 136585 998020546 7320395061548 ... 943 11 1520036444 0x2 0 998020257 1 ... 944 11 1520036445 5 2002662964 994160219 1740883117838 ... 945 11 1520036445 0x2 998073118 994160219 3 ... 946 11 1520036447 0 247136 994160219 7320395061548 ... Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Olaf Faaland <faaland1@llnl.gov> Closes #7212	2018-03-06 15:15:15 -08:00
Giuseppe Di Natale	c7b55e71b0	Introduce a destroy_dataset helper Datasets can be busy when calling zfs destroy. Introduce a helper function to destroy datasets and use it to destroy datasets in zfs_allow_004_pos, zfs_promote_008_pos, and zfs_destroy_002_pos. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Giuseppe Di Natale <dinatale2@llnl.gov> Closes #7224 Closes #7246 Closes #7249 Closes #7267	2018-03-06 14:54:57 -08:00
Tony Hutter	80d52c3919	Change checksum & IO delay ratelimit values Change checksum & IO delay ratelimit thresholds from 5/sec to 20/sec. This allows zed to actually trigger if a bunch of these events arrive in a short period of time (zed has a threshold of 10 events in 10 sec). Previously, if you had, say, 100 checksum errors in 1 sec, it would get ratelimited to 5/sec which wouldn't trigger zed to fault the drive. Also, convert the checksum and IO delay thresholds to module params for easy testing. Reviewed-by: loli10K <ezomori.nozomu@gmail.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Signed-off-by: Tony Hutter <hutter2@llnl.gov> Closes #7252	2018-03-04 17:34:51 -08:00
chrisrd	d0f6fbaff3	ZTS: fix spurious failures in mv_files The test could fail because of a race condition between the files being generated in the background and attempting to move the files. Wait for all file generation to complete before trying to move the files around. Also, clean up the waiting: the 'wait' command without arguments waits for all child pids. Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Chris Dunlop <chris@onthe.net.au> Closes #7220 Closes #7242 Closes #7258	2018-03-02 09:57:29 -08:00
John Wren Kennedy	e086e717c3	Add ZFS perf test for dbuf cache This change adds a test for sequential reads out of the dbuf cache. It's essentially a copy of sequential_reads_cached, using a smaller data set. The sequential read tests are renamed to differentiate them. Authored by: Dan Kimmel <dan.kimmel@delphix.com> Reviewed by: Paul Dagnelie <pcd@delphix.com> Reviewed by: Matt Ahrens <mahrens@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: John Wren Kennedy <john.kennedy@delphix.com> Closes #7225	2018-02-28 10:38:37 -08:00
John Eismeier	d699aaef09	Fix some typos Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed by: George Melikov <mail@gmelikov.ru> Signed-off-by: John Eismeier <john.eismeier@gmail.com> Closes #7237	2018-02-28 08:57:10 -08:00
Scot W. Stevenson	19528cf949	Add Python 3 rewrite of arc_summary.py Add new script arc_summary3.py as a complete rewrite of the arc_summary.py tool (see issue #6873) Add new options: -g/--graph - Display crude graphic representation of ARC status and quit -r/--raw - Print all available information as minimally formatted list (for grep) -s/--section - Print a single section. This replaces -p/--page, which is kept for backwards use but marked as depreciated Add new sections with information on ZIL and SPL. Notify user if sections L2ARC and VDEV are skipped instead of failing silently. Add warning that -p/--page option is depreciated. Developed for Python 3.5. Reviewed-by: Richard Laager <rlaager@wiktel.com> Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com> Reviewed by: George Melikov <mail@gmelikov.ru> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Scot W. Stevenson <scot.stevenson@gmail.com> Closes #6873 Closes #6892	2018-02-28 08:52:34 -08:00
LOLi	4af6873af6	Fix segfault in zfs_do_bookmark() When invoked with wrong parameters 'zfs bookmark' fails to gracefully validate user input and crashes. This is a regression accidentally introduced in 587e228; this commit adds additional tests to the ZFS Test Suite to exercise this codepath. Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: KireinaHoro <i@jsteward.moe> Signed-off-by: loli10K <ezomori.nozomu@gmail.com> Closes #7228 Closes #7229	2018-02-26 09:55:18 -08:00
Brian Behlendorf	2a0428f16b	ZTS: Fix zfs_share_* test case failures Prevent false positives when running the zfs_share_* test cases due to leftover stale /var/lib/nfs/etab entries. When starting the test group re-synchronize the /var/lib/nfs/etab file with /etc/exports. At this point in the testing there will be no additional `zfs share` entries to add. Reviewed by: George Melikov <mail@gmelikov.ru> Reviewed-by: loli10K <ezomori.nozomu@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #7226	2018-02-24 10:07:12 -08:00
Tony Hutter	bf95a000c4	Add scrub after resilver zed script * Add a zed script to kick off a scrub after a resilver. The script is disabled by default. * Add a optional $PATH (-P) option to zed to allow it to use a custom $PATH for its zedlets. This is needed when you're running zed under the ZTS in a local workspace. * Update test scripts to not copy in all-debug.sh and all-syslog.sh by default. They can be optionally copied in as part of zed_setup(). These scripts slow down zed considerably under heavy events loads and can cause events to be dropped or their delivery delayed. This was causing some sporadic failures in the 'fault' tests. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Richard Laager <rlaager@wiktel.com> Signed-off-by: Tony Hutter <hutter2@llnl.gov> Closes #4662 Closes #7086	2018-02-23 11:38:05 -08:00
LOLi	faa97c1619	Want 'zfs send -b' This change implements 'zfs send -b' which can be used to send only received property values whether or not they are overridden by local settings. This can be very useful during "restore" operations from a backup pool because it allows to send only the property values originally sent from the backup source, even though they were later modified on the destination either by a 'zfs set' operation, explicit 'zfs inherit' or overridden during the receive process via 'zfs receive -o\|-x'. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: loli10K <ezomori.nozomu@gmail.com> Closes #7156	2018-02-21 12:32:06 -08:00
Tom Caputi	b0918402dc	Raw receive should change key atomically Currently, raw zfs sends transfer the encrypted master keys and objset_phys_t encryption parameters in the DRR_BEGIN payload of each send file. Both of these are processed as soon as they are read in dmu_recv_stream(), meaning that the new keys are set before the new snapshot is received. In addition to the fact that this changes the user's keys for the dataset earlier than they might expect, the keys were never reset to what they originally were in the event that the receive failed. This patch splits the processing into objset handling and key handling, the later of which is moved to dmu_recv_end() so that they key change can be done atomically. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tom Caputi <tcaputi@datto.com> Closes #7200	2018-02-21 12:31:03 -08:00
Tom Caputi	b1d217338a	Raw receives must compress metadnode blocks Currently, the DMU relies on ZIO layer compression to free LO dnode blocks that no longer have objects in them. However, raw receives disable all compression, meaning that these blocks can never be freed. In addition to the obvious space concerns, this could also cause incremental raw receives to fail to mount since the MAC of a hole is different from that of a completely zeroed block. This patch corrects this issue by adding a special case in zio_write_compress() which will attempt to compress these blocks to a hole even if ZIO_FLAG_RAW_ENCRYPT is set. This patch also removes the zfs_mdcomp_disable tunable, since tuning it could cause these same issues. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tom Caputi <tcaputi@datto.com> Closes #7198	2018-02-21 12:28:52 -08:00
Giuseppe Di Natale	f2c0dee23b	Correct count_uberblocks in mmp.kshlib A log_must call was causing count_uberblocks to return more than just the uberblock count. Remove the log_must since it was only logging a sleep. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Olaf Faaland <faaland1@llnl.gov> Reviewed-by: loli10K <ezomori.nozomu@gmail.com> Signed-off-by: Giuseppe Di Natale <dinatale2@llnl.gov> Closes #7191	2018-02-20 16:28:52 -08:00
Nasf-Fan	9c5167d19f	Project Quota on ZFS Project quota is a new ZFS system space/object usage accounting and enforcement mechanism. Similar as user/group quota, project quota is another dimension of system quota. It bases on the new object attribute - project ID. Project ID is a numerical value to indicate to which project an object belongs. An object only can belong to one project though you (the object owner or privileged user) can change the object project ID via 'chattr -p' or 'zfs project [-s] -p' explicitly. The object also can inherit the project ID from its parent when created if the parent has the project inherit flag (that can be set via 'chattr +P' or 'zfs project -s [-p]'). By accounting the spaces/objects belong to the same project, we can know how many spaces/objects used by the project. And if we set the upper limit then we can control the spaces/objects that are consumed by such project. It is useful when multiple groups and users cooperate for the same project, or a user/group needs to participate in multiple projects. Support the following commands and functionalities: zfs set projectquota@project zfs set projectobjquota@project zfs get projectquota@project zfs get projectobjquota@project zfs get projectused@project zfs get projectobjused@project zfs projectspace zfs allow projectquota zfs allow projectobjquota zfs allow projectused zfs allow projectobjused zfs unallow projectquota zfs unallow projectobjquota zfs unallow projectused zfs unallow projectobjused chattr +/-P chattr -p project_id lsattr -p This patch also supports tree quota based on the project quota via "zfs project" commands set as following: zfs project [-d\|-r] <file\|directory ...> zfs project -C [-k] [-r] <file\|directory ...> zfs project -c [-0] [-d\|-r] [-p id] <file\|directory ...> zfs project [-p id] [-r] [-s] <file\|directory ...> For "df [-i] $DIR" command, if we set INHERIT (project ID) flag on the $DIR, then the proejct [obj]quota and [obj]used values for the $DIR's project ID will be shown as the total/free (avail) resource. Keep the same behavior as EXT4/XFS does. Reviewed-by: Andreas Dilger <andreas.dilger@intel.com> Reviewed-by Ned Bass <bass6@llnl.gov> Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Fan Yong <fan.yong@intel.com> TEST_ZIMPORT_POOLS="zol-0.6.1 zol-0.6.2 master" Change-Id: Ib4f0544602e03fb61fd46a849d7ba51a6005693c Closes #6290	2018-02-13 14:54:54 -08:00
John Wren Kennedy	ba779f7f71	OpenZFS 9004 - Some ZFS tests used files removed with 32 bit kernel Authored by: John Wren Kennedy <john.kennedy@delphix.com> Reviewed by: Igor Kozhukhov <igor@dilos.org> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed by: George Melikov <mail@gmelikov.ru> Approved by: Dan McDonald <danmcd@joyent.com> Ported-by: Giuseppe Di Natale <dinatale2@llnl.gov> OpenZFS-issue: https://www.illumos.org/issues/9004 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/fafe9b241f Closes #7149	2018-02-09 10:28:44 -08:00
sanjeevbagewadi	cc63068e95	Handle zap_add() failures in mixed case mode With "casesensitivity=mixed", zap_add() could fail when the number of files/directories with the same name (varying in case) exceed the capacity of the leaf node of a Fatzap. This results in a ASSERT() failure as zfs_link_create() does not expect zap_add() to fail. The fix is to handle these failures and rollback the transactions. Reviewed by: Matt Ahrens <mahrens@delphix.com> Reviewed-by: Chunwei Chen <david.chen@nutanix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Sanjeev Bagewadi <sanjeev.bagewadi@gmail.com> Closes #7011 Closes #7054	2018-02-09 10:15:53 -08:00
Chunwei Chen	eb9c4532dd	Fix zdb -ed on objset for exported pool zdb -ed on objset for exported pool would failed with: failed to own dataset 'qq/fs0': No such file or directory The reason is that zdb pass objset name to spa_import, it uses that name to create a spa. Later, when dmu_objset_own tries to lookup the spa using real pool name, it can't find one. We fix this by make sure we pass pool name rather than objset name to spa_import. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: loli10K <ezomori.nozomu@gmail.com> Signed-off-by: Chunwei Chen <david.chen@nutanix.com> Closes #7099 Closes #6464	2018-02-09 10:11:34 -08:00
John Wren Kennedy	35e0202fd7	OpenZFS 8965 - zfs_acl_ls_001_pos fails due to no longer supported grep regex The test used \> to detect the end of a string, but this no longer works, so use $ which works as well since the string ends the line anyway. Authored by: John Wren Kennedy <john.kennedy@delphix.com> Reviewed by: Akash Ayare <aayare@delphix.com> Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com> Reviewed by: Yuri Pankov <yuripv@icloud.com> Reviewed by: Igor Kozhukhov <igor@dilos.org> Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov> Approved by: Dan McDonald <danmcd@joyent.com> Ported-by: Giuseppe Di Natale <dinatale2@llnl.gov> OpenZFS-issue: https://www.illumos.org/issues/8965 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/cb1204e444 Closes #7145	2018-02-08 21:29:17 -08:00
Serapheim Dimitropoulos	5b72a38d68	OpenZFS 8677 - Open-Context Channel Programs Authored by: Serapheim Dimitropoulos <serapheim@delphix.com> Reviewed by: Matt Ahrens <mahrens@delphix.com> Reviewed by: Chris Williamson <chris.williamson@delphix.com> Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Ported-by: Don Brady <don.brady@delphix.com> We want to be able to run channel programs outside of synching context. This would greatly improve performance for channel programs that just gather information, as they won't have to wait for synching context anymore. === What is implemented? This feature introduces the following: - A new command line flag in "zfs program" to specify our intention to run in open context. (The -n option) - A new flag/option within the channel program ioctl which selects the context. - Appropriate error handling whenever we try a channel program in open-context that contains zfs.sync* expressions. - Documentation for the new feature in the manual pages. === How do we handle zfs.sync functions in open context? When such a function is found by the interpreter and we are running in open context we abort the script and we spit out a descriptive runtime error. For example, given the script below ... arg = ... fs = arg["argv"][1] err = zfs.sync.destroy(fs) msg = "destroying " .. fs .. " err=" .. err return msg if we run it in open context, we will get back the following error: Channel program execution failed: [string "channel program"]:3: running functions from the zfs.sync submodule requires passing sync=TRUE to lzc_channel_program() (i.e. do not specify the "-n" command line argument) stack traceback: [C]: in function 'destroy' [string "channel program"]:3: in main chunk === What about testing? We've introduced new wrappers for all channel program tests that run each channel program as both (startard & open-context) and expect the appropriate behavior depending on the program using the zfs.sync module. OpenZFS-issue: https://www.illumos.org/issues/8677 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/17a49e15 Closes #6558	2018-02-08 16:05:57 -08:00
Don Brady	fc5d4b6737	Increase code coverage for Lua libraries Add test coverage for lua libraries Remove dead code in Lua implementation Signed-off-by: Don Brady <don.brady@delphix.com>	2018-02-08 15:29:38 -08:00
Don Brady	ee00bfb2e6	Add basic functional tests for zcp user properties Signed-off-by: Don Brady <don.brady@delphix.com>	2018-02-08 15:29:32 -08:00
Chris Williamson	234c91c508	OpenZFS 8600 - ZFS channel programs - snapshot Authored by: Chris Williamson <chris.williamson@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: John Kennedy <john.kennedy@delphix.com> Reviewed by: Brad Lewis <brad.lewis@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Ported-by: Don Brady <don.brady@delphix.com> ZFS channel programs should be able to create snapshots. In addition to the base snapshot functionality, this entails extra logic to handle edge cases which were formerly not possible, such as creating then destroying a snapshot in the same transaction sync. OpenZFS-issue: https://www.illumos.org/issues/8600 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/68089b8b	2018-02-08 15:29:24 -08:00
Brad Lewis	af07368986	OpenZFS 8592 - ZFS channel programs - rollback Authored by: Brad Lewis <brad.lewis@delphix.com> Reviewed by: Chris Williamson <chris.williamson@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Ported-by: Don Brady <don.brady@delphix.com> ZFS channel programs should be able to perform a rollback. OpenZFS-issue: https://www.illumos.org/issues/8592 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/d46b5ed6	2018-02-08 15:29:14 -08:00

1 2 3 4 5 ...

538 Commits