mirror_zfs

mirror of https://git.proxmox.com/git/mirror_zfs.git synced 2026-04-17 08:54:52 +03:00

Author	SHA1	Message	Date
Brian Behlendorf	9d639d8799	ZTS: Add zfs_clone_livelist_dedup.ksh to Makefile.am Commit `86b5f4c12` added a new zfs_clone_livelist_dedup.ksh test case but didn't include it in the Makefile.am. This results in the test not being included in the dist tarball so it's never run by the CI. Reviewed-by: John Kennedy <john.kennedy@delphix.com> Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes: #12224	2021-06-11 09:21:36 -06:00
наб	2badb3457a	Move properties, parameters, events, and concepts around manual sections The pages moved as follows: zpool-features.{5 => 7} spl{-module-parameters.5 => .4} zfs{-module-parameters.5 => .4} zfs-events.5 => into zpool-events.8 zfsconcepts.{8 => 7} zfsprops.{8 => 7} zpoolconcepts.{8 => 7} zpoolprops.{8 => 7} Reviewed-by: Richard Laager <rlaager@wiktel.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> Co-authored-by: Daniel Ebdrup Jensen <debdrup@FreeBSD.org> Closes #12149 Closes #12212	2021-06-09 14:35:30 -07:00
наб	9685f363c3	tests/file_check: remove unused variable Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> Closes #12187	2021-06-07 20:59:01 -07:00
Serapheim Dimitropoulos	86b5f4c121	Livelist logic should handle dedup blkptrs Update the logic to handle the dedup-case of consecutive FREEs in the livelist code. The logic still ensures that all the FREE entries are matched up with a respective ALLOC by keeping a refcount for each FREE blkptr that we encounter and ensuring that this refcount gets to zero by the time we are done processing the livelist. zdb -y no longer panics when encountering double frees Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: John Kennedy <john.kennedy@delphix.com> Reviewed-by: Don Brady <don.brady@delphix.com> Signed-off-by: Serapheim Dimitropoulos <serapheim@delphix.com> Closes #11480 Closes #12177	2021-06-07 13:09:07 -06:00
Rich Ercolani	6c7c7201d9	Quick fixes for two ZTS failures On FreeBSD 14, these two tests started erroring out like the objects they're attempting to examine don't exist. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: John Kennedy <john.kennedy@delphix.com> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Rich Ercolani <rincebrain@gmail.com> Closes #12165	2021-06-01 15:34:19 -06:00
наб	c3ef9f7528	Turn shellcheck into a normal make target. Fix new files it caught This checks every file it checked (and a few more), but explicitly instead of "if it works it works" best-effort (which wasn't that good anyway) Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> Closes #10512 Closes #12101	2021-06-01 11:38:49 -07:00
Rich Ercolani	f172c3088f	Correct flaws in arc_summary[23] and their test. The change correctly handles BrokenPipeError and improves the associated tests. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: John Kennedy <john.kennedy@delphix.com> Signed-off-by: Rich Ercolani <rincebrain@gmail.com> Closes #12037 Closes #12036	2021-05-25 20:02:01 -06:00
Christian Schwarz	0989d798fa	ZTS: remove verify_slog_support helper verify_slog_support no longer applies to ZFS since slog support is always available. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: John Kennedy <john.kennedy@delphix.com> Signed-off-by: Christian Schwarz <me@cschwarz.com> Closes #12092	2021-05-24 14:57:29 -06:00
наб	93ef500388	Don't abuse vfork() According to POSIX.1, "vfork() has the same effect as fork(2), except that the behavior is undefined if the process created by vfork() either modifies any data other than a variable of type pid_t used to store the return value from vfork(), [...], or calls any other function before successfully calling _exit(2) or one of the exec(3) family of functions." These do all three, and work by pure chance (or maybe they don't, but we blisfully don't know). Either way: bad idea to call vfork() from C, unless you're the standard library, and POSIX.1-2008 removes it entirely Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> Closes #12015	2021-05-21 10:16:06 -07:00
Brian Behlendorf	8fb577ae6d	Fix dRAID sequential resilver silent damage handling This change addresses two distinct scenarios which are possible when performing a sequential resilver to a dRAID pool with vdevs that contain silent unknown damage. Which in this circumstance took the form of the devices being intentionally overwritten with zeros. However, it could also result from a device returning incorrect data while a sequential resilver was in progress. Scenario 1) A sequential resilver is performed while all of the dRAID vdevs are ONLINE and there is silent damage present on the vdev being resilvered. In this case, nothing will be repaired by vdev_raidz_io_done_reconstruct_known_missing() because rc->rc_error isn't set on any of the raid columns. To address this vdev_draid_io_start_read() has been updated to always mark the resilvering column as ESTALE for sequential resilver IO. Scenario 2) Multiple columns contain silent damage for the same block and a sequential resilver is performed. In this case it's impossible to generate the correct data from parity unless all of the damaged columns are being sequentially resilvered (and thus only good data is used to generate parity). This is as expected and there's nothing which can be done about it. However, we need to be careful not to make to situation worse. Since we can't verify the data is actually good without a checksum, we must only repair the devices which are being sequentially resilvered. Otherwise, an incorrect repair to a device which previously contained good data could effectively lock in the damage and make reconstruction impossible. A check for this was added to vdev_raidz_io_done_verified() along with a new test case. Lastly, this change updates the redundancy_draid_spare1 and redundancy_draid_spare3 test cases to be more representative of normal dRAID replacement operation. Specifically, what we care about is that the scrub run after a sequential resilver does not find additional blocks which need repair. This would indicate the sequential resilver failed to rebuild a section of one of the devices. Note also the tests were switched to using the verify_pool() function which still checks for checksum errors. Reviewed-by: Mark Maybee <mark.maybee@delphix.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #12061	2021-05-20 15:05:26 -07:00
наб	6fc3099248	Trim excess shellcheck annotations. Widen to all non-Korn scripts Before, make shellcheck checked scripts/{commitcheck,make_gitrev,man-dates,paxcheck,zfs-helpers,zfs, zfs-tests,zimport,zloop}.sh cmd/zed/zed.d/{{all-debug,all-syslog,data-notify,generic-notify, resilver_finish-start-scrub,scrub_finish-notify, statechange-led,statechange-notify,trim_finish-notify, zed-functions}.sh,history_event-zfs-list-cacher.sh.in} cmd/zpool/zpool.d/{dm-deps,iostat,lsblk,media,ses,smart,upath} now it also checks contrib/dracut/{02zfsexpandknowledge/module-setup, 90zfs/{export-zfs,parse-zfs,zfs-needshutdown, zfs-load-key,zfs-lib,module-setup, mount-zfs,zfs-generator}}.sh.in cmd/zed/zed.d/{pool_import-led,vdev_attach-led, resilver_finish-notify,vdev_clear-led}.sh contrib/initramfs/{zfsunlock,hooks/zfs.in,scripts/local-top/zfs} tests/zfs-tests/tests/perf/scripts/prefetch_io.sh scripts/common.sh.in contrib/bpftrace/zfs-trace.sh autogen.sh Reviewed-by: John Kennedy <john.kennedy@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> Closes #12042	2021-05-20 08:55:23 -07:00
Brian Behlendorf	6217656da3	Revert "Fix raw sends on encrypted datasets when copying back snapshots" Commit `d1d4769` takes into account the encryption key version to decide if the local_mac could be zeroed out. However, this could lead to failure mounting encrypted datasets created with intermediate versions of ZFS encryption available in master between major releases. In order to prevent this situation revert `d1d4769` pending a more comprehensive fix which addresses the mount failure case. Reviewed-by: George Amanakis <gamanakis@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #11294 Issue #12025 Issue #12300 Closes #12033	2021-05-13 10:00:17 -07:00
наб	37086897b0	libzfs: add keylocation=https://, backed by fetch(3) or libcurl Add support for http and https to the keylocation properly to allow encryption keys to be fetched from the specified URL. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@ixsystems.com> Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> Issue #9543 Closes #9947 Closes #11956	2021-05-12 21:21:35 -07:00
Brian Behlendorf	93c8e91fe7	Fix dRAID self-healing short columns When dRAID performs a normal read operation only the data columns in the raid map are read from disk. This is enough information to calculate the checksum, verify it, and return the needed data to the application. It's only in the event of a checksum failure that the additional parity and any empty columns must be read since they are required for parity reconstruction. Reading these additional columns is handled by vdev_raidz_read_all() which calls vdev_draid_map_alloc_empty() to expand the raid_map_t and submit IOs for the missing columns. This all works correctly, but it fails to account for any "short" columns. These are data columns which are padded with a empty skip sector at the end. Since that empty sector is not needed for a normal read it's not read when columns is first read from disk. However, like the parity and empty columns the skip sector is needed to perform reconstruction. The fix is to mark any "short" columns as never being read by clearing the rc_tried flag when expanding the raid_map_t. This will cause the entire column to re-read from disk in the event of a checksum failure allowing the self-healing functionality to repair the block. Note that this only effects the self-healing feature because when scrubbing a pool the parity, data, and empty columns are all read initially to verify their contents. Furthermore, only blocks which contain "short" columns would be effected, and only when the memory backing the skip sector wasn't already zeroed out. This change extends the existing redundancy_raidz.ksh test case to verify self-healing (as well as resilver and scrub). Then applies the same test case to dRAID with a slightly modified version of the test script called redundancy_draid.ksh. The unused variable combrec was also removed from both test cases. Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Mark Maybee <mark.maybee@delphix.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #12010	2021-05-08 08:57:25 -07:00
наб	1966e959ca	Replace ZoL with OpenZFS where applicable Afterward, git grep ZoL matches: * README.md: * [ZoL Site](https://zfsonlinux.org) - Correct * etc/default/zfs.in:# ZoL userland configuration. - Changing this would induce a needless upgrade-check, if the user has modified the configuration; this can be updated the next time the defaults change * module/zfs/dmu_send.c: * ZoL < 0.7 does not handle [...] - Before 0.7 is ZoL, so fair enough Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> Issue #11956	2021-05-07 17:20:37 -07:00
Ryan Moeller	ccb46cab50	ZTS: Fix xattr_002_neg passing too soon Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #11970	2021-04-30 07:37:02 -07:00
наб	6f4e132fec	ZTS: cli_root/zfs_load-key: add separate key files Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> Issue: #11956 Closes #11976	2021-04-30 07:31:22 -07:00
Prawn	b0269cd8ce	receive: don't fail inheriting (-x) properties on wrong dataset type Receiving datasets while blanket inheriting properties like zfs receive -x mountpoint can generally be desirable, e.g. to avoid unexpected mounts on backup hosts. Currently this will fail to receive zvols due to the mountpoint property being applicable to filesystems only. This limitation currently requires operators to special-case their minds and tools for zvols. This change gets rid of this limitation for inherit (-x) by Spiting up the dataset type handling: Warnings for inheriting (-x), errors for overriding (-o). Reviewed-by: Paul Dagnelie <pcd@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: InsanePrawn <insane.prawny@gmail.com> Closes #11416 Closes #11840 Closes #11864	2021-04-26 17:23:51 -07:00
Brian Behlendorf	50d9ff93df	ZTS: Improve redundancy test scripts - Add additional logging to provide more information about why the test failed. This including logging more of the individual commands and the contents and differences of the record files on failure. - Updated get_vdevs() to properly exclude all top-level vdevs including raidz3 and draid[1-3]. - Replaced gnudd with dd. This is the only remaining place in the test suite gnudd is used and it shouldn't be needed. - The refill_test_env function expects the pool as the first argument but never sets the pool variable. - Only fill the test pools to 50% of capacity instead of 75% to help speed up the tests. - Fix replace_missing_devs() calculation, MINDEVSIZE should be MINVDEVSIZE. - Fix damage_devs() so it overwrites almost all of the device so we're guaranteed to damage filesystem blocks. - redundancy_stripe.ksh should not use log_mustnot to check if the pool is healthy since the return value may be misinterpreted. Just perform a normal conditional check and log the failure. Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #11906	2021-04-18 21:58:36 -07:00
наб	86418090d7	ZTS: add zed_fd_spill to verify the fds ZEDLETs inherit Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> Closes #11891	2021-04-15 13:46:05 -07:00
Brian Behlendorf	888700bc6b	ZTS: fix removal_condense_export test case It's been observed in the CI that the required 25% of obsolete bytes in the mapping can be to high a threshold for this test resulting in condensing never being triggered and a test failure. To prevent these failures make the existing zfs_condense_indirect_obsolete_pct tuning available so the obsolete percentage can be reduced from 25% to 5% during this test. Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #11869	2021-04-11 21:49:13 -07:00
pablofsf	099fa7e475	Allow zfs to send replication streams with missing snapshots A tentative implementation and discussion was done in #5285. According to it a send --skip-missing\|-s flag has been added. In a replication stream, when there are snapshots missing in the hierarchy, if -s is provided print a warning and ignore dataset (and its children) instead of throwing an error Reviewed-by: Paul Dagnelie <pcd@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Pablo Correa Gómez <ablocorrea@hotmail.com> Closes #11710	2021-04-11 12:05:35 -07:00
Ryan Moeller	5d508d92d2	ZTS: Improve cleanup in removal_with_export Kill the removal operation on every platform, not just Linux. The test has been fixed and is now stable on FreeBSD. Reviewed-by: John Kennedy <john.kennedy@delphix.com> Reviewed-by: Igor Kozhukhov <igor@dilos.org> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #11856	2021-04-08 21:10:28 -07:00
Ryan Moeller	e778b0485b	Ratelimit deadman zevents as with delay zevents Just as delay zevents can flood the zevent pipe when a vdev becomes unresponsive, so do the deadman zevents. Ratelimit deadman zevents according to the same tunable as for delay zevents. Enable deadman tests on FreeBSD and add a test for deadman event ratelimiting. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Don Brady <don.brady@delphix.com> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #11786	2021-04-07 16:23:57 -07:00
matt-fidd	a03b288cf0	zfs get -p only outputs 3 columns if "clones" property is empty get_clones_string currently returns an empty string for filesystem snapshots which have no clones. This breaks parsable `zfs get` output as only three columns are output, instead of 4. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Matt Fiddaman <github@m.fiddaman.uk> Co-authored-by: matt <matt@fiddaman.net> Closes #11837	2021-04-06 16:05:54 -07:00
Brian Behlendorf	ec580225d2	ZTS: pool_checkpoint improvements The pool_checkpoint tests may incorrectly fail because several of them invoke zdb for an imported pool. In this scenario it's not unexpected for zdb to fail if the pool is modified. To resolve this these zdb checks are now done after the pool has been exported. Additionally, the default cleanup functions assumed the pool would be imported when they were run. If this was not the case they're exit early and fail to cleanup all of the test state causing subsequent tests to fail. Add a check to only destroy the pool when it is imported. Reviewed-by: John Kennedy <john.kennedy@delphix.com> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #11832	2021-04-03 08:33:22 -07:00
Andrea Gelmini	bf169e9f15	Fix various typos Correct an assortment of typos throughout the code base. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Andrea Gelmini <andrea.gelmini@gelma.net> Closes #11774	2021-04-02 18:52:15 -07:00
наб	73218f41b4	zed: allow limiting concurrent jobs 200ms time-out is relatively long, but if we already hit the cap, then we'll likely be able to spawn multiple new jobs when we wake up Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> Closes #11807	2021-04-02 16:30:53 -07:00
Ryan Moeller	c05eec32a7	Allow pool names that look like Solaris disk names Nothing bad happens if a prefix of your pool name matches a disk name. This is a bit of a silly restriction at this point. Reviewed-by: Richard Laager <rlaager@wiktel.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org> Closes #11781 Closes #11813	2021-04-01 08:49:41 -07:00
Andrew	66e6d3f128	Fix regression in POSIX mode behavior Commit `235a85657` introduced a regression in evaluation of POSIX modes that require group DENY entries in the internal ZFS ACL. An example of such a POSX mode is 007. When write_implies_delete_child is set, then ACE_WRITE_DATA is added to `wanted_dirperms` in prior to calling zfs_zaccess_common(). This occurs is zfs_zaccess_delete(). Unfortunately, when zfs_zaccess_aces_check hits this particular DENY ACE, zfs_groupmember() is checked to determine whether access should be denied, and since zfs_groupmember() always returns B_TRUE on Linux and so this check is failed, resulting ultimately in EPERM being returned. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Andrew Walker <awalker@ixsystems.com> Closes #11760	2021-03-19 22:50:46 -07:00
Palash Gandhi	c23850759f	ZTS: New test for kernel panic induced by redacted send This change adds a new test that covers a bug fix in the binary search in the redacted send resume logic that causes a kernel panic. The bug was fixed in https://github.com/openzfs/zfs/pull/11297. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Co-authored-by: John Kennedy <john.kennedy@delphix.com> Signed-off-by: Palash Gandhi <palash.gandhi@delphix.com> Closes #11764	2021-03-19 22:47:50 -07:00
Ryan Moeller	5638803b6a	ZTS: Add tests for DOS mode attributes Create a new section of tests to run with acltype=off. For now the only test we have is for the DOS mode READONLY attribute on FreeBSD. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #11734	2021-03-16 15:00:14 -07:00
Ryan Moeller	9305ff2edf	ZTS: Fix incorrect use of libtest in user_run by xattr_003_neg You can't use user_run to eval ksh functions defined in libtest unless you include libtest in the user shell. Fix xattr_003_neg by: * include libtest in the user shell * then run get_xattr * assert this fails * use variables for filenames so they don't change in the user's shell * don't log the contents of /etc/passwd * cleanup all byproducts Reviewed-by: John Kennedy <john.kennedy@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #11185	2021-03-12 16:17:30 -08:00
Ryan Moeller	e0b53a5dbb	ZTS: Use ksh and current environment for user_run The current user_run often does not work as expected. Commands are run in a different shell, with a different environment, and all output is discarded. Simplify user_run to retain the current environment, eliminate eval, and feed the command string into ksh. Enhance the logging for user_run so we can see out and err. Reviewed-by: John Kennedy <john.kennedy@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #11185	2021-03-12 16:17:01 -08:00
George Wilson	0936981d86	zpool import cachefile improvements Importing a pool using the cachefile is ideal to reduce the time required to import a pool. However, if the devices associated with a pool in the cachefile have changed, then the import would fail. This can easily be corrected by doing a normal import which would then read the pool configuration from the labels. The goal of this change is make importing using a cachefile more resilient and auto-correcting. This is accomplished by having the cachefile import logic automatically fallback to reading the labels of the devices similar to a normal import. The main difference between the fallback logic and a normal import is that the cachefile import logic will only look at the device directories that were originally used when the cachefile was populated. Additionally, the fallback logic will always import by guid to ensure that only the pools in the cachefile would be imported. External-issue: DLPX-71980 Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: George Wilson <gwilson@delphix.com> Closes #11716	2021-03-12 15:42:27 -08:00
Ryan Moeller	35aa9dc6df	FreeBSD: Fix scope of deadman tunables A few deadman tunables ended up in the wrong sysctl node. Move them to vfs.zfs.deadman.* Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #11715	2021-03-11 19:23:24 -08:00
Antonio Russo	b2eebe3ae7	ZTS events_002: Improve speed and reliability events_002 exercises the ZED, ensuring that it neither misses events, nor reporting events twice. On slow test hardware, some of the timeouts are insufficient to allow the ZED to properly settle. Conversely, on fast hardware these same timeouts are too long, unnecessarily slowing the test run. Instead of using a fixed timeout, wait for the expected final event before returning. Additionally, wait with a timeout for unexpected events to avoid missing them if they show up late. Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Antonio Russo <aerusso@aerusso.net> Closes #11703	2021-03-08 08:42:45 -08:00
Ryan Moeller	b30cd70599	ZTS: Improve cleanup in zpool tests * Restore original kern.corefile value after the test. * Don't leave behind a frozen pool. * Clean up leftover vdev files. * Make zpool_002_pos and zpool_003_pos consistent in their handling of core files while here. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #11694	2021-03-07 09:41:01 -08:00
nssrikanth	bedbc13daa	Cancel TRIM / initialize on FAULTED non-writeable vdevs When a device which is actively trimming or initializing becomes FAULTED, and therefore no longer writable, cancel the active TRIM or initialization. When the device is merely taken offline with `zpool offline` then stop the operation but do not cancel it. When the device is brought back online the operation will be resumed if possible. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Co-authored-by: Brian Behlendorf <behlendorf1@llnl.gov> Co-authored-by: Vipin Kumar Verma <vipin.verma@hpe.com> Signed-off-by: Srikanth N S <srikanth.nagasubbaraoseetharaman@hpe.com> Closes #11588	2021-03-02 10:27:27 -08:00
Brian Behlendorf	3e73ea0c10	ZTS: zpool_trim_start_and_cancel_pos.ksh Several of the TRIM tests were based of the initialize tests and then adapted for TRIM. The zpool_trim_start_and_cancel_pos.ksh test was intended to be one such test but it was overlooked and actually never adapted. Update it accordingly. Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #11649	2021-02-27 17:19:50 -08:00
Cedric Maunoury	b9c07ec71b	send_iterate_snap : doall send without fromsnap The behavior of a NULL fromsnap was inadvertently changed for a doall send when the send/recv logic in libzfs was updated. Restore the previous behavior by correcting send_iterate_snap() to include all the snapshots in the nvlist for this case. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Cedric Maunoury <cedric.maunoury@gmail.com> Closes #11608	2021-02-24 09:48:58 -08:00
Don Brady	03e02e5b56	Checksum errors may not be counted Fix regression seen in issue #11545 where checksum errors where not being counted or showing up in a zpool event. Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Don Brady <don.brady@delphix.com> Closes #11609	2021-02-19 22:33:15 -08:00
Colm	658fb8020f	Add "compatibility" property for zpool feature sets Property to allow sets of features to be specified; for compatibility with specific versions / releases / external systems. Influences the behavior of 'zpool upgrade' and 'zpool create'. Initial man page changes and test cases included. Brief synopsis: zpool create -o compatibility=off\|legacy\|file[,file...] pool vdev... compatibility = off : disable compatibility mode (enable all features) compatibility = legacy : request that no features be enabled compatibility = file[,file...] : read features from specified files. Only features present in all files will be enabled on the resulting pool. Filenames may be absolute, or relative to /etc/zfs/compatibility.d or /usr/share/zfs/compatibility.d (/etc checked first). Only affects zpool create, zpool upgrade and zpool status. ABI changes in libzfs: * New function "zpool_load_compat" to load and parse compat sets. * Add "zpool_compat_status_t" typedef for compatibility parse status. * Add ZPOOL_PROP_COMPATIBILITY to the pool properties enum * Add ZPOOL_STATUS_COMPATIBILITY_ERR to the pool status enum An initial set of base compatibility sets are included in cmd/zpool/compatibility.d, and the Makefile for cmd/zpool is modified to install these in $pkgdatadir/compatibility.d and to create symbolic links to a reasonable set of aliases. Reviewed-by: ericloewe Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Richard Laager <rlaager@wiktel.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Colm Buckley <colm@tuatha.org> Closes #11468	2021-02-17 21:30:45 -08:00
José Luis Salvador Rufo	aef1830f93	Support uClibc for the tests compilations There are two issues that don't allow ZFS to be compiled using uClibc. `backtrace()`, and `program_invocation_short_name` as a `const`. This patch adds uClibc to the conditionals in the same way there are already for Glibc for `backtrace()`; and removes the external param `program_invocation_short_name` because its only used here for the whole project. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: José Luis Salvador Rufo <salvador.joseluis@gmail.com> Closes #11600	2021-02-16 21:51:46 -08:00
George Melikov	9f8c7e6a76	ZTS: add userspace_send_encrypted.ksh to Makefile All tests need to be included in the Makefiles. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: George Melikov <mail@gmelikov.ru> Closes #11541	2021-01-28 13:39:38 -08:00
Allan Jude	393e69241e	Add zdb -r <dataset> <object-id \| file> <output> While you can use zdb -R poolname vdev:offset:[<lsize>/]<psize>[:flags] to extract individual DVAs from a vdev, it would be handy for be able copy an entire file out of the pool. Given a file or object number, add support to copy the contents to a file. Useful for debugging and recovery. Reviewed-by: Jorgen Lundman <lundman@lundman.net> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Allan Jude <allan@klarasystems.com> Closes #11027	2021-01-27 21:36:01 -08:00
George Melikov	b8e6401b79	ZTS: pool_state test check for pool existence in cleanup If there is no scsi_debug module, then this test must be skipped, in this case cleanup routine should be prepared for absent pool. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: George Melikov <mail@gmelikov.ru> Closes #11534	2021-01-27 17:33:30 -08:00
Matthew Ahrens	62d4287f27	RAIDZ2/3 fails to heal silently corrupted parity w/2+ bad disks When scrubbing, (non-sequential) resilvering, or correcting a checksum error using RAIDZ parity, ZFS should heal any incorrect RAIDZ parity by overwriting it. For example, if P disks are silently corrupted (P being the number of failures tolerated; e.g. RAIDZ2 has P=2), `zpool scrub` should detect and heal all the bad state on these disks, including parity. This way if there is a subsequent failure we are fully protected. With RAIDZ2 or RAIDZ3, a block can have silent damage to a parity sector, and also damage (silent or known) to a data sector. In this case the parity should be healed but it is not. The problem can be noticed by scrubbing the pool twice. Assuming there was no damage concurrent with the scrubs, the first scrub should fix all silent damage, and the second scrub should be "clean" (`zpool status` should not report checksum errors on any disks). If the bug is encountered, then the second scrub will repair the silently-damaged parity that the first scrub failed to repair, and these checksum errors will be reported after the second scrub. Since the first scrub repaired all the damaged data, the bug can not be encountered during the second scrub, so subsequent scrubs (more than two) are not necessary. The root cause of the problem is some code that was inadvertently added to `raidz_parity_verify()` by the DRAID changes. The incorrect code causes the parity healing to be aborted if there is damaged data (`rc_error != 0`) or the data disk is not present (`!rc_tried`). These checks are not necessary, because we only call `raidz_parity_verify()` if we have the correct data (which may have been reconstructed using parity, and which was verified by the checksum). This commit fixes the problem by removing the incorrect checks in `raidz_parity_verify()`. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Matthew Ahrens <mahrens@delphix.com> Closes #11489 Closes #11510	2021-01-26 16:05:05 -08:00
Will Andrews	d7265b3309	ZTS: zpool_export test improvements - refactor cleanup routines into common kshlib zpool_export_cleanup func - don't require physical disks to test, just use files Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Will Andrews <will@firepipe.net> Closes #11518	2021-01-26 13:14:04 -08:00
Will Andrews	35ac0ed1fd	ZTS: improve output clarity of check_prop_source Instead of just failing, indicate the expected and actual value and source as a NOTE. Tests using this failed in an earlier version of the changeset and this information helped find the cause. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Will Andrews <will@firepipe.net> Closes #11517	2021-01-25 14:39:58 -08:00
Will Andrews	a57acbb627	ZTS: remove duplicate check_prop_source from zfs_receive There is an identical definition in zfs_set_common.kshlib already. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Will Andrews <will@firepipe.net> Closes #11516	2021-01-25 14:38:19 -08:00
Ryan Moeller	a4acc47e4b	ZTS: Use swapctl to list swap devices on FreeBSD Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #11503	2021-01-24 15:56:59 -08:00
Matthew Macy	716408f560	Add basic io_uring test Provide a basic test coverage for io_uring I/O. Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Matt Macy <mmacy@FreeBSD.org> Closes #11497	2021-01-23 15:42:42 -08:00
Brian Atkinson	d0cd9a5cc6	Extending FreeBSD UIO Struct In FreeBSD the struct uio was just a typedef to uio_t. In order to extend this struct, outside of the definition for the struct uio, the struct uio has been embedded inside of a uio_t struct. Also renamed all the uio_* interfaces to be zfs_uio_* to make it clear this is a ZFS interface. Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Jorgen Lundman <lundman@lundman.net> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Brian Atkinson <batkinson@lanl.gov> Closes #11438	2021-01-20 21:27:30 -08:00
sterlingjensen	03f036cbcc	Re-apply path sanitizer, as mount(8) still mangles it Prior to util-linux 2.36.2, if a file or directory in the current working directory was named 'dataset' then mount(8) would prepend the current working directory to the dataset. Eventually, we should be able to drop this workaround. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Sterling Jensen <sterlingjensen@users.noreply.github.com> Closes #11295 Closes #11462	2021-01-19 11:57:31 -08:00
Antonio Russo	f8c4d63a26	ZTS: avoid piping to special devices As described in #11445, the kernel interface kernel_{read,write} no longer act on special devices. In the ZTS, zfs send and receive are tested by piping to these devices, leading to spurious failures (for positive tests) and may mask errors (for negative tests). Until a more permanent mechanism to address this deficiency is developed, clean up the output from the ZTS by avoiding directly piping to or from /dev/null and /dev/zero. For /dev/zero input, simply use a pipe: `cat </dev/zero \|` . However, for /dev/null output, the shell semantics for pipe failures means that zfs send error codes will be masked by the successful `\| cat >/dev/null` command execution. In that case, use a temporary file under $TEST_BASE_DIR for output in favor. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Attila Fülöp <attila@fueloep.org> Signed-off-by: Antonio Russo <aerusso@aerusso.net> Closes #11478	2021-01-19 11:53:35 -08:00
Antonio Russo	8752f7e320	ZTS: avoid race to unmount in zfs_rollback_001 The zfs_rollback_001 test modifies files in a temporary, test dataset repeatedly. Before each iteration, any preexisting dataset is removed, after unmounted with umount -f, if necessary. Add a short delay after the forced unmount, avoiding a race that can prevent zfs destroy from succeeding, leading to a test failure. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Antonio Russo <aerusso@aerusso.net> Closes #11451	2021-01-12 17:20:02 -08:00
Toomas Soome	4ba8c6b584	zfs_mount_all_mountpoints: cleanup_all should leave pool root mounted if pool root is not mounted, then zpool umount in next test will leave dataset mountpoint directory around and next zfs mount -a will fail with error: cannot mount '/testpool': directory is not empty Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Toomas Soome <tsoome@me.com> Closes #11417	2021-01-02 16:54:53 -08:00
Toomas Soome	40ab927ae8	implicit conversion from 'boolean_t' to 'ds_hold_flags_t' Build error on illumos with gcc 10 did reveal: In function 'dmu_objset_refresh_ownership': ../../common/fs/zfs/dmu_objset.c:857:25: error: implicit conversion from 'boolean_t' to 'ds_hold_flags_t' {aka 'enum ds_hold_flags'} [-Werror=enum-conversion] 857 \| dsl_dataset_disown(ds, decrypt, tag); \| ^~~~~~~ cc1: all warnings being treated as errors libzfs_input_check.c: In function 'zfs_ioc_input_tests': libzfs_input_check.c:754:28: error: implicit conversion from 'enum dmu_objset_type' to 'enum lzc_dataset_type' [-Werror=enum-conversion] 754 \| err = lzc_create(dataset, DMU_OST_ZFS, NULL, NULL, 0); \| ^~~~~~~~~~~ cc1: all warnings being treated as errors The same issue is present in openzfs, and also the same issue about ds_hold_flags_t, which currently defines exactly one valid value. Reviewed-by: Igor Kozhukhov <igor@dilos.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Toomas Soome <tsoome@me.com> Closes #11406	2020-12-27 16:31:02 -08:00
Brian Behlendorf	2844ad60d4	ZTS: Simplify zpool_initialize_verify_initialized Consider the test to be a success as long as the initializing pattern is found at least once per metaslab. This indicates that at least part of the free space was initialized. Ideally we'd check that the pattern was written to all free space but that's much trickier so this check is a reasonable compromise. Using a here-string to feed the loop in this test causes an empty string to still trigger the loop so we miss the `spacemaps=0` case. Pipe into the loop instead. While here, we can use `zpool wait -t initialize $TESTPOOL` to wait for the pool to initialize. Co-authored-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #11365	2020-12-18 08:42:59 -08:00
Matthew Ahrens	71e4ce0e52	special device removal space accounting fixes The space in special devices is not included in spa_dspace (or dsl_pool_adjustedsize(), or the zfs `available` property). Therefore there is always at least as much free space in the normal class, as there is allocated in the special class(es). And therefore, there is always enough free space to remove a special device. However, the checks for free space when removing special devices did not take this into account. This commit corrects that. Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Don Brady <don.brady@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Matthew Ahrens <mahrens@delphix.com> Closes #11329	2020-12-17 12:11:56 -08:00
sterlingjensen	fb188409f1	Use the correct return type for getopt Use the correct return type for getopt otherwise clang complains about tautological-constant-out-of-range-compare. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Sterling Jensen <sterlingjensen@users.noreply.github.com> Closes #11359	2020-12-17 10:19:30 -08:00
George Amanakis	c76a40bfda	Fix reporting of CKSUM errors in indirect vdevs When removing and subsequently reattaching a vdev, CKSUM errors may occur as vdev_indirect_read_all() reads from all children of a mirror in case of a resilver. Fix this by checking whether a child is missing the data and setting a flag (ic_error) which is then checked in vdev_indirect_repair() and suppresses incrementing the checksum counter. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: George Amanakis <gamanakis@gmail.com> Closes #11277	2020-12-11 12:15:37 -08:00
Attila Fülöp	b9916b4064	ZTS: three small follow up fixes for #11167 Follow up fix for `0cb40fa3`. Remove unused variables, don't source unused libs and add missed cleanup. Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Attila Fülöp <attila@fueloep.org> Closes #11311	2020-12-09 21:27:12 -08:00
sterlingjensen	1e4667af32	Drop path prefix workaround Canonicalization, the source of the trouble, was disabled in `9000a9f`. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Sterling Jensen <sterlingjensen@users.noreply.github.com> Closes #11295	2020-12-09 21:24:26 -08:00
George Melikov	8e8fdce682	ZTS: zpool_trim tests throttle trim process Otherwise trim may finish before progress checks. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: George Melikov <mail@gmelikov.ru> Closes #11296	2020-12-07 10:06:10 -08:00
Brian Behlendorf	81638c999d	ZTS: Update zfs_share_concurrent_shares.ksh Occasionally an out of memory error is hit by this test case when mounting the filesystems. Try and reduce the likelihood of this occurring by reducing the thread count from 100 to 50. It also has the advantage of slightly speeding up the test. cannot mount 'testpool/testfs3/79': Cannot allocate memory filesystem successfully created, but not mounted Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #11283	2020-12-06 09:47:33 -08:00
George Amanakis	d1d47691c2	Fix raw sends on encrypted datasets when copying back snapshots When sending raw encrypted datasets the user space accounting is present when it's not expected to be. This leads to the subsequent mount failure due a checksum error when verifying the local mac. Fix this by clearing the OBJSET_FLAG_USERACCOUNTING_COMPLETE and reset the local mac. This allows the user accounting to be correctly updated on first mount using the normal upgrade process. Reviewed-By: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-By: Tom Caputi <caputit1@tcnj.edu> Signed-off-by: George Amanakis <gamanakis@gmail.com> Closes #10523 Closes #11221	2020-12-04 14:34:29 -08:00
Attila Fülöp	0cb40fa389	zpool: Dryrun fails to list some devices `zpool create -n` fails to list cache and spare vdevs. `zpool add -n` fails to list spare devices. `zpool split -n` fails to list `special` and `dedup` labels. `zpool add -n` and `zpool split -n` shouldn't list hole devices. Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Attila Fülöp <attila@fueloep.org> Closes #11122 Closes #11167	2020-12-04 14:04:39 -08:00
Ryan Moeller	4b6e2a5a33	Add -u option to 'zfs create' Add -u option to 'zfs create' that prevents file system from being automatically mounted. This is similar to the 'zfs receive -u'. Authored by: pjd <pjd@FreeBSD.org> FreeBSD-commit: freebsd/freebsd@35c58230e2 Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Allan Jude <allan@klarasystems.com> Ported-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #11254	2020-12-04 14:01:42 -08:00
loli10K	4072f465bc	Fix 'zfs userspace' for received datasets in encrypted root For encrypted receives, where user accounting is initially disabled on creation, both 'zfs userspace' and 'zfs groupspace' fails with EOPNOTSUPP: this is because dmu_objset_id_quota_upgrade_cb() forgets to set OBJSET_FLAG_USERACCOUNTING_COMPLETE on the objset flags after a successful dmu_objset_space_upgrade(). Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Co-authored-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: loli10K <ezomori.nozomu@gmail.com> Closes #9501 Closes #9596	2020-11-16 09:10:29 -08:00
Brian Behlendorf	b2255edcc0	Distributed Spare (dRAID) Feature This patch adds a new top-level vdev type called dRAID, which stands for Distributed parity RAID. This pool configuration allows all dRAID vdevs to participate when rebuilding to a distributed hot spare device. This can substantially reduce the total time required to restore full parity to pool with a failed device. A dRAID pool can be created using the new top-level `draid` type. Like `raidz`, the desired redundancy is specified after the type: `draid[1,2,3]`. No additional information is required to create the pool and reasonable default values will be chosen based on the number of child vdevs in the dRAID vdev. zpool create <pool> draid[1,2,3] <vdevs...> Unlike raidz, additional optional dRAID configuration values can be provided as part of the draid type as colon separated values. This allows administrators to fully specify a layout for either performance or capacity reasons. The supported options include: zpool create <pool> \ draid[<parity>][:<data>d][:<children>c][:<spares>s] \ <vdevs...> - draid[parity] - Parity level (default 1) - draid[:<data>d] - Data devices per group (default 8) - draid[:<children>c] - Expected number of child vdevs - draid[:<spares>s] - Distributed hot spares (default 0) Abbreviated example `zpool status` output for a 68 disk dRAID pool with two distributed spares using special allocation classes. ``` pool: tank state: ONLINE config: NAME STATE READ WRITE CKSUM slag7 ONLINE 0 0 0 draid2:8d:68c:2s-0 ONLINE 0 0 0 L0 ONLINE 0 0 0 L1 ONLINE 0 0 0 ... U25 ONLINE 0 0 0 U26 ONLINE 0 0 0 spare-53 ONLINE 0 0 0 U27 ONLINE 0 0 0 draid2-0-0 ONLINE 0 0 0 U28 ONLINE 0 0 0 U29 ONLINE 0 0 0 ... U42 ONLINE 0 0 0 U43 ONLINE 0 0 0 special mirror-1 ONLINE 0 0 0 L5 ONLINE 0 0 0 U5 ONLINE 0 0 0 mirror-2 ONLINE 0 0 0 L6 ONLINE 0 0 0 U6 ONLINE 0 0 0 spares draid2-0-0 INUSE currently in use draid2-0-1 AVAIL ``` When adding test coverage for the new dRAID vdev type the following options were added to the ztest command. These options are leverages by zloop.sh to test a wide range of dRAID configurations. -K draid\|raidz\|random - kind of RAID to test -D <value> - dRAID data drives per group -S <value> - dRAID distributed hot spares -R <value> - RAID parity (raidz or dRAID) The zpool_create, zpool_import, redundancy, replacement and fault test groups have all been updated provide test coverage for the dRAID feature. Co-authored-by: Isaac Huang <he.huang@intel.com> Co-authored-by: Mark Maybee <mmaybee@cray.com> Co-authored-by: Don Brady <don.brady@delphix.com> Co-authored-by: Matthew Ahrens <mahrens@delphix.com> Co-authored-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Mark Maybee <mmaybee@cray.com> Reviewed-by: Matt Ahrens <matt@delphix.com> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #10102	2020-11-13 13:51:51 -08:00
Brian Behlendorf	c08d442e45	Linux: Fix mount/unmount when dataset name has a space The custom zpl_show_devname() helper should translate spaces in to the octal escape sequence \040. The getmntent(2) function is aware of this convention and properly translates the escape character back to a space when reading the fsname. Without this change the `zfs mount` and `zfs unmount` commands incorrectly detect when a dataset with a name containing spaces is mounted. Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #11182 Closes #11187	2020-11-11 17:14:24 -08:00
Tony Perkins	9bd14b8724	Start snapdir_iterate traversals to begin wtih the value of zero. The microzap hash can sometimes be zero for single digit snapnames. The zap cursor can then have a serialized value of two (for . and ..), and skip the first entry in the avl tree for the .zfs/snapshot directory listing, and therefore does not return all snapshots. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Cedric Berger <cedric@precidata.com> Signed-off-by: Tony Perkins <tperkins@datto.com> Closes #11039	2020-11-11 17:06:16 -08:00
sterlingjensen	a4ae4998cb	Fix memleak in cmd/mount_zfs.c Convert dynamic allocation to static buffer, simplify parse_dataset function return path. Add tests specific to the mount helper. Reviewed-by: Mateusz Guzik <mjguzik@gmail.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Sterling Jensen <sterlingjensen@users.noreply.github.com> Closes #11098	2020-11-10 15:50:44 -08:00
Ryan Moeller	52e585a822	ZTS: Add L1 corruption test Add a new test case which corrupts all level 1 block in a file. Then verifies that corruption is detected and repaired. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #11141	2020-11-05 17:16:25 -08:00
Ryan Moeller	cce66dfa8e	ZTS: Output all block copies in list_file_blocks The second part of list_file_blocks transforms the object description output by zdb -ddddd $ds $objnum into a stream of lines of the form "level path offset length" for the indirect blocks in the given file. The current code only works for the first copy of L0 blocks. L1 and L2 indirect blocks have more than one copy on disk. Add one more -d to the zdb command so we get all block copies and rewrite the transformation to match more than L0 and output all DVAs. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #11141	2020-11-05 17:16:21 -08:00
Ryan Moeller	94deb47872	ZTS: Fix list_file_blocks for mirror vdevs, level > 0 The first part of list_file_blocks transforms the pool configuration output by zdb -C $pool into shell code to set up a shell variable, VDEV_MAP, that maps from vdev id to the underlying vdev path. This variable is a simple indexed array. However, the vdev id in a DVA is only the id of the top level vdev. When the pool is mirrored, the top level vdev is a mirror and its children are the mirrored devices. So, what we need is to map from the top level vdev id to a list of the underlying vdev paths. ist_file_blocks does not need to work for raidz vdevs, so we can disregard that case. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #11141	2020-11-05 17:16:16 -08:00
Brian Behlendorf	2d7843401a	ZTS: zdb_block_size_histogram increase variance The expected variance for this test case was originally set at 10% based on local testing. Additional testing via the CI has show it can be as large as 11%. Increase the expected maximum to 12% to prevent this test from incorrectly failing. Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #11148	2020-11-03 09:20:34 -08:00
Brian Behlendorf	123d2803bf	ZTS: Wait on all events in events_001_pos.ksh The events_001_pos.ksh test case can fail because it's possible, and correct, for the config_sync event to be posted after the last "expected" event. To accommodate this the run_and_verify() function has been updated to wait for all non-history events, not just the last event. This does not increase the run time of the test as long as all the events do get generated. Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #11147	2020-11-03 09:19:45 -08:00
Ryan Moeller	76d04993a6	Update references to nonexistent man pages in code Refer to the correct section or alternative for FreeBSD and Linux. Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #11132	2020-10-30 08:55:59 -07:00
Tony Hutter	f829227f49	ZTS: Fix xattr_004_pos failure, don't use tmpfs Previously, xattr_004_pos would create files with xattrs on both tmpfs and ext2, and then copy them to zfs to verify that their xattrs were preserved. However tmpfs doesn't support xattrs. This was never noticed until Fedora 33. In Fedora 32 and older, /tmp was on the root partition (like ext4), whereas on Fedora 33 /tmp is actually tmpfs. That caused this test to fail on Fedora 33. This fix updates the test to only create the file on ext2, not tmpfs. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tony Hutter <hutter2@llnl.gov> Closes #11133	2020-10-30 08:47:42 -07:00
Adam D. Moss	666aa69f32	Non-l2arc pool reads shouldn't be l2arc misses The current l2_misses accounting behavior treats all reads to pools without a configured l2arc as an l2arc miss, IFF there is at least one other pool on the system which does have an l2arc configured. This makes it extremely hard to tune for an improved l2arc hit/miss ratio because this ratio will be modulated by reads from pools which do not (and should not) have l2arc devices; its upper limit will depend on the ratio of reads from l2arc'd pools and non-l2arc'd pools. This PR prevents ARC reads affecting l2arc stats (n.b. l2_misses is the only relevant one) where the target spa doesn't have an l2arc. Includes new test - l2arc_l2miss_pos.ksh Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: George Amanakis <gamanakis@gmail.com> Signed-off-by: Adam Moss <c@yotes.com> Closes #10921	2020-10-20 11:39:52 -07:00
Don Brady	13d65987a9	zed syslog entries drop important info ZED will log zevents summaries to the syslog, however the log entries tend to drop event details that can be useful for diagnosis. This is especially true for ereport events, like io, checksum, and delay. Update the all-syslog.sh script to log additional event information. Add an optional config option, ZED_SYSLOG_DISPLAY_GUIDS, to zed.rc for choosing GUIDs over names for pool and vdev. Change the default ZED_SYSLOG_SUBCLASS_EXCLUDE to exclude history_event events. These events tend to be frequent, convey no meaningful info, and are already logged in the zpool history. Reviewed-by: John Kennedy <john.kennedy@delphix.com> Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Don Brady <don.brady@delphix.com> Closes #10967	2020-10-19 11:01:00 -07:00
Christian Schwarz	15a4ca4620	Fix crash caused by invalid snapshot names in redactnvl This is a follow up fix for commit `0fdd6106bb`. The VERIFY is only true when we haven't hit an error code path. See added test case for a reproducer. Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Christian Schwarz <me@cschwarz.com> Closes #11048	2020-10-14 14:04:19 -07:00
Ryan Moeller	485b50bb9e	Cross-platform acltype The acltype property is currently hidden on FreeBSD and does not reflect the NFSv4 style ZFS ACLs used on the platform. This makes it difficult to observe that a pool imported from FreeBSD on Linux has a different type of ACL that is being ignored, and vice versa. Add an nfsv4 acltype and expose the property on FreeBSD. Make the default acltype nfsv4 on FreeBSD. Setting acltype to an unhanded style is treated the same as setting it to off. The ACLs will not be removed, but they will be ignored. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #10520	2020-10-13 21:25:48 -07:00
Richard Elling	e9527d44e6	Add zpool_influxdb command A zpool_influxdb command is introduced to ease the collection of zpool statistics into the InfluxDB time-series database. Examples are given on how to integrate with the telegraf statistics aggregator, a companion to influxdb. Finally, a grafana dashboard template is included to show how pool latency distributions can be visualized in a ZFS + telegraf + influxdb + grafana environment. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Richard Elling <Richard.Elling@RichardElling.com> Closes #10786	2020-10-09 09:29:21 -07:00
Ryan Moeller	b7ab7ae241	Linux: Initialize zp in zfs_setattr_dir The value of zp is used without having been initialized under some conditions. Initialize the pointer to NULL. Add a regression test case using chown in acl/posix. However, this is not enough because the setup sets xattr=sa, which means zfs_setattr_dir will not be called. Create a second group of acl tests in acl/posix-sa duplicating the acl/posix tests with symlinks, and remove xattr=sa from the original acl/posix tests. This provides more coverage for the default xattr=on code. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #10043 Closes #11025	2020-10-09 09:27:14 -07:00
Brian Behlendorf	d0249a4bd0	Replace ZFS on Linux references with OpenZFS This change updates the documentation to refer to the project as OpenZFS instead ZFS on Linux. Web links have been updated to refer to https://github.com/openzfs/zfs. The extraneous zfsonlinux.org web links in the ZED and SPL sources have been dropped. Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Richard Laager <rlaager@wiktel.com> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #11007	2020-10-08 20:10:13 -07:00
Ryan Moeller	84cfe5d4f8	ZTS: Fix path to /dev/null in nopwrite_recsize Don't direct stdout and stderr of dd to $TEST_BASE_DIR/null, direct it to /dev/null. Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #11026	2020-10-08 16:39:23 -07:00
Ryan Moeller	73989f4b9e	Make dbufstat work on FreeBSD With procfs_list kstats implemented for FreeBSD, dbufs are now exposed as kstat.zfs.misc.dbufs. On FreeBSD, dbufstats can use the sysctl instead of procfs when no input file has been given. Enable the dbufstats tests on FreeBSD. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org> Closes #11008	2020-10-08 09:40:23 -07:00
George Amanakis	a76e4e6761	Make L2ARC tests more robust Instead of relying on arbitrary timers after pool export/import or cache device off/online rely on arcstats. This makes the L2ARC tests more robust. Also cleanup some functions related to persistent L2ARC. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Adam Moss <c@yotes.com> Signed-off-by: George Amanakis <gamanakis@gmail.com> Closes #10983	2020-10-05 15:29:05 -07:00
John Poduska	5b525165e9	Mismatched nvlist names in zfs_keys_send_space This causes "zfs send -vt ..." to fail with: cannot resume send: Unknown error 1030 It turns out that some of the name/value pairs in the verification list for zfs_ioc_send_space(), zfs_keys_send_space, had the wrong name, so the ioctl got kicked out in zfs_check_input_nvpairs(). Update the names accordingly. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: John Poduska <jpoduska@datto.com> Closes #10978	2020-10-02 17:40:46 -07:00
Ryan Moeller	c0bd2e0fe2	Drop references when skipping dmu_send due to EXDEV When an invalid incremental send is requested where the "to" ds is before the "from" ds, make sure to drop the reference to the pool and the dataset before returning the error. Add an assert on FreeBSD to make sure we don't hold any locks after returning from an ioctl. Add some test coverage. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #10919	2020-09-30 13:19:49 -07:00
George Wilson	c494aa7f57	vdev_ashift should only be set once == Motivation and Context The new vdev ashift optimization prevents the removal of devices when a zfs configuration is comprised of disks which have different logical and physical block sizes. This is caused because we set 'spa_min_ashift' in vdev_open and then later call 'vdev_ashift_optimize'. This would result in an inconsistency between spa's ashift calculations and that of the top-level vdev. In addition, the optimization logical ignores the overridden ashift value that would be provided by '-o ashift=<val>'. == Description This change reworks the vdev ashift optimization so that it's only set the first time the device is configured. It still allows the physical and logical ahsift values to be set every time the device is opened but those values are only consulted on first open. Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Cedric Berger <cedric@precidata.com> Signed-off-by: George Wilson <gwilson@delphix.com> External-Issue: DLPX-71831 Closes #10932	2020-09-18 12:13:47 -07:00
Ryan Moeller	7ead2be3d2	Rename acltype=posixacl to acltype=posix Prefer acltype=off\|posix, retaining the old names as aliases. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #10918	2020-09-16 12:26:06 -07:00
Toomas Soome	1db9e6e4e4	zfs label bootenv should store data as nvlist nvlist does allow us to support different data types and systems. To encapsulate user data to/from nvlist, the libzfsbootenv library is provided. Reviewed-by: Arvind Sankar <nivedita@alum.mit.edu> Reviewed-by: Allan Jude <allan@klarasystems.com> Reviewed-by: Paul Dagnelie <pcd@delphix.com> Reviewed-by: Igor Kozhukhov <igor@dilos.org> Signed-off-by: Toomas Soome <tsoome@me.com> Closes #10774	2020-09-15 15:42:27 -07:00
George Amanakis	085321621e	Add L2ARC arcstats for MFU/MRU buffers and buffer content type Currently the ARC state (MFU/MRU) of cached L2ARC buffer and their content type is unknown. Knowing this information may prove beneficial in adjusting the L2ARC caching policy. This commit adds L2ARC arcstats that display the aligned size (in bytes) of L2ARC buffers according to their content type (data/metadata) and according to their ARC state (MRU/MFU or prefetch). It also expands the existing evict_l2_eligible arcstat to differentiate between MFU and MRU buffers. L2ARC caches buffers from the MRU and MFU lists of ARC. Upon caching a buffer, its ARC state (MRU/MFU) is stored in the L2 header (b_arcs_state). The l2_m{f,r}u_asize arcstats reflect the aligned size (in bytes) of L2ARC buffers according to their ARC state (based on b_arcs_state). We also account for the case where an L2ARC and ARC cached MRU or MRU_ghost buffer transitions to MFU. The l2_prefetch_asize reflects the alinged size (in bytes) of L2ARC buffers that were cached while they had the prefetch flag set in ARC. This is dynamically updated as the prefetch flag of L2ARC buffers changes. When buffers are evicted from ARC, if they are determined to be L2ARC eligible then their logical size is recorded in evict_l2_eligible_m{r,f}u arcstats according to their ARC state upon eviction. Persistent L2ARC: When committing an L2ARC buffer to a log block (L2ARC metadata) its b_arcs_state and prefetch flag is also stored. If the buffer changes its arcstate or prefetch flag this is reflected in the above arcstats. However, the L2ARC metadata cannot currently be updated to reflect this change. Example: L2ARC caches an MRU buffer. L2ARC metadata and arcstats count this as an MRU buffer. The buffer transitions to MFU. The arcstats are updated to reflect this. Upon pool re-import or on/offlining the L2ARC device the arcstats are cleared and the buffer will now be counted as an MRU buffer, as the L2ARC metadata were not updated. Bug fix: - If l2arc_noprefetch is set, arc_read_done clears the L2CACHE flag of an ARC buffer. However, prefetches may be issued in a way that arc_read_done() is bypassed. Instead, move the related code in l2arc_write_eligible() to account for those cases too. Also add a test and update manpages for l2arc_mfuonly module parameter, and update the manpages and code comments for l2arc_noprefetch. Move persist_l2arc tests to l2arc. Reviewed-by: Ryan Moeller <freqlabs@FreeBSD.org> Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: George Amanakis <gamanakis@gmail.com> Closes #10743	2020-09-14 10:10:44 -07:00
Don Brady	4f07282786	Avoid posting duplicate zpool events Duplicate io and checksum ereport events can misrepresent that things are worse than they seem. Ideally the zpool events and the corresponding vdev stat error counts in a zpool status should be for unique errors -- not the same error being counted over and over. This can be demonstrated in a simple example. With a single bad block in a datafile and just 5 reads of the file we end up with a degraded vdev, even though there is only one unique error in the pool. The proposed solution to the above issue, is to eliminate duplicates when posting events and when updating vdev error stats. We now save recent error events of interest when posting events so that we can easily check for duplicates when posting an error. Reviewed by: Brad Lewis <brad.lewis@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Don Brady <don.brady@delphix.com> Closes #10861	2020-09-04 10:34:28 -07:00
Ryan Moeller	964791acdc	Make spa_stats.c tunables visible on FreeBSD Use ZFS_MODULE_PARAM for cross-platform tunables in spa_stats.c, and add update tunables.cfg in tests for the newly supported ones. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #10858	2020-09-01 16:19:19 -07:00
Ryan Moeller	7b4e27232d	Add 'zfs rename -u' to rename without remounting Allow to rename file systems without remounting if it is possible. It is possible for file systems with 'mountpoint' property set to 'legacy' or 'none' - we don't have to change mount directory for them. Currently such file systems are unmounted on rename and not even mounted back. This introduces layering violation, as we need to update 'f_mntfromname' field in statfs structure related to mountpoint (for the dataset we are renaming and all its children). In my opinion it is worth it, as it allow to update FreeBSD in even cleaner way - in ZFS-only configuration root file system is ZFS file system with 'mountpoint' property set to 'legacy'. If root dataset is named system/rootfs, we can snapshot it (system/rootfs@upgrade), clone it (system/oldrootfs), update FreeBSD and if it doesn't boot we can boot back from system/oldrootfs and rename it back to system/rootfs while it is mounted as /. Before it was not possible, because unmounting / was not possible. Authored by: Pawel Jakub Dawidek <pjd@FreeBSD.org> Reviewed-by: Allan Jude <allan@klarasystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Ported by: Matt Macy <mmacy@freebsd.org> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #10839	2020-09-01 16:14:16 -07:00
Paul Dagnelie	4aa3b3bd47	Always track temporary fses and snapshots for accounting The root cause of the issue is that we only occasionally do as the comments in the code suggest and actually ignore the %recv dataset when it comes to filesystem limit tracking. Specifically, the only time we ignore it is when initializing the filesystem and snapshot limit values; when creating a new %recv dataset or deleting one, we always update the bookkeeping. This causes a problem if you init the fs count on a filesystem that already has a %recv dataset, since the bookmarking will be decremented but not incremented. This is resolved in this patch by simply always tracking the %recv dataset as a child. Reviewed-by: Matt Ahrens <matt@delphix.com> Reviewed by: Jerry Jelinek <jerry.jelinek@joyent.com> Signed-off-by: Paul Dagnelie <pcd@delphix.com> Closes #10791	2020-08-26 21:38:27 -07:00
Ryan Moeller	b0e75805ee	ZTS: Improve block_device_wait on FreeBSD FreeBSD doesn't have an equivalent to udevadm settle, so we have been resorting to a three second sleep to wait for device changes to take effect. This is far from ideal. We are mainly waiting for volmode=geom zvols to appear in /dev, so as a hack, reading the geom config will have the desired effect of quiescing the geom state. Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #10768	2020-08-24 08:50:15 -07:00
Tony Nguyen	c686c6fe75	ZFS performance tests should clean up NFS mount This change umounts client's NFS mount after each test so we can avoid two sporadic issues: 1) client NFS stale mount and 2) zpool export and zpool destroy failed due to dataset busy Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tony Nguyen <tony.nguyen@delphix.com> Closes #10767	2020-08-23 15:14:22 -07:00
Don Brady	7bba1d404c	'zfs share -a' should clean noauto exports This is a follow on to PR #10688 where `zfs share -a` allows the sharing of canmount=noauto datasets if they are mounted. However, when a dataset with canmount=noauto is not mounted, the command should also purge any existing entries from the exports file. Otherwise, after a reboot, the nfs server attempts to export the underlying mountpath, not the dataset. This can lead to a hard hang for existing client mounts. Instead of just skipping the adding of an export if not mounted and canmount=noauto, have it also remove an existing export of the dataset so that, after a reboot, we don't export an unmounted dataset. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: George Wilson <gwilson@delphix.com> Signed-off-by: Don Brady <don.brady@delphix.com> Closes #10747	2020-08-20 13:12:12 -07:00
Michael Niewöhner	10b3c7f5e4	Add zstd support to zfs This PR adds two new compression types, based on ZStandard: - zstd: A basic ZStandard compression algorithm Available compression. Levels for zstd are zstd-1 through zstd-19, where the compression increases with every level, but speed decreases. - zstd-fast: A faster version of the ZStandard compression algorithm zstd-fast is basically a "negative" level of zstd. The compression decreases with every level, but speed increases. Available compression levels for zstd-fast: - zstd-fast-1 through zstd-fast-10 - zstd-fast-20 through zstd-fast-100 (in increments of 10) - zstd-fast-500 and zstd-fast-1000 For more information check the man page. Implementation details: Rather than treat each level of zstd as a different algorithm (as was done historically with gzip), the block pointer `enum zio_compress` value is simply zstd for all levels, including zstd-fast, since they all use the same decompression function. The compress= property (a 64bit unsigned integer) uses the lower 7 bits to store the compression algorithm (matching the number of bits used in a block pointer, as the 8th bit was borrowed for embedded block pointers). The upper bits are used to store the compression level. It is necessary to be able to determine what compression level was used when later reading a block back, so the concept used in LZ4, where the first 32bits of the on-disk value are the size of the compressed data (since the allocation is rounded up to the nearest ashift), was extended, and we store the version of ZSTD and the level as well as the compressed size. This value is returned when decompressing a block, so that if the block needs to be recompressed (L2ARC, nop-write, etc), that the same parameters will be used to result in the matching checksum. All of the internal ZFS code ( `arc_buf_hdr_t`, `objset_t`, `zio_prop_t`, etc.) uses the separated _compress and _complevel variables. Only the properties ZAP contains the combined/bit-shifted value. The combined value is split when the compression_changed_cb() callback is called, and sets both objset members (os_compress and os_complevel). The userspace tools all use the combined/bit-shifted value. Additional notes: zdb can now also decode the ZSTD compression header (flag -Z) and inspect the size, version and compression level saved in that header. For each record, if it is ZSTD compressed, the parameters of the decoded compression header get printed. ZSTD is included with all current tests and new tests are added as-needed. Per-dataset feature flags now get activated when the property is set. If a compression algorithm requires a feature flag, zfs activates the feature when the property is set, rather than waiting for the first block to be born. This is currently only used by zstd but can be extended as needed. Portions-Sponsored-By: The FreeBSD Foundation Co-authored-by: Allan Jude <allanjude@freebsd.org> Co-authored-by: Brian Behlendorf <behlendorf1@llnl.gov> Co-authored-by: Sebastian Gottschall <s.gottschall@dd-wrt.com> Co-authored-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl> Co-authored-by: Michael Niewöhner <foss@mniewoehner.de> Signed-off-by: Allan Jude <allan@klarasystems.com> Signed-off-by: Allan Jude <allanjude@freebsd.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Sebastian Gottschall <s.gottschall@dd-wrt.com> Signed-off-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl> Signed-off-by: Michael Niewöhner <foss@mniewoehner.de> Closes #6247 Closes #9024 Closes #10277 Closes #10278	2020-08-20 10:30:06 -07:00
Brian Behlendorf	5266a0728a	ZED: Do not offline a missing device if no spare is available Due to commit `d48091d` a removed device is now explicitly offlined by the ZED if no spare is available, rather than the letting ZFS detect it as UNAVAIL. This broke auto-replacing of whole-disk devices, as described in issue #10577. In short, when a new device is reinserted in the same slot, the ZED will try to ONLINE it without letting ZFS recreate the necessary partition table. This change simply avoids setting the device OFFLINE when removed if no spare is available (or if spare_on_remove is false). This change has been left minimal to allow it to be backported to 0.8.x release. The auto_offline_001_pos ZTS test has been updated accordingly. Some follow up work is planned to update the ZED so it transitions the vdev to a REMOVED state. This is a state which has always existed but there is no current interface the ZED can use to accomplish this. Therefore it's being left to a follow up PR. Reviewed-by: Gionatan Danti <g.danti@assyoma.it> Co-authored-by: Gionatan Danti <g.danti@assyoma.it> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #10577 Closes #10730	2020-08-18 22:13:17 -07:00
Allan Jude	fc34dfba8e	Fix L2ARC reads when compressed ARC disabled When reading compressed blocks from the L2ARC, with compressed ARC disabled, arc_hdr_size() returns LSIZE rather than PSIZE, but the actual read is PSIZE. This causes l2arc_read_done() to compare the checksum against the wrong size, resulting in checksum failure. This manifests as an increase in the kstat l2_cksum_bad and the read being retried from the main pool, making the L2ARC ineffective. Add new L2ARC tests with Compressed ARC enabled/disabled Blocks are handled differently depending on the state of the zfs_compressed_arc_enabled tunable. If a block is compressed on-disk, and compressed_arc is enabled: - the block is read from disk - It is NOT decompressed - It is added to the ARC in its compressed form - l2arc_write_buffers() may write it to the L2ARC (as is) - l2arc_read_done() compares the checksum to the BP (compressed) However, if compressed_arc is disabled: - the block is read from disk - It is decompressed - It is added to the ARC (uncompressed) - l2arc_write_buffers() will use l2arc_apply_transforms() to recompress the block, before writing it to the L2ARC - l2arc_read_done() compares the checksum to the BP (compressed) - l2arc_read_done() will use l2arc_untransform() to uncompress it This test writes out a test file to a pool consisting of one disk and one cache device, then randomly reads from it. Since the arc_max in the tests is low, this will feed the L2ARC, and result in reads from the L2ARC. We compare the value of the kstat l2_cksum_bad before and after to determine if any blocks failed to survive the trip through the L2ARC. Sponsored-by: The FreeBSD Foundation Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Allan Jude <allanjude@freebsd.org> Closes #10693	2020-08-13 23:31:20 -07:00
George Wilson	53c9d1d9b5	'zfs share -a' should handle 'canmount=noauto' The 'zfs share -a' currently skips any filesystems which have 'canmount=noauto' set. This behavior is unexpected since the one would expect 'zfs share -a' to share any mounted filesystem that has the 'sharenfs' property already set. This changes the behavior of 'zfs share -a' to allow the sharing of 'canmount=noauto' datasets if they are mounted. Reviewed-by: Matt Ahrens <matt@delphix.com> Reviewed-by: Don Brady <don.brady@delphix.com> Reviewed-by: Prakash Surya <prakash.surya@delphix.com> Signed-off-by: George Wilson <gwilson@delphix.com> External-issue: DLPX-71313 Closes #10688	2020-08-11 13:55:04 -07:00
Ryan Moeller	d4e6e9597d	ZTS: Remove bashisms from zfs-tests.sh Bring zfs-tests.sh in to compliance with the other scripts by converting it /bin/sh for to avoid a dependency on bash. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #10640	2020-08-07 14:10:48 -07:00
Ryan Moeller	b6737193ee	FreeBSD: Fix `zfs jail` and add a test zfs_jail was not using zfs_ioctl so failed to map the IOC number correctly. Use zfs_ioctl to perform the jail ioctl and add a test case for FreeBSD. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #10658	2020-08-01 08:44:54 -07:00
Ryan Moeller	af524bf7ff	ZTS: FreeBSD does have a l2arc.trim_ahead tunable Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #10633	2020-07-31 21:18:32 -07:00
Ryan Moeller	721ed01548	ZTS: Use POSIX-compatible space character class FreeBSD recently integrated a change which causes \s in a regex to throw an error instead of silently being misinterpreted as an s. Change the regex in zpool_colors.ksh to use [[:space:]]. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org> Closes #10651	2020-07-31 18:11:21 -07:00
Matthew Ahrens	948423a3d1	zfs promote does not delete livelist of origin When a clone is promoted, its livelist is no longer accurate, so it is discarded. If the clone's origin is also a clone (i.e. we are promoting a clone of a clone), then the origin's livelist is also no longer accurate, so it should be discarded, but the code doesn't actually do that. Consider a pool with: * Filesystem A * Clone B, a clone of A * Clone C, a clone of B If we promote C, it discards C's livelist. It should discard B's livelist, but that is not happening. The impact is that when B is destroyed, we use the livelist to find the blocks to free, but the livelist is no longer correct so we end up freeing blocks that are still in use by C. The incorrectly-freed blocks can be reallocated causing checksum errors. And when C is destroyed it can double-free the incorrectly-freed blocks. The problem is that we remove the livelist of `origin_ds->ds_dir`, but the origin snapshot has already been moved to the promoted dsl_dir. So this is actually trying to remove the livelist of the promoted dsl_dir, which was already removed. As explained in a comment in the beginning of `dsl_dataset_promote_sync()`, we need to use the saved `odd` for the origin's dsl_dir. Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: George Wilson <gwilson@delphix.com> Reviewed by: Sara Hartse <sara.hartse@delphix.com> Signed-off-by: Matthew Ahrens <mahrens@delphix.com> Closes #10652	2020-07-31 08:59:00 -07:00
Don Brady	a15c6f3310	ZTS: minor improvements to alloc_class_009_pos functional test * Fixed a typo that cause one of the variations to be a no-op * Added additional coverage for adding special vdev after pool create * Added additional coverage for using 4K sector size Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: John Kennedy <john.kennedy@delphix.com> Signed-off-by: Don Brady <don.brady@delphix.com> Closes #10641	2020-07-30 09:11:05 -07:00
Matthew Ahrens	3eabed74c0	Fix lua stack overflow on recursive call to gsub() The `zfs program` subcommand invokes a LUA interpreter to run ZFS "channel programs". This interpreter runs in a constrained environment, with defined memory limits. The LUA stack (used for LUA functions that call each other) is allocated in the kernel's heap, and is limited by the `-m MEMORY-LIMIT` flag and the `zfs_lua_max_memlimit` module parameter. The C stack is used by certain LUA features that are implemented in C. The C stack is limited by `LUAI_MAXCCALLS=20`, which limits call depth. Some LUA C calls use more stack space than others, and `gsub()` uses an unusually large amount. With a programming trick, it can be invoked recursively using the C stack (rather than the LUA stack). This overflows the 16KB Linux kernel stack after about 11 iterations, less than the limit of 20. One solution would be to decrease `LUAI_MAXCCALLS`. This could be made to work, but it has a few drawbacks: 1. The existing test suite does not pass with `LUAI_MAXCCALLS=10`. 2. There may be other LUA functions that use a lot of stack space, and the stack space may change depending on compiler version and options. This commit addresses the problem by adding a new limit on the amount of free space (in bytes) remaining on the C stack while running the LUA interpreter: `LUAI_MINCSTACK=4096`. If there is less than this amount of stack space remaining, a LUA runtime error is generated. Reviewed-by: George Wilson <gwilson@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@ixsystems.com> Reviewed-by: Allan Jude <allanjude@freebsd.org> Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com> Signed-off-by: Matthew Ahrens <mahrens@delphix.com> Closes #10611 Closes #10613	2020-07-27 16:11:47 -07:00
tony-zfs	02fced3067	Add support to decode a resume token Adding a new subcommand to zstream called token. This now allows users to decode a resume token to retrieve the toname field. This can be useful for tools that need this information. The syntax works as follows zstream token <resume_token>. Reviewed-by: Matt Ahrens <matt@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Paul Zuchowski <pzuchowski@datto.com> Signed-off-by: Tony Perkins <tperkins@datto.com> Closes #10558	2020-07-23 17:44:03 -07:00
Ryan Moeller	0c79b070a7	ZTS: Fix devname2devid build on FreeBSD with libudev When libudev is installed on FreeBSD, configure finds it and sets WANT_DEVNAME2DEVID, but it isn't found by the linker because we didn't specify where it is. Use LIBUDEV_LIBS so the location of the library gets added to the linker flags for devname2devid. Also use LIBUDEV_CFLAGS here in case some other platform needs it. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Arvind Sankar <nivedita@alum.mit.edu> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #10590	2020-07-22 10:49:22 -07:00
Brian Behlendorf	e862b7ecfc	Linux 4.10 compat: has_capability() Stock kernels older than 4.10 do not export the has_capability() function which is required by commit `e59a377`. To avoid breaking the build on older kernels revert to the safe legacy behavior and return EACCES when privileges cannot be checked. Reviewed-by: Ryan Moeller <ryan@ixsystems.com> Reviewed-by: Matt Ahrens <matt@delphix.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #10565 Closes #10573	2020-07-19 09:56:21 -07:00
Ryan Moeller	103bc5b957	ZTS: Fix nonportable use of stat in list_file_blocks FreeBSD stat uses -f to specify the format string rather than -c. list_file_blocks in blkdev.shlib uses stat -c %i to get a file's object ID for zdb. We already have a library function to do this portably. Use get_objnum to get the file's object ID. Take log_must off of the call to list_free_blocks in corrupt_blocks_at_level, which had masked the error. It was not good to pipe the output of log_must into the while-loop, anyway. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alek Pinchuk <apinchuk@datto.com> Reviewed-by: John Kennedy <john.kennedy@delphix.com> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #10572	2020-07-15 21:26:39 -07:00
Matthew Ahrens	6774931dfa	Extend zdb to print inconsistencies in livelists and metaslabs Livelists and spacemaps are data structures that are logs of allocations and frees. Livelists entries are block pointers (blkptr_t). Spacemaps entries are ranges of numbers, most often used as to track allocated/freed regions of metaslabs/vdevs. These data structures can become self-inconsistent, for example if a block or range can be "double allocated" (two allocation records without an intervening free) or "double freed" (two free records without an intervening allocation). ZDB (as well as zfs running in the kernel) can detect these inconsistencies when loading livelists and metaslab. However, it generally halts processing when the error is detected. When analyzing an on-disk problem, we often want to know the entire set of inconsistencies, which is not possible with the current behavior. This commit adds a new flag, `zdb -y`, which analyzes the livelist and metaslab data structures and displays all of their inconsistencies. Note that this is different from the leak detection performed by `zdb -b`, which checks for inconsistencies between the spacemaps and the tree of block pointers, but assumes the spacemaps are self-consistent. The specific checks added are: Verify livelists by iterating through each sublivelists and: - report leftover FREEs - report double ALLOCs and double FREEs - record leftover ALLOCs together with their TXG [see Cross Check] Verify spacemaps by iterating over each metaslab and: - iterate over spacemap and then the metaslab's entries in the spacemap log, then report any double FREEs and double ALLOCs Verify that livelists are consistenet with spacemaps. The space referenced by livelists (after using the FREE's to cancel out corresponding ALLOCs) should be allocated, according to the spacemaps. Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Co-authored-by: Sara Hartse <sara.hartse@delphix.com> Signed-off-by: Matthew Ahrens <mahrens@delphix.com> External-issue: DLPX-66031 Closes #10515	2020-07-14 17:51:05 -07:00
Arvind Sankar	38e2e9ce83	Centralize variable substitution A bunch of places need to edit files to incorporate the configured paths i.e. bindir, sbindir etc. Move this logic into a common file. Create arc_summary by copying arc_summary[23] as appropriate at build time instead of install time. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu> Closes #10559	2020-07-14 17:33:44 -07:00
George Wilson	c15d36c674	Remove dependency on sharetab file and refactor sharing logic == Motivation and Context The current implementation of 'sharenfs' and 'sharesmb' relies on the use of the sharetab file. The use of this file is os-specific and not required by linux or freebsd. Currently the code must maintain updates to this file which adds complexity and presents a significant performance impact when sharing many datasets. In addition, concurrently running 'zfs sharenfs' command results in missing entries in the sharetab file leading to unexpected failures. == Description This change removes the sharetab logic from the linux and freebsd implementation of 'sharenfs' and 'sharesmb'. It still preserves an os-specific library which contains the logic required for sharing NFS or SMB. The following entry points exist in the vastly simplified libshare library: - sa_enable_share -- shares a dataset but may not commit the change - sa_disable_share -- unshares a dataset but may not commit the change - sa_is_shared -- determine if a dataset is shared - sa_commit_share -- notify NFS/SMB subsystem to commit the shares - sa_validate_shareopts -- determine if sharing options are valid The sa_commit_share entry point is provided as a performance enhancement and is not required. The sa_enable_share/sa_disable_share may commit the share as part of the implementation. Libshare provides a framework for both NFS and SMB but some operating systems may not fully support these protocols or all features of the protocol. NFS Operation: For linux, libshare updates /etc/exports.d/zfs.exports to add and remove shares and then commits the changes by invoking 'exportfs -r'. This file, is automatically read by the kernel NFS implementation which makes for better integration with the NFS systemd service. For FreeBSD, libshare updates /etc/zfs/exports to add and remove shares and then commits the changes by sending a SIGHUP to mountd. SMB Operation: For linux, libshare adds and removes files in /var/lib/samba/usershares by calling the 'net' command directly. There is no need to commit the changes. FreeBSD does not support SMB. == Performance Results To test sharing performance we created a pool with an increasing number of datasets and invoked various zfs actions that would enable and disable sharing. The performance testing was limited to NFS sharing. The following tests were performed on an 8 vCPU system with 128GB and a pool comprised of 4 50GB SSDs: Scale testing: - Share all filesystems in parallel -- zfs sharenfs=on <dataset> & - Unshare all filesystems in parallel -- zfs sharenfs=off <dataset> & Functional testing: - share each filesystem serially -- zfs share -a - unshare each filesystem serially -- zfs unshare -a - reset sharenfs property and unshare -- zfs inherit -r sharenfs <pool> For 'zfs sharenfs=on' scale testing we saw an average reduction in time of 89.43% and for 'zfs sharenfs=off' we saw an average reduction in time of 83.36%. Functional testing also shows a huge improvement: - zfs share -- 97.97% reduction in time - zfs unshare -- 96.47% reduction in time - zfs inhert -r sharenfs -- 99.01% reduction in time Reviewed-by: Matt Ahrens <matt@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@ixsystems.com> Reviewed-by: Bryant G. Ly <bryangly@gmail.com> Signed-off-by: George Wilson <gwilson@delphix.com> External-Issue: DLPX-68690 Closes #1603 Closes #7692 Closes #7943 Closes #10300	2020-07-13 09:19:18 -07:00
Serapheim Dimitropoulos	6f1db5f37e	Unconditionally enable debugging for libzpool We already enable -DDEBUG unconditionally (meaning regardless of this is a debug build or a performance build) for zdb and ztest as they are mostly used for development and debugging. This patch enables -DDEBUG for libzpool extending the debugging checks for zdb, ztest, and a couple of other test utilities. In addition to passing -DDEBUG we also enable -DZFS_DEBUG so all assertion checks work s expected. We do so not only in libzpool but in every utility that links to it, even if the utility doesn't directly use any functionality wrapped in ZFS_DEBUG macro definitions. The reason is that these utilities may still include headers that contain structs that have more fields when ZFS_DEBUG is defined. This can be a problem as enabling that flag for libzpool but not for zdb can lead into random problems (e.g. segmentation faults) as zdb may be have an incorrect view of a struct passed to it by libzpool. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Signed-off-by: Serapheim Dimitropoulos <serapheim@delphix.com> Closes #10549	2020-07-10 15:30:31 -07:00
Arvind Sankar	3e597dee11	Use abs_top_builddir when referencing libraries libtool stores absolute paths in the dependency_libs component of the .la files. If the Makefile for a dependent library refers to the libraries by relative path, some libraries end up duplicated on the link command line. As an example, libzfs specifies libzfs_core, libnvpair and libuutil as dependencies to be linked in. The .la file for libzfs_core also specifies libnvpair, but using an absolute path, with the result that libnvpair is present twice in the linker command line for producing libzfs. While the only thing this causes is to slightly slow down the linking, we can avoid it by using absolute paths everywhere, including for convenience libraries just for consistency. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu> Closes #10538	2020-07-10 14:26:32 -07:00
Arvind Sankar	1537105a8c	Add config.rpath for AM_GNU_GETTEXT Commit `e8864b1b28` ("config: libintl/libiconv for gettext() detection") added an empty config.rpath with a comment that the real one doesn't work with libtool. However, an empty config.rpath doesn't really work: eg. on FreeBSD, where libintl is in /usr/local/lib, configure thinks that gettext doesn't exist and NLS should be disabled, which currently isn't supported in the source, and hence requires manual workaround to directly link -lintl without relying on configure. config.rpath is essential to let it be detected either in --prefix or using --with-libintl-prefix. I also don't see the mentioned issue with libtool flags applied to compilation, it seems to work fine to pass LTLIBINTL to libtool. It's unnecessary to include LTLIBICONV as the configure test will automatically append that to LTLIBINTL if it is necessary to link with libiconv. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu> Closes #10538	2020-07-10 14:26:12 -07:00
Arvind Sankar	4d61ade1a3	Clean up lib dependencies libzutil is currently statically linked into libzfs, libzfs_core and libzpool. Avoid the unnecessary duplication by removing it from libzfs and libzpool, and adding libzfs_core to libzpool. Remove a few unnecessary dependencies: - libuutil from libzfs_core - libtirpc from libspl - keep only libcrypto in libzfs, as we don't use any functions from libssl - librt is only used for clock_gettime, however on modern systems that's in libc rather than librt. Add a configure check to see if we actually need librt - libdl from raidz_test Add a few missing dependencies: - zlib to libefi and libzfs - libuuid to zpool, and libuuid and libudev to zed - libnvpair uses assertions, so add assert.c to provide aok and libspl_assertf Sort the LDADD for programs so that libraries that satisfy dependencies come at the end rather than the beginning of the linker command line. Revamp the configure tests for libaries to use FIND_SYSTEM_LIBRARY instead. This can take advantage of pkg-config, and it also avoids polluting LIBS. List all the required dependencies in the pkgconfig files, and move the one for libzfs_core into the latter's directory. Install pkgconfig files in $(libdir)/pkgconfig on linux and $(prefix)/libdata/pkgconfig on FreeBSD, instead of /usr/share/pkgconfig, as the more correct location for library .pc files. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu> Closes #10538	2020-07-10 14:26:00 -07:00
Arvind Sankar	b6437ea41c	Move libspl_assertf into .c file Variadic functions cannot be inlined. libspl_assertf ends up being duplicated in every file that uses it. Fix this by moving the function into a new assert.c. Also move the definition of aok into the new file instead of zone.c. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu> Closes #10538	2020-07-10 14:25:24 -07:00
Ryan Moeller	a2ec738c75	ZTS: Make bc conditional use compatible with new BSD bc FreeBSD recently replaced the GNU bc and dc in the base system with BSD licensed versions. They are supposed to be compatible with all the features present in the GNU versions, but it turns out they are picky about `if` statements having a corresponding `else`. ZTS uses `echo "if ($x > $y) 1" \| bc` in a few places, which causes tests to fail unexpectedly with the new bc. Change the two expressions in ZTS to `if ($x > $y) 1 else 0` for compatibility with the new BSD bc. Reviewed-by: John Kennedy <john.kennedy@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #10551	2020-07-09 17:49:02 -07:00
Brian Behlendorf	9a49d3f3d3	Add device rebuild feature The device_rebuild feature enables sequential reconstruction when resilvering. Mirror vdevs can be rebuilt in LBA order which may more quickly restore redundancy depending on the pools average block size, overall fragmentation and the performance characteristics of the devices. However, block checksums cannot be verified as part of the rebuild thus a scrub is automatically started after the sequential resilver completes. The new '-s' option has been added to the `zpool attach` and `zpool replace` command to request sequential reconstruction instead of healing reconstruction when resilvering. zpool attach -s <pool> <existing vdev> <new vdev> zpool replace -s <pool> <old vdev> <new vdev> The `zpool status` output has been updated to report the progress of sequential resilvering in the same way as healing resilvering. The one notable difference is that multiple sequential resilvers may be in progress as long as they're operating on different top-level vdevs. The `zpool wait -t resilver` command was extended to wait on sequential resilvers. From this perspective they are no different than healing resilvers. Sequential resilvers cannot be supported for RAIDZ, but are compatible with the dRAID feature being developed. As part of this change the resilver_restart_* tests were moved in to the functional/replacement directory. Additionally, the replacement tests were renamed and extended to verify both resilvering and rebuilding. Original-patch-by: Isaac Huang <he.huang@intel.com> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: John Poduska <jpoduska@datto.com> Co-authored-by: Mark Maybee <mmaybee@cray.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #10349	2020-07-03 11:05:50 -07:00
Robert Novak	bfcbec6f5d	Add block histogram to zdb The block histogram tracks the changes to psize, lsize and asize both in the count of the number of blocks (by blocksize) and the total length of all of the blocks for that blocksize. It also keeps a running total of the cumulative size of all of the blocks up to each size to help determine the size of caching SSDs to be added to zfs hardware deployments. The block history counts and lengths are summarized in bins which are powers of two. Even rows with counts of zero are printed. This change is accessed by specifying one of two options: zdb -bbb pool zdb -Pbbb pool The first version prints the table in fixed size columns. The second prints in "parseable" output that can be placed into a CSV file. Fixed Column, nicenum output sample: block psize lsize asize size Count Length Cum. Count Length Cum. Count Length Cum. 512: 3.50K 1.75M 1.75M 3.43K 1.71M 1.71M 3.41K 1.71M 1.71M 1K: 3.65K 3.67M 5.43M 3.43K 3.44M 5.15M 3.50K 3.51M 5.22M 2K: 3.45K 6.92M 12.3M 3.41K 6.83M 12.0M 3.59K 7.26M 12.5M 4K: 3.44K 13.8M 26.1M 3.43K 13.7M 25.7M 3.49K 14.1M 26.6M 8K: 3.42K 27.3M 53.5M 3.41K 27.3M 53.0M 3.44K 27.6M 54.2M 16K: 3.43K 54.9M 108M 3.50K 56.1M 109M 3.42K 54.7M 109M 32K: 3.44K 110M 219M 3.41K 109M 218M 3.43K 110M 219M 64K: 3.41K 218M 437M 3.41K 218M 437M 3.44K 221M 439M 128K: 3.41K 437M 874M 3.70K 474M 911M 3.41K 437M 876M 256K: 3.41K 874M 1.71G 3.41K 874M 1.74G 3.41K 874M 1.71G 512K: 3.41K 1.71G 3.41G 3.41K 1.71G 3.45G 3.41K 1.71G 3.42G 1M: 3.41K 3.41G 6.82G 3.41K 3.41G 6.86G 3.41K 3.41G 6.83G 2M: 0 0 6.82G 0 0 6.86G 0 0 6.83G 4M: 0 0 6.82G 0 0 6.86G 0 0 6.83G 8M: 0 0 6.82G 0 0 6.86G 0 0 6.83G 16M: 0 0 6.82G 0 0 6.86G 0 0 6.83G Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Robert E. Novak <novak5@llnl.gov> Closes: #9158 Closes #10315	2020-06-26 15:09:20 -07:00
Arvind Sankar	6b99fc0620	Fixes for make dist Reduce the usage of EXTRA_DIST. If files are conditionally included in _SOURCES, _HEADERS etc, automake is smart enough to dist all files that could possibly be included, but this does not apply to EXTRA_DIST, resulting in make dist depending on the configuration. Add some files that were missing altogether in various Makefile's. The changes to disted files in this commit (excluding deleted files): +./cmd/zed/agents/README.md +./etc/init.d/README.md +./lib/libspl/os/freebsd/getexecname.c +./lib/libspl/os/freebsd/gethostid.c +./lib/libspl/os/freebsd/getmntany.c +./lib/libspl/os/freebsd/mnttab.c -./lib/libzfs/libzfs_core.pc -./lib/libzfs/libzfs.pc +./lib/libzfs/os/freebsd/libzfs_compat.c +./lib/libzfs/os/freebsd/libzfs_fsshare.c +./lib/libzfs/os/freebsd/libzfs_ioctl_compat.c +./lib/libzfs/os/freebsd/libzfs_zmount.c +./lib/libzutil/os/freebsd/zutil_compat.c +./lib/libzutil/os/freebsd/zutil_device_path_os.c +./lib/libzutil/os/freebsd/zutil_import_os.c +./module/lua/README.zfs +./module/os/linux/spl/README.md +./tests/README.md +./tests/zfs-tests/tests/functional/cli_root/zfs_clone/zfs_clone_rm_nested.ksh +./tests/zfs-tests/tests/functional/cli_root/zfs_send/zfs_send_encrypted_unloaded.ksh +./tests/zfs-tests/tests/functional/inheritance/README.config +./tests/zfs-tests/tests/functional/inheritance/README.state +./tests/zfs-tests/tests/functional/rsend/rsend_016_neg.ksh +./tests/zfs-tests/tests/perf/fio/sequential_readwrite.fio Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu> Closes #10501	2020-06-26 14:20:02 -07:00
felixdoerre	221e67040f	pam: implement a zfs_key pam module Implements a pam module for automatically loading zfs encryption keys for home datasets. The pam module: - loads a zfs key and mounts the dataset when a session opens. - unmounts the dataset and unloads the key when the session closes. - when the user is logged on and changes the password, the module changes the encryption key. Reviewed-by: Richard Laager <rlaager@wiktel.com> Reviewed-by: @jengelh <jengelh@inai.de> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Felix Dörre <felix@dogcraft.de> Closes #9886 Closes #9903	2020-06-24 18:45:44 -07:00
Ryan Moeller	9192f27c1d	Add zfs_multihost_interval tunable handler for FreeBSD This tunable required a handler to be implemented for ZFS_MODULE_PARAM_CALL. Add the handler so the tunable can be declared in common code. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #10490	2020-06-23 13:32:42 -07:00
Matthew Ahrens	540493ba4f	Clarify comments in config/.m4, vdev_geom.c, zfs_allow_.ksh Rephrase comments to be more clear. Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Matthew Ahrens <mahrens@delphix.com> Closes #10481	2020-06-22 09:46:37 -07:00
Arvind Sankar	c3fe42aabd	Remove dead code Delete unused functions. Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu> Closes #10470	2020-06-18 12:21:18 -07:00
Arvind Sankar	65c7cc49bf	Mark functions as static Mark functions used only in the same translation unit as static. This only includes functions that do not have a prototype in a header file either. Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu> Closes #10470	2020-06-18 12:20:38 -07:00
adilger	f734301d22	linux: add basic fallocate(mode=0/2) compatibility Implement semi-compatible functionality for mode=0 (preallocation) and mode=FALLOC_FL_KEEP_SIZE (preallocation beyond EOF) for ZPL. Since ZFS does COW and snapshots, preallocating blocks for a file cannot guarantee that writes to the file will not run out of space. Even if the first overwrite was guaranteed, it would not handle any later overwrite of blocks due to COW, so strict compliance is futile. Instead, make a best-effort check that at least enough free space is currently available in the pool (with a bit of margin), then create a sparse file of the requested size and continue on with life. This does not handle all cases (e.g. several fallocate() calls before writing into the files when the filesystem is nearly full), which would require a more complex mechanism to be implemented, probably based on a modified version of dmu_prealloc(), but is usable as-is. A new module option zfs_fallocate_reserve_percent is used to control the reserve margin for any single fallocate call. By default, this is 110% of the requested preallocation size, so an additional 10% of available space is reserved for overhead to allow the application a good chance of finishing the write when the fallocate() succeeds. If the heuristics of this basic fallocate implementation are not desirable, the old non-functional behavior of returning EOPNOTSUPP for calls can be restored by setting zfs_fallocate_reserve_percent=0. The parameter of zfs_statvfs() is changed to take an inode instead of a dentry, since no dentry is available in zfs_fallocate_common(). A few tests from @behlendorf cover basic fallocate functionality. Reviewed-by: Richard Laager <rlaager@wiktel.com> Reviewed-by: Arshad Hussain <arshad.super@gmail.com> Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Co-authored-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Andreas Dilger <adilger@dilger.ca> Issue #326 Closes #10408	2020-06-18 11:22:11 -07:00
Matthew Ahrens	f66434268c	Remove unnecessary references to slavery The horrible effects of human slavery continue to impact society. The casual use of the term "slave" in computer software is an unnecessary reference to a painful human experience. This commit removes all possible references to the term "slave". Implementation notes: The zpool.d/slaves script is renamed to dm-deps, which uses the same terminology as `dmsetup deps`. References to the `/sys/class/block/$dev/slaves` directory remain. This directory name is determined by the Linux kernel. Although `dmsetup deps` provides the same information, it unfortunately requires elevated privileges, whereas the `/sys/...` directory is world-readable. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Matthew Ahrens <mahrens@delphix.com> Closes #10435	2020-06-10 17:07:59 -07:00
Andrea Gelmini	dd4bc569b9	Fix typos Correct various typos in the comments and tests. Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Andrea Gelmini <andrea.gelmini@gelma.net> Closes #10423	2020-06-09 21:24:09 -07:00
Matthew Ahrens	7bcb7f0840	File incorrectly zeroed when receiving incremental stream that toggles -L Background: By increasing the recordsize property above the default of 128KB, a filesystem may have "large" blocks. By default, a send stream of such a filesystem does not contain large WRITE records, instead it decreases objects' block sizes to 128KB and splits the large blocks into 128KB blocks, allowing the large-block filesystem to be received by a system that does not support the `large_blocks` feature. A send stream generated by `zfs send -L` (or `--large-block`) preserves the large block size on the receiving system, by using large WRITE records. When receiving an incremental send stream for a filesystem with large blocks, if the send stream's -L flag was toggled, a bug is encountered in which the file's contents are incorrectly zeroed out. The contents of any blocks that were not modified by this send stream will be lost. "Toggled" means that the previous send used `-L`, but this incremental does not use `-L` (-L to no-L); or that the previous send did not use `-L`, but this incremental does use `-L` (no-L to -L). Changes: This commit addresses the problem with several changes to the semantics of zfs send/receive: 1. "-L to no-L" incrementals are rejected. If the previous send used `-L`, but this incremental does not use `-L`, the `zfs receive` will fail with this error message: incremental send stream requires -L (--large-block), to match previous receive. 2. "no-L to -L" incrementals are handled correctly, preserving the smaller (128KB) block size of any already-received files that used large blocks on the sending system but were split by `zfs send` without the `-L` flag. 3. A new send stream format flag is added, `SWITCH_TO_LARGE_BLOCKS`. This feature indicates that we can correctly handle "no-L to -L" incrementals. This flag is currently not set on any send streams. In the future, we intend for incremental send streams of snapshots that have large blocks to use `-L` by default, and these streams will also have the `SWITCH_TO_LARGE_BLOCKS` feature set. This ensures that streams from the default use of `zfs send` won't encounter the bug mentioned above, because they can't be received by software with the bug. Implementation notes: To facilitate accessing the ZPL's generation number, `zfs_space_delta_cb()` has been renamed to `zpl_get_file_info()` and restructured to fill in a struct with ZPL-specific info including owner and generation. In the "no-L to -L" case, if this is a compressed send stream (from `zfs send -cL`), large WRITE records that are being written to small (128KB) blocksize files need to be decompressed so that they can be written split up into multiple blocks. The zio pipeline will recompress each smaller block individually. A new test case, `send-L_toggle`, is added, which tests the "no-L to -L" case and verifies that we get an error for the "-L to no-L" case. Reviewed-by: Paul Dagnelie <pcd@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Matthew Ahrens <mahrens@delphix.com> Closes #6224 Closes #10383	2020-06-09 10:41:01 -07:00
Igor K	6722be2823	ZTS: Fix add-o_ashift.ksh Use option '-o' after action for compatibility Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Igor Kozhukhov <igor@dilos.org> Closes #10426	2020-06-09 10:31:16 -07:00
George Amanakis	b7654bd794	Trim L2ARC The l2arc_evict() function is responsible for evicting buffers which reference the next bytes of the L2ARC device to be overwritten. Teach this function to additionally TRIM that vdev space before it is overwritten if the device has been filled with data. This is done by vdev_trim_simple() which trims by issuing a new type of TRIM, TRIM_TYPE_SIMPLE. We also implement a "Trim Ahead" feature. It is a zfs module parameter, expressed in % of the current write size. This trims ahead of the current write size. A minimum of 64MB will be trimmed. The default is 0 which disables TRIM on L2ARC as it can put significant stress to underlying storage devices. To enable TRIM on L2ARC we set l2arc_trim_ahead > 0. We also implement TRIM of the whole cache device upon addition to a pool, pool creation or when the header of the device is invalid upon importing a pool or onlining a cache device. This is dependent on l2arc_trim_ahead > 0. TRIM of the whole device is done with TRIM_TYPE_MANUAL so that its status can be monitored by zpool status -t. We save the TRIM state for the whole device and the time of completion on-disk in the header, and restore these upon L2ARC rebuild so that zpool status -t can correctly report them. Whole device TRIM is done asynchronously so that the user can export of the pool or remove the cache device while it is trimming (ie if it is too slow). We do not TRIM the whole device if persistent L2ARC has been disabled by l2arc_rebuild_enabled = 0 because we may not want to lose all cached buffers (eg we may want to import the pool with l2arc_rebuild_enabled = 0 only once because of memory pressure). If persistent L2ARC has been disabled by setting the module parameter l2arc_rebuild_blocks_min_l2size to a value greater than the size of the cache device then the whole device is trimmed upon creation or import of a pool if l2arc_trim_ahead > 0. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Adam D. Moss <c@yotes.com> Signed-off-by: George Amanakis <gamanakis@gmail.com> Closes #9713 Closes #9789 Closes #10224	2020-06-09 10:15:08 -07:00
alaviss	13dd63ff81	mkfile: include missing headers Without these headers, compilation fails on musl libc with offset_t being undeclared and MIN being implictly declared. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Hiếu Lê <leorize+oss@disroot.org> Closes #10406	2020-06-05 17:22:10 -07:00
Ryan Moeller	4f8b2356cc	ZTS: Retry export/destroy when busy in zpool_import_012 It can take a moment for the NFS server to give up the mountpoint after unsharing a filesystem. Use log_must_busy to retry export/destroy a few times after switching off sharenfs. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Igor Kozhukhov <igor@dilos.org> Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #10380	2020-05-27 17:18:06 -07:00
Brian Behlendorf	c946d5a913	ZTS: Fix zfs_mount.kshlib cleanup Update cleanup_filesystem to use destroy_dataset when performing cleanup. This ensures the destroy is retried if the pool is busy preventing occasional failures. Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Giuseppe Di Natale <guss80@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #10358	2020-05-23 17:13:42 -07:00
felixdoerre	501a1511ae	mount: use the mount syscall directly Allow zfs datasets to be mounted on Linux without relying on the invocation of an external processes. This is the same behavior which is implemented for FreeBSD. Use of the libmount library was originally considered because it provides functionality to properly lock and update the /etc/mtab file. However, these days /etc/mtab is typically a symlink to /proc/self/mounts so there's nothing to updated. Therefore, we call mount(2) directly and avoid any additional dependencies. If required the legacy behavior can be enabled by setting the ZFS_MOUNT_HELPER environment variable. This may be needed in environments where SELinux in enabled and the zfs binary does not have mount permission. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Felix Dörre <felix@dogcraft.de> #10294	2020-05-20 18:02:41 -07:00
Paul Dagnelie	de4f06c275	Small program that converts a dataset id and an object id to a path Small program that converts a dataset id and an object id to a path Reviewed-by: Prakash Surya <prakash.surya@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Signed-off-by: Paul Dagnelie <pcd@delphix.com> Closes #10204	2020-05-20 10:05:33 -07:00
John Wren Kennedy	e1fcd940e7	ZTS: zpool_split_indirect deletes zfstest log file The cleanup routine for this test attempts to remove some temporary files with `rm -f $VDEV_*`, but VDEV_ is undefined. As a result, all files in the current working directory (/var/tmp/test_results/current) get removed instead. This includes the complete log file of all tests. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: George Amanakis <gamanakis@gmail.com> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: John Kennedy <john.kennedy@delphix.com> Closes #10324	2020-05-14 09:39:47 -07:00
John Poduska	41035a0496	Resilver restarts unnecessarily when it encounters errors When a resilver finishes, vdev_dtl_reassess is called to hopefully excise DTL_MISSING (amongst other things). If there are errors during the resilver, they are tracked in DTL_SCRUB, as spelled out in the block comment in vdev.c. DTL_SCRUB is in-core only, so it can only be used if the pool was online for the whole resilver. This state is tracked with the spa_scrub_started flag, which only gets set when the scan is initialized. Unfortunately, this flag gets cleared right before vdev_dtl_reassess gets called, so if there are any errors during the scan, DTL_MISSING will never get excised and the resilver will just continually restart. This fix simply moves clearing that flag until after the call to vdev_dtl_reasses. In addition, if a pool is imported and already has scn_errors > 0, this change will restart the resilver immediately instead of doing the rest of the scan and then restarting it from the beginning. On the other hand, if scn_errors == 0 at import, then no errors have been encountered so far, so the spa_scrub_started flag can be safely set. A test has been added to verify that resilver does not restart when relevant DTL's are available. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Paul Zuchowski <pzuchowski@datto.com> Signed-off-by: John Poduska <jpoduska@datto.com> Closes #10291	2020-05-13 10:54:27 -07:00

1 2 3 4 5 ...

959 Commits