mirror_zfs

mirror of https://git.proxmox.com/git/mirror_zfs.git synced 2026-04-17 08:54:52 +03:00

Author	SHA1	Message	Date
Don Brady	13d65987a9	zed syslog entries drop important info ZED will log zevents summaries to the syslog, however the log entries tend to drop event details that can be useful for diagnosis. This is especially true for ereport events, like io, checksum, and delay. Update the all-syslog.sh script to log additional event information. Add an optional config option, ZED_SYSLOG_DISPLAY_GUIDS, to zed.rc for choosing GUIDs over names for pool and vdev. Change the default ZED_SYSLOG_SUBCLASS_EXCLUDE to exclude history_event events. These events tend to be frequent, convey no meaningful info, and are already logged in the zpool history. Reviewed-by: John Kennedy <john.kennedy@delphix.com> Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Don Brady <don.brady@delphix.com> Closes #10967	2020-10-19 11:01:00 -07:00
Ryan Moeller	ab6a0e236e	Ignore zpool_influxdb binary This was requested but forgotten in #10786. Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org> Closes #11071	2020-10-16 13:21:28 -07:00
Christian Schwarz	61868bb14d	zil_parse: make callback parameters const Code cleanup, a follow up commit to `4d55ea81`. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Co-authored-by: Ryan Moeller <ryan@freqlabs.com> Signed-off-by: Christian Schwarz <me@cschwarz.com> Closes #11020	2020-10-09 09:34:54 -07:00
Richard Elling	e9527d44e6	Add zpool_influxdb command A zpool_influxdb command is introduced to ease the collection of zpool statistics into the InfluxDB time-series database. Examples are given on how to integrate with the telegraf statistics aggregator, a companion to influxdb. Finally, a grafana dashboard template is included to show how pool latency distributions can be visualized in a ZFS + telegraf + influxdb + grafana environment. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Richard Elling <Richard.Elling@RichardElling.com> Closes #10786	2020-10-09 09:29:21 -07:00
Brian Behlendorf	d0249a4bd0	Replace ZFS on Linux references with OpenZFS This change updates the documentation to refer to the project as OpenZFS instead of ZFS on Linux. Web links have been updated to refer to https://github.com/openzfs/zfs. The extraneous zfsonlinux.org web links in the ZED and SPL sources have been dropped. Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Richard Laager <rlaager@wiktel.com> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #11007	2020-10-08 20:10:13 -07:00
Ryan Moeller	73989f4b9e	Make dbufstat work on FreeBSD With procfs_list kstats implemented for FreeBSD, dbufs are now exposed as kstat.zfs.misc.dbufs. On FreeBSD, dbufstats can use the sysctl instead of procfs when no input file has been given. Enable the dbufstats tests on FreeBSD. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org> Closes #11008	2020-10-08 09:40:23 -07:00
Toomas Soome	4e84f67a96	zdb should not output binary data on terminal zdb interprets byte arrays as textual strings in dump_zap, but there are also binary arrays, and we should not output binary data on the terminal. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Igor Kozhukhov <igor@dilos.org> Signed-off-by: Toomas Soome <tsoome@me.com> External-issue: https://www.illumos.org/issues/12012 External-issue: https://www.illumos.org/issues/11713 Closes #11006	2020-10-05 14:05:28 -07:00
Allan Jude	cf2667759f	zfs userspace: use zfs_path_to_zhandle so argument can be a path Change zfs userspace subcommand to use zfs_path_to_zhandle() so that the provided dataset can be a path (/usr) or a dataset (rpool/usr). Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Allan Jude <allan@klarasystems.com> Closes #8915	2020-09-25 14:37:10 -07:00
George Wilson	c494aa7f57	vdev_ashift should only be set once == Motivation and Context The new vdev ashift optimization prevents the removal of devices when a zfs configuration is comprised of disks which have different logical and physical block sizes. This happens because we set 'spa_min_ashift' in vdev_open and then later call 'vdev_ashift_optimize'. This would result in an inconsistency between spa's ashift calculations and that of the top-level vdev. In addition, the optimization logic ignores the overridden ashift value that would be provided by '-o ashift=<val>'. == Description This change reworks the vdev ashift optimization so that it's only set the first time the device is configured. It still allows the physical and logical ashift values to be set every time the device is opened but those values are only consulted on first open. Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Cedric Berger <cedric@precidata.com> Signed-off-by: George Wilson <gwilson@delphix.com> External-Issue: DLPX-71831 Closes #10932	2020-09-18 12:13:47 -07:00
Matthew Ahrens	a57f954226	zdb leak detection fails with in-progress device removal When a device removal is in progress, there are 2 locations for the data that's already been moved: the original location, on the device that's being removed; and the new location, which is pointed to by the indirect mapping. When doing leak detection, zdb needs to know about both locations. To determine what's already been copied, we load the spacemaps of the removing vdev, omit the blocks that are yet to be copied, and then use the vdev's remap op to find the new location. The problem is with an optimization to the spacemap-loading code in zdb. When processing the log spacemaps, we ignore entries that are not relevant because they are past the point that's been copied. However, entries which span the point that's been copied (i.e. they are partly relevant and partly irrelevant) are processed normally. This can lead to an illegal spacemap operation, for example if offsets up to 100KB have been copied, and the spacemap log has the following entries: ALLOC 50KB-150KB (partly relevant) FREE 50KB-100KB (entirely relevant) FREE 100KB-150KB (entirely irrelevant - ignored) ALLOC 50KB-150KB (partly relevant) Because the entirely irrelevant entry was ignored, its space remains in the spacemap. When the last entry is processed, we attempt to add it to the spacemap, but it partially overlaps with the 100-150KB entry that was left over. This problem was discovered by ztest/zloop. One solution would be to also ignore the irrelevant parts of partially-irrelevant entries (i.e. when processing the ALLOC 50-150, to only add 50-100 to the spacemap). However, this commit implements a simpler solution, which is to remove this optimization entirely. I.e. to process the entire spacemap log, without regard for the point that's been copied. After reconstructing the entire allocatable range tree, there's already code to remove the parts that have not yet been copied.
Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Matthew Ahrens <mahrens@delphix.com> External-issue: DLPX-71820 Closes #10920	2020-09-17 10:55:30 -07:00
Georgy Yakovlev	9cc177baa0	cmd/zgenhostid: replace with simple c implementation It was discovered that dracut scripts and zgenhostid always generate little-endian /etc/hostid. This commit provides a simple endianness-aware binary and updates the scripts to use it. New features include: -f flag to force overwrite. -o flag to write to different file (for dracut) accepting both 0x01234567 and 01234567 values as input Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Olaf Faaland <faaland1@llnl.gov> Signed-off-by: Georgy Yakovlev <gyakovlev@gentoo.org> Closes #10887 Closes #10925	2020-09-16 12:25:12 -07:00
George Amanakis	085321621e	Add L2ARC arcstats for MFU/MRU buffers and buffer content type Currently the ARC state (MFU/MRU) of cached L2ARC buffers and their content type are unknown. Knowing this information may prove beneficial in adjusting the L2ARC caching policy. This commit adds L2ARC arcstats that display the aligned size (in bytes) of L2ARC buffers according to their content type (data/metadata) and according to their ARC state (MRU/MFU or prefetch). It also expands the existing evict_l2_eligible arcstat to differentiate between MFU and MRU buffers. L2ARC caches buffers from the MRU and MFU lists of ARC. Upon caching a buffer, its ARC state (MRU/MFU) is stored in the L2 header (b_arcs_state). The l2_m{f,r}u_asize arcstats reflect the aligned size (in bytes) of L2ARC buffers according to their ARC state (based on b_arcs_state). We also account for the case where an L2ARC and ARC cached MRU or MRU_ghost buffer transitions to MFU. The l2_prefetch_asize reflects the aligned size (in bytes) of L2ARC buffers that were cached while they had the prefetch flag set in ARC. This is dynamically updated as the prefetch flag of L2ARC buffers changes. When buffers are evicted from ARC, if they are determined to be L2ARC eligible then their logical size is recorded in evict_l2_eligible_m{r,f}u arcstats according to their ARC state upon eviction. Persistent L2ARC: When committing an L2ARC buffer to a log block (L2ARC metadata) its b_arcs_state and prefetch flag are also stored. If the buffer changes its arcstate or prefetch flag this is reflected in the above arcstats. However, the L2ARC metadata cannot currently be updated to reflect this change. Example: L2ARC caches an MRU buffer. L2ARC metadata and arcstats count this as an MRU buffer. The buffer transitions to MFU. The arcstats are updated to reflect this.
Upon pool re-import or on/offlining the L2ARC device the arcstats are cleared and the buffer will now be counted as an MRU buffer, as the L2ARC metadata were not updated. Bug fix: - If l2arc_noprefetch is set, arc_read_done clears the L2CACHE flag of an ARC buffer. However, prefetches may be issued in a way that arc_read_done() is bypassed. Instead, move the related code in l2arc_write_eligible() to account for those cases too. Also add a test and update manpages for l2arc_mfuonly module parameter, and update the manpages and code comments for l2arc_noprefetch. Move persist_l2arc tests to l2arc. Reviewed-by: Ryan Moeller <freqlabs@FreeBSD.org> Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: George Amanakis <gamanakis@gmail.com> Closes #10743	2020-09-14 10:10:44 -07:00
xdch47	c2c7ca0d6d	Force the use of '.' as decimal separator. This solves issues occurring with a different decimal separator and keeps the command line interface consistent for all locales. E.g. `zfs set quota=0.5T` Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Felix Neumärker <xdch47@posteo.de> Closes #10878	2020-09-09 10:14:04 -07:00
Ryan Moeller	7b4e27232d	Add 'zfs rename -u' to rename without remounting Allow renaming file systems without remounting when possible. It is possible for file systems with the 'mountpoint' property set to 'legacy' or 'none' - we don't have to change the mount directory for them. Currently such file systems are unmounted on rename and not even mounted back. This introduces a layering violation, as we need to update the 'f_mntfromname' field in the statfs structure related to the mountpoint (for the dataset we are renaming and all its children). In my opinion it is worth it, as it allows updating FreeBSD in an even cleaner way - in a ZFS-only configuration the root file system is a ZFS file system with the 'mountpoint' property set to 'legacy'. If the root dataset is named system/rootfs, we can snapshot it (system/rootfs@upgrade), clone it (system/oldrootfs), update FreeBSD and if it doesn't boot we can boot back from system/oldrootfs and rename it back to system/rootfs while it is mounted as /. Before it was not possible, because unmounting / was not possible. Authored by: Pawel Jakub Dawidek <pjd@FreeBSD.org> Reviewed-by: Allan Jude <allan@klarasystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Ported by: Matt Macy <mmacy@freebsd.org> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #10839	2020-09-01 16:14:16 -07:00
Spencer Kinny	abe4fbfd01	Typo Correction Corrected the typo in zfs/cmd/zfs/zfs_main.c line number 404 pbkfd2iters to pbkdf2iters Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Spencer Kinny <spencerkinny1995@gmail.com> Closes #10850	2020-08-30 14:14:32 -07:00
Ryan Moeller	a2f944a140	zpool: Change base URL for ZFS messages to openzfs-docs Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #10820	2020-08-26 21:43:06 -07:00
Ryan Moeller	6fe3498ca3	Import vdev ashift optimization from FreeBSD Many modern devices use physical allocation units that are much larger than the minimum logical allocation size accessible by external commands. Two prevalent examples of this are 512e disk drives (512b logical sector, 4K physical sector) and flash devices (512b logical sector, 4K or larger allocation block size, and 128k or larger erase block size). Operations that modify less than the physical sector size result in a costly read-modify-write or garbage collection sequence on these devices. Simply exporting the true physical sector of the device to ZFS would yield optimal performance, but has two serious drawbacks: 1. Existing pools created with devices that have different logical and physical block sizes, but were configured to use the logical block size (e.g. because the OS version used for pool construction reported the logical block size instead of the physical block size) will suddenly find that the vdev allocation size has increased. This can be easily tolerated for active members of the array, but ZFS would prevent replacement of a vdev with another identical device because it now appears that the smaller allocation size required by the pool is not supported by the new device. 2. The device's physical block size may be too large to be supported by ZFS. The optimal allocation size for the vdev may be quite large. For example, a RAID controller may export a vdev that requires read-modify-write cycles unless accessed using 64k aligned/sized requests. ZFS currently has an 8k minimum block size limit. Reporting both the logical and physical allocation sizes for vdevs solves these problems. A device may be used so long as the logical block size is compatible with the configuration. By comparing the logical and physical block sizes, new configurations can be optimized and administrators can be notified of any existing pools that are sub-optimal. 
Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Co-authored-by: Matthew Macy <mmacy@freebsd.org> Signed-off-by: Matt Macy <mmacy@FreeBSD.org> Closes #10619	2020-08-21 12:53:17 -07:00
Brian Behlendorf	64025fa3a1	Silence 'make checkbashisms' Commit `d2bce6d03` added the 'make checkbashisms' target but did not resolve all of the bashisms in the scripts. This commit doesn't resolve them all either but it does fix up a few, and it excludes the others so 'make checkstyle' no longer prints warnings. It's a small step in the right direction. * Dracut is Linux specific and itself depends on bash. Therefore all dracut support scripts can be bash specific, update their shebang accordingly. * zed-functions.sh, zfs-import, zfs-mount, zfs-zed, smart paxcheck.sh, make_gitrev.sh - these scripts were excluded from the check until they can be updated and properly tested. * zfsunlock - only whole values for sleep are allowed. * vdev_id - removed unneeded locals; use && instead of -a. * dkms.mkconf, dkms.postbuil - use \|\| instead of -o. Reviewed-by: InsanePrawn <insane.prawny@gmail.com> Reviewed-by: Gabriel A. Devenyi <gdevenyi@gmail.com> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #10755	2020-08-20 13:45:47 -07:00
Don Brady	7bba1d404c	'zfs share -a' should clean noauto exports This is a follow on to PR #10688 where `zfs share -a` allows the sharing of canmount=noauto datasets if they are mounted. However, when a dataset with canmount=noauto is not mounted, the command should also purge any existing entries from the exports file. Otherwise, after a reboot, the nfs server attempts to export the underlying mountpath, not the dataset. This can lead to a hard hang for existing client mounts. Instead of just skipping the adding of an export if not mounted and canmount=noauto, have it also remove an existing export of the dataset so that, after a reboot, we don't export an unmounted dataset. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: George Wilson <gwilson@delphix.com> Signed-off-by: Don Brady <don.brady@delphix.com> Closes #10747	2020-08-20 13:12:12 -07:00
Michael Niewöhner	10b3c7f5e4	Add zstd support to zfs This PR adds two new compression types, based on ZStandard: - zstd: A basic ZStandard compression algorithm. Available compression levels for zstd are zstd-1 through zstd-19, where the compression increases with every level, but speed decreases. - zstd-fast: A faster version of the ZStandard compression algorithm. zstd-fast is basically a "negative" level of zstd. The compression decreases with every level, but speed increases. Available compression levels for zstd-fast: - zstd-fast-1 through zstd-fast-10 - zstd-fast-20 through zstd-fast-100 (in increments of 10) - zstd-fast-500 and zstd-fast-1000 For more information check the man page. Implementation details: Rather than treat each level of zstd as a different algorithm (as was done historically with gzip), the block pointer `enum zio_compress` value is simply zstd for all levels, including zstd-fast, since they all use the same decompression function. The compress= property (a 64bit unsigned integer) uses the lower 7 bits to store the compression algorithm (matching the number of bits used in a block pointer, as the 8th bit was borrowed for embedded block pointers). The upper bits are used to store the compression level. It is necessary to be able to determine what compression level was used when later reading a block back, so the concept used in LZ4, where the first 32bits of the on-disk value are the size of the compressed data (since the allocation is rounded up to the nearest ashift), was extended, and we store the version of ZSTD and the level as well as the compressed size. This value is returned when decompressing a block, so that if the block needs to be recompressed (L2ARC, nop-write, etc), the same parameters will be used to result in the matching checksum. All of the internal ZFS code ( `arc_buf_hdr_t`, `objset_t`, `zio_prop_t`, etc.) uses the separated _compress and _complevel variables.
Only the properties ZAP contains the combined/bit-shifted value. The combined value is split when the compression_changed_cb() callback is called, and sets both objset members (os_compress and os_complevel). The userspace tools all use the combined/bit-shifted value. Additional notes: zdb can now also decode the ZSTD compression header (flag -Z) and inspect the size, version and compression level saved in that header. For each record, if it is ZSTD compressed, the parameters of the decoded compression header get printed. ZSTD is included with all current tests and new tests are added as-needed. Per-dataset feature flags now get activated when the property is set. If a compression algorithm requires a feature flag, zfs activates the feature when the property is set, rather than waiting for the first block to be born. This is currently only used by zstd but can be extended as needed. Portions-Sponsored-By: The FreeBSD Foundation Co-authored-by: Allan Jude <allanjude@freebsd.org> Co-authored-by: Brian Behlendorf <behlendorf1@llnl.gov> Co-authored-by: Sebastian Gottschall <s.gottschall@dd-wrt.com> Co-authored-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl> Co-authored-by: Michael Niewöhner <foss@mniewoehner.de> Signed-off-by: Allan Jude <allan@klarasystems.com> Signed-off-by: Allan Jude <allanjude@freebsd.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Sebastian Gottschall <s.gottschall@dd-wrt.com> Signed-off-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl> Signed-off-by: Michael Niewöhner <foss@mniewoehner.de> Closes #6247 Closes #9024 Closes #10277 Closes #10278	2020-08-20 10:30:06 -07:00
Brian Behlendorf	5266a0728a	ZED: Do not offline a missing device if no spare is available Due to commit `d48091d` a removed device is now explicitly offlined by the ZED if no spare is available, rather than letting ZFS detect it as UNAVAIL. This broke auto-replacing of whole-disk devices, as described in issue #10577. In short, when a new device is reinserted in the same slot, the ZED will try to ONLINE it without letting ZFS recreate the necessary partition table. This change simply avoids setting the device OFFLINE when removed if no spare is available (or if spare_on_remove is false). This change has been left minimal to allow it to be backported to the 0.8.x release. The auto_offline_001_pos ZTS test has been updated accordingly. Some follow up work is planned to update the ZED so it transitions the vdev to a REMOVED state. This is a state which has always existed but there is no current interface the ZED can use to accomplish this. Therefore it's being left to a follow up PR. Reviewed-by: Gionatan Danti <g.danti@assyoma.it> Co-authored-by: Gionatan Danti <g.danti@assyoma.it> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #10577 Closes #10730	2020-08-18 22:13:17 -07:00
Matthew Ahrens	85ec5cbae2	Include scatter_chunk_waste in arc_size The ARC caches data in scatter ABD's, which are collections of pages, which are typically 4K. Therefore, the space used to cache each block is rounded up to a multiple of 4K. The ABD subsystem tracks this wasted memory in the `scatter_chunk_waste` kstat. However, the ARC's `size` is not aware of the memory used by this round-up, it only accounts for the size that it requested from the ABD subsystem. Therefore, the ARC is effectively using more memory than it is aware of, due to the `scatter_chunk_waste`. This impacts observability, e.g. `arcstat` will show that the ARC is using less memory than it effectively is. It also impacts how the ARC responds to memory pressure. As the amount of `scatter_chunk_waste` changes, it appears to the ARC as memory pressure, so it needs to resize `arc_c`. If the sector size (`1<<ashift`) is the same as the page size (or larger), there won't be any waste. If the (compressed) block size is relatively large compared to the page size, the amount of `scatter_chunk_waste` will be small, so the problematic effects are minimal. However, if using 512B sectors (`ashift=9`), and the (compressed) block size is small (e.g. `compression=on` with the default `volblocksize=8k` or a decreased `recordsize`), the amount of `scatter_chunk_waste` can be very large. On a production system, with `arc_size` at a constant 50% of memory, `scatter_chunk_waste` has been observed to be 10-30% of memory. This commit adds `scatter_chunk_waste` to `arc_size`, and adds a new `waste` field to `arcstat`. As a result, the ARC's memory usage is more observable, and `arc_c` does not need to be adjusted as frequently.
Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: George Wilson <gwilson@delphix.com> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Matthew Ahrens <mahrens@delphix.com> Closes #10701	2020-08-17 20:04:04 -07:00
George Amanakis	9352d8c004	Fix reporting of L2ARC writes in arc_summary3 arc_summary3 reports L2ARC writes in bytes. However, the related arc_stat is reported as hits. arc_summary2 reports this correctly. Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: George Amanakis <gamanakis@gmail.com> Closes #10717	2020-08-17 11:04:06 -07:00
George Wilson	53c9d1d9b5	'zfs share -a' should handle 'canmount=noauto' The 'zfs share -a' command currently skips any filesystems which have 'canmount=noauto' set. This behavior is unexpected since one would expect 'zfs share -a' to share any mounted filesystem that has the 'sharenfs' property already set. This changes the behavior of 'zfs share -a' to allow the sharing of 'canmount=noauto' datasets if they are mounted. Reviewed-by: Matt Ahrens <matt@delphix.com> Reviewed-by: Don Brady <don.brady@delphix.com> Reviewed-by: Prakash Surya <prakash.surya@delphix.com> Signed-off-by: George Wilson <gwilson@delphix.com> External-issue: DLPX-71313 Closes #10688	2020-08-11 13:55:04 -07:00
Matthew Macy	47ed79ff60	Changes to make openzfs build within FreeBSD buildworld A collection of header changes to enable FreeBSD to build with vendored OpenZFS. Reviewed-by: Ryan Moeller <ryan@ixsystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Matt Macy <mmacy@FreeBSD.org> Closes #10635	2020-07-31 21:30:31 -07:00
Matthew Macy	27d96d2254	Rename refcount.h to zfs_refcount.h Renamed to avoid conflicting with refcount.h when a different implementation is already provided by the platform. Reviewed-by: Ryan Moeller <ryan@ixsystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Matt Macy <mmacy@FreeBSD.org> Closes #10620	2020-07-29 16:35:33 -07:00
tony-zfs	02fced3067	Add support to decode a resume token Adding a new subcommand to zstream called token. This now allows users to decode a resume token to retrieve the toname field. This can be useful for tools that need this information. The syntax is as follows: zstream token <resume_token>. Reviewed-by: Matt Ahrens <matt@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Paul Zuchowski <pzuchowski@datto.com> Signed-off-by: Tony Perkins <tperkins@datto.com> Closes #10558	2020-07-23 17:44:03 -07:00
Ryan Moeller	0421f257b2	FreeBSD: Add legacy arc_min and arc_max These tunables were renamed from vfs.zfs.arc_min and vfs.zfs.arc_max to vfs.zfs.arc.min and vfs.zfs.arc.max. Add legacy compat tunables for the old names. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #10579	2020-07-19 10:15:34 -07:00
Matthew Ahrens	6774931dfa	Extend zdb to print inconsistencies in livelists and metaslabs Livelists and spacemaps are data structures that are logs of allocations and frees. Livelist entries are block pointers (blkptr_t). Spacemap entries are ranges of numbers, most often used to track allocated/freed regions of metaslabs/vdevs. These data structures can become self-inconsistent, for example if a block or range is "double allocated" (two allocation records without an intervening free) or "double freed" (two free records without an intervening allocation). ZDB (as well as zfs running in the kernel) can detect these inconsistencies when loading livelists and metaslabs. However, it generally halts processing when the error is detected. When analyzing an on-disk problem, we often want to know the entire set of inconsistencies, which is not possible with the current behavior. This commit adds a new flag, `zdb -y`, which analyzes the livelist and metaslab data structures and displays all of their inconsistencies. Note that this is different from the leak detection performed by `zdb -b`, which checks for inconsistencies between the spacemaps and the tree of block pointers, but assumes the spacemaps are self-consistent. The specific checks added are: Verify livelists by iterating through each sublivelist and: - report leftover FREEs - report double ALLOCs and double FREEs - record leftover ALLOCs together with their TXG [see Cross Check] Verify spacemaps by iterating over each metaslab and: - iterate over spacemap and then the metaslab's entries in the spacemap log, then report any double FREEs and double ALLOCs Verify that livelists are consistent with spacemaps. The space referenced by livelists (after using the FREE's to cancel out corresponding ALLOCs) should be allocated, according to the spacemaps.
Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Co-authored-by: Sara Hartse <sara.hartse@delphix.com> Signed-off-by: Matthew Ahrens <mahrens@delphix.com> External-issue: DLPX-66031 Closes #10515	2020-07-14 17:51:05 -07:00
Arvind Sankar	38e2e9ce83	Centralize variable substitution A bunch of places need to edit files to incorporate the configured paths i.e. bindir, sbindir etc. Move this logic into a common file. Create arc_summary by copying arc_summary[23] as appropriate at build time instead of install time. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu> Closes #10559	2020-07-14 17:33:44 -07:00
George Wilson	c15d36c674	Remove dependency on sharetab file and refactor sharing logic == Motivation and Context The current implementation of 'sharenfs' and 'sharesmb' relies on the use of the sharetab file. The use of this file is os-specific and not required by linux or freebsd. Currently the code must maintain updates to this file which adds complexity and presents a significant performance impact when sharing many datasets. In addition, concurrently running 'zfs sharenfs' commands results in missing entries in the sharetab file leading to unexpected failures. == Description This change removes the sharetab logic from the linux and freebsd implementation of 'sharenfs' and 'sharesmb'. It still preserves an os-specific library which contains the logic required for sharing NFS or SMB. The following entry points exist in the vastly simplified libshare library: - sa_enable_share -- shares a dataset but may not commit the change - sa_disable_share -- unshares a dataset but may not commit the change - sa_is_shared -- determine if a dataset is shared - sa_commit_share -- notify NFS/SMB subsystem to commit the shares - sa_validate_shareopts -- determine if sharing options are valid The sa_commit_share entry point is provided as a performance enhancement and is not required. The sa_enable_share/sa_disable_share may commit the share as part of the implementation. Libshare provides a framework for both NFS and SMB but some operating systems may not fully support these protocols or all features of the protocol. NFS Operation: For linux, libshare updates /etc/exports.d/zfs.exports to add and remove shares and then commits the changes by invoking 'exportfs -r'. This file is automatically read by the kernel NFS implementation which makes for better integration with the NFS systemd service. For FreeBSD, libshare updates /etc/zfs/exports to add and remove shares and then commits the changes by sending a SIGHUP to mountd.
SMB Operation: For linux, libshare adds and removes files in /var/lib/samba/usershares by calling the 'net' command directly. There is no need to commit the changes. FreeBSD does not support SMB. == Performance Results To test sharing performance we created a pool with an increasing number of datasets and invoked various zfs actions that would enable and disable sharing. The performance testing was limited to NFS sharing. The following tests were performed on an 8 vCPU system with 128GB and a pool comprised of 4 50GB SSDs: Scale testing: - Share all filesystems in parallel -- zfs sharenfs=on <dataset> & - Unshare all filesystems in parallel -- zfs sharenfs=off <dataset> & Functional testing: - share each filesystem serially -- zfs share -a - unshare each filesystem serially -- zfs unshare -a - reset sharenfs property and unshare -- zfs inherit -r sharenfs <pool> For 'zfs sharenfs=on' scale testing we saw an average reduction in time of 89.43% and for 'zfs sharenfs=off' we saw an average reduction in time of 83.36%. Functional testing also shows a huge improvement: - zfs share -- 97.97% reduction in time - zfs unshare -- 96.47% reduction in time - zfs inherit -r sharenfs -- 99.01% reduction in time Reviewed-by: Matt Ahrens <matt@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@ixsystems.com> Reviewed-by: Bryant G. Ly <bryangly@gmail.com> Signed-off-by: George Wilson <gwilson@delphix.com> External-Issue: DLPX-68690 Closes #1603 Closes #7692 Closes #7943 Closes #10300	2020-07-13 09:19:18 -07:00
Serapheim Dimitropoulos	6f1db5f37e	Unconditionally enable debugging for libzpool We already enable -DDEBUG unconditionally (meaning regardless of whether this is a debug build or a performance build) for zdb and ztest as they are mostly used for development and debugging. This patch enables -DDEBUG for libzpool extending the debugging checks for zdb, ztest, and a couple of other test utilities. In addition to passing -DDEBUG we also enable -DZFS_DEBUG so all assertion checks work as expected. We do so not only in libzpool but in every utility that links to it, even if the utility doesn't directly use any functionality wrapped in ZFS_DEBUG macro definitions. The reason is that these utilities may still include headers that contain structs that have more fields when ZFS_DEBUG is defined. This can be a problem as enabling that flag for libzpool but not for zdb can lead to random problems (e.g. segmentation faults) as zdb may have an incorrect view of a struct passed to it by libzpool. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Signed-off-by: Serapheim Dimitropoulos <serapheim@delphix.com> Closes #10549	2020-07-10 15:30:31 -07:00
Arvind Sankar	3e597dee11	Use abs_top_builddir when referencing libraries libtool stores absolute paths in the dependency_libs component of the .la files. If the Makefile for a dependent library refers to the libraries by relative path, some libraries end up duplicated on the link command line. As an example, libzfs specifies libzfs_core, libnvpair and libuutil as dependencies to be linked in. The .la file for libzfs_core also specifies libnvpair, but using an absolute path, with the result that libnvpair is present twice in the linker command line for producing libzfs. While the only thing this causes is to slightly slow down the linking, we can avoid it by using absolute paths everywhere, including for convenience libraries just for consistency. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu> Closes #10538	2020-07-10 14:26:32 -07:00
Arvind Sankar	1537105a8c	Add config.rpath for AM_GNU_GETTEXT Commit `e8864b1b28` ("config: libintl/libiconv for gettext() detection") added an empty config.rpath with a comment that the real one doesn't work with libtool. However, an empty config.rpath doesn't really work: eg. on FreeBSD, where libintl is in /usr/local/lib, configure thinks that gettext doesn't exist and NLS should be disabled, which currently isn't supported in the source, and hence requires manual workaround to directly link -lintl without relying on configure. config.rpath is essential to let it be detected either in --prefix or using --with-libintl-prefix. I also don't see the mentioned issue with libtool flags applied to compilation, it seems to work fine to pass LTLIBINTL to libtool. It's unnecessary to include LTLIBICONV as the configure test will automatically append that to LTLIBINTL if it is necessary to link with libiconv. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu> Closes #10538	2020-07-10 14:26:12 -07:00
Arvind Sankar	4d61ade1a3	Clean up lib dependencies libzutil is currently statically linked into libzfs, libzfs_core and libzpool. Avoid the unnecessary duplication by removing it from libzfs and libzpool, and adding libzfs_core to libzpool. Remove a few unnecessary dependencies: - libuutil from libzfs_core - libtirpc from libspl - keep only libcrypto in libzfs, as we don't use any functions from libssl - librt is only used for clock_gettime, however on modern systems that's in libc rather than librt. Add a configure check to see if we actually need librt - libdl from raidz_test Add a few missing dependencies: - zlib to libefi and libzfs - libuuid to zpool, and libuuid and libudev to zed - libnvpair uses assertions, so add assert.c to provide aok and libspl_assertf Sort the LDADD for programs so that libraries that satisfy dependencies come at the end rather than the beginning of the linker command line. Revamp the configure tests for libraries to use FIND_SYSTEM_LIBRARY instead. This can take advantage of pkg-config, and it also avoids polluting LIBS. List all the required dependencies in the pkgconfig files, and move the one for libzfs_core into the latter's directory. Install pkgconfig files in $(libdir)/pkgconfig on linux and $(prefix)/libdata/pkgconfig on FreeBSD, instead of /usr/share/pkgconfig, as the more correct location for library .pc files. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu> Closes #10538	2020-07-10 14:26:00 -07:00
Brian Behlendorf	9a49d3f3d3	Add device rebuild feature The device_rebuild feature enables sequential reconstruction when resilvering. Mirror vdevs can be rebuilt in LBA order which may more quickly restore redundancy depending on the pool's average block size, overall fragmentation and the performance characteristics of the devices. However, block checksums cannot be verified as part of the rebuild thus a scrub is automatically started after the sequential resilver completes. The new '-s' option has been added to the `zpool attach` and `zpool replace` command to request sequential reconstruction instead of healing reconstruction when resilvering. zpool attach -s <pool> <existing vdev> <new vdev> zpool replace -s <pool> <old vdev> <new vdev> The `zpool status` output has been updated to report the progress of sequential resilvering in the same way as healing resilvering. The one notable difference is that multiple sequential resilvers may be in progress as long as they're operating on different top-level vdevs. The `zpool wait -t resilver` command was extended to wait on sequential resilvers. From this perspective they are no different than healing resilvers. Sequential resilvers cannot be supported for RAIDZ, but are compatible with the dRAID feature being developed. As part of this change the resilver_restart_* tests were moved into the functional/replacement directory. Additionally, the replacement tests were renamed and extended to verify both resilvering and rebuilding. Original-patch-by: Isaac Huang <he.huang@intel.com> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: John Poduska <jpoduska@datto.com> Co-authored-by: Mark Maybee <mmaybee@cray.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #10349	2020-07-03 11:05:50 -07:00
Brian Behlendorf	67b1362f04	Style fixes * Fix cstyle issue in shrinker.h which exceeded 80 columns. * Silence shellcheck warning in zpool.d/smart script. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2020-06-27 17:38:55 -07:00
Allan Jude	3bc92b9ef6	Make zstreamdump output the size of the payload for BEGIN records This is helpful for determining the size of the nvlist of snapshots and properties Reviewed-by: Matt Ahrens <matt@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Allan Jude <allan@klarasystems.com> Closes #10505	2020-06-27 10:29:47 -07:00
Matthew Ahrens	7b232e9354	arcstat: add 'avail', fix 'free' The meaning of the `free` field is currently `zfs_arc_sys_free`, which is the target amount of memory to leave free for the system, and is constant after booting. This commit changes the meaning of `free` to arc_free_memory(), the amount of memory that the ARC considers to be free. It also adds a new arcstat field `avail`, which tracks `arc_available_memory()`. Since `avail` can be negative, it also updates the arcstat script to pretty-print negative values. example output: $ arcstat -f time,miss,arcsz,c,grow,need,free,avail 1 time miss arcsz c grow need free avail 15:03:02 39K 114G 114G 0 0 2.4G 407M 15:03:03 42K 114G 114G 0 0 2.1G 120M 15:03:04 40K 114G 114G 0 0 1.8G -177M 15:03:05 24K 113G 112G 0 0 1.7G -269M 15:03:06 29K 111G 110G 0 0 1.6G -385M 15:03:07 27K 110G 108G 0 0 1.4G -535M 15:03:08 13K 108G 108G 0 0 2.2G 239M 15:03:09 33K 107G 107G 0 0 1.3G -639M 15:03:10 16K 105G 102G 0 0 2.6G 704M 15:03:11 7.2K 102G 102G 0 0 5.1G 3.1G 15:03:12 42K 103G 102G 0 0 4.8G 2.8G Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com> Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com> Signed-off-by: Matthew Ahrens <mahrens@delphix.com> Closes #10494	2020-06-26 18:05:28 -07:00
Robert Novak	bfcbec6f5d	Add block histogram to zdb The block histogram tracks the changes to psize, lsize and asize both in the count of the number of blocks (by blocksize) and the total length of all of the blocks for that blocksize. It also keeps a running total of the cumulative size of all of the blocks up to each size to help determine the size of caching SSDs to be added to zfs hardware deployments. The block history counts and lengths are summarized in bins which are powers of two. Even rows with counts of zero are printed. This change is accessed by specifying one of two options: zdb -bbb pool zdb -Pbbb pool The first version prints the table in fixed size columns. The second prints in "parseable" output that can be placed into a CSV file. Fixed Column, nicenum output sample: block psize lsize asize size Count Length Cum. Count Length Cum. Count Length Cum. 512: 3.50K 1.75M 1.75M 3.43K 1.71M 1.71M 3.41K 1.71M 1.71M 1K: 3.65K 3.67M 5.43M 3.43K 3.44M 5.15M 3.50K 3.51M 5.22M 2K: 3.45K 6.92M 12.3M 3.41K 6.83M 12.0M 3.59K 7.26M 12.5M 4K: 3.44K 13.8M 26.1M 3.43K 13.7M 25.7M 3.49K 14.1M 26.6M 8K: 3.42K 27.3M 53.5M 3.41K 27.3M 53.0M 3.44K 27.6M 54.2M 16K: 3.43K 54.9M 108M 3.50K 56.1M 109M 3.42K 54.7M 109M 32K: 3.44K 110M 219M 3.41K 109M 218M 3.43K 110M 219M 64K: 3.41K 218M 437M 3.41K 218M 437M 3.44K 221M 439M 128K: 3.41K 437M 874M 3.70K 474M 911M 3.41K 437M 876M 256K: 3.41K 874M 1.71G 3.41K 874M 1.74G 3.41K 874M 1.71G 512K: 3.41K 1.71G 3.41G 3.41K 1.71G 3.45G 3.41K 1.71G 3.42G 1M: 3.41K 3.41G 6.82G 3.41K 3.41G 6.86G 3.41K 3.41G 6.83G 2M: 0 0 6.82G 0 0 6.86G 0 0 6.83G 4M: 0 0 6.82G 0 0 6.86G 0 0 6.83G 8M: 0 0 6.82G 0 0 6.86G 0 0 6.83G 16M: 0 0 6.82G 0 0 6.86G 0 0 6.83G Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Robert E. Novak <novak5@llnl.gov> Closes: #9158 Closes #10315	2020-06-26 15:09:20 -07:00
Arvind Sankar	6b99fc0620	Fixes for make dist Reduce the usage of EXTRA_DIST. If files are conditionally included in _SOURCES, _HEADERS etc, automake is smart enough to dist all files that could possibly be included, but this does not apply to EXTRA_DIST, resulting in make dist depending on the configuration. Add some files that were missing altogether in various Makefile's. The changes to disted files in this commit (excluding deleted files): +./cmd/zed/agents/README.md +./etc/init.d/README.md +./lib/libspl/os/freebsd/getexecname.c +./lib/libspl/os/freebsd/gethostid.c +./lib/libspl/os/freebsd/getmntany.c +./lib/libspl/os/freebsd/mnttab.c -./lib/libzfs/libzfs_core.pc -./lib/libzfs/libzfs.pc +./lib/libzfs/os/freebsd/libzfs_compat.c +./lib/libzfs/os/freebsd/libzfs_fsshare.c +./lib/libzfs/os/freebsd/libzfs_ioctl_compat.c +./lib/libzfs/os/freebsd/libzfs_zmount.c +./lib/libzutil/os/freebsd/zutil_compat.c +./lib/libzutil/os/freebsd/zutil_device_path_os.c +./lib/libzutil/os/freebsd/zutil_import_os.c +./module/lua/README.zfs +./module/os/linux/spl/README.md +./tests/README.md +./tests/zfs-tests/tests/functional/cli_root/zfs_clone/zfs_clone_rm_nested.ksh +./tests/zfs-tests/tests/functional/cli_root/zfs_send/zfs_send_encrypted_unloaded.ksh +./tests/zfs-tests/tests/functional/inheritance/README.config +./tests/zfs-tests/tests/functional/inheritance/README.state +./tests/zfs-tests/tests/functional/rsend/rsend_016_neg.ksh +./tests/zfs-tests/tests/perf/fio/sequential_readwrite.fio Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu> Closes #10501	2020-06-26 14:20:02 -07:00
Arvind Sankar	5ca349f95d	Fix check for sed --in-place The test added in commit `4313a5b4c5` ("Detect if sed supports --in-place") doesn't work at least on my system (autoconf-2.69). The issue is that SED has already been found and cached before this function is evaluated, with the result that the test is completely skipped. ... checking for a sed that does not truncate output... /usr/bin/sed ... checking for sed --in-place... (cached) /usr/bin/sed The first test is executed by libtool.m4. This looks to have been around in libtool for at least 15 years or so, not sure why this was not encountered at the time of the original commit. Fix this by caching the value of the ac_inplace flag rather than the path to SED. Also use $SED and add AC_REQUIRE to ensure that we use the sed that was located by the standard configure test. Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu> Closes #10493	2020-06-24 18:19:59 -07:00
Kevin P. Fleming	f21de6883f	Add trim_finish notify script for ZED Allow users to configure notifications when TRIM operations are completed on pools. Unlike resilver_finish and scrub_finish, the trim_finish event is generated for each vdev in the pool which was trimmed, so the script will generate a notification for each one. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Kevin P. Fleming <kevin@km6g.us> Closes #10491	2020-06-24 16:57:13 -07:00
Jorgen Lundman	68301ba20e	zed additional features This commit adds two features to zed that macOS desires. The first is that when you unload the kernel module, zed would enter into a cpubusy loop calling zfs_events_next() repeatedly. We now look for ENODEV, returned by kernel, so zed can exit gracefully. Second feature is -I (idle) (alas -P persist was taken), which is for the daemon to: 1) if started without the ZFS kernel module, stick around waiting for it; 2) if the kernel module is unloaded, go back to 1. This is because daemons in macOS are started by launchctl and are expected to stick around. Currently, the busy loop only exits when errno is ENODEV. This is to ensure that functionality that upstream expects is not changed. It did not care about errors before, and it still does not (with the exception of ENODEV). However, it is probably better that all errors (ERESTART notwithstanding) exit the loop, and the issues complaining about zed taking all CPU will go away. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Jorgen Lundman <lundman@lundman.net> Closes #10476	2020-06-22 09:53:34 -07:00
Serapheim Dimitropoulos	42d8d1d66a	Remove unnecessary terminology from error-injection in ztest Rephrase error-injection comment in ztest to be more clear. Reviewed-by: Matt Ahrens <matt@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed by: Sara Hartse <sara.hartse@delphix.com> Signed-off-by: Serapheim Dimitropoulos <serapheim@delphix.com> Closes #10482	2020-06-22 09:48:36 -07:00
Andriy Gapon	a8bd6dcf87	zfs allow/unallow should work with numeric uid/gid And that should work even (especially) if there is no matching user or group name. The change is originally by Xin Li <delphij@FreeBSD.org>. Original-patch-by: Xin Li <delphij@FreeBSD.org> Reviewed-by: Yuri Pankov <yuri.pankov@nexenta.com> Reviewed-by: Andy Stormont <astormont@racktopsystems.com> Reviewed-by: Matt Ahrens <matt@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Andriy Gapon <avg@FreeBSD.org> Closes #9792 Closes #10280	2020-06-19 10:38:43 -07:00
Arvind Sankar	60356b1a21	Add include files for prototypes Include the header with prototypes in the file that provides definitions as well, to catch any mismatch between prototype and definition. Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu> Closes #10470	2020-06-18 12:21:25 -07:00
Arvind Sankar	c3fe42aabd	Remove dead code Delete unused functions. Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu> Closes #10470	2020-06-18 12:21:18 -07:00
Arvind Sankar	65c7cc49bf	Mark functions as static Mark functions used only in the same translation unit as static. This only includes functions that do not have a prototype in a header file either. Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu> Closes #10470	2020-06-18 12:20:38 -07:00
Jorgen Lundman	d553fb9b9e	Avoid adding new primitives in zpool wait zpool wait brought in sem_init() and family, which is a primitive set not previously used in Open ZFS. It also happens to be deprecated on macOS. Replace with pthread API calls. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: John Gallagher <john.gallagher@delphix.com> Signed-off-by: Jorgen Lundman <lundman@lundman.net> Closes #10468	2020-06-18 10:44:45 -07:00
Matthew Ahrens	ba54b180a5	Remove references to blacklist/whitelist These terms reinforce the incorrect notion that black is bad and white is good. Replace this language with more specific terms which are also more clear and don't rely on metaphor. Specifically: * When vdevs are specified on the command line, they are the "selected" vdevs. * Entries in /dev/ which should not be considered as possible disks are "excluded" devices. Reviewed-by: John Kennedy <john.kennedy@delphix.com> Reviewed-by: Ryan Moeller <ryan@ixsystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: George Wilson <gwilson@delphix.com> Signed-off-by: Matthew Ahrens <mahrens@delphix.com> Closes #10457	2020-06-16 11:41:45 -07:00
Matthew Ahrens	f66434268c	Remove unnecessary references to slavery The horrible effects of human slavery continue to impact society. The casual use of the term "slave" in computer software is an unnecessary reference to a painful human experience. This commit removes all possible references to the term "slave". Implementation notes: The zpool.d/slaves script is renamed to dm-deps, which uses the same terminology as `dmsetup deps`. References to the `/sys/class/block/$dev/slaves` directory remain. This directory name is determined by the Linux kernel. Although `dmsetup deps` provides the same information, it unfortunately requires elevated privileges, whereas the `/sys/...` directory is world-readable. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Matthew Ahrens <mahrens@delphix.com> Closes #10435	2020-06-10 17:07:59 -07:00
Arvind Sankar	66786f7943	Fix VPATH builds for user config cmd/zpool and lib/libzutil Makefile's use -I., which won't work with a VPATH build. Replace it with -I$(srcdir) instead. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu> Closes #10379 Closes #10421	2020-06-10 09:25:37 -07:00
Matthew Ahrens	7bcb7f0840	File incorrectly zeroed when receiving incremental stream that toggles -L Background: By increasing the recordsize property above the default of 128KB, a filesystem may have "large" blocks. By default, a send stream of such a filesystem does not contain large WRITE records, instead it decreases objects' block sizes to 128KB and splits the large blocks into 128KB blocks, allowing the large-block filesystem to be received by a system that does not support the `large_blocks` feature. A send stream generated by `zfs send -L` (or `--large-block`) preserves the large block size on the receiving system, by using large WRITE records. When receiving an incremental send stream for a filesystem with large blocks, if the send stream's -L flag was toggled, a bug is encountered in which the file's contents are incorrectly zeroed out. The contents of any blocks that were not modified by this send stream will be lost. "Toggled" means that the previous send used `-L`, but this incremental does not use `-L` (-L to no-L); or that the previous send did not use `-L`, but this incremental does use `-L` (no-L to -L). Changes: This commit addresses the problem with several changes to the semantics of zfs send/receive: 1. "-L to no-L" incrementals are rejected. If the previous send used `-L`, but this incremental does not use `-L`, the `zfs receive` will fail with this error message: incremental send stream requires -L (--large-block), to match previous receive. 2. "no-L to -L" incrementals are handled correctly, preserving the smaller (128KB) block size of any already-received files that used large blocks on the sending system but were split by `zfs send` without the `-L` flag. 3. A new send stream format flag is added, `SWITCH_TO_LARGE_BLOCKS`. This feature indicates that we can correctly handle "no-L to -L" incrementals. This flag is currently not set on any send streams. 
In the future, we intend for incremental send streams of snapshots that have large blocks to use `-L` by default, and these streams will also have the `SWITCH_TO_LARGE_BLOCKS` feature set. This ensures that streams from the default use of `zfs send` won't encounter the bug mentioned above, because they can't be received by software with the bug. Implementation notes: To facilitate accessing the ZPL's generation number, `zfs_space_delta_cb()` has been renamed to `zpl_get_file_info()` and restructured to fill in a struct with ZPL-specific info including owner and generation. In the "no-L to -L" case, if this is a compressed send stream (from `zfs send -cL`), large WRITE records that are being written to small (128KB) blocksize files need to be decompressed so that they can be written split up into multiple blocks. The zio pipeline will recompress each smaller block individually. A new test case, `send-L_toggle`, is added, which tests the "no-L to -L" case and verifies that we get an error for the "-L to no-L" case. Reviewed-by: Paul Dagnelie <pcd@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Matthew Ahrens <mahrens@delphix.com> Closes #6224 Closes #10383	2020-06-09 10:41:01 -07:00
George Amanakis	b7654bd794	Trim L2ARC The l2arc_evict() function is responsible for evicting buffers which reference the next bytes of the L2ARC device to be overwritten. Teach this function to additionally TRIM that vdev space before it is overwritten if the device has been filled with data. This is done by vdev_trim_simple() which trims by issuing a new type of TRIM, TRIM_TYPE_SIMPLE. We also implement a "Trim Ahead" feature. It is a zfs module parameter, expressed in % of the current write size. This trims ahead of the current write size. A minimum of 64MB will be trimmed. The default is 0 which disables TRIM on L2ARC as it can put significant stress on underlying storage devices. To enable TRIM on L2ARC we set l2arc_trim_ahead > 0. We also implement TRIM of the whole cache device upon addition to a pool, pool creation or when the header of the device is invalid upon importing a pool or onlining a cache device. This is dependent on l2arc_trim_ahead > 0. TRIM of the whole device is done with TRIM_TYPE_MANUAL so that its status can be monitored by zpool status -t. We save the TRIM state for the whole device and the time of completion on-disk in the header, and restore these upon L2ARC rebuild so that zpool status -t can correctly report them. Whole device TRIM is done asynchronously so that the user can export the pool or remove the cache device while it is trimming (ie if it is too slow). We do not TRIM the whole device if persistent L2ARC has been disabled by l2arc_rebuild_enabled = 0 because we may not want to lose all cached buffers (eg we may want to import the pool with l2arc_rebuild_enabled = 0 only once because of memory pressure). If persistent L2ARC has been disabled by setting the module parameter l2arc_rebuild_blocks_min_l2size to a value greater than the size of the cache device then the whole device is trimmed upon creation or import of a pool if l2arc_trim_ahead > 0. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Adam D. 
Moss <c@yotes.com> Signed-off-by: George Amanakis <gamanakis@gmail.com> Closes #9713 Closes #9789 Closes #10224	2020-06-09 10:15:08 -07:00
Brian Behlendorf	c1f3de18a4	ztest: Fix spa_open() ENOENT failures The pool may not be imported when the previous pass is terminated. In which case, spa_open() will return ENOENT to indicate the pool is not currently imported. Refactor the code slightly to handle this case by importing the pool and then retrying the spa_open(). The ztest_import() function was moved before ztest_run() and the import logic split into a small internal helper function. The ztest_freeze() function was also moved but no changes were made. Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #10407	2020-06-06 12:51:35 -07:00
Brian Behlendorf	3d93161b01	ztest: Fix ztest_run_zdb() failure It's possible for ztest to be killed while the pool is exported which results in an empty cache file. This is a valid state to test, but the validation check performed by ztest_run_zdb() depends on the pool being in the cache file. If it's not the following error is printed. zdb -bccsv -G -d -Y -U /tmp/zloop-run/zpool.cache ztest zdb: can't open '/tmp/zloop-run': No such file or directory Resolve these failures by removing the dependency on the cache file. Functionally, we only care that the pool can be imported and that the zdb verification passes. Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #10385	2020-05-29 21:14:10 -07:00
Brian Behlendorf	d1b84da8c1	Revert "Let zfs mount all tolerate in-progress mounts" This reverts commit `a9cd8bf` which introduced a segfault when running `zfs mount -a` multiple times when there are mountpoints which are not empty. This segfault is now seen frequently by the CI after the mount code was updated to directly call mount(2). The original reason this logic was added is described in #8881. Since then the systemd `zfs-share.target` has been updated to run "After" the `zfs-mount.service` which should avoid this issue. Reviewed-by: Don Brady <don.brady@delphix.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #9560 Closes #10364	2020-05-26 16:07:50 -07:00
felixdoerre	501a1511ae	mount: use the mount syscall directly Allow zfs datasets to be mounted on Linux without relying on the invocation of an external process. This is the same behavior which is implemented for FreeBSD. Use of the libmount library was originally considered because it provides functionality to properly lock and update the /etc/mtab file. However, these days /etc/mtab is typically a symlink to /proc/self/mounts so there's nothing to update. Therefore, we call mount(2) directly and avoid any additional dependencies. If required the legacy behavior can be enabled by setting the ZFS_MOUNT_HELPER environment variable. This may be needed in environments where SELinux is enabled and the zfs binary does not have mount permission. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Felix Dörre <felix@dogcraft.de> #10294	2020-05-20 18:02:41 -07:00
Paul Dagnelie	de4f06c275	Small program that converts a dataset id and an object id to a path Small program that converts a dataset id and an object id to a path Reviewed-by: Prakash Surya <prakash.surya@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Signed-off-by: Paul Dagnelie <pcd@delphix.com> Closes #10204	2020-05-20 10:05:33 -07:00
George Amanakis	7cd723e685	Fix gcc 10.1 stringop-truncation error As we do not expect the destination of these strncpy calls to be NULL terminated, substitute them with memcpy. Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: George Amanakis <gamanakis@gmail.com> Closes #10346	2020-05-19 14:24:10 -07:00
AJ Jordan	b29e31d80d	Fix outdated comment header Reviewed-by: Richard Laager <rlaager@wiktel.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: AJ Jordan <alex@strugee.net> Closes #10288	2020-05-11 16:23:16 -07:00
AJ Jordan	2b21da4f76	Fix inconsistent capitalization in `arcstat -v` Reviewed-by: Richard Laager <rlaager@wiktel.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: AJ Jordan <alex@strugee.net> Closes #10288	2020-05-11 16:21:08 -07:00
Petros Koutoupis	bd95f00d4b	Fixed LDADD library links in Makefiles for cross compilation builds When building on native dev system, there are no issues but when cross-compiling for target system, some linker errors are observed. The only way to avoid these errors is by adjusting the Makefile.am of those various components to add the library dependencies. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: Petros Koutoupis <petros@petroskoutoupis.com> Closes #10304	2020-05-09 10:17:08 -07:00
George Amanakis	657fd33bcf	Improvements on persistent L2ARC Functional changes: We implement refcounts of log blocks and their aligned size on the cache device along with two corresponding arcstats. The refcounts are reflected in the header of the device and provide valuable information as to whether log blocks are accounted for correctly. These are dynamically adjusted as log blocks are committed/evicted. zdb also uses this information in the device header and compares it to the corresponding values as reported by dump_l2arc_log_blocks() which emulates l2arc_rebuild(). If the refcounts saved in the device header report higher values, zdb exits with an error. For this feature to work correctly there should be no active writes on the device. This is also employed in the tests of persistent L2ARC. We extend the structure of the cache device header by adding the two new variables mirroring the refcounts after the existing variables to preserve backward compatibility in terms of persistent L2ARC. 1) a new arcstat "l2_log_blk_asize" and refcount "l2ad_lb_asize" which reflect the total aligned size of log blocks on the device. This is also reflected in the header of the cache device as "dh_lb_asize". 2) a new arcstat "l2arc_log_blk_count" and refcount "l2ad_lb_count" which reflect the total number of L2ARC log blocks present on cache devices. It is also reflected in the header of the cache device as "dh_lb_count". In l2arc_rebuild_vdev() if the amount of committed log entries in a log block is 0 and the device header is valid we update the device header. This will facilitate trimming of the whole device in this case when TRIM for L2ARC is implemented. Improve loop protection in l2arc_rebuild() by using the starting offset of the payload of each log block instead of the starting offset of the log block. If the zio in l2arc_write_buffers() fails, restore the lbps array in the header of the device to its previous state in l2arc_write_done(). 
If l2arc_rebuild() ends the rebuild process without restoring any L2ARC log blocks in ARC and without any other error, this means that the lbps array in the header is pointing to non-existent or invalid log blocks. Reset the device header in this case. In l2arc_rebuild() change the zfs_dbgmsg messages to spa_history_log_internal() making them user visible with zpool history command. Non-functional changes: Make the first test in persistent L2ARC use `zdb -lll` to increase coverage in `zdb.c`. Rename psize with asize when referring to log blocks, since L2ARC_SET_PSIZE stores the vdev aligned size for log blocks. Also rename dh_log_blk_entries to dh_log_entries to make it clear that it is a mirror of l2ad_log_entries. Added comments for both changes. Fix inaccurate comments for example in l2arc_log_blk_restore(). Add asserts at the end in l2arc_evict() and l2arc_write_buffers(). Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: George Amanakis <gamanakis@gmail.com> Closes #10228	2020-05-07 16:34:03 -07:00
Paul Dagnelie	108a454a46	Add support for boot environment data to be stored in the label Modern bootloaders leverage data stored in the root filesystem to enable some of their powerful features. GRUB specifically has a grubenv file which can store large amounts of configuration data that can be read and written at boot time and during normal operation. This allows sysadmins to configure useful features like automated failover after failed boot attempts. Unfortunately, due to the Copy-on-Write nature of ZFS, the standard behavior of these tools cannot handle writing to ZFS files safely at boot time. We need an alternative way to store data that allows the bootloader to make changes to the data. This work is very similar to work that was done on Illumos to enable similar functionality in the FreeBSD bootloader. This patch is different in that the data being stored is a raw grubenv file; this file can store arbitrary variables and values, and the scripting provided by grub is powerful enough that special structures are not required to implement advanced behavior. We repurpose the second padding area in each label to store the grubenv file, protected by an embedded checksum. We add two ioctls to get and set this data, and libzfs_core and libzfs functions to access them more easily. There are no direct command line interfaces to these functions; these will be added directly to the bootloader utilities. Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com> Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Paul Dagnelie <pcd@delphix.com> Closes #10009	2020-05-07 09:36:33 -07:00
Philip Pokorny	a36bad1759	Fix column width calculation issue with certain terminal widths If the reported terminal width is 0 or less than 42, the signed variable width was set to a negative number that was then assigned to the unsigned column width becoming a huge number. Add comments and change logic to better explain what's happening. Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Philip Pokorny <ppokorny@mindspring.com> Closes #10247	2020-05-06 17:17:38 -07:00
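The signed-to-unsigned wraparound described in the commit above can be illustrated with a small Python sketch. The helper names, the 32-bit mask, and the subtraction are assumptions made for illustration; only the threshold of 42 comes from the commit message, and the actual C code in the commit may compute the width differently.

```python
RESERVED_COLUMNS = 42  # threshold named in the commit message


def as_unsigned32(value: int) -> int:
    """Emulate assigning a signed int to a 32-bit unsigned variable."""
    return value & 0xFFFFFFFF


def column_width_wrapping(terminal_width: int) -> int:
    """Hypothetical pre-fix behavior: a negative width wraps to a huge number."""
    return as_unsigned32(terminal_width - RESERVED_COLUMNS)


def column_width_clamped(terminal_width: int, minimum: int = 1) -> int:
    """Hypothetical guard: never let the width drop below a small minimum."""
    return max(terminal_width - RESERVED_COLUMNS, minimum)
```

With a reported terminal width of 0, the wrapping variant yields a value just below 2^32, while the clamped variant stays at the chosen minimum.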
Ryan Moeller	154e48eac9	zdb: Fix ignored zfs_arc_max tuning Running zdb -l $disk shows a warning that zfs_arc_max is being ignored. zdb sets zfs_arc_max below zfs_arc_min, which causes the value to be ignored by arc_tuning_update(). Set zfs_arc_min to the bare minimum in zdb, which is below zfs_arc_max. Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Allan Jude <allanjude@freebsd.org> Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #10269	2020-04-30 17:48:58 -07:00
alex	47c9299fcc	zfs_create: round up volume size to multiple of bs Round up the volume size requested in `zfs create -V size` to the next higher multiple of the volblocksize. Updates the man page and adds a test to verify the new behavior. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reported-by: puffi <puffi@users.noreply.github.com> Signed-off-by: Alex John <alex@stty.io> Closes #8541 Closes #10196	2020-04-24 19:04:34 -07:00
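The rounding rule in the commit above (round the requested size up to the next multiple of the volblocksize) amounts to the following arithmetic. This is a minimal sketch; the function name is hypothetical and is not the name used in the ZFS sources.

```python
def round_up_to_block(size_bytes: int, volblocksize: int) -> int:
    """Round size_bytes up to the next multiple of volblocksize."""
    if volblocksize <= 0:
        raise ValueError("volblocksize must be positive")
    remainder = size_bytes % volblocksize
    if remainder == 0:
        return size_bytes  # already aligned, nothing to do
    return size_bytes + (volblocksize - remainder)
```

For example, a request of 10000 bytes with an 8192-byte block size would become 16384 bytes, while a request that is already a multiple of the block size is left unchanged.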
Brian Behlendorf	6de3e59bdd	Fix uninitialized variable in `zstream redup` command Fix uninitialized variable in `zstream redup` command. The compiler may determine the 'stream_offset' variable can be uninitialized because not all rdt_lookup() exit paths set it. This should never happen in practice as documented by the assert, but initialize it regardless to resolve the warning. Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #10241 Closes #10244	2020-04-23 15:54:38 -07:00
Matthew Ahrens	196bee4cfd	Remove deduplicated send/receive code Deduplicated send streams (i.e. `zfs send -D` and `zfs receive` of such streams) are deprecated. Deduplicated send streams can be received by first converting them to non-deduplicated with the `zstream redup` command. This commit removes the code for sending and receiving deduplicated send streams. `zfs send -D` will now print a warning, ignore the `-D` flag, and generate a regular (non-deduplicated) send stream. `zfs receive` of a deduplicated send stream will print an error message and fail. The resulting code simplification (especially in the kernel's support for receiving dedup streams) should help enable future performance enhancements. Several new tests are added which leverage `zstream redup`. Reviewed-by: Paul Dagnelie <pcd@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Matthew Ahrens <mahrens@delphix.com> Issue #7887 Issue #10117 Issue #10156 Closes #10212	2020-04-23 10:06:57 -07:00
Niklas Haas	a84c92f933	Don't attempt trimming "hole" vdevs On zpools containing hole vdevs (e.g. removed log devices), the `zpool trim` (and presumably `zpool initialize`) commands will attempt calling their respective functions on "hole", which fails, as this is not a real vdev. Avoid this by removing HOLE vdevs in zpool_collect_leaves. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: Niklas Haas <git@haasn.xyz> Closes #10227	2020-04-21 09:29:31 -07:00
George Amanakis	9249f1272e	Persistent L2ARC minor fixes Minor fixes on persistent L2ARC improving code readability and fixing a typo in zdb.c when byte-swapping a log block. It also improves the pesist_l2arc_007_pos.ksh test by giving it more time to retrieve log blocks on the cache device. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Adam D. Moss <c@yotes.com> Signed-off-by: George Amanakis <gamanakis@gmail.com> Closes #10210	2020-04-17 09:27:40 -07:00
Ryan Moeller	813c8564ee	Fix SC2086 note in zpool.d/smart ./cmd/zpool/zpool.d/smart:78:32: note: Double quote to prevent globbing and word splitting. [SC2086] Reported by latest shellcheck on FreeBSD. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #10194	2020-04-14 13:18:23 -07:00
Matthew Macy	9f0a21e641	Add FreeBSD support to OpenZFS Add the FreeBSD platform code to the OpenZFS repository. As of this commit the source can be compiled and tested on FreeBSD 11 and 12. Subsequent commits are now required to compile on FreeBSD and Linux. Additionally, they must pass the ZFS Test Suite on FreeBSD which is being run by the CI. As of this commit 1230 tests pass on FreeBSD and there are no unexpected failures. Reviewed-by: Sean Eric Fagan <sef@ixsystems.com> Reviewed-by: Jorgen Lundman <lundman@lundman.net> Reviewed-by: Richard Laager <rlaager@wiktel.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Co-authored-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Matt Macy <mmacy@FreeBSD.org> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #898 Closes #8987	2020-04-14 11:36:28 -07:00
Joao Carlos Mendes Luis	75c62019f3	Fix allocation errors, detected using ASAN The test for VDEV_TYPE_INDIRECT is done after a memory allocation, and could return from function without freeing it. Since we don't need that allocation yet, just postpone it. Add a missing free() when buffer is no longer needed. Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: João Carlos Mendes Luís <jonny@jonny.eng.br> Closes #10193	2020-04-13 10:54:41 -07:00
Brian Behlendorf	8080848254	Minor `zstream redup` command fixes * Fix uninitialized variable in `zstream redup` command. The 'rdt.ddt_count' variable is uninitialized because it was allocated from the stack and not globally. Initialize it. This was reported by gcc when compiling with debugging enabled. zstream_redup.c:157:16: error: 'rdt.ddt_count' may be used uninitialized in this function [-Werror=maybe-uninitialized] * Remove the cmd/zstreamdump/.gitignore file. It's no longer needed now that the zstreamdump command is a script. Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #10192	2020-04-10 21:10:09 -07:00
Matthew Ahrens	c618f87cd2	Add `zstream redup` command to convert deduplicated send streams Deduplicated send and receive is deprecated. To ease migration to the new dedup-send-less world, the commit adds a `zstream redup` utility to convert deduplicated send streams to normal streams, so that they can continue to be received indefinitely. The new `zstream` command also replaces the functionality of `zstreamdump`, by way of the `zstream dump` subcommand. The `zstreamdump` command is replaced by a shell script which invokes `zstream dump`. The way that `zstream redup` works under the hood is that as we read the send stream, we build up a hash table which maps from `<GUID, object, offset> -> <file_offset>`. Whenever we see a WRITE record, we add a new entry to the hash table, which indicates where in the stream file to find the WRITE record for this block. (The key is `drr_toguid, drr_object, drr_offset`.) For entries other than WRITE_BYREF, we pass them through unchanged (except for the running checksum, which is recalculated). For WRITE_BYREF records, we change them to WRITE records. We find the referenced WRITE record by looking in the hash table (for the record with key `drr_refguid, drr_refobject, drr_refoffset`), and then reading the record header and payload from the specified offset in the stream file. This is why the stream can not be a pipe. The found WRITE record replaces the WRITE_BYREF record, with its `drr_toguid`, `drr_object`, and `drr_offset` fields changed to be the same as the WRITE_BYREF's (i.e. we are writing the same logical block, but with the data supplied by the previous WRITE record). This algorithm requires memory proportional to the number of WRITE records (same as `zfs send -D`), but the size per WRITE record is relatively low (40 bytes, vs. 72 for `zfs send -D`). A 1TB send stream with 8KB blocks (`recordsize=8k`) would use around 5GB of RAM to "redup". 
Reviewed-by: Jorgen Lundman <lundman@lundman.net> Reviewed-by: Paul Dagnelie <pcd@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Matthew Ahrens <mahrens@delphix.com> Closes #10124 Closes #10156	2020-04-10 10:39:55 -07:00
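The `zstream redup` pass described in the commit above can be sketched in Python. This is a simplified model under stated assumptions: records are in-memory objects rather than bytes in a stream file, a list index stands in for the file offset, and the running checksum is omitted. The `Record` class and both function names are hypothetical; only the `drr_*` field meanings and the 40-byte per-record figure come from the commit message.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Record:
    kind: str            # "WRITE", "WRITE_BYREF", or any other record type
    toguid: int = 0      # models drr_toguid
    obj: int = 0         # models drr_object
    offset: int = 0      # models drr_offset
    payload: bytes = b""
    refguid: int = 0     # models drr_refguid (WRITE_BYREF only)
    refobj: int = 0      # models drr_refobject (WRITE_BYREF only)
    refoffset: int = 0   # models drr_refoffset (WRITE_BYREF only)


def redup(stream: List[Record]) -> List[Record]:
    """Turn WRITE_BYREF records back into WRITE records.

    The stream must support random access (a list here), which mirrors
    why the real input cannot be a pipe.
    """
    table: Dict[Tuple[int, int, int], int] = {}  # key -> position in stream
    out: List[Record] = []
    for pos, rec in enumerate(stream):
        if rec.kind == "WRITE":
            table[(rec.toguid, rec.obj, rec.offset)] = pos
            out.append(rec)
        elif rec.kind == "WRITE_BYREF":
            src = stream[table[(rec.refguid, rec.refobj, rec.refoffset)]]
            # Reuse the earlier payload, but address the BYREF's own block.
            out.append(replace(src, toguid=rec.toguid, obj=rec.obj,
                               offset=rec.offset))
        else:
            out.append(rec)  # passed through unchanged
    return out


def estimate_table_bytes(stream_bytes: int, block_bytes: int,
                         bytes_per_entry: int = 40) -> int:
    """Rough table size: one entry per WRITE record, 40 bytes each."""
    return (stream_bytes // block_bytes) * bytes_per_entry
```

The estimate reproduces the figure quoted in the commit: a 1 TiB stream of 8 KiB blocks has 2^27 WRITE records, and at 40 bytes each the table needs 5 GiB.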
George Amanakis	77f6826b83	Persistent L2ARC This commit makes the L2ARC persistent across reboots. We implement a light-weight persistent L2ARC metadata structure that allows L2ARC contents to be recovered after a reboot. This significantly eases the impact a reboot has on read performance on systems with large caches. Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: George Wilson <gwilson@delphix.com> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Co-authored-by: Saso Kiselkov <skiselkov@gmail.com> Co-authored-by: Jorgen Lundman <lundman@lundman.net> Co-authored-by: George Amanakis <gamanakis@gmail.com> Ported-by: Yuxuan Shui <yshuiv7@gmail.com> Signed-off-by: George Amanakis <gamanakis@gmail.com> Closes #925 Closes #1823 Closes #2672 Closes #3744 Closes #9582	2020-04-10 10:33:35 -07:00
Paul Dagnelie	5a42ef04fd	Add 'zfs wait' command Add a mechanism to wait for delete queue to drain. When doing redacted send/recv, many workflows involve deleting files that contain sensitive data. Because of the way zfs handles file deletions, snapshots taken quickly after a rm operation can sometimes still contain the file in question, especially if the file is very large. This can result in issues for redacted send/recv users who expect the deleted files to be redacted in the send streams, and not appear in their clones. This change duplicates much of the zpool wait related logic into a zfs wait command, which can be used to wait until the internal deleteq has been drained. Additional wait activities may be added in the future. Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: John Gallagher <john.gallagher@delphix.com> Signed-off-by: Paul Dagnelie <pcd@delphix.com> Closes #9707	2020-04-01 10:02:06 -07:00
alex	1d2ddb9bb9	zfs_get: change time format string from %k to %H Issue #10090 reported that snapshots created between midnight and 1 AM are missing a padded zero in the creation property This change fixes the bug reported in issue #10090 where snapshots created between midnight and 1 AM were missing a padded zero in the creation timestamp output. The leading zero was missing because the time format string used `%k` which formats the hour as a decimal number from 0 to 23 where single digits are preceded by blanks[0] and is fixed by changing it to `%H` which formats the hour as 00-23. The difference in output is as below ``` -Thu Mar 26 0:39 2020 +Thu Mar 26 00:39 2020 ``` Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Igor Kozhukhov <igor@dilos.org> Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: Alex John <alex@stty.io> Closes #10090 Closes #10153	2020-03-26 08:28:22 -07:00
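The padding difference fixed by the commit above can be reproduced in Python. Because `%k` is not a portable `strftime` directive, this sketch emulates its blank padding with an f-string and uses the standard `%H` for the zero-padded form; both helper names are hypothetical.

```python
from datetime import datetime


def hour_blank_padded(ts: datetime) -> str:
    """Emulates %k: single-digit hours are preceded by a blank."""
    return f"{ts.hour:2d}:{ts.minute:02d}"


def hour_zero_padded(ts: datetime) -> str:
    """Uses %H: hours are always two digits, 00 through 23."""
    return ts.strftime("%H:%M")
```

For a timestamp of 00:39, the first helper produces a leading blank where the second produces a leading zero, which matches the before and after output shown in the commit message.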
Matthew Ahrens	652bdc9b0e	Deprecate deduplicated send streams Dedup send can only deduplicate over the set of blocks in the send command being invoked, and it does not take advantage of the dedup table to do so. This is a very common misconception among not only users, but developers, and makes the feature seem more useful than it is. As a result, many users are using the feature but not getting any benefit from it. Dedup send requires a nontrivial expenditure of memory and CPU to operate, especially if the dataset(s) being sent is (are) not already using a dedup-strength checksum. Dedup send adds developer burden. It expands the test matrix when developing new features, causing bugs in released code, and delaying development efforts by forcing more testing to be done. As a result, we are deprecating the use of `zfs send -D` and receiving of such streams. This change adds a warning to the man page, and also prints the warning whenever dedup send or receive are used. In a future release, we plan to: 1. remove the kernel code for generating deduplicated streams 2. make `zfs send -D` generate regular, non-deduplicated streams 3. remove the kernel code for receiving deduplicated streams 4. make `zfs receive` of deduplicated streams process them in userland to "re-duplicate" them, so that they can still be received. Reviewed-by: Paul Dagnelie <pcd@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: Matthew Ahrens <mahrens@delphix.com> Closes #7887 Closes #10117	2020-03-18 13:31:10 -07:00
Ryan Moeller	22df2457a7	Avoid core dump on invalid redaction bookmark libzfs aborts and dumps core on EINVAL from the kernel when trying to do a redacted send with a bookmark that is not a redaction bookmark. Move redacted bookmark validation into libzfs. Check if the bookmark given for redactions is actually a redaction bookmark. Print an error message and exit gracefully if it is not. Don't abort on EINVAL in zfs_send_one. Reviewed-by: Paul Dagnelie <pcd@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #10138	2020-03-18 12:54:12 -07:00
Avatat	4df8b2c373	Changed decimals to integers in the arcstat script Changed interval value type from decimal to integer, because of deprecation warning in Python 3.8 and above. Also changed kstat values type from decimal to integer, because all the values are integers. Fixed behavior of arcstat when run without args. Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Bartosz Zieba <bartosz@zieba.pro> Closes #10132 Closes #10142	2020-03-18 11:50:45 -07:00
Mariusz Zaborski	a57d3d45d6	Add option for forcible unmounting dataset while receiving snapshot. Currently when the dataset is in use we can't receive snapshots. zfs send test/1@asd | zfs recv -FM test/2 cannot unmount '/test/2': Device busy This commit adds option 'M' which attempts to forcibly unmount the dataset. Thanks to this we can enforce receiving snapshots in a single step. Note that this functionality is not supported on Linux because the VFS will prevent active mounted filesystems from being unmounted, even with the force option. This is the intended VFS behavior. Test cases were added to verify the expected behavior based on the platform. Discussed-with: Pawel Jakub Dawidek <pjd@FreeBSD.org> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Allan Jude <allanjude@freebsd.org> External-issue: https://reviews.freebsd.org/D22306 Closes #9904	2020-03-17 10:08:32 -07:00
Ryan Moeller	4d32abaa87	libzfs: Fix bounds checks for float parsing UINT64_MAX is not exactly representable as a double. The closest representation is UINT64_MAX + 1, so we can use a >= comparison instead of > for the bounds check. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #10127	2020-03-16 11:56:29 -07:00
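The rounding behavior behind the commit above is easy to check in Python, whose `float` is an IEEE 754 double. This is an illustrative sketch only; `fits_in_uint64` is a hypothetical helper and not the libzfs function.

```python
UINT64_MAX = 2**64 - 1


def fits_in_uint64(value: float) -> bool:
    """Accept only doubles that convert to a valid 64-bit unsigned integer.

    float(UINT64_MAX) rounds up to 2**64, so the upper bound has to be
    exclusive: a value equal to it is already out of range.
    """
    return 0.0 <= value < float(UINT64_MAX)
```

Here `float(UINT64_MAX) == 2.0**64` holds, so an inclusive upper bound would let a value one past the maximum through. A NaN input is rejected as well, because every comparison against NaN is false.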
Brian Behlendorf	2288d41968	Add trim support to zpool wait Manual trims fall into the category of long-running pool activities which people might want to wait synchronously for. This change adds support to 'zpool wait' for waiting for manual trim operations to complete. It also adds a '-w' flag to 'zpool trim' which can be used to turn 'zpool trim' into a synchronous operation. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com> Signed-off-by: John Gallagher <john.gallagher@delphix.com> Closes #10071	2020-03-04 15:07:11 -08:00
Matthew Ahrens	9cdf7b1f6b	Improve zfs destroy performance with zio_t-free zio_free() When "zfs destroy" is run, it completes quickly, and in the background we locate the blocks to free and free them. This background activity can be observed with `zpool get freeing` and `zpool wait -t free ...`. This background activity is processed by a single thread (the spa_sync thread) which calls zio_free() on each of the blocks to free. With even modest storage performance, the CPU consumption of zio_free() can be the performance bottleneck. Performance of zio_free() can be improved by not actually creating a zio_t in the common case (non-dedup, non-gang), instead calling metaslab_free() directly. This avoids the CPU cost of allocating the zio_t, and more importantly the cost of adding and later removing this zio_t from the parent zio's child list. The result is that performance of background freeing more than doubles, from 0.6 million blocks per second to 1.3 million blocks per second. Reviewed-by: Paul Dagnelie <pcd@delphix.com> Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: George Wilson <gwilson@delphix.com> Signed-off-by: Matthew Ahrens <mahrens@delphix.com> Closes #10034	2020-02-28 14:49:44 -08:00
Ryan Moeller	2ce90dca91	arc_summary: Make get_descriptions per platform Linux uses modinfo to get tunables descriptions, FreeBSD has to use sysctl. Move the existing function definition so it is defined that way on Linux, and add a definition in terms of sysctl for FreeBSD. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #10062	2020-02-27 17:15:06 -08:00
Ryan Moeller	a33cb7e01a	Add missing newline after zfs redact help message Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #10045	2020-02-25 16:20:50 -08:00
InsanePrawn	ecbbdac799	Systemd mount generator: Generate noauto units; add control properties This commit refactors the systemd mount generators and makes the following major changes: - The generator now generates units for datasets marked canmount=noauto, too. These units are NOT WantedBy local-fs.target. If there are multiple noauto datasets for a path, no noauto unit will be created. Datasets with canmount=on are prioritized. - Introduces handling of new user properties which are now included in the zfs-list.cache files: - org.openzfs.systemd:requires: List of units to require for this mount unit - org.openzfs.systemd:requires-mounts-for: List of mounts to require by this mount unit - org.openzfs.systemd:before: List of units to order after this mount unit - org.openzfs.systemd:after: List of units to order before this mount unit - org.openzfs.systemd:wanted-by: List of units to add a Wants dependency on this mount unit to - org.openzfs.systemd:required-by: List of units to add a Requires dependency on this mount unit to - org.openzfs.systemd:nofail: Toggles between a wants and a requires dependency. - org.openzfs.systemd:ignore: Do not generate a mount unit for this dataset. Consult the updated man page for detailed documentation. - Restructures and extends the zfs-mount-generator(8) man page with the above properties, information on unit ordering and a license header. Reviewed-by: Richard Laager <rlaager@wiktel.com> Reviewed-by: Antonio Russo <antonio.e.russo@gmail.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: InsanePrawn <insane.prawny@gmail.com> Closes #9649	2020-02-14 15:32:55 -08:00
Ryan Moeller	0f1832106d	Make zpool.d/iostat work on FreeBSD There are slight differences in the iostat commands between FreeBSD and Linux. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #9979	2020-02-14 08:37:40 -08:00
Matthew Ahrens	f49b7a0d8e	fix zstreamdump -C zstreamdump -C always fails. It is not calculating the checksums, but it's still trying to verify that the (non-calculated) checksum matches the one stored in the send stream. This change makes zstreamdump -C not verify checksums. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Matthew Ahrens <mahrens@delphix.com> Closes #9983	2020-02-13 11:24:57 -08:00
Justin Keogh	12f7b90c93	zdb: Always print symlink target When zdb is printing paths, also print the symlink target if it exists. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Matt Ahrens <matt@delphix.com> Signed-off-by: Justin Keogh <commits@v6y.net> Closes #9925	2020-02-12 11:36:05 -08:00
Christian Schwarz	a73f361fdb	Implement bookmark copying This feature allows copying existing bookmarks using zfs bookmark fs#target fs#newbookmark There are some niche use cases for such functionality, e.g. when using bookmarks as markers for replication progress. Copying redaction bookmarks produces a normal bookmark that cannot be used for redacted send (we are not duplicating the redaction object). ZCP support for bookmarking (both creation and copying) will be implemented in a separate patch based on this work. Overview: - Terminology: - source = existing snapshot or bookmark - new/bmark = new bookmark - Implement bookmark copying in `dsl_bookmark.c` - create new bookmark node - copy source's `zbn_phys` to new's `zbn_phys` - zero-out redaction object id in copy - Extend existing bookmark ioctl nvlist schema to accept bookmarks as sources - => `dsl_bookmark_create_nvl_validate` is authoritative - use `dsl_dataset_is_before` check for both snapshot and bookmark sources - Adjust CLI - refactor shortname expansion logic in `zfs_do_bookmark` - Update man pages - warn about redaction bookmark handling - Add test cases - CLI - pyzfs libzfs_core bindings Reviewed-by: Matt Ahrens <matt@delphix.com> Reviewed-by: Paul Dagnelie <pcd@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Christian Schwarz <me@cschwarz.com> Closes #9571	2020-02-11 13:19:12 -08:00
Paul Zuchowski	bc67cba7c0	Fix zdb -R with 'b' flag zdb -R :b fails due to the indirect block being compressed, and the 'b' and 'd' flag not working in tandem when specified. Fix the flag parsing code and create a zfs test for zdb -R block display. Also fix the zio flags where the dotted notation for the vdev portion of DVA (i.e. 0.0:offset:length) fails. Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Paul Zuchowski <pzuchowski@datto.com> Closes #9640 Closes #9729	2020-02-10 14:00:05 -08:00
Graham Christensen	dda702fd16	bash scripts: use /usr/bin/env for bash shebangs Not all systems / distros have a `/bin/bash`, and these scripts are more difficult to run at development time. For example, my system is NixOS which doesn't have a /bin/bash. This is not a problem for NixOS building ZFS as a package: the build environment automatically replaces these shebangs with corrected paths. The problem is much more annoying at development time: either the scripts don't run, or I correct them for my local machine and deal with a perpetually dirty work tree. Before committing this patch I confirmed there are existing scripts which use `/usr/bin/env` to locate bash, so I am thinking this is a safe transformation. There are a handful of other shebangs in this repository which don't work on my system. This patch is useful on its own specifically for `commitcheck.sh`, otherwise I can't validate my commits before submission. Here are the remaining shebangs which NixOS systems won't have: 1274 #!/bin/ksh -p 91 #!/bin/ksh 89 #! /bin/ksh -p 2 #!/bin/sed -f 1 #!/usr/bin/perl -w 1 #!/usr/bin/ksh 1 #!/bin/nawk -f plus this which will create an invalid shebang in `tests/zfs-tests/tests/functional/mv_files/mv_files_common.kshlib`: echo "#!/bin/ksh" > $TEST_BASE_DIR/exitsZero.ksh I chose to leave those alone for now, and gauge the interest in this much smaller patch first. The fixes for these are easy enough by simply using `/usr/bin/env ksh`: 91 #!/bin/ksh 1 #!/usr/bin/ksh The fix for the other set is much trickier. Quoting the GNU coreutils manual: Most operating systems (e.g. GNU/Linux, BSDs) treat all text after the first space as a single argument. When using env in a script it is thus not possible to specify multiple arguments. and not all `env`'s support arguments. 
Mine (GNU Coreutils 8.31) does, though this feature is new since April 2018, GNU Coreutils 8.30: https://git.savannah.gnu.org/cgit/coreutils.git/commit/?id=668306ed86c8c79b0af0db8b9c882654ebb66db2 and worse, requires the -S argument: -S, --split-string=S process and split S into separate arguments; used to pass multiple arguments on shebang lines Example: $ seq 1 2 | $(nix-build '<nixpkgs>' -A coreutils)/bin/env "sort -nr" /nix/[...]-coreutils-8.31/bin/env: ‘sort -nr’: No such file or directory /nix/[...]-coreutils-8.31/bin/env: use -[v]S to pass options in shebang lines $ seq 1 2 | $(nix-build '<nixpkgs>' -A coreutils)/bin/env "-S sort -nr" 2 1 GNU Coreutils says FreeBSD's `env` does, though I wonder if FreeBSD's would be unhappy with the `-S`: https://www.gnu.org/software/coreutils/manual/html_node/env-invocation.html#env-invocation BusyBox v1.30.1 does not, and does not have a `-S`-like option: $ seq 1 2 | $(nix-build '<nixpkgs>' -A busybox)/bin/env "sort -nr" env: can't execute 'sort -nr': No such file or directory Toybox 0.8.1 also does not, and also does not have a `-S` option: $ seq 1 2 | $(nix-build '<nixpkgs>' -A toybox)/bin/env "sort -nr" env: exec sort -nr: No such file or directory --- At any rate, if this patch merges and the remaining ~1,500 are updated, the much larger patch should probably include a checkstyle-like test asserting all new shebangs use `/usr/bin/env`. I also don't mind dealing with NixOS weirdness if the project would prefer that. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Graham Christensen <graham@grahamc.com> Closes #9893	2020-02-10 13:13:46 -08:00
Romain Dolbeau	af09c050e9	Fix static data to link with -fno-common -fno-common is the new default in GCC 10, replacing -fcommon in GCC <= 9, so static data must only be allocated once. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Romain Dolbeau <romain.dolbeau@european-processor-initiative.eu> Closes #9943	2020-02-06 09:25:29 -08:00
Ned Bass	a3403164d7	zdb: add support for object ranges for zdb -d Allow a range of object identifiers to dump with -d. This may be useful when dumping a large dataset and you want to break it up into multiple phases, or to resume where a previous scan left off. Object type selection flags are supported to reduce the performance overhead of verbosely dumping unwanted objects, and to reduce the amount of post-processing work needed to filter out unwanted objects from zdb output. This change extends existing syntax in a backward-compatible way. That is, the base case of a range is to specify a single object identifier to dump. Ranges and object identifiers can be intermixed as command line parameters. Usage synopsis: Object ranges take the form <start>:<end>[:<flags>] start Starting object number end Ending object number, or -1 for no upper bound flags Optional flags to select object types: A All objects (this is the default) d ZFS directories f ZFS files m SPA space maps z ZAPs - Negate effect of next flag Examples: # Dump all file objects zdb -dd tank/fish 0:-1:f # Dump all file and directory objects zdb -dd tank/fish 0:-1:fd # Dump all types except file and directory objects zdb -dd tank/fish 0:-1:A-f-d # Dump object IDs in a specific range zdb -dd tank/fish 1000:2000 Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@ixsystems.com> Reviewed-by: Paul Zuchowski <pzuchowski@datto.com> Signed-off-by: Ned Bass <bass6@llnl.gov> Closes #9832	2020-01-24 11:00:46 -08:00
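The `<start>:<end>[:<flags>]` syntax from the usage synopsis above can be modeled with a short Python parser. This is a sketch of the documented grammar only, not of zdb's implementation: the `ObjectRange` class, the include/exclude representation, and the choice to default to `A` when only negated flags are given are all assumptions.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional

VALID_FLAGS = frozenset("Adfmz")  # flag letters listed in the synopsis


@dataclass(frozen=True)
class ObjectRange:
    start: int
    end: Optional[int]        # None models "-1", i.e. no upper bound
    include: FrozenSet[str]   # flags selected
    exclude: FrozenSet[str]   # flags negated with a leading "-"


def parse_object_range(arg: str) -> ObjectRange:
    parts = arg.split(":")
    if len(parts) == 1:
        # Base case: a single object identifier.
        obj = int(parts[0])
        return ObjectRange(obj, obj, frozenset("A"), frozenset())
    if len(parts) > 3:
        raise ValueError(f"too many fields in range: {arg!r}")
    start = int(parts[0])
    end_raw = int(parts[1])
    end = None if end_raw == -1 else end_raw
    include, exclude = set(), set()
    negate = False
    for ch in (parts[2] if len(parts) == 3 else ""):
        if ch == "-":
            negate = True     # applies to the next flag only
            continue
        if ch not in VALID_FLAGS:
            raise ValueError(f"unknown flag {ch!r} in {arg!r}")
        (exclude if negate else include).add(ch)
        negate = False
    if negate:
        raise ValueError(f"dangling '-' in {arg!r}")
    if not include:
        include.add("A")      # assumption: all objects unless narrowed
    return ObjectRange(start, end, frozenset(include), frozenset(exclude))
```

Parsing `0:-1:A-f-d` with this sketch gives an unbounded range starting at 0 that includes all objects and excludes files and directories, matching the third example in the commit message.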
Romain Dolbeau	35b07497c6	Add AltiVec RAID-Z Implements the RAID-Z function using AltiVec SIMD. This is basically the NEON code translated to AltiVec. Note that the 'fletcher' algorithm requires 64-bit operations, and the initial implementations of AltiVec (PPC74xx a.k.a. G4, PPC970 a.k.a. G5) only have up to 32-bit operations, so no 'fletcher'. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Romain Dolbeau <romain.dolbeau@european-processor-initiative.eu> Closes #9539	2020-01-23 11:01:24 -08:00

1058 Commits