Compare commits

..

1927 Commits

Author SHA1 Message Date
Tony Hutter af88d47f1e Tag zfs-2.1.2
META file and changelog updated.

Signed-off-by: Tony Hutter <hutter2@llnl.gov>
2021-12-13 15:00:39 -08:00
Till Maas 24221589dd zfs-dkms rpm: Fix scriptlets dependencies
To ensure that the necessary packages are available during the %post and
%preun scriptlets, require them properly.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Till Maas <opensource@till.name>
Closes #12822
Closes #12832
2021-12-13 13:23:48 -08:00
Ryan Moeller def73c0735 FreeBSD: Add vop_standard_writecount_nomsync
https://cgit.freebsd.org/src/commit?id=3ffcfa599e29686cf2b3c1a6087408c37acaed78

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #12828
2021-12-13 13:23:07 -08:00
Ryan Moeller effe984148 FreeBSD: Catch up with more VFS changes
Unused thread argument was removed from NDINIT*

https://cgit.freebsd.org/src/commit?id=7e1d3eefd410ca0fbae5a217422821244c3eeee4

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #12828
2021-12-13 13:23:01 -08:00
Mark Johnston 19337332cc Fix several bugs in the FreeBSD rename VOP implementation
- To avoid a use-after-free, zfsvfs->z_log needs to be loaded after the
  teardown lock is acquired with ZFS_ENTER().
- Avoid leaking vnode locks in zfs_rename_relock() and zfs_rename_()
  when the ZFS_ENTER() macros forces an early return.

Refactor the rename implementation so that ZFS_ENTER() can be used
safely.  As a bonus, this lets us use the ZFS_VERIFY_ZP() macro instead
of open-coding its implementation.

Reported-by: Peter Holm <pho@FreeBSD.org>
Tested-by: Peter Holm <pho@FreeBSD.org>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Sponsored-by: The FreeBSD Foundation
Closes #12717
2021-12-13 13:22:54 -08:00
Pawel Jakub Dawidek b96737b83e Remove (now unused) td argument from zfs_lookup()
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Pawel Jakub Dawidek <pawel@dawidek.net>
Closes #12748
2021-12-13 13:22:47 -08:00
Mark Johnston 4b7bfcf8a0 Exit the teardown section later in rename on FreeBSD
We have to hold the teardown lock while dereferencing zfsvfs->z_os and,
I believe, when committing to the ZIL.

Note that jumping to the "out" label, "error" is always non-zero.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes #12704
2021-12-13 13:22:41 -08:00
Mark Johnston 07165ce540 Fix potential use-after-frees in FreeBSD getpages and setattr VOPs
The objset object is reallocated during certain dataset operations, such
as rollbacks, so the objset pointer must be loaded after acquiring the
teardown lock.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes #12704
2021-12-13 13:22:34 -08:00
Brian Behlendorf 6ed7d77b44 ZTS: import_rewind_device_replaced reliably fails
The import_rewind_device_replaced.ksh test was never entirely reliable
because it depends on MOS data not being overwritten.  The MOS data is
not protected by the snapshot so occasional failures were always
expected.  However, this test is now failing reliably on all platforms
indicating something has changed in the code since the test was marked
"maybe".  Convert the test to a "known" failure until the root cause
is identified and resolved.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12821
2021-12-08 13:28:09 -08:00
Damian Szuberski 64e88992b6 Update checkstyle workflow env to ubuntu-20.04
- `checkstyle` workflow uses ubuntu-20.04 environment
- improved `mancheck.sh` readability

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: szubersk <szuberskidamian@gmail.com>
Closes #12713
2021-12-08 13:27:56 -08:00
Brian Behlendorf ad15fb430a Linux 5.15 compat: META (#12824)
The final 5.15 kernel is available and has been tested.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2021-12-07 17:05:04 -08:00
Paul Dagnelie 57f6a050e6 ZFS send/recv with ashift 9->12 leads to data corruption
Improve the ability of zfs send to determine if a block is compressed
or not by using information contained in the blkptr.

Reviewed-by: Rich Ercolani <rincebrain@gmail.com>
Reviewed-by: Matthew Ahrens <matthew.ahrens@delphix.com>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #12770
2021-12-07 17:04:34 -08:00
Coleman Kane b3b293c9fc Linux 5.16: Resolve ZSTD_isError symbol collision in Linux kernel
Newer zstd code introduced in the main kernel tree now creates a symbol
collision with ZSTD_isError in our ZSTD code. This change relabels our
implementation with a ZFS-specific symbol name, and undoes some
macro-based micro-optimizations that conflict with the attempt to rename
our internal-use version.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #12819
2021-12-07 13:14:24 -08:00
Coleman Kane bef7c02c81 Linux 5.16: The blk-cgroup.h header is where struct blkcg_gq is defined
The definition of struct blkcg_gq was moved into blk-cgroup.h, which is
a header that's been in Linux since 2015. This is used by
vdev_blkg_tryget() in module/os/linux/zfs/vdev_disk.c. Since the kernel
for CentOS 7 and similar-generation releases doesn't have this header,
its inclusion is guarded by a configure test.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #12819
2021-12-07 13:14:23 -08:00
Coleman Kane ea61e07413 Linux 5.16: bio_set_dev is no longer a helper macro
This change adds a confiugre check to determine if bio_set_dev is a
helper macro or not. If not, then the attempt to override its internal
call to bio_associate_blkg(), with a macro definition to our own
version, is no longer possible, as the compiler won't use it when
compiling the new inline function replacement implemented in the header.
This change also creates a new vdev_bio_set_dev() function that performs
the same work, and also performs the work implemented in
vdev_bio_associate_blkg(), as it is the only thing calling that function
in our code. Our custom vdev_bio_associate_blkg() is now only compiled
if the bio_set_dev() is a macro in the Linux headers.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #12819
2021-12-07 13:14:23 -08:00
Coleman Kane 9519fe1ff8 Linux 5.16: type member of iov_iter renamed iter_type
The iov_iter->type member was renamed iov_iter->iter_type. However,
while looking into this, realized that in 2018 a iov_iter_type(*iov)
accessor function was introduced. So if that is present, use it,
otherwise fall back to trying the existing behavior of directly
accessing type from iov_iter.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #12819
2021-12-07 13:14:23 -08:00
Coleman Kane 0c40ff56f2 Linux 5.16: block_device_operations->submit_bio now returns void
The return type for the submit_bio member of struct
block_device_operations was changed to no longer return a value.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #12819
2021-12-07 13:14:23 -08:00
Coleman Kane 806c3777e7 Linux 5.16 compat: asm/fpu/xcr.h is new location for xgetbv/xsetbv
Linux 5.16 moved these functions into this new header in commit
1b4fb8545f2b00f2844c4b7619d64d98440a477c. This change adds code to look
for the presence of this header, and include it so that the code using
xgetbv & xsetbv will compile again.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #12800
2021-12-07 13:14:23 -08:00
наб ac9b1aa1bf tests/file_check: remove unused variable
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12187
2021-12-06 13:52:34 -08:00
John Wren Kennedy e20186f5d5 Strip colons from all test result filenames
The upload artifact functionality in github can't handle colons in
filenames. The current code handles this for files under the most
recent set of results. With the ability to rerun failed tests, now
there can be multiple sets of results, and they all need to be
processed in the same way.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: John Kennedy <john.kennedy@delphix.com>
Closes #12815
2021-12-06 12:23:02 -08:00
Brian Behlendorf 16da688f25 Linux 5.13 compat: retry zvol_open() when contended
Due to a possible lock inversion the zvol open call path on Linux
needs to be able to retry in the case where the spa_namespace_lock
cannot be acquired.

For Linux 5.12 an older kernel this was accomplished by returning
-ERESTARTSYS from zvol_open() to request that blkdev_get() drop
the bdev->bd_mutex lock, reaquire it, then call the open callback
again.  However, as of the 5.13 kernel this behavior was removed.

Therefore, for 5.12 and older kernels we preserved the existing
retry logic, but for 5.13 and newer kernels we retry internally in
zvol_open().  This should always succeed except in the case where
a pool's vdev are layed on zvols, in which case it may fail.  To
handle this case vdev_disk_open() has been updated to retry when
opening a device when -ERESTARTSYS is returned.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #12301
Closes #12759
2021-12-06 12:22:57 -08:00
John Wren Kennedy e9ee57f682 Temporarily remove tests from sanity runfile
With the addition of functionality to rerun failing tests, some
tests that fail only sometimes still fail often enough to degrade
the reliability of the sanity runs. Remove them from the runfile
until they reliably pass.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: John Kennedy <john.kennedy@delphix.com>
Closes #12814
2021-12-06 12:22:51 -08:00
Paul Dagnelie d346361515 Add zfs-test facility to automatically rerun failing tests
This was a project proposed as part of the Quality theme for the
hackthon for the 2021 OpenZFS Developer Summit. The idea is to improve
the usability of the automated tests that get run when a PR is created
by having failing tests automatically rerun in order to make flaky
tests less impactful.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #12740
2021-12-06 12:22:43 -08:00
Coleman Kane 12d27e7134 Linux 5.16: wait_on_page_bit() no longer available to modules
Instead, linux/pagemap.h offers a number of folio-specific functions to
be called instead. In this case, module/os/linux/zfs/zfs_vnops_os.c
wants to call wait_on_page_bit(pp, PG_writeback). This gets replaced
with folio_wait_bit(folio_page(pp), PG_writeback). This change modifies
the code to conditionally compile that if configure identifies th
presence of the folio_wait_bit() function.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #12800
2021-12-06 12:22:38 -08:00
Jorgen Lundman a1a29bf8fc Iterate encrypted clones at zvol_create_minor
Userland figures out which encryption-root keys are required to load,
and issues ZFS_IOC_LOAD_KEY.
The tail section of spa_keystore_load_wkey() will call
zvol_create_minors() on the encryption-root object.

Any clones of the encrypted zvol will not be plumbed. This commits
adds additional logic to detect if zvol has clones, and is encrypted,
then adds these to the list of zvols to call zvol_create_minors() on.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Jorgen Lundman <lundman@lundman.net>
Closes #12471
2021-12-06 12:22:32 -08:00
Brian Behlendorf ea0dda5999 Exclude zfs_copies_003_pos on Linux
This test case may fail on 5.13 and newer Linux kernels if the
/dev/zvol/ device is not created by udev.

Reviewed-by: Rich Ercolani <rincebrain@gmail.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #12301
Closes  #12738
2021-11-12 16:20:15 -08:00
Brian Behlendorf d7e640cf95 Restore dirty dnode detection logic
In addition to flushing memory mapped regions when checking holes,
commit de198f2d95 modified the dirty dnode detection logic to check
the dn->dn_dirty_records instead of the dn->dn_dirty_link.  Relying
on the dirty record has not be reliable, switch back to the previous
method.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #11900 
Closes #12745
2021-11-05 09:45:04 -07:00
Brian Behlendorf 664d487a5d Fix lseek(SEEK_DATA/SEEK_HOLE) mmap consistency
When using lseek(2) to report data/holes memory mapped regions of
the file were ignored.  This could result in incorrect results.
To handle this zfs_holey_common() was updated to asynchronously
writeback any dirty mmap(2) regions prior to reporting holes.

Additionally, while not strictly required, the dn_struct_rwlock is
now held over the dirty check to prevent the dnode structure from
changing.  This ensures that a clean dnode can't be dirtied before
the data/hole is located.  The range lock is now also taken to
ensure the call cannot race with zfs_write().

Furthermore, the code was refactored to provide a dnode_is_dirty()
helper function which checks the dnode for any dirty records to
determine its dirtiness.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Rich Ercolani <rincebrain@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #11900
Closes #12724
2021-11-05 08:08:55 -07:00
Dimitri John Ledkov 5bf81fea2f Upgrade to libabigail 2.0.0
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Dimitri John Ledkov <dimitri.ledkov@canonical.com>
Closes #12722
Closes #12739
2021-11-05 07:59:40 -07:00
Tony Hutter 1fca958615 zed: Control NVMe fault LEDs
The ZED code currently can only turn on the fault LED for
a faulted disk in a JBOD enclosure.  This extends support
for faulted NVMe disks as well.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #12648
Closes #12695
2021-11-05 07:51:21 -07:00
Brian Behlendorf 22b0891dbb Linux 5.16 compat: submit_bio()
The submit_bio() prototype has changed again.  The version is 5.16
still only expects a single argument but the return type has changed
to void.  Since we never used the returned value before update the
configure check to detect both single arg versions.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Alexander Lobakin <alobakin@pm.me>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12725
2021-11-05 07:51:21 -07:00
Brian Behlendorf 0e537a0195 Linux 5.16 compat: linux/elevator.h
Commit https://github.com/torvalds/linux/commit/2e9bc346 moved
the elevator.h header under the block/ directory as part of some
refactoring.  This turns out not to be a problem since there's
no longer anything we need from the header.  This has been the
case for some time, this change removes the elevator.h include
and replaces it with a major.h include.

Reviewed-by: Alexander Lobakin <alobakin@pm.me>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12725
2021-11-05 07:51:21 -07:00
Tony Hutter db9e1c907a vdev_id: Fix PHY sorting
One of our developers noticed a bug in vdev_id where we were incorrectly
sorting PHYs using alphabetical sorting (which usually works) instead
of natural sorting (-v).  For example:

	[port-0:0]# ls -d phy*
	phy-0:10  phy-0:11  phy-0:8  phy-0:9

	[port-0:0]# ls -vd phy*
	phy-0:8  phy-0:9  phy-0:10  phy-0:11

This fixes the issue.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #12699
2021-11-02 16:31:17 -07:00
Tony Hutter d11b03ed81 vdev_id: Fix enclosure_symlinks feature
The vdev_id.conf "enclosure_symlinks" option persistently creates
and maps /dev/by-enclosure symlinks to dynamic /dev/sg* devices.

This patch fixes two issues:

1. The enclosure_symlinks feature was accidentally broken in:

   vdev_id: Support daisy-chained JBODs in multipath mode

2. Even when working, the feature numbered the enclosure
   sequentially rather than by HBA port number.  That meant that
   if a port was down or didn't appear in sysfs, then the
   enclosure_sumlinks numbers would be numbered wrong.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #12660
2021-11-02 16:31:11 -07:00
Tony Hutter 586b5d366e Rescan enclosure sysfs path on import
When you create a pool, zfs writes vd->vdev_enc_sysfs_path with the
enclosure sysfs path to the fault LEDs, like:

    vdev_enc_sysfs_path = /sys/class/enclosure/0:0:1:0/SLOT8

However, this enclosure path doesn't get updated on successive imports
even if enclosure path to the disk changes.  This patch fixes the issue.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #11950
Closes #12095
2021-11-02 16:31:05 -07:00
Ryan Moeller 27d9c6ae2b FreeBSD: Catch up with recent VFS changes
cn_thread is always curthread.

https://cgit.freebsd.org/src/commit?id=b4a58fbf640409a1e507d9f7b411c83a3f83a2f3
https://cgit.freebsd.org/src/commit?id=2b68eb8e1dbbdaf6a0df1c83b26f5403ca52d4c3

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Alan Somers <asomers@gmail.com>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #12668
2021-11-02 13:48:54 -07:00
Martin Matuška b7ecb4ff0d FreeBSD: fix compilation of FreeBSD world after 29274c9f6
prng32_bounded() is available to kernel only on FreeBSD 13+.

Call inline random_get_pseudo_bytes() with correct pointer type.
To be consistent, apply to Linux as well.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Martin Matuska <mm@FreeBSD.org>
Closes #12282
2021-11-02 13:35:47 -07:00
Brian Behlendorf af9aa4a216 ZTS: Standardize use of destroy_dataset in cleanup
When cleaning up a test case standardize on using the convention:

    datasetexists $ds && destroy_dataset $ds <flags>

By using 'destroy_dataset' instead of 'log_must zfs destroy' we ensure
that the destroy is retried in the event that a ZFS volume is busy.
This helps ensures ensure tests are fully cleaned up and prevents false
positive test failures on Linux.

Note that all of the tests which used 'zfs destroy' in cleanup have
been updated even if they don't use volumes.  This was done to
clearly establish the expected convention.

Reviewed-by: Rich Ercolani <rincebrain@gmail.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12663
2021-11-02 09:51:32 -07:00
Rich Ercolani 55ab3773d7 Workaround cloud-init hotplug issue
cloud-init added a hook which triggers on every device add/rm
event, which results in holding open devices for a while after
they're created/destroyed.

So let's shove an exclusion rule for that into the GH workflows
until it gets fixed.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12644
Closes #12669
2021-11-02 09:51:32 -07:00
Brian Behlendorf 143476ce8d Use fallthrough macro
As of the Linux 5.9 kernel a fallthrough macro has been added which
should be used to anotate all intentional fallthrough paths.  Once
all of the kernel code paths have been updated to use fallthrough
the -Wimplicit-fallthrough option will because the default.  To
avoid warnings in the OpenZFS code base when this happens apply
the fallthrough macro.

Additional reading: https://lwn.net/Articles/794944/

Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12441
2021-11-02 09:50:30 -07:00
Kevin Bowling d8a97a7be2 Detect HAVE_LARGE_STACKS at compile time (#12584)
Move HAVE_LARGE_STACKS definitions to header and set when appropriate.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Kevin Bowling <kbowling@FreeBSD.org>
Closes #12350
2021-11-01 14:56:18 -07:00
Rich Ercolani 8cd9f20a34 Correct a flaw in the Python 3 version checking (#12636)
It turns out the ax_python_devel.m4 version check assumes that
("3.X+1.0" >= "3.X.0") is True in Python, which is not when X+1
is 10 or above and X is not. (Also presumably X+1=100 and ...)

So let's remake the check to behave consistently, using the
"packaging" or (if absent) the "distlib" modules.

(Also, update the Github workflows to use the new packages.)

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes: #12073
2021-11-01 14:54:47 -07:00
Brian Behlendorf 71c6098526 Tag 2.1.1
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2021-09-15 13:37:50 -07:00
Brian Behlendorf a3da79d582 Linux 5.14 compat: META
Increase the Linux-Maximum version in the META file to 5.14.
All of the required compatibility patches have been merged
and the 5.14 kernel has been officially released.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12565
2021-09-15 13:24:34 -07:00
Arun KV bb80b4649a Fixed data integrity issue when underlying disk returns error
Errors in zil_lwb_write_done() are not propagated to
zil_lwb_flush_vdevs_done() which can result in zil_commit_impl()
not returning an error to applications even when zfs was not able
to write data to the disk.

Remove the ZIO_FLAG_DONT_PROPAGATE flag from zio_rewrite() to
allow errors to propagate and consolidate the error handling for
flush and write errors to a single location (rather than having
error handling split between the "write done" and "flush done"
handlers).

Reviewed-by: George Wilson <gwilson@delphix.com>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Signed-off-by: Arun KV <arun.kv@datacore.com>
Closes #12391
Closes #12443
2021-09-14 15:45:30 -07:00
Brian Behlendorf 7816a6b85b ZTS: Waiting for zvols to be available
This is a follow up patch for PR #12515 which addresses some
additional ZTS tests which are unreliable are should explicitly
wait for the required zvols to be available.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: @Theo13111
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12553
2021-09-14 15:45:11 -07:00
Brian Behlendorf 9183321501 Verify embedded blkptr's in arc_read()
The block pointer verification check in arc_read() should also
cover embedded block pointers.  While highly unlikely, accessing
a damaged block pointer can result in panic.  To further harden
the code extend the existing check to include embedded block
pointers and add a comment explaining the rational for this
sanity check.  Lastly, correct a flaw in zfs_blkptr_verify()
so the error count is checked even when checking a untrusted
config to verify the non-pool-specific portions of a block
pointer.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12535
2021-09-14 15:43:18 -07:00
Brian Behlendorf 32512acbc0 Linux 5.15 compat: get_acl()
Kernel commits

332f606b32b6 ovl: enable RCU'd ->get_acl()
0cad6246621b vfs: add rcu argument to ->get_acl() callback

Added compatibility code to detect the new ->get_acl() interface
and correctly handle the case where the new rcu argument is set.

Reviewed-by: Coleman Kane <ckane@colemankane.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12548
2021-09-14 15:42:59 -07:00
Allan Jude cea0752f8d Allow sending corrupt snapshots even if metadata is corrupted
When zfs_send_corrupt_data is set, use the TRAVERSE_HARD flag,
so traverse_visitbp() will not fail with ECKSUM if a blockpointer
cannot be read, but rather will continue and send the objects it can.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Sponsored-By: Klara Inc.
Sponsored-By: WHC Online Solutions Inc.
Closes #12541
2021-09-14 15:42:49 -07:00
Rich Ercolani 7d70f1e099 arc: Drop an incorrect assert
Unfortunately, there was an overzealous assertion that was (in pretty
specific circumstances) false, causing failure.  This assertion was
added in error, so we're removing it.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: George Wilson <gwilson@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #9897
Closes #12020
Closes #12246
2021-09-14 15:42:33 -07:00
Paul Dagnelie fd92825445 Compressed receive with different ashift can result in incorrect PSIZE on disk
We round up the psize to the nearest multiple of the asize or to the
lsize, whichever is smaller. Once that's done, we allocate a new
buffer of the appropriate size, zero the tail, and copy the data
into it. This adds a small performance cost to these kinds of writes,
but fixes the bookkeeping problems.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Co-authored-by: Matthew Ahrens <matthew.ahrens@delphix.com>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #12522
Closes #8462
2021-09-14 15:42:17 -07:00
Alexander 7bf68e9806 Linux 5.15 compat: standalone <linux/stdarg.h>
Kernel commits

39f75da7bcc8 ("isystem: trim/fixup stdarg.h and other headers")
c0891ac15f04 ("isystem: ship and use stdarg.h")
564f963eabd1 ("isystem: delete global -isystem compile option")

(for now can be found in linux-next.git tree, will land into the
 Linus' tree during the ongoing 5.15 cycle with one of akpm merges)

removed the -isystem flag and disallowed the inclusion of any
compiler header files. They also introduced a minimal
<linux/stdarg.h> as a replacement for <stdarg.h>.
include/os/linux/spl/sys/cmn_err.h in the ZFS source tree includes
<stdarg.h> unconditionally. Introduce a test for <linux/stdarg.h>
and include it instead of the compiler's one to prevent module
build breakage.

Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Lobakin <alobakin@pm.me>
Closes #12531
2021-09-14 15:42:01 -07:00
Brian Behlendorf ad8dc99ed2 Linux 5.15 compat: block device readahead
The 5.15 kernel moved the backing_dev_info structure out of
the request queue structure which causes a build failure.

Rather than look in the new location for the BDI we instead
detect this upstream refactoring by the existance of either
the blk_queue_update_readahead() or disk_update_readahead()
functions.  In either case, there's no longer any reason to
manually set the ra_pages value since it will be overridden
with a reasonable default (2x the block size) when
blk_queue_io_opt() is called.

Therefore, we update the compatibility wrapper to do nothing
for 5.9 and newer kernels.  While it's tempting to do the
same for older kernels we want to keep the compatibility
code to preserve the existing behavior.  Removing it would
effectively increase the default readahead to 128k.

Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12532
2021-09-14 15:41:42 -07:00
Don Brady 6ca1f30708 Detect iSCSI in the zpool cmd vdev media script
Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Don Brady <don.brady@delphix.com>
Closes #12206
2021-09-14 15:40:52 -07:00
George Melikov e16e05c9cf CI: don't install abigail-tools
We use docker image instead.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #12529
2021-09-14 15:40:36 -07:00
George Melikov 5331e2d216 Update ABI files via new libabigail version
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #12529
2021-09-14 15:40:12 -07:00
George Melikov d6dae00982 Libabigail: make .abi files more consistent
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #12529
2021-09-14 15:38:55 -07:00
George Melikov 993d4b28af CI: use fresh libabigail via docker image
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #12529
2021-09-14 15:13:33 -07:00
George Melikov 004e7d3f9a Check for libabigail version
We need to use 1.8.0+ version, older versions
may segfault and give inconsistent results.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #12529
2021-09-14 15:13:11 -07:00
Ryan Moeller aef8a72afe ZTS: Remove exceptions for flaky zhack on FreeBSD
Issue #11854 has been resolved, so we can remove the exceptions for it.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #12527
2021-09-14 15:12:14 -07:00
Ryan Moeller 81611683c8 FreeBSD: Don't remove SA xattr if not SA znode
We attempt to remove an existing SA xattr when setting a dir xattr, but
this only makes sense if the znode has been upgraded to the SA format.
Otherwise, we will hit an assert in zfs_sa_get_xattr.

Make sure this is an SA znode before attempting to remove the SA xattr.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #12514
2021-09-14 15:11:56 -07:00
Rich Ercolani 72a989cf60 Fix cross-endian interoperability of zstd
It turns out that layouts of union bitfields are a pain, and the
current code results in an inconsistent layout between BE and LE
systems, leading to zstd-active datasets on one erroring out on
the other.

Switch everyone over to the LE layout, and add compatibility code
to read both.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12008
Closes #12022
2021-09-14 15:05:55 -07:00
Brian Behlendorf 6bb6410570 ZTS: Waiting for zvols to be available
The ZTS block_device_wait helper function should use -e when waiting
for a file to appear since it will be either a block special device
or a symlink.  This didn't cause any failures but when a device path
was specified the function would wait longer than needed.

Additionally update the most flakey test cases to pass the file path
to block_device_wait to try and improve the test reliability.  The
udev behavior on Fedora in particular can result in frequent false
positives.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12515
2021-09-14 14:37:50 -07:00
Ryan Moeller 6c3c7dc846 Correct checking bdev_check_media_change message
We're not looking for bdev_disk_changed.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #12492
2021-09-14 14:36:42 -07:00
Tony Hutter 2904ec57f0 Make 'zpool labelclear -f' work on offlined disks
This patch allows you to clear the label on offlined disks in an active
pool with `-f`.  Previously, labelclear wouldn't let you do that.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #12511
2021-09-14 14:36:37 -07:00
Anton Gubarkov bc371b2806 vdev_id: Return an error if config file is not found
Signed-off-by: Anton Gubarkov <anton.gubarkov@gmail.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
2021-09-14 14:36:32 -07:00
Sam Hathaway e78d06f89b zpool-remove.8: describe top-level vdev sector size limitation
Document that top-level vdevs cannot be removed unless all top-level
vdevs have the same sector size.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Sam Hathaway <sam@sam-hathaway.com>
Closes #11339
Closes #12472
2021-09-14 14:32:16 -07:00
Mark Johnston 2016d7fb9c Initialize parity blocks before RAID-Z reconstruction benchmarking
benchmark_raidz() allocates a row to benchmark parity calculation and
reconstruction.  In the latter case, the parity blocks are left
uninitialized, leading to reports from KMSAN.

Initialize parity blocks to 0xAA as we do for the data earlier in the
function.  This does not affect the selected RAID-Z implementation on
any of several systems tested.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes #12473
2021-09-14 14:32:16 -07:00
Ryan Moeller 584b7a214e ZTS: Add tests for creation time
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #12432
2021-09-14 14:32:16 -07:00
Richard Yao 1655ce5619 Linux 4.11 compat: statx support
Linux 4.11 added a new statx system call that allows us to expose crtime
as btime. We do this by caching crtime in the znode to match how atime,
ctime and mtime are cached in the inode.

statx also introduced a new way of reporting whether the immutable,
append and nodump bits have been set. It adds support for reporting
compression and encryption, but the semantics on other filesystems is
not just to report compression/encryption, but to allow it to be turned
on/off at the file level. We do not support that.

We could implement semantics where we refuse to allow user modification
of the bit, but we would need to do a dnode_hold() in zfs_znode_alloc()
to find out encryption/compression information. That would introduce
locking that will have a minor (although unmeasured) performance cost.
It also would be inferior to zdb, which reports far more detailed
information. We therefore omit reporting of encryption/compression
through statx in favor of recommending that users interested in such
information use zdb.

Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Richard Yao <ryao@gentoo.org>
Closes #8507
2021-09-14 14:31:50 -07:00
Gordon Bergling 5de6e4ec94 zfs.4: Fix typo s/compatiblity/compatibility/
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Gordon Bergling <gbergling@googlemail.com>
Closes #12464
2021-09-14 14:31:50 -07:00
Alexander Motin a4862125b8 Remove b_pabd/b_rabd allocation from arc_hdr_alloc()
When a header is allocated for full overwrite it is a waste of time
to allocate b_pabd/b_rabd for it, since arc_write() will free them
without ever being touched.  If it is a read or a partial overwrite
then arc_read() and arc_hdr_decrypt() allocate them explicitly.

Reduced memory allocation in user threads also reduces ARC eviction
throttling there, proportionally increasing it in ZIO threads, that
is not good.  To minimize or even avoid it introduce ARC allocation
reserve, allowing certain arc_get_data_abd() callers to allocate a
bit longer in situations where user threads will already throttle.

Reviewed-by: George Wilson <gwilson@delphix.com>
Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Closes #12398
2021-09-14 14:31:50 -07:00
Alexander Motin 61773f41b8 Optimize arc_l2c_only lists assertions
It is very expensive and not informative to call multilist_is_empty()
for each arc_change_state() on debug builds to check for impossible.
Instead implement special index function for arc_l2c_only->arcs_list,
multilists, panicking on any attempt to use it.

Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Closes #12421
2021-09-14 14:31:22 -07:00
Alexander Motin 40e02f49e9 Fix/improve dbuf hits accounting
Instead of clearing stats inside arc_buf_alloc_impl() do it inside
arc_hdr_alloc() and arc_release().  It fixes statistics being wiped
every time a new dbuf is filled from the ARC.

Remove b_l1hdr.b_l2_hits. L2ARC hits are accounted at b_l2hdr.b_hits.
Since the hits are accounted under hash lock, replace atomics with
simple increments.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Closes #12422
2021-09-14 14:31:22 -07:00
Alexander Motin c600f0687f Avoid vq_lock drop in vdev_queue_aggregate()
vq_lock is already too congested for two more operations per I/O.
Instead of dropping and reacquiring it inside vdev_queue_aggregate()
delegate the zio_vdev_io_bypass() and zio_execute() calls for parent
I/Os to callers, that drop the lock any way to execute the new I/O.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Closes #12297
2021-09-14 14:31:22 -07:00
Alexander Motin 5afc35b698 Use more atomics in refcounts
Use atomic_load_64() for zfs_refcount_count() to prevent torn reads
on 32-bit platforms.  On 64-bit ones it should not change anything.

When built with ZFS_DEBUG but running without tracking enabled use
atomics instead of mutexes same as for builds without ZFS_DEBUG.
Since rc_tracked can't change live we can check it without lock.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Closes #12420
2021-09-14 14:31:01 -07:00
Ryan Moeller c6c0d30016 ZTS: Avoid unset $tmpdir in redacted_panic
The redacted_send tests make use of a $tmpdir variable, except in
redacted_send/redacted_panic the variable is never defined.

Use $TEST_BASE_DIR instead.

Clean up the stream file after the test.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #12455
2021-09-14 14:31:01 -07:00
Allan Jude 24e51e3749 Restore FreeBSD sysctl processing for arc.min and arc.max
Before OpenZFS 2.0, trying to set the FreeBSD sysctl vfs.zfs.arc_max
to a disallowed value would return an error.
Since the switch, it instead only generates WARN_IF_TUNING_IGNORED

Keep the ability to set the sysctl's specifically to 0, even though
that is less than the minimum, because some tests depend on this.

Also lost, was the ability to set vfs.zfs.arc_max to a value less
than the default vfs.zfs.arc_min at boot time. Restore this as well.

Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Closes #12161
2021-09-14 14:31:01 -07:00
Ryan Moeller 744f3009fc zfs: add missed dependency of zfs module on zlib
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Martin Matuska <mm@FreeBSD.org>
Co-authored-by: Konstantin Belousov <kib@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
External-issue: https://reviews.freebsd.org/D31207
Closes #12442
2021-09-14 14:30:39 -07:00
Ryan Moeller cacc48702b Add zfs.sh -r flag to reload modules
zfs.sh already can load and unload, so why not both?

This is convenient when developing changes to the module and you want
to rapidly make some changes, rebuild the module, reload the module,
and test the changes.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #12450
2021-09-14 14:30:39 -07:00
Ryan Moeller cc55271681 Fix usage of find in tests/Makefile.am
The path is not optional on FreeBSD.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #12453
2021-09-14 14:30:39 -07:00
Tony Nguyen 477edd642c Run arc_evict thread at higher priority
Run arc_evict thread at higher priority, nice=0, to give it more CPU
time which can improve performance for workload with high ARC evict
activities.

On mixed read/write and sequential read workloads, I've seen between
10-40% better performance.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Tony Nguyen <tony.nguyen@delphix.com>
Closes #12397
2021-09-14 14:30:13 -07:00
Rich Ercolani 23184b172a Make get_key_material_file fail more verbosely
It turns out, there are a lot of possible reasons for fopen to fail.
Let's share which reason we failed for today.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12410
2021-09-14 14:30:13 -07:00
Brian Behlendorf 32a971e749 Enable /proc/diskstats for zvols
The /proc/diskstats accounting needs to be explicitly enabled
for block devices which do not use multi-queue.

Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12440
Closes #12066
2021-09-14 14:30:13 -07:00
George Melikov c07ed69577 Man zpool-scrub.8: describe sequential scrub
Describe sequential scrub and add examples of scrub status.

Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #12429
2021-09-14 14:29:46 -07:00
hedongzhang ddb732e2c8 Modify checksum obtain method of QAT
CpaDcGeneratefooter function that obtain the checksum code
does not support the CPA_DC_STATELESS mode. So we get the
adler32 chencksum of the end of the zlib from dc_results.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Chengfei Zhu <chengfeix.zhu@intel.com>
Signed-off-by: hedong.zhang <h_d_zhang@163.com>
Closes #12343
2021-09-14 14:29:46 -07:00
Mark Johnston 451d6da988 Allow disabling of unmapped I/O on FreeBSD
We have a tunable which permits one to disable the use of unmapped I/O
for the buffer cache.  Respect it in ZFS as well.  This is useful for
KMSAN, which cannot easily maintain shadow state for unmapped pages.

No functional change intended, as unmapped I/O is permitted by default
and there's no real reason to disable it in practice except for
debugging.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes #12446
2021-09-14 14:29:46 -07:00
Alexander Motin e298ac5d04 Add comment on metaslab_class_throttle_reserve() locking
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Issue #12314
Closes #12419
2021-09-14 13:09:40 -07:00
John Wren Kennedy 9429910781 Assorted fixes for the performance tests
- Bail out early if we're running the perf tests and forget to
  specify disks.
- Allow perf tests to run with any number of disks.
- Remove weekly vs. nightly settings
- Move variables with common values to perf.shlib
- Use zinject to clear the ARC over export/import
- Fix dbuf cache size calculation

When the meaning of `dbuf_cache_max_bytes` changed, the performance
test that covers the dbuf cache started to fail. The test would try to
write files for the test using the max possible size of the cache,
inevitably filling the pool and failing. This change uses
`dbuf_cache_shift` to correctly calculate the dbuf cache size.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: John Kennedy <john.kennedy@delphix.com>
Closes #12408
2021-09-14 13:09:24 -07:00
Matthew Ahrens 8a969f3e2d Read past end of argv array in zpool_do_import()
`zpool_do_import()` passes `argv[0]`, (optionally) `argv[1]`, and
`pool_specified` to `import_pools()`.  If `pool_specified==FALSE`, the
`argv[]` arguments are not used.  However, these values may be off the
end of the `argv[]` array, so loading them could dereference unmapped
memory.  This error is reported by the asan build:

```
=================================================================
==6003==ERROR: AddressSanitizer: heap-buffer-overflow
READ of size 8 at 0x6030000004a8 thread T0
    #0 0x562a078b50eb in zpool_do_import zpool_main.c:3796
    #1 0x562a078858c5 in main zpool_main.c:10709
    #2 0x7f5115231bf6 in __libc_start_main
    #3 0x562a07885eb9 in _start

0x6030000004a8 is located 0 bytes to the right of 24-byte region
allocated by thread T0 here:
    #0 0x7f5116ac6b40 in __interceptor_malloc
    #1 0x562a07885770 in main zpool_main.c:10699
    #2 0x7f5115231bf6 in __libc_start_main
```

This commit passes NULL for these arguments if they are off the end
of the `argv[]` array.

Reviewed-by: George Wilson <gwilson@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #12339
2021-09-14 13:08:53 -07:00
Václav Skála 898b1e173c Add missing properties to zfs allow manpage
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Václav Skála <skala@vshosting.cz>
Closes #12402
2021-09-14 13:08:19 -07:00
George Amanakis 406534f807 Fixes in persistent L2ARC
In l2arc_add_vdev() first decide whether the device is eligible for
L2ARC rebuild or whole device trim and then add it to the list of cache
devices. Otherwise l2arc_feed_thread() might already start writing on
the device invalidating previous content as l2ad_hand = l2ad_start.
However l2arc_rebuild_vdev() needs the device present in the cache
device list to figure out its l2arc_dev_t. Fix this by moving most of
l2arc_rebuild_vdev() in a new function l2arc_rebuild_dev() which does
not need to search in the cache device list.

In contrast to l2arc_add_vdev() we do not have to worry about
l2arc_feed_thread() invalidating previous content when onlining a
cache device. The device parameters (l2ad*) are not cleared when
offlining the device and writing new buffers will not invalidate
all previous content. In worst case only buffers that have not had
their log block written to the device will be lost.

Retire persist_l2arc_00{4,5,8} tests since they cover code already
covered by the remaining ones. Test persist_l2arc_006 is renamed to
persist_l2arc_004 and persist_l2arc_007 is renamed to persist_l2arc_005.

Fix a typo in persist_l2arc_004, and remove an assertion that is not
always true from l2arc_arcstats_pos. Also update an assertion in
persist_l2arc_005 and explain why in a comment.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #12365
2021-09-14 13:07:44 -07:00
Mark Johnston ac573e3105 Initialize dn_next_type[] in the dnode constructor
It seems nothing ensures that this array is zeroed when a dnode is
freshly allocated, so in principle it retains the values from the
previous allocation.  In practice it seems to be the case that the
fields should end up zeroed, but we can zero the field anyway for
consistency.

This was found using KMSAN.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes #12383
2021-09-14 13:07:44 -07:00
Mark Johnston 99df200ffc Zero pad bytes following TX_WRITE log data
When logging a TX_WRITE record in the case where file data has to be
copied from the DMU, we pad the log record size to a multiple of 8
bytes.  In this case, any padding bytes should be zeroed, otherwise the
contents of uninitialized memory are written to the ZIL.

This was found using KMSAN.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes #12383
2021-09-14 12:42:21 -07:00
Mark Johnston bd910fdeb0 Zero pad bytes when allocating a ZIL record
When allocating a record, we round up the allocation size to a multiple
of 8.  In this case, any padding bytes should be zeroed, otherwise the
contents of uninitialized memory are written to the ZIL.

This was found using KMSAN.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes #12383
2021-09-14 12:42:21 -07:00
Mark Johnston 9cc9821014 Initialize all fields in zfs_log_xvattr()
When logging TX_SETATTR, we could otherwise fail to initialize part of
the corresponding ZIL record depending on which fields are present in
the xvattr.  Initialize the creation time and the AV scan timestamp to
zero so that uninitialized bytes are not written to the ZIL.

This was found using KMSAN.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes #12383
2021-09-14 12:42:21 -07:00
Mark Johnston fceda40c1e Initialize "autoreplace" in spa_ld_get_props()
spa_prop_find() may fail to find the specified property, in which case
it suppresses ENOENT from zap_lookup().  In this case, the return value
is left uninitialized, so spa_autoreplace was being initialized using an
uninitialized stack variable.

This was found using KMSAN.  It appears to be a regression from commit
9eb7b46ed0, which removed the initialization of "autoreplace" from the
definition.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes #12383
2021-09-14 12:41:10 -07:00
Coleman Kane 4434baab11 Linux 5.14 compat: explicity assign set_page_dirty
Kernel 5.14 introduced a change where set_page_dirty of
struct address_space_operations is no longer implicitly set to
__set_page_dirty_buffers(), which ended up resulting in a NULL
pointer deref in the kernel when it is attempted to be called.
This change sets .set_page_dirty in the structure to
__set_page_dirty_nobuffers(), which was introduced with the
related patch set. The breaking change was introduce in commit
0af573780b0b13fceb7fabd49dc1b073cee9a507 to torvalds/linux.git.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #12427
2021-09-14 12:41:10 -07:00
Rich Ercolani 6385f4e70e Fix unfortunate NULL in spa_update_dspace
After 1325434b, we can in certain circumstances end up calling
spa_update_dspace with vd->vdev_mg NULL, which ends poorly during
vdev removal.

So let's not do that further space adjustment when we can't.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12380
Closes #12428
2021-09-14 12:41:10 -07:00
Brian Behlendorf 2f073cc9c6 Linux 5.14 compat: blk_alloc_disk()
In Linux 5.14, blk_alloc_queue is no longer exported, and its usage
has been superseded by blk_alloc_disk, which returns a gendisk struct
from which we can still retrieve the struct request_queue* that is
needed in the one place where it is used. This also replaces the call
to alloc_disk(minors), and minors is now set via struct member
assignment.

Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Reviewed-by: Olaf Faaland <faaland1@llnl.gov>
Reviewed-by: Coleman Kane <ckane@colemankane.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12362
Closes #12409
2021-09-14 12:40:45 -07:00
Ryan Moeller 729eb48666 zloop: Add a max iterations option, use default run/pass times
It is useful to have control over the number of iterations of zloop so
we can easily produce "x core dumps found *in y iterations*" metrics.

Using random values for run/pass times doesn't improve coverage in a
meaningful way.

Randomizing run time could be seen as a compromise between running a
greater variety of shorter tests versus a smaller variety of longer
tests within a fixed time span.  However, it is not desirable when
running a fixed number of iterations.

Pass time already incorporates randomness within ztest.

Either parameter can be passed to ztest explicitly if the defaults are
not satisfactory.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #12411
2021-09-14 12:40:45 -07:00
Alexander Motin 93e11e257b FreeBSD: Ignore make_dev_s() errors
Since errors returned by zvol_create_minor_impl() are ignored by the
common code, it is more convenient to ignore make_dev_s() errors there.
It allows, for example, to get device created for the zvol after later
rename instead of having it further stuck in half-created state.
zvol_rename_minor() already ignores those errors.

While there, switch from MAXPHYS to maxphys in FreeBSD 13+.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Closes #12375
2021-09-14 12:40:45 -07:00
Jorgen Lundman eaa10257ca Remove old orig_fd variable from zfs send
Possibly required in the past, but is currently fills no purpose.
Ordinarily such tiny cleanup is not generally worth it, however
on the macOS port, in a future commit, we do unspeakable things to the
"fd" for send/recv, and it would be easier to only have to deal with
one "fd" instead of two.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Jorgen Lundman <lundman@lundman.net>
Closes #12404
2021-09-14 12:40:16 -07:00
Alexander Motin 32c0b6468c Optimize allocation throttling
Remove mc_lock use from metaslab_class_throttle_*().  The math there
is based on refcounts and so atomic, so the only race possible there
is between zfs_refcount_count() and zfs_refcount_add().  But in most
cases metaslab_class_throttle_reserve() is called with the allocator
lock held, which covers the race.  In cases where the lock is not
held, GANG_ALLOCATION() or METASLAB_MUST_RESERVE are set, and so we
do not use zfs_refcount_count().  And even if we assume some other
non-existing scenario, the worst that may happen from this race is
few more I/Os get to allocation earlier, that is not a problem.

Move locks and data of different allocators into different cache
lines to avoid false sharing.  Group spa_alloc_* arrays together
into single array of aligned struct spa_alloc spa_allocs.  Align
struct metaslab_class_allocator.

Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Don Brady <don.brady@delphix.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Closes #12314
2021-09-14 12:40:15 -07:00
George Melikov 7c61e1ef9d CI: generate ABI files if changed
So commit author can just download them as
artifacts and commit.

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #12379
2021-09-14 12:40:15 -07:00
Alexander Motin 6a49948c73 Minor ARC optimizations
Remove unneeded global, practically constant, state pointer variables
(arc_anon, arc_mru, etc.), replacing them with macros of real state
variables addresses (&ARC_anon, &ARC_mru, etc.).

Change ARC_EVICT_ALL from -1ULL to UINT64_MAX, not requiring special
handling in inner loop of ARC reclamation.  Respectively change bytes
argument of arc_evict_state() from int64_t to uint64_t.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Closes #12348
2021-09-14 12:39:48 -07:00
Jorgen Lundman 4dfb698aac dmu_redact.c does not call bqueue_destroy
Ensure all calls to bqueue_init() has a corresponding call to bqueue_destroy()

Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Co-authored-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Jorgen Lundman <lundman@lundman.net>
Closes #12118
2021-09-14 12:39:48 -07:00
Alexander 4affa09f3e A few fixes of callback typecasting (for the upcoming ClangCFI)
* zio: avoid callback typecasting
* zil: avoid zil_itxg_clean() callback typecasting
* zpl: decouple zpl_readpage() into two separate callbacks
* nvpair: explicitly declare callbacks for xdr_array()
* linux/zfs_nvops: don't use external iput() as a callback
* zcp_synctask: don't use fnvlist_free() as a callback
* zvol: don't use ops->zv_free() as a callback for taskq_dispatch()

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Signed-off-by: Alexander Lobakin <alobakin@pm.me>
Closes #12260
2021-09-14 12:39:48 -07:00
Ryan Moeller 0ca9558561 Remove unused fields from zvol_task_t
We don't use or need the pool name or value source in the zvol tasks.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #12361
2021-09-14 12:39:17 -07:00
Alexander Motin c2c4d05700 FreeBSD: Switch from MAXPHYS to maxphys on FreeBSD 13+
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Closes #12378
2021-09-14 12:39:17 -07:00
George Melikov f8c2e91db5 zpool_influxdb: fix -Werror=stringop-truncation
Use strlcpy instead of problematic strncpy

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #12344
2021-09-14 12:39:17 -07:00
Rich Ercolani 056c273939 Correct zfs-send(8) on readonly sends
zfs-send(8) claimed in the flags list you could use -pR when sending
a readonly filesystem or volume. You cannot.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12336
2021-09-14 12:38:51 -07:00
Alexander Motin ba76bb30a6 Introduce dsl_dir_diduse_transfer_space()
Most of dsl_dir_diduse_space() and dsl_dir_transfer_space() CPU time
is a dd_lock overhead and time spent in dmu_buf_will_dirty(). Calling
them one after another is a waste of time and even more contention.
Doing that twice for each rewritten block within dbuf_write_done()
via dsl_dataset_block_kill() and dsl_dataset_block_born() created one
of the biggest CPU overheads in case of small blocks rewrite.

dsl_dir_diduse_transfer_space() combines functionality of these two
functions for cases where it is needed, but without double overhead,
practically for the cost of dsl_dir_diduse_space() or even cheaper.

While there, optimize dsl_dir_phys() calls in dsl_dir_diduse_space()
and dsl_dir_transfer_space().  It seems Clang detects some aliasing
there, repeating dd->dd_dbuf->db_data dereference multiple times,
increasing dd_lock scope and contention.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Author: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Closes #12300
2021-09-14 12:38:51 -07:00
наб 968dc13572 config/libatomic: require -latomic iff atomic.c doesn't link w/o it
In absence of LTO, and dynamic libatomic, la.so ends up in the needs
section of every toolchain executable; some consider this an issue.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12345
Closes #12359
2021-09-14 12:38:51 -07:00
Rich Ercolani 960a5a557b Tinker with slop space accounting with dedup
* Tinker with slop space accounting with dedup

Do not include the deduplicated space usage in the slop space
reservation, it leads to surprising outcomes.

* Update spa_dedup_dspace sometimes

Sometimes, we get into spa_get_slop_space() with
spa_dedup_dspace=~0ULL, AKA "unset", while spa_dspace is correctly set.

So call the code to update it before we use it if we hit that case.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12271
2021-09-14 12:38:05 -07:00
Alexander Motin 45305a067f Fix ARC ghost states eviction accounting
arc_evict_hdr() returns number of evicted bytes in scope of specific
state.  For ghost states it does not mean the amount of really freed
memory, but the logical buffer size.  It is correct for the eviction
process, but not for waking up threads waiting for ARC size reduction,
as added in "Revise ARC shrinker algorithm" commit, causing premature
wakeups while ARC is still overflowed, allowing even bigger overflow,
plus processing overhead when next allocation will also get blocked,
probably also for too short time.

To fix that make arc_evict_hdr() also return the amount of really
freed memory, which for the ghost states is only the header, and use
it to update arc_evict_count instead.  Originally I was thinking to
not return it at all, since arc_get_data_impl() does not account for
the headers, but decided that some slow allocation progress is better
than long waits, reaching on my tests up to 100ms.

To reduce negative latency effects of long time periods when reclaim
thread can free little real memory, start reclamation process earlier,
before we actually reached the overflow threshold, when we have to
throttle new allocations.  We can also do it without taking global
arc_evict_lock, reducing the contention.

Reviewed-by: George Wilson <gwilson@delphix.com>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Closes #12279
2021-09-14 12:38:05 -07:00
Brian Behlendorf a5e68f0478 Update bug report template
- Remove the "SPL Version" line, the repositories have been merged
  since the 0.8 release and we no longer need to ask about this.

- Simply ask for the kernel version / patch level and add a hint
  about how to get this information on Linux and FreeBSD.

- Remove "Status: Triage Needed" from the template, in practice
  we really haven't been using this label so let's step setting it.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes: #12340
2021-09-14 12:38:05 -07:00
George Wilson 8415c3c170 file reference counts can get corrupted
Callers of zfs_file_get and zfs_file_put can corrupt the reference
counts for the file structure resulting in a panic or a soft lockup.
When zfs send/recv runs, it will add a reference count to the
open file, and begin to send or recv the stream. If the file descriptor
is closed, then when dmu_recv_stream() or dmu_send() return we will
call zfs_file_put to remove the reference we placed on the file
structure. Unfortunately, because zfs_file_put() uses the file
descriptor to lookup the file structure, it may end up finding that
the file descriptor table no longer contains the file struct, thus
leaking the file structure. Or it might end up finding a file
descriptor for a different file and blindly updating its reference
counts. Other failure modes probably exists.

This change reworks the zfs_file_[get|put] interface to not rely
on the file descriptor but instead pass the zfs_file_t pointer around.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Co-authored-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: George Wilson <gwilson@delphix.com>
External-issue: DLPX-76119
Closes #12299
2021-09-14 12:37:38 -07:00
Jorgen Lundman 04ebe29188 dprintf_dnode: strcpy -> strlcpy
Missed a couple of strcpy() in earlier commit, this is only used with
--enable-debug.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Jorgen Lundman <lundman@lundman.net>
Closes #12311
2021-09-14 12:37:38 -07:00
Jorgen Lundman a0b4da2297 Replace strchrnul() with strrchr()
Could have gone either way with this one, either adding it to
macOS/Windows SPL, or returning it to "classic" usage with strrchr().
Since the new special way isn't really used, and only used once,
we have this commit.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Jorgen Lundman <lundman@lundman.net>
Closes  #12312
2021-09-14 12:37:38 -07:00
Alexander Motin c84670950a FreeBSD: Use unmapped I/O for scattered/gang ABD buffers
Many FreeBSD disk drivers support "unmapped" I/O mode, when data
buffer represented not with a virtually contiguous KVA-mapped address
range, but with a list of physical memory pages.  Originally it was
designed to do I/O from buffers without KVA mapping (unmapped).  But
moving virtual addresses out of equation allows us to operate even
non-contiguous data buffers with one condition: all buffer discon-
tinuities must be aligned to memory page borders.

Doing I/O to capable GEOM device this patch traverses through non-
linear ABD buffers, validating the chunks borders.  If the condition
is met, it supplies GEOM with the list of original physical memory
pages instead of copying the data into temporary contiguous buffer.
On capable hardware on pools with ashift=12 and default ABD chunk of
4KB it should handle all the I/O without additional memory copying.

Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Closes #12320
2021-09-14 12:37:02 -07:00
Alexander Motin 49bb454120 FreeBSD: Hardcode abd_chunk_size to PAGE_SIZE
It makes no sense to set it below PAGE_SIZE, since it increases all
overheads and makes returning memory to OS problematic.  It makes no
sense to set it above PAGE_SIZE, since such allocations and especially
frees are too expensive and cause KVA fragmentation to benefit from
fewer chunks.  After that it makes no sense to keep more complicated
math here.

What may have sense though is just a tunable border between linear and
scatter ABDs, previously also controlled by this tunable.  Retain that
functionality by taking abd_scatter_min_size tunable from Linux, just
with different default value.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Closes #12328
2021-09-14 12:36:44 -07:00
Alexander Motin 41b33dce44 Move gethrtime() calls out of vdev queue lock
This dramatically reduces the lock contention on systems with slower
(non-TSC) timecounters.  With TSC the difference is minimal, but since
this lock is pretty congested, any improvement counts.  Plus I don't
see any reason to do it under the lock other than the latency of the
lock itself, which this change actually reduces.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Closes #12281
2021-09-14 12:35:53 -07:00
Justin Gottula dab147d65a Use substantially more robust program exit status logic in zvol_id
Currently, there are several places in zvol_id where the program logic
returns particular errno values, or even particular ioctl return values,
as the program exit status, rather than a straightforward system of
explicit zero on success and explicit nonzero value(s) on failure.

This is problematic for multiple reasons. One particularly interesting
problem that can arise, is that if any of these values happens to have
all 8 least significant bits unset (i.e., it is a positive or negative
multiple of 256), then although the C program sees a nonzero int value
(presumed to be a failure exit status), the actual exit status as seen
by the system is only the bottom 8 bits of that integer: zero.

This can happen in practice, and I have encountered it myself. In a
particularly weird situation, the zvol_open code in the zfs kernel
module was behaving in such a manner that it caused the open() syscall
to fail and for errno to be set to a kernel-private value (ERESTARTSYS,
which happens to be defined as 512). It turns out that 512 is evenly
divisible by 256; or, in other words, its least significant 8 bits are
all-zero. So even though zvol_id believed it was returning a nonzero
(failure) exit status of 512, the system modulo'd that value by 256,
resulting in the actual exit status visible by other programs being 0!
This actually-zero (non-failure) exit status caused problems: udev
believed that the program was operating successfully, when in fact it
was attempting to indicate failure via a nonzero exit status integer.
Combined with another problem, this led to the creation of nonsense
symlinks for zvol dev nodes by udev.

Let's get rid of all this problematic logic, and simply return
EXIT_SUCCESS (0) is everything went fine, and EXIT_FAILURE (1) if
anything went wrong.

Additionally, let's clarify some of the variable names (error is similar
to errno, etc) and clean up the overall program flow a bit.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Signed-off-by: Justin Gottula <justin@jgottula.com>
Closes #12302
2021-09-14 12:23:38 -07:00
Justin Gottula 7138fe7205 Print zvol_id error messages to stderr rather than stdout
The zvol_id program is invoked by udev, via a PROGRAM key in the
60-zvol.rules.in rule file, to determine the "pretty" /dev/zvol/*
symlink paths paths that should be generated for each opaquely named
/dev/zd* dev node.

The udev rule uses the PROGRAM key, followed by a SYMLINK+= assignment
containing the %c substitution, to collect the program's stdout and then
"paste" it directly into the name of the symlink(s) to be created.

Unfortunately, as currently written, zvol_id outputs both its intended
output (a single string representing the symlink path that should be
created to refer to the name of the dataset whose /dev/zd* path is
given) AND its error messages (if any) to stdout.

When processing PROGRAM keys (and others, such as IMPORT{program}), udev
uses only the data written to stdout for functional purposes. Any data
written to stderr is used solely for the purposes of logging (if udev's
log_level is set to debug).

The unintended consequence of this is as follows: if zvol_id encounters
an error condition; and then udev fails to halt processing of the
current rule (either because zvol_id didn't return a nonzero exit
status, or because the PROGRAM key in the rule wasn't written properly
to result in a "non-match" condition that would stop the current rule on
a nonzero exit); then udev will create a space-delimited list of symlink
names derived directly from the words of the error message string!

I've observed this exact behavior on my own system, in a situation where
the open() syscall on /dev/zd* dev nodes was failing sporadically (for
reasons that aren't especially relevant here). Because the open() call
failed, zvol_id printed "Unable to open device file: /dev/zd736\n" to
stdout and then exited.

The udev rule finished with SYMLINK+="zvol/%c %c". Assuming a volume
name like pool/foo/bar, this would ordinarily expand to
   SYMLINK+="zvol/pool/foo/bar pool/foo/bar"
and would cause symlinks to be created like this:
   /dev/zvol/pool/foo/bar -> /dev/zd736
   /dev/pool/foo/bar      -> /dev/zd736

But because of the combination of error messages being printed to
stdout, and the udev syntax freely accepting a space-delimited sequence
of names in this context, the error message string
   "Unable to open device file: /dev/zd736\n"
in reality expanded to
   SYMLINK+="zvol/Unable to open device file: /dev/zd736"
which caused the following symlinks to actually be created:
   /dev/zvol/Unable -> /dev/zd736
   /dev/to          -> /dev/zd736
   /dev/open        -> /dev/zd736
   /dev/device      -> /dev/zd736
   /dev/file:       -> /dev/zd736
   /dev//dev/zd736  -> /dev/zd736

(And, because multiple zvols had open() syscall errors, multiple zvols
attempted to claim several of those symlink names, resulting in numerous
udev errors and timeouts and general chaos.)

This commit rectifies all this silliness by simply printing error
messages to stderr, as Dennis Ritchie originally intended.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Signed-off-by: Justin Gottula <justin@jgottula.com>
Closes #12302
2021-09-14 12:23:38 -07:00
Justin Gottula fd2e4d143d Udev rules: use match (==) rather than assign (=) for PROGRAM
Assignment syntax (=) can be used for the PROGRAM key. But the PROGRAM
key is really a match key, not an assign key. The internal logic used by
udev to decide whether a PROGRAM key "matched" or not (which determines
whether the remainder of the rule is evaluated) depends on whether the
operator was OP_MATCH (==) or OP_NOMATCH (!=). [1]

The man page claims that '"=", ":=", and "+=" have the same effect as
"=="' for PROGRAM keys. And, after a brief perusal, the udev source code
does seem to confirm that operators other than OP_MATCH (==) or
OP_NOMATCH (!=) are implicitly converted to OP_MATCH (==). [2] But it's
not entirely clear that this is definitely the case: anecdotal testing
seems to indicate that when OP_ASSIGN (=) is used, the program's exit
status is disregarded and the remainder of the rule is processed
regardless of whether it was, in fact, a successful exit.

The bottom line here is that, if zvol_id hits some snag and returns a
nonzero exit status, then we almost certainly do NOT want to continue on
with the rule and use whatever the stdout contents may have been to
mindlessly create /dev/zvol/* symlinks. Therefore, let's be extra-sure
and use the match (==) operator explicitly, to eliminate any possibility
that udev might do the wrong thing, and ensure that a nonzero exit
status will definitely short-circuit the rest of the rule, bypassing the
SYMLINK+= assignments.

[1]
udev,
 file src/udev/udev-rules.c,
  func udev_rule_apply_token_to_event,
   switch case TK_M_PROGRAM if r != 0 (nonzero exit status):
      return token->op == OP_NOMATCH;
   switch case TK_M_PROGRAM if r == 0 (zero exit status):
      return token->op == OP_MATCH;
   func retval 0 => key is considered to have matched
   func retval 1 => key is considered to have NOT matched

[2]
udev,
 file src/udev/udev-rules.c,
  func parse_token,
   at func start:
      bool is_match = IN_SET(op, OP_MATCH, OP_NOMATCH);
   in else-if case streq(key, "PROGRAM"):
      if (!is_match) op = OP_MATCH;

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Signed-off-by: Justin Gottula <justin@jgottula.com>
Closes #12302
2021-09-14 12:23:10 -07:00
Justin Gottula 0cb122941e Udev rules: replace deprecated $tempnode with $devnode
The $tempnode substitution is so old that it's not even mentioned in the
man page anymore. It is still technically supported by udev, but with
plenty of "deprecated" comments surrounding it.

The preferred modern equivalent of $tempnode is $devnode (or
alternatively, %N).

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Signed-off-by: Justin Gottula <justin@jgottula.com>
Closes #12302
2021-09-14 12:23:10 -07:00
Justin Gottula c20ba9bd7a Udev rules: use non-ancient comma syntax
This file is old as dirt. It's entirely possible that commas were
optional in udev back at that time. But they're definitely supposed to
be there nowadays.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Signed-off-by: Justin Gottula <justin@jgottula.com>
Closes #12302
2021-09-14 12:23:10 -07:00
Alexander Motin 15177c1aac Compact dbuf/buf hashes and lock arrays
With default dbuf cache size of 1/32 of ARC, it makes no sense to have
hash table of the same size (or even bigger on Linux).  Reduce it to
1/8 of ARC's one, still leaving some slack, assuming higher I/O rate
via dbuf cache than via ARC.

Remove padding from ARC hash locks array.  The idea behind padding
is to avoid false sharing between locks.  It would have sense if
there would be a limited number of very busy locks.  But since we
have no limit on the number, using the same memory for more locks we
can achieve even lower lock contention with the same false sharing,
or we can use less memory for the same contention level.

Reduce number of hash locks from 8192 to 2048.  The number is still
big enough to not cause contention, but reduced memory size improves
cache hit rate for mutex_tryenter() in ARC eviction thread, saving
about 1% of the thread time.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Closes #12289
2021-09-14 12:22:46 -07:00
Jorgen Lundman 035219ee10 Fix abd leak, kmem_free correct size of abd_t
Fix a leak of abd_t that manifested mostly when using
raidzN with at least as many columns as N (e.g. a
four-disk raidz2 but not a three-disk raidz2).
Sufficiently heavy raidz use would eventually run a system
out of memory.

Additionally:

* Switch abd_cache arena to FIRSTFIT, which empirically
improves perofrmance.

* Make abd_chunk_cache more performant and debuggable.

* Allocate the abd_zero_buf from abd_chunk_cache rather
than the heap.

* Don't try to reap non-existent qcaches in abd_cache arena.

* KM_PUSHPAGE->KM_SLEEP when allocating chunks from their
own arena

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Jorgen Lundman <lundman@lundman.net>
Co-authored-by: Sean Doran <smd@use.net>
Closes #12295
2021-09-14 12:22:28 -07:00
Jorgen Lundman 2334bc4efa Upstream: dmu_zfetch_stream_fini leaks refcount
dmu_zfetch_stream_fini() is missing calls to destroy the refcounts,
leaking them and the mutex inside.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Jorgen Lundman <lundman@lundman.net>
Closes #12294
2021-09-14 12:21:55 -07:00
Ryan Moeller d6c2b89032 ZED: Match added disk by pool/vdev GUID if found (#12217)
This enables ZED to auto-online vdevs that are not wholedisk managed by
ZFS.

Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Don Brady <don.brady@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2021-09-14 12:10:44 -07:00
Alexander Motin f3969ea78b Optimize small random numbers generation
In all places except two spa_get_random() is used for small values,
and the consumers do not require well seeded high quality values.
Switch those two exceptions directly to random_get_pseudo_bytes()
and optimize spa_get_random(), renaming it to random_in_range(),
since it is not related to SPA or ZFS in general.

On FreeBSD directly map random_in_range() to new prng32_bounded() KPI
added in FreeBSD 13.  On Linux and in user-space just reduce the type
used to uint32_t to avoid more expensive 64bit division.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #12183
2021-09-14 12:10:17 -07:00
Ryan Moeller 6fe6192796 FreeBSD: Implement xattr=sa
FreeBSD historically has not cared about the xattr property; it was
always treated as xattr=on.  With xattr=on, xattrs are stored as files
in a hidden xattr directory.  With xattr=sa, xattrs are stored as
system attributes and get cached in nvlists during xattr operations.
This makes SA xattrs simpler and more efficient to manipulate.  FreeBSD
needs to implement the SA xattr operations for feature parity with
Linux and to ensure that SA xattrs are accessible when migrated or
replicated from Linux.

Following the example set by Linux, refactor our existing extattr vnops
to split off the parts handling dir style xattrs, and add the
corresponding SA handling parts.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11997
2021-09-14 12:09:35 -07:00
Ryan Moeller 1826068523 FreeBSD: Clean up ASSERT/VERIFY use in module
Convert use of ASSERT() to ASSERT0(), ASSERT3U(), ASSERT3S(),
ASSERT3P(), and likewise for VERIFY().  In some cases it ended up
making more sense to change the code, such as VERIFY on nvlist
operations that I have converted to use fnvlist instead.  In one
place I changed an internal struct member from int to boolean_t to
match its use.  Some asserts that combined multiple checks with &&
in a single assert have been split to separate asserts, to make it
apparent which check fails.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11971
2021-09-14 12:02:23 -07:00
Brian Behlendorf 4f92fe0f5c Tag 2.1.0
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2021-07-02 11:04:34 -07:00
Brian Behlendorf 508fff0e4b Tag 2.1.0-rc8
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2021-06-29 13:19:19 -07:00
Brian Behlendorf d3942963ee Linux 5.13 compat: META
Increase the Linux-Maximum version in the META file to 5.13.
All of the required compatibility patches have been merged
and the 5.13 kernel has been officially released.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2021-06-29 13:18:58 -07:00
Laurențiu Nicola 0584ad8f94 zed: fix sending emails (#12292)
Commit 6fc3099 broke the quoting when invoking the mail program, revert
that change.

Signed-off-by: Laurențiu Nicola <lnicola@dend.ro>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2021-06-29 13:15:21 -07:00
Alexander Motin ea47857090 Avoid 64bit division in multilist index functions
The number of sublists in a multilist is relatively small. We dont need
64 bits to calculate an index. 32 bits is sufficient and makes the
code more efficient.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> 
Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #12288
2021-06-29 13:15:04 -07:00
Michal Vasilek 4ebda5d4d3 Fix plymouth passphrase prompt with dracut
plymouth --command splits the command on spaces which means
that zfs-load-key was getting the filesystem name enclosed
in single quotes (since 13c59bb76) and failing. This commit
fixes it by piping the password directly to the command
similar to how it's done in other scripts (initramfs,
dracut without plymouth).

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Michal Vasilek <michal@vasilek.cz>
Related-to: #9193
Related-to: #9202
Closes #12147
2021-06-29 13:14:49 -07:00
Rich Ercolani 57ce66d293 Fix build with KASAN
The stock zstd code expects some helpers from ASAN if present.
This works fine in userland, but in kernel, KASAN also gets detected,
and lacks those helpers. So let's make some empty substitutes for
that case.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12232
2021-06-29 13:14:42 -07:00
Alexander Motin da6c288dfc Help compiller optimize out abd_verify()
While abd_verify() does nothing when built without debug, compiler
can't optimize it out by itself due to calls to external list_*()
and abd_verify_scatter().  This commit makes it explicit.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Adam Moss <c@yotes.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #12280
2021-06-29 13:14:20 -07:00
Brian Behlendorf aee26af277 Update cache file when setting compatibility property
Unlike most other properties the 'compatibility' property is stored
in the pool config object and not the DMU_OT_POOL_PROPS object.

This had the advantage that the compatibility information is available
without needing to fully import the pool (it can be read with zdb).
However, this means we need to make sure to update both the copy of
the config in the MOS and the cache file.  This wasn't being done.

This commit adds a call to spa_async_request() to ensure the copy of
the config in the cache file gets updated as well as the one stored
in the pool.  This same change is made for the 'comment' property
which suffers from the same inconsistency.

Reviewed-by: Sean Eric Fagan <sef@ixsystems.com>
Reviewed-by: Colm Buckley <colm@tuatha.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12261 
Closes #12276
2021-06-24 14:33:51 -07:00
Paul Dagnelie 620389d32f Fix flag copying in resume case
A couple flags weren't being copied in the case where we're doing size
estimation on a resume.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes: #12266
2021-06-24 13:14:04 -07:00
jumbi77 5d7f47d828 zfs_metaslab_mem_limit should be 25 instead of 75
According to current zfs man page zfs_metaslab_mem_limit should be
25 instead of 75.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Signed-off-by: jumbi77@users.noreply.github.com
Closes #12273
2021-06-24 13:13:56 -07:00
Rich Ercolani 977540168c Stop using "zstreamdump" in tests/
zstreamdump was replaced with "zstream dump"; let's stop using the
old name, compat symlink or no.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12277
2021-06-24 13:13:49 -07:00
Jonathon 1143d3d29d Update libera webchat client URL
Libera have made a webchat client available. This change builds on #12127.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Jonathon Fernyhough <jonathon@m2x.dev>
Closes #12251
2021-06-24 13:13:44 -07:00
Attila Fülöp 088712793e gcc 11 cleanup
Compiling with gcc 11.1.0 produces three new warnings.
Change the code slightly to avoid them.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Attila Fülöp <attila@fueloep.org>
Closes #12130
Closes #12188
Closes #12237
2021-06-24 13:13:40 -07:00
Brian Behlendorf e0886c96a8 ZTS: Add known exceptions
The receive-o-x_props_override test case reliably fails on the
FreeBSD main builders (but not on Linux), until the root cause is
understood add this test to the FreeBSD exception list.

On Linux the alloc_class_012_pos test case may occasionally fail.
This is a known false positive which has also been added to the
Linux exception list until the test can be made entirely reliable.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12272
2021-06-24 13:12:52 -07:00
Rich Ercolani 5e89181544 Annotated dprintf as printf-like
ZFS loves using %llu for uint64_t, but that requires a cast to not 
be noisy - which is even done in many, though not all, places.
Also a couple places used %u for uint64_t, which were promoted
to %llu. 

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12233
2021-06-24 13:12:36 -07:00
Antonio Russo dfbda2465f Revert Consolidate arc_buf allocation checks
This reverts commit 13fac09868.

Per the discussion in #11531, the reverted commit---which intended only
to be a cleanup commit---introduced a subtle, unintended change in
behavior.

Care was taken to partially revert and then reapply 10b3c7f5e4
which would otherwise have caused a conflict.  These changes were
squashed in to this commit.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Suggested-by: @chrisrd
Suggested-by: robn@despairlabs.com
Signed-off-by: Antonio Russo <aerusso@aerusso.net>
Closes #11531 
Closes #12227
2021-06-24 13:12:25 -07:00
Alexander Motin 6b239d1757 Use wmsum for arc, abd, dbuf and zfetch statistics. (#12172)
wmsum was designed exactly for cases like these with many updates
and rare reads.  It allows to completely avoid atomic operations on
congested global variables.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #12172
2021-06-24 13:10:59 -07:00
наб 9a865b7fb7 libspl: implement atomics in terms of atomics
This replaces the generic libspl atomic.c atomics implementation
with one based on builtin gcc atomics.  This functionality was added
as an experimental feature in gcc 4.4.  Today even CentOS 7 ships
with gcc 4.8 as the default compiler we can make this the default.

Furthermore, the builtin atomics are as good or better than our
hand-rolled implementation so it's reasonable to drop that custom code.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11904
Closes #12252
Closes #12244
2021-06-21 21:48:31 -07:00
George Amanakis 3de7aeb68a Avoid deadlock when removing L2ARC devices under I/O
In case we have I/O and try to remove an L2ARC device a deadlock might
occur. arc_read()->zio_read()->zfs_blkptr_verify() waits for SCL_VDEV
to be dropped while holding the hash_lock. However, spa_l2cache_load()
holds SCL_ALL and waits for the hash_lock in l2arc_evict().

Fix this by moving zfs_blkptr_verify() to the top top arc_read() before
the hash_lock is taken. Verify the block pointer and return a checksum
error if damaged rather than halting the system, by using
BLK_VERIFY_LOG instead of BLK_VERIFY_HALT.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #12054
2021-06-17 13:35:29 -07:00
наб ff64096e75 systemd: import: expand $ZPOOL_IMPORT_OPTS correctly
Turns out $ZPOOL_IMPORT_OPTS expands in a shell-like fashion,
yielding 'import' '-aN' '-o' 'cachefile=none' for an unset variable,
and 'import' '-aN' '-o' 'cachefile=none' 'word1' 'word2' for a
white-spaced one, but ${ZPOOL_IMPORT_OPTS} expands like "${Z_I_O}"
would in a shell, yielding 'import' '-aN' '-o' 'cachefile=none' ''
(empty) and 'import' '-aN' '-o' 'cachefile=none' 'word1 word2' (spaced)

Fixes eec5ba113e "dracut: 90zfs: respect
zfs_force=1 on systemd systems"

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes: #12231
2021-06-15 16:56:53 -07:00
Matthew Ahrens 7e2212990b vdev_draid_min_asize() ignores reserved space
vdev_draid_min_asize() returns the minimum size of a child vdev.  This
is used when determining if a disk is big enough to replace a child.
It's also used by zdb to determine how big of a child to make to test
replacement.

vdev_draid_min_asize() says that the child’s asize has to be at least
1/Nth of the entire draid’s asize, which is the same logic as raidz.
However, this contradicts the code in vdev_draid_open(), which
calculates the draid’s asize based on a reduced child size:

  An additional 32MB of scratch space is reserved at the end of each
  child for use by the dRAID expansion feature

So the problem is that you can replace a draid disk with one that’s
vdev_draid_min_asize(), but it actually needs to be larger to accommodate
the additional 32MB.  The replacement is allowed and everything works at
first (since the reserved space is at the end, and we don’t try to use
it yet), but when you try to close and reopen the pool,
vdev_draid_open() calculates a smaller asize for the draid, because of
the smaller leaf, which is not allowed.

I think the confusion is that vdev_draid_min_asize() is correctly
returning the amount of required *allocatable* space in a leaf, but the
actual *size* of the leaf needs to be at least 32MB more than that.
ztest_vdev_attach_detach() assumes that it can attach that size of
device, and it actually can (the kernel/libzpool accepts it), but it
then later causes zdb to not be able to open the pool.

This commit changes vdev_draid_min_asize() to return the required size
of the leaf, not the size that draid will make available to the metaslab
allocator.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #11459
Closes #12221
2021-06-15 16:56:42 -07:00
Paul Zuchowski bd83c1e0c6 Do not hash unlinked inodes
In zfs_znode_alloc we always hash inodes.  If the
znode is unlinked, we do not need to hash it.  This
fixes the problem where zfs_suspend_fs is doing zrele
(iput) in an async fashion, and zfs_resume_fs unlinked
drain processing will try to hash an inode that could
still be hashed, resulting in a panic.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alan Somers <asomers@gmail.com>
Signed-off-by: Paul Zuchowski <pzuchowski@datto.com>
Closes #9741
Closes #11223
Closes #11648
Closes #12210
2021-06-15 16:56:19 -07:00
Rich Ercolani a416b29e16 Added uncompress requirement
Having an old enough version of "file" and no "uncompress" program
installed can cause rpmbuild as root to crash and mangle rpmdb.

So let's add a build dependency for RPM-based systems.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes: #12071
Closes: #12168
2021-06-15 16:55:49 -07:00
Brian Behlendorf af4b6f7dab ZTS: Add zfs_clone_livelist_dedup.ksh to Makefile.am
Commit 86b5f4c12 added a new zfs_clone_livelist_dedup.ksh test case
but didn't include it in the Makefile.am.  This results in the test
not being included in the dist tarball so it's never run by the CI.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes: #12224
2021-06-15 16:55:31 -07:00
Brian Behlendorf c3b60ededa Tag 2.1.0-rc7
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2021-06-10 10:50:16 -07:00
Alexander Motin 57196f8ae9 Re-embed multilist_t storage
This commit partially reverts changes to multilists in PR 7968
(multi-threaded spa-sync()) and adds some cache line alignments to
separate read-only multilists and heavily modified refcount's to
different cache lines.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-by: iXsystems, Inc.
Closes #12158
2021-06-10 10:50:16 -07:00
наб 8dc540ae16 dracut: 90zfs: respect zfs_force=1 on systemd systems
On systemd systems provide an environment generator in order
to respect the zfs_force=1 kernel command line option.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11403
Closes #12195
2021-06-10 10:50:16 -07:00
Alexander Motin efdfb14fc8 Remove pool io kstats
This mostly reverts "3537 want pool io kstats" commit of 8 years ago.

From one side this code using pool-wide locks became pretty bad for
performance, creating significant lock contention in I/O pipeline.
From another, there are more efficient ways now to obtain detailed
statistics, while this statistics is illumos-specific and much less
usable on Linux and FreeBSD, reported only via procfs/sysctls.

This commit does not remove KSTAT_TYPE_IO implementation, that may
be removed later together with already unused KSTAT_TYPE_INTR and
KSTAT_TYPE_TIMER.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #12212
2021-06-10 10:50:16 -07:00
Rich Ercolani 2e158b0e0b Added error for writing to /dev/ on Linux
Starting in Linux 5.10, trying to write to /dev/{null,zero} errors out.
Prefer to inform people when this happens rather than hoping they guess
what's wrong.

Reviewed-by: Antonio Russo <aerusso@aerusso.net>
Reviewed-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes:  #11991
2021-06-10 10:50:16 -07:00
наб 4cf3e48a3b libzfs: format safety
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12116
2021-06-10 10:50:16 -07:00
наб 4e0fff2e02 zgenhostid.8: revisit
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12212
2021-06-10 10:50:16 -07:00
наб 14973b917c Consistentify miscellaneous style on remaining manpages
Most notably this fixes the vdev_id(8) non-.Xrs in vdev_id.conf.5

Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12212
2021-06-10 10:50:16 -07:00
наб a444efb6d7 Move properties, parameters, events, and concepts around manual sections
The pages moved as follows:
  zpool-features.{5 => 7}
  spl{-module-parameters.5 => .4}
  zfs{-module-parameters.5 => .4}
  zfs-events.5 => into zpool-events.8
  zfsconcepts.{8 => 7}
  zfsprops.{8 => 7}
  zpoolconcepts.{8 => 7}
  zpoolprops.{8 => 7}

Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Co-authored-by: Daniel Ebdrup Jensen <debdrup@FreeBSD.org>
Closes #12149
Closes #12212
2021-06-10 10:50:16 -07:00
наб 0bef46e6d5 man: use one Makefile, use OpenZFS for .Os
The prevailing style is to use either nothing, or the originating
organisational umbrella (here: OpenZFS), and these aren't Linux manpages

This also deduplicates the substitution code, and makes adding/removing
sexions simpler in future

Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12212
2021-06-10 10:50:16 -07:00
Brian Behlendorf 1180d61152 Fix minor shellcheck 0.7.2 warnings
The first warning of a misspelling is a false positive, so we annotate
the script accordingly.  As for the x-prefix warnings update the check
to use the conventional '[ -z <string> ]' syntax.

all-syslog.sh:46:47: warning: Possible misspelling: ZEVENT_ZIO_OBJECT
    may not be assigned, but ZEVENT_ZIO_OBJSET is. [SC2153]
make_gitrev.sh:53:6: note: Avoid x-prefix in comparisons as it no
    longer serves a purpose [SC2268]
man-dates.sh:10:7: note: Avoid x-prefix in comparisons as it no
    longer serves a purpose [SC2268]

Reviewed-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12208
2021-06-10 10:50:16 -07:00
наб 6ce97bb4a2 zed.d/history_event-zfs-list-cacher.sh.in: parallelise, simplify
This:
  (a) improves the error log message,
  (b) locks per pool instead of globally,
  (c) locks the actual output file instead of /var/lock/zfs-list,
      which would otherwise linger there forever (well, still will,
      but you can remove it and it won't come back), and
  (d) preserves attributes of the output file
      instead of reverting them to 0:0 644

It is imperative that the previous commit
("zed-functions.sh: zed_lock(): don't truncate lock")
be included in any series that contains this one

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12042
2021-06-09 13:06:39 -07:00
наб 27d3cc6cd3 zed.d/all-debug.sh: simplify
By locking the log file itself, we can omit arduous rebinding and
explicit umask setting, but, perhaps more importantly, avoid permanently
littering /var/lock/ with zed.debug.log.lock we will never delete

It is imperative that the previous commit
("zed-functions.sh: zed_lock(): don't truncate lock")
be included in any series that contains this one

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12042
2021-06-09 13:06:09 -07:00
наб d6a0cecab1 zed-functions.sh: zed_lock(): don't truncate lock
By appending instead of truncating, we can lock on any file (with write
permissions) instead of only dedicated lock files, since the locking
process itself no longer alters the file in any way

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12042
2021-06-09 13:06:05 -07:00
Alan Somers 2c3d7283b4 libzfs: On FreeBSD, use MNT_NOWAIT with getfsstat
`getfsstat(2)` is used to retrieve the list of mounted file systems,
which libzfs uses when fetching properties like mountpoint, atime,
setuid, etc.  The `mode` parameter may be `MNT_NOWAIT`, which uses
information in the VFS's cache, or `MNT_WAIT`, which effectively does a
`statfs` on every single mounted file system in order to fetch the most
up-to-date information.  As far as I can tell, the only fields that
libzfs cares about are the filesystem's name, mountpoint, fstypename,
and mount flags.  Those things are always updated on mount and unmount,
so they will always be accurate in the VFS's mount cache except in two
circumstances:

1) When a file system is busy unmounting
2) When a ZFS file system changes the value of a mount-overridable
   property like atime or setuid, but doesn't remount the file system.
   Right now that only happens when the property is changed by an
   unprivileged user who has delegated authority to change the property
   but not to mount the dataset.  But perhaps libzfs could choose to do
   it for other reasons in the future.

Switching to `MNT_NOWAIT` will greatly improve speed with no downside,
as long as we explicitly update the mount cache whenever we change a
mount-overridable property.

For comparison, Illumos gets this information using the native
`getmntany` and `getmntent` functions, which also use cached
information.  The illumos function that would refresh the cache,
`resetmnttab`, is never called by libzfs.

And on GNU/Linux, `getmntany` and `getmntent` don't even communicate
with the kernel directly.  They simply parse the file they are given,
which is usually /etc/mtab or /proc/mounts.  Perhaps the implementation
of /proc/mounts is synchronous, ala MNT_WAIT; I don't know.

Sponsored-by:	Axcient
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alan Somers <asomers@gmail.com>
Closes: #12091
2021-06-09 13:05:34 -07:00
наб d7e6f293da Modernise/fix/rewrite unlinted manpages
zpool-destroy.8: flatten, fix description
zfs-wait.8: flatten, fix description, use list for events
zpool-reguid.8: flatten, fix description
zpool-history.8: flatten, fix description
zpool-export.8: flatten, fix description, remove -f "unmount" reference
  AFAICT no such command exists even in Illumos (as of today, anyway),
  and we definitely don't call it
zpool-labelclear.8: flatten, fix description
zpool-features.5: modernise
spl-module-parameters.5: modernise
zfs-mount-generator.8: rewrite
zfs-module-parameters.5: modernise

Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12169
2021-06-09 13:05:34 -07:00
Rich Ercolani 2f23f0f940 Force --enable-debug on FreeBSD if INVARIANTS is set
There's already logic to force INVARIANTS on for building if it's
present in the running kernel; however, not having DEBUG enabled
when DEBUG and INVARIANTS are can cause strange panics.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12185
Closes #12163
2021-06-09 13:05:34 -07:00
Serapheim Dimitropoulos a377bde727 Livelist logic should handle dedup blkptrs
Update the logic to handle the dedup-case of consecutive
FREEs in the livelist code. The logic still ensures that
all the FREE entries are matched up with a respective
ALLOC by keeping a refcount for each FREE blkptr that we
encounter and ensuring that this refcount gets to zero
by the time we are done processing the livelist.

zdb -y no longer panics when encountering double frees

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Don Brady <don.brady@delphix.com>
Signed-off-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Closes #11480
Closes #12177
2021-06-09 13:05:34 -07:00
Alexander Motin e76373de7b More aggsum optimizations
- Avoid atomic_add() when updating as_lower_bound/as_upper_bound.
Previous code was excessively strong on 64bit systems while not
strong enough on 32bit ones.  Instead introduce and use real
atomic_load() and atomic_store() operations, just an assignments
on 64bit machines, but using proper atomics on 32bit ones to avoid
torn reads/writes.

 - Reduce number of buckets on large systems.  Extra buckets not as
much improve add speed, as hurt reads.  Unlike wmsum for aggsum
reads are still important.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #12145
2021-06-09 13:05:34 -07:00
наб b05ae1a82a libzfs: write_inuse_diffs_one: format strerror() with "%s"
Fixes 50353dbd ("Let zfs diff be more  permissive") which accidentally
introduced a build warning.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12197
2021-06-09 13:05:34 -07:00
наб b12a6d961c i-t: don't try to import from empty cache
Chases 7c64ee9e77
 ("zfs-import-{cache,scan}: change condition to FileNotEmpty")

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12108
2021-06-09 13:05:34 -07:00
наб 11b165cc9b Use %%/* instead of awk -F/ {print $1} to strip datasets
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12108
2021-06-09 13:05:34 -07:00
наб 2dde9202d9 dracut: 90zfs: zfs-load-key: don't load unencrypted bootfs' keylocation
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11800
Closes #12108
2021-06-09 13:05:34 -07:00
наб 998035d534 dracut: 90zfs: module-setup: try /lib*/libgcc_s.so*, relax /u/l/gcc path
SUSE stores the library at /lib64/libgcc_s.so.1 (/lib/libgcc_s.so.1 for
i686 glibc), which is in the search path

Also relax the /usr/lib path to catch systems similar to SUSE
(/usr/lib64/gcc/x86_64-suse-linux/10/libgcc_s.so) but without
the top-level lib64

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11750
Closes #12108
2021-06-09 13:05:34 -07:00
Rich Ercolani 64e38df237 Let zfs diff be more permissive
In the current world, `zfs diff` will die on certain kinds of errors
that come up on ordinary, not-mangled filesystems - like EINVAL,
which can come from a file with multiple hardlinks having the one
whose name is referenced deleted.

Since it should always be safe to continue, let's relax about all
error codes - still print something for most, but don't immediately
abort when we encounter them.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12072
2021-06-09 13:05:34 -07:00
jharmening d3dddbaa20 FreeBSD: incorporate changes to the VFS_QUOTACTL(9) KPI
VFS_QUOTACTL(9) has been updated to allow each filesystem to indicate
whether it has changed the busy state of the mount.  The filesystem
may still assume that its .vfs_quotactl entrypoint is always called
with the mount busied, but only needs to unbusy the mount (and clear
*mp_busy) if it does something that actually requires the mount to be
unbusied.  It no longer needs to blindly copy-paste the UFS protocol
for calling vfs_unbusy(9) for the Q_QUOTAOFF and Q_QUOTAON commands.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Jason Harmening <jason.harmening@gmail.com>
Closes #12052
2021-06-09 13:05:34 -07:00
Ryan Moeller e298695809 Fix error check in nvlist_print_json_string
Move check for errors from mbrtowc() into the loop.  The error values
are not actually negative, so we don't break out of the loop when they
are encountered.

Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #12175
Closes #12176
2021-06-09 13:05:34 -07:00
наб c650ceb64d Lint most manpages
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12129
2021-06-09 13:05:34 -07:00
наб 13c9a41140 mancheck: accept lints, accept lint overrides
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12129
2021-06-09 13:05:34 -07:00
Brian Behlendorf af80160f48 Linux: Set spl_kmem_cache_slab_limit when page size !4K
For small objects the kernel's slab implementation is very fast and
space efficient. However, as the allocation size increases to
require multiple pages performance suffers. The SPL kmem cache
allocator was designed to better handle these large allocation
sizes. Therefore, on Linux the kmem_cache_* compatibility wrappers
prefer to use the kernel's slab allocator for small objects and
the custom SPL kmem cache allocator for larger objects.

This logic was effectively disabled for all architectures using
a non-4K page size which caused all kmem caches to only use the
SPL implementation. Functionally this is fine, but the SPL code
which calculates the target number of objects per-slab does not
take in to account that __vmalloc() always returns page-aligned
memory. This can result in a massive amount of wasted space when
allocating tiny objects on a platform using large pages (64k).

To resolve this issue we set the spl_kmem_cache_slab_limit cutoff
to 16K for all architectures. 

This particular change does not attempt to update the logic used
to calculate the optimal number of pages per slab. This remains
an issue which should be addressed in a future change.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12152
Closes #11429
Closes #11574
Closes #12150
2021-06-09 13:05:34 -07:00
Colm db9048741b A couple of small style cleanups
In `zpool_load_compat()`:

  * initialize `l_features[]` with a loop rather than a static
    initializer.

  * don't redefine system constants; use private names instead

Rationale here:

When an array is initialized using a static {foo}, only the specified
members are initialized to the provided values, the rest are
initialized to zero. While B_FALSE is of course zero, it feels
unsafe to rely on this being true forever, so I'm inclined to sacrifice
a few microseconds of runtime here and initialize using a loop.

When looking for the correct combination of system constants to use
(in open() and mmap()), I prefer to use private constants rather than
redefining system ones; due to the small chance that the system
ones might be referenced later in the file. So rather than defining
O_PATH and MAP_POPULATE, I use distinct constant names.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Colm Buckley <colm@tuatha.org>
Closes #12156
2021-06-09 13:05:34 -07:00
наб b1b1faf0f1 zfs-module-parameters.5: remove nonexistent parameters
zfs_arc_overflow_shift was never a parameter:
ca0bf58d65 ("Illumos 5497 - lock
contention on arcs_mtx") is the only result in
git log -Soverflow_shift, and it wasn't exposed then, nor is it now

zfs_read_chunk_size was renamed to zfs_vnops_read_chunk_size in
e53d678d4a ("Share zfs_fsync, zfs_read,
zfs_write, et al between Linux and FreeBSD")

zio_decompress_fail_fraction was never a parameter: it was added in
c3bd3fb4ac ("OpenZFS 9403 - assertion
failed in arc_buf_destroy()") as a developer aid for setting in zdb, but
it's a dangerous test tunable and has no place in public documentation,
(not to mention that it obviously doesn't work):
> Although this did uncover a few low priority issues, this
  unfortuantely also causes ztest to ASSERT in many locations where the
  code is working correctly since it is designed to fail on IO errors.
  Developers can manually set this variable with the '-o' option to find
  and debug issues.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12157
2021-06-09 13:05:34 -07:00
наб 2fe8060cee spl-module-parameters.5: remove spl_kmem_cache_{expire,obj_per_slab_min}
Both were removed in 4fbdb10c7b ("remove
kmem_cache module parameter KMC_EXPIRE_AGE")

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12157
2021-06-09 13:05:34 -07:00
Rich Ercolani 6e01b47a28 Quick fixes for two ZTS failures
On FreeBSD 14, these two tests started erroring out like the
objects they're attempting to examine don't exist.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12165
2021-06-09 13:05:34 -07:00
Rich Ercolani cc9dae2404 Added another missed case to arc_summary3
It turns out that sometimes, evidently only when run inside the
ZTS handler, arc_summary3 | head > /dev/null will die with ENOTCONN,
and ruin the test run.

Added handling for that.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12160
2021-06-09 13:05:34 -07:00
grembo 1fb18c4bde FreeBSD boot code reminder after zpool upgrade
There used to be a warning after upgrading a zpool in FreeBSD, so users
won't forget to update the boot loader that pool is booted from.

This change brings this warning back, but only if the bootfs property
is set on the pool, which should be sufficient for the vast majority of
FreeBSD installations. People running something custom are most likely
aware of what to do after an upgrade in their specific environment.

Functionality is implemented in an OS specific helper function.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Co-authored-by: Michael Gmelin <grembo@FreeBSD.org>
Signed-off-by: Michael Gmelin <grembo@FreeBSD.org>
Closes #12099
Closes #12104
2021-06-09 13:05:34 -07:00
Rich Ercolani a0c055cfd3 Remove iov_iter_advance() for iter_write
The additional iter advance is incorrect, as copy_from_iter() has
already done the right thing.  This will result in the following
warning being printed to the console as of the 5.12 kernel.

    Attempted to advance past end of bvec iter

This change should have been included with #11378 when a
similar change was made on the read side.

Suggested-by: @siebenmann
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Issue #11378
Closes #12041
Closes #12155
2021-06-09 13:05:34 -07:00
наб 91bb2e91bd Turn checkbashisms into a make target
make_gitrev.sh actually breaks checkbashisms' parser,
which /insists/ that the end-of-line " is actually a string start

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12101
2021-06-09 13:05:34 -07:00
наб 132240507d Turn shellcheck into a normal make target. Fix new files it caught
This checks every file it checked (and a few more),
but explicitly instead of "if it works it works" best-effort
(which wasn't that good anyway)

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #10512
Closes #12101
2021-06-09 13:05:34 -07:00
наб d6f8f41c21 udev/rules.d: .gitignore: glob all rules
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12101
2021-06-09 13:05:34 -07:00
наб 2f5c39137a i-t: don't suggest zpool-import with altroot to /root
This *will fail* when remounted by the real root

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12148
2021-06-09 13:05:34 -07:00
наб 90d0023005 i-t: let rootdelay= set $ZFS_INITRD_PRE_MOUNTROOT_SLEEP
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Ref: https://github.com/openzfs/zfs/issues/11420#issuecomment-850338673
Closes #11663
Closes #12148
2021-06-09 13:05:34 -07:00
наб dbfbc1e524 zstream: force-install zstreamdump link
Accidentally introduced by commit dd00925e8d.

Force-install the zstreamdump link, this is a supported configuration
and the install should not fail if it needs to overwrite an existing
file.

Also cd to work around some funny platforms as noted in AC_PROG_LN_S doc

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12143
2021-06-09 13:05:34 -07:00
наб a61c502907 Widen mancheck to all of man and test-runner
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12125
2021-06-09 13:05:34 -07:00
наб f2e890ddfa test-runner.1: modernise
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12125
2021-06-09 13:05:34 -07:00
наб 71f6e6480e zfs-events.5: modernise
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12125
2021-06-09 13:05:34 -07:00
наб a1b1007e95 vdev_id.conf.5: modernise
Also yeet pci_slot since it doesn't seem to exist?

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12125
2021-06-09 13:05:34 -07:00
наб 01918aa3f4 man: use Nm/Cm/Fl consistently
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12125
2021-06-09 13:05:34 -07:00
наб e1ae0c4b99 zed.8: modernise
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12125
2021-06-09 13:05:34 -07:00
наб 475ee3bc2b cstyle.1: modernise
Also remove note about the OS/Net consolidation, now the illumos gate

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12125
2021-06-09 13:05:34 -07:00
наб f6ce806298 vdev_id.8: modernise, note scsi topology
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12125
2021-06-09 13:05:34 -07:00
наб ea034765b4 zhack.1: modernise
The spacing on zhack    feature stat    pool is a bit iffy(?)

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12125
2021-06-09 13:05:34 -07:00
наб 9d7c10388a zpool_influxdb.8: modernise
Also rip out the section about potentially including in the OpenZFS
distribution and simplify -e description

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12125
2021-06-09 13:05:34 -07:00
наб f67a920a6d zinject.8: modernise
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12125
2021-06-09 13:05:34 -07:00
наб 5c7c2b1301 raidz_test.1: modernise
Also re-add articles left out by the slav who wrote this

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12125
2021-06-09 13:05:34 -07:00
наб 5d08b105a2 zpoolprops.8: fix spacing in ashift
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12125
2021-06-09 13:05:34 -07:00
наб 55b6d4a3d8 fsck.zfs.8: modernise
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12125
2021-06-09 13:05:34 -07:00
наб a0372059b1 arcstat.1: modernise
Also slim down the description a tad

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12125
2021-06-09 13:05:34 -07:00
наб 68a310dc45 ztest.1: modernise
I fixed a few typos, but avoided changing anything beyond that;
the sould of the document should be preserved

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12125
2021-06-09 13:05:34 -07:00
наб 8ed04625db zgenhostid.8: use single-line indent macro for single-line examples
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12125
2021-06-09 13:05:34 -07:00
Manoj Joseph c3c65e3cfb long options for ztest
This change introduces long options for ztest. It builds the usage
message as well as the long_options array from a single table. It also
adds #defines for the default values.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Manoj Joseph <manoj.joseph@delphix.com>
Closes #12117
2021-06-09 13:05:34 -07:00
Paul Dagnelie eb9d335009 Don't direct to freenode in issue template
While Libera doesn't yet have a webchat client, we should at least
direct them to the right network. Once a webchat client is available,
we can direct them to it.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #12127
2021-06-09 13:05:34 -07:00
Alexander Motin 85c43508f3 Introduce write-mostly sums
wmsum counters are a reduced version of aggsum counters, optimized for
write-mostly scenarios.  They do not provide optimized read functions,
but instead allow much cheaper add function.  The primary usage is
infrequently read statistic counters, not requiring exact precision.

The Linux implementation is directly mapped into percpu_counter KPI.
The FreeBSD implementation is directly mapped into counter(9) KPI.
In user-space due to lack of better implementation mapped to aggsum.

Unfortunately neither Linux percpu_counter nor FreeBSD counter(9)
provide sufficient functionality to completelly replace aggsum, so
it still remains to be used for several hot counters.

Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #12114
2021-06-09 13:05:34 -07:00
Alexander Motin 09c0a8fd1a Improve scrub maxinflight_bytes math.
Previously, ZFS scaled maxinflight_bytes based on total number of
disks in the pool.  A 3-wide mirror was receiving a queue depth of 3
disks, which it should not, since it reads from all the disks inside.
For wide raidz the situation was slightly better, but still a 3-wide
raidz1 received a depth of 3 disks instead of 2.

The new code counts only unique data disks, i.e. 1 disk for mirrors
and non-parity disks for raidz/draid.  For draid the math is still
imperfect, since vdev_get_nparity() returns number of parity disks
per group, not per vdev, but still some better than it was.

This should slightly reduce scrub influence on payload for some pool
topologies by avoiding excessive queuing.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored-By:	iXsystems, Inc.
Closing #12046
2021-06-08 14:50:57 -07:00
наб ec3b25825e etc/systemd/zfs-mount-generator: output tweaks
git-diff--w-dirty, but:
  * zfs-load-key-$DSET.service -> zfs-load-key@$DSET.service
  * flattened set -eu into other /bin/sh flags
  * simpler (for 1 2 3 vs while [ counter ]; counter+=1) prompt loop
  * exec $ZFS where applicable

Reviewed-by: Antonio Russo <aerusso@aerusso.net>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: InsanePrawn <insane.prawny@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Issue #11915
Closes #11917
2021-06-08 14:50:42 -07:00
наб 0382362ce0 etc/systemd/zfs-mount-generator: rewrite in C
A plain rewrite of the shell version, and generates identical
units, save for replacing some empty lines with nothing, having fewer
meaningless spaces in After=s and different spacing in the lock scripts,
for a clean git diff -w

This is a gain of anywhere from 0m0.336s vs 0m0.022s (15.27x)
to 0m0.202s vs 0m0.006s (33.67x), depending on the hardware,
a.k.a. from "absolutely unusable" to "perfectly fine"

This also properly deals with canmount=noauto units across multiple
pools

See PR for detailed timings (of an early version) and diffs

Reviewed-by: Antonio Russo <aerusso@aerusso.net>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: InsanePrawn <insane.prawny@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Issue #11915
Closes #11917
2021-06-08 14:50:38 -07:00
наб 0458070928 zstreamdump: replace with link to zstream
zstreamdump(8) was in quite a bad state,
and the wrapper didn't work if invoked without /sbin in $PATH

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12015
2021-06-08 14:48:58 -07:00
наб 1fc5f8cbfd d/zfsutils.zfs.init derivatives: shellcheck, fix header
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12042
2021-06-08 14:47:24 -07:00
наб 263b3d64ab contrib/bash_completion.d: fix obvious shellcheck problems
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12042
2021-06-08 14:47:13 -07:00
наб c1a64be6d4 zgenhostid: use argument path directly
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12042
2021-06-08 14:47:05 -07:00
наб e40ffed021 Trim excess shellcheck annotations. Widen to all non-Korn scripts
Before, make shellcheck checked
  scripts/{commitcheck,make_gitrev,man-dates,paxcheck,zfs-helpers,zfs,
           zfs-tests,zimport,zloop}.sh
  cmd/zed/zed.d/{{all-debug,all-syslog,data-notify,generic-notify,
                 resilver_finish-start-scrub,scrub_finish-notify,
                 statechange-led,statechange-notify,trim_finish-notify,
                 zed-functions}.sh,history_event-zfs-list-cacher.sh.in}
  cmd/zpool/zpool.d/{dm-deps,iostat,lsblk,media,ses,smart,upath}
now it also checks
  contrib/dracut/{02zfsexpandknowledge/module-setup,
                  90zfs/{export-zfs,parse-zfs,zfs-needshutdown,
                         zfs-load-key,zfs-lib,module-setup,
                         mount-zfs,zfs-generator}}.sh.in
  cmd/zed/zed.d/{pool_import-led,vdev_attach-led,
                 resilver_finish-notify,vdev_clear-led}.sh
  contrib/initramfs/{zfsunlock,hooks/zfs.in,scripts/local-top/zfs}
  tests/zfs-tests/tests/perf/scripts/prefetch_io.sh
  scripts/common.sh.in
  contrib/bpftrace/zfs-trace.sh
  autogen.sh

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12042
2021-06-08 14:46:31 -07:00
наб 59d91b4d10 Fix SC2181 ("[ $?") outside tests/
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12042
2021-06-08 14:45:03 -07:00
наб d53a6969c1 i-t: rewrite hooks
This produces a leaner image, doesn't fail if zdb doesn't exist,
properly handles hostnameless systems, doesn't mention crypto modules
for no reason, doesn't add useless empty executable in hopes an
eight-year-old PR is merged, uses i-t builtins for all copies

Also optimize the checkbashisms filter to spawn one (or a few) awks
instead of one per regular file and remove initramfs/hooks therefrom due
to a command -v false positive

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12017
2021-06-08 14:44:35 -07:00
наб 019739b6e9 dracut/90/module-setup: mainly shellcheck cleanup
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Issue #11956
2021-06-08 14:43:06 -07:00
Brian Behlendorf 7d9f3ef0ef Tag 2.1.0-rc6
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2021-05-28 09:01:47 -07:00
Armin Wehrfritz 0a01f36b69 RPM: Explicitly set the required min/max kernel version for the DKMS package
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Armin Wehrfritz <dkxls23@gmail.com>
Closes #12124
2021-05-28 09:01:31 -07:00
Rich Ercolani 9e06df8ab0 Minor fix to configure on s390x
configure on s390x has a key check fail with an error about
a variable being used uninitialized. So let's initialize it.

Reviewed-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12126
2021-05-28 09:01:24 -07:00
Rich Ercolani 57e3b9c3cc Bend zpl_set_acl to permit the new userns* parameter
Just like #12087, the set_acl signature changed with all the bolted-on
*userns parameters, which disabled set_acl usage, and caused #12076.

Turn zpl_set_acl into zpl_set_acl and zpl_set_acl_impl, and add a
new configure test for the new version.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12076
Closes #12093
2021-05-27 22:31:57 -07:00
Rich Ercolani 605b901ca6 Reinstate the old zpool read label logic as a fallback
In case of AIO failure, we should probably fallback to the old
behavior and still work.

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Alan Somers <asomers@gmail.com>
Reviewed-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12032
Closes #12040
2021-05-27 22:31:57 -07:00
наб c5f268fa14 mount.zfs.8: match to reality; zfsprops.8: add missing temporary options
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12111
2021-05-27 22:31:57 -07:00
наб bc7792cad2 mount.zfs.8: modernise
No changes to the text itself

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12111
2021-05-27 22:31:57 -07:00
наб f3222062c0 zfsprops.8: remove nbmand-not-used-on-Linux and pointer to mount(8)
Linux man-pages' mount(8) points at fcntl(2), as does mount(2),
and support for it is little-used, deprecated, and configurable
since 4.5.

As far as I can tell, FreeBSD doesn't support nbmand at all ‒
mandatory locks are mostly dead

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12111
2021-05-27 22:31:57 -07:00
наб 6316086b72 Various Linux kABI cosmetics
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12103
2021-05-27 22:31:57 -07:00
наб 7cdd4dd33b linux: don't fall through to 3-arg vfs_getattr
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12103
2021-05-27 22:31:57 -07:00
Alexander Motin a7bb2dab4e FreeBSD: Update dataset_kstats for zvols in dev mode
Previous commit added accounting for geom mode, but not for dev.
In geom mode we actually have GEOM statistics, while in dev mode
additional accounting actually makes more sense.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Closes #12097
2021-05-27 22:31:57 -07:00
Rich Ercolani ecebf770c1 Correct flaws in arc_summary[23] and their test.
The change correctly handles BrokenPipeError and improves the
associated tests.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12037
Closes #12036
2021-05-27 22:31:57 -07:00
Alexander Motin aa4a84e616 FreeBSD: avoid memory allocation in arc_prune_async
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Closes #12049
2021-05-27 22:31:57 -07:00
Alexander Motin cc1c7b0171 FreeBSD: Retry OCF ENOMEM errors.
ZFS does not expect transient errors from crypto.  For read they are
counted as checksum errors, while for write end up in panic.  To not
panic on random low memory conditions retry ENOMEM errors in the OCF
wrapper function.

While there remove unneeded timeout and priority from msleep().

External-issue: https://reviews.freebsd.org/D30339
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #12077
2021-05-27 22:31:57 -07:00
Rich Ercolani fa7ee48e10 Add note for printing all dbgmsg entries on FreeBSD
I looked for a bit, and couldn't find any documentation on
how to print all logged dbgmsg entries, just messages since
the DTrace probe started, until @allanjude kindly pointed me
toward the sysctl.

So let's add that note where the DTrace probe is mentioned for
FreeBSD, so other people can find it.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12113
2021-05-27 22:31:57 -07:00
vermavipinkumar d6bedbbc44 Propagate vdev state due to invalid label corruption
Propagate vdev child state to parents on invalid label
Add VDEV_AUX_BAD_LABEL to print_import_config()

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Co-authored-by: Srikanth N S <srikanth.nagasubbaraoseetharaman@hpe.com>
Signed-off-by: Vipin Kumar Verma <vipin.verma@hpe.com>
Closes #12088
2021-05-27 22:31:57 -07:00
Rich Ercolani ab6717cba6 Update tmpfile() existence detection
Linux changed the tmpfile() signature again in torvalds/linux@6521f89,
which in turn broke our HAVE_TMPFILE detection in configure.

Update that macro to include the new case, and change the signature of
zpl_tmpfile as appropriate.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes: #12060
Closes: #12087
2021-05-27 22:31:56 -07:00
Brian Behlendorf 33a06f27e6 Fix dRAID sequential resilver silent damage handling
This change addresses two distinct scenarios which are possible
when performing a sequential resilver to a dRAID pool with vdevs
that contain silent unknown damage. Which in this circumstance
took the form of the devices being intentionally overwritten with
zeros. However, it could also result from a device returning incorrect
data while a sequential resilver was in progress.

Scenario 1) A sequential resilver is performed while all of the
dRAID vdevs are ONLINE and there is silent damage present on the
vdev being resilvered. In this case, nothing will be repaired
by vdev_raidz_io_done_reconstruct_known_missing() because
rc->rc_error isn't set on any of the raid columns. To address
this vdev_draid_io_start_read() has been updated to always mark
the resilvering column as ESTALE for sequential resilver IO.

Scenario 2) Multiple columns contain silent damage for the same
block and a sequential resilver is performed. In this case it's
impossible to generate the correct data from parity unless all of
the damaged columns are being sequentially resilvered (and thus
only good data is used to generate parity). This is as expected
and there's nothing which can be done about it. However, we need
to be careful not to make to situation worse. Since we can't
verify the data is actually good without a checksum, we must
only repair the devices which are being sequentially resilvered.
Otherwise, an incorrect repair to a device which previously
contained good data could effectively lock in the damage and
make reconstruction impossible. A check for this was added to
vdev_raidz_io_done_verified() along with a new test case.

Lastly, this change updates the redundancy_draid_spare1 and
redundancy_draid_spare3 test cases to be more representative
of normal dRAID replacement operation.  Specifically, what we
care about is that the scrub run after a sequential resilver
does not find additional blocks which need repair.  This would
indicate the sequential resilver failed to rebuild a section of
one of the devices. Note also the tests were switched to using
the verify_pool() function which still checks for checksum errors.

Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12061
2021-05-27 22:31:56 -07:00
Lauri Tirkkonen 7ad8fc5407 zfs-allow.8: mention 'bookmark' permission
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Lauri Tirkkonen <lauri@hacktheplanet.fi>
Closes #12064
2021-05-27 22:31:56 -07:00
Rich Ercolani 272b178d52 Simple change to fix building in recent environments
Renamed _fini too for symmetry.

Suggested-by: @ensch
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12059
Closes: #11987
Closes: #12056
2021-05-27 22:31:56 -07:00
Alexander Motin 8f3584292f Scale worker threads and taskqs with number of CPUs
While use of dynamic taskqs allows to reduce number of idle threads,
hardcoded 8 taskqs of each kind is a big overkill for small systems,
complicating CPU scheduling, increasing I/O reorder, etc, while
providing no real locking benefits, just not needed there.

On another side, 12*8 worker threads per kind are able to overload
almost any system nowadays.  For example, pool of several fast SSDs
with SHA256 checksum makes system barely responsive during scrub, or
with dedup enabled barely responsive during large file deletion.

To address both problems this patch introduces ZTI_SCALE macro, alike
to ZTI_BATCH, but with multiple taskqs, depending on number of CPUs,
to be used in places where lock scalability is needed, while request
ordering is not so much.  The code is made to create new taskq for
~6 worker threads (less for small systems, but more for very large)
up to 80% of CPU cores (previous 75% was not good for rounding down).
Both number of threads and threads per taskq are now tunable in case
somebody really wants to use all of system power for ZFS.

While obviously some benchmarks show small peak performance reduction
(not so big really, especially on systems with SMT, where use of the
second threads does not give as much performance as the first ones),
they also show dramatic latency reduction and much more smooth user-
space operation in case of high CPU usage by ZFS.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #11966
2021-05-27 22:31:56 -07:00
Brian Behlendorf 289b1393a4 ZTS: Increase redundancy test timeout
The redundancy_draid.ksh and redundancy_raidz.ksh tests were updated
by commit 93c8e91fe to additionally verify self-healing.  This
additional check increased the run time which can now occasionally
exceed the default maximum timeout in the CI environment.  To prevent
this from causing failures increase the default timeout for the
redundancy test cases.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12043
2021-05-27 22:12:48 -07:00
Paul Zuchowski ec75f80a50 Fix dmu_recv_stream test for resumable
Use dsl_dataset_has_resume_receive_state()
not dsl_dataset_is_zapified() to check if
stream is resumable.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Alek Pinchuk <apinchuk@axcient.com>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Paul Zuchowski <pzuchowski@datto.com>
Closes #12034
2021-05-27 22:12:40 -07:00
Ryan Moeller c2c02e490f FreeBSD: Use SET_ERROR to trace xattr name errors
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11997
2021-05-27 22:12:26 -07:00
Ryan Moeller 793dffb04e FreeBSD: Don't force xattr mount option
The kernel will use the xattr property by default when not overridden
by a mount option.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11997
2021-05-27 22:12:19 -07:00
Brian Behlendorf faa5673982 Revert "Fix raw sends on encrypted datasets when copying back snapshots"
Commit d1d4769 takes into account the encryption key version to
decide if the local_mac could be zeroed out. However, this could lead
to failure mounting encrypted datasets created with intermediate
versions of ZFS encryption available in master between major releases.
In order to prevent this situation revert d1d4769 pending a more
comprehensive fix which addresses the mount failure case.

Reviewed-by: George Amanakis <gamanakis@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #11294
Issue #12025
Issue #12300
Closes #12033
2021-05-27 22:10:13 -07:00
наб ade8e4b7d6 Widen mancheck target to all pages, fix them
mandoc: ./man/man8/zfs-mount-generator.8.in:188:2:
        ERROR: skipping end of block that is not open: RE
mandoc: ./man/man8/zfs_ids_to_path.8:38:2:
        ERROR: skipping unknown macro: .LP
mandoc: ./man/man8/zfs_ids_to_path.8:48:2:
        ERROR: inserting missing end of block: Sh breaks Bl
mandoc: ./man/man8/zfs-wait.8:69:2:
        ERROR: skipping end of block that is not open: El
mandoc: ./man/man8/zfs-program.8:460:2:
        ERROR: inserting missing end of block: It breaks Bd
mandoc: ./man/man8/zfs-mount-generator.8:188:2:
        ERROR: skipping end of block that is not open: RE
mandoc: ./man/man8/zstream.8:43:2:
        ERROR: skipping unknown macro: .LP
mandoc: ./man/man8/zstream.8:107:2:
        ERROR: inserting missing end of block: Sh breaks Bl
mandoc: ./man/man8/zstream.8:107:2:
        ERROR: inserting missing end of block: Sh breaks Bl
make: *** [Makefile:1529: mancheck] Error 1

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Issue #12017
2021-05-27 22:09:51 -07:00
Brian Behlendorf 1a7bad542b ZTS: Add known exceptions
The following seven tests been observed to occasionally fail during
CI testing.  This commit adds them to the list of known somewhat
flaky test cases.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12023
2021-05-27 22:09:32 -07:00
Coleman Kane 17351a79e2 linux 5.13 compat: bdevops->revalidate_disk() removed
Linux kernel commit 0f00b82e5413571ed225ddbccad6882d7ea60bc7 removes the
revalidate_disk() handler from struct block_device_operations. This
caused a regression, and this commit eliminates the call to it and the
assignment in the block_device_operations static handler assignment
code, when configure identifies that the kernel doesn't support that
API handler.

Reviewed-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #11967 
Closes #11977
2021-05-27 22:09:26 -07:00
наб 1cb517aebd module/zfs: remove zfs_zevent_console and zfs_zevent_cols
zfs_zevent_console committed multiple printk()s per line without
properly continuing them ‒ a single event could easily be fragmented
across over thirty lines, making it useless for direct application

zfs_zevent_cols exists purely to wrap the output from zfs_zevent_console

The niche this was supposed to fill can be better served by something
akin to the all-syslog ZEDLET

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #7082 
Closes #11996
2021-05-27 22:09:19 -07:00
Brian Behlendorf cb2e336038 Tag 2.1.0-rc5
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2021-05-10 15:13:37 -07:00
наб 381a0ca1e8 libzfs: zfs_asprintf(): don't return undefined pointer
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11993
2021-05-10 12:21:38 -07:00
наб f7a69b93c6 libzfsbootenv: lzbe_set_boot_device(): don't free undefined pointer
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11993
2021-05-10 12:21:31 -07:00
наб 8f6e2b5485 zfs_get_enclosure_sysfs_path(): don't free undefined pointer
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11993
2021-05-10 12:21:23 -07:00
наб 14b56624c8 zfs_get_enclosure_sysfs_path(): don't leak dev path
Also always free tmp2 at the end

Before:
nabijaczleweli@tarta:~/uwu$ valgrind --leak-check=full ./blergh
==8947== Memcheck, a memory error detector
==8947== Using Valgrind-3.14.0 and LibVEX
==8947== Command: ./blergh
==8947==
(null)
==8947==
==8947== HEAP SUMMARY:
==8947==     in use at exit: 23 bytes in 1 blocks
==8947==   total heap usage: 3 allocs, 2 frees, 1,147 bytes allocated
==8947==
==8947== 23 bytes in 1 blocks are definitely lost in loss record 1 of 1
==8947==    at 0x483577F: malloc (vg_replace_malloc.c:299)
==8947==    by 0x48D74B7: vasprintf (vasprintf.c:73)
==8947==    by 0x48B7833: asprintf (asprintf.c:35)
==8947==    by 0x401258: zfs_get_enclosure_sysfs_path
                         (zutil_device_path_os.c:191)
==8947==    by 0x401482: main (blergh.c:107)
==8947==
==8947== LEAK SUMMARY:
==8947==    definitely lost: 23 bytes in 1 blocks
==8947==    indirectly lost: 0 bytes in 0 blocks
==8947==      possibly lost: 0 bytes in 0 blocks
==8947==    still reachable: 0 bytes in 0 blocks
==8947==         suppressed: 0 bytes in 0 blocks
==8947==
==8947== For counts of detected and suppressed errors, rerun with: -v
==8947== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)

nabijaczleweli@tarta:~/uwu$ sed -n 191p zutil_device_path_os.c
        tmpsize = asprintf(&tmp1, "/sys/block/%s/device", dev_name);

After:
nabijaczleweli@tarta:~/uwu$ valgrind --leak-check=full ./blergh
==9512== Memcheck, a memory error detector
==9512== Using Valgrind-3.14.0 and LibVEX
==9512== Command: ./blergh
==9512==
(null)
==9512==
==9512== HEAP SUMMARY:
==9512==     in use at exit: 0 bytes in 0 blocks
==9512==   total heap usage: 3 allocs, 3 frees, 1,147 bytes allocated
==9512==
==9512== All heap blocks were freed -- no leaks are possible
==9512==
==9512== For counts of detected and suppressed errors, rerun with: -v
==9512== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11993
2021-05-10 12:21:18 -07:00
наб 214ae461f1 zpool: vdev_run_cmd(): don't free undefined pointers
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11993
2021-05-10 12:21:12 -07:00
наб 12ed5275d1 libzfs: zpool_load_compat(): don't free undefined pointers
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11993
2021-05-10 12:21:06 -07:00
наб 133fd00930 libzfs: zpool_load_compat(): open feature file cloexec
As a bonus, this also passes the open flags into the open flags instead
of the mode (it worked by accident because O_RDONLY is 0),
correctly detects a failed map,
and prefaults the entire file since we're always writing to every page

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11993
2021-05-10 12:21:00 -07:00
illiliti 74256266ff copy-builtin: posix conformance
This commits contains changes to allow running `copy-builtin` without
bash + some minor improvements.

changed shebang to /bin/sh
added -f option to `set` to globally disable unneeded globbing
replaced all `echo` commands within add_after() with `printf`
alternative to avoid possible issues with options (-neE)
dropped non-portable superfluous `readlink` command
replaced superfluous `true` command with `:` builtin alternative
replaced non-portable `--recursive` option of `cp` command with `-R`
alternative
dropped non-portable `local` keyword

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: illiliti <illiliti@protonmail.com>
Closes #12004
2021-05-10 12:18:54 -07:00
Brian Behlendorf 2085a5f992 Fix dRAID self-healing short columns
When dRAID performs a normal read operation only the data columns
in the raid map are read from disk.  This is enough information to
calculate the checksum, verify it, and return the needed data to the
application.  It's only in the event of a checksum failure that the
additional parity and any empty columns must be read since they are
required for parity reconstruction.

Reading these additional columns is handled by vdev_raidz_read_all()
which calls vdev_draid_map_alloc_empty() to expand the raid_map_t
and submit IOs for the missing columns.  This all works correctly,
but it fails to account for any "short" columns.  These are data
columns which are padded with a empty skip sector at the end.
Since that empty sector is not needed for a normal read it's not
read when columns is first read from disk.  However, like the parity
and empty columns the skip sector is needed to perform reconstruction.

The fix is to mark any "short" columns as never being read by clearing
the rc_tried flag when expanding the raid_map_t.  This will cause
the entire column to re-read from disk in the event of a checksum
failure allowing the self-healing functionality to repair the block.

Note that this only effects the self-healing feature because when
scrubbing a pool the parity, data, and empty columns are all read
initially to verify their contents.  Furthermore, only blocks which
contain "short" columns would be effected, and only when the memory
backing the skip sector wasn't already zeroed out.

This change extends the existing redundancy_raidz.ksh test case to
verify self-healing (as well as resilver and scrub).  Then applies
the same test case to dRAID with a slightly modified version of
the test script called redundancy_draid.ksh.  The unused variable
combrec was also removed from both test cases.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12010
2021-05-10 12:18:36 -07:00
наб b1dd6351bb Replace ZoL with OpenZFS where applicable
Afterward, git grep ZoL matches:
  * README.md:  * [ZoL Site](https://zfsonlinux.org)
  - Correct
  * etc/default/zfs.in:# ZoL userland configuration.
  - Changing this would induce a needless upgrade-check,
    if the user has modified the configuration;
    this can be updated the next time the defaults change
  * module/zfs/dmu_send.c:   * ZoL < 0.7 does not handle [...]
  - Before 0.7 is ZoL, so fair enough

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Issue #11956
2021-05-10 12:16:46 -07:00
Ryan Moeller cb18cf6b0a FreeBSD: Remove !FreeBSD ifdef'd code
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11994
2021-05-10 12:16:39 -07:00
Ryan Moeller 6c25218c7e Clean up use of zfs_log_create in zfs_dir
zfs_log_create returns void, so there is no reason to cast its return
value to void at the call site.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11994
2021-05-10 12:16:32 -07:00
наб 0bb2a48ee6 zed: protect against wait4()/fork() races to the global PID table
This can be very easily triggered by adding a sleep(1) before
the wait4() on a PID-starved system: the reaper thread would wait
for a child before its entry appeared, letting old entries accumulate:

  Invoking "all-debug.sh" eid=3021 pid=391
  Finished "(null)" eid=0 pid=391 time=0.002432s exit=0
  Invoking "all-syslog.sh" eid=3021 pid=336
  Finished "(null)" eid=0 pid=336 time=0.002432s exit=0
  Invoking "history_event-zfs-list-cacher.sh" eid=3021 pid=347
  Invoking "all-debug.sh" eid=3022 pid=349
  Finished "history_event-zfs-list-cacher.sh" eid=3021 pid=347
                                              time=0.001669s exit=0
  Finished "(null)" eid=0 pid=349 time=0.002404s exit=0
  Invoking "all-syslog.sh" eid=3022 pid=370
  Finished "(null)" eid=0 pid=370 time=0.002427s exit=0
  Invoking "history_event-zfs-list-cacher.sh" eid=3022 pid=391
  avl_find(tree, new_node, &where) == NULL
  ASSERT at ../../module/avl/avl.c:641:avl_add()
  Thread 1 "zed" received signal SIGABRT, Aborted.

By employing this wider lock, we atomise [wait, remove] and [fork, add]:
slowing down the reaper thread now just causes some zombies
to accumulate until it can get to them

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Don Brady <don.brady@delphix.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11963
Closes #11965
2021-05-10 12:13:53 -07:00
Alyssa Ross f15ec889a9 Return required size when encode_fh size too small
Quoting <linux/exportfs.h>:

> encode_fh() should return the fileid_type on success and on error
> returns 255 (if the space needed to encode fh is greater than
> @max_len*4 bytes). On error @max_len contains the minimum size (in 4
> byte unit) needed to encode the file handle.

ZFS was not setting max_len in the case where the handle was too
small.  As a result of this, the `t_name_to_handle_at.c' example in
name_to_handle_at(2) did not work on ZFS.

zfsctl_fid() will itself set max_len if called with a fid that is too
small, so if we give zfs_fid() that behavior as well, the fix is quite
easy: if the handle is too small, just use a zero-size fid instead of
the handle.

Tested by running t_name_to_handle_at on a normal file, a directory, a
.zfs directory, and a snapshot.

Thanks-to: Puck Meerburg <puck@puckipedia.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Alyssa Ross <hi@alyssa.is>
Closes #11995
2021-05-10 12:13:45 -07:00
Alexander Motin 0db64e9036 Simplify/fix dnode_move() for dn_zfetch
Previous code tried to keep prefetch streams while moving dnode.  But
it was at least not updating per-stream zs_fetchback pointers, causing
use-after-free on next access.  Instead of that I see much easier and
cleaner to just drop old prefetch state and start new from scratch.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #11936
Closes #11998
2021-05-10 12:13:29 -07:00
Ryan Moeller 85071b2ff5 FreeBSD: Initialize/destroy zp->z_lock
zp->z_lock is used in shared code for protecting projid and scantime.
We don't exercise these paths much if at all on FreeBSD, so have been
lucky enough not to have issues with the uninitialized locks so far.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
Closes #12003
2021-05-10 12:13:12 -07:00
Rich Ercolani 1a0ce3d5ef Updated zfs_dbgmsg_enable documentation to be more accurate
Changed the default specified for zfs_dbgmsg_enable, added
clarification of interaction with zfs_flags.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #11984
Closes #11986
2021-05-10 12:13:03 -07:00
наб 08d2e39719 zed.d/zed-functions.sh: fix zed_guid_to_pool() on dash
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11935
Closes #11954
2021-05-10 12:12:20 -07:00
наб f87f41009f zed.d/history_event-zfs-list-cacher.sh: no grep for snapshot detection
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11935
2021-05-10 12:12:04 -07:00
наб 918e5c6fca zed.d/*-notify.sh: use mktemp instead of generating temp path manually
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11935
2021-05-10 12:11:44 -07:00
наб 9c5256a2fe zed.d/pool_import-led.sh: fix for current zpool scripts
Also minor clean-up with folding state_to_val() into a case,
unrolling the lesser-available seq into numbers,
ignoring vdev states we don't care about,
and documentation comments

Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11934
Closes #11935
2021-05-10 12:11:33 -07:00
наб 39bcdd8d74 libzutil: fix dm_get_underlying_path() return if not a DM device
For example, this would happily return "/dev/(null)" for /dev/sda1

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11935
2021-05-10 12:10:20 -07:00
Ryan Moeller 075dcad351 ZTS: Fix xattr_002_neg passing too soon
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11970
2021-05-10 12:09:42 -07:00
Ryan Moeller 5701e393b7 FreeBSD: Prune some unneeded definitions
IS_XATTRDIR is never used.
v_count is only used in two places, one immediately followed by the
use of the real name, v_usecount.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
Closes #11973
2021-05-10 12:09:34 -07:00
Arshad Hussain 910b43310f vdev_id: variable not getting expanded under map_slot()
Under function map_slot() variable passed as args
were not getting properly substituted or expanded.
This patch fixes the substitution issue.

Reviewed-by: Niklas Edmundsson <nikke@acc.umu.se>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Closes #11951 
Closes #11959
2021-05-10 12:08:18 -07:00
Nathaniel Wesley Filardo 4ac37f8b2e vdev_mirror: don't scrub/resilver devices that can't be read
This ensures that we don't accumulate checksum errors against offline or
unavailable devices but, more importantly, means that we don't
needlessly create DTL entries for offline devices that are already
up-to-date.

Consider a 3-way mirror, with disk A always online (and so always with
an empty DTL) and B and C only occasionally online.  When A & B resilver
with C offline, B's DTL will effectively be appended to C's due to these
spurious ZIOs even as the resilver empties B's DTL:

  * These ZIOs land in vdev_mirror_scrub_done() and flag an error

  * That flagged error causes vdev_mirror_io_done() to see
    unexpected_errors, so it issues a ZIO_TYPE_WRITE repair ZIO, which
    inherits ZIO_FLAG_SCAN_THREAD because zio_vdev_child_io() includes
    that flag in ZIO_VDEV_CHILD_FLAGS.

  * That ZIO fails, too, and eventually zio_done() gets its hands on it
    and calls vdev_stat_update().

  * vdev_stat_update() sees the error and this zio...

    * is not speculative,
    * is not due to EIO (but rather ENXIO, since the device is closed)
    * has an ->io_vd != NULL (specifically, the offline leaf device)
    * is a write
    * is for a txg != 0 (but rather the read block's physical birth txg)
    * has ZIO_FLAG_SCAN_THREAD asserted

  * So: vdev_stat_update() calls vdev_dtl_dirty() on the offline vdev.

Then, when A & C resilver with B offline, that story gets replayed and
C's DTL will be appended to B's.

In fact, one does not need this permanently-broken-mirror scenario to
induce badness: breaking a mirror with no DTLs and then scrubbing will
create DTLs for all offline devices.  These DTLs will persist until the
entire mirror is reassembled for the duration of the *resilver*, which,
incidentally, will not consider the devices with good data to be sources
of good data in the case of a read failure.

Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Nathaniel Wesley Filardo <nwfilardo@gmail.com>
Closes #11930
2021-05-10 12:08:07 -07:00
Brian Behlendorf 6f2012c1dc zfs.spec.in: remove post ldconfig scriptlets
In Fedora 28 the packaging guidelines were changed such that ldconfig
should no longer be called in either the %post or %postun scriptlets.
Instead the new compatibility macros %ldconfig_post, %ldconfig_postun,
and %ldocnfig_scriptlets should be used.

Since we only currently support Fedora 31 and newer, we could drop
%post or %postun scriptlets entirely according to the guidelines.
However, since we also use the same spec file for CentOS / RHEL
it's convenient to call the macros which are available starting
with CentOS / RHEL 8.  For CentOS / RHEL 7 we must still call
ldconfig in the traditional way.

https://fedoraproject.org/wiki/Changes/Removing_ldconfig_scriptlets

Reviewed-by: Olaf Faaland <faaland1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11931
2021-05-10 12:07:16 -07:00
Toomas Soome 96f04062e4 zdb: ASSERT issues when DEBUG is not defined
If zdb is not built with DEBUG mode, the ASSERT macros will be
eliminated.

This will leave vim defined, but not used (gcc warning) and
checkpoint spacemap validation loop will do nothing.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Toomas Soome <tsoome@me.com>
Closes #11932
2021-05-10 12:07:09 -07:00
Brian Behlendorf a7a579e6e1 ZTS: Add known exceptions
Both the zpool_initialize_import_export and checkpoint_discard_busy
test cases a known to occasionally fail.  Add them to the list of
known possible failures and reference the appropriate issue on the
tracker.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11949
2021-05-10 12:06:31 -07:00
Martin Matuška 463d7e1a61 Drop "All rights reserved" from files by trasz@FreeBSD.org
This obeys the change in freebsd/freebsd-src@bce7ee9d4

External-issue: https://reviews.freebsd.org/D26980
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Martin Matuska <mm@FreeBSD.org>
Closes #11947
2021-05-10 12:06:21 -07:00
Prawn 6c1a7be11e receive: don't fail inheriting (-x) properties on wrong dataset type
Receiving datasets while blanket inheriting properties like zfs 
receive -x mountpoint can generally be desirable, e.g. to avoid 
unexpected mounts on backup hosts.

Currently this will fail to receive zvols due to the mountpoint 
property being applicable to filesystems only.  This limitation 
currently requires operators to special-case their minds and tools 
for zvols.

This change gets rid of this limitation for inherit (-x) by
Spiting up the dataset type handling: Warnings for inheriting (-x), 
errors for overriding (-o).

Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: InsanePrawn <insane.prawny@gmail.com>
Closes #11416
Closes #11840
Closes #11864
2021-05-10 12:06:11 -07:00
Mateusz Guzik 1b7d883eaa FreeBSD: damage control racing .. lookups in face of mkdir/rmdir
External-issue: https://reviews.freebsd.org/D29769
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #11926
2021-05-10 12:05:49 -07:00
Romain Dolbeau e5c01296ff Fix AVX512BW Fletcher code on AVX512-but-not-BW machines
Introduce a specific valid function for avx512f+avx512bw (instead 
of checking only for avx512f).

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Adam Moss <c@yotes.com>
Signed-off-by: Romain Dolbeau <romain@dolbeau.org>
Closes #11937
Closes #11938
2021-05-10 12:05:36 -07:00
Brian Behlendorf 64b83a0bf1 Tag 2.1.0-rc4
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2021-04-22 14:54:35 -07:00
наб 8fd65351a7 zed: protect against wait4()/fork() races to the launched process tree
As soon as wait4() returns, fork() can immediately return with the same
PID, and race to lock _launched_processes_lock, then try to add the new
(duplicate) PID to _launched_processes, which asserts

By locking before wait4(), we ensure, that, given that same
unfortunate scheduling, _launched_processes_lock cannot be locked by the
spawner before we pop the process in the reaper, and only afterward will
it be added

This moves where the reaper idles when there are children from the
wait4() to the pause(), locking for the duration of that single syscall
in both the no-children and running-children cases; the impact of this
is one to two syscalls (depending on _launched_processes_lock state)
per loop

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Don Brady <don.brady@delphix.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11924
Closes #11928
2021-04-22 14:53:51 -07:00
Daniel Stevenson 79d9f663b0 Fixed incorrect man page reference in zfsprops(8)
The special_small_blocks section directed readers to zpool(8) for
documentation on special allocation classes, while they are actually
documented in zpoolconcepts(8).

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Daniel Stevenson <daniel@dstev.net>
Closes #11918
2021-04-21 10:26:57 -07:00
наб 478b8ec8f2 etc/systemd/zfs-mount-generator: don't fail if no cached pools
If $FSLIST exists but is empty, the generator fails with
  sort: cannot read: '/etc/zfs/zfs-list.cache/*':
  No such file or directory

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11915
2021-04-19 15:22:58 -07:00
наб 219acd907b freebsd/libshare: nfs: make nfs_is_shared() thread-safe
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Wilson <gwilson@delphix.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11886
2021-04-19 15:22:58 -07:00
наб cc33149e5a libshare: nfs: don't leak nfs_lock_fd when lock fails
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Wilson <gwilson@delphix.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11886
2021-04-19 15:22:58 -07:00
Brian Behlendorf 40abc63c40 ZTS: Improve redundancy test scripts
- Add additional logging to provide more information about why the
  test failed.  This including logging more of the individual commands
  and the contents and differences of the record files on failure.

- Updated get_vdevs() to properly exclude all top-level vdevs
  including raidz3 and draid[1-3].

- Replaced gnudd with dd.  This is the only remaining place in the
  test suite gnudd is used and it shouldn't be needed.

- The refill_test_env function expects the pool as the first argument
  but never sets the pool variable.

- Only fill the test pools to 50% of capacity instead of 75% to help
  speed up the tests.

- Fix replace_missing_devs() calculation, MINDEVSIZE should be
  MINVDEVSIZE.

- Fix damage_devs() so it overwrites almost all of the device so
  we're guaranteed to damage filesystem blocks.

- redundancy_stripe.ksh should not use log_mustnot to check if the
  pool is healthy since the return value may be misinterpreted.
  Just perform a normal conditional check and log the failure.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11906
2021-04-19 15:22:58 -07:00
Attila Fülöp cdc27fd061 ICP: Silence objtool "stack pointer realignment" warnings
Objtool requires the use of a DRAP register while aligning the
stack. Since a DRAP register is a gcc concept and we are
notoriously low on registers in the crypto code, it's not worth
the effort to mimic gcc generated stack realignment.

We simply silence the warning by adding the offending object files
to OBJECT_FILES_NON_STANDARD.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Attila Fülöp <attila@fueloep.org>
Closes #6950
Closes #11914
2021-04-19 15:22:58 -07:00
наб 121a34e36b libzfs: refresh property cache after inheriting userprop
This matches what happens when inheriting a system property

Consider the following program:
	int main() {
		void *zhp = libzfs_init();
		void *dataset = zfs_open(zhp, "zest/__test", 1);

		printf("before:");
		dump_nvlist(zfs_get_user_props(dataset), 2);
		printf("\n");

		zfs_prop_inherit(dataset, "xyz.nabijaczleweli:test", 0);
		printf("after:");
		dump_nvlist(zfs_get_user_props(dataset), 2);
		printf("\n");

		zfs_refresh_properties(dataset);
		printf("refreshed:");
		dump_nvlist(zfs_get_user_props(dataset), 2);
		printf("\n");
	}

And the output before:
	# zfs set xyz.nabijaczleweli:test=hehe zest/__test
	# ./a.out
	before:  xyz.nabijaczleweli:test:
	      value: 'hehe'
	      source: 'zest/__test'

	after:  xyz.nabijaczleweli:test:
	      value: 'hehe'
	      source: 'zest/__test'

	refreshed:

As compared to the output after:
	# zfs set xyz.nabijaczleweli:test=hehe zest/__test
	# ./a.out
	before:  xyz.nabijaczleweli:test:
	      value: 'hehe'
	      source: 'zest/__test'

	after:
	refreshed:

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11064
Closes #11911
2021-04-19 15:22:58 -07:00
наб dfa2beddea libzfs: don't mark prompt+raw as retriable
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11911
Closes #11031
2021-04-19 15:22:58 -07:00
Mateusz Guzik 14a1980b35 Combine zio caches if possible
This deduplicates 2 sets of caches which use the same allocation size.

Memory savings fluctuate a lot, one sample result is FreeBSD running
"make buildworld" saving ~180MB RAM in reduced page count associated
with zio caches.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #11877
2021-04-19 15:22:58 -07:00
наб 9747310cc1 contrib/dracut: 90: zfs-{rollback,snapshot}-bootfs: use @sbindir@
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11898
2021-04-19 15:22:57 -07:00
наб 0b10994cea contrib/i-t: properly mount root's children with spaces
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11898
2021-04-19 15:22:57 -07:00
наб 692387343f contrib/dracut: 90: mount essential datasets under root
This partly mirrors what the i-t script does (though that mounts all
children, recursively) ‒ /etc, /usr, /lib*, and /bin are all essential,
if present, to successfully invoke the real init, which will then mount
everything else it might need in the right order

The following extreme-case set-up boots w/o issues now:
  /               zoot            zfs  rw,relatime,xattr,noacl
  ├─/etc          zoot/etc        zfs  rw,relatime,xattr,noacl
  ├─/usr          zoot/usr        zfs  rw,relatime,xattr,noacl
  │ └─/usr/local  zoot/usr/local  zfs  rw,relatime,xattr,noacl
  ├─/var          zoot/var        zfs  rw,relatime,xattr,noacl
  │ ├─/var/lib    zoot/var/lib    zfs  rw,relatime,xattr,noacl
  │ ├─/var/log    zoot/var/log    zfs  rw,relatime,xattr,posixacl
  │ ├─/var/cache  zoot/var/cache  zfs  rw,relatime,xattr,noacl
  │ └─/var/tmp    zoot/var/tmp    zfs  rw,relatime,xattr,noacl
  ├─/home         zoot/home       zfs  rw,relatime,xattr,noacl
  │ └─/home/nab   zoot/home/nab   zfs  rw,relatime,xattr,noacl
  ├─/boot         zoot/boot       zfs  rw,relatime,xattr,noacl
  ├─/root         zoot/home/root  zfs  rw,relatime,xattr,noacl
  ├─/opt          zoot/opt        zfs  rw,relatime,xattr,noacl
  └─/srv          zoot/srv        zfs  rw,relatime,xattr,noacl

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11898
2021-04-19 15:22:57 -07:00
наб 8d869cd840 contrib/dracut: 90: generator: only log to kmsg if debug set on cmdline
"debug" is also used by systemd itself, and there's really no reason for
the generator to write this much garbage by default

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11898
2021-04-19 15:22:57 -07:00
наб ae8dd6676b contrib/dracut: 02: don't spill device names across multiple lines
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11898
2021-04-19 15:22:57 -07:00
Paul Zuchowski d938c6ee27 Fix crash in zio_done error reporting
Fix NULL pointer dereference when reporting
checksum error for gang block in zio_done.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Zuchowski <pzuchowski@datto.com>
Closes #11872
Closes #11896
2021-04-19 15:22:57 -07:00
Ryan Moeller dc4d55268c zfs-send(8): Restore sorting of flags
Before #11710 the flags in zfs-send(8) were sorted.
Restore order and bump the date.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11905
2021-04-19 15:22:57 -07:00
наб bdcf0cca10 linux/spl: proc: use global table_{min,max} values instead of local ones
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11879
2021-04-19 15:22:57 -07:00
наб 7be4669a65 linux/libspl: gethostid: read from /proc/sys/kernel/spl/hostid, simplify
Fixes get_system_hostid() if it was set via the aforementioned sysctl
and simplifies the code a bit.  The kernel and user-space must agree,
after all.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11879
2021-04-19 15:22:57 -07:00
наб e5c4f86e7a linux/spl: base proc_dohostid() on proc_dostring()
This fixes /proc/sys/kernel/spl/hostid on kernels with mainline commit
32927393dc1ccd60fb2bdc05b9e8e88753761469 ("sysctl: pass kernel pointers
to ->proc_handler") ‒ 5.7-rc1 and up

The access_ok() check in copy_to_user() in proc_copyout_string() would
always fail, so all userspace reads and writes would fail with EINVAL

proc_dostring() strips only the final new-line,
but simple_strtoul() doesn't actually need a back-trimmed string ‒
writing "012345678   \n" is still allowed, as is "012345678zupsko", &c.

This alters what happens when an invalid value is written ‒
previously it'd get set to what-ever simple_strtoul() returned
(probably 0, thereby resetting it to default), now it does nothing

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11878
Closes #11879
2021-04-19 15:22:57 -07:00
Jitendra Patidar 4c925936e3 ZFS traverse_visitbp optimization to limit prefetch
Traversal code, traverse_visitbp() does visit blocks recursively.
Indirect (Non L0) Block of size 128k could contain, 1024 block pointers
of 128 bytes. In case of full traverse OR incremental traverse, where
all blocks were modified, it could traverse large number of blocks
pointed by indirect. Traversal code does issue prefetch of blocks
traversed below indirect. This could result into large number of
async reads queued on vdev queue. So, account for prefetch issued for
blocks pointed by indirect and limit max prefetch in one go.

Module Param:
zfs_traverse_indirect_prefetch_limit: Limit of prefetch while traversing
an indirect block.

Local counters:
prefetched: Local counter to account for number prefetch done.
pidx: Index for which next prefetch to be issued.
ptidx: Index at which next prefetch to be triggered.

Keep "ptidx" somewhere in the middle of blocks prefetched, so that
blocks prefetch read gets the enough time window before their demand
read is issued.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Jitendra Patidar <jitendra.patidar@nutanix.com>
Closes #11802 
Closes #11803
2021-04-19 15:22:57 -07:00
наб 15d3470c2e ZTS: add zed_fd_spill to verify the fds ZEDLETs inherit
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11891
2021-04-19 15:12:45 -07:00
наб 55cf5a255a zed: set O_CLOEXEC on persistent fds, remove closefrom() from pre-exec
Also don't dup /dev/null over stdio if daemonised

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11891
2021-04-19 15:12:41 -07:00
Paul Dagnelie d682e20ba4 Add SIGSTOP and SIGTSTP handling to issig
This change adds SIGSTOP and SIGTSTP handling to the issig function; 
this mirrors its behavior on Solaris. This way, long running kernel 
tasks can be stopped with the appropriate signals. Note that doing 
so with ctrl-z on the command line doesn't return control of the tty 
to the shell, because tty handling is done separately from stopping 
the process. That can be future work, if people feel that it is a 
necessary addition.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Issue #810 
Issue #10843 
Closes #11801
2021-04-19 15:12:33 -07:00
Brian Behlendorf 9c470dc6c0 Fix 'make checkbashisms` warnings
The awk command used by the checkbashisms target incorrectly
adds the escape character before the ! and # characters.  This
results in the following warnings because these characters do not
need to be escaped.

    awk: cmd. line:1: warning: regexp escape sequence
        `\!' is not a known regexp operator
    awk: cmd. line:1: warning: regexp escape sequence
        `\#' is not a known regexp operator

Remove the unneeded escape character before ! and #.

Valid escape sequences are:

    https://www.gnu.org/software/gawk/manual/html_node/Escape-Sequences.html

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11902
2021-04-19 15:12:28 -07:00
Brian Behlendorf 7616fc7971 Tag 2.1.0-rc3
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2021-04-14 14:50:46 -07:00
Yuri Pankov 7bf3959601 Fix vdev health padding in zpool list -v
Do not (incorrectly, right instead left) pad health string itself,
it will be taken care of when printing property value below.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Yuri Pankov <yuripv@FreeBSD.org>
Closes #11899
2021-04-14 13:23:08 -07:00
Brian Behlendorf 25cd7b520d Obsolete earlier packages due to version bump
Follow up to d5ef91af which adds a missing 'obsoletes' for the
libzfs-devel package.

Add a comment to the zfs.spec file as a reminder that previous
versions of the package should be marked as obsolete.

Reviewed-by: Olaf Faaland <faaland1@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11844 
Closes #11895
2021-04-14 13:23:08 -07:00
наб 4a30a0b947 linux/libspl: getextmntent(): don't leak mnttab FILE*
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11868
2021-04-14 13:23:08 -07:00
наб ecd6ad66b0 libzfs: zfs_mount_at(): load key for encryption root if MS_CRYPT
zfs_crypto_load_key() only works on encryption roots,
and zfs mount -la would fail if it encounters a datasets that
is sorted before their encroots.

To trigger:
  truncate -s 40G /tmp/test
  dd if=/dev/urandom of=/tmp/k bs=128 count=1 status=none
  zpool create -O encryption=on -O keylocation=file:///tmp/k \
               -O keyformat=passphrase test /tmp/test
  zfs create -o mountpoint=/a test/a
  zfs create -o mountpoint=/b test/b
  zfs umount test
  zfs unload-key test
  zfs mount -la

The final mount errored out with:
  Key load error: Keys must be loaded for
    encryption root of 'test/a' (test).
  Key load error: Keys must be loaded for
    encryption root of 'test/b' (test).

And only /test was mounted

This technically breaks the libzfs API, but the previous behavior was
decidedly a bug.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11870 
Closes #11875
2021-04-14 13:23:08 -07:00
Mateusz Guzik 2aa0d643fd FreeBSD: use vnlru_free_vfsops if available
Fixes issues when zfs is used along with other filesystems.

External-issue: https://cgit.freebsd.org/src/commit/?id=e9272225e6bed840b00eef1c817b188c172338ee
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #11881
2021-04-14 13:23:08 -07:00
Mateusz Guzik a6b82cc0bb FreeBSD: add missing seqc write begin/end around zfs_acl_chown_setattr
It happens to trip over an assert but does not matter for correctness at
this time. Done for future proofing.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #11884
2021-04-14 13:23:08 -07:00
Mateusz Guzik 4568b5cfba FreeBSD: add support for lockless symlink lookup
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #11883
2021-04-14 13:23:08 -07:00
наб ffc2e74f79 .gitmodules: link to openzfs github repository
Update the test images link to reference the openzfs github repository.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Issue #11868
2021-04-14 13:23:08 -07:00
Prawn 8b03fce289 cmd/zfs receive: allow dry-run (-n) to check property args
zfs recv -n does not report some errors it could.  The code to bail 
out of the receive if in dry-run mode came a little early, skipping 
validation of cmdprops (recv -x and -o) among others.  Move the
check down to enable these additional checks.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: InsanePrawn <insane.prawny@gmail.com>
Closes #11862
2021-04-14 13:23:08 -07:00
Colm 1f3de97374 Improvements to the 'compatibility' property
Several improvements to the operation of the 'compatibility' property:

1) Improved handling of unrecognized features:
Change the way unrecognized features in compatibility files are handled.

 * invalid features in files under /usr/share/zfs/compatibility.d
   only get a warning (as these may refer to future features not yet in
   the library),
 * invalid features in files under /etc/zfs/compatibility.d
   get an error (as these are presumed to refer to the current system).

2) Improved error reporting from zpool_load_compat.
Note: slight ABI change to zpool_load_compat for better error reporting.

3) compatibility=legacy inhibits all 'zpool upgrade' operations.

4) Detect when features are enabled outside current compatibility set
   * zpool set compatibility=foo <-- print a warning
   * zpool set feature@xxx=enabled <-- error
   * zpool status <-- indicate this state

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Colm Buckley <colm@tuatha.org>
Closes #11861
2021-04-14 13:23:08 -07:00
Brian Behlendorf 7de1797cee ZTS: fix removal_condense_export test case
It's been observed in the CI that the required 25% of obsolete bytes
in the mapping can be to high a threshold for this test resulting in
condensing never being triggered and a test failure.  To prevent these
failures make the existing zfs_condense_indirect_obsolete_pct tuning
available so the obsolete percentage can be reduced from 25% to 5%
during this test.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11869
2021-04-14 13:23:08 -07:00
Brian Behlendorf 44ccb1d20f Update libzfs.abi for zfs_send() change
Commit 099fa7e4 intentionally modified the libzfs ABI.  However, it
failed to include an update for the libzfs.abi file.  This commit
resolves the `make checkabi` warning due to that omission.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11710
2021-04-14 13:23:07 -07:00
pstef dfd3015048 Balance parentheses in parameter descriptions
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Piotr Paweł Stefaniak <pstef@freebsd.org>
Closes #11882
2021-04-14 13:23:07 -07:00
Brian Behlendorf 4a9b29375e ZTS: Add known exceptions
The fault/auto_spare_shared, l2arc/persist_l2arc_007_pos, and
alloc_class/alloc_class_013_pos test cases are not entirely reliable
and may occasionally fail resulting in a false positive in the CI.
Add these tests to known list of possible failures until they can
be made 100% reliable.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11890
2021-04-14 13:23:07 -07:00
наб 2453d0263d lib/: set O_CLOEXEC on all fds
As found by
  git grep -E '(open|setmntent|pipe2?)\(' |
    grep -vE '((zfs|zpool)_|fd|dl|lzc_re|pidfile_|g_)open\('

FreeBSD's pidfile_open() says nothing about the flags of the files it
opens, but we can't do anything about it anyway; the implementation does
open all files with O_CLOEXEC

Consider this output with zpool.d/media appended with
"pid=$$; (ls -l /proc/$pid/fd > /dev/tty)":
  $ /sbin/zpool iostat -vc media
  lrwx------ 0 -> /dev/pts/0
  l-wx------ 1 -> 'pipe:[3278500]'
  l-wx------ 2 -> /dev/null
  lrwx------ 3 -> /dev/zfs
  lr-x------ 4 -> /proc/31895/mounts
  lrwx------ 5 -> /dev/zfs
  lr-x------ 10 -> /usr/lib/zfs-linux/zpool.d/media
vs
  $ ./zpool iostat -vc vendor,upath,iostat,media
  lrwx------ 0 -> /dev/pts/0
  l-wx------ 1 -> 'pipe:[3279887]'
  l-wx------ 2 -> /dev/null
  lr-x------ 10 -> /usr/lib/zfs-linux/zpool.d/media

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11866
2021-04-14 13:23:07 -07:00
наб ab88e9e264 libzfs{,_core}: set O_CLOEXEC on persistent (ZFS_DEV and MNTTAB) fds
These were fd 3, 4, and 5 by the time zfs change-key hit
execute_key_fob()

glibc appends "e" to setmntent() mode, but musl's just returns fopen()

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11866
2021-04-14 13:23:07 -07:00
наб 54aaed5c45 libzfs: zfs_crypto_create() requires a new key by definition: set newkey
This changes the password prompt for new encryption roots from
  Enter passphrase:
  Re-enter passphrase:
to
  Enter new passphrase:
  Re-enter new passphrase:
which makes more sense and is more consistent with "new passphrase"
now always meaning "come up with something" and plain "passphrase"
"remember that thing"

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11866
2021-04-14 13:23:07 -07:00
наб aa52015d45 libzfs_crypto.c: remove unused key_locator enum
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11866
2021-04-14 13:23:07 -07:00
наб 5faeaa1365 zfprops(8): fix spacing in jailed= arguments
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11866
2021-04-14 13:19:50 -07:00
наб 17792783d0 zfs-[un]jail(8): fix "zfs-jail [un]jail" leftovers
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11866
2021-04-14 13:19:50 -07:00
pablofsf 07d64c07e0 Allow zfs to send replication streams with missing snapshots
A tentative implementation and discussion was done in #5285.
According to it a send --skip-missing|-s flag has been added.
In a replication stream, when there are snapshots missing in
the hierarchy, if -s is provided print a warning and ignore
dataset (and its children) instead of throwing an error

Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Pablo Correa Gómez <ablocorrea@hotmail.com>
Closes #11710
2021-04-14 13:19:50 -07:00
Olaf Faaland f8631d0fe0 kmod-zfs should obsolete kmod-spl as well as spl-kmod
Without this Obsoletes, using packages built --with-spec=redhat, an
upgrade from zfs-0.7 to zfs-2.x does not cause the kmod-spl-0.7 package
to be removed.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Closes #11865
2021-04-14 13:19:50 -07:00
наб afe2a5ca12 zvol_wait: properly handle zvol_volmode sysctl being 3/none
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11859
2021-04-14 13:19:50 -07:00
наб 2a667acfcb zfs_ids_to_path: print correct wrong values
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11859
2021-04-14 13:19:50 -07:00
наб 5f3f4def83 zfs_ids_to_path: the -v comes after the executable name
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11859
2021-04-14 13:19:50 -07:00
наб 8bc955f546 contrib/bpftrace: exec bpftrace, remove useless cat
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11859
2021-04-14 13:19:50 -07:00
наб b3bb604a4c arc_summary3: just read /s/m/{mod}/version instead of spawning cat
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11859
2021-04-14 13:19:50 -07:00
наб 2ae2892235 zvol_wait: fix for zvols with spaces in name, optimise
list_zvols() would happily, for zvols with spaces in their names,
assign the second half to volmode, &c., so use a normal read
and set IFS to a tab instead of using 4 separate AWK processes(?)

Similarly, in filter_out_deleted_zvols(), run zfs(8) once and use the
output directly instead of spawning a zfs(8) process per zvol

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11859
2021-04-14 13:19:50 -07:00
наб 9d47853c74 zstreamdump: exec zstream dump
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11859
2021-04-14 13:19:50 -07:00
TerraTech 30f5b2fbe2 zpl_inode.c: Fix SMACK interoperability
SMACK needs to have the ZFS dentry security field setup before
SMACK's d_instantiate() hook is called as it requires functioning
'__vfs_getxattr()' calls to properly set the labels.

Fxes:
1) file instantiation properly setting the object label to the
   subject's label
2) proper file labeling in a transmutable directory

Functions Updated:
1) zpl_create()
2) zpl_mknod()
3) zpl_mkdir()
4) zpl_symlink()

External-issue: https://github.com/cschaufler/smack-next/issues/1
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: TerraTech <TerraTech@users.noreply.github.com>
Closes #11646 
Closes #11839
2021-04-14 13:19:50 -07:00
Ryan Moeller ad34215364 ZTS: Improve cleanup in removal_with_export
Kill the removal operation on every platform, not just Linux.
The test has been fixed and is now stable on FreeBSD.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11856
2021-04-14 13:19:49 -07:00
Rich Ercolani 1eee2c7b27 Added check for broken alien version
Added a check for alien 8.95.{1,2,3}, which is known to fail to 
generate debs 100% of the time, and instead print out a message 
informing the developer that it's known to be broken and linking 
them to more information.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #11848 
Closes #11850
2021-04-14 13:19:49 -07:00
Brian Behlendorf f17d146ca6 Use dsl_scan_setup_check() to setup a scrub
When a rebuild completes it will automatically schedule a follow up
scrub to verify all of the block checksums.  Before setting up the
scrub execute the counterpart dsl_scan_setup_check() function to
confirm the scrub can be started.  Prior to this change we'd only
check vdev_rebuild_active() which isn't as comprehensive, and using
the check function keeps all of this logic in one place.

Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11849
2021-04-14 13:19:49 -07:00
Tino Reichardt a35cadcf14 Fix double sha1/sha1.o line in module/icp/Makefile.in
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #11852
2021-04-14 13:19:49 -07:00
Ryan Moeller 72a9750ac8 ZTS: Tests using zhack may fail on FreeBSD
As described in #11854, zhack is occasionally segfaulting on FreeBSD. 
Debugging this is proving to be tricky. To avoid false positives in
the CI add entries for the tests that use zhack in zts-report to 
accept that they may occasionally fail on FreeBSD.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Issue #11854
Closes #11855
2021-04-14 13:19:49 -07:00
Ryan Moeller 7822c01eb6 Ratelimit deadman zevents as with delay zevents
Just as delay zevents can flood the zevent pipe when a vdev becomes
unresponsive, so do the deadman zevents.

Ratelimit deadman zevents according to the same tunable as for delay
zevents.

Enable deadman tests on FreeBSD and add a test for deadman event
ratelimiting. 

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Don Brady <don.brady@delphix.com>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11786
2021-04-14 13:19:49 -07:00
Marcin Skarbek 96e15d29fa Add kmodtool fix to detect different System.map location
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Marcin Skarbek <git@skarbek.name>
Closes #7807 
Closes #11836
2021-04-14 13:19:49 -07:00
наб bb8db9d927 zed: untangle _zed_conf_parse_path()
Dunno, maybe it's just me, but the previous style was /really/ confusing

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11860
2021-04-14 13:19:49 -07:00
наб 32cc3f0837 zed: don't malloc() global zed_conf instance, optimise zed_conf layout
It's all of 40 bytes with 4-byte pointers and 64 with 8-byte ones
(previously 44 and 88, respectively) ‒
there's no reason it can't live on the stack

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11860
2021-04-14 13:19:49 -07:00
наб f96dbd7a29 zed: remove zed_conf::{min,max}_events and ZED_{MIN,MAX}_EVENTS
No users, fields marked "reserved for future use", macros defined to 0

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11860
2021-04-14 13:19:49 -07:00
наб c30fef2523 zed: remove zed_conf::syslog_facility
No users, nobody sets it, main() hard-codes LOG_DAEMON, which is the
only correct value for this

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11860
2021-04-14 13:19:49 -07:00
наб 487cd2df11 zed: _zed_conf_display_help(): be consistent about what got_err means
Users passed in EXIT_SUCCESS and EXIT_FAILURE, despite it being a bool

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11860
2021-04-14 13:19:49 -07:00
наб 5a674a1b42 zed: untangle -h option listing
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11860
2021-04-14 13:19:49 -07:00
наб 52f65648e0 zed: print out licence string as one big chunk
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11860
2021-04-14 13:19:49 -07:00
наб 55780d8ec0 zed: only go up to current limit in close_from() fallback
Consider the following strace log:
  prlimit64(0, RLIMIT_NOFILE,
            NULL, {rlim_cur=1024, rlim_max=1024*1024}) = 0
  dup2(0, 30)                         = 30
  dup2(0, 300)                        = 300
  dup2(0, 3000)                       = -1 EBADF (Bad file descriptor)
  dup2(0, 30000)                      = -1 EBADF (Bad file descriptor)
  dup2(0, 300000)                     = -1 EBADF (Bad file descriptor)
  prlimit64(0, RLIMIT_NOFILE,
            {rlim_cur=1024*1024, rlim_max=1024*1024}, NULL) = 0
  dup2(0, 30)                         = 30
  dup2(0, 300)                        = 300
  dup2(0, 3000)                       = 3000
  dup2(0, 30000)                      = 30000
  dup2(0, 300000)                     = 300000

Even a privileged process needs to bump its rlimit before being able
to use fds higher than rlim_cur.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11834
2021-04-14 13:19:49 -07:00
наб b3a7e6e7f3 zed.8: the Diagnosis Engine is implemented
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11834
2021-04-14 13:19:49 -07:00
наб ea30225fdb zed: replace zed_file_write_n() with write(2), purge it
We set SA_RESTART early on, which will prevent EINTRs (indeed, to the
point of needing to clear it in the reaper, since it interferes with
pause(2)), which is the only error zed_file_write_n() actually handled
(plus, the pid write is no bigger than 12 bytes anyway)

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11834
2021-04-14 13:19:49 -07:00
наб 018560b153 zed: merge all _NOT_IMPLEMENTED_ events
These events should currently never be generated.

Also untag _zed_event_add_nvpair() from merge with
zpool_do_events_nvprint() ‒ they serve different purposes (machine,
usually script vs human consumption) and format the output differently
as it stands

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11834
2021-04-14 13:19:49 -07:00
наб 48b60cffda zed: remove unused zed_file_read_n()
Same deal as zed_file_close_on_exec()

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11834
2021-04-14 13:19:49 -07:00
наб cb97db792e zed: bump zfs_zevent_len_max if we miss any events
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11834
2021-04-14 13:19:49 -07:00
наб 718ee43362 zed.8: don't pretend an unprivileged user could change the script owner
And add a note on /why/ ZEDLETs need to be owned by root

Quoth chown(2), Linux man-pages project:
  Only a privileged process (Linux: one with the CAP_CHOWN capability)
  may change the owner of a file.

Quoth chown(2), FreeBSD:
     [EPERM]  The operation would change the ownership,
              but the effective user ID is not the super-user.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11834
2021-04-14 13:19:49 -07:00
наб 01219379cf zed: purge all mentions of a configuration file
There simply isn't a need for one, since the flags the daemon takes
are all short (mostly just toggles) and administrative in nature,
and are therefore better served by the age-old tradition of sourcing an
environment file and preparing the cmdline in the init-specific handler
itself, if needed at all

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11834
2021-04-14 13:19:49 -07:00
наб 0a51083e39 zed: implement close_from() in terms of /proc/self/fd, if available
/dev/fd on Darwin

Consider the following strace output:
  prlimit64(0, RLIMIT_NOFILE, NULL, {rlim_cur=1024, rlim_max=1024*1024}) = 0

Yes, that is well over a million file descriptors!

This reduces the ZED start-up time from "at least a second" to
"instantaneous", and, under strace, from "don't even try" to "usable"
by simple virtue of doing five syscalls instead of over a million;
in most cases the main loop does nothing

Recent Linuxes (5.8+) have close_range(2) for this, but that's an
overoptimisation (and libcs don't have wrappers for it yet)

This is also run by the ZEDLET pre-exec. Compare:
  Finished "all-syslog.sh" eid=13 pid=6717 time=1.027100s exit=0
  Finished "history_event-zfs-list-cacher.sh" eid=13 pid=6718 time=1.046923s exit=0
to
  Finished "all-syslog.sh" eid=12 pid=4834 time=0.001836s exit=0
  Finished "history_event-zfs-list-cacher.sh" eid=12 pid=4835 time=0.001346s exit=0
lol

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11834
2021-04-14 13:19:49 -07:00
наб b73e40a5ad zed: print combined system/user time after ZEDLET death
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11834
2021-04-14 13:19:49 -07:00
наб fa991f2a47 zed: allow limiting concurrent jobs
200ms time-out is relatively long, but if we already hit the cap,
then we'll likely be able to spawn multiple new jobs when we wake up

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11807
2021-04-14 13:19:49 -07:00
наб dbf8baf8dd zed: remove unused zed_file_close_on_exec()
The FIXME comment was there since the initial implementation in 2014,
there are no users

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11807
2021-04-14 13:19:49 -07:00
наб 9160c441a4 zed: use separate reaper thread and collect ZEDLETs asynchronously
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11807
2021-04-14 13:19:49 -07:00
наб 84c1652a0a zed: set names for all threads
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11807
2021-04-14 13:19:48 -07:00
Brian Behlendorf 7474d775c7 Tag 2.1.0-rc2
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2021-04-07 13:30:30 -07:00
Olaf Faaland 506ffdf648 fix misplaced quotes in kmod-preamble
rpm/redhat/zfs-kmod.spec.in has a typo in the shell code that
creates the kmod-preamble file.  This typo results in the
preamble file having the wrong name,

./SOURCES/kmod-preamblenObsoletes

and missing the Obsoletes clause that has become part of the name.

Because the filename is incorrect, the built package does not have
"obsoletes" or "conflicts" set.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Closes #11851
2021-04-07 13:29:10 -07:00
Brian Behlendorf 0bf91beab9 Obsolete earlier packages due to version bump
In order for package managers such as dnf to upgrade cleanly after
the package SONAME bump the obsolete package names must be known.
Update the new packages to correctly obsolete the old ones.

Reviewed-by: Olaf Faaland <faaland1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11844
Closes #11847
2021-04-07 13:29:06 -07:00
наб 04344d5bc3 i-t: don't brokenly set the scheduler for root pool vdev's disks
This effectively reverts
  4fc411f7a3 (part of #6807) and
  f6fbe25664 (#9042) ‒
the code itself and latter PR cite symmetry with whole-disk-vdev
behaviour (presumably because rootfs vdevs are rarely whole disks),
but the code is broken for NVME devices (indeed, it'd strip the
controller number instead of the (potential) partition number, turning
"nvme0n1p1" into "nvmen1p1", which would then subsequently fail the
sysfs existence check); it could be fixed to handle those (and any
others) rather easily by dereferencing /sys/class/block/$devname,
but this isn't the place for setting this ‒ as noted in the commit that
removed setting the scheduler by default
(9e17e6f254) ‒ use an udev rule

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11838
2021-04-07 13:28:56 -07:00
наб 46aec6d439 i-t: fix root=zfs:AUTO
IFS= would break loops in import_pool(), which would fault
any automatic import

Additionally $ZFS_BOOTFS from cmdline would interfere with find_rootfs()

If many pools were present, same thing could happen across multiple
find_rootfs() runs, so bail out early and clean up in error path

Suggested-by: @nachtgeist
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11278
Closes #11838
2021-04-07 13:28:52 -07:00
matt-fidd 1bb4b5a5ae zfs get -p only outputs 3 columns if "clones" property is empty
get_clones_string currently returns an empty string for filesystem
snapshots which have no clones. This breaks parsable `zfs get` output as
only three columns are output, instead of 4.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Fiddaman <github@m.fiddaman.uk>
Co-authored-by: matt <matt@fiddaman.net>
Closes #11837
2021-04-07 13:28:14 -07:00
наб 71a3487a89 zpool-features.5: remove "booting not possible with this feature"s
The exact limitations on what features are supported when booting
vary considerably depending on the environment.  In order to minimize
confusion avoid categorical statements which assume GRUB2 is being 
used.  The supported GRUB2 features are covered earlier in this man 
page for easy reference.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11842
2021-04-07 13:28:06 -07:00
George Melikov 2aed1ab13d man: fix wrong .Xr macros usages
In addition, html doc will have working hyperlinks.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #11845
2021-04-07 13:27:57 -07:00
наб d09e1c1343 libzutil: zfs_isnumber(): return false if input empty
zpool list, which is the only user, would mistakenly try to parse the
empty string as the interval in this case:

  $ zpool list "a"
  cannot open 'a': no such pool
  $ zpool list ""
  interval cannot be zero
  usage: <usage string follows>
which is now symmetric with zpool get:
  $ zpool list ""
  cannot open '': name must begin with a letter

Avoid breaking the  "interval cannot be zero" string.
There simply isn't a need for this, and it's user-facing.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11841 
Closes #11843
2021-04-07 13:27:42 -07:00
Brian Behlendorf e2f5074c0f ZTS: pool_checkpoint improvements
The pool_checkpoint tests may incorrectly fail because several of
them invoke zdb for an imported pool.  In this scenario it's not
unexpected for zdb to fail if the pool is modified.  To resolve
this these zdb checks are now done after the pool has been exported.

Additionally, the default cleanup functions assumed the pool would
be imported when they were run.  If this was not the case they're
exit early and fail to cleanup all of the test state causing
subsequent tests to fail.  Add a check to only destroy the pool
when it is imported.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11832
2021-04-07 13:27:28 -07:00
Andrea Gelmini ca7af7f675 Fix various typos
Correct an assortment of typos throughout the code base.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Andrea Gelmini <andrea.gelmini@gelma.net>
Closes #11774
2021-04-07 13:27:11 -07:00
наб 35cce6ea63 bash_completion.d: always call zfs/zpool binaries directly
/dev/zfs is 0:0 666 on most systems, so the [ -w /dev/zfs ] check always
succeeds, but if zfs isn't in $PATH (e.g. when completing from
"/sbin/zfs list" on a regular account) this can lead to error spew like

  nabijaczleweli@szarotka:~$ /sbin/zfs list bash: zfs: command not found
  @ bash: zfs: command not found

We only do read-only commands, and quite general ones at that,
so there's no need to elevate one way or another.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11828
2021-04-07 13:27:06 -07:00
Brian Behlendorf d539b77934 Add RELEASES.md file
Document the project's policy regarding publishing and maintaining
official OpenZFS releases.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11821
2021-04-07 13:26:58 -07:00
Ryan Moeller 202e7545dc ZTS: inheritance/inherit_001_pos is flaky
Add inheritance/inherit_001_pos to the maybe fails on FreeBSD list.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11830
2021-04-07 13:26:35 -07:00
Ryan Moeller 003f2d04b6 FreeBSD: Fix stable/12 after AT_BENEATH removal
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11827
2021-04-07 13:25:20 -07:00
Brian Behlendorf ec311430e2 Bump libzfs.so and libzpool.so versions
Bump the library versions as advised by the libtool guidelines.

https://www.gnu.org/software/libtool/manual/html_node/Updating-version-info.html

Two new functions were added but no existing functions were changed,
so we increase the version and the age (version:revision:age).

Added functions (2):
- boolean_t zpool_is_draid_spare(const char *);
- zpool_compat_status_t zpool_load_compat(const char *,
      boolean_t *, char *, char *);

Additionally bump the libzpool.so version information.  This library
is for internal use but we still want to update the version to track
major changes to the interfaces.

The libzfsbootenv, libuutil, libnvpair and libzfs_core libraries
have not been updated.

Reviewed-by: Richard Laager <rlaager@wiktel.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11817
2021-04-07 13:25:13 -07:00
Ryan Moeller 895d39aa83 Allow pool names that look like Solaris disk names
Nothing bad happens if a prefix of your pool name matches a disk name.
This is a bit of a silly restriction at this point.

Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #11781 
Closes #11813
2021-04-07 13:24:46 -07:00
Ryan Moeller 7f789e150f Don't scale zfs_zevent_len_max by CPU count
The lower bound for this scaling to too low and the upper bound is too
high.  Use a fixed default length of 512 instead, which is a reasonable
value on any system.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11822
2021-04-07 13:24:38 -07:00
Ryan Moeller 917f4b334c Atomically check and set dropped zevent count
ratelimit_dropped isn't protected by a lock and is expected to
be updated atomically.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11822
2021-04-07 13:24:33 -07:00
Brian Behlendorf 6613ea3b33 CI: Increase free space in workflow
Recently we've been running out of free space in the ubuntu 20.04
environment resulting in test failures.  This appears to be caused
by a change in the default available free space and not because of
any change in OpenZFS. Try and avoid this failure by applying a
suggested workaround which removes some unnecessary files.

https://github.com/actions/virtual-environments/issues/2840

Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11826
2021-04-07 13:24:28 -07:00
Brian Atkinson e783fa0925 Fixing m4 iops rename check
The configure check for iops->rename wanting flags was missing the
AC_MSG_CHECKING() so it would just print yes without saying what was
being checked.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Brian Atkinson <batkinson@lanl.gov>
Closes #11825
2021-04-07 13:24:22 -07:00
наб dc52c0d725 fsck.zfs: implement 4/8 exit codes as suggested in manpage
Update the fsck.zfs helper to bubble up some already-known-about 
errors if they are detected in the pool.

health=degraded => 4/"Filesystem errors left uncorrected"
health=faulted && dataset in /etc/fstab => 8/"Operational error"
pool not found => 8/"Operational error"
everything else => 0

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11806
2021-04-07 13:24:18 -07:00
Mike Swanson bfdd001679 Add compatibility file sets (ZoL 0.6.1, 0.6.4, OpenZFS 2.1)
ZoL 0.6.1 introduced feature flags with the three features that all
implementations at the time were guaranteed to have.  0.6.4 introduced
a few more until 0.6.5 added two after that.  OpenZFS 2.1 added the
dRAID feature.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mike Swanson <mikeonthecomputer@gmail.com>
Closes #11818
2021-04-07 13:24:08 -07:00
Brian Behlendorf 3522f57b6a Tag 2.1.0-rc1
New features:
- Distributed Spare (dRAID) Feature
- Added "compatibility" property for zpool feature sets
- Added zpool_influxdb command to collect zpool statistics

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2021-03-29 16:39:52 -07:00
наб 38280c3526 zed: reap child after killing on time-out
When a child process is killed waitpid() must be called on the
pid the reap the zombie process.

Update BUGS section to reflect reality by replacing "zedlets
aren't time limited with "zedlets can be interrupted".

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11769 
Closes #11798
2021-03-26 14:21:00 -07:00
Matthew Ahrens 2b56a63457 Use a helper function to clarify gang block size
For gang blocks, `DVA_GET_ASIZE()` is the total space allocated for the
gang DVA including its children BP's.  The space allocated at each DVA's
vdev/offset is `vdev_psize_to_asize(vd, SPA_GANGBLOCKSIZE)`.

This commit makes this relationship more clear by using a helper
function, `vdev_gang_header_asize()`, for the space allocated at the
gang block's vdev/offset.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #11744
2021-03-26 11:19:35 -07:00
Matthew Ahrens b85f47efd0 When specifying raidz vdev name, parity count should match
When specifying the name of a RAIDZ vdev on the command line, it can be
specified as raidz-<vdevID> or raidzP-<vdevID>.
e.g. `zpool clear poolname raidz-0` or `zpool clear poolname raidz2-0`

If the parity is specified in the vdev name, it should match the actual
parity of that RAIDZ vdev, otherwise the command should fail.  This
commit makes it so.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Stuart Maybee <stuart.maybee@comcast.net>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #11742
2021-03-26 11:12:22 -07:00
Luis Henriques 2037edbdaa Fix error code on __zpl_ioctl_setflags()
Other (all?) Linux filesystems seem to return -EPERM instead of -EACCESS
when trying to set FS_APPEND_FL or FS_IMMUTABLE_FL without the
CAP_LINUX_IMMUTABLE capability.  This was detected by generic/545 test
in the fstest suite.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Luis Henriques <henrix@camandro.org>
Closes #11791
2021-03-26 10:46:45 -07:00
Jessica Clarke ef977fce66 Support running FreeBSD buildworld on Arm-based macOS hosts
Arm-based Macs are like FreeBSD and provide a full 64-bit stat from the
start, so have no stat64 variants. Thus, define stat64 and fstat64 as
aliases for the normal versions.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Jessica Clarke <jrtc27@jrtc27.com>
Closes #11771
2021-03-26 10:45:12 -07:00
Andrea Gelmini 8a915ba1f6 Removed duplicated includes
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Andrea Gelmini <andrea.gelmini@gelma.net>
Closes #11775
2021-03-22 12:34:58 -07:00
Andrea Gelmini be1e69f31c Fix typo in Python method name
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Andrea Gelmini <andrea.gelmini@gelma.net>
Closes #11776
2021-03-22 12:32:38 -07:00
Alexander Motin 891568c990 Split dmu_zfetch() speculation and execution parts
To make better predictions on parallel workloads dmu_zfetch() should
be called as early as possible to reduce possible request reordering.
In particular, it should be called before dmu_buf_hold_array_by_dnode()
calls dbuf_hold(), which may sleep waiting for indirect blocks, waking
up multiple threads same time on completion, that can significantly
reorder the requests, making the stream look like random.  But we
should not issue prefetch requests before the on-demand ones, since
they may get to the disks first despite the I/O scheduler, increasing
on-demand request latency.

This patch splits dmu_zfetch() into two functions: dmu_zfetch_prepare()
and dmu_zfetch_run().  The first can be executed as early as needed.
It only updates statistics and makes predictions without issuing any
I/Os.  The I/O issuance is handled by dmu_zfetch_run(), which can be
called later when all on-demand I/Os are already issued.  It even
tracks the activity of other concurrent threads, issuing the prefetch
only when _all_ on-demand requests are issued.

For many years it was a big problem for storage servers, handling
deeper request queues from their clients, having to either serialize
consequential reads to make ZFS prefetcher usable, or execute the
incoming requests as-is and get almost no prefetch from ZFS, relying
only on deep enough prefetch by the clients.  Benefits of those ways
varied, but neither was perfect.  With this patch deeper queue
sequential read benchmarks with CrystalDiskMark from Windows via
iSCSI to FreeBSD target show me much better throughput with almost
100% prefetcher hit rate, comparing to almost zero before.

While there, I also removed per-stream zs_lock as useless, completely
covered by parent zf_lock.  Also I reused zs_blocks refcount to track
zf_stream linkage of the stream, since I believe previous zs_fetch ==
NULL check in dmu_zfetch_stream_done() was racy.

Delete prefetch streams when they reach ends of files.  It saves up
to 1KB of RAM per file, plus reduces searches through the stream list.

Block data prefetch (speculation and indirect block prefetch is still
done since they are cheaper) if all dbufs of the stream are already
in DMU cache.  First cache miss immediately fires all the prefetch
that would be done for the stream by that time.  It saves some CPU
time if same files within DMU cache capacity are read over and over.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Adam Moss <c@yotes.com>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #11652
2021-03-19 22:56:11 -07:00
Chunwei Chen 296a4a369b Fix zfs_get_data access to files with wrong generation
If TX_WRITE is create on a file, and the file is later deleted and a new
directory is created on the same object id, it is possible that when
zil_commit happens, zfs_get_data will be called on the new directory.
This may result in panic as it tries to do range lock.

This patch fixes this issue by record the generation number during
zfs_log_write, so zfs_get_data can check if the object is valid.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chunwei Chen <david.chen@nutanix.com>
Closes #10593
Closes #11682
2021-03-19 22:53:31 -07:00
Andrew 66e6d3f128 Fix regression in POSIX mode behavior
Commit 235a85657 introduced a regression in evaluation of POSIX modes
that require group DENY entries in the internal ZFS ACL. An example
of such a POSX mode is 007. When write_implies_delete_child is set,
then ACE_WRITE_DATA is added to `wanted_dirperms` in prior to calling
zfs_zaccess_common(). This occurs is zfs_zaccess_delete().

Unfortunately, when zfs_zaccess_aces_check hits this particular DENY
ACE, zfs_groupmember() is checked to determine whether access should be
denied, and since zfs_groupmember() always returns B_TRUE on Linux and
so this check is failed, resulting ultimately in EPERM being returned.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Andrew Walker <awalker@ixsystems.com>
Closes #11760
2021-03-19 22:50:46 -07:00
Palash Gandhi c23850759f ZTS: New test for kernel panic induced by redacted send
This change adds a new test that covers a bug fix in the binary search
in the redacted send resume logic that causes a kernel panic.
The bug was fixed in https://github.com/openzfs/zfs/pull/11297.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Palash Gandhi <palash.gandhi@delphix.com>
Closes #11764
2021-03-19 22:47:50 -07:00
Martin Matuška cd5b812818 Allow setting bootfs property on pools with indirect vdevs
The FreeBSD boot loader relies on the bootfs property and is capable
of booting from removed (indirect) vdevs.

Reviewed-by Eric van Gyzen
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Martin Matuska <mm@FreeBSD.org>
Closes #11763
2021-03-19 22:46:43 -07:00
Ryan Moeller 0ab84bff55 Fix typo in zgenhostid.8
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11770
2021-03-19 22:39:42 -07:00
Brian Atkinson f52124dce8 Removing old code for k(un)map_atomic
It used to be required to pass a enum km_type to kmap_atomic() and
kunmap_atomic(), however this is no longer necessary and the wrappers
zfs_k(un)map_atomic removed these. This is confusing in the ABD code as
the struct abd_iter member iter_km no longer exists and the wrapper
macros simply compile them out.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Adam Moss <c@yotes.com>
Signed-off-by: Brian Atkinson <batkinson@lanl.gov>
Closes #11768
2021-03-19 22:38:44 -07:00
Serapheim Dimitropoulos 793c958f6f Initialize metaslab range trees in metaslab_init
= Motivation

We've noticed several zloop crashes within Delphix generated
due to the following sequence of events:

- A device gets expanded and new metaslabas are allocated for
  it. These metaslabs go through `metaslab_init()` but haven't
  gone through `metaslab_sync_done()` yet. This meas that the
  only range tree that's actually set is the `ms_allocatable`.
  All the others are NULL.

- A vdev_initialization is issues and `vdev_initialize_thread`
  starts processing one of these new metaslabs of the expanded
  vdev.

- As part of `vdev_initialize_calculate_progress()` we call
  into `metaslab_load()` and `metaslab_load_impl()` which
  in turn tries to dereference the metaslabs trees that
  are still NULL and therefore we crash.

The same failure can come up from the `vdev_trim` code paths.

= This Patch

We considered the following solutions to deal with this issue:

[A] Add logic to `vdev_initialize/trim` to skip those new
    metaslabs. We decided against this as it would be good
    to avoid exposing this lower-level detail to higer-level
    operations.

[B] Have `metaslab_load_impl()` return early for new metaslabs
    and thus never touch those range_trees that are NULL at
    that time. This seemed more of a work-around for the bug
    and not a clear-cut solution.

[C] Refactor our logic so all metaslabs have their range_trees
    created at the time of their creatin in `metaslab_init()`.

In this patch we decided to go with [C] because:

(1) It doesn't expose more metaslab details to higher level
    operations such as vdev initialize and trim.

(2) The current behavior of creating the range trees lazily
    in `metaslab_sync_done()` is unnecessarily complicated.

(3) Always initializing the metaslab range_trees makes other
    parts of the codebase cleaner. For example, we used to
    use `ms_freed` as the reference value for knowing whether
    all the range_trees have been initialized. Now we no
    longer need to do that check in most places (and in the
    few that we do we use the `ms_new` boolean field now
    which is more readable).

= Side Changes

Probably due to a mismerge we set `ms_loaded` to `B_TRUE` twice
in `metasloab_load_impl()`. In this patch we remove the extraneous
assignment.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Closes #11737
2021-03-19 22:36:02 -07:00
Coleman Kane ffd6978ef5 Linux 5.12 update: bio_max_segs() replaces BIO_MAX_PAGES
The BIO_MAX_PAGES macro is being retired in favor of a bio_max_segs()
function that implements the typical MIN(x,y) logic used throughout the
kernel for bounding the allocation, and also the new implementation is
intended to be signed-safe (which the former was not).

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #11765
2021-03-19 22:33:42 -07:00
Coleman Kane e2a8296131 Linux 5.12 compat: idmapped mounts
In Linux 5.12, the filesystem API was modified to support ipmapped
mounts by adding a "struct user_namespace *" parameter to a number
functions and VFS handlers. This change adds the needed autoconf
macros to detect the new interfaces and updates the code appropriately.
This change does not add support for idmapped mounts, instead it
preserves the existing behavior by passing the initial user namespace
where needed.  A subsequent commit will be required to add support
for idmapped mounted.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #11712
2021-03-19 21:00:59 -07:00
Matthew Ahrens 330c6c0523 Clean up RAIDZ/DRAID ereport code
The RAIDZ and DRAID code is responsible for reporting checksum errors on
their child vdevs.  Checksum errors represent events where a disk
returned data or parity that should have been correct, but was not.  In
other words, these are instances of silent data corruption.  The
checksum errors show up in the vdev stats (and thus `zpool status`'s
CKSUM column), and in the event log (`zpool events`).

Note, this is in contrast with the more common "noisy" errors where a
disk goes offline, in which case ZFS knows that the disk is bad and
doesn't try to read it, or the device returns an error on the requested
read or write operation.

RAIDZ/DRAID generate checksum errors via three code paths:

1. When RAIDZ/DRAID reconstructs a damaged block, checksum errors are
reported on any children whose data was not used during the
reconstruction.  This is handled in `raidz_reconstruct()`.  This is the
most common type of RAIDZ/DRAID checksum error.

2. When RAIDZ/DRAID is not able to reconstruct a damaged block, that
means that the data has been lost.  The zio fails and an error is
returned to the consumer (e.g. the read(2) system call).  This would
happen if, for example, three different disks in a RAIDZ2 group are
silently damaged.  Since the damage is silent, it isn't possible to know
which three disks are damaged, so a checksum error is reported against
every child that returned data or parity for this read.  (For DRAID,
typically only one "group" of children is involved in each io.)  This
case is handled in `vdev_raidz_cksum_finish()`. This is the next most
common type of RAIDZ/DRAID checksum error.

3. If RAIDZ/DRAID is not able to reconstruct a damaged block (like in
case 2), but there happens to be additional copies of this block due to
"ditto blocks" (i.e. multiple DVA's in this blkptr_t), and one of those
copies is good, then RAIDZ/DRAID compares each sector of the data or
parity that it retrieved with the good data from the other DVA, and if
they differ then it reports a checksum error on this child.  This
differs from case 2 in that the checksum error is reported on only the
subset of children that actually have bad data or parity.  This case
happens very rarely, since normally only metadata has ditto blocks.  If
the silent damage is extensive, there will be many instances of case 2,
and the pool will likely be unrecoverable.

The code for handling case 3 is considerably more complicated than the
other cases, for two reasons:

1. It needs to run after the main raidz read logic has completed.  The
data RAIDZ read needs to be preserved until after the alternate DVA has
been read, which necessitates refcounts and callbacks managed by the
non-raidz-specific zio layer.

2. It's nontrivial to map the sections of data read by RAIDZ to the
correct data.  For example, the correct data does not include the parity
information, so the parity must be recalculated based on the correct
data, and then compared to the parity that was read from the RAIDZ
children.

Due to the complexity of case 3, the rareness of hitting it, and the
minimal benefit it provides above case 2, this commit removes the code
for case 3.  These types of errors will now be handled the same as case
2, i.e. the checksum error will be reported against all children that
returned data or parity.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #11735
2021-03-19 16:22:10 -07:00
Mateusz Guzik 2f385c913f FreeBSD: make seqc asserts conditional on replay
Avoids tripping on asserts when doing pool recovery.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #11739
2021-03-17 22:09:45 -07:00
Matthew Ahrens 46df6e98aa Remove unused rr_code
The `rr_code` field in `raidz_row_t` is unused.

This commit removes the field, as well as the code that's used to set
it.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #11736
2021-03-17 21:57:09 -07:00
Ryan Moeller ec3e4c6784 FreeBSD: Fix memory leaks in kstats
Don't handle (incorrectly) kmem_zalloc() failure.  With KM_SLEEP,
will never return NULL.

Free the data allocated for non-virtual kstats when deleting the object.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11767
2021-03-17 21:55:18 -07:00
Adam D. Moss 1daad98176 Linux: always check or verify return of igrab()
zhold() wraps igrab() on Linux, and igrab() may fail when the inode 
is in the process of being deleted.  This means zhold() must only be
called when a reference exists and therefore it cannot be deleted. 
This is the case for all existing consumers so add a VERIFY and a
comment explaining this requirement.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Adam Moss <c@yotes.com>
Closes #11704
2021-03-16 16:33:34 -07:00
Dries Michiels 5f9d61d06b Update FreeBSD versions
Update supported FreeBSD versions in documentation.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Dries Michiels <driesm.michiels@gmail.com>
Closes #11718
2021-03-16 15:03:28 -07:00
gldisater 07dff5cffe Hold and release permissions exist
The man page was missing these two permissions.
Add the missing permissions to the man page. 

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Jeremy Faulkner <gldisater@gldis.ca>
Closes #11727
2021-03-16 15:01:21 -07:00
Ryan Moeller 5638803b6a ZTS: Add tests for DOS mode attributes
Create a new section of tests to run with acltype=off.

For now the only test we have is for the DOS mode READONLY attribute on
FreeBSD.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11734
2021-03-16 15:00:14 -07:00
Don Brady dd0b5c8559 Reference_tracking_enable should be a module param
To make use of zfs_refcount_held tunable it should be a module 
parameter in open-zfs.  Also, since the macros will auto-generate OS 
specific tunables, removed the existing zfs_refcount_held reference 
in module/os/freebsd/zfs/sysctl_os.c.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Don Brady <don.brady@delphix.com>
Closes #11753
2021-03-16 14:56:17 -07:00
Ryan Moeller 9305ff2edf ZTS: Fix incorrect use of libtest in user_run by xattr_003_neg
You can't use user_run to eval ksh functions defined in libtest unless
you include libtest in the user shell.

Fix xattr_003_neg by:
* include libtest in the user shell
* *then* run get_xattr
* assert this fails
* use variables for filenames so they don't change in the user's shell
* don't log the contents of /etc/passwd
* cleanup all byproducts

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11185
2021-03-12 16:17:30 -08:00
Ryan Moeller e0b53a5dbb ZTS: Use ksh and current environment for user_run
The current user_run often does not work as expected.  Commands are run
in a different shell, with a different environment, and all output is
discarded.

Simplify user_run to retain the current environment, eliminate eval,
and feed the command string into ksh.  Enhance the logging for
user_run so we can see out and err.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11185
2021-03-12 16:17:01 -08:00
Mariusz Zaborski e464f7c7cc FreeBSD: bring back possibility to rewind the checkpoint from bootloader
Add parsing of the rewind options.

When I was upstreaming the change [1], I omitted the part where we
detect that the pool should be rewind. When the FreeBSD repo has
synced with the OpenZFS, this part of the code was removed.

[1] FreeBSD repo: 277f38abffc6a8160b5044128b5b2c620fbb970c
[2] OpenZFS repo: f2c027bd6a

External-issue: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=254152
Originally reviewed by: tsoome, allanjude
Originally reviewed by: kevans (ok from high-level overview)
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mariusz Zaborski <oshogbo@vexillium.org>
Closes #11730
2021-03-12 16:12:14 -08:00
Ryan Moeller f845b2dd1c FreeBSD: Clean up zfsdev_close to match Linux
Resolve some oddities in zfsdev_close() which could result in a
panic and were not present in the equivalent function for Linux.

- Remove unused definition ZFS_MIN_MINOR
- FreeBSD: Simplify zfsdev state destruction
- Assert zs_minor is valid in zfsdev_close
- Make locking around zfsdev state match Linux

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11720
2021-03-12 16:09:15 -08:00
Mateusz Guzik e3e82dcc51 FreeBSD: switch teardown lock to rms
This deserializes otherwise non-contending operations.

The previous scheme of using 17 locks hashed by curthread runs into
conflicts very quickly. Check the pull request for sample results.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #11153
2021-03-12 15:51:48 -08:00
Mateusz Guzik 5ebe425a5b Macroify teardown lock handling
This will allow platforms to implement it as they see fit, in particular
in a different manner than rrm locks.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #11153
2021-03-12 15:51:39 -08:00
Mateusz Guzik 9847f77f01 FreeBSD: rename teardown inactive macros to mimick rrm convention
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #11153
2021-03-12 15:51:31 -08:00
Mateusz Guzik f9acd578f0 FreeBSD: remove 2 assertions that teardown lock is not held
They are not very useful and hard to implement in the rms routine
the code is about to start using.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #11153
2021-03-12 15:51:20 -08:00
Mateusz Guzik 300f68e017 FreeBSD: rework asserts in zfs_dd_lookup
1. even up ifdefs
2. drop the arguably useless teardown lock asserts -- nothing else
   checks for it

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #11153
2021-03-12 15:51:07 -08:00
Mateusz Guzik 446400346d Add branch prediction to ZFS_ENTER and ZFS_VERIFY_ZP macros
They are expected to fail only in corner cases.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #11153
2021-03-12 15:51:03 -08:00
George Wilson 0936981d86 zpool import cachefile improvements
Importing a pool using the cachefile is ideal to reduce the time
required to import a pool. However, if the devices associated with
a pool in the cachefile have changed, then the import would fail.
This can easily be corrected by doing a normal import which would
then read the pool configuration from the labels.

The goal of this change is make importing using a cachefile more
resilient and auto-correcting. This is accomplished by having
the cachefile import logic automatically fallback to reading the
labels of the devices similar to a normal import. The main difference
between the fallback logic and a normal import is that the cachefile
import logic will only look at the device directories that were
originally used when the cachefile was populated. Additionally,
the fallback logic will always import by guid to ensure that only
the pools in the cachefile would be imported.

External-issue: DLPX-71980
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Wilson <gwilson@delphix.com>
Closes #11716
2021-03-12 15:42:27 -08:00
Martin Matuška b8fa03efbc Fix whitespace introduced in ecc277cff
The manual page change in ecc277c has introduced whitespace on
line ends.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Martin Matuska <mm@FreeBSD.org>
Closes #11722
2021-03-11 19:42:04 -08:00
Ryan Moeller 35aa9dc6df FreeBSD: Fix scope of deadman tunables
A few deadman tunables ended up in the wrong sysctl node.

Move them to vfs.zfs.deadman.*

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11715
2021-03-11 19:23:24 -08:00
Adam D. Moss c94d648b1c Microoptimizations for VERIFY() and friends
Add branch hints and constify the intermediate evaluations of 
left/right params in VERIFY3*().

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Adam Moss <c@yotes.com>
Closes #11708
2021-03-11 17:16:09 -08:00
Allan Jude 92e8fb6395 Add missing files to Makefile
Some .h files that were added were missed in this Makefile. Since 
they are .h files, their being missing only resulted in them 
disappeared from the dist archive.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Closes #11705
2021-03-11 17:13:34 -08:00
George Melikov 8d534c37ac CI checkstyle: pin ubuntu version
Our checkstyle doesn't work well on Ubuntu 20.04,
temporary pin it to 18.04.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #11713
2021-03-11 17:11:31 -08:00
Don Brady f5ada6538d Return finer grain errors in libzfs unmount_one
Added errno mappings to unmount_one() in libzfs.  Changed do_unmount() 
implementation to return errno errors directly like is done for 
do_mount() and others.

Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Don Brady <don.brady@delphix.com>
Closes #11681
2021-03-08 08:46:45 -08:00
Tony Hutter 4fdbd43450 vdev_id: Create symlinks even if no /dev/mapper/
vdev_id uses the /dev/mapper/ symlinks to resolve a UUID to a dm name
(like dm-1).  However on some multipath setups, there is no /dev/mapper/
entry for the UUID at the time vdev_id is called by udev.  However,
this isn't necessarily needed, as we may be able to resolve the dm
name from the $DEVNAME that udev passes us (like DEVNAME="/dev/dm-1").

This patch tries to resolve the dm name from $DEVNAME first, before
falling back to looking in /dev/mapper/.  This fixed an issue where the
by-vdev names weren't reliably showing up on one of our nodes.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #11698
2021-03-08 08:43:30 -08:00
Antonio Russo b2eebe3ae7 ZTS events_002: Improve speed and reliability
events_002 exercises the ZED, ensuring that it neither misses events,
nor reporting events twice.

On slow test hardware, some of the timeouts are insufficient to allow
the ZED to properly settle.  Conversely, on fast hardware these same
timeouts are too long, unnecessarily slowing the test run.

Instead of using a fixed timeout, wait for the expected final event
before returning.  Additionally, wait with a timeout for unexpected
events to avoid missing them if they show up late.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Antonio Russo <aerusso@aerusso.net>
Closes #11703
2021-03-08 08:42:45 -08:00
Christian Schwarz 93e3658035 zvol: call zil_replaying() during replay
zil_replaying(zil, tx) has the side-effect of informing the ZIL that an
entry has been replayed in the (still open) tx.  The ZIL uses that
information to record the replay progress in the ZIL header when that
tx's txg syncs.

ZPL log entries are not idempotent and logically dependent and thus
calling zil_replaying() is necessary for correctness.

For ZVOLs the question of correctness is more nuanced: ZVOL logs only
TX_WRITE and TX_TRUNCATE, both of which are idempotent. Logical
dependencies between two records exist only if the write or discard
request had sync semantics or if the ranges affected by the records
overlap.

Thus, at a first glance, it would be correct to restart replay from
the beginning if we crash before replay completes. But this does not
address the following scenario:
Assume one log record per LWB.
The chain on disk is

    HDR -> 1:W(1, "A") -> 2:W(1, "B") -> 3:W(2, "X") -> 4:W(3, "Z")

where N:W(O, C) represents log entry number N which is a TX_WRITE of C
to offset A.
We replay 1, 2 and 3 in one txg, sync that txg, then crash.
Bit flips corrupt 2, 3, and 4.
We come up again and restart replay from the beginning because
we did not call zil_replaying() during replay.
We replay 1 again, then interpret 2's invalid checksum as the end
of the ZIL chain and call replay done.
The replayed zvol content is "AX".

If we had called zil_replaying() the HDR would have pointed to 3
and our resumed replay would not have replayed anything because
3 was corrupted, resulting in zvol content "BX".

If 3 logically depends on 2 then the replay corrupted the ZVOL_OBJ's
contents.

This patch adds the zil_replaying() calls to the replay functions.
Since the callbacks in the replay function need the zilog_t* pointer
so that they can call zil_replaying() we open the ZIL while
replaying in zvol_create_minor(). We also verify that replay has
been done when on-demand-opening the ZIL on the first modifying
bio.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Christian Schwarz <me@cschwarz.com>
Closes #11667
2021-03-07 09:49:58 -08:00
Ryan Moeller b30cd70599 ZTS: Improve cleanup in zpool tests
* Restore original kern.corefile value after the test.
* Don't leave behind a frozen pool.
* Clean up leftover vdev files.
* Make zpool_002_pos and zpool_003_pos consistent in their handling of
core files while here.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11694
2021-03-07 09:41:01 -08:00
manfromafar ecc277cff7 Clarify compressed zfs send/recv behavior
Docs for send and receive do not explain behavior when sending a 
compressed stream then receiving on a host that overrides compression 
with -o compress=value.

The data from the send stream is written as it was from the send is 
the compressed form but the compression algorithm set on the receiver 
is the overridden version which causes some confusion as to what 
algorithm was actually used.

Updated man docs to clarify behavior

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed By: Allan Jude <allanjude@freebsd.org>
Signed-off-by: manfromafar <manfromafar@outlook.com>
Closes #11690
2021-03-07 09:39:16 -08:00
Ryan Moeller 4b2e20824b Intentionally allow ZFS_READONLY in zfs_write
ZFS_READONLY represents the "DOS R/O" attribute.
When that flag is set, we should behave as if write access
were not granted by anything in the ACL.  In particular:
We _must_ allow writes after opening the file r/w, then
setting the DOS R/O attribute, and writing some more.
(Similar to how you can write after fchmod(fd, 0444).)

Restore these semantics which were lost on FreeBSD when refactoring
zfs_write.  To my knowledge Linux does not actually expose this flag,
but we'll need it to eventually so I've added the supporting checks.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11693
2021-03-07 09:31:52 -08:00
Brian Behlendorf e7a06356c1 Suppress cppcheck invalidSyntax warninigs
For some reason cppcheck 1.90 is generating an invalidSyntax warning
when the BF64_SET macro is used in the zstream source.  The same
warning is not reported by cppcheck 2.3, nor is their any evident
problem with the expanded macro.  This appears to be an issue with
this version of cppcheck.  This commit annotates the source to suppress
the warning.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11700
2021-03-05 17:56:35 -08:00
Brian Behlendorf 6bbb44e157 Initialize ZIL buffers
When populating a ZIL destination buffer ensure it is always
zeroed before its contents are constructed.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Tom Caputi <caputit1@tcnj.edu>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11687
2021-03-05 14:45:13 -08:00
Jorgen Lundman 8a6d444825 Fix abd_get_offset_struct() may allocate new abd
Even when supplied with an abd to abd_get_offset_struct(), the call
to abd_get_offset_impl() can allocate a different abd. Ensure to
call abd_fini_struct() on the abd that is not used.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Jorgen Lundman <lundman@lundman.net>
Closes #11683
2021-03-05 12:22:57 -08:00
Ryan Moeller ba74de88c0 FreeBSD module --enable-debug --enable-invariants
Wire up the --enable-debug flag for configure to the FreeBSD module
build.  Add --enable-invariants.

The running FreeBSD kernel config is used to detect whether to enable
INVARIANTS if not explicitly specified with --enable-invariants or
--disable-invariants.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11678
2021-03-05 12:16:41 -08:00
Thomas Lamprecht fd1c366f82 zpool: use tab to intend continuation from removal status
Bring the output of the removal status in line with the other
"fields" that zpool status outputs, and thus allows an parser to
easier detect this as continuation of the 'remove:' output.

Before:
remove: Removal of vdev 0 copied 282G in 0h9m, completed on [...]
    776K memory used for removed device mappings

Now:
remove: Removal of vdev 0 copied 282G in 0h9m, completed on [...]
	776K memory used for removed device mappings

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Closes #11674
2021-03-05 12:15:35 -08:00
James Wah 92fb29b9f9 Don't bomb out when using keylocation=file://
Avoid following the error path when the operation in fact succeeded.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: James Wah <james@laird-wah.net>
Closes #11651
2021-03-03 08:28:49 -08:00
Christian Schwarz e439ee83c1 linux: zvol: avoid heap allocation for zvol_request_sync=1
The spl_kmem_alloc showed up in some flamegraphs in a single-threaded
4k sync write workload at 85k IOPS on an
Intel(R) Xeon(R) Silver 4215 CPU @ 2.50GHz.
Certainly not a huge win but I believe the change is clean and
easy to maintain down the road.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Christian Schwarz <me@cschwarz.com>
Closes #11666
2021-03-03 08:15:28 -08:00
Jake Howard 3242b5358e Add "zstd-fast" to help options for "compression" property
This value does work as expected, and is documented in the manpage.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Jake Howard <git@theorangeone.net>
Closes #11670
2021-03-03 08:14:19 -08:00
nssrikanth bedbc13daa Cancel TRIM / initialize on FAULTED non-writeable vdevs
When a device which is actively trimming or initializing becomes
FAULTED, and therefore no longer writable, cancel the active
TRIM or initialization.  When the device is merely taken offline
with `zpool offline` then stop the operation but do not cancel it.
When the device is brought back online the operation will be
resumed if possible.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Vipin Kumar Verma <vipin.verma@hpe.com>
Signed-off-by: Srikanth N S <srikanth.nagasubbaraoseetharaman@hpe.com>
Closes #11588
2021-03-02 10:27:27 -08:00
Andriy Gapon 2e160dee97 Fix assert in FreeBSD-specific dmu_read_pages
The function has three similar pieces of code: for read-behind pages,
requested pages and read-ahead pages.  All three pieces had an
assert to ensure that the page is not mapped.  Later the assert was
relaxed to require that the page is not mapped for writing.  But that
was done in two places out of three.  This change fixes the third piece,
read-ahead.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Andriy Gapon <avg@FreeBSD.org>
Closes #11654
2021-02-27 17:23:09 -08:00
Brian Behlendorf 3e73ea0c10 ZTS: zpool_trim_start_and_cancel_pos.ksh
Several of the TRIM tests were based of the initialize tests and
then adapted for TRIM.  The zpool_trim_start_and_cancel_pos.ksh
test was intended to be one such test but it was overlooked and
actually never adapted.  Update it accordingly.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11649
2021-02-27 17:19:50 -08:00
Martin Matuška 03ef8f09e1 Add missing checks for unsupported features
After 35ec517 it has become possible to import ZFS pools witn an
active org.illumos:edonr feature on FreeBSD, leading to a panic.

In addition, "zpool status" reported all pools without edonr
as upgradable and "zpool upgrade -v" reported edonr in the list
of upgradable features.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Martin Matuska <mm@FreeBSD.org>
Closes #11653
2021-02-27 17:16:02 -08:00
Coleman Kane 778fa36ee7 Linux 5.12 compat: replace bio_*_io_acct with disk_*_io_acct
The bio_*_acct functions became GPL exports, which causes the
kernel modules to refuse to compile. This replaces code with
alternate function calls to the disk_*_io_acct interfaces, which
are not GPL exports. This change was added in kernel commit
99dfc43ecbf67f12a06512918aaba61d55863efc.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #11639
2021-02-24 10:06:05 -08:00
Coleman Kane d939930fcc Linux 5.12 compat: bio->bi_disk member moved
The struct bio member bi_disk was moved underneath a new member named
bi_bdev. So all attempts to reference bio->bi_disk need to now become
bio->bi_bdev->bd_disk.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #11639
2021-02-24 10:04:34 -08:00
Brian Behlendorf 8e43fa12c5 Fix vdev_rebuild_thread deadlock
The metaslab_disable() call may block waiting for a txg sync.
Therefore it's important that vdev_rebuild_thread release the
SCL_CONFIG read lock it is holding before this call.  Failure
to do so can result in the txg_sync thread getting blocked
waiting for this lock which results in a deadlock.

Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Reviewd-by: Srikanth N S <srikanth.nagasubbaraoseetharaman@hpe.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11647
2021-02-24 10:01:00 -08:00
Brian Behlendorf 75a089ed34 Fix overly broad locking in spa_vdev_config_exit()
Calling vdev_free() only requires the we acquire the spa config
SCL_STATE_ALL locks, not the SCL_ALL locks.  In particular, we need
need to avoid taking the SCL_CONFIG lock (included in SCL_ALL) as a
writer since this can lead to a deadlock.  The txg_sync_thread() may
block in spa_txg_history_init_io() when taking the SCL_CONFIG lock
as a reading when it detects there's a pending writer.

Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11585
2021-02-24 10:00:21 -08:00
Tony Hutter 3ee4e6d8b7 vdev_id: Fix partition regular expression
Given a DM device name, the old vdev_id script would extract any text
after a 'p' as the partition number.  It then appends "-part" + the
partition number to the name, giving a by-vdev name like "L0-part5".

This works fine if the DM name is like 'dm-2p5', but doesn't work if
the DM name is a multipath name like "mpatha".  In those cases it
incorrectly matches the 'p' in "mpatha", giving by-vdev names like
"L0-partatha".

This patch fixes the issue by making the partition regex match stricter.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #11637
2021-02-24 09:58:46 -08:00
Brian Behlendorf 1dfc82a14e Linux: increase max nvlist_src size
On Linux increase the maximum allowed size of the src nvlist which
can be passed to the /dev/zfs ioctl.  Originally, this was set
to a maximum of KMALLOC_MAX_SIZE (4M) because it was kmalloc'd.
Since that time it's been converted to a vmalloc so that's no
longer a hard limit, and it's desirable for `zfs send/recv` to
allow larger nvlists so more snapshots can be sent at once.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #6572
Closes #11638
2021-02-24 09:57:18 -08:00
Prakash Surya f01eaed455 Add upper bound for slop space calculation
This change modifies the behavior of how we determine how much slop
space to use in the pool, such that now it has an upper limit. The
default upper limit is 128G, but is configurable via a tunable.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Prakash Surya <prakash.surya@delphix.com>
Closes #11023
2021-02-24 09:52:43 -08:00
Ryan Moeller 5156862960 Wrap bare EINVAL returns with SET_ERROR
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11636
2021-02-24 09:51:10 -08:00
Ryan Moeller 94fa1c3d96 Force symlink creation for zpool.d compat links
gmake install fails when zpool.d compat links already exist.

Force the symlinks to be recreated if already present.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11633
2021-02-24 09:49:59 -08:00
Cedric Maunoury b9c07ec71b send_iterate_snap : doall send without fromsnap
The behavior of a NULL fromsnap was inadvertently changed for a doall
send when the send/recv logic in libzfs was updated.  Restore the
previous behavior by correcting send_iterate_snap() to include all
the snapshots in the nvlist for this case. 

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Cedric Maunoury <cedric.maunoury@gmail.com>
Closes #11608
2021-02-24 09:48:58 -08:00
Adam D. Moss 9312e0fd1e Fix error message when zfs module are already unloaded
Using zfs-sh -u on linux will fail with inaccurate message when the
zfs modules are already unloaded.  Deal with the case where a module 
is already unloaded; its USE_COUNT will be the empty string

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Adam Moss <c@yotes.com>
Closes #11627
2021-02-20 20:23:10 -08:00
fbynite 11f2e9a491 vdev_ops: don't try to call vdev_op_hold or vdev_op_rele when NULL
This prevents a panic after a SLOG add/removal on the root pool followed
by a zpool scrub.

When a SLOG is removed, a hole takes its place - the vdev_ops for a hole
is vdev_hole_ops, which defines the handler functions of vdev_op_hold
and vdev_op_rele as NULL.

This bug has been reported in illumos and FreeBSD, a different trigger
in the FreeBSD report though.

Credit for this patch goes to Patrick Mooney <pmooney@pfmooney.com>

Obtained from: illumos-gate commit: c65bd18728f34725
External-issue: https://www.illumos.org/issues/12981
External-issue: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=252396
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Wing <rob.fx907@gmail.com>
Closes #11623
2021-02-20 20:19:20 -08:00
Tony Hutter 7e56b05058 Better zfs_get_enclosure_sysfs_path() enclosure support
A multpathed disk will have several 'underlying' paths to the disk.  For
example, multipath disk 'dm-0' may be made up of paths:
/dev/{sda,sdb,sdc,sdd}.  On many enclosures those underlying sysfs
paths will have a symlink back to their enclosure device entry
(like 'enclosure_device0/slot1').  This is used by the
statechange-led.sh script to set/clear the fault LED for a disk, and
by 'zpool status -c'.

However, on some enclosures, those underlying paths may not all have
symlinks back to the enclosure device.  Maybe only two out of four
of them might.

This patch updates zfs_get_enclosure_sysfs_path() to favor returning
paths that have symlinks back to their enclosure devices, rather
than just returning the first path.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #11617
2021-02-20 20:17:45 -08:00
Brian Atkinson c0801bf35a Cleaning up uio headers
Making uio_impl.h the common header interface between Linux and FreeBSD
so both OS's can share a common header file. This also helps reduce code
duplication for zfs_uio_t for each OS.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Brian Atkinson <batkinson@lanl.gov>
Closes #11622
2021-02-20 20:16:50 -08:00
Christian Schwarz 52cb284f7b ztest: propagate -o to the zdb child process
I think this is the behavior that most users expect.

Future work: have a separate flag, e.g., -O, to specify separate
set_global_vars for the zdb child than for the ztest children.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Signed-off-by: Christian Schwarz <me@cschwarz.com>
Closes #11602
2021-02-19 22:45:16 -08:00
Christian Schwarz 2801e68d1b ztest: fix -o by calling set_global_var in child processes
Without set_global_var() in the child processes the -o option provides
little use.

Before this change set_global_var() was called as a side-effect of
getopt processing which only happens for the parent ztest process.

This change limits the set of options that can be set and makes them
available to the child through ztest_shared_opts_t.

Future work: support arbitrary option count and length.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Signed-off-by: Christian Schwarz <me@cschwarz.com>
Closes #11602
2021-02-19 22:45:10 -08:00
Christian Schwarz edc508ac0b libzpool: set_global_var: refactor to not modify 'arg'
Also fixes leak of the dlopen handle in the error case.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Signed-off-by: Christian Schwarz <me@cschwarz.com>
Closes #11602
2021-02-19 22:45:04 -08:00
Christian Schwarz b5fffa1d29 libzpool: set_global_var: fix endianness handling (fixes zdb -o )
Without this patch I get the error

  Setting global variables is only supported on little-endian systems

when using `zdb -o` on my amd64 machine.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Signed-off-by: Christian Schwarz <me@cschwarz.com>
Closes #11602
2021-02-19 22:44:05 -08:00
Ryan Moeller 64e0fe14ff Restore FreeBSD resource usage accounting
Add zfs_racct_* interfaces for platform-dependent read/write accounting.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11613
2021-02-19 22:34:33 -08:00
Don Brady 03e02e5b56 Checksum errors may not be counted
Fix regression seen in issue #11545 where checksum errors 
where not being counted or showing up in a zpool event.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Don Brady <don.brady@delphix.com>
Closes #11609
2021-02-19 22:33:15 -08:00
Mark Johnston e7adccf7f5 FreeBSD: disable the use of hardware crypto offload drivers for now
First, the crypto request completion handler contains a bug in that it
fails to reset fs_done correctly after the request is completed.  This
is only a problem for asynchronous drivers.  Second, some hardware
drivers have input constraints which ZFS does not satisfy.  For
instance, ccp(4) apparently requires the AAD length for AES-GCM to be a
multiple of the cipher block size, and with qat(4) the AES-GCM AAD
length may not be longer than 240 bytes.  FreeBSD's generic crypto
framework doesn't have a mechanism to automatically fall back to a
software implementation if a hardware driver cannot process a request,
and ZFS does not tolerate such errors.

The plan is to implement such a fallback mechanism, but with FreeBSD
13.0 approaching we should simply disable the use hardware drivers for
now.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes #11612
2021-02-18 15:51:20 -08:00
Andriy Gapon 778869fa13 Fix report_mount_progress never calling set_progress_header
That happens because of an off-by-one mistake.
share_mount_one_cb() calls report_mount_progress(current=sm_done) after
having incremented sm_done by one.  Then report_mount_progress()
increments the parameter again.  It appears that that logic became
obsolete after commit a10d50f999, parallel zfs mount.

On FreeBSD I observe that zfs mount -a -v prints, for example,
    (null): (201/248)
That happens because set_progress_header() is never called.

With this change the output becomes correct:
    Mounting ZFS filesystems: (209/248)

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Andriy Gapon <avg@FreeBSD.org>
Closes #11607
2021-02-18 13:53:05 -08:00
Ryan Libby bf156c966b Remove unused abd_alloc_scatter_offset_chunkcnt
Remove function that become unused after refactoring in
e2af2acce3.

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Libby <rlibby@FreeBSD.org>
Closes #11614
2021-02-17 21:39:13 -08:00
Colm 658fb8020f Add "compatibility" property for zpool feature sets
Property to allow sets of features to be specified; for compatibility
with specific versions / releases / external systems. Influences
the behavior of 'zpool upgrade' and 'zpool create'. Initial man
page changes and test cases included.

Brief synopsis:

zpool create -o compatibility=off|legacy|file[,file...] pool vdev...

compatibility = off : disable compatibility mode (enable all features)
compatibility = legacy : request that no features be enabled
compatibility = file[,file...] : read features from specified files.
Only features present in *all* files will be enabled on the
resulting pool. Filenames may be absolute, or relative to
/etc/zfs/compatibility.d or /usr/share/zfs/compatibility.d (/etc
checked first).

Only affects zpool create, zpool upgrade and zpool status.

ABI changes in libzfs:

* New function "zpool_load_compat" to load and parse compat sets.
* Add "zpool_compat_status_t" typedef for compatibility parse status.
* Add ZPOOL_PROP_COMPATIBILITY to the pool properties enum
* Add ZPOOL_STATUS_COMPATIBILITY_ERR to the pool status enum

An initial set of base compatibility sets are included in
cmd/zpool/compatibility.d, and the Makefile for cmd/zpool is
modified to install these in $pkgdatadir/compatibility.d and to
create symbolic links to a reasonable set of aliases.

Reviewed-by: ericloewe
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Colm Buckley <colm@tuatha.org>
Closes #11468
2021-02-17 21:30:45 -08:00
Brian Behlendorf 35ec51796f FreeBSD: disable edonr in zfs_mod_supported_feature()
Rather than conditionally compiling out the edonr code for FreeBSD
update zfs_mod_supported_feature() to indicate this feature is
unsupported.  This ensures that all spa features are defined on
every platform, even if they are not supported.

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11605 
Issue #11468
2021-02-17 08:14:51 -08:00
José Luis Salvador Rufo aef1830f93 Support uClibc for the tests compilations
There are two issues that don't allow ZFS to be compiled using uClibc.
`backtrace()`, and `program_invocation_short_name` as a `const`.
This patch adds uClibc to the conditionals in the same way there are
already for Glibc for `backtrace()`; and removes the external param
`program_invocation_short_name` because its only used here for the
whole project.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: José Luis Salvador Rufo <salvador.joseluis@gmail.com>
Closes #11600
2021-02-16 21:51:46 -08:00
Ryan Moeller 436ab35a53 Make inline ABD predicates compatible with C++
FreeBSD's zfsd fails to build after e2af2acce3 due to strict type
checking errors from the implicit conversion between bool and boolean_t
in the inline predicate definitions in abd.h.

Use conditionals to return the correct value type from these functions.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #11592
2021-02-15 10:15:50 -08:00
Brian Behlendorf c1c31a835a Linux 5.11 compat: META
Increase the Linux-Maximum version in the META file to 5.11.
All of the required compatibility patches have been merged.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11586
2021-02-10 10:11:21 -08:00
Arshad Hussain 677cdf7865 vdev_id: Support daisy-chained JBODs in multipath mode
Within function sas_handler() userspace commands like
'/usr/sbin/multipath' have been replaced with sourcing
device details from within sysfs which reduced a
significant amount of overhead and processing time.
Multiple JBOD enclosures and their order are sourced
from the bsg driver (/sys/class/enclosure) to isolate
chassis top-level expanders, which are then dynamically
indexed based on host channel of the multipath subordinate
disk member device being processed. Additionally added a
"mixed" mode for slot identification for environments where
a ZFS server system may contain SAS disk slots where there
is no expander (direct connect to HBA) while an attached
external JBOD with an expander have different slot identifier
methods.

How Has This Been Tested?
~~~~~~~~~~~~~~~~~~~~~~~~~

Testing was performed on a AMD EPYC based dual-server
high-availability multipath environment with multiple
HBAs per ZFS server and four SAS JBODs. The two primary
JBODs were multipath/cross-connected between the two
ZFS-HA servers. The secondary JBODs were daisy-chained
off of the primary JBODs using aligned SAS expander
channels (JBOD-0 expanderA--->JBOD-1 expanderA,
          JBOD-0 expanderB--->JBOD-1 expanderB, etc).
Pools were created, exported and re-imported, imported
globally with 'zpool import -a -d /dev/disk/by-vdev'.
Low level udev debug outputs were traced to isolate
and resolve errors.

Result:
~~~~~~~

Initial testing of a previous version of this change
showed how reliance on userspace utilities like
'/usr/sbin/multipath' and '/usr/bin/lsscsi' were
exacerbated by increasing numbers of disks and JBODs.
With four 60-disk SAS JBODs and 240 disks the time to
process a udevadm trigger was 3 minutes 30 seconds
during which nearly all CPU cores were above 80%
utilization. By switching reliance on userspace
utilities to sysfs in this version, the udevadm
trigger processing time was reduced to 12.2 seconds
and negligible CPU load.

This patch also fixes few shellcheck complains.

Reviewed-by: Gabriel A. Devenyi <gdevenyi@gmail.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Jeff Johnson <jeff.johnson@aeoncomputing.com>
Signed-off-by: Jeff Johnson <jeff.johnson@aeoncomputing.com>
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Closes #11526
2021-02-09 13:04:09 -08:00
khng300 fc273894d2 Rename zfs_inode_update to zfs_znode_update_vfs
zfs_znode_update_vfs is a more platform-agnostic name than
zfs_inode_update. Besides that, the function's prototype is moved to
include/sys/zfs_znode.h as the function is also used in common code.

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ka Ho Ng <khng300@gmail.com>
Sponsored by: The FreeBSD Foundation
Closes #11580
2021-02-09 11:17:29 -08:00
Kleber Tarcísio 4f22619ae3 Add an assert to clarify code
The first time through the loop prevdb and prevhdl are NULL.  They 
are then both set, but only prevdb is checked.  Add an ASSERT to 
make it clear that prevhdl must be set when prevdb is.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Kleber <klebertarcisio@yahoo.com.br>
Closes #10754
Closes #11575
2021-02-09 11:14:59 -08:00
Antonio Russo f8ce8aed0c Set file mode during zfs_write
3d40b65 refactored zfs_vnops.c, which shared much code verbatim between
Linux and BSD.  After a successful write, the suid/sgid bits are reset,
and the mode to be written is stored in newmode.  On Linux, this was
propagated to both the in-memory inode and znode, which is then updated
with sa_update.

3d40b65 accidentally removed the initialization of newmode, which
happened to occur on the same line as the inode update (which has been
moved out of the function).

The uninitialized newmode can be saved to disk, leading to a crash on
stat() of that file, in addition to a merely incorrect file mode.

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Antonio Russo <aerusso@aerusso.net>
Closes #11474 
Closes #11576
2021-02-08 09:15:05 -08:00
наб 7c64ee9e77 zfs-import-{cache,scan}: change condition to FileNotEmpty
When all pools are exported ZFS will generate an empty cache file.
This will cause the import service to fail, which is sub-optimal, 
since this means that dracut fails, and it necessary to run 
`zpool import -a` to boot, delete the file, and regenerate+reinstall 
the initrd.

This resolves the issue by treating an zero-length cache files the
same as a missing cache file.  This aligns the behavior with that
of the `zpool` command itself.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11568
2021-02-05 11:25:22 -08:00
nssrikanth 32366649d3 Fixed issue with processing of EC_dev_remove event
The pool guid and vdev guid received by zfs_agent_post_event(),
which calls zfs_retire_recv(), are normally non-zero.  However,
later in this same method they may be unconditionally reset to
zero by the code which is intended to handle  multipath, spare
and l2arc vdevs.  This will result in the EC_dev_remove not
being handled.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>\
Co-authored-by: Vipin Kumar Verma <vipin.verma@hpe.com>
Signed-off-by: Srikanth N S <srikanth.nagasubbaraoseetharaman@hpe.com>
Closes #11564
2021-02-05 08:30:50 -08:00
Brian Behlendorf d66f017c17 zfs-list.8: clarify listing snapshots
Clarify how to include snapshots in the `zpool list` output by
referencing the full name of the `listsnapshots` pool property,
and the `zpool list -t snapshot` option.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11562
Closes #11565
2021-02-04 09:56:28 -08:00
Christian Schwarz 84268b099b Document monotonicity of dmu_tx_assign() and txg_hold_open()
Expand the comments to make it clear exactly what is guaranteed
by dmu_tx_assign() and txg_hold_open().  Additionally, update
the comment which refers to txg_exit() when it should reference
txg_rele_to_sync().

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Christian Schwarz <me@cschwarz.com>
Closes #11521
2021-02-02 10:11:37 -08:00
George Melikov 9eee7fce3b zts-report.py: ignore some skipped tests in Github CI
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #11554
2021-02-02 09:37:23 -08:00
George Melikov aa743acc79 CI: add ubuntu-* functional tests runner
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #11554
2021-02-02 09:37:23 -08:00
George Melikov 1051499aef CI: rename zfs-tests workflow
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #11554
2021-02-02 09:37:11 -08:00
Brian Behlendorf 96a6629872 Remove unused iov_iter_init_compat() wrapper
This compatibility code is no longer needed.  For it a while
iov_iter_init_compat() was used by zfs_uio_prefaultpages() but
this code should have been dropped as part of commit 83b91ae1.
Take care of that oversight and remove it.

Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11543
2021-01-30 10:06:14 -08:00
Matthew Ahrens 2d4bbd14fc The abd child/parent relationship does not need to be tracked
ABD's currently track their parent/child relationship.  This applies to
`abd_get_offset()` and `abd_borrow_buf()`.  However, nothing depends on
knowing this relationship, it's only used for consistency checks to
verify that we are not destroying an ABD that's still in use.  When we
are creating/destroying ABD's frequently, the performance impact of
maintaining these data structures (in particular the atomic
increment/decrement operations) can be measurable.

This commit removes this verification code on production builds, but
keeps it when ZFS_DEBUG is set.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #11535
2021-01-30 10:04:42 -08:00
nssrikanth cf0a6dd3ed Added extra check to replace Faulted VDEV with Distributed Spare
In ZED zfs_retire agent added a check to handle Distributed Spare
replacement for Faulted VDEV also.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Vipin Kumar Verma <vipin.verma@hpe.com>
Signed-off-by: Mark Maybee <mark.maybee@hpe.com>
Closes #11354 
Closes #11355
2021-01-28 17:00:26 -08:00
Brian Atkinson 2993698eb3 Fixing gang ABD when adding another gang
I originally applied a fix in #11539 to fix a parent's child references
when a gang ABD is free'd. However, I did not take into account
abd_gang_add_gang(). We still need to make sure to update the child
references in this function as well. In order to resolve this I removed
decreasing the gang ABD's size in abd_free_gang() as well as moved back
the original placeent of zfs_refcount_remove_many() in abd_free().

Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Brian Atkinson <batkinson@lanl.gov>
Closes #11542
2021-01-28 16:54:12 -08:00
George Melikov 9f8c7e6a76 ZTS: add userspace_send_encrypted.ksh to Makefile
All tests need to be included in the Makefiles.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #11541
2021-01-28 13:39:38 -08:00
Matthew Ahrens f8c0d7e1f6 fix abd_nr_pages_off for gang abd
`__vdev_disk_physio()` uses `abd_nr_pages_off()` to allocate a bio with
a sufficient number of iovec's to process this zio (i.e.
`nr_iovecs`/`bi_max_vecs`).  If there are not enough iovec's in the bio,
then additional bio's will be allocated.  However, this is a sub-optimal
code path.  In particular, it requires several abd calls (to
`abd_nr_pages_off()` and `abd_bio_map_off()`) which will have to walk
the constituents of the ABD (the pages or the gang children) because
they are looking for offsets > 0.

For gang ABD's, `abd_nr_pages_off()` returns the number of iovec's
needed for the first constituent, rather than the sum of all
constituents (within the requested range).  This always under-estimates
the required number of iovec's, which causes us to always need several
bio's.  The end result is that `__vdev_disk_physio()` is usually O(n^2)
for gang ABD's (and occasionally O(n^3), when more than 16 bio's are
needed).

This commit fixes `abd_nr_pages_off()`'s handling of gang ABD's, to
correctly determine how many iovec's are needed, by adding up the number
of iovec's for each of the gang children in the requested range.

Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #11536
2021-01-28 09:28:20 -08:00
George Amanakis 0ae184a6ba Avoid updating the L2ARC device header unnecessarily
If we do not write any buffers to the cache device and the evict hand
has not advanced do not update the cache device header.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #11522 
Closes #11537
2021-01-28 09:20:03 -08:00
Brian Atkinson 416015ef54 Removing ABD Parent Child Reference Before Freeing ABD
Moving the call to zfs_refcount_remove_many() in abd_free() to be called
before any of the ABD free variants are called. This is necessary
because abd_free_gang() adjusts the abd_size for the gang ABD. If the
parent's child references are removed after free'ing the gang ABD the
refcount is not adjusted correctly for the parent's children.

I also removed some stray abd_put() in comments and changed
abd_free_gang_abd() -> abd_free_gang().

Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Brian Atkinson <batkinson@lanl.gov>
Closes #11539
2021-01-28 09:15:17 -08:00
Allan Jude 393e69241e Add zdb -r <dataset> <object-id | file> <output>
While you can use zdb -R poolname vdev:offset:[<lsize>/]<psize>[:flags] 
to extract individual DVAs from a vdev, it would be handy for be able 
copy an entire file out of the pool.

Given a file or object number, add support to copy the contents to a 
file. Useful for debugging and recovery.

Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Closes #11027
2021-01-27 21:36:01 -08:00
Mark Maybee b2c5904a78 Revert special case code from pre-hashtable nvlist era
Before a hash table was added on top of the nvlist code, there were
cases where the nvlist allocation was changed from fnvlist_alloc()
to nvlist_alloc() to avoid expensive NV_UNIQUE_NAME checks. Now
this is no longer necessary. These changes should be reverted to be
consistent with other code. There are some cases where this change
will also reduce the number of iterations.

Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mark Maybee <mark.maybee@delphix.com>
Closes #11464
2021-01-27 21:31:51 -08:00
Paul Dagnelie 2921ad6cba Fix zrele race in zrele_async that can cause hang
There is a race condition in zfs_zrele_async when we are checking if 
we would be the one to evict an inode. This can lead to a txg sync 
deadlock.

Instead of calling into iput directly, we attempt to perform the atomic 
decrement ourselves, unless that would set the i_count value to zero. 
In that case, we dispatch a call to iput to run later, to prevent a 
deadlock from occurring.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #11527 
Closes #11530
2021-01-27 21:29:58 -08:00
George Melikov b8e6401b79 ZTS: pool_state test check for pool existence in cleanup
If there is no scsi_debug module, then this test
must be skipped, in this case cleanup routine should
be prepared for absent pool.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #11534
2021-01-27 17:33:30 -08:00
Alan Somers 6b2e7203ae Fix a resource leak in uu_avl_pool_destroy
Need to destroy the pthread mutex created in uu_avl_pool_create.

https://svnweb.freebsd.org/base?view=revision&revision=262912

Obtained from: FreeBSD
Sponsored by:	Spectra Logic Corporation
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alan Somers <asomers@gmail.com>
Closes #11528
2021-01-26 19:39:28 -08:00
Alan Somers cf0977ad72 Parallelize vdev_validate
The runtime of vdev_validate is dominated by the disk accesses in
vdev_label_read_config.  Speed it up by validating all vdevs in
parallel using a taskq.

Sponsored by: Axcient
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alan Somers <asomers@gmail.com>
Closes #11470
2021-01-26 19:36:51 -08:00
Alan Somers 67874d5487 Read all disk labels concurrently in vdev_label_read_config
This is similar to what we already do in vdev_geom_read_config.

Sponsored by: Axcient
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alan Somers <asomers@gmail.com>
Closes #11470
2021-01-26 19:36:02 -08:00
Alan Somers a0e01997ec Parallelize vdev_load
metaslab_init is the slowest part of importing a mature pool, and it
must be repeated hundreds of times for each top-level vdev.  But its
speed is dominated by a few serialized disk accesses.  That can lead to
import times of > 1 hour for pools with many top-level vdevs on spinny
disks.

Speed up the import by using a taskqueue to parallelize vdev_load across
all top-level vdevs.

This also requires adding mutex protection to
metaslab_class_t.mc_historgram.  The mc_histogram fields were
unprotected when that code was first written in "Illumos 4976-4984 -
metaslab improvements" (OpenZFS
f3a7f6610f).  The lock wasn't added until
3dfb57a35e, though it's unclear exactly
which fields it's supposed to protect.  In any case, it wasn't until
vdev_load was parallelized that any code attempted concurrent access to
those fields.

Sponsored by: Axcient
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alan Somers <asomers@gmail.com>
Closes #11470
2021-01-26 19:35:59 -08:00
Alan Somers dfb44c500e Fix a man page link in zfs-program.8
zfs-program.8 has an orphan link, fix it.

https://svnweb.freebsd.org/base?view=revision&revision=360080

Obtained from: FreeBSD
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alan Somers <asomers@gmail.com>
Closes #11529
2021-01-26 16:17:11 -08:00
Brian Behlendorf 0e6c493fec cppcheck: integrete cppcheck
In order for cppcheck to perform a proper analysis it needs to be
aware of how the sources are compiled (source files, include
paths/files, extra defines, etc).  All the needed information is
available from the Makefiles and can be leveraged with a generic
cppcheck Makefile target.  So let's add one.

Additional minor changes:

* Removing the cppcheck-suppressions.txt file.  With cppcheck 2.3
  and these changes it appears to no longer be needed.  Some inline
  suppressions were also removed since they appear not to be
  needed.  We can add them back if it turns out they're needed
  for older versions of cppcheck.

* Added the ax_count_cpus m4 macro to detect at configure time how
  many processors are available in order to run multiple cppcheck
  jobs.  This value is also now used as a replacement for nproc
  when executing the kernel interface checks.

* "PHONY =" line moved in to the Rules.am file which is included
  at the top of all Makefile.am's.  This is just convenient becase
  it allows us to use the += syntax to add phony targets.

* One upside of this integration worth mentioning is it now allows
  `make cppcheck` to be run in any directory to check that subtree.

* For the moment, cppcheck is not run against the FreeBSD specific
  kernel sources.  The cppcheck-FreeBSD target will need to be
  implemented and testing on FreeBSD to support this.

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11508
2021-01-26 16:12:26 -08:00
Brian Behlendorf a06ba74a1e cppcheck: return value always 0
Identical condition and return expression 'rc', return value is
always 0.

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11508
2021-01-26 16:12:18 -08:00
Brian Behlendorf 2cdd75bed7 cppcheck: remove redundant ASSERTs
The ASSERT that the passed pointer isn't NULL appears after the
pointer has already been dereferenced.  Remove the redundant check.

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11508
2021-01-26 16:12:10 -08:00
Brian Behlendorf 6fc1ce0723 cppcheck: resolve double free
The double free reported for the realloc() failure branch is a
false positive.  It should be resolved in cppcheck 2.4 but for
the benefit of older versions we supress the warning.

    https://trac.cppcheck.net/ticket/9292

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11508
2021-01-26 16:12:02 -08:00
Brian Behlendorf 7454d2bb8d cppcheck: zpool_main.c possible null pointer dereference
Explicitly check for NULL to satisfy cppcheck that "val" can never
be NULL when passed to printf().  This looks like a false positive
since is_blank_str() can never take the false conditional branch
when passed a NULL.  But there's no harm in adding the extra check.

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11508
2021-01-26 16:11:46 -08:00
Matthew Ahrens 62d4287f27 RAIDZ2/3 fails to heal silently corrupted parity w/2+ bad disks
When scrubbing, (non-sequential) resilvering, or correcting a checksum
error using RAIDZ parity, ZFS should heal any incorrect RAIDZ parity by
overwriting it.  For example, if P disks are silently corrupted (P being
the number of failures tolerated; e.g. RAIDZ2 has P=2), `zpool scrub`
should detect and heal all the bad state on these disks, including
parity.  This way if there is a subsequent failure we are fully
protected.

With RAIDZ2 or RAIDZ3, a block can have silent damage to a parity
sector, and also damage (silent or known) to a data sector.  In this
case the parity should be healed but it is not.

The problem can be noticed by scrubbing the pool twice.  Assuming there
was no damage concurrent with the scrubs, the first scrub should fix all
silent damage, and the second scrub should be "clean" (`zpool status`
should not report checksum errors on any disks).  If the bug is
encountered, then the second scrub will repair the silently-damaged
parity that the first scrub failed to repair, and these checksum errors
will be reported after the second scrub.  Since the first scrub repaired
all the damaged data, the bug can not be encountered during the second
scrub, so subsequent scrubs (more than two) are not necessary.

The root cause of the problem is some code that was inadvertently added
to `raidz_parity_verify()` by the DRAID changes.  The incorrect code
causes the parity healing to be aborted if there is damaged data
(`rc_error != 0`) or the data disk is not present (`!rc_tried`).  These
checks are not necessary, because we only call `raidz_parity_verify()`
if we have the correct data (which may have been reconstructed using
parity, and which was verified by the checksum).

This commit fixes the problem by removing the incorrect checks in
`raidz_parity_verify()`.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #11489 
Closes #11510
2021-01-26 16:05:05 -08:00
Will Andrews d7265b3309 ZTS: zpool_export test improvements
- refactor cleanup routines into common kshlib zpool_export_cleanup func
- don't require physical disks to test, just use files

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Will Andrews <will@firepipe.net>
Closes #11518
2021-01-26 13:14:04 -08:00
Lorenz Hüdepohl caedada66e dracut: Fix race condition between load-key and import
zfs-load-key.sh is called by the dracut-pre-mount.service unit which has
no explicit 'After' dependency on zfs-import.target. That way it can be
that the pool has not yet been imported and the zfs-load-key.sh finishes
without ever seeing the relevant pool.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Lorenz Hüdepohl <dev@stellardeath.org>
Closes #11500
2021-01-26 12:14:22 -08:00
Will Andrews f4f50a7048 spa_export_common: refactor common exit points
Create a common exit point for spa_export_common (a very long 
function), which avoids missing steps on failure.  This work
is helpful for the planned forced pool export changes.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Will Andrews <will@firepipe.net>
Closes #11514
2021-01-25 15:04:11 -08:00
Will Andrews 35ac0ed1fd ZTS: improve output clarity of check_prop_source
Instead of just failing, indicate the expected and actual value and
source as a NOTE.  Tests using this failed in an earlier version of
the changeset and this information helped find the cause.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Will Andrews <will@firepipe.net>
Closes #11517
2021-01-25 14:39:58 -08:00
Will Andrews a57acbb627 ZTS: remove duplicate check_prop_source from zfs_receive
There is an identical definition in zfs_set_common.kshlib already.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Will Andrews <will@firepipe.net>
Closes #11516
2021-01-25 14:38:19 -08:00
Will Andrews de1a6cc387 logapi: cat output file instead of printing
This avoids globbing together multiple lines in the log, if you happen
to specify LOGAPI_DEBUG because you want to see it.

Signed-off-by:	Will Andrews <will@firepipe.net>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11515
2021-01-25 14:33:57 -08:00
Matthew Macy a4134da2b2 spl-taskq: Make sure thread tsd hash entry is cleared
Like any other thread created by thread_create() we need to call
thread_exit() to properly clean it up.  In particular, this ensures the
tsd hash for the thread is cleared.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #11512
2021-01-25 11:18:28 -08:00
Alan Somers fd95af8dd4 Speed up "zpool import" in the presence of many zvols
By default, FreeBSD does not allow zpools to be backed by zvols (that
can be changed with the "vfs.zfs.vol.recursive" sysctl). When that
sysctl is set to 0, the kernel does not attempt to read zvols when
looking for vdevs. But the zpool command still does. This change brings
the zpool command into line with the kernel's behavior. It speeds "zpool
import" when an already imported pool has many zvols, or a zvol with
many snapshots.

https://svnweb.freebsd.org/base?view=revision&revision=357235
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=241083
https://reviews.freebsd.org/D22077

Obtained from: FreeBSD
Reported by: Martin Birgmeier <d8zNeCFG@aon.at>
Sponsored by: Axcient
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Alan Somers <asomers@gmail.com>
Closes #11502
2021-01-24 16:02:45 -08:00
наб 8887760aef zfsprops.8: fix mispluralisation in "Default values is"
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11509
2021-01-24 15:57:51 -08:00
Ryan Moeller a4acc47e4b ZTS: Use swapctl to list swap devices on FreeBSD
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11503
2021-01-24 15:56:59 -08:00
Arshad Hussain e2870fb24a vdev_id: Add error message when $CONFIG is missing
It was observed that vdev_id exists silently when
the $CONFIG file is missing.

This patch adds error message in case vdev_id is
called without default $CONFIG or '-c'. This makes
end user observe the exit message more easily.

Before Patch:
~~~~~~~~~~~~~
$ ./cmd/vdev_id/vdev_id
$

After Patch:
~~~~~~~~~~~~
$ ./cmd/vdev_id/vdev_id
Error: Config file "/etc/zfs/vdev_id.conf" not found
$

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Closes #11498
2021-01-23 15:52:29 -08:00
Colm 4a90d4d6fc Fix two minor lint errors (cppcheck)
Fix two minor errors reported by cppcheck:

In module/zfs/abd.c (abd_get_offset_impl), add non-NULL
assertion to prevent NULL dereference warning.

In module/zfs/arc.c (l2arc_write_buffers), change 'try'
variable to 'pass' to avoid C++ reserved word.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Colm Buckley <colm@tuatha.org>
Closes #11507
2021-01-23 15:49:32 -08:00
Alexander Motin 5aa69a57da Relax special_small_blocks assertion.
Follow up for commit 624222a, value asserted <= SPA_OLD_MAXBLOCKSIZE
instead of SPA_MAXBLOCKSIZE as it should be after the previous change.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Closes #11501
2021-01-23 15:45:27 -08:00
Matthew Macy 716408f560 Add basic io_uring test
Provide a basic test coverage for io_uring I/O.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #11497
2021-01-23 15:42:42 -08:00
Ryan Moeller 1c94345103 FreeBSD: upstream changes to VFS interface
Set VIRF_MOUNTPOINT flag on snapshot mountpoint.

Authored-by: Mateusz Guzik <mjg@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11458
2021-01-23 15:40:43 -08:00
Matt Macy 0e9bcd5d4f FreeBSD: fix HEAD build, conditionally remove FDSYNC defines
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #11458
2021-01-23 15:39:55 -08:00
Brian Behlendorf 76e1f78d4b Only add supported features during pool creation
When creating a pool only features supported by both user and
kernel space should be enabled.  Furthermore, improve the error
messages when attempting to create, or add, a dRAID vdev when
the dRAID feature is not supported by the kernel modules.

Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11492
2021-01-22 09:47:06 -08:00
Matthew Ahrens aa755b3549 Set aside a metaslab for ZIL blocks
Mixing ZIL and normal allocations has several problems:

1. The ZIL allocations are allocated, written to disk, and then a few
seconds later freed.  This leaves behind holes (free segments) where the
ZIL blocks used to be, which increases fragmentation, which negatively
impacts performance.

2. When under moderate load, ZIL allocations are of 128KB.  If the pool
is fairly fragmented, there may not be many free chunks of that size.
This causes ZFS to load more metaslabs to locate free segments of 128KB
or more.  The loading happens synchronously (from zil_commit()), and can
take around a second even if the metaslab's spacemap is cached in the
ARC.  All concurrent synchronous operations on this filesystem must wait
while the metaslab is loading.  This can cause a significant performance
impact.

3. If the pool is very fragmented, there may be zero free chunks of
128KB or more.  In this case, the ZIL falls back to txg_wait_synced(),
which has an enormous performance impact.

These problems can be eliminated by using a dedicated log device
("slog"), even one with the same performance characteristics as the
normal devices.

This change sets aside one metaslab from each top-level vdev that is
preferentially used for ZIL allocations (vdev_log_mg,
spa_embedded_log_class).  From an allocation perspective, this is
similar to having a dedicated log device, and it eliminates the
above-mentioned performance problems.

Log (ZIL) blocks can be allocated from the following locations.  Each
one is tried in order until the allocation succeeds:
1. dedicated log vdevs, aka "slog" (spa_log_class)
2. embedded slog metaslabs (spa_embedded_log_class)
3. other metaslabs in normal vdevs (spa_normal_class)

The space required for the embedded slog metaslabs is usually between
0.5% and 1.0% of the pool, and comes out of the existing 3.2% of "slop"
space that is not available for user data.

On an all-ssd system with 4TB storage, 87% fragmentation, 60% capacity,
and recordsize=8k, testing shows a ~50% performance increase on random
8k sync writes.  On even more fragmented systems (which hit problem #3
above and call txg_wait_synced()), the performance improvement can be
arbitrarily large (>100x).

Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed-by: George Wilson <gwilson@delphix.com>
Reviewed-by: Don Brady <don.brady@delphix.com>
Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #11389
2021-01-21 15:12:54 -08:00
Lorenz Hüdepohl 984362a71e dracut: Support /usr/bin as 'systemctl' path
On openSUSE the initrd has systemctl in /usr/bin, check this path as
well.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Signed-off-by: Lorenz Hüdepohl <dev@stellardeath.org>
Closes #11487
2021-01-21 12:59:24 -08:00
Antonio Russo 0ae733c7a4 Install zgenhostid to sbindir
zgenhostid(8) is used to modify or create /etc/hostid.  This
administrative tool is currently installed to bindir.  System utilities
are typically placed in sbin.

Modify the installation directory for zgenhostid.  Additionally, track
this change in its use in dracut and the rpm installation.

Authored-by: наб <nabijaczleweli@nabijaczleweli.xyz>
Authored-by: Antonio Russo <aerusso@aerusso.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Antonio Russo <aerusso@aerusso.net>
Closes #11485
2021-01-21 12:58:24 -08:00
Alan Somers 2d8f72d76c zpool: speed up importing large pools (#11469)
The ZFS_IOC_POOL_TRYIMPORT ioctl returns an nvlist from the kernel to a
preallocated buffer in  userland.  Userland must guess how large the
buffer should be.  If it undersizes it, it must reallocate and try
again.  That can cost a lot of time for large pools.

OpenZFS commit 28b40c8a6e set the guess at "zc.zc_nvlist_conf_size * 4"
without explanation.  On my system, that is too small.  From experiment,
x 32 is a better multiplier.  But I don't know how to calculate it
theoretically.

Sponsored by:	Axcient
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alek Pinchuk <apinchuk@axcient.com>
Signed-off-by:	Alan Somers <asomers@gmail.com>
Closes #11469
2021-01-21 12:55:54 -08:00
Alan Somers e50b5217e7 libzutil: optimize zpool_read_label with AIO
Read all labels in parallel instead of sequentially.

Originally committed as
https://cgit.freebsd.org/src/commit/?id=b49e9abcf44cafaf5cfad7029c9a6adbb28346e8

Obtained from: FreeBSD
Sponsored by: Spectra Logic, Axcient
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alek Pinchuk <apinchuk@axcient.com>
Signed-off-by: Alan Somers <asomers@gmail.com>
Closes #11467
2021-01-21 11:24:35 -08:00
Alan Somers ec40ce8405 libzutil: don't read extraneous data in zpool_read_label
zpool_read_label doesn't need the full labels including uberblocks.  It
only needs the vdev_phys_t.  This reduces by half the amount of data
read to check for a label, speeding up "zpool import", "zpool
labelclear", etc.

Originally committed as
https://cgit.freebsd.org/src/commit/?id=63f8025d6acab1b334373ddd33f940a69b3b54cc

Obtained from: FreeBSD
Sponsored by: Spectra Logic Corp, Axcient
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alek Pinchuk <apinchuk@axcient.com>
Signed-off-by: Alan Somers <asomers@gmail.com>
Closes #11467
2021-01-21 11:18:32 -08:00
Brian Behlendorf 83b91ae1a4 Linux 5.10 compat: restore custom uio_prefaultpages()
As part of commit 1c2358c1 the custom uio_prefaultpages() code
was removed in favor of using the generic kernel provided
iov_iter_fault_in_readable() interface.  Unfortunately, it
turns out that up until the Linux 4.7 kernel the function would
only ever fault in the first iovec of the iov_iter.  The result
being uiomove_iov() may hang waiting for the page.

This commit effectively restores the custom uio_prefaultpages()
pages code for Linux 4.9 and earlier kernels which contain the
troublesome version of iov_iter_fault_in_readable().

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11463 
Closes #11484
2021-01-21 10:43:39 -08:00
Brian Atkinson d0cd9a5cc6 Extending FreeBSD UIO Struct
In FreeBSD the struct uio was just a typedef to uio_t. In order to
extend this struct, outside of the definition for the struct uio, the
struct uio has been embedded inside of a uio_t struct.

Also renamed all the uio_* interfaces to be zfs_uio_* to make it clear
this is a ZFS interface.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Brian Atkinson <batkinson@lanl.gov>
Closes #11438
2021-01-20 21:27:30 -08:00
Matthew Ahrens e2af2acce3 allow callers to allocate and provide the abd_t struct
The `abd_get_offset_*()` routines create an abd_t that references
another abd_t, and doesn't allocate any pages/buffers of its own.  In
some workloads, these routines may be called frequently, to create many
abd_t's representing small pieces of a single large abd_t.  In
particular, the upcoming RAIDZ Expansion project makes heavy use of
these routines.

This commit adds the ability for the caller to allocate and provide the
abd_t struct to a variant of `abd_get_offset_*()`.  This eliminates the
cost of allocating the abd_t and performing the accounting associated
with it (`abdstat_struct_size`).  The RAIDZ/DRAID code uses this for
the `rc_abd`, which references the zio's abd.  The upcoming RAIDZ
Expansion project will leverage this infrastructure to increase
performance of reads post-expansion by around 50%.

Additionally, some of the interfaces around creating and destroying
abd_t's are cleaned up.  Most significantly, the distinction between
`abd_put()` and `abd_free()` is eliminated; all types of abd_t's are
now disposed of with `abd_free()`.

Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Issue #8853 
Closes #11439
2021-01-20 11:24:37 -08:00
sterlingjensen 03f036cbcc Re-apply path sanitizer, as mount(8) still mangles it
Prior to util-linux 2.36.2, if a file or directory in the
current working directory was named 'dataset' then mount(8)
would prepend the current working directory to the dataset.

Eventually, we should be able to drop this workaround.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Sterling Jensen <sterlingjensen@users.noreply.github.com>
Closes #11295 
Closes #11462
2021-01-19 11:57:31 -08:00
Antonio Russo f8c4d63a26 ZTS: avoid piping to special devices
As described in #11445, the kernel interface kernel_{read,write} no
longer act on special devices.  In the ZTS, zfs send and receive are
tested by piping to these devices, leading to spurious failures (for
positive tests) and may mask errors (for negative tests).

Until a more permanent mechanism to address this deficiency is
developed, clean up the output from the ZTS by avoiding directly piping
to or from /dev/null and /dev/zero.

For /dev/zero input, simply use a pipe: `cat </dev/zero |` .

However, for /dev/null output, the shell semantics for pipe failures
means that zfs send error codes will be masked by the successful
`| cat >/dev/null` command execution.  In that case, use a temporary
file under $TEST_BASE_DIR for output in favor.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Attila Fülöp <attila@fueloep.org>
Signed-off-by: Antonio Russo <aerusso@aerusso.net>
Closes #11478
2021-01-19 11:53:35 -08:00
Ryan Moeller 60a2434b29 libzfs_sendrecv: Use fnv* to verify nvlist/nvpair*
Use verified variants of nvlist/nvpair functions where applicable.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11460
2021-01-14 09:53:09 -08:00
Ryan Moeller a9eaae065b ztest: Clean up use of ASSERT and VERIFY
Try to use more appropriate ASSERT and VERIFY variants in ztest.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11454
2021-01-12 17:21:01 -08:00
Antonio Russo 8752f7e320 ZTS: avoid race to unmount in zfs_rollback_001
The zfs_rollback_001 test modifies files in a temporary, test dataset
repeatedly.  Before each iteration, any preexisting dataset is removed,
after unmounted with umount -f, if necessary.

Add a short delay after the forced unmount, avoiding a race that can
prevent zfs destroy from succeeding, leading to a test failure.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Antonio Russo <aerusso@aerusso.net>
Closes #11451
2021-01-12 17:20:02 -08:00
Ryan Moeller bea3fc71ae ztest: Use fnvlist_* instead of VERIFYing nvlist_*
Simplify ztest by using fnvlist functions to verify success.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Ahrens <matt@delphix.com>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11441
2021-01-11 09:30:33 -08:00
Matthew Ahrens 2ac90457f5 record ioctl elapsed time in zpool history
Each zfs ioctl that changes on-disk state (e.g. set property, create
snapshot, destroy filesystem) is recorded in the zpool history, and is
printed by `zpool history -i`.

For performance diagnostic purposes, it would be useful to know how long
each of these ioctls took to run.  This commit adds that functionality,
with a new `ZPOOL_HIST_ELAPSED_NS` member of the history nvlist.

Additionally, the time recorded in this history log is currently the
time that the history record is written to disk.  But in many cases (CLI
args logging and ioctl logging), this happens asynchronously,
potentially many seconds after the operation completed.  This commit
changes the timestamp to reflect when the history event was created,
rather than when it was written to disk.

Reviewed-by: Mark Maybee <mmaybee@cray.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #11440
2021-01-11 09:29:25 -08:00
Matthew Ahrens dc303dcf5b assertion failed in arc_wait_for_eviction()
If the system is very low on memory (specifically,
`arc_free_memory() < arc_sys_free/2`, i.e. less than 1/16th of RAM
free), `arc_evict_state_impl()` will defer wakups.  In this case, the
arc_evict_waiter_t's remain on the list, even though `arc_evict_count`
has been incremented past their `aew_count`.

The problem is that `arc_wait_for_eviction()` assumes that if there are
waiters on the list, the count they are waiting for has not yet been
reached.  However, the deferred wakeups may violate this, causing
`ASSERT(last->aew_count > arc_evict_count)` to fail.

This commit resolves the issue by having new waiters use the greater of
`arc_evict_count` and the last `aew_count`.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Wilson <gwilson@delphix.com>
Reviewed-by: George Amanakis <gamanakis@gmail.com>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #11285
Closes #11397
2021-01-07 20:06:32 -08:00
Matthew Macy f11b09dec3 FreeBSD: minor_t needs to be signed so that -1 is recognized as such
zfsdev_close sets zs_minor to -1 to avoid duplicate calls to
destroy. This doesn't mix well with the current u_int used.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #11437
2021-01-07 10:41:27 -08:00
Brian Behlendorf 06346cc5b5 Autoconf 2.70 compatibility
Several m4 macros have been retired in autoconf 2.70.  Update the
the build system to use the new macros provided to replace them.

* Replaced AC_HELP_STRING with AS_HELP_STRING.

* Replaced AC_TRY_COMPILE with AC_COMPILE_IFELSE/AC_LANG_PROGRAM.

* Replaced AC_CANONICAL_SYSTEM with AC_CANONICAL_TARGET

* Replaced AC_PROG_LIBTOOL with LT_INIT

* $CPP is not defined in ZFS_AC_KERNEL and really shouldn't be
  directly used like this.  Replace it with an $AWK command
  to extract the kernel source version.

Reviewed-by: Eli Schwartz <eschwartz@archlinux.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #11413 
Closes #11419
2021-01-02 16:55:55 -08:00
Toomas Soome 4ba8c6b584 zfs_mount_all_mountpoints: cleanup_all should leave pool root mounted
if pool root is not mounted, then zpool umount in next test will leave
dataset mountpoint directory around and next zfs mount -a will fail
with error: cannot mount '/testpool': directory is not empty

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Toomas Soome <tsoome@me.com>
Closes #11417
2021-01-02 16:54:53 -08:00
Konstantin Khorenko 064c2cf40e VZ 7 kernel compat: introduce ITER-enabled .direct_IO() via IOVECs
Virtuozzo 7 kernels starting 3.10.0-1127.18.2.vz7.163.46
have the following configuration:

  * no HAVE_VFS_RW_ITERATE
  * HAVE_VFS_DIRECT_IO_ITER_RW_OFFSET

=> let's add implementation of zpl_direct_IO() via
zpl_aio_{read,write}() in this case.

https://bugs.openvz.org/browse/OVZ-7243

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Konstantin Khorenko <khorenko@virtuozzo.com>
Closes #11410 
Closes #11411
2020-12-30 14:18:29 -08:00
Matthew Ahrens 808e681238 Memory leak in zdb:import_checkpointed_state()
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #11396
2020-12-28 10:05:55 -08:00
Matthew Ahrens f014700a37 Memory leak in ztest_dmu_objset_own()
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #11396
2020-12-28 10:05:44 -08:00
Matthew Ahrens a0316ad268 Memory leak in ztest_vdev_attach_detach()
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #11396
2020-12-28 10:05:32 -08:00
Matthew Ahrens b6722b871b nvlist leaked in zpool_find_config()
In `zpool_find_config()`, the `pools` nvlist is leaked.  Part of it (a
sub-nvlist) is returned in `*configp`, but the callers also leak that.

Additionally, in `zdb.c:main()`, the `searchdirs` is leaked.

The leaks were detected by ASAN (`configure --enable-asan`).

This commit resolves the leaks.

Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #11396
2020-12-28 10:05:31 -08:00
Toomas Soome 40ab927ae8 implicit conversion from 'boolean_t' to 'ds_hold_flags_t'
Build error on illumos with gcc 10 did reveal:

In function 'dmu_objset_refresh_ownership':
../../common/fs/zfs/dmu_objset.c:857:25: error: implicit conversion
from 'boolean_t' to 'ds_hold_flags_t' {aka 'enum ds_hold_flags'}
[-Werror=enum-conversion]
      857 |  dsl_dataset_disown(ds, decrypt, tag);
          |                         ^~~~~~~
cc1: all warnings being treated as errors

libzfs_input_check.c: In function 'zfs_ioc_input_tests':
libzfs_input_check.c:754:28: error: implicit conversion from
'enum dmu_objset_type' to 'enum lzc_dataset_type'
[-Werror=enum-conversion]
  754 |  err = lzc_create(dataset, DMU_OST_ZFS, NULL, NULL, 0);
      |                            ^~~~~~~~~~~
cc1: all warnings being treated as errors

The same issue is present in openzfs, and also the same issue about
ds_hold_flags_t, which currently defines exactly one valid value.

Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Toomas Soome <tsoome@me.com>
Closes #11406
2020-12-27 16:31:02 -08:00
Brian Behlendorf c449d4b06d Linux 5.11 compat: blk_{un}register_region()
As of 5.11 the blk_register_region() and blk_unregister_region()
functions have been retired. This isn't a problem since add_disk()
has implicitly allocated minor numbers for a very long time.

Reviewed-by: Rafael Kitover <rkitover@gmail.com>
Reviewed-by: Coleman Kane <ckane@colemankane.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11387
Closes #11390
2020-12-27 16:20:46 -08:00
Brian Behlendorf 19697e4545 Linux 5.11 compat: revalidate_disk_size()
Both revalidate_disk_size() and revalidate_disk() have been removed.
Functionally this isn't a problem because we only relied on these
functions to call zvol_revalidate_disk() for us and to perform any
additional handling which might be needed for that kernel version.
When neither are available we know there's no additional handling
needed and we can directly call zvol_revalidate_disk().

Reviewed-by: Rafael Kitover <rkitover@gmail.com>
Reviewed-by: Coleman Kane <ckane@colemankane.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11387
Closes #11390
2020-12-27 16:20:40 -08:00
Brian Behlendorf 72ba4b2a4c Linux 5.11 compat: bdev_whole()
The bd_contains member was removed from the block_device structure.
Callers needing to determine if a vdev is a whole block device should
use the new bdev_whole() wrapper.  For older kernels we provide our
own bdev_whole() wrapper which relies on bd_contains for compatibility.

Reviewed-by: Rafael Kitover <rkitover@gmail.com>
Reviewed-by: Coleman Kane <ckane@colemankane.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11387
Closes #11390
2020-12-27 16:20:33 -08:00
Brian Behlendorf a970f0594e Linux 5.11 compat: bio_start_io_acct() / bio_end_io_acct()
The generic IO accounting functions have been removed in favor of the
bio_start_io_acct() and bio_end_io_acct() functions which provide a
better interface.  These new functions were introduced in the 5.8
kernels but it wasn't until the 5.11 kernel that the previous generic
IO accounting interfaces were removed.

This commit updates the blk_generic_*_io_acct() wrappers to provide
and interface similar to the updated kernel interface.  It's slightly
different because for older kernels we need to pass the request queue
as well as the bio.

Reviewed-by: Rafael Kitover <rkitover@gmail.com>
Reviewed-by: Coleman Kane <ckane@colemankane.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11387
Closes #11390
2020-12-27 16:20:24 -08:00
Brian Behlendorf b7281c88bc Linux 5.11 compat: lookup_bdev()
The lookup_bdev() function has been updated to require a dev_t
be passed as the second argument. This is actually pretty nice
since the major number stored in the dev_t was the only part we
were interested in. This allows to us avoid handling the bdev
entirely.  The vdev_lookup_bdev() wrapper was updated to emulate
the behavior of the new lookup_bdev() for all supported kernels.

Reviewed-by: Rafael Kitover <rkitover@gmail.com>
Reviewed-by: Coleman Kane <ckane@colemankane.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11387
Closes #11390
2020-12-27 16:20:08 -08:00
Brian Behlendorf c347fac586 Linux 5.11 compat: conftest
Update the ZFS_LINUX_TEST_PROGRAM macro to always set the module
license.  As of the 5.11 kernel not setting a license has been
converted from a warning to an error.

Reviewed-by: Rafael Kitover <rkitover@gmail.com>
Reviewed-by: Coleman Kane <ckane@colemankane.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11387
Closes #11390
2020-12-27 16:15:19 -08:00
Ryan Moeller 8fbd31d761 dbufstat: Fix warnings with Python 3.8
Replace "is" with "==" and "is not" with "!=".

Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11394
2020-12-23 15:10:35 -08:00
Brian Behlendorf 4879c4d198 Linux 5.10 compat: META
Increase the Linux-Maximum version in the META file to 5.10.
All of the required compatibility patches have been merged.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11391
2020-12-23 08:55:02 -08:00
Brian Behlendorf 0c763f76b1 Remove unused check from dmu_tx_count_write()
Individual transactions may not be larger than DMU_MAX_ACCESS.
This is enforced by the assertions in dmu_tx_hold_write() and
dmu_tx_hold_write_by_dnode().  There's an additional check in
dmu_tx_count_write() however it has no effect and only sets a
local err variable.  We could enable this check, however since
it's already enforced by ASSERTs elsewhere I opted to remove it
instead.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3731 
Closes #11384
2020-12-21 20:17:13 -08:00
Christian Schwarz af9593903e zfs-kmods: install to /lib/modules instead of /usr/lib/modules
Before this patch, dracut wouldn't find zfs.ko for inclusion in
initramfs. This was caused by the packages installing in to
/lib/modules instead of /usr/lib/modules.  Correcting this allows
dracut to do the right thing, even without

    # /etc/dracut.conf
    add_drivers+=" zfs "

Notably, rpm/redhat/zfs-kmod.spec.in does not contain the definition of
the `prefix` macro that this commit removes in the generic kmod spec.
And https://rpmfusion.org/Packaging/KernelModules/Kmods2 does not
mention `prefix` at all.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Christian Schwarz <me@cschwarz.com>
Closes #11381
2020-12-21 20:14:32 -08:00
Kjeld Schouten-Lebbing 7bbe563a0e Forward questions to github discussions
Instead of creating issues with type "question"
Forward to the GitHub Discussion system.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl>
Closes #11383
2020-12-21 20:09:02 -08:00
Andy Fiddaman 39372fa25b Dangling reference from dmu_objset_upgrade
After porting the fix for https://github.com/openzfs/zfs/issues/5295
over to illumos, we started hitting an assertion failure when running
the testsuite:

	assertion failed: rc->rc_count == number, file: .../refcount.c

and the unexpected hold has this stack:

	dsl_dataset_long_hold+0x59 dmu_objset_upgrade+0x73
dmu_objset_id_quota_upgrade+0x15 dmu_objset_own+0x14f

The simplest reproducer for this in illumos is

    zpool create -f -O version=1 testpool c3t0d0; zpool destroy testpool

which is run as part of the zpool_create_tempname test, but I can't get
this to trigger on FreeBSD. This appears to be because of the call to
txg_wait_synced() in dmu_objset_upgrade_stop() (which was missing in
illumos), slows down dmu_objset_disown() enough to avoid the condition.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Andy Fiddaman <andy@omnios.org>
Closes #11368
2020-12-21 10:13:23 -08:00
Brian Behlendorf 6152014d1a Linux 4.18.0-257.el8 compat: blk_alloc_queue()
The CentOS stream 4.18.0-257 kernel appears to have backported
the Linux 5.9 change to make_request_fn and the associated API.
To maintain weak modules compatibility the original symbol was
retained and the new interface blk_alloc_queue_rh() was added.

Unfortunately, blk_alloc_queue() was replaced in the blkdev.h
header by blk_alloc_queue_bh() so there doesn't seem to be a way
to build new kmods against the old interfces.  Even though they
appear to still be available for weak module binding.

To accommodate this a configure check is added for the new _rh()
variant of the function and used if available.  If compatibility
code gets added to the kernel for the original blk_alloc_queue()
interface this should be fine.  OpenZFS will simply continue to
prefer the new interface and only fallback to blk_alloc_queue()
when blk_alloc_queue_rh() isn't available.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11374
2020-12-21 10:11:56 -08:00
Brian Behlendorf 8947fa4495 Fix maybe uninitialized variable warning
Commit 1c2358c12 restructured this code and introduced a warning
about the variable maybe not being initialized.  This cannot happen
with the updated code but we should initialize the variable anyway
to silence the warning.

    zpl_file.c: In function ‘zpl_iter_write’:
    zpl_file.c:324:9: warning: ‘count’ may be used uninitialized
        in this function [-Wmaybe-uninitialized]

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11373
2020-12-20 09:50:13 -08:00
Brian Behlendorf 9ac535e662 Remove iov_iter_advance() from iter_read
There's no need to call iov_iter_advance() in zpl_iter_read().
This was preserved from the previous code where it wasn't needed
but also didn't cause any problems.  Now that the iter functions
also handle pipes that's no longer the case.  When fully reading a
pipe buffer iov_iter_advance() may results in the pipe buf release
function being called which will not be registered resulting in
a NULL dereference.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11375 
Closes #11378
2020-12-20 09:49:29 -08:00
Christian Schwarz 49c482fde3 dsl_pool: extend comment on DSL Pool Configuration Lock
Based on a conversation with Matt on the OpenZFS Slack.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Christian Schwarz <me@cschwarz.com>
Closes #11370
2020-12-19 18:04:05 -08:00
Michael D Labriola 1c0bbd52c3 Linux 5.10 compat: also zvol_revalidate_disk()
Commit 59b68723 added a configure check for 5.10, which removed
revalidate_disk(), and conditionally replaced it's usage with a call to
the new revalidate_disk_size() function.  However, the old function also
invoked the device's registered callback, in our case
zvol_revalidate_disk().  This commit adds a call to zvol_revalidate_disk()
in zvol_update_volsize() to make sure the code path stays the same.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Michael D Labriola <michael.d.labriola@gmail.com>
Closes #11358
2020-12-18 09:36:19 -08:00
Prawn 77f09e2932 ZED/zfs-list-cacher.sh: don't exit on ignored event type
Check for the history_event type instead.

The zfs-list-cacher.sh script currently respects the event types
excluded from syslog(!) in ZED_SYSLOG_SUBCLASS_EXCLUDE.
This makes little sense in this single-purpose script and
silently breaks when history_events are excluded from syslog,
which is the default since 13d65987a9.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: InsanePrawn <insane.prawny@gmail.com>
Closes #11164
Closes #11347
2020-12-18 09:34:10 -08:00
Brian Behlendorf 1c2358c12a Linux 5.10 compat: use iov_iter in uio structure
As of the 5.10 kernel the generic splice compatibility code has been
removed.  All filesystems are now responsible for registering a
->splice_read and ->splice_write callback to support this operation.

The good news is the VFS provided generic_file_splice_read() and
iter_file_splice_write() callbacks can be used provided the ->iter_read
and ->iter_write callback support pipes.  However, this is currently
not the case and only iovecs and bvecs (not pipes) are ever attached
to the uio structure.

This commit changes that by allowing full iov_iter structures to be
attached to uios.  Ever since the 4.9 kernel the iov_iter structure
has supported iovecs, kvecs, bvevs, and pipes so it's desirable to
pass the entire thing when possible.  In conjunction with this the
uio helper functions (i.e uiomove(), uiocopy(), etc) have been
updated to understand the new UIO_ITER type.

Note that using the kernel provided uio_iter interfaces allowed the
existing Linux specific uio handling code to be simplified.  When
there's no longer a need to support kernel's older than 4.9, then
it will be possible to remove the iovec and bvec members from the
uio structure and always use a uio_iter.  Until then we need to
maintain all of the existing types for older kernels.

Some additional refactoring and cleanup was included in this change:

- Added checks to configure to detect available iov_iter interfaces.
  Some are available all the way back to the 3.10 kernel and are used
  when available.  In particular, uio_prefaultpages() now always uses
  iov_iter_fault_in_readable() which is available for all supported
  kernels.

- The unused UIO_USERISPACE type has been removed.  It is no longer
  needed now that the uio_seg enum is platform specific.

- Moved zfs_uio.c from the zcommon.ko module to the Linux specific
  platform code for the zfs.ko module.  This gets it out of libzfs
  where it was never needed and keeps this Linux specific code out
  of the common sources.

- Removed unnecessary O_APPEND handling from zfs_iter_write(), this
  is redundant and O_APPEND is already handled in zfs_write();

Reviewed-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11351
2020-12-18 08:48:26 -08:00
Brian Behlendorf 2844ad60d4 ZTS: Simplify zpool_initialize_verify_initialized
Consider the test to be a success as long as the initializing pattern
is found at least once per metaslab.  This indicates that at least
part of the free space was initialized.  Ideally we'd check that the
pattern was written to all free space but that's much trickier so this
check is a reasonable compromise.

Using a here-string to feed the loop in this test causes an empty
string to still trigger the loop so we miss the `spacemaps=0` case.
Pipe into the loop instead.

While here, we can use `zpool wait -t initialize $TESTPOOL` to wait for
the pool to initialize.

Co-authored-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11365
2020-12-18 08:42:59 -08:00
Matthew Ahrens 71e4ce0e52 special device removal space accounting fixes
The space in special devices is not included in spa_dspace (or
dsl_pool_adjustedsize(), or the zfs `available` property).  Therefore
there is always at least as much free space in the normal class, as
there is allocated in the special class(es).  And therefore, there is
always enough free space to remove a special device.

However, the checks for free space when removing special devices did not
take this into account.  This commit corrects that.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Don Brady <don.brady@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #11329
2020-12-17 12:11:56 -08:00
Ryan Moeller 1531506d23 Avoid extra work updating ARC kstats and tunables
After e357046 it should not be necessary to periodically update ARC
kstats and tunables.  Tunable updates are applied when modified, and
kstats are updated on demand.

Update kstats in `arc_evict_cb_check()` for `ZFS_DEBUG` builds only.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11237
2020-12-17 11:16:42 -08:00
sterlingjensen fb188409f1 Use the correct return type for getopt
Use the correct return type for getopt otherwise clang complains
about tautological-constant-out-of-range-compare.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Sterling Jensen <sterlingjensen@users.noreply.github.com>
Closes #11359
2020-12-17 10:19:30 -08:00
Matthew Ahrens be5c6d9653 Only examine best metaslabs on each vdev
On a system with very high fragmentation, we may need to do lots of gang
allocations (e.g. most indirect block allocations (~50KB) may need to
gang). Before failing a "normal" allocation and resorting to ganging, we
try every metaslab.  This has the impact of loading every metaslab (not
a huge deal since we now typically keep all metaslabs loaded), and also
iterating over every metaslab for every failing allocation. If there are
many metaslabs (more than the typical ~200, e.g. due to vdev expansion
or very large vdevs), the CPU cost of this iteration can be very
impactful.  This iteration is done with the mg_lock held, creating long
hold times and high lock contention for concurrent allocations,
ultimately causing long txg sync times and poor application performance.

To address this, this commit changes the behavior of "normal" (not
try_hard, not ZIL) allocations.  These will now only examine the 100
best metaslabs (as determined by their ms_weight).  If none of these
have a large enough free segment, then the allocation will fail and
we'll fall back on ganging.

To accomplish this, we will now (normally) gang before doing a
`try_hard` allocation.  Non-try_hard allocations will only examine the
100 best metaslabs of each vdev.  In summary, we will first try normal
allocation.  If that fails then we will do a gang allocation.  If that
fails then we will do a "try hard" gang allocation.  If that fails then
we will have a multi-layer gang block.

Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #11327
2020-12-16 14:40:05 -08:00
Alexander Motin f8020c9363 Make metaslab class rotor and aliquot per-allocator.
Metaslab rotor and aliquot are used to distribute workload between
vdevs while keeping some locality for logically adjacent blocks.  Once
multiple allocators were introduced to separate allocation of different
objects it does not make much sense for different allocators to write
into different metaslabs of the same metaslab group (vdev) same time,
competing for its resources.  This change makes each allocator choose
metaslab group independently, colliding with others only sporadically.

Test including simultaneous write into 4 files with recordsize of 4KB
on a striped pool of 30 disks on a system with 40 logical cores show
reduction of vdev queue lock contention from 54 to 27% due to better
load distribution.  Unfortunately it won't help much ZVOLs yet since
only one dataset/ZVOL is synced at a time, and so for the most part
only one allocator is used, but it may improve later.

While there, to reduce the number of pointer dereferences change
per-allocator storage for metaslab classes and groups from several
separate malloc()'s to variable length arrays at the ends of the
original class and group structures.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Closes #11288
2020-12-15 10:55:44 -08:00
gregory-lee-bartholomew e2d952cda0 DKMS: Disable weak modules
Fedora does not guarantee a stable kABI, so weak modules should be dis-
abled. See the dkms man page for a more detailed explanation of the weak
module feature.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Gregory Bartholomew <gregory.lee.bartholomew@gmail.com>
Closes #9891
Closes #11128
Closes #11242
Closes #11335
2020-12-15 09:22:30 -08:00
Ryan Libby d8a09b3a04 lua: avoid gcc -Wreturn-local-addr bug
Avoid a bug with gcc's -Wreturn-local-addr warning with some
obfuscation.  In buggy versions of gcc, if a return value is an
expression that involves the address of a local variable, and even if
that address is legally converted to a non-pointer type, a warning may
be emitted and the value of the address may be replaced with zero.
Howerver, buggy versions don't emit the warning or replace the value
when simply returning a local variable of non-pointer type.

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90737

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Libby <rlibby@FreeBSD.org>
Closes #11337
2020-12-15 09:20:48 -08:00
Ryan Libby 956f94010f spa: avoid type narrowing warning
Building the spa module for i386 caused gcc to emit
-Wint-to-pointer-cast "cast to pointer from integer of different size"
because spa.spa_did was uint64_t but pthread_join (via thread_join in
spa_deactivate) takes a pointer (32-bit on i386).  Define spa_did to be
pointer-size instead.  For now spa_did is in fact never non-zero and the
thread_join could instead be ifdef'd out, but changing the size of
spa_did may be more useful for the future.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Libby <rlibby@FreeBSD.org>
Closes #11336
2020-12-15 09:20:06 -08:00
Ryan Libby c7500ded3e FreeBSD libzfs: gcc requires __thread after static
Building libzfs with gcc on FreeBSD failed because gcc is picky about
the order of keywords in declarations with __thread, whereas clang is
more relaxed.

https://gcc.gnu.org/onlinedocs/gcc/Thread-Local.html

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ryan Libby <rlibby@FreeBSD.org>
Closes #11331
2020-12-14 09:28:24 -08:00
Matthew Macy 923d730329 dmu_zfetch: fix memory leak
The last change caused the read completion callback to not be called
if the IO was still in progress. This change restores allocation
of the arc buf callback, but in the callback path checks the new
acb_nobuf field to know to skip buffer allocation.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #11324
2020-12-12 16:00:00 -08:00
George Amanakis c76a40bfda Fix reporting of CKSUM errors in indirect vdevs
When removing and subsequently reattaching a vdev, CKSUM errors may
occur as vdev_indirect_read_all() reads from all children of a mirror
in case of a resilver.

Fix this by checking whether a child is missing the data and setting a
flag (ic_error) which is then checked in vdev_indirect_repair() and
suppresses incrementing the checksum counter.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #11277
2020-12-11 12:15:37 -08:00
Brian Behlendorf 1ad07b01bc Remove draid.d symlink from zfs_helpers.sh
In an earlier revision of dRAID there existed an /etc/zfs/draid.d
directory.  This was removed before the final version was integrated
but a little bit was accidentally overlooked in the zfs_helpers.sh
script.  Remove this remnant. 

Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11326
2020-12-11 11:00:58 -08:00
Ryan Moeller 695ac5850b arc_summary3: Handle overflowing value width
Some tunables shown by arc_summary3 have string values that may exceed
the normal line length, leaving a negative offset between the name and
value fields.  The negative space is of course not valid and Python
rightly barfs up an exception traceback.

Handle an overflowing value field width by ignoring the line length
and separating the name from the value by a single space instead.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11270
2020-12-11 10:29:53 -08:00
Ryan Moeller 439dc034e9 FreeBSD: Implement sysctl for fletcher4 impl
There is a tunable to select the fletcher 4 checksum implementation on
Linux but it was not present in FreeBSD.

Implement the sysctl handler for FreeBSD and use ZFS_MODULE_PARAM_CALL
to provide the tunable on both platforms.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11270
2020-12-11 10:29:01 -08:00
Matthew Ahrens ba67d82142 Improve zfs receive performance with lightweight write
The performance of `zfs receive` can be bottlenecked on the CPU consumed
by the `receive_writer` thread, especially when receiving streams with
small compressed block sizes.  Much of the CPU is spent creating and
destroying dbuf's and arc buf's, one for each `WRITE` record in the send
stream.

This commit introduces the concept of "lightweight writes", which allows
`zfs receive` to write to the DMU by providing an ABD, and instantiating
only a new type of `dbuf_dirty_record_t`.  The dbuf and arc buf for this
"dirty leaf block" are not instantiated.

Because there is no dbuf with the dirty data, this mechanism doesn't
support reading from "lightweight-dirty" blocks (they would see the
on-disk state rather than the dirty data).  Since the dedup-receive code
has been removed, `zfs receive` is write-only, so this works fine.

Because there are no arc bufs for the received data, the received data
is no longer cached in the ARC.

Testing a receive of a stream with average compressed block size of 4KB,
this commit improves performance by 50%, while also reducing CPU usage
by 50% of a CPU.  On a per-block basis, CPU consumed by receive_writer()
and dbuf_evict() is now 1/7th (14%) of what it was.

Baseline: 450MB/s, CPU in receive_writer() 40% + dbuf_evict() 35%
New: 670MB/s, CPU in receive_writer() 17% + dbuf_evict() 0%

The code is also restructured in a few ways:

Added a `dr_dnode` field to the dbuf_dirty_record_t.  This simplifies
some existing code that no longer needs `DB_DNODE_ENTER()` and related
routines.  The new field is needed by the lightweight-type dirty record.

To ensure that the `dr_dnode` field remains valid until the dirty record
is freed, we have to ensure that the `dnode_move()` doesn't relocate the
dnode_t.  To do this we keep a hold on the dnode until it's zio's have
completed.  This is already done by the user-accounting code
(`userquota_updates_task()`), this commit extends that so that it always
keeps the dnode hold until zio completion (see `dnode_rele_task()`).

`dn_dirty_txg` was previously zeroed when the dnode was synced.  This
was not necessary, since its meaning can be "when was this dnode last
dirtied".  This change simplifies the new `dnode_rele_task()` code.

Removed some dead code related to `DRR_WRITE_BYREF` (dedup receive).

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: George Wilson <gwilson@delphix.com>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #11105
2020-12-11 10:26:02 -08:00
Paul Dagnelie 7d4b365ce3 Fix kernel panic induced by redacted send
In the redaction list traversal code, there is a bug in the binary search
logic when looking for the resume point. Maxbufid can be decremented to -1,
causing us to read the last possible block of the object instead of the one we
wanted. This can cause incorrect resume behavior, or possibly even a hang in
some cases. In addition, when examining non-last blocks, we can treat the
block as being the same size as the last block, causing us to miss entries in
the redaction list when determining where to resume. Finally, we were ignoring
the case where the resume point was found in the buffer being searched, and
resuming from minbufid. All these issues have been corrected, and the code has
been significantly simplified to make future issues less likely.

Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #11297
2020-12-11 10:22:29 -08:00
Ryan Moeller 8c5606ca0b FreeBSD: Fix format of vfs.zfs.arc_no_grow_shift
vfs.zfs.arc_no_grow_shift has an invalid type (15) and this causes
py-sysctl to format it as a bytearray when it should be an integer.

"U" is not a valid format, it should be "I" and the type should match
the variable type, int.  We can return EINVAL if the value is set below
zero.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11318
2020-12-10 15:28:56 -08:00
Ryan Moeller 513c196200 FreeBSD: Update usage of py-sysctl
py-sysctl now includes the CTLTYPE_NODE type nodes in the list returned
by sysctl.filter() on FreeBSD head.  It also provides descriptions now.

Eliminate the subprocess call to get descriptions, and filter out the
nodes so we only deal with values.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11318
2020-12-10 15:28:31 -08:00
Brian Behlendorf e5f732edbb Fix possibly uninitialized 'root_inode' variable warning
Resolve an uninitialized variable warning when compiling.

    In function ‘zfs_domount’:
    warning: ‘root_inode’ may be used uninitialized in this
        function [-Wmaybe-uninitialized]
    sb->s_root = d_make_root(root_inode);

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11306
2020-12-10 15:23:26 -08:00
Paul Dagnelie 60a4c7d2a2 Implement memory and CPU hotplug
ZFS currently doesn't react to hotplugging cpu or memory into the 
system in any way. This patch changes that by adding logic to the ARC 
that allows the system to take advantage of new memory that is added 
for caching purposes. It also adds logic to the taskq infrastructure 
to support dynamically expanding the number of threads allocated to a 
taskq.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Matthew Ahrens <matthew.ahrens@delphix.com>
Co-authored-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #11212
2020-12-10 14:09:23 -08:00
Brian Behlendorf f483daa870 CI: add zloop workflow
Run ztest via zloop for 20 minutes, total run time is ~30 minutes.

Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #11319
2020-12-10 10:55:53 -08:00
George Melikov 5053dfa08c CI: add zloop workflow
Run ztest via zloop for 20 minutes, total run time is ~30 minutes.

Signed-off-by: George Melikov <mail@gmelikov.ru>
2020-12-10 20:46:15 +03:00
Ryan Moeller e0716250bf FreeBSD: Do zcommon_init sooner to avoid FPU panic
There has been a panic affecting some system configurations where the
thread FPU context is disturbed during the fletcher 4 benchmarks,
leading to a panic at boot.

module_init() registers zcommon_init to run in the last subsystem
(SI_SUB_LAST).  Running it as soon as interrupts have been configured
(SI_SUB_INT_CONFIG_HOOKS) makes sure we have finished the benchmarks
before we start doing other things.

While it's not clear *how* the FPU context was being disturbed, this
does seem to avoid it.

Add a module_init_early() macro to run zcommon_init() at this earlier
point on FreeBSD.  On Linux this is defined as module_init().

Authored by: Konstantin Belousov <kib@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11302
2020-12-09 21:29:00 -08:00
Attila Fülöp b9916b4064 ZTS: three small follow up fixes for #11167
Follow up fix for 0cb40fa3. Remove unused variables, don't source
unused libs and add missed cleanup.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Attila Fülöp <attila@fueloep.org>
Closes #11311
2020-12-09 21:27:12 -08:00
Érico Nogueira Rolim 957f9681eb mount_zfs: print strerror instead of errno for error reporting
Tracking down an error message with the errno value can be difficult,
using strerror makes the error message clearer.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Érico Rolim <erico.erc@gmail.com>
Closes #11303
2020-12-09 21:24:59 -08:00
sterlingjensen 1e4667af32 Drop path prefix workaround
Canonicalization, the source of the trouble, was disabled in 9000a9f.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Sterling Jensen <sterlingjensen@users.noreply.github.com>
Closes #11295
2020-12-09 21:24:26 -08:00
Orivej Desh ab4fb9b74e Delete rw_semaphore.wait_lock configure check
Last use of wait_lock was removed in "Linux 5.3 compat: retire
rw_tryupgrade()" (e7a99dab2b).

Fixes the issue reported in
https://github.com/openzfs/zfs/issues/11097#issuecomment-714532367

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Orivej Desh <orivej@gmx.fr>
Closes #11309
2020-12-09 21:22:54 -08:00
Matthew Macy 1e4732cbda Decouple arc_read_done callback from arc buf instantiation
Add ARC_FLAG_NO_BUF to indicate that a buffer need not be
instantiated.  This fixes a ~20% performance regression on
cached reads due to zfetch changes.

Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #11220 
Closes #11232
2020-12-09 15:05:06 -08:00
Brian Behlendorf edb20ff3ba Fix optional "force" arg handing in zfs_ioc_pool_sync()
The fnvlist_lookup_boolean_value() function should not be used
to check the force argument since it's optional.  It may not be
provided or may have been created with the wrong flags.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11281
Closes #11284
2020-12-09 14:52:45 -08:00
George Melikov 1a735e763a CI: add new zfs-tests-sanity workflow
Run zfs-tests with sanity.run for brief results.  Timeouts
are rare, so minimize false positives by increasing the
default from 60 to 180 seconds.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #11304
2020-12-08 09:53:45 -08:00
George Melikov 8e8fdce682 ZTS: zpool_trim tests throttle trim process
Otherwise trim may finish before progress checks.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #11296
2020-12-07 10:06:10 -08:00
Brian Behlendorf 83b698dc42 Reduce fletcher4 and raidz benchmark times
During module load time all of the available fetcher4 and raidz
implementations are benchmarked for a fixed amount of time to
determine the fastest available.  Manual testing has shown that this
time can be significantly reduced with negligible effect on the final
results.

This commit changes the benchmark time to 1ms which can reduce the
module load time by over a second on x86_64.  On an x86_64 system
with sse3, ssse3, and avx2 instructions the benchmark times are:

    Fletcher4    603ms   -> 15ms
    RAIDZ        1,322ms -> 64ms

Reviewed-by: Matthew Macy <mmacy@freebsd.org>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11282
2020-12-06 09:57:20 -08:00
Alexander Motin 8136b9d73b Avoid some spa_has_pending_synctask() calls.
Since 8c4fb36a24 (PR #7795) spa_has_pending_synctask() started to
take two more locks per write inside txg_all_lists_empty().  I am
surprised those pool-wide locks are not contended, but still their
operations are visible in CPU profiles under contended vdev lock.

This commit slightly changes vdev_queue_max_async_writes() flow to
not call the function if we are going to return max_active any way
due to high amount of dirty data.  It allows to save some CPU time
exactly when the pool is busy.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-By: Tom Caputi <caputit1@tcnj.edu>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Closes #11280
2020-12-06 09:55:02 -08:00
Alexander Motin 6366ef2240 Bring consistency to ABD chunk count types.
With both abd_size and abd_nents being uint_t it makes no sense for
abd_chunkcnt_for_bytes() to return size_t.  Random mix of different
types used to count chunks looks bad and makes compiler more difficult
to optimize the code.

In particular on FreeBSD this change allows compiler to completely
optimize out abd_verify_scatter() when built without debug, removing
pointless 64-bit division and even more pointless empty loop.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Closes #11279
2020-12-06 09:53:40 -08:00
Brian Behlendorf eed2bfe06a Enable ABI checks for the checkstyle workflow
Extend the CI checkstyle workflow to perform the library ABI
checks in the master branch.  The intent is not to prevent any
ABI changes but to detect them immediately so when they're
made it's done intentionally.

When the changing the ABI the `make storeabi` target can be
used to generate a new .abi file which can be included with
the commit.  This depends on the libabigail utility which is
available from the majority of distribution package managers.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11287
2020-12-06 09:50:47 -08:00
Brian Behlendorf 0484e8722f ZTS: adjust zpool_import_012_pos timeout
When running in the CI the zpool_import_012_pos test case occasionally
takes longer than the maximum 600 seconds.  When this happens the test
case is considered to have failed but always completes a few minutes
latter.  Since the logs suggest nothing has actually failed this commit
increases timeout and removes the exception.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11286
2020-12-06 09:48:36 -08:00
Brian Behlendorf 81638c999d ZTS: Update zfs_share_concurrent_shares.ksh
Occasionally an out of memory error is hit by this test case
when mounting the filesystems.  Try and reduce the likelihood
of this occurring by reducing the thread count from 100 to 50.
It also has the advantage of slightly speeding up the test.

    cannot mount 'testpool/testfs3/79': Cannot allocate memory
        filesystem successfully created, but not mounted

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11283
2020-12-06 09:47:33 -08:00
George Amanakis d1d47691c2 Fix raw sends on encrypted datasets when copying back snapshots
When sending raw encrypted datasets the user space accounting is present
when it's not expected to be. This leads to the subsequent mount failure
due a checksum error when verifying the local mac.
Fix this by clearing the OBJSET_FLAG_USERACCOUNTING_COMPLETE and reset
the local mac. This allows the user accounting to be correctly updated
on first mount using the normal upgrade process.

Reviewed-By: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-By: Tom Caputi <caputit1@tcnj.edu>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #10523 
Closes #11221
2020-12-04 14:34:29 -08:00
Attila Fülöp 0cb40fa389 zpool: Dryrun fails to list some devices
`zpool create -n` fails to list cache and spare vdevs.
`zpool add -n` fails to list spare devices.
`zpool split -n` fails to list `special` and `dedup` labels.
`zpool add -n` and `zpool split -n` shouldn't list hole devices.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Attila Fülöp <attila@fueloep.org>
Closes #11122
Closes #11167
2020-12-04 14:04:39 -08:00
Ryan Moeller 4b6e2a5a33 Add -u option to 'zfs create'
Add -u option to 'zfs create' that prevents file system from being
automatically mounted. This is similar to the 'zfs receive -u'.

Authored by: pjd <pjd@FreeBSD.org>
FreeBSD-commit: freebsd/freebsd@35c58230e2

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Ported-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11254
2020-12-04 14:01:42 -08:00
Brian Behlendorf 8f158ae6ad Add sanity.run file
This run file contains a subset of functional tests which exercise
as much functionality as possible while still executing relatively
quickly.  The included tests should take no more than a few seconds
each to run at most.  This provides a convenient way to sanity test a
change before committing to a full test run which takes several hours.

    $ ./scripts/zfs-tests.sh -r sanity
    ...
    Results Summary
    PASS	 813

    Running Time:	00:14:42
    Percent passed:	100.0%

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11271
2020-12-03 10:49:39 -08:00
melak 766e06695f Fix trivial typo in zfs-diff.8
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tamas TEVESZ <ice@extreme.hu>
Closes #11268 
Closes #11272
2020-12-03 10:18:26 -08:00
Alexander Motin dcf7044522 Fix for "Reduce latency effects of non-interactive I/O"
It was found that setting min_active tunables for non-interactive I/Os
makes them stuck.  It is caused by zfs_vdev_nia_delay, that can never
be reached if we never issue any I/Os due to min_active set to zero.

Fix this by issuing at least one non-interactive I/O at a time when
there are no interactive I/Os.  When there are interactive I/Os, zero
min_active allows to completely block any non-interactive I/O.  It may
min_active starvation in some scenarios, but who we are to deny foot
shooting?

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Closes #11261
2020-12-03 10:02:39 -08:00
qzdanis 9109b89cd7 Add compatibility for busybox mktemp
Busybox's mktemp requires at least six X's in the template, causing
the current sed --in-place check to fail because the file does not
exist. This change adds additional X's to mktemp templates that do
not already have at least six X's in them.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Quentin Zdanis <zdanisq@gmail.com>
Closes #11269
2020-12-03 10:01:16 -08:00
Ryan Moeller 0aacde2e9a FreeBSD: notify userspace when a vdev is removed
This is needed for zfsd to autoreplace vdevs.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11260
2020-12-02 10:20:02 -08:00
Finix1979 ec50cd24ba Avoid unneccessary zio allocation and wait
In function dmu_buf_hold_array_by_dnode, the usage of zio is only for 
the reading operation. Only create the zio and wait it in the reading 
scenario as a performance optimization.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Finix Yan <yancw@info2soft.com>
Closes #11251 
Closes #11256
2020-12-02 09:28:55 -08:00
Andrew Sun 95a78a035a Make zpool status "remove:" label print in bold
When ZFS_COLOR is set, zpool status shows row headings in bold,
except for the "remove:" heading. This is a quick fix that makes
it print in bold too.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Andrew Sun <me@andrewsun.com>
Closes #11255
2020-12-01 15:22:51 -08:00
George Melikov aa2778d100 CI: simplify checkstyle runner
Remove excess steps.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #11262
2020-12-01 12:15:55 -08:00
Pavel Snajdr 52c8537513 zpool_influxdb: move to libexec dir
Move the zpool_influxdb command to /usr/libexec/zfs,
and include the /usr/libexec/zfs path in the system search
directory when running the test suite.

Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Pavel Snajdr <snajpa@snajpa.net>
Closes #11156
Closes #11160
Closes #11224
2020-11-28 11:15:57 -08:00
Brian Behlendorf b2a54a28b5 Verify zfs module loaded before starting services
Extend the change made in ae12b02 to verify the zfs kernel
modules are loaded to the rest of the OpenZFS services.  If
the modules aren't loaded the neither the share, volume, or
and zed services can be started.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11243
2020-11-28 11:11:18 -08:00
Đoàn Trần Công Danh 16692e6ba0 dracut: use /bin/sh instead of bash as the intepreter
Despite that dracut has a hard dependency on bash,
its modules doesn't, dracut only has a hard dependency on bash for
module-setup (on a fully usable machine). Inside initramfs, dracut
allows users choose from a list of handful other shells, e.g. bash,
busybox, dash, mkfsh.

In fact, my local machine's initramfs is being built with dash,
and it's functional for a very long time.

Before 64025fa3a (Silence 'make checkbashisms', 2020-08-20), we also
allows our users to have that right, too.

Let's fix the problem 'make checkbashisms' reported and allows our users
to have that right, again.

For 'plymouth' case, let's simply run the command inside the if instead
of checking for the existence of command before running it, because the
status is also failture if plymouth is unavailable.

While we're at it, let's remove an unnecessary fork for grep in
zfs-generator.sh.in and its following complicated 'if elif fi' with
a simple 'case ... esac'.

To support this change, also exclude 90zfs from "make checkbashisms"
because the current CI infrastructure ships an old version of
"checkbashisms", which complains about "command -v", while the current
latest "checkbashisms" thinks it's fine. In the near future, we can
revert that change to "Makefile.am" when CI infrastructure is updated.

Reviewed-by: Gabriel A. Devenyi <gdevenyi@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Đoàn Trần Công Danh <congdanhqx@gmail.com>
Closes #11244
2020-11-28 11:02:08 -08:00
Brian Behlendorf 04a82e043d Remove incorrect assertion
Commit 85703f6 added a new ASSERT to zfs_write() as part of the
cleanup which isn't correct in the case where multiple processes
are concurrently extending a file.  The `zp->z_size` is updated
atomically while holding a range lock on only a portion of the
file.  Therefore, it's possible for the file size to increase
after a same check is performed earlier in the loop causing this
ASSERT to fail.  The code itself handles this case correctly so
only the invalid ASSERT needs to be removed.

Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11235
2020-11-24 09:28:42 -08:00
Alexander Motin 6f5aac3ca0 Reduce latency effects of non-interactive I/O
Investigating influence of scrub (especially sequential) on random read
latency I've noticed that on some HDDs single 4KB read may take up to 4
seconds!  Deeper investigation shown that many HDDs heavily prioritize
sequential reads even when those are submitted with queue depth of 1.

This patch addresses the latency from two sides:
 - by using _min_active queue depths for non-interactive requests while
   the interactive request(s) are active and few requests after;
 - by throttling it further if no interactive requests has completed
   while configured amount of non-interactive did.

While there, I've also modified vdev_queue_class_to_issue() to give
more chances to schedule at least _min_active requests to the lowest
priorities.  It should reduce starvation if several non-interactive
processes are running same time with some interactive and I think should
make possible setting of zfs_vdev_max_active to as low as 1.

I've benchmarked this change with 4KB random reads from ZVOL with 16KB
block size on newly written non-fragmented pool.  On fragmented pool I
also saw improvements, but not so dramatic.  Below are log2 histograms
of the random read latency in milliseconds for different devices:

4 2x mirror vdevs of SATA HDD WDC WD20EFRX-68EUZN0 before:
0, 0, 2,  1,  12,  21,  19,  18, 10, 15, 17, 21
after:
0, 0, 0, 24, 101, 195, 419, 250, 47,  4,  0,  0
, that means maximum latency reduction from 2s to 500ms.

4 2x mirror vdevs of SATA HDD WDC WD80EFZX-68UW8N0 before:
0, 0,  2,  31,  38,  28,  18,  12, 17, 20, 24, 10, 3
after:
0, 0, 55, 247, 455, 470, 412, 181, 36,  0,  0,  0, 0
, i.e. from 4s to 250ms.

1 SAS HDD SEAGATE ST14000NM0048 before:
0,  0,  29,   70, 107,   45,  27, 1, 0, 0, 1, 4, 19
after:
1, 29, 681, 1261, 676, 1633,  67, 1, 0, 0, 0, 0,  0
, i.e. from 4s to 125ms.

1 SAS SSD SEAGATE XS3840TE70014 before (microseconds):
0, 0, 0, 0, 0, 0, 0, 0,  70, 18343, 82548, 618
after:
0, 0, 0, 0, 0, 0, 0, 0, 283, 92351, 34844,  90

I've also measured scrub time during the test and on idle pools.  On
idle fragmented pool I've measured scrub getting few percent faster
due to use of QD3 instead of QD2 before.  On idle non-fragmented pool
I've measured no difference.  On busy non-fragmented pool I've measured
scrub time increase about 1.5-1.7x, while IOPS increase reached 5-9x.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #11166
2020-11-24 09:26:42 -08:00
Brian Behlendorf f67bebbc34 Obsolete earlier packages due to version bump
In order for package managers such as dnf to upgrade cleanly after
the package SONAME bump the obsolete package names must be known.
Update the new packages to correctly obsolete the old ones.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11230 
Closes #11233
2020-11-24 09:24:24 -08:00
Matthew Macy cd44f5be37 FreeBSD: decouple ZFS_DEBUG from kernel debug settings
Reviewed-by: Martelli Nikola @martellini
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #11213
2020-11-24 09:16:46 -08:00
Brian Behlendorf 0657326f9c Update dRAID short feature description
The documentation describes dRAID as a distributed spare, not
parity, RAID implementation.  Update the short feature description
to match the rest of the documentation.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11229
2020-11-23 14:49:17 -08:00
Antonio Russo d45267183f libzfsbootenv: do not depend on libnvpair
We do not build libnvpair.pc.  Moreover, it is automatically pulled in
by libzfs.pc, so no additional specific dependency is required.

Reviewed by: Toomas Soome <tsoome@me.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Antonio Russo <aerusso@aerusso.net>
Closes #11227
2020-11-22 15:16:42 -08:00
cragw dc6d39a85e pam_zfs_key: accommodate different dataset naming scheme
Name of dataset for user home directory may vary from the expected
$homes_prefix/$username, if different naming scheme is being used.

We can use property mountpoint to specify the dataset for $username
as long as its value is identical to passwd's pw_dir.

For example:
    NAME                       PROPERTY     VALUE
    rpool/home/myuser_123456   mountpoint   /home/myuser

Reviewed-by: Felix Dörre <felix@dogcraft.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Crag Wang <crag0715@gmail.com>
Closes #11165
2020-11-22 09:32:34 -08:00
Brian Behlendorf f1ece319fd Include the ABI with dist tarball
The ABI should be included when generating the `make dist` tarball
since it's required by the `make checkabi` target.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11225
2020-11-21 10:44:52 -08:00
Brian Behlendorf 4d0ba94113 Correct missing zil_claim() DTL updates
Commit a1d477c2 accidentally disabled DTL updates for the zil_claim()
case described at the end of vdev_stat_update() by unconditionally
disabling all DTL updates when loading.  This was done to avoid
a deadlock on the vd_dtl_lock when loading the DTLs from disk.

    vdev_dtl_contains <--- Takes vd->vd_dtl_lock
    vdev_mirror_child_missing
    vdev_mirror_io_start
    zio_vdev_io_start
    __zio_execute
    arc_read
    dbuf_issue_final_prefetch
    dbuf_prefetch_impl
    dbuf_prefetch
    dmu_prefetch
    space_map_iterate
    space_map_load_length
    space_map_load
    vdev_dtl_load <--- Takes vd->vd_dtl_lock
    vdev_load
    spa_ld_load_vdev_metadata
    spa_tryimport

The missing DTL updates can be restored by moving the space_map_load()
call outside the vd_dtl_lock.  A private range tree is populated by
reading the space map and then merged in to the DTL_MISSING tree
under the lock.

Furthermore, the SPA_LOAD_NONE check in vdev_dtl_contains() leads to an
additional problem.  Any resilvering which occurs before SPA_LOAD_NONE
is set will incorrectly determine that there's nothing to repair.  This
can result in full redundancy not being restored for some blocks.

Reviewed-by: Matt Ahrens <matt@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11218
2020-11-20 13:14:45 -08:00
Antonio Russo 8434017a27 Track SONAME version bump in packaging
RPM and DEB packages are named after the SONAME version of the library
they contain.  After bumping this version, the packaging should be
renamed.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Antonio Russo <aerusso@aerusso.net>
Closes #11219
2020-11-19 16:25:24 -08:00
наб 567e4d0dfa dracut/mount-zfs.sh: quote expansion on zpool test
Bring over some of the improvements from dracut/zfs-load-key.sh,
shellcheck is slightly quieter as well

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11198
2020-11-19 16:20:57 -08:00
наб 2d9f82d891 dracut/zfs-load-key.sh: simplify import loop, quote variable assignments
The loop now has a less confusing condition and properly uses
systemctl(1) is-failed's return code instead of that entire mess

The assignments could turn into "var=val program" if encryptionroot
or keylocation had whitespace in them

As a bonus, this (mostly) silences shellcheck

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11198
2020-11-19 16:20:42 -08:00
Ryan Moeller 85703f616d Reduce confusion in zfs_write
Is this block when abuf != NULL ever reached? Yes, it is.

Add asserts and comments to prove that when we get here, we have a full
block write at an aligned offset extending past EOF.

Simplify by removing the check that tx_bytes == max_blksz, since we can
assert that it is always true.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11191
2020-11-18 15:06:59 -08:00
Matthew Macy 0ca45cb310 Fix problems in zvol_set_volmode_impl
- Don't leave fstrans set when passed a snapshot
- Don't remove minor if volmode already matches new value
- (FreeBSD) Wait for GEOM ops to complete before trying
  remove (at create time GEOM will be "tasting" in parallel)
- (FreeBSD) Don't leak zvol_state_lock on open if zv == NULL
- (FreeBSD) Don't try to unlock zv->zv_state lock if zv == NULL

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #11199
2020-11-17 09:50:52 -08:00
Brian Behlendorf 82611cdfe5 Add ABI snapshot
Add a snapshot of the current ABI using libabigail-1.7-2.  The
included ABI passes `make checkabi` for CentOS 7, Fedora 33,
Debian 10, and Ubuntu 20.04.  This covers a fairly wide range
of glibc, gcc, and libabigail versions plus other changes which
are platform specific.

Reviewed-by: Antonio Russo <aerusso@aerusso.net>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11144
2020-11-17 09:21:39 -08:00
Antonio Russo 14c34c3d49 Library ABI tracking with abigail
Provide two make targets: checkabi and storeabi.

storeabi uses libabigail to generate a reference copy of the ABI for the
public libraries.

checkabi compares such a reference to the compiled version, failing if
they are not compatible.  No ABI is generated for libzpool.so, it is
only used by ztest and zdb and not external consumers.

Co-authored-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Antonio Russo <aerusso@aerusso.net>
Closes #11144
2020-11-17 09:18:52 -08:00
наб e6c59cd171 zpool: correctly align columns with -p
zpool_expand_proplist() now ignores pl_fixed if its new literal
argument is true.  The rest is a consequence of needing to pass
that down.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiao?=~Dska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11202
2020-11-16 09:26:20 -08:00
наб fd654e412e zpool(8): fix pool-wi[sd]e typo
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11202
2020-11-16 09:26:16 -08:00
loli10K 4072f465bc Fix 'zfs userspace' for received datasets in encrypted root
For encrypted receives, where user accounting is initially disabled on
creation, both 'zfs userspace' and 'zfs groupspace' fails with
EOPNOTSUPP: this is because dmu_objset_id_quota_upgrade_cb() forgets to
set OBJSET_FLAG_USERACCOUNTING_COMPLETE on the objset flags after a
successful dmu_objset_space_upgrade().

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #9501 
Closes #9596
2020-11-16 09:10:29 -08:00
George Amanakis 2c210f6818 Fix ASSERT logic in l2arc_evict()
In case of cache device removal it is possible that at the end of
l2arc_evict() we have l2ad_hand = l2ad_evict. This can lead to the
following panic in case of a debug build:

VERIFY3(dev->l2ad_hand < dev->l2ad_evict) failed (321920512 < 321920512)
Call Trace:
 dump_stack+0x66/0x90
 spl_panic+0xef/0x117 [spl]
 l2arc_remove_vdev+0x11d/0x290 [zfs]
 spa_load_l2cache+0x275/0x5b0 [zfs]
 spa_vdev_remove+0x4a5/0x6e0 [zfs]
 zfs_ioc_vdev_remove+0x59/0xa0 [zfs]
 zfsdev_ioctl_common+0x5b3/0x630 [zfs]
 zfsdev_ioctl+0x53/0xe0 [zfs]
 do_vfs_ioctl+0x42e/0x6b0
 ksys_ioctl+0x5e/0x90
 do_syscall_64+0x5b/0x1a0
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

In case of cache device removal it also possible that l2ad_hand +
distance > l2ad_end since we do not iterate l2arc_evict() and l2ad_hand
is not reset. This has no functional consequence however as the cache
device is about to be removed.

Fix this by omitting the ASSERT in case of device removal.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #11205
2020-11-16 09:08:11 -08:00
Érico Rolim 24c12b48a1 config/dracut/90zfs: handle cases where hostid(1) returns all zeros
On systems with musl libc, hostid(1) always prints "00000000", which
will cause improper behavior when the 90zfs module is configured in a
dracut initramfs. Work around this by copying the host /etc/hostid if
the file exists, and otherwise only write /etc/hostid if hostid(1)
returns something meaningful. This avoids zgenhostid creating a random
/etc/hostid for the initramfs, which could lead to errors when trying to
import the pool if spl_hostid isn't defined in the kernel command line.

Furthermore, tag the /etc/hostid file as hostonly, since it is system
specific and shouldn't be taken into account when trying to use an
initramfs generated in one system to boot into a different system.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Georgy Yakovlev <gyakovlev@gentoo.org>
Co-authored-by: Andrew J. Hesford <ajh@sideband.org>
Signed-off-by: Érico Rolim <erico.erc@gmail.com>
Closes #11174
Closes #11189
2020-11-14 17:21:54 -08:00
Érico Rolim 9c4b6dbb31 zgenhostid: accept hostid arguments equal to zero.
A common usage pattern for zgenhostid, including in the ZFS dracut
module, is running it as:

  zgenhostid $(hostid)

However, zgenhostid only accepted hostid arguments greater than 0, which
meant that, when the output of hostid(1) was "00000000", zgenhostid
would error out, even though 0 is a possible return value for the
gethostid(3) function used by hostid(1):

- On current musl libc, gethostid(3) is a stub that always returns 0.
- On glibc, gethostid(3) will return 0 if /etc/hostid exists but is
  smaller than 4 bytes.

In these cases, it makes more sense for zgenhostid to treat a value of 0
as other parts of the zfs codebase do, meaning that a hostid value
couldn't be determined; therefore, it should attempt to generate a
random value to write into /etc/hostid.

The manpage and usage output have been updated to reflect this.

Whitespace has also been fixed in the usage output.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Georgy Yakovlev <gyakovlev@gentoo.org>
Reviewed-by: Andrew J. Hesford <ajh@sideband.org>
Signed-off-by: Érico Rolim <erico.erc@gmail.com>
Closes #11174
Closes #11189
2020-11-14 17:20:54 -08:00
Brian Behlendorf 4352edaafb Linux: Fix ZFS_ENTER/ZFS_EXIT/ZFS_VERFY_ZP usage
The ZFS_ENTER/ZFS_EXIT/ZFS_VERFY_ZP macros should not be used
in the Linux zpl_*.c source files.  They return a positive error
value which is correct for the common code, but not for the Linux
specific kernel code which expects a negative return value.  The
ZPL_ENTER/ZPL_EXIT/ZPL_VERFY_ZP macros should be used instead.

Furthermore, the ZPL_EXIT macro has been updated to not call the
zfs_exit_fs() function.  This prevents a possible deadlock which
can occur when a snapshot is automatically unmounted because the
zpl_show_devname() must never wait on in progress automatic
snapshot unmounts.

Reviewed-by: Adam Moss <c@yotes.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11169 
Closes #11201
2020-11-14 10:19:00 -08:00
Matthew Ahrens d66aab7c08 Assertion failure when logging large output of channel program
The output of ZFS channel programs is logged on-disk in the zpool
history, and printed by `zpool history -i`.  Channel programs can use
10MB of memory by default, and up to 100MB by using the `zfs program -m`
flag.  Therefore their output can be up to some fraction of 100MB.

In addition to being somewhat wasteful of the limited space reserved for
the pool history (which for large pools is 1GB), in extreme cases this
can result in a failure of `ASSERT(length <= DMU_MAX_ACCESS);` in
`dmu_buf_hold_array_by_dnode()`.

This commit limits the output size that will be logged to 1MB.  Larger
outputs will not be logged, instead a entry will be logged indicating
the size of the omitted output.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #11194
2020-11-14 10:17:16 -08:00
Ryan Moeller 7e3617de35 Return EFAULT at the end of zfs_write() when set
FreeBSD's VFS expects EFAULT from zfs_write() if we didn't complete
the full write so it can retry the operation.  Add some missing
SET_ERRORs in zfs_write().

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11193
2020-11-14 10:16:26 -08:00
Brian Behlendorf b2255edcc0 Distributed Spare (dRAID) Feature
This patch adds a new top-level vdev type called dRAID, which stands
for Distributed parity RAID.  This pool configuration allows all dRAID
vdevs to participate when rebuilding to a distributed hot spare device.
This can substantially reduce the total time required to restore full
parity to pool with a failed device.

A dRAID pool can be created using the new top-level `draid` type.
Like `raidz`, the desired redundancy is specified after the type:
`draid[1,2,3]`.  No additional information is required to create the
pool and reasonable default values will be chosen based on the number
of child vdevs in the dRAID vdev.

    zpool create <pool> draid[1,2,3] <vdevs...>

Unlike raidz, additional optional dRAID configuration values can be
provided as part of the draid type as colon separated values. This
allows administrators to fully specify a layout for either performance
or capacity reasons.  The supported options include:

    zpool create <pool> \
        draid[<parity>][:<data>d][:<children>c][:<spares>s] \
        <vdevs...>

    - draid[parity]       - Parity level (default 1)
    - draid[:<data>d]     - Data devices per group (default 8)
    - draid[:<children>c] - Expected number of child vdevs
    - draid[:<spares>s]   - Distributed hot spares (default 0)

Abbreviated example `zpool status` output for a 68 disk dRAID pool
with two distributed spares using special allocation classes.

```
  pool: tank
 state: ONLINE
config:

    NAME                  STATE     READ WRITE CKSUM
    slag7                 ONLINE       0     0     0
      draid2:8d:68c:2s-0  ONLINE       0     0     0
        L0                ONLINE       0     0     0
        L1                ONLINE       0     0     0
        ...
        U25               ONLINE       0     0     0
        U26               ONLINE       0     0     0
        spare-53          ONLINE       0     0     0
          U27             ONLINE       0     0     0
          draid2-0-0      ONLINE       0     0     0
        U28               ONLINE       0     0     0
        U29               ONLINE       0     0     0
        ...
        U42               ONLINE       0     0     0
        U43               ONLINE       0     0     0
    special
      mirror-1            ONLINE       0     0     0
        L5                ONLINE       0     0     0
        U5                ONLINE       0     0     0
      mirror-2            ONLINE       0     0     0
        L6                ONLINE       0     0     0
        U6                ONLINE       0     0     0
    spares
      draid2-0-0          INUSE     currently in use
      draid2-0-1          AVAIL
```

When adding test coverage for the new dRAID vdev type the following
options were added to the ztest command.  These options are leverages
by zloop.sh to test a wide range of dRAID configurations.

    -K draid|raidz|random - kind of RAID to test
    -D <value>            - dRAID data drives per group
    -S <value>            - dRAID distributed hot spares
    -R <value>            - RAID parity (raidz or dRAID)

The zpool_create, zpool_import, redundancy, replacement and fault
test groups have all been updated provide test coverage for the
dRAID feature.

Co-authored-by: Isaac Huang <he.huang@intel.com>
Co-authored-by: Mark Maybee <mmaybee@cray.com>
Co-authored-by: Don Brady <don.brady@delphix.com>
Co-authored-by: Matthew Ahrens <mahrens@delphix.com>
Co-authored-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Mark Maybee <mmaybee@cray.com>
Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #10102
2020-11-13 13:51:51 -08:00
Matthew Ahrens a724db0374 Channel program may spuriously fail with "memory limit exhausted"
ZFS channel programs (invoked by `zfs program`) are executed in a LUA
sandbox with a limit on the amount of memory they can consume.  The
limit is 10MB by default, and can be raised to 100MB with the `-m` flag.
If the memory limit is exceeded, the LUA program exits and the command
fails with a message like `Channel program execution failed: Memory
limit exhausted.`

The LUA sandbox allocates memory with `vmem_alloc(KM_NOSLEEP)`, which
will fail if the requested memory is not immediately available.  In this
case, the program fails with the same message, `Memory limit exhausted`.
However, in this case the specified memory limit has not been reached,
and the memory may only be temporarily unavailable.

This commit changes the LUA memory allocator `zcp_lua_alloc()` to use
`vmem_alloc(KM_SLEEP)`, so that we won't spuriously fail when memory is
temporarily low.  Instead, we rely on the system to be able to free up
memory (e.g. by evicting from the ARC), and we assume that even at the
highest memory limit of 100MB, the channel program will not truly
exhaust the system's memory.

External-issue: DLPX-71924
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #11190
2020-11-11 17:16:15 -08:00
Brian Behlendorf c08d442e45 Linux: Fix mount/unmount when dataset name has a space
The custom zpl_show_devname() helper should translate spaces in
to the octal escape sequence \040.  The getmntent(2) function
is aware of this convention and properly translates the escape
character back to a space when reading the fsname.

Without this change the `zfs mount` and `zfs unmount` commands
incorrectly detect when a dataset with a name containing spaces
is mounted.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11182 
Closes #11187
2020-11-11 17:14:24 -08:00
Mateusz Guzik 18ca574f0a G/C data_alloc_arena
It is a leftover from illumos always set to NULL and introducing a
spurious difference between zio_buf and zio_data_buf.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #11188
2020-11-11 17:11:32 -08:00
Tony Perkins 9bd14b8724 Start snapdir_iterate traversals to begin wtih the value of zero.
The microzap hash can sometimes be zero for single digit snapnames.
The zap cursor can then have a serialized value of two (for . and ..),
and skip the first entry in the avl tree for the .zfs/snapshot directory
listing, and therefore does not return all snapshots.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Cedric Berger <cedric@precidata.com>
Signed-off-by: Tony Perkins <tperkins@datto.com>
Closes #11039
2020-11-11 17:06:16 -08:00
Adrian Chadd 3fcd737478 Fix compiling on FreeBSD + gcc - don't assume illmnos bits
This looks like it was once from the illumnos compat code.
FreeBSD doesn't have cmn_err as a compiler format attribute, so
it definitely errors out.

It doesn't show up on LLVM because it doesn't trigger at all.

Add in the format flags but keep them behind #if 0 for now;
there are too many format issues that trigger when one does
format checking in the shared code.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: adrian chadd <adrian@freebsd.org>
Closes #11068
Closes #11069
2020-11-10 15:54:12 -08:00
Adrian Chadd 79a357c2a1 Fix pointer-is-uint64_t-sized assumption in the ioctl path
This shows up when compiling freebsd-head on amd64 using gcc-6.4.
The lib32 compat build ends up tripping over this assumption.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: adrian chadd <adrian@freebsd.org>
Closes #11068
Closes #11069
2020-11-10 15:53:13 -08:00
sterlingjensen a4ae4998cb Fix memleak in cmd/mount_zfs.c
Convert dynamic allocation to static buffer, simplify parse_dataset
function return path. Add tests specific to the mount helper.

Reviewed-by: Mateusz Guzik <mjguzik@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Sterling Jensen <sterlingjensen@users.noreply.github.com>
Closes #11098
2020-11-10 15:50:44 -08:00
наб b60ae3a5dc zpoolprops.8: clarify vdev expansion rules
Remove reference to EFI(?), explain that the new space
is beyond the GPT for whole-disk vdevs, and add section noting how it
behaves with partition vdevs in terms of how the user is most likely to
encounter it ‒ the previous phrasing was confusing
and seemed to indicate that "zpool online -e" will be able to claim

  GPT[whatever, ZFS, free space, whatever]

into

  GPT[whatever, ZFS, whatever]
but that's not the case, as it'll only be able to do so after manually
resizing the ZFS partition to include the free space beforehand, i.e.:
  GPT[whatever, ZFS, free space, whatever]
  GPT[whatever, [ZFS + free space], potentially left-overs, whatever]
  # zpool online -e
  GPT[whatever, ZFS, whatever]

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11158
2020-11-10 12:48:26 -08:00
Mateusz Guzik 1a0b4f566c G/C struct znode -> z_moved
The field is yet another leftover from unsupported zfs_znode_move.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #11186
2020-11-10 12:42:47 -08:00
Pavel Zakharov 0f66201dbc initramfs: zfsunlock hook breaks /usr/bin
The copy_exec() function expects that the full path of the target 
file is passed rather than just the directory, and will take care 
of creating the underlying directories if they don't exist.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Closes #11162
2020-11-10 11:12:07 -08:00
Ryan Moeller 7b42f09049 FreeBSD: Simplify zvol_geom_open and zvol_cdev_open
We can consolidate the unlocking procedure into one place by starting
with drop_suspend set to B_FALSE and moving the open count check up.

While here, a little code cleanup. Match the out labels between
zvol_geom_open and zvol_cdev_open, and add a missing period in some
comments.

Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11175
2020-11-10 11:08:10 -08:00
Ryan Moeller 2186ed33f1 FreeBSD: Avoid spurious EINTR in zvol_cdev_open
zvol_first_open can fail with EINTR if spa_namespace_lock is not held
and cannot be taken without waiting.

Apply the same logic that was done for zvol_geom_open to take
spa_namespace_lock if not already held on first open in zvol_cdev_open.

Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11175
2020-11-10 11:07:25 -08:00
Ryan Moeller d1dd72a2c5 Simplify offset and length limit in zfs_write
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11176
2020-11-10 10:58:59 -08:00
Ryan Moeller 9a764716fc Const some unchanging variables in zfs_write
Show that these values will not be changing later.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11176
2020-11-10 10:58:59 -08:00
Ryan Moeller 2074dfd0e9 FreeBSD: Move uio_prefaultpages def to uio.h
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11176
2020-11-10 10:58:59 -08:00
Ryan Moeller 8a9634e2f3 Remove redundant oid parameter to update_pages
The oid comes from the znode we are already passing.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11176
2020-11-10 10:54:30 -08:00
Ryan Moeller eec6646ea9 Factor uid, gid, and projid out of loop in zfs_write
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11176
2020-11-10 10:53:19 -08:00
Alexander Motin daabddaac1 Fix dmu_tx_dirty_throttle after arc_c reduction
After initial arc_c was reduced to arc_c_min it became possible that
on datasets with primarycache=metadata or none dirty data make up most
of ARC capacity and easily more than configured 50% of initial arc_c,
that causes forced txg commits by arc_tempreserve_space() and periodic
very long write delays.

This patch makes arc_tempreserve_space() to use arc_c only after ARC
warmed up once and arc_c really means something, but use arc_c_max
before that.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #11178
2020-11-10 10:39:26 -08:00
Matthew Macy 570d7038d0 Fix dnode refcount tracking
Fix a couple of places where the wrong tag is passed
to dnode_{hold, rele}

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #11184
2020-11-10 10:37:10 -08:00
Ryan Moeller 52e585a822 ZTS: Add L1 corruption test
Add a new test case which corrupts all level 1 block in a file.
Then verifies that corruption is detected and repaired.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11141
2020-11-05 17:16:25 -08:00
Ryan Moeller cce66dfa8e ZTS: Output all block copies in list_file_blocks
The second part of list_file_blocks transforms the object description
output by zdb -ddddd $ds $objnum into a stream of lines of the form
"level path offset length" for the indirect blocks in the given file.
The current code only works for the first copy of L0 blocks.  L1 and
L2 indirect blocks have more than one copy on disk.

Add one more -d to the zdb command so we get all block copies and
rewrite the transformation to match more than L0 and output all DVAs.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11141
2020-11-05 17:16:21 -08:00
Ryan Moeller 94deb47872 ZTS: Fix list_file_blocks for mirror vdevs, level > 0
The first part of list_file_blocks transforms the pool configuration
output by zdb -C $pool into shell code to set up a shell variable,
VDEV_MAP, that maps from vdev id to the underlying vdev path. This
variable is a simple indexed array. However, the vdev id in a DVA is
only the id of the top level vdev.

When the pool is mirrored, the top level vdev is a mirror and its
children are the mirrored devices. So, what we need is to map from
the top level vdev id to a list of the underlying vdev paths.
ist_file_blocks does not need to work for raidz vdevs, so we can
disregard that case.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11141
2020-11-05 17:16:16 -08:00
Mariusz Zaborski ae37ceadaa FreeBSD: Prevent a NULL reference in zvol_cdev_open
Check if the ZVOL has been written before calling zil_async_to_sync.
The ZIL will be opened on the first write, not earlier.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mariusz Zaborski <oshogbo@vexillium.org>
Closes #11152
2020-11-05 17:02:19 -08:00
khng300 a4246bce50 FreeBSD: Prevent NULL pointer dereference of resid
spa_config_load() passes NULL into resid when doing zfs_file_read().
This would trip over when vfs.zfs.autoimport_disable=0.

Sponsored by: The FreeBSD Foundation
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Ka Ho Ng <khng@freebsdfoundation.org>
Closes #11149
2020-11-04 16:50:08 -08:00
Antonio Russo 71ae6a9d23 Synchronize library ABI levels
Bump library SOVERSION under Linux to match FreeBSD's.

Additionally, this bump properly accounts for the ABI changes relative
to ZoL 0.8.5 for the Linux build.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Antonio Russo <aerusso@aerusso.net>
Issue #11144
2020-11-03 09:24:43 -08:00
Ryan Moeller 181b2adc2a FreeBSD: zvol_os: Use SET_ERROR more judiciously
SET_ERROR is useful to trace errors, so use it where the errors occur
rather than factored out to the end of a function.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11146
2020-11-03 09:21:09 -08:00
Brian Behlendorf 2d7843401a ZTS: zdb_block_size_histogram increase variance
The expected variance for this test case was originally set at 10%
based on local testing.  Additional testing via the CI has show it
can be as large as 11%.  Increase the expected maximum to 12% to
prevent this test from incorrectly failing.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11148
2020-11-03 09:20:34 -08:00
Brian Behlendorf 123d2803bf ZTS: Wait on all events in events_001_pos.ksh
The events_001_pos.ksh test case can fail because it's possible,
and correct, for the config_sync event to be posted after the last
"expected" event.  To accommodate this the run_and_verify() function
has been updated to wait for all non-history events, not just the
last event.  This does not increase the run time of the test as
long as all the events do get generated.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11147
2020-11-03 09:19:45 -08:00
Coleman Kane 59b6872327 Linux 5.10 compat: revalidate_disk_size() added
A new function was added named revalidate_disk_size() and the old
revalidate_disk() appears to have been deprecated. As the only ZFS
code that calls this function is zvol_update_volsize, swapping the
old function call out for the new one should be all that is required.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #11085
2020-11-02 22:01:19 +00:00
Coleman Kane ae15f1c1d8 Linux 5.10 compat: check_disk_change() removed
Kernel 5.10 removed check_disk_change() in favor of callers using
the faster bdev_check_media_change() instead, and explicitly forcing
bdev revalidation when they desire that behavior. To preserve prior
behavior, I have wrapped this into a zfs_check_media_change() macro
that calls an inline function for the new API that mimics the old
behavior when check_disk_change() doesn't exist, and just calls
check_disk_change() if it exists.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #11085
2020-11-02 22:01:19 +00:00
Coleman Kane 838a249012 Linux 5.10 compat: percpu_ref added data member
Kernel commit 2b0d3d3e4fcfb brought in some changes to the struct
percpu_ref structure that moves most of its fields into a member
struct named "data" of type struct percpu_ref_data. This includes
the "count" member which is updated by vdev_blkg_tryget(), so update
this function to chase the API change, and detect it via configure.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #11085
2020-11-02 22:01:19 +00:00
Brian Behlendorf 8c7d604c62 Linux 5.10 compat: frame.h renamed objtool.h
In Linux 5.10 the linux/frame.h header was renamed linux/objtool.h.
Add a configure check to detect and use the correctly named header.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11085
2020-11-02 22:01:10 +00:00
Sebastian Gottschall 7eefaf0ca0 Optimize locking checks in mempool allocator
Avoid checking the whole array of objects each time by removing the self
organized memory reaping. this can be managed by the global memory reap
callback which is called every 60 seconds. this will reduce the use if
locking operations significant.

Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Reviewed-by: Mateusz Guzik <mjguzik@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Sebastian Gottschall <s.gottschall@dd-wrt.com>
Closes #11126
2020-11-02 12:10:07 -08:00
Christian Schwarz ab8c935ea6 zfs_vnops: make zfs_get_data OS-independent
Move zfs_get_data() in to platform-independent code. The only
platform-specific aspect of it is the way we release an inode 
(Linux) / vnode_t (FreeBSD). I am not aware of a platform that
could be supported by ZFS that couldn't implement zfs_rele_async 
itself. It's sibling zvol_get_data already is platform-independent.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Christian Schwarz <me@cschwarz.com>
Closes #10979
2020-11-02 12:07:07 -08:00
Mateusz Guzik 09eb36ce3d Introduce CPU_SEQID_UNSTABLE
Current CPU_SEQID users don't care about possibly changing CPU ID, but
enclose it within kpreempt disable/enable in order to fend off warnings
from Linux's CONFIG_DEBUG_PREEMPT.

There is no need to do it. The expected way to get CPU ID while allowing
for migration is to use raw_smp_processor_id.

In order to make this future-proof this patch keeps CPU_SEQID as is and
introduces CPU_SEQID_UNSTABLE instead, to make it clear that consumers
explicitly want this behavior.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #11142
2020-11-02 11:51:12 -08:00
Matthew Macy 8583540c6e Consolidate zfs_holey and zfs_access
The zfs_holey() and zfs_access() functions can be made common
to both FreeBSD and Linux.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #11125
2020-10-31 09:40:08 -07:00
Ryan Moeller 2f94e8f09e Remove duplicate cond_resched() definition
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11131
2020-10-31 09:37:56 -07:00
Ryan Moeller 65a343bbd3 zvol_os: Fix handling of zvol private data
zvol private data is supposed to be nulled by zvol_clear_private before
zvol_free is called as an indicator that the zvol is going away.

Implement zvol_clear_private for volmode=dev.

Assert that zvol_clear_private has been called before zvol_free.

Check that zvol_clear_private has not been called when updating
volsize.  If it has, fail with ENXIO.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11117
2020-10-30 15:34:49 -07:00
Ryan Moeller 277884ab42 zvol_os: Don't leak doi in cdev error path
Make sure to free doi in zvol_create_minor impl when make_dev_s fails.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11117
2020-10-30 15:34:43 -07:00
Ryan Moeller 9a0ef216e5 zvol_os: Properly ignore error in volmode lookup
We fall back to a default volmode and continue when looking up a zvol's
volmode property fails.  After this we should set the error to 0 to
ensure we take the success paths in the out section.

While here, make sure we only log that the zvol was created on success.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11117
2020-10-30 15:34:36 -07:00
Ryan Moeller 1a6a75ac07 zvol_os: Code cleanup in zvol_create_minor_impl
Nonfunctional changes for readability and consistency.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11117
2020-10-30 15:34:30 -07:00
Ryan Moeller 260f6a28af zvol_os: Keep better track of open count in close
zvol_geom_close gets a count of the number of close operations to do.

Make sure we're always using this count to check if this will be the
last close operation performed on the zvol.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11117
2020-10-30 15:34:23 -07:00
Ryan Moeller 0b32d81783 zvol_os: Tidy up asserts
Using more specific assert variants gives better messages on failure.

No functional change.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11117
2020-10-30 15:34:15 -07:00
Mateusz Guzik c4ede65bdf zstd: track allocator statistics
Note that this only tracks sizes as requested by the caller.
Actual allocated space will almost always be bigger (e.g., rounded up to
the next power of 2 or page size). Additionally the allocated buffer may
be holding other areas hostage. Nonetheless, this is a starting point
for tracking memory usage in zstd.

Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #11129
2020-10-30 15:26:10 -07:00
Attila Fülöp e8beeaa111 ICP: gcm: Allocate hash subkey table separately
While evaluating other assembler implementations it turns out that
the precomputed hash subkey tables vary in size, from 8*16 bytes
(avx2/avx512) up to 48*16 bytes (avx512-vaes), depending on the
implementation.

To be able to handle the size differences later, allocate
`gcm_Htable` dynamically rather then having a fixed size array, and
adapt consumers.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Attila Fülöp <attila@fueloep.org>
Closes #11102
2020-10-30 15:24:21 -07:00
Attila Fülöp d9655c5b37 Add some missing cfi frame info in aesni-gcm-x86_64.S
While preparing #9749 some .cfi_{start,end}proc directives
were missed. Add the missing ones.

See upstream https://github.com/openssl/openssl/commit/275a048f

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Attila Fülöp <attila@fueloep.org>
Closes #11101
2020-10-30 15:23:18 -07:00
Mateusz Guzik 115216cc92 FreeBSD: catch up with 1300124 version bump
- use cache_vop_mkdir
- cache_rename -> cache_vop_rename

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #11136
2020-10-30 15:22:04 -07:00
Ryan Moeller d1e4ded7bc FreeBSD: Fix 12.2-STABLE after AT_BENEATH MFC
AT_BENEATH was merged to stable/12, where kern_unlinkat takes a 
non-const path.  DECONST the path passed to kern_unlinkat in the 
case where AT_BENEATH is defined.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11139
2020-10-30 15:19:02 -07:00
Matthew Macy 5fa356ea44 Remove UIO_ZEROCOPY functions structures
The original xuio zero copy functionality has always been unused 
on Linux and FreeBSD.  Remove this disabled code to avoid any
confusion and improve readability.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #11124
2020-10-30 10:00:33 -07:00
Alexander Motin 1199c3e8fb Yield periodically when rebuilding L2ARC
L2ARC devices of several terabytes filled with 4KB blocks may take 15
minutes to rebuild.  Due to the way L2ARC log reading is implemented
it is quite likely that for all that time rebuild thread will never
sleep.  At least on FreeBSD kernel threads have absolute priority and
can not be preempted by threads with lower priorities.  If some thread
is also bound to that specific CPU it may not get any CPU time for all
the 15 minutes.

Reviewed-by: Cedric Berger <cedric@precidata.com>
Reviewed-by: Ryan Moeller <freqlabs@FreeBSD.org>
Reviewed-by: George Amanakis <gamanakis@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Closes #11116
2020-10-30 08:57:54 -07:00
Ryan Moeller 76d04993a6 Update references to nonexistent man pages in code
Refer to the correct section or alternative for FreeBSD and Linux.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11132
2020-10-30 08:55:59 -07:00
Alexander Motin e3a6ac8d06 FreeBSD: Remove BIO_ORDERED flag from BIO_FLUSH
ZFS always waits for the write completion before flushing the cache.
That is why it does not require explicit ordering fences around it,
which are pretty difficult to implement for NVMe, since one has no
internal concept of strict request ordering.

This was already removed from FreeBSD once, but got resurrected
by mistake during OpenZFS merge.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Closes #11130
2020-10-30 08:50:57 -07:00
Tony Hutter f829227f49 ZTS: Fix xattr_004_pos failure, don't use tmpfs
Previously, xattr_004_pos would create files with xattrs on both
tmpfs and ext2, and then copy them to zfs to verify that their
xattrs were preserved.  However tmpfs doesn't support xattrs.

This was never noticed until Fedora 33.  In Fedora 32 and older,
/tmp was on the root partition (like ext4), whereas on Fedora 33
/tmp is actually tmpfs.  That caused this test to fail on Fedora 33.

This fix updates the test to only create the file on ext2, not tmpfs.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #11133
2020-10-30 08:47:42 -07:00
Mateusz Guzik 973ba682f5 Linux: g/c leftover fence in zfs_znode_alloc
The port removed provisions for zfs_znode_move but the cleanup missed
this bit. To quote the original:

[snip]
    list_insert_tail(&zfsvfs->z_all_znodes, zp);
    membar_producer();
    /*
     * Everything else must be valid before assigning z_zfsvfs makes the
     * znode eligible for zfs_znode_move().
     */
    zp->z_zfsvfs = zfsvfs;
[/snip]

In the current code it is immediately followed by unlock which issues
the same fence, thus plays no role in correctness.

Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #11115
2020-10-29 09:54:20 -07:00
Mateusz Guzik 082ff328f2 FreeBSD: g/c unused zfs_znode_move support
The allocator does not provide the functionality to begin with.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #11114
2020-10-29 09:52:50 -07:00
Brian Behlendorf 4ce728d028 Use known license string for zlua
The Linux kernel MODULE_LICENSE macro only recognizes a handful of
license strings and "MIT" is not one of the them.  Update the macro
to use "Dual MIT/GPL" which is recognized and what the kernel expects
MIT licensed modules to use.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11112
Closes #11113
2020-10-27 09:43:36 -07:00
Ryan Moeller 5c810ac499 FreeBSD: Skip RAW kstat sysctls by default
These kstats are often expensive to compute so we want to avoid them
unless specifically requested.

The following kstats are affected by this change:

kstat.zfs.${pool}.multihost
kstat.zfs.${pool}.misc.state
kstat.zfs.${pool}.txgs
kstat.zfs.misc.fletcher_4_bench
kstat.zfs.misc.vdev_raidz_bench
kstat.zfs.misc.dbufs
kstat.zfs.misc.dbgmsg

In FreeBSD 13, sysctl(8) has been updated to still list the
names/description/type of skipped sysctls so they are still
discoverable.

Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Mateusz Guzik <mjguzik@gmail.com>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11099
2020-10-26 14:34:28 -07:00
Mateusz Guzik 01a65c5861 FreeBSD: catch up with 1300123 version bump
- removed thread argument from VOP_INACTIVE
- removed cred argument from VOP_VPTOCNP

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #11104
2020-10-26 14:32:17 -07:00
Cy Schubert 3928ec5339 Restore identification of VDEVs using non-native block size
NAME         STATE     READ WRITE CKSUM
dsk02        ONLINE       0     0     0
  mirror-0   ONLINE       0     0     0
    ada1s4a  ONLINE       0     0     0
    ada2s4a  ONLINE       0     0     0  block size: 512B configured, 4096B native

Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Reviewed-by: Toomas Soome <tsoome@me.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed off by: Cy Schubert <cy@FreeBSD.org>
Closes #11088
2020-10-22 12:15:17 -07:00
xtouqh 1e36af8c7b Properly format NAME subsection of zfs/zpool subcommands
Use proper names (i.e. zfs-allow and zpool-add) in NAME subsections
of zfs/zpool subcommands instead of current "pretty-printed" ones as
makewhatis utilities (or some implementations of it, namely the one
from mandoc suite used in FreeBSD) look not only at the document title
but also in NAME subsection, adding zfs(8)/zpool(8) to search results
which is not correct. (Common sense and other utilities splitting
subcommands in multiple man pages, e.g. git, do the same.)

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: xtouqh <xtouqh@hotmail.com>
Closes #11086
2020-10-22 11:28:10 -07:00
Ryan Moeller eb02a4c6fb Add missing zfs_arc_evict_batch_limit tunable
It's even documented already.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11094
2020-10-22 10:18:26 -07:00
Ryan Moeller 2aaab887bb arcstat: Add -a and -p options from FreeNAS
Added -a option to automatically print all valid statistics.
Added -p option to suppress scaling of printed data.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Authored by: Nick Principe <32284693+powernap@users.noreply.github.com>
Ported-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11090
2020-10-21 14:09:14 -07:00
Matthew Macy e53d678d4a Share zfs_fsync, zfs_read, zfs_write, et al between Linux and FreeBSD
The zfs_fsync, zfs_read, and zfs_write function are almost identical
between Linux and FreeBSD.  With a little refactoring they can be
moved to the common code which is what is done by this commit.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #11078
2020-10-21 14:08:06 -07:00
Adam D. Moss 666aa69f32 Non-l2arc pool reads shouldn't be l2arc misses
The current l2_misses accounting behavior treats all reads to pools 
without a configured l2arc as an l2arc miss, IFF there is at least 
one other pool on the system which does have an l2arc configured.

This makes it extremely hard to tune for an improved l2arc hit/miss 
ratio because this ratio will be modulated by reads from pools which 
do not (and should not) have l2arc devices; its upper limit will 
depend on the ratio of reads from l2arc'd pools and non-l2arc'd pools.

This PR prevents ARC reads affecting l2arc stats (n.b. l2_misses is 
the only relevant one) where the target spa doesn't have an l2arc.

Includes new test - l2arc_l2miss_pos.ksh

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Amanakis <gamanakis@gmail.com>
Signed-off-by: Adam Moss <c@yotes.com>
Closes #10921
2020-10-20 11:39:52 -07:00
Kyle Evans 241c62bdd7 Makefile.bsd: remove directory that no longer exists
This was removed in a reorganization of directories preparing for the
merge of FreeBSD support, 006e9a4088 by mmacy. While llvm is perfectly
happy with the nonexistent -I directory, the gcc6 and gcc9 we can elect
to use as cross-toolchains both trip over it.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Kyle Evans <kevans@FreeBSD.org>
Closes #11077
2020-10-20 11:34:59 -07:00
Matthew Macy ff2f54246d FreeBSD: delete unreferenced file
zfs_onexit_os.c was not deleted when it was removed from the build

Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #11079
2020-10-20 08:53:16 -07:00
Ryan Moeller 777b8ccc35 Fix commitcheck on FreeBSD
Convert from bash to sh, avoid Perl regexes and \s, prune unused
functions.

Reviewed-by: Mateusz Piotrowski <0mp@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #11070
2020-10-20 08:35:53 -07:00
Don Brady 13d65987a9 zed syslog entries drop important info
ZED will log zevents summaries to the syslog, however the log entries 
tend to drop event details that can be useful for diagnosis. This is 
especially true for ereport events, like io, checksum, and delay.

Update the all-syslog.sh script to log additional event information.

Add an optional config option, ZED_SYSLOG_DISPLAY_GUIDS, to zed.rc
for choosing GUIDs over names for pool and vdev.

Change the default ZED_SYSLOG_SUBCLASS_EXCLUDE to exclude history_event 
events. These events tend to be frequent, convey no meaningful info, 
and are already logged in the zpool history.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Don Brady <don.brady@delphix.com>
Closes #10967
2020-10-19 11:01:00 -07:00
Ryan Moeller ab6a0e236e Ignore zpool_influxdb binary
This was requested but forgotten in #10786.

Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #11071
2020-10-16 13:21:28 -07:00
Mateusz Guzik 41e2b3de13 FreeBSD: add missing fplookup_vexec handler to special vop vectors
Otherwise lookup can fail with EOPNOTSUPP or panic.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #11066
2020-10-15 14:49:06 -07:00
Mateusz Guzik 34cda44af6 FreeBSD: g/c unused vop vector zfsctl_ops_shares_dir
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #11066
2020-10-15 14:48:20 -07:00
Don Brady dff71c7936 Ignore special vdev ashift for spa ashift min/max
The removal of a vdev in the normal class would fail if there was a 
special or deup vdev that had a different ashift than the vdevs in 
the normal class.

Moved the initialization of spa_min_ashift / spa_max_ashift from 
vdev_open so that it occurs after the vdev allocation bias was 
initialized (i.e. after vdev_load).

Caveat -- In order to remove a special/dedup vdev it must have the 
same ashift as the normal pool vdevs.  This could perhaps be lifted 
in the future (i.e. for the case where there is ample space in any 
surviving special class vdevs)

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Don Brady <don.brady@delphix.com>
Closes #9363
Closes #9364
Closes #11053
2020-10-15 14:45:16 -07:00
Christian Schwarz 15a4ca4620 Fix crash caused by invalid snapshot names in redactnvl
This is a follow up fix for commit 0fdd6106bb.  The VERIFY is
only true when we haven't hit an error code path.  See added
test case for a reproducer.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Christian Schwarz <me@cschwarz.com>
Closes #11048
2020-10-14 14:04:19 -07:00
Paul Dagnelie 6a60ef80e2 Fix incorrect deletion order in range_tree_add_impl gap case
After a side-effectful call like add or remove, references to range 
segs stored in btrees can no longer be used safely.  We move the 
remove call to just before the reinsertion call so that the seg 
remains valid for as long as we need it.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #11044 
Closes #11056
2020-10-14 08:59:54 -07:00
Mateusz Guzik 47a7e99939 FreeBSD: fix panic due to tqid overflow
The 32-bit counter eventually wraps to 0 which is a sentinel for invalid
id.

Make it 64-bit on LP64 platforms and 0-check otherwise.

Note: Linux counterpart uses id stored per queue instead of a global.
I did not check going that way is feasible with the goal being the
minimal fix doing the job.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #11059
2020-10-14 08:57:03 -07:00
Ryan Moeller 485b50bb9e Cross-platform acltype
The acltype property is currently hidden on FreeBSD and does not
reflect the NFSv4 style ZFS ACLs used on the platform.  This makes it
difficult to observe that a pool imported from FreeBSD on Linux has a
different type of ACL that is being ignored, and vice versa.

Add an nfsv4 acltype and expose the property on FreeBSD.

Make the default acltype nfsv4 on FreeBSD.

Setting acltype to an unhanded style is treated the same as setting
it to off.  The ACLs will not be removed, but they will be ignored.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10520
2020-10-13 21:25:48 -07:00
Warner Losh b302185a92 FreeBSD: make adjustments for the standalone environment
In FreeBSD, there are three compile environments that are supported:
user land, the kernel and the bootloader / standalone. Adjust the
headers to compile in the standalone environment. Limit kernel-only
items from view when _STANDALONE is defined.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Warner Losh <imp@FreeBSD.org>
Closes #10998
2020-10-13 21:05:49 -07:00
Matthew Macy 57dc5d42b1 dmu_zfetch: don't leak unreferenced stream when zfetch is freed
Currently streams are only freed when:
  - They have no referencing zfetch and and their I/O references
    go to zero.
  - They are more than 2s old and a new I/O request comes in on
    the same zfetch.

This means that we will leak unreferenced streams when their zfetch
structure is freed.

This change checks the reference count on a stream at zfetch free
time. If it is zero we free it immediately. If it has remaining
references we allow the prefetch callback to free it at I/O
completion time.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Adam Moss <c@yotes.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #11052
2020-10-13 21:03:36 -07:00
Warner Losh 6ba2e72b78 aarch64: Use proper guards for NEON instructions
The zstd code assumes that if you are on aarch64, you have NEON
instructions. This is not necessarily true. In a boot loader, where
you might not have the VFP properly initialized, these instructions
may not be available. It's also an error to include arm_neon.h when
the NEON insturctions aren't enabled. Change the guards for using the
NEON instructions from __aarch64__ to __ARM_NEON which is the standard
symbol for knowing if they are available.

__ARM_NEON is the proper symbol, defined in ARM C Language Extensions
Release 2.1 (https://developer.arm.com/documentation/ihi0053/d/). Some
sources suggest __ARM_NEON__, but that's the obsolete spelling from
prior versions of the standard.

Updated based on zstd pull request https://github.com/facebook/zstd/pull/2356

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Warner Losh <imp@bsdimp.com>
Closes #11055
2020-10-13 21:01:40 -07:00
Adam D. Moss 92286311f8 Add zfs.sh module unload error message
If modules fail to unload because of outstanding users, don't 
consider this a success.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Adam Moss <c@yotes.com>
Closes #11042
2020-10-13 16:51:54 -07:00
Christian Schwarz 701f656b97 dmu.h: remove stale declaration dmu_objset_snapshot_tmp
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Adam Moss <c@yotes.com>
Signed-off-by: Christian Schwarz <me@cschwarz.com>
Closes #11047
2020-10-13 16:46:00 -07:00
Mateusz Guzik 2ce14cdf68 FreeBSD: use cache_rename if available
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #11045
2020-10-13 16:41:26 -07:00
Mathieu Velten b3a216fba0 blkg_tryget config test: initialize struct
Missing struct initialization in a config test results in the
interface being incorrectly detected.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Adam Moss <c@yotes.com>
Signed-off-by: Mathieu Velten <matmaul@gmail.com>
Closes #10713 
Closes #11049
2020-10-13 16:36:36 -07:00
Kjeld Schouten-Lebbing 4b59c195ff Increase Supported Linux Kernel to 5.9
This increases the Linux kernel version to 5.9 from 5.8
as most compatibility fixes should already be included.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Pavel Snajdr <snajpa@snajpa.net>
Signed-off-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl>
Closes #11050
2020-10-13 09:51:13 -07:00
Ryan Moeller e191b60ddc FreeBSD: Improve libzfs_error_init messages
It is a common mistake to have failed to autoload the module due to
permission issues when running a ZFS command as a user.  "Operation
not permitted" is an unhelpfully vague error message.

Use a thread-local message buffer to format a nicer error message.
We can infer that loading the kernel module failed if the module is
not loaded.  This can be extended with heuristics for other errors
in the future.

While looking at this stuff, remove an unused thread-local message
buffer found in libspl and remove some inaccurate verbiage from the
comment on libzfs_load_module.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11033
2020-10-13 09:38:40 -07:00
Ryan Moeller 7dfc56d866 Expose zfetch_max_idistance tunable
FreeBSD had this value tunable before the switch to the new OpenZFS.
The tunable name has changed, breaking legacy compat.

Restore legacy compat for this tunable, properly expose the tunable
with the new name on all platforms, and document it in
zfs-module-parameters(5).

While here, clean up the documentation for zfetch_max_distance a bit.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11038
2020-10-13 09:32:34 -07:00
Christian Schwarz 61868bb14d zil_parse: make callback parameters const
Code cleanup, a follow up commit to 4d55ea81.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Co-authored-by: Ryan Moeller <ryan@freqlabs.com>
Signed-off-by: Christian Schwarz <me@cschwarz.com>
Closes #11020
2020-10-09 09:34:54 -07:00
Richard Elling e9527d44e6 Add zpool_influxdb command
A zpool_influxdb command is introduced to ease the collection
of zpool statistics into the InfluxDB time-series database.
Examples are given on how to integrate with the telegraf
statistics aggregator, a companion to influxdb.

Finally, a grafana dashboard template is included to show
how pool latency distributions can be visualized in a
ZFS + telegraf + influxdb  + grafana environment.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Richard Elling <Richard.Elling@RichardElling.com>
Closes #10786
2020-10-09 09:29:21 -07:00
Ryan Moeller b7ab7ae241 Linux: Initialize zp in zfs_setattr_dir
The value of zp is used without having been initialized under some
conditions.  Initialize the pointer to NULL.

Add a regression test case using chown in acl/posix.  However, this is
not enough because the setup sets xattr=sa, which means zfs_setattr_dir
will not be called.  Create a second group of acl tests in acl/posix-sa
duplicating the acl/posix tests with symlinks, and remove xattr=sa from
the original acl/posix tests.  This provides more coverage for the
default xattr=on code.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10043
Closes #11025
2020-10-09 09:27:14 -07:00
Brian Behlendorf d0249a4bd0 Replace ZFS on Linux references with OpenZFS
This change updates the documentation to refer to the project
as OpenZFS instead ZFS on Linux.  Web links have been updated
to refer to https://github.com/openzfs/zfs.  The extraneous
zfsonlinux.org web links in the ZED and SPL sources have been
dropped.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11007
2020-10-08 20:10:13 -07:00
Jacob Adams 07f5d4d663 Fix Linux modules uninstall
A missing semicolon between kmoddir variable declaration and the
uninstall for loop caused modules_uninstall-Linux to fail with:

    Syntax error: "do" unexpected

Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Jacob Adams <jacob@tookmund.com>
Closes #11032
2020-10-08 20:07:10 -07:00
Ryan Moeller 84cfe5d4f8 ZTS: Fix path to /dev/null in nopwrite_recsize
Don't direct stdout and stderr of dd to $TEST_BASE_DIR/null,
direct it to /dev/null.

Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11026
2020-10-08 16:39:23 -07:00
Chuck Tuffli a8fc1b8743 Fix ubsan: shift exponent is too large
When running libzpool with the Undefined Behavior Sanitizer (ubsan)
enabled, a zpool create causes a run-time error:

    module/zfs/vdev_label.c:600:14: runtime error: shift exponent 64 is
    too large for 64-bit type 'long long unsigned int'`

in vdev_config_generate()

Fix is to convert vdev_removal_max_span to its base-2 logarithm, using
highbit64(), and then compare the "shifts".

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Chuck Tuffli <ctuffli@gmail.com>
Closes #9744
Closes #11024
2020-10-08 16:37:27 -07:00
Christian Schwarz 36482bf607 libzfs_sendrecv: zfs_send: remove unused pipefd and tid variables
fixup of 196bee4

On gcc (GCC) 9.2.1 20190827 (Red Hat 9.2.1-1), the code removed
caused `-Wmaybe-uninitialized` errors.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Christian Schwarz <me@cschwarz.com>
Closes #11021
2020-10-08 09:43:51 -07:00
Ryan Moeller 73989f4b9e Make dbufstat work on FreeBSD
With procfs_list kstats implemented for FreeBSD, dbufs are now exposed
as kstat.zfs.misc.dbufs.

On FreeBSD, dbufstats can use the sysctl instead of procfs when no
input file has been given.

Enable the dbufstats tests on FreeBSD.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #11008
2020-10-08 09:40:23 -07:00
Ryan Moeller 82b81a2acd FreeBSD: Sort and dedup includes in kmod_core
Code cleanup. Sort includes, remove duplicates, and drop
some extra blank lines in kmod_core.c.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #11000
2020-10-08 09:37:56 -07:00
George Melikov b0c9239f17 docs: update README's installation link
OpenZFS is a cross-OS project now.

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #11022
2020-10-08 09:33:53 -07:00
George Amanakis a76e4e6761 Make L2ARC tests more robust
Instead of relying on arbitrary timers after pool export/import or cache
device off/online rely on arcstats. This makes the L2ARC tests more
robust. Also cleanup some functions related to persistent L2ARC.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Adam Moss <c@yotes.com>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #10983
2020-10-05 15:29:05 -07:00
Toomas Soome 4e84f67a96 zdb should not output binary data on terminal
The zdb is interpreting byte array as textual string in dump_zap,
but there are also binary arrays and we should not output binary
data on terminal.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Signed-off-by: Toomas Soome <tsoome@me.com>
External-issue: https://www.illumos.org/issues/12012
External-issue: https://www.illumos.org/issues/11713
Closes #11006
2020-10-05 14:05:28 -07:00
Ryan Moeller 79f0935fab FreeBSD: Sort out kernel FPU headers for 12.1-REL
We were missing an include for kernel FPU functions, breaking the build
on FreeBSD 12.1-RELEASE.  This was apparently being pulled in from
elsewhere on stable/12 and head.

Sorted the other includes in these files while here.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11005
2020-10-02 17:48:45 -07:00
Alan Somers a132c2b413 Fix EIO after resuming receive of new dataset over an existing one
When resuming an interrupted ZFS send stream that creates a new dataset
with the same name as an existing dataset, if the existing dataset is
accessed after the failed receive, then after the subsequent successful
receive it will return EIO. This happens because nothing mounts the new
dataset, leaving the old, no longer valid dataset still mounted.

This commit fixes zfs receive to always unmount and remount the
destination, regardless of whether the stream is a new stream or a
resumed stream.

Sponsored by: Axcient
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Alan Somers <asomers@gmail.com>
External-issue: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=249579
Closes #10995
Closes #10999
2020-10-02 17:47:09 -07:00
Ryan Moeller 4d55ea811d Throw const on some strings
In C, const indicates to the reader that mutation will not occur.
It can also serve as a hint about ownership.

Add const in a few places where it makes sense.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #10997
2020-10-02 17:44:10 -07:00
John Poduska 5b525165e9 Mismatched nvlist names in zfs_keys_send_space
This causes "zfs send -vt ..." to fail with:

    cannot resume send: Unknown error 1030

It turns out that some of the name/value pairs in the verification
list for zfs_ioc_send_space(), zfs_keys_send_space, had the wrong
name, so the ioctl got kicked out in zfs_check_input_nvpairs().
Update the names accordingly.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: John Poduska <jpoduska@datto.com>
Closes #10978
2020-10-02 17:40:46 -07:00
Brian Behlendorf 266d1121c3 Fix buggy procfs_list_seq_next warning
The kernel seq_read() helper function expects ->next() to update
the passed position even there are no more entries.  Failure to
do so results in the following warning being logged.

    seq_file: buggy .next function procfs_list_seq_next [spl]
    did not update position index

Functionally there is no issue with the way procfs_list_seq_next()
is implemented and the warning is harmless.  However, we want to
silence this some what scary incorrect warning.  This commit
updates the Linux procfs code to advance the position even for
the last entry.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #10984 
Closes #10996
2020-09-30 13:27:51 -07:00
Ryan Moeller d688beb191 FreeBSD: Fix legacy compat for platform IOCs
The request number is out of bounds of the platform table.

Subtract the starting offset to get the correct subscript.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10994
2020-09-30 13:25:50 -07:00
Matthew Macy 1cb8202b1b Eliminate gratuitous bzeroing in dbuf_stats_hash_table_data
`dbuf_stats_hash_table_data` can take much longer than it needs to
by repeatedly bzeroing its buffer when in fact the buffer only needs
to be NULL terminated.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #10993
2020-09-30 13:24:38 -07:00
Sebastian Gottschall 8a171ccd92 do a cyclic seek for unused memory objects in pool
In non regular use cases allocated memory might stay persistent in memory
pool. This small patch checks every minute if there are old objects which
can be released from memory pool.

Right now with regular use, the pool is checked for old objects on each
allocation attempt from this pool. so basically polling by its use. Now
consider what happens if someone writes a lot of files and stops use of
the volume or even unmounts it. So the code will no longer check if
objects can be released from the pool. Already allocated objects will
still stay in pool cache. this is no big issue for common use. But
someone discovered this issue while doing tests. personally i know this
behavior and I'm aware of it. Its no big issue. just a enhancement

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl>
Signed-off-by: Sebastian Gottschall <s.gottschall@dd-wrt.com>
Closes #10938 
Closes #10969
2020-09-30 13:22:34 -07:00
Ryan Moeller c0bd2e0fe2 Drop references when skipping dmu_send due to EXDEV
When an invalid incremental send is requested where the "to" ds is
before the "from" ds, make sure to drop the reference to the pool
and the dataset before returning the error.

Add an assert on FreeBSD to make sure we don't hold any locks after
returning from an ioctl.

Add some test coverage.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10919
2020-09-30 13:19:49 -07:00
Kjeld Schouten-Lebbing 68cdafdbb8 Add intel_QAT patches
Add community compatibility patches for Intel QAT
Due to incompatibility with higher kernel versions.

Also includes basic instructions.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl>
Closes #10961 
Closes #10962
2020-09-30 13:17:30 -07:00
Brian Behlendorf 5aa3e3d3be Use known license string for zzstd
The Linux kernel MODULE_LICENSE macro only recognizes a handful of
license strings and "BSD" is not one of the them.  Update the macro
to use "Dual BSD/GPL" which is recognized and what the kernel expects
BSD licensed module to use.

Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #10982
Closes #10992
2020-09-28 18:43:27 -07:00
Brian Behlendorf 4c5159cc5c Fix CONFIG_DEBUG_LOCK_ALLOC configure check
This check was accidentally broken when the kABI checks were updated
to run in parallel, commit 608f874.  The check must be for the
config_debug_lock_alloc_license name to determine if the symbol
is license compatible.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #10991
2020-09-28 16:42:54 -07:00
Brian Behlendorf 96951e0327 Fix objtool configure check
The m4 objtool configure check can incorrectly fail because of a
missing header in the test.  This appears to be the result of a
recent kernel change and was observed on the Fedora 5.8.11-200
kernel.

  In file included from /home/fedora/zfs/build/objtool/objtool.c:75:
  ./arch/x86/include/asm/frame.h:100:57: error: 'struct pt_regs'
      declared inside parameter list will not be visible outside
      of this definition or declaration [-Werror]

The consequence of this is that the "stack_frame_non_standard"
check is never run and HAVE_STACK_FRAME_NON_STANDARD is set
incorrectly which results in a build failure.  This change adds
the appropriate header to the "objtool" check so it now behaves
as intended.

Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #10990
2020-09-28 16:40:50 -07:00
grodik 53245fe337 Note that keys must be loaded for 'zpool remove'
The error returned by `zpool remove` when the encryption keys aren't
loaded isn't very helpful.  Furthermore, the man pages make no
mention that the keys need to be loaded. This change doesn't resolve
the error message but it does update the man page to mention this
requirement.

Authored-by: grodik <pat@litke.dev>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #10939
Closes #10948
2020-09-28 16:11:57 -07:00
Kjeld Schouten-Lebbing 595b7e99d3 Document branching structure
This change documents the currently used branching structure.
It has been cut down to not include any controversial changes.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl>
Closes #10976
2020-09-28 13:23:49 -07:00
Matthew Macy af20b97078 zfetch: Don't issue new streams when old have not completed
The current dmu_zfetch code implicitly assumes that I/Os complete
within min_sec_reap seconds. With async dmu and a readonly workload
(and thus no exponential backoff in operations from the "write
throttle") such as L2ARC rebuild it is possible to saturate the drives
with I/O requests. These are then effectively compounded with prefetch
requests.

This change reference counts streams and prevents them from being
recycled after their min_sec_reap timeout if they still have
outstanding I/Os.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #10900
2020-09-27 17:08:38 -07:00
Allan Jude cf2667759f zfs userspace: use zfs_path_to_zhandle so argument can be a path
Change zfs userspace subcommand to use zfs_path_to_zhandle() so that
the provided dataset can be a path (/usr) or a dataset (rpool/usr).

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Closes #8915
2020-09-25 14:37:10 -07:00
Adam D. Moss acfd2d4641 Add DB_RF_NOPREFETCH to dbuf_read()s in dnode.c
Prefetching of dnodes in dbuf_read() can cause significant mutex 
contention for some workloads and isn't very helpful.  This is  
because we already get 32 dnodes for each block read, and when 
iterating over a directory we prefetch the dnodes in the directory.
Disable this prefetching to prevent the lock contention.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Submitted-by: Adam Moss <c@yotes.com>
Submitted-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Adam Moss <c@yotes.com>
Closes #10877 
Closes #10953
2020-09-25 13:49:22 -07:00
Brian Behlendorf 2e407941a2 Fix PREEMPTION=y and BLK_CGROUP=y config on arm64
With PREEMPTION=y and BLK_CGROUP=y preempt_schedule_notrace() is being
used on arm64 which is a GPL-only function and hence the build of the
DKMS kernel module fails.

Fix that by redefining preempt_schedule_notrace() to preempt_schedule()
which should be safe as long as tracing is not used.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Juerg Haefliger <juergh@canonical.com>
Closes #8545 
Closes #9948 
Closes #10416 
Closes #10973
2020-09-25 13:28:35 -07:00
Mateusz Guzik f6bb7c029c FreeBSD: update cache_purgevfs usage after 1300117 version bump
Reviewed-by: Ryan Moeller <freqlabs@FreeBSD.org>
Reviewed-by: Nick Wolff <darkfiberiru@gmail.com>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #10970
2020-09-25 13:23:43 -07:00
Ryan Moeller fa1912e80f FreeBSD: Code cleanup in zio_crypt
Address some unused value and control flow issues flagged by Coverity.

Unreachable code is pruned and unused values are avoided.
Some scattered sections are reordered for coherence.

We can assume kmem_alloc(n, KM_SLEEP) doesn't fail, so there is no need
to check if it returned NULL.  The allocated memory doesn't need to be
zeroed, other than the last iovec (the MAC).

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10884
2020-09-25 13:12:35 -07:00
Ryan Moeller 863e38453e Prune dead branch reported by Coverity
wkey is NULL at every `goto error;`.
dcp is never NULL.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10884
2020-09-25 13:11:53 -07:00
George Wilson bd76bcb36d zpool command complains about /etc/exports.d
If the /etc/exports.d directory does not exist, then we should only
create it when we're performing an action which already requires root
privileges.

This commit moves the directory creation to the enable/disable code
path which ensures that we have the appropriate privileges.

Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Wilson <gwilson@delphix.com>
Closes #10785
Closes #10934
2020-09-25 13:09:40 -07:00
Christian Schwarz a5c77dc4d5 zfs_log_write: simplify data copying code for WR_COPIED records
lr_write_t records that are WR_COPIED have the record data directly
appended to them (see lr_write_t type definition).

The data is copied from the debuf using dmu_read_by_dnode.

This function was called, only for WR_COPIED records, as part of a
short-circuiting if-statement's if-expression.

I found this side-effectful call to dmu_read_by_dnode pretty
hard to spot.
This patch improves readability by moving the call to its own line.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Wilson <gwilson@delphix.com>
Signed-off-by: Christian Schwarz <me@cschwarz.com>
Closes #10956
2020-09-25 13:06:34 -07:00
Matthew Macy 7b8363d7f0 FreeBSD: Add support for procfs_list
The procfs_list interface is required by several kstats. Implement
this functionality for FreeBSD to provide access to these kstats.
                           
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #10890
2020-09-23 16:43:51 -07:00
Matthew Macy 3dad29fb4b FreeBSD: Don't save user FPU context in kernel threads
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Ryan Moeller <freqlabs@FreeBSD.org>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #10899
2020-09-23 11:09:48 -07:00
Kjeld Schouten-Lebbing eb267f08cf Update issue templates, commitcheck and Contributing.md
- Removes OpenZFS ports from commit check
- Removes OpenZFS ports from CONTRIBUTING.md
- Adds mailings lists and IRC to issue template selector
- Remove blank issue option from issue creator

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl>
Closes #10965
2020-09-23 09:53:26 -07:00
Paul Dagnelie 20dfe8cd3b Don't set numobjs to UINT64_MAX or near it
Resolves an issue with `zfs send` streams from 0.8.4 which
prevents them from being received by versions < 0.7.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Paul Zuchowski <pzuchowski@datto.com>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #10911 
Closes #10916
2020-09-22 16:16:07 -07:00
наб e865e7809e contrib/initramfs: fix shellcheck and checkbashisms errors with shebang
Reviewed-by: Gabriel A. Devenyi <gdevenyi@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #10908 
Closes #10917
2020-09-22 16:10:09 -07:00
George Amanakis c6f5e9d92f Restore clearing of L2CACHE flag in arc_read_done()
Commit 45152dc removed clearing of L2CACHE flag in arc_read_done() and
moved related code in l2arc_write_eligible(). After careful code
inspection arc_read_done() is not bypassed in the case of prefetches.
Thus restore the old behavior.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: adam moss <c@yotes.com>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #10951
2020-09-22 16:08:05 -07:00
Mark Johnston 0daa0320e9 Fix a logic bug in the FreeBSD getpages VOP
In commit cd32b4f5b7 ("Fix a deadlock in the FreeBSD getpages VOP") I
introduced a bug while porting the patch originally committed to
FreeBSD: the rangelock pointer may be NULL if the try operation failed,
so we must avoid calling zfs_rangelock_unlock() in that case.

Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Reported-by: Steve Wills <swills@FreeBSD.org>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes #10519 
Closes #10960
2020-09-22 16:05:52 -07:00
Ryan Moeller 5f8a9e6a02 FreeBSD: Reduce stack usage of Lua
Use the same reduced buffer size for lauxlib that is used on Linux.

Fixes panic on HEAD in lua gsub test designed to exhaust stack space.

With this we can remove the special case to reserve more stack space
on FreeBSD.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Kyle Evans <kevans@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10959
2020-09-22 16:03:11 -07:00
Mark Johnston 6bdb09510b Annontate FreeBSD sysctls with CTLFLAG_MPSAFE
Without this, the sysctl system calls will acquire a global lock before
invoking the handler.  This is noticeable in some situations when
running top(1).  The global lock is mostly vestigal but continues to see
some use and so contention is still a problem; until the default sense
of the MPSAFE flag changes, we have to annotate each and every handler.

Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes #10836
2020-09-21 09:49:50 -07:00
Mark Johnston c50f3c902f Fix switch statement indentation in the FreeBSD kstat code
This is in preparation for some functional changes.

Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes #10950
2020-09-21 09:49:12 -07:00
George Amanakis 37b00fb039 Update documentation of l2arc_mfuonly
with regard to evicted_l2_eligibile_mru. Even if l2arc_mfuonly is
enabled, this is not reflected in evicted_l2_eligible_mru as this
information is useful for deciding whether to toggle l2arc_mfuonly
depending on the current workload.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #10945
2020-09-21 09:26:24 -07:00
George Wilson c494aa7f57 vdev_ashift should only be set once
== Motivation and Context

The new vdev ashift optimization prevents the removal of devices when
a zfs configuration is comprised of disks which have different logical
and physical block sizes. This is caused because we set 'spa_min_ashift'
in vdev_open and then later call 'vdev_ashift_optimize'. This would
result in an inconsistency between spa's ashift calculations and that
of the top-level vdev.

In addition, the optimization logical ignores the overridden ashift
value that would be provided by '-o ashift=<val>'.

== Description

This change reworks the vdev ashift optimization so that it's only
set the first time the device is configured. It still allows the
physical and logical ahsift values to be set every time the device
is opened but those values are only consulted on first open.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Cedric Berger <cedric@precidata.com>
Signed-off-by: George Wilson <gwilson@delphix.com>
External-Issue: DLPX-71831
Closes #10932
2020-09-18 12:13:47 -07:00
Allan Jude 908d43d0a9 libzfs: Don't leak buf if nvlist is too large
Resolves FreeBSD Coverity defect:
CID 1432398:  Resource leaks  (RESOURCE_LEAK)

libzfs: don't leak hdl if there is an error reading env var

Resolves FreeBSD Coverity defect:
CID 1432395:  Resource leaks  (RESOURCE_LEAK)

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Allan Jude <allanjude@freebsd.org>
Closes #10882
2020-09-18 10:23:29 -07:00
George Wilson 8e82ffba7b pool may become suspended during device expansion
When expanding a device zfs needs to rescan the partition table to
get the correct size. This can only happen when we're in the kernel
and requires the device to be closed. As part of the rescan, udev is
notified and the device links are removed and recreated. This leave a
window where the vdev code may try to reopen the device before udev
has recreated the link. If that happens, then the pool may end up in
a suspended state.

To correct this, we leverage the BLKPG_RESIZE_PARTITION ioctl which
allows the partition information to be modified even while it's in use.
This ioctl also does not remove the device link associated with the zfs
data partition so it eliminates the race condition that can occur in
the kernel.

Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Wilson <gwilson@delphix.com>
Closes #10897
2020-09-17 20:03:10 -07:00
Matthew Ahrens a57f954226 zdb leak detection fails with in-progress device removal
When a device removal is in progress, there are 2 locations for the data
that's already been moved: the original location, on the device that's
being removed; and the new location, which is pointed to by the indirect
mapping.  When doing leak detection, zdb needs to know about both
locations.  To determine what's already been copied, we load the
spacemaps of the removing vdev, omit the blocks that are yet to be
copied, and then use the vdev's remap op to find the new location.

The problem is with an optimization to the spacemap-loading code in zdb.
When processing the log spacemaps, we ignore entries that are not
relevant because they are past the point that's been copied.  However,
entries which span the point that's been copied (i.e. they are partly
relevant and partly irrelevant) are processed normally.  This can lead
to an illegal spacemap operation, for example if offsets up to 100KB
have been copied, and the spacemap log has the following entries:

	ALLOC 50KB-150KB (partly relevant)
	FREE 50KB-100KB (entirely relevant)
	FREE 100KB-150KB (entirely irrlevant - ignored)
	ALLOC 50KB-150KB (partly relevant)

Because the entirely irrelevant entry was ignored, its space remains in
the spacemap.  When the last entry is processed, we attempt to add it to
the spacemap, but it partially overlaps with the 100-150KB entry that
was left over.

This problem was discovered by ztest/zloop.

One solution would be to also ignore the irrelevant parts of
partially-irrelevant entries (i.e. when processing the ALLOC 50-150, to
only add 50-100 to the spacemap).  However, this commit implements a
simpler solution, which is to remove this optimization entirely.  I.e.
to process the entire spacemap log, without regard for the point that's
been copied.  After reconstructing the entire allocatable range tree,
there's already code to remove the parts that have not yet been copied.

Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
External-issue: DLPX-71820
Closes #10920
2020-09-17 10:55:30 -07:00
Ryan Moeller 3c7566cb0d FreeBSD: Do not copy vp into f_data for DTYPE_VNODE files
https://reviews.freebsd.org/D26346

Do not copy vp into f_data for DTYPE_VNODE files.  The vnode pointer is
already stored in f_vnode.  Use that so f_data can be reused.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10929
2020-09-17 10:54:14 -07:00
John Poduska 5bed68bdc4 Need a long hold in zpl_mount_impl
In zpl_mount_impl, there is:
    dmu_objset_hold	; returns with pool & ds held
    dsl_pool_rele

    sget

    dsl_dataset_rele

As spelled out in the "DSL Pool Configuration Lock" in dsl_pool.c,
this requires a long hold.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Paul Zuchowski <pzuchowski@datto.com>
Signed-off-by: John Poduska <jpoduska@datto.com>
Closes #10936
2020-09-17 10:53:02 -07:00
Toomas Soome 741b20ce0c libzfsbootenv: lzbe_nvlist_set needs to store bootenv version VB_NVLIST
A small bug did slip into initial libzfsbootenv; while storing nvlist
in nvlist, we should make sure the bootenv is using VB_NVLIST format.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Toomas Soome <tsoome@me.com>
Closes #10937
2020-09-17 10:51:09 -07:00
Ryan Moeller 7ead2be3d2 Rename acltype=posixacl to acltype=posix
Prefer acltype=off|posix, retaining the old names as aliases.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10918
2020-09-16 12:26:06 -07:00
Georgy Yakovlev 9cc177baa0 cmd/zgenhostid: replace with simple c implementation
It was discovered that dracut scripts and zgenhostid
always generate little-endian /etc/hostid.

This commit provides simple endianess-aware binary
and updates the scripts to use it.

New features include:
 -f flag to force overwrite.
 -o flag to write to different file (for dracut)
 accepting both 0x01234567 and 01234567 values as input

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Olaf Faaland <faaland1@llnl.gov>
Signed-off-by: Georgy Yakovlev <gyakovlev@gentoo.org>
Closes #10887 
Closes #10925
2020-09-16 12:25:12 -07:00
Pavel Snajdr 9569c31161 Fix stack frame size: dnode_dirty_l1range()
Reviewed-by: Ryan Moeller <freqlabs@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Pavel Snajdr <snajpa@snajpa.net>
Closes #10879
2020-09-15 15:55:55 -07:00
Pavel Snajdr a1c5578ce0 dmu_redact_snap: fix possible memleak
Reviewed-by: Ryan Moeller <freqlabs@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Pavel Snajdr <snajpa@snajpa.net>
Closes #10879
2020-09-15 15:55:45 -07:00
Pavel Snajdr 8c0b16e6e9 Fix stack frame size: dmu_redact_snap()
Reviewed-by: Ryan Moeller <freqlabs@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Pavel Snajdr <snajpa@snajpa.net>
Closes #10879
2020-09-15 15:55:35 -07:00
Pavel Snajdr c95625769d Fix stack frame size: spa_livelist_delete_cb()
Reviewed-by: Ryan Moeller <freqlabs@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Pavel Snajdr <snajpa@snajpa.net>
Closes #10879
2020-09-15 15:55:03 -07:00
наб 5797d7cc4c zpoolprops.8: fix raidz par[i]ty typo
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #10923
2020-09-15 15:43:42 -07:00
Toomas Soome 1db9e6e4e4 zfs label bootenv should store data as nvlist
nvlist does allow us to support different data types and systems.

To encapsulate user data to/from nvlist, the libzfsbootenv library is
provided.

Reviewed-by: Arvind Sankar <nivedita@alum.mit.edu>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Signed-off-by: Toomas Soome <tsoome@me.com>
Closes #10774
2020-09-15 15:42:27 -07:00
Ryan Moeller 37325e4749 Linux: Prevent destruction while showing mount devname
Use ZFS_ENTER and ZFS_EXIT to protect datasets while their mount
devname is being retrieved.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10892
Closes #10927
2020-09-15 15:40:03 -07:00
George Amanakis 085321621e Add L2ARC arcstats for MFU/MRU buffers and buffer content type
Currently the ARC state (MFU/MRU) of cached L2ARC buffer and their
content type is unknown. Knowing this information may prove beneficial
in adjusting the L2ARC caching policy.

This commit adds L2ARC arcstats that display the aligned size
(in bytes) of L2ARC buffers according to their content type
(data/metadata) and according to their ARC state (MRU/MFU or
prefetch). It also expands the existing evict_l2_eligible arcstat to
differentiate between MFU and MRU buffers.

L2ARC caches buffers from the MRU and MFU lists of ARC. Upon caching a
buffer, its ARC state (MRU/MFU) is stored in the L2 header
(b_arcs_state). The l2_m{f,r}u_asize arcstats reflect the aligned size
(in bytes) of L2ARC buffers according to their ARC state (based on
b_arcs_state). We also account for the case where an L2ARC and ARC
cached MRU or MRU_ghost buffer transitions to MFU. The l2_prefetch_asize
reflects the alinged size (in bytes) of L2ARC buffers that were cached
while they had the prefetch flag set in ARC. This is dynamically updated
as the prefetch flag of L2ARC buffers changes.

When buffers are evicted from ARC, if they are determined to be L2ARC
eligible then their logical size is recorded in
evict_l2_eligible_m{r,f}u arcstats according to their ARC state upon
eviction.

Persistent L2ARC:
When committing an L2ARC buffer to a log block (L2ARC metadata) its
b_arcs_state and prefetch flag is also stored. If the buffer changes
its arcstate or prefetch flag this is reflected in the above arcstats.
However, the L2ARC metadata cannot currently be updated to reflect this
change.
Example: L2ARC caches an MRU buffer. L2ARC metadata and arcstats count
this as an MRU buffer. The buffer transitions to MFU. The arcstats are
updated to reflect this. Upon pool re-import or on/offlining the L2ARC
device the arcstats are cleared and the buffer will now be counted as an
MRU buffer, as the L2ARC metadata were not updated.

Bug fix:
- If l2arc_noprefetch is set, arc_read_done clears the L2CACHE flag of
  an ARC buffer. However, prefetches may be issued in a way that
  arc_read_done() is bypassed. Instead, move the related code in
  l2arc_write_eligible() to account for those cases too.

Also add a test and update manpages for l2arc_mfuonly module parameter,
and update the manpages and code comments for l2arc_noprefetch.
Move persist_l2arc tests to l2arc.

Reviewed-by: Ryan Moeller <freqlabs@FreeBSD.org>
Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #10743
2020-09-14 10:10:44 -07:00
Harald van Dijk d0cea309e7 config/zfs-build.m4: never define _initramfs in RPM_DEFINE_UTIL
The zfs-initramfs package has never worked as no RPM-based distribution
uses initramfs-tools, which is listed as a dependency of zfs-initramfs.

This would not ordinarily be a problem, as it is only enabled when
/usr/share/initramfs-tools is present, which should not normally be the
case on RPM-based distributions. However, other packages may install
unused files there even if initramfs-tools is not used, so remove this
auto-detection for the rpm-utils target.

This does not fully remove the logic for the zfs-initramfs package. This
splits it out into a separate rpm-utils-initramfs target so that the
Debian builds can still use it.

Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Harald van Dijk <harald@gigawatt.nl>
Closes #10898
2020-09-12 08:22:07 -07:00
Matthew Ahrens 9b77c57d5a libzutil depends on libnvpair
libzutil depends on libnvpair, but this dependency is undeclared in the
build system.  Therefore it isn't possible to make a new command that
depends on libzutil, but does not (directly) depend on libnvpair.

This commit makes this dependency explicit.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reivewed-by: Ryan Moeller <freqlabs@FreeBSD.org>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #10915
2020-09-12 08:19:48 -07:00
Mateusz Guzik 8e7fe49b25 FreeBSD: convert teardown inactive lock to a read-mostly sleepable lock
The lock is taken all the time and as a regular read-write lock
avoidably serves as a mount point-wide contention point.

This forward ports FreeBSD revision r357322.

To quote aforementioned commit:

Sample result doing an incremental -j 40 build:
before: 173.30s user 458.97s system 2595% cpu 24.358 total
after:  168.58s user 254.92s system 2211% cpu 19.147 total

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Ryan Moeller <freqlabs@FreeBSD.org>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #10896
2020-09-09 10:15:52 -07:00
xdch47 c2c7ca0d6d Force the use of '.' as decimal separator.
This solves issues occurring with a different decimal operator and
keeps the command line interface consistent for all locales .
E.g. `zfs set quota=0.5T`

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Felix Neumärker <xdch47@posteo.de>
Closes #10878
2020-09-09 10:14:04 -07:00
Olaf Faaland a74259cea0 Initialize mmp_last_write when the mmp thread starts
A great deal of time may go by between when mmp_init() is called and
the MMP thread starts, particularly if there are bad devices, because
there is I/O checking configs etc.  If this time is too long,

    (gethrtime() - mmp_last_write) > mmp_fail_ns

at the time the MMP thread starts.  If MMP is configured to suspend
the pool, the pool will be suspended immediately.

This can be seen in issue #10838

The value of mmp_last_write doesn't matter before the mmp thread
starts.  To give the MMP thread time to issue and land MMP writes,
initialize mmp_last_write when the MMP thread starts.

Reviewed-by: Giuseppe Di Natale <guss80@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Closes #10873
2020-09-09 10:12:54 -07:00
Ryan Moeller cba6d86638 FreeBSD: drop dependency on cryptodev module
We only need the kernel interfaces in crypto, not the device node in
cryptodev.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10901
2020-09-09 10:10:32 -07:00
George Amanakis feb3a7eef1 Introduce ZFS module parameter l2arc_mfuonly
In certain workloads it may be beneficial to reduce wear of L2ARC
devices by not caching MRU metadata and data into L2ARC. This commit
introduces a new tunable l2arc_mfuonly for this purpose.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #10710
2020-09-08 11:44:37 -07:00
Ryan Moeller ebc4b52369 Avoid possibility of division by zero
When hz > 1000, msec / (1000 / hz) results in division by zero.

I found somewhere in FreeBSD using howmany(msec * hz, 1000) to convert
ms to ticks, avoiding the potential for a zero in the divisor.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #10894
2020-09-08 11:39:16 -07:00
Toomas Soome 189272f78a dnode_special_open() error: unchecked function return 'zrl_tryenter'
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Toomas Soome <tsoome@me.com>
Closes #10876
2020-09-08 11:36:52 -07:00
Peter Dave Hello 5a877a894e Add a missing option prefix - in zfs-tests.sh usage()
Reviewed-by: Giuseppe Di Natale <guss80@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Peter Dave Hello <hsu@peterdavehello.org>
Closes #10893
2020-09-08 09:04:36 -07:00
Fabio Buso 5266cf4826 Display pbkdf2iters property as plain number
The pbkdf2iters property is an iteration counter
and should be displayed as plain number rather
than in binary unit.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Fabio Buso <buso.fabio@gmail.com>
Closes #10871
2020-09-08 08:49:55 -07:00
alaviss 75bf636cd0 libshare: Add missing headers for nfs.c
On musl libc, zfs failed to compile due to the missing <fcntl.h>
include, which is required for `open()` per POSIX.

This commit add the missing <fcntl.h> include.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Hiếu Lê <leorize+oss@disroot.org>
Closes #10880
2020-09-04 12:03:57 -07:00
Matthew Macy 7432d29760 FreeBSD: reduce priority of ZIO_TASKQ_ISSUE writes by a larger value
On FreeBSD, if priorities divided by four (RQ_PPQ) are equal then
a difference between them is insignificant. In other words,
incrementing pri by only one as on Linux is insufficient.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #10872
2020-09-04 11:13:27 -07:00
Ryan Moeller ef55446a9c Spruce up pkg-config files for libzfs/libzfs_core
Several of the listed library dependencies are not relevant on FreeBSD.
Have ./configure save libraries that are found via pkg-config as
${LIB}_PC and use the configured automake variables instead of hard
coded names so we only get what was actually needed.

While here, update the URL to point at the OpenZFS Github repo.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10869
2020-09-04 11:11:18 -07:00
Ryan Moeller 3ec88ab205 man: Cross-reference zfs-load-key(8) for ENCRYPTION mention
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Submitted-by: Harry Schmalzbauer
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10866
2020-09-04 10:53:59 -07:00
Ryan Moeller 782b1c1249 man: Add zfs rename -r to zfs-rename(8) SYNOPSIS
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10866
2020-09-04 10:53:22 -07:00
Brian Behlendorf dce63135ae Sequential scrub and resilver updated comments
Commit d4a72f2 which introduced multi-phase scrubs and resilvers
continued the work presented by Nexenta at the 2016 ZFS developer
summit.  Update the source to reflect their contribution.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2020-09-04 10:51:51 -07:00
Don Brady 4f07282786 Avoid posting duplicate zpool events
Duplicate io and checksum ereport events can misrepresent that 
things are worse than they seem. Ideally the zpool events and the 
corresponding vdev stat error counts in a zpool status should be 
for unique errors -- not the same error being counted over and over. 
This can be demonstrated in a simple example. With a single bad 
block in a datafile and just 5 reads of the file we end up with a 
degraded vdev, even though there is only one unique error in the pool.

The proposed solution to the above issue, is to eliminate duplicates 
when posting events and when updating vdev error stats. We now save 
recent error events of interest when posting events so that we can 
easily check for duplicates when posting an error. 

Reviewed by: Brad Lewis <brad.lewis@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Don Brady <don.brady@delphix.com>
Closes #10861
2020-09-04 10:34:28 -07:00
Matthew Ahrens 3808032489 nowait synctask must succeed
If a `zfs_space_check_t` other than `ZFS_SPACE_CHECK_NONE` is used with
`dsl_sync_task_nowait()`, the sync task may fail due to ENOSPC.
However, there is no way to notice or communicate this failure, so it's
extremely difficult to use this functionality correctly, and in fact
almost all callers use `ZFS_SPACE_CHECK_NONE`.

This commit removes the `zfs_space_check_t` argument from
`dsl_sync_task_nowait()`, and always uses `ZFS_SPACE_CHECK_NONE`.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #10855
2020-09-04 10:29:39 -07:00
Ryan Moeller cd80273909 Retain thread name when resuming a zthr
When created, a zthr is given a name to identify it by.  This name is
lost when a cancelled zthr is resumed.

Retain the name of a zthr so it can be used when resuming.

Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10881
2020-09-03 20:09:52 -07:00
Alexander Richardson f3064162ba Fixes for running FreeBSD buildworld on Linux/macOS hosts
Adding an #ifdef __FreeBSD__ to a FreeBSD-specific header may seem odd,
but these headers are used on non-FreeBSD systems during the bootstrap
tools phase.
Originally submitted downstream as https://reviews.freebsd.org/D26193

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alex Richardson <Alexander.Richardson@cl.cam.ac.uk>
Closes #10863
2020-09-03 20:06:03 -07:00
Matthew Macy ac6e5fb202 Replace cv_{timed}wait_sig with cv_{timed}wait_idle where appropriate
There are a number of places where cv_?_sig is used simply for
accounting purposes but the surrounding code has no ability to
cope with actually receiving a signal. On FreeBSD it is possible
to send signals to individual kernel threads so this could
enable undesirable behavior.

This patch adds routines on Linux that will do the same idle
accounting as _sig without making the task interruptible. On
FreeBSD cv_*_idle  are all aliases for cv_*

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #10843
2020-09-03 20:04:09 -07:00
Garrett Fields f30703f6fc Remove 'ZFS on Linux' references from PR Template
As mentioned in the #OpenZFS IRC channel (thanks "Toomas Soome"):
The OpenZFS PR Template still mentions "ZFS on Linux".
This changes that reference and updates the URLs.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Garrett Fields <ghfields@gmail.com>
Closes #10868
2020-09-03 16:31:05 -07:00
Spencer Kinny 6fffc88bf3 Links in Source Files
Added comments in following files
with links to Illumos manual pages:

./module/avl/avl.c
./module/nvpair/nvpair.c
./module/os/linux/spl/spl-kstat.c
./module/os/freebsd/spl/spl_kstat.c

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Spencer Kinny <spencerkinny1995@gmail.com>
Closes #5113 
Closes #10859
2020-09-02 09:42:12 -07:00
Toomas Soome 0db1b84a6d zvol: unsigned off can not be less than zero
Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Toomas Soome <tsoome@me.com>
Closes #10867
2020-09-02 09:30:29 -07:00
Alexander Richardson 417e646722 Fix -Werror,-Wmacro-redefined in limits.h
Those macros are also defined by the compiler-provided float.h which
will be included later on (at least in the FreeBSD buildworld case) and
triggers these -Werror warnings. Including <float.h> first and only
defining the macros when DBL_DIG/FLT_DIG is missing fixes this problem.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alex Richardson <Alexander.Richardson@cl.cam.ac.uk>
Closes #10864
2020-09-01 16:22:09 -07:00
Ryan Moeller 964791acdc Make spa_stats.c tunables visible on FreeBSD
Use ZFS_MODULE_PARAM for cross-platform tunables in spa_stats.c, and
add update tunables.cfg in tests for the newly supported ones.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10858
2020-09-01 16:19:19 -07:00
Matthew Macy e84e49218f FreeBSD: Fix up after spa_stats.c move
Moving spa_stats added the additional burden of supporting
KSTAT_TYPE_IO.

spa_state_addr will always return a valid value regardless of
the value of 'n'. On FreeBSD this will cause an infinite loop
as it relies on the raw ops addr routine to indicate that there
is no more data.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <freqlabs@FreeBSD.org>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #10860
2020-09-01 16:16:56 -07:00
Ryan Moeller 7b4e27232d Add 'zfs rename -u' to rename without remounting
Allow to rename file systems without remounting if it is possible.
It is possible for file systems with 'mountpoint' property set to
'legacy' or 'none' - we don't have to change mount directory for them.
Currently such file systems are unmounted on rename and not even
mounted back.

This introduces layering violation, as we need to update
'f_mntfromname' field in statfs structure related to mountpoint (for
the dataset we are renaming and all its children).

In my opinion it is worth it, as it allow to update FreeBSD in even
cleaner way - in ZFS-only configuration root file system is ZFS file
system with 'mountpoint' property set to 'legacy'. If root dataset is
named system/rootfs, we can snapshot it (system/rootfs@upgrade), clone
it (system/oldrootfs), update FreeBSD and if it doesn't boot we can
boot back from system/oldrootfs and rename it back to system/rootfs
while it is mounted as /. Before it was not possible, because
unmounting / was not possible.

Authored by: Pawel Jakub Dawidek <pjd@FreeBSD.org>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Ported by: Matt Macy <mmacy@freebsd.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10839
2020-09-01 16:14:16 -07:00
Ryan Moeller 88d19d7cc2 FreeBSD: Remove unused SECLABEL code
SECLABEL is undefined on FreeBSD and should be pruned.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #10847
2020-08-31 19:52:46 -07:00
Ryan Moeller 46b7d53baf libspl: Provide platform-specific zone implementations
FreeBSD has the concept of jails, a precursor to Solaris's zones, which
can be mapped to the required zones interface with relative ease.  The
previous ZFS implementation in FreeBSD did so, and we should continue
to provide an appropriate implementation in OpenZFS as well.

Move lib/libspl/zone.c into platform code and adopt the correct
implementation for FreeBSD.

While here, prune unused code.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #10851
2020-08-31 19:43:30 -07:00
Ryan Moeller eff621071f FreeBSD: Simplify INGLOBALZONE
FreeBSD's previous ZFS implemented INGLOBALZONE(thread) as
(!jailed((thread)->td_ucred)) and passed curthread to INGLOBALZONE.

We pass curproc instead of curthread, so we can achieve the same effect
with (!jailed((proc)->p_ucred)).  The implementation is trivial enough
to fit on a single line in a define.  We don't really need a whole
separate function for something that's already macros all the way down.

Eliminate in_globalzone.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #10851
2020-08-31 19:43:08 -07:00
Ryan Moeller 2f65c7a608 FreeBSD: Define crgetzoneid appropriately
The previous ZFS implementation on FreeBSD had ifdefs to use jailed()
instead of crgetzoneid() in dsl_dir.c, however we can simply provide an
appropriate definition of crgetzoneid for the same effect.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #10851
2020-08-31 19:42:26 -07:00
Toomas Soome 1144586b57 zio_ereport_post() and zio_ereport_start() return values are ignored
use (void) to silence analyzers.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Toomas Soome <tsoome@me.com>
Closes #10857
2020-08-31 19:35:11 -07:00
Spencer Kinny abe4fbfd01 Typo Correction
Corrected the typo in zfs/cmd/zfs/zfs_main.c
line number 404 pbkfd2iters to pbkdf2iters

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Spencer Kinny <spencerkinny1995@gmail.com>
Closes #10850
2020-08-30 14:14:32 -07:00
Matthew Macy 7bb18b94c7 Move spa_stats.c to common code
Initially it was considered simplest to stub out all
of the functions on FreeBSD. Now that FreeBSD supports
KSTAT_TYPE_RAW at least some of the functionality should
be made available.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #10842
2020-08-30 14:12:46 -07:00
Matthew Macy cd23154a8c FreeBSD: Fix spurious failure in zvol_geom_open
In zvol_geom_open on first open we need to guarantee
that the namespace lock is held to avoid spurious
failures in zvol_first_open.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <freqlabs@FreeBSD.org>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #10841
2020-08-30 14:11:33 -07:00
Kjeld Schouten-Lebbing 0e6a7aaac0 Auto close "Status: Feedback requested" after a month
This commit closes issues labeled with:
"Status: Feedback requested" after 1 month, if the
label is not removed or the author has not responded

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Signed-off-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl>
Closes #10807 
Closes #10808
2020-08-30 14:09:54 -07:00
Matthew Macy d436de6389 FreeBSD: add support for KSTAT_TYPE_RAW
A few kstats use KSTAT_TYPE_RAW to provide a string generated on
demand.  Implementing these as sysctls was punted until now.

Reviewed by: Toomas Soome <tsoome@me.com>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #10836
2020-08-29 20:59:50 -07:00
Brian Behlendorf 3e29e1971b Linux 5.9 compat: NR_SLAB_RECLAIMABLE
Commit dcdc12e added compatibility code to treat NR_SLAB_RECLAIMABLE_B
as if it were the same as NR_SLAB_RECLAIMABLE.  However, the new value
is in bytes while the old value was in pages which means they are not
interchangeable.

The only place the reclaimable slab size is used is as a component of
the calculation done by arc_free_memory().  This function returns the
amount of memory the ARC considers to be free or reclaimable at little
cost.  Rather than switch to a new interface to get this value it has
been removed it from the calculation.  It is normally a minor component
compared to the number of inactive or free pages, and removing it
aligns the behavior with the FreeBSD version of arc_free_memory().

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Coleman Kane <ckane@colemankane.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #10834
2020-08-29 20:57:45 -07:00
Richard Laager 62663fb7ec Fix another dependency loop
zfs-load-key-DATASET.service was gaining an
After=systemd-journald.socket due to its stdout/stderr going to the
journal (which is the default).  systemd-journald.socket has an After
(via RequiresMountsFor=/run/systemd/journal) on -.mount.  If the root
filesystem is encrypted, -.mount gets an After
zfs-load-key-DATASET.service.

By setting stdout and stderr to null on the key load services, we avoid
this loop.

Reviewed-by: Antonio Russo <antonio.e.russo@gmail.com>
Reviewed-by: InsanePrawn <insane.prawny@gmail.com>
Signed-off-by: Richard Laager <rlaager@wiktel.com>
Closes #10356
Closes #10388
2020-08-28 10:17:14 -07:00
Richard Laager ec41cafee1 Fix a dependency loop
When generating units with zfs-mount-generator, if the pool is already
imported, zfs-import.target is not needed.  This avoids a dependency
loop on root-on-ZFS systems:
  systemd-random-seed.service After (via RequiresMountsFor)
  var-lib.mount After
  zfs-import.target After
  zfs-import-{cache,scan}.service After
  cryptsetup.service After
  systemd-random-seed.service

Reviewed-by: Antonio Russo <antonio.e.russo@gmail.com>
Reviewed-by: InsanePrawn <insane.prawny@gmail.com>
Signed-off-by: Richard Laager <rlaager@wiktel.com>
Closes #10388
2020-08-28 10:16:13 -07:00
Georgy Yakovlev 7ddcfe7c00 config/zfs-build.m4: add --with-vendor flag
This will allow an override of auto-detection of distribution, which
is based on checking presence of /etc/*-release files.

Build systems makes a lot of file location assumptions based on
detected distribution.

Some distributions (like gentoo) may prefer explicitly
setting --with-vendor=gentoo to avoid auto-detection.

Since auto-detection checks all files in order, current script may
misdetect even on gentoo system if /etc/redhat-release file is present

Default behavior is unchanged and default is --with-vendor=check

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Georgy Yakovlev <gyakovlev@gentoo.org>
Closes #10835
2020-08-28 09:43:44 -07:00
Alexander Richardson 2b07c5aa3e Fix definition of BLKGETSIZE64 on FreeBSD
The matching ioctl is DIOCGMEDIASIZE.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <freqlabs@FreeBSD.org>
Signed-off-by: Alex Richardson <Alexander.Richardson@cl.cam.ac.uk>
Closes #10818
2020-08-27 16:09:26 -07:00
Georgy Yakovlev 735ba76104 module/zstd: pass -U__BMI__
If kernel is compiled with -march=znver1 or -march=znver2 zstd module 
compilation will fail due to SSE register return with SSE disabled.
What's interesting, is that -march=skylake also implies -mbmi which 
defines __BMI__ but compilation succeeds.  It is probably due to 
different BMI implementations on AMD and INTEL processors and the 
way compiler uses instructions.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Georgy Yakovlev <gyakovlev@gentoo.org>
Closes #10758 
Closes #10829
2020-08-27 15:50:13 -07:00
John-Mark Gurney 770269ef3a Add the Xr's to the SEE ALSO as well
There are a ton of zfs-* and zpool-* man pages. This adds them to 
the SEE ALSO section so that people can more quickly look through 
what all the options are, now that the pages have been split.

Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: John-Mark Gurney <jmg@funkthat.com>
Closes #10589
2020-08-26 22:29:00 -07:00
Patrick Mooney 8d42c98d95 dnode_sync is careless with range tree
Because dnode_sync_free_range() must drop dn_mtx during its processing,
using it as a callback to range_tree_vacate() is not safe.  No other
operations (besides destroy) are allowed once range_tree_vacate() has
begun, and dropping dn_mtx would leave a window open for another thread
to observe that invalid (and unsafe) state via dnode_block_freed().

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Signed-off-by: Patrick Mooney <pmooney@oxide.computer>
Closes #10708 
Closes #10823
2020-08-26 21:48:29 -07:00
Cédric Berger 600def792e Fix NEWS file
Points to https://github.com/openzfs/zfs/releases

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Cédric Berger <cedric@precidata.com>
Closes #10824
2020-08-26 21:44:41 -07:00
Ryan Moeller a2f944a140 zpool: Change base URL for ZFS messages to openzfs-docs
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10820
2020-08-26 21:43:06 -07:00
Brian Behlendorf 03f5d2fd6a Remove duplicate dnode.h include
The zfs/sa.c source file accidentally includes sys/dnode.h twice.
Remove the second occurrence.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #10816 
Closes #10819
2020-08-26 21:41:09 -07:00
Paul Dagnelie 4aa3b3bd47 Always track temporary fses and snapshots for accounting
The root cause of the issue is that we only occasionally do as the 
comments in the code suggest and actually ignore the %recv dataset when 
it comes to filesystem limit tracking. Specifically, the only time we 
ignore it is when initializing the filesystem and snapshot limit values; 
when creating a new %recv dataset or deleting one, we always update 
the bookkeeping. This causes a problem if you init the fs count on a 
filesystem that already has a %recv dataset, since the bookmarking 
will be decremented but not incremented. This is resolved in this 
patch by simply always tracking the %recv dataset as a child.

Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed by: Jerry Jelinek <jerry.jelinek@joyent.com>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #10791
2020-08-26 21:38:27 -07:00
Kjeld Schouten-Lebbing ad52de77c2 Fix broken bug report form
By accident previous PR broke the bug report form.
This commit fixes it
(and is actually tested completely to work)

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl>
Closes #10821
2020-08-26 10:49:51 -07:00
Toomas Soome ec0c480c14 Remove pragma ident lines
The #pragma ident is a historical relic and not needed any more, this
pragma is actually unknown for common compilers and is only causing
trouble.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Signed-off-by: Toomas Soome <tsoome@me.com>
Closes #10810
2020-08-26 10:35:50 -07:00
Matthew Macy 2dbad44710 FreeBSD: disable neon usage
The neon support code does not build on FreeBSD,
ifdef out references to fix linker issues on arm64.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #10809
2020-08-26 09:54:37 -07:00
George Melikov d6f90c78ab Github CI: Enable checkbashism
Run checkbashisms on checkstyle too.

Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #10811
2020-08-26 09:52:28 -07:00
Kjeld Schouten-Lebbing 59ad15d3c5 StaleBot Tweaks
- Add Status: Triage Needed to bug reports

Currently "Type: Defect" is auto added.
Adding a triage tag, makes sure all issues are reviewed by a maintainer
It also opens up some options to priorities defects in the near future.

- Prevent future StaleBot Spam

StaleBot will limit itself to 6 actions per hour
This should prevent future floods of StaleBot activity
(aka Spam)

- StaleBot: Ignore issues that are being worked on

Ignore the following Issues:
- tagged: "Status: Work in Progress"
- Having a maintainer assigned
- Being part of a project
- Having a milestone tag

- Rename Ignore "Type: Understood" to "Bot: Not Stale"

This Commits changes the general ignore tag for StaleBot from:
 "Type: Understood"
to
"Bot: Not Stale"

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl>
Closes #10813
2020-08-26 09:49:58 -07:00
Alexander Motin 523e1295fe Introduce limit on size of L2ARC headers
Since L2ARC buffers are not evicted on memory pressure, too large
amount of headers on system with irrationally large L2ARC can render
it slow or even unusable.  This change limits L2ARC writes and
rebuild if unevictable L2ARC-only headers reach dangerous level.

While there, call arc_adapt() on L2ARC rebuild, so that it could
properly grow arc_c, reflecting potentially significant ARC size
increase and avoiding slow growth with hopeless eviction attempts
later when "overflow" is detected.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reported-by: Richard Elling <Richard.Elling@RichardElling.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Closes #10765
2020-08-25 14:33:36 -07:00
Brian Behlendorf 47a3f3fc01 Tag 2.0.0-rc1
New features:
- Unified code base for Linux and FreeBSD
- Redacted 'zfs send/recv'
- Persistent L2ARC
- Sequential resilvering
- ZSTD Compression
- Log spacemaps
- Fast clone deletion
- Sectional zfs/zpool man pages
- Added 'zpool wait' subcommand
- Improved 'zfs share' scalability
- Improved AES-GCM encryption performance

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2020-08-25 13:11:25 -07:00
Allan Jude 7a6c12fd6a Don't assert on nvlists larger than SPA_MAXBLOCKSIZE
Originally we asserted that all reads are less than SPA_MAXBLOCKSIZE
However, nvlists are not ZFS records, and are not limited to
SPA_MAXBLOCKSIZE.

Add a new environment variable, ZFS_SENDRECV_MAX_NVLIST, to allow the
user to specify the maximum size of the nvlist that can be sent or
received.
Default value: 4 * SPA_MAXBLOCKSIZE (64 MB)

Modify libzfs send routines to return a useful error if the send stream
will generate an nvlist that is beyond the maximum size.

Modify libzfs recv routines to add an explicit error message if the
nvlist is too large, rather than abort()ing.

Move the change the assert() to only trigger on data records

Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Closes #9616
2020-08-25 11:04:20 -07:00
sterlingjensen 87688b686b Mark lua setjmp/longjmp for powerpc weak
Linux already defines setjmp/longjmp for powerpc, which leads to
duplicate symbols in a statically linked build.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Sterlng Jensen <sterlingjensen@users.noreply.github.com>
Closes #10795
2020-08-25 10:32:49 -07:00
Brian Behlendorf 94dac3e880 Export dmu_offset_next() symbol
Export the dmu_offset_next() symbol for use by Lustre.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #10796
2020-08-25 08:34:41 -07:00
Ryan Moeller b596585fd9 man: Canonicalize .TH usage
* Use all caps for document title.
* Remove section name as it can be inferred from the section number.
* Name "OpenZFS" as the document source.
* Bump modification date.

While here, fixed trailing whitespace reported by igor.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10792
2020-08-24 21:25:28 -07:00
youzhongyang b900799768 Fix inability to destroy snapshot used over NFS
The cache of struct svc_export and struct svc_expkey by nfsd and
rpc.mountd for the snapshot holds references to the mount point.
We need to flush them out before unmounting, otherwise umount
would fail with EBUSY.

Reviewed-by: Don Brady <don.brady@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Youzhong Yang <yyang@mathworks.com>
Closes #6000 
Closes #10783
2020-08-24 17:33:02 -07:00
Sebastian Gottschall 184df27eef Avoid symbol collision with in-kernel zstdlib
For Linux, when zfs is compiled as an in kernel static variant
and the in kernel zstd library is compiled statically into the kernel
a symbol collision will occur.  This wrapper header renames all
of the relevant zstd functions to avoid this problem.

Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Sebastian Gottschall <s.gottschall@dd-wrt.com>
Closes #10775
2020-08-24 12:20:41 -07:00
Kjeld Schouten-Lebbing 04c37b6851 Add Stale-bot
This file configures the following stale-bot:
https://github.com/apps/stale

It is set to mark issues as "Stale" after 365 days
It is also set to auto-close the issue 90 days after.

Please be aware that this issue also requires-
The listed stale-bot being added to the repo.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl>
Closes #10778
2020-08-24 12:04:38 -07:00
Chris McDonough c3b03d0701 Appease GCC sprintf warnings found on Fedora 32/GCC 10.0.1
Increase the size of DDT_NAMELEN and MNT_LINE_MAX to appease GCC
snprintf truncation warnings.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chris McDonough <chrism@plope.com>
Closes #10712
Closes #10766
2020-08-24 10:32:59 -07:00
Ryan Moeller b0e75805ee ZTS: Improve block_device_wait on FreeBSD
FreeBSD doesn't have an equivalent to udevadm settle, so we have been
resorting to a three second sleep to wait for device changes to take
effect.  This is far from ideal.

We are mainly waiting for volmode=geom zvols to appear in /dev, so as
a hack, reading the geom config will have the desired effect of
quiescing the geom state.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10768
2020-08-24 08:50:15 -07:00
Chris McDonough 07ce8961e5 Improve documentation of zpool import -d/-c vs -s
Specify that, by default, zpool import uses the libblkid
cache on Linux and geom on FreeBSD, and only scans when
-d/-s is provided.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <freqlabs@FreeBSD.org>
Signed-off-by: Chris McDonough <chrism@plope.com>
Closes #7656
Closes #10771
2020-08-23 21:18:30 -07:00
George Melikov a541f7d485 CI checkstyle: add linter + rename job + install latest flake8
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #10784
2020-08-23 21:15:25 -07:00
Tony Nguyen c686c6fe75 ZFS performance tests should clean up NFS mount
This change umounts client's NFS mount after each test so we can avoid
two sporadic issues:
1) client NFS stale mount and
2) zpool export and zpool destroy failed due to dataset busy

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Nguyen <tony.nguyen@delphix.com>
Closes #10767
2020-08-23 15:14:22 -07:00
Kjeld Schouten-Lebbing 68f2288620 Add seperate issue for questions
A big portion of issues are of "Type: Question".

This PR adds a separate issue template for those.
It also automatically adds the "Type: Question" tag.

in addition it adds "Type: Defect" to all bug reports by default

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl>
Closes #10779
2020-08-23 15:13:07 -07:00
Ryan Moeller 7b15b8d18c libzstd: Don't warn about stack frame size in userspace
With the current way CFLAGS are modified in libzstd, CFLAGS passed on
the make command line will cause the CFLAGS in the Makefile for zstd.c
to be discarded, but not AM_CFLAGS.  This causes a smaller frame size
limit to be used, and the build fails.

We don't need to worry about stack frame sizes in userspace.  Drop the
extra flags.

Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #10773
2020-08-23 11:13:34 -07:00
Andrew a741b386d3 Prevent zfs_acl_chmod() if aclmode restricted and ACL inherited
In absence of inheriting entry for owner@, group@, or everyone@,
zfs_acl_chmod() is called to set these. This can cause confusion for Samba
admins who do not expect these entries to appear on newly created files and
directories once they have been stripped from from the parent directory.

When aclmode is set to "restricted", chmod is prevented on non-trivial ACLs.
It is not a stretch to assume that in this case the administrator does not want
ZFS to add the missing special entries. Add check for this aclmode, and if an
inherited entry is present skip zfs_acl_chmod().

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Andrew Walker <awalker@ixsystems.com>
Closes #10748
2020-08-22 21:49:25 -07:00
Kjeld Schouten-Lebbing ab4a78c744 Update issue template
Github has started using a new issue templating structure.
This commit moves the current template and adds one additional one.

- Moves issue template to new issue-template folder
- Adds feature request template
- removes the following warning when viewing issue template

Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl>
Closes #10759
2020-08-22 21:41:01 -07:00
Ryan Moeller 72eedb69fd ZTS: Remove leftover variable names
These were overlooked when use of `local` was removed to satisfy
checkbashisms.

Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10762
2020-08-22 11:05:59 -07:00
Chris McDonough a5b1b60e9b Remove vestigial settings related to initramfs
Remove ZFS_POOL_IMPORT, ZFS_INITRD_PRE_MOUNTROOT_SLEEP,
ZFS_INITRD_POST_MODPROBE_SLEEP, and ZFS_INITRD_ADDITIONAL_DATASETS
features from etc/defaults/zfs.in.  These features no longer work.

Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Chris McDonough <chrism@plope.com>
Closes #9126
Closes #10757
2020-08-22 11:04:49 -07:00
Clint Armstrong 1ddd7cdb92 Make formatting of dedup values string consistent
All other prop values return options separated by ` | `,
dedup values do not, they are separated by `, `. This change
makes the dedup value formatting consistent with other properties.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Clint Armstrong <clint@clintarmstrong.net>
Closes #10761
2020-08-22 10:58:07 -07:00
Ryan Moeller 6fe3498ca3 Import vdev ashift optimization from FreeBSD
Many modern devices use physical allocation units that are much
larger than the minimum logical allocation size accessible by
external commands. Two prevalent examples of this are 512e disk
drives (512b logical sector, 4K physical sector) and flash devices
(512b logical sector, 4K or larger allocation block size, and 128k
or larger erase block size). Operations that modify less than the
physical sector size result in a costly read-modify-write or garbage
collection sequence on these devices.

Simply exporting the true physical sector of the device to ZFS would
yield optimal performance, but has two serious drawbacks:

 1. Existing pools created with devices that have different logical
    and physical block sizes, but were configured to use the logical
    block size (e.g. because the OS version used for pool construction
    reported the logical block size instead of the physical block
    size) will suddenly find that the vdev allocation size has
    increased. This can be easily tolerated for active members of
    the array, but ZFS would prevent replacement of a vdev with
    another identical device because it now appears that the smaller
    allocation size required by the pool is not supported by the new
    device.

 2. The device's physical block size may be too large to be supported
    by ZFS. The optimal allocation size for the vdev may be quite
    large. For example, a RAID controller may export a vdev that
    requires read-modify-write cycles unless accessed using 64k
    aligned/sized requests. ZFS currently has an 8k minimum block
    size limit.

Reporting both the logical and physical allocation sizes for vdevs
solves these problems. A device may be used so long as the logical
block size is compatible with the configuration. By comparing the
logical and physical block sizes, new configurations can be optimized
and administrators can be notified of any existing pools that are
sub-optimal.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Matthew Macy <mmacy@freebsd.org>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #10619
2020-08-21 12:53:17 -07:00
Ryan Moeller 6706552ea6 Remove hard coded "Linux" OS from manpages
The recommended practice for `.Os` on FreeBSD is to not specify any
arguments.  The correct OS name is used automatically.

Oddly enough, on the Linux distro I tested this on (CentOS 7), the man
pager defaulted to displaying "BSD" as the OS rather than "Linux".  To
accommodate this, tack " Linux" back on in an install hook on Linux.
This is much simpler than removing it for FreeBSD when vendored in the
base system.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10760
2020-08-21 11:55:47 -07:00
Brian Behlendorf 64025fa3a1 Silence 'make checkbashisms'
Commit d2bce6d03 added the 'make checkbashisms' target but did not
resolve all of the bashisms in the scripts.  This commit doesn't
resolve them all either but it does fix up a few, and it excludes
the others so 'make checkstyle' no longer prints warnings.  It's
a small step in the right direction.

* Dracut is Linux specific and itself depends on bash.  Therefore
  all dracut support scripts can be bash specific, update their
  shebang accordingly.

* zed-functions.sh, zfs-import, zfs-mount, zfs-zed, smart
  paxcheck.sh, make_gitrev.sh - these scripts were excuded from
  the check until they can be updated and properly tested.

* zfsunlock - only whole values for sleep are allowed.

* vdev_id - removed unneeded locals; use && instead of -a.

* dkms.mkconf, dkms.postbuil - use || instead of -o.

Reviewed-by: InsanePrawn <insane.prawny@gmail.com>
Reviewed-by:  Gabriel A. Devenyi <gdevenyi@gmail.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #10755
2020-08-20 13:45:47 -07:00
Don Brady 7bba1d404c 'zfs share -a' should clean noauto exports
This is a follow on to PR #10688 where `zfs share -a` allows the 
sharing of canmount=noauto datasets if they are mounted.  However, 
when a dataset with canmount=noauto is not mounted, the command 
should also purge any existing entries from the exports file. 
Otherwise, after a reboot, the nfs server attempts to export the 
underlying mountpath, not the dataset. This can lead to a hard hang 
for existing client mounts.

Instead of just skipping the adding of an export if not mounted 
and canmount=noauto, have it also remove an existing export of the 
dataset so that, after a reboot, we don't export an unmounted dataset.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Wilson <gwilson@delphix.com>
Signed-off-by: Don Brady <don.brady@delphix.com>
Closes #10747
2020-08-20 13:12:12 -07:00
Matthew Ahrens 3dc18995bd Fix indentation in dnode_free_range()
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #10744
2020-08-20 11:45:20 -07:00
Matthew Macy 1c2725a157 FreeBSD: 11.x arc_stats compatibility
Removing other_size from arc_stats breaks top in 11.x jails
running on HEAD.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #10745
2020-08-20 10:55:02 -07:00
Michael Niewöhner 10b3c7f5e4 Add zstd support to zfs
This PR adds two new compression types, based on ZStandard:

- zstd: A basic ZStandard compression algorithm Available compression.
  Levels for zstd are zstd-1 through zstd-19, where the compression
  increases with every level, but speed decreases.

- zstd-fast: A faster version of the ZStandard compression algorithm
  zstd-fast is basically a "negative" level of zstd. The compression
  decreases with every level, but speed increases.

  Available compression levels for zstd-fast:
   - zstd-fast-1 through zstd-fast-10
   - zstd-fast-20 through zstd-fast-100 (in increments of 10)
   - zstd-fast-500 and zstd-fast-1000

For more information check the man page.

Implementation details:

Rather than treat each level of zstd as a different algorithm (as was
done historically with gzip), the block pointer `enum zio_compress`
value is simply zstd for all levels, including zstd-fast, since they all
use the same decompression function.

The compress= property (a 64bit unsigned integer) uses the lower 7 bits
to store the compression algorithm (matching the number of bits used in
a block pointer, as the 8th bit was borrowed for embedded block
pointers).  The upper bits are used to store the compression level.

It is necessary to be able to determine what compression level was used
when later reading a block back, so the concept used in LZ4, where the
first 32bits of the on-disk value are the size of the compressed data
(since the allocation is rounded up to the nearest ashift), was
extended, and we store the version of ZSTD and the level as well as the
compressed size. This value is returned when decompressing a block, so
that if the block needs to be recompressed (L2ARC, nop-write, etc), that
the same parameters will be used to result in the matching checksum.

All of the internal ZFS code ( `arc_buf_hdr_t`, `objset_t`,
`zio_prop_t`, etc.) uses the separated _compress and _complevel
variables.  Only the properties ZAP contains the combined/bit-shifted
value. The combined value is split when the compression_changed_cb()
callback is called, and sets both objset members (os_compress and
os_complevel).

The userspace tools all use the combined/bit-shifted value.

Additional notes:

zdb can now also decode the ZSTD compression header (flag -Z) and
inspect the size, version and compression level saved in that header.
For each record, if it is ZSTD compressed, the parameters of the decoded
compression header get printed.

ZSTD is included with all current tests and new tests are added
as-needed.

Per-dataset feature flags now get activated when the property is set.
If a compression algorithm requires a feature flag, zfs activates the
feature when the property is set, rather than waiting for the first
block to be born.  This is currently only used by zstd but can be
extended as needed.

Portions-Sponsored-By: The FreeBSD Foundation
Co-authored-by: Allan Jude <allanjude@freebsd.org>
Co-authored-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Sebastian Gottschall <s.gottschall@dd-wrt.com>
Co-authored-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl>
Co-authored-by: Michael Niewöhner <foss@mniewoehner.de>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Allan Jude <allanjude@freebsd.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Sebastian Gottschall <s.gottschall@dd-wrt.com>
Signed-off-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl>
Signed-off-by: Michael Niewöhner <foss@mniewoehner.de>
Closes #6247
Closes #9024
Closes #10277
Closes #10278
2020-08-20 10:30:06 -07:00
Michael Niewöhner dc544aba15 Import ZStandard v1.4.5
ZStandard is a modern, high performance, general compression algorithm.
It provides similar or better compression levels to GZIP, but with much
better performance. ZStandard provides a large selection of compression
levels to allow a storage administrator to select the preferred
performance/compression trade-off.

This commit imports the unmodified ZStandard single-file library which
will be used by ZFS.

The implementation of this new library is done with future updates of
zstd in mind. For this reason we integrated the code in a way, that does
not require modifications to the library. For more details, see
`module/zstd/README.md`.

The library is excluded from codecov calculation and cppcheck as
unaltered dependencies do not need full codecov or cppcheck.

Co-authored-by: Allan Jude <allanjude@freebsd.org>
Co-authored-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl>
Co-authored-by: Michael Niewöhner <foss@mniewoehner.de>
Signed-off-by: Allan Jude <allanjude@freebsd.org>
Signed-off-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl>
Signed-off-by: Michael Niewöhner <foss@mniewoehner.de>
2020-08-20 10:30:06 -07:00
Pavel Snajdr 772c69d230 Linux 5.7 compat: Include linux/sched.h in spl/sys/mutex.h
struct task_struct is needed for lockdep_off() in mutex.h

This has popped up after e616cb8daadf (in linux-5.7-rc7).

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Pavel Snajdr <snajpa@snajpa.net>
Closes #10741
2020-08-19 21:37:38 -07:00
Mariusz Zaborski f2c027bd6a FreeBSD: Add option to rewind checkpoint while importing root pool
This option is used by FreeBSD boot loader.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mariusz Zaborski <oshogbo@vexillium.org>
Closes #10738
2020-08-19 17:19:42 -07:00
Brian Behlendorf 5266a0728a ZED: Do not offline a missing device if no spare is available
Due to commit d48091d a removed device is now explicitly offlined by
the ZED if no spare is available, rather than the letting ZFS detect
it as UNAVAIL. This broke auto-replacing of whole-disk devices, as
described in issue #10577.  In short, when a new device is reinserted
in the same slot, the ZED will try to ONLINE it without letting ZFS
recreate the necessary partition table.

This change simply avoids setting the device OFFLINE when removed if
no spare is available (or if spare_on_remove is false).  This change
has been left minimal to allow it to be backported to 0.8.x release.
The auto_offline_001_pos ZTS test has been updated accordingly.

Some follow up work is planned to update the ZED so it transitions
the vdev to a REMOVED state.  This is a state which has always
existed but there is no current interface the ZED can use to
accomplish this.  Therefore it's being left to a follow up PR.

Reviewed-by: Gionatan Danti <g.danti@assyoma.it>
Co-authored-by: Gionatan Danti <g.danti@assyoma.it>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #10577
Closes #10730
2020-08-18 22:13:17 -07:00
Brian Behlendorf cfd59f904b Fix ARC aggsum access after arc_state_fini()
Commit 85ec5cbae updated abd_update_scatter_stats() such that it
calls arc_space_consume() and arc_space_return() when updating the
scatter stats.  This requires that the global aggsum value for the
ARC be initialized.  Normally this is not an issue, however during
module unload the l2arc_do_free_on_write() function was called in
l2arc_cleanup() after arc_state_fini() destroyed the aggsum values.
We can resolve this issue by performing l2arc_do_free_on_write()
slightly earlier in arc_fini().

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #10739
2020-08-18 22:11:34 -07:00
Ryan Moeller 4f7fb135bd libzfs_core: Initialize fail_ioc_cmd to ZFS_IOC_LAST
FreeBSD numbers `ZFS_IOC_*` starting at 0, so pick a different
sentinel value to avoid unintentionally messing with
`ZFS_IOC_POOL_CREATE` ioctls.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #10729
2020-08-18 18:07:43 -07:00
Matthew Macy 716b53d0a1 FreeBSD: Fix UNIX permissions checking
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #10727
2020-08-18 09:57:07 -07:00
Matthew Macy 5e7eaf8fbd Add define to enable autotrim to default to on
In FreeBSD trim has defaulted to on for several
years. In order to minimize POLA violations on
import it's important to maintain this default
when importing vendored openzfs in to FreeBSD
base.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #10719
2020-08-18 09:52:30 -07:00
Ryan Moeller 009cc8e884 Make zc_nvlist_src_size limit tunable
We limit the size of nvlists passed to the kernel so a user cannot make
the kernel do an unreasonably large allocation.  On FreeBSD this limit
was 128 kiB, which turns out to be a bit too small when doing some
operations involving a large number of datasets or snapshots, for
example replication.

Make this limit tunable, with a platform-specific auto default.
Linux keeps its limit at KMALLOC_MAX_SIZE. FreeBSD uses 1/4 of the
system limit on user wired memory, which allows it to scale depending
on system configuration.

Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Issue #6572 
Closes #10706
2020-08-18 09:33:55 -07:00
George Melikov 663a070c92 Remove unused zpool_is_bootable
Otherwise compiler errors with:

```
libzfs_pool.c:449:1: error: 'zpool_is_bootable'
 defined but not used [-Werror=unused-function]
```

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #10734
2020-08-18 09:30:12 -07:00
Richard Laager eaa25f1a8e Remove GRUB restrictions
The GRUB restrictions are based around the pool's bootfs property.
Given the current situation where GRUB is not staying current with
OpenZFS pool features, having either a non-ZFS /boot or a separate
pool with limited features are pretty much the only long-term answers
for GRUB support.  Only the second case matters in this context.  For
the restrictions to be useful, the bootfs property would have to be set
on the boot pool, because that is where we need the restrictions, as
that is the pool that GRUB reads from. The documentation for bootfs
describes it as pointing to the root pool. That's also how it's used in
the initramfs. ZFS does not allow setting bootfs to point to a dataset
in another pool. (If it did, it'd be difficult-to-impossible to enforce
these restrictions cross-pool). Accordingly, bootfs is pretty much
useless for GRUB scenarios moving forward.

Even for users who have only one pool, the existing restrictions for
GRUB are incomplete. They don't prevent you from enabling the
unsupported checksums, for example. For that reason, I have ripped out
all the GRUB restrictions.

A little longer-term, I think extending the proposed features=portable
system to define a features=grub is a much more useful approach. The
user could set that on the boot pool at creation, and things would
Just Work.

Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Laager <rlaager@wiktel.com>
Closes #8627
2020-08-17 23:12:39 -07:00
Brian Behlendorf d60c0dbdf3 ZTS: ztest may cause mmp tests failures
The mmp_exported_import and mmp_inactive_import tests depend on
ztest simulating an active pool.  If ztest unexpectedly terminates
due to an unrelated issue the test case will fail.  Since ztest is
not yet 100% reliable I've added these tests to the maybe exception
list.  They can be removed when the issues with ztest are resolved
or if the test cases are updated to handle these unexpected failures.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #10726
2020-08-17 22:31:18 -07:00
Matthew Ahrens 85ec5cbae2 Include scatter_chunk_waste in arc_size
The ARC caches data in scatter ABD's, which are collections of pages,
which are typically 4K.  Therefore, the space used to cache each block
is rounded up to a multiple of 4K.  The ABD subsystem tracks this wasted
memory in the `scatter_chunk_waste` kstat.  However, the ARC's `size` is
not aware of the memory used by this round-up, it only accounts for the
size that it requested from the ABD subsystem.

Therefore, the ARC is effectively using more memory than it is aware of,
due to the `scatter_chunk_waste`.  This impacts observability, e.g.
`arcstat` will show that the ARC is using less memory than it
effectively is.  It also impacts how the ARC responds to memory
pressure.  As the amount of `scatter_chunk_waste` changes, it appears to
the ARC as memory pressure, so it needs to resize `arc_c`.

If the sector size (`1<<ashift`) is the same as the page size (or
larger), there won't be any waste.  If the (compressed) block size is
relatively large compared to the page size, the amount of
`scatter_chunk_waste` will be small, so the problematic effects are
minimal.

However, if using 512B sectors (`ashift=9`), and the (compressed) block
size is small (e.g. `compression=on` with the default `volblocksize=8k`
or a decreased `recordsize`), the amount of `scatter_chunk_waste` can be
very large.  On a production system, with `arc_size` at a constant 50%
of memory, `scatter_chunk_waste` has been been observed to be 10-30% of
memory.

This commit adds `scatter_chunk_waste` to `arc_size`, and adds a new
`waste` field to `arcstat`.  As a result, the ARC's memory usage is more
observable, and `arc_c` does not need to be adjusted as frequently.

Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Wilson <gwilson@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #10701
2020-08-17 20:04:04 -07:00
Matthew Ahrens 994de7e4b7 Remove KMC_KMEM and KMC_VMEM
`KMC_KMEM` and `KMC_VMEM` are now unused since all SPL-implemented
caches are `KMC_KVMEM`.

KMC_KMEM: Given the default value of `spl_kmem_cache_kmem_limit`, we
don't use kmalloc to back the SPL caches, instead we use kvmalloc
(KMC_KVMEM).  The flag, module parameter, /proc entries, and associated
code are removed.

KMC_VMEM: This flag is not used, and kvmalloc() is always preferable to
vmalloc().  The flag, /proc entries, and associated code are removed.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #10673
2020-08-17 16:04:28 -07:00
Ryan Moeller 3df0c2fa32 FreeBSD: fix the build with Clang 11
* Cast void * to uintptr_t before casting to boolean_t.

* Avoid clashing definition of __asm when not on Linux to
  prevent duplicate __volatile__. This was already done in
  some places but not all.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #10723
2020-08-17 15:40:17 -07:00
Matthew Macy cfdc432e64 FreeBSD: fix merge error in zfs_acl_ids_create
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #10721
2020-08-17 15:28:03 -07:00
Serapheim Dimitropoulos b0099072df Fix typo in btree.c
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Closes #10725
2020-08-17 15:25:37 -07:00
Matthew Macy 5f1984f2f8 FreeBSD: fallback to /boot/ to look for zpool.cache
Up until now zpool.cache has always lived in /boot on FreeBSD.
For the sake of compatibility fallback to /boot if zpool.cache
isn't found in /etc/zfs.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #10720
2020-08-17 14:43:47 -07:00
George Amanakis 9352d8c004 Fix reporting of L2ARC writes in arc_summary3
arc_summary3 reports L2ARC writes in bytes. However, the related
arc_stat is reported as hits. arc_summary2 report this correctly.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #10717
2020-08-17 11:04:06 -07:00
Ryan Moeller 3eaf76a8d2 Fix l2arc_dev_rebuild_start thread name
`thread_create` on FreeBSD stringifies the argument passed as the
thread function to create a name for the thread. The thread name for
`l2arc_dev_rebuild_start` ended up with `(void (*)(void *))` in it.

Change the type signature so the function does not need to be cast
when creating the thread.  Rename the function to
`l2arc_dev_rebuild_thread` for clarity and consistency, as well.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Amanakis <gamanakis@gmail.com>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10716
2020-08-17 11:02:32 -07:00
Ryan Moeller 3c3d7c8a57 FreeBSD: Create taskq threads in appropriate proc
Stepping stone toward re-enabling spa_thread on FreeBSD.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10715
2020-08-17 11:01:19 -07:00
Allan Jude fc34dfba8e Fix L2ARC reads when compressed ARC disabled
When reading compressed blocks from the L2ARC, with
compressed ARC disabled, arc_hdr_size() returns
LSIZE rather than PSIZE, but the actual read is PSIZE.
This causes l2arc_read_done() to compare the checksum
against the wrong size, resulting in checksum failure.

This manifests as an increase in the kstat l2_cksum_bad
and the read being retried from the main pool, making the
L2ARC ineffective.

Add new L2ARC tests with Compressed ARC enabled/disabled

Blocks are handled differently depending on the state of the
zfs_compressed_arc_enabled tunable.

If a block is compressed on-disk, and compressed_arc is enabled:
- the block is read from disk
- It is NOT decompressed
- It is added to the ARC in its compressed form
- l2arc_write_buffers() may write it to the L2ARC (as is)
- l2arc_read_done() compares the checksum to the BP (compressed)

However, if compressed_arc is disabled:
- the block is read from disk
- It is decompressed
- It is added to the ARC (uncompressed)
- l2arc_write_buffers() will use l2arc_apply_transforms() to
  recompress the block, before writing it to the L2ARC
- l2arc_read_done() compares the checksum to the BP (compressed)
- l2arc_read_done() will use l2arc_untransform() to uncompress it

This test writes out a test file to a pool consisting of one disk
and one cache device, then randomly reads from it. Since the arc_max
in the tests is low, this will feed the L2ARC, and result in reads
from the L2ARC.

We compare the value of the kstat l2_cksum_bad before and after
to determine if any blocks failed to survive the trip through the
L2ARC.

Sponsored-by: The FreeBSD Foundation
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Allan Jude <allanjude@freebsd.org>
Closes #10693
2020-08-13 23:31:20 -07:00
Jorgen Lundman faa296c73c Release onexit/events with any missed zfsdev_state
Linux and FreeBSD will most likely never see this issue.
On macOS when kext is unloaded, but zed is still connected, zed
will be issued ENODEV. As the cdevsw is released, the kernel
will not have zfsdev_release() called to release minor/onexit/events,
and it "leaks". This ensures it is cleaned up before unload.

Changed the for loop from zsprev, to zsnext style, for less
code duplication.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Jorgen Lundman <lundman@lundman.net>
Closes #10700
2020-08-13 15:03:23 -07:00
George Melikov f67f5832ec Github workflow: checkstyle
Use github workflow to run checkstyle
- use free (for OS projects) resources
- starts for every commit and branch
- work on forks, contributors may use it
  before creating PRs

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #10705
2020-08-13 14:59:24 -07:00
George Melikov 42d4a8e5fe cstyle.pl: echo commands for github workflow
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #10705
2020-08-13 14:58:53 -07:00
George Melikov 2925ed6c70 Remove stale .travis.yml
- It doesn't work now.
- It has to be manually edited on tests changes.
  (even on test runtime changes!)
- Travis gives too small time to run to be useful.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #10704
2020-08-13 14:55:45 -07:00
Matthew Ahrens d64c6a2eee Use zfs_dbgmsg to log metaslab_load/unload
Metaslabs are now (usually) loaded and unloaded infrequently, but when
that is not the case, it is useful to have a log of when and why these
events happened.

This commit enables the zfs_dbgmsg() in metaslab_load(), and adds a
zfs_dbgmsg() in metaslab_unload().

Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #10683
2020-08-12 10:10:50 -07:00
Matthew Macy e111c80247 Restore ARC MFU/MRU pressure
The arc_adapt() function tunes LRU/MLU balance according to 4 types of
cache hits (which is passed as state agrument): ghost LRU, LRU, MRU,
ghost MRU. If this function is called with wrong cache hit (state),
adaptation will be sub-optimal and performance will suffer.

Some time ago upstream received this commit:

6950 ARC should cache compressed data) in arc_read() do next
sequence (access to ghost buffer)

Before this commit, hit to any ghost list was passed arc_adapt() before
call to arc_access() which revive element in cache and change state from
ghost to real hit.

After this commit, the order of calls was reverted and arc_adapt() is
now called only with «real» hits even if hit was in one of two ghost
lists, which renders ghost lists useless and breaks the ARC algorithm.

FreeBSD fixed this problem locally in Change D19094 / Commit r348772.

This change is an adaptation of the above commit to the current arc
code.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #10548 
Closes #10618
2020-08-12 10:03:24 -07:00
George Wilson 53c9d1d9b5 'zfs share -a' should handle 'canmount=noauto'
The 'zfs share -a' currently skips any filesystems which
have 'canmount=noauto' set. This behavior is unexpected since the
one would expect 'zfs share -a' to share any mounted filesystem
that has the 'sharenfs' property already set.

This changes the behavior of 'zfs share -a' to allow the sharing
of 'canmount=noauto' datasets if they are mounted.

Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed-by: Don Brady <don.brady@delphix.com>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Signed-off-by: George Wilson <gwilson@delphix.com>
External-issue: DLPX-71313
Closes #10688
2020-08-11 13:55:04 -07:00
Matthew Macy 6f763d4085 FreeBSD: Fix module autoloading when built in base
The KMOD name is "zfs" instead of "openzfs" when building in FreeBSD.

Define a ZFS_KMOD symbol as "zfs" when IN_BASE is defined, otherwise
"openzfs".

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10699
2020-08-11 13:49:50 -07:00
Coleman Kane d817c17100 Linux 5.9 compat: make_request_fn replaced with submit_bio interface
The make_request_fn and associated API was replaced recently in a
Linux 5.9 merge, to replace its functionality with a new submit_bio
member in struct block_device_operations.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #10696
2020-08-11 13:37:33 -07:00
Coleman Kane dcdc12e8ba Linux 5.9 compat: Update NR_SLAB_RECLAIMABLE to NR_SLAB_RECLAIMABLE_B
This change appears to primarily be a name change for the enum. Had
to update the test logic so that it works so long as either one of
these is present (favoring the newer one). Additionally, as this is
newer, it only shows up in node_page_item, so this commit doesn't
test zone_page_item for the same enum.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #10696
2020-08-11 13:35:20 -07:00
Coleman Kane 1823c8fe6a Linux 5.9 compat: add linux/blkdev.h include
Many of the block device operations (often functions with bdev in
the name) were moved into linux/blkdev.h from linux/fs.h. Seems
that this header is already included where needed in the code, but
in the autoconf tests it was missing causing false negatives. This
commit has those tests include linux/fs.h (old location) and now
also linux/blkdev.h (new locations).

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #10696
2020-08-11 13:35:10 -07:00
Allan Jude 9777044f1c Fix typo
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Allan Jude <allanjude@freebsd.org>
Closes #10694
2020-08-11 13:16:57 -07:00
Ryan Moeller ed726fb063 Move ZVOL_DIR back to zfs.h
This was previously moved because nothing else in-tree uses it, but
evidently DilOS uses it out of tree.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Signed-off-by: Ryan Moeller <freqlabs@freebsd.org>
Closes #10361 
Closes #10685
2020-08-11 13:12:12 -07:00
Matthew Macy 0f95ddcc0c FreeBSD: update vaccess signature on most recent HEAD
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #10682
2020-08-07 14:16:01 -07:00
Paul Dagnelie 12045d0278 Clarify error message when a range-tree double-add occurs
In various other pieces of logic have resulted in situations where 
we double-free space in ZFS. This in turn results in a double-add 
to the range trees. These issues have been much more difficult to 
diagnose than they should have been, because the error handling 
around this case is much weaker than around the double remove case.

Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Wilson <gwilson@delphix.com>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #10654
2020-08-07 14:13:13 -07:00
Ryan Moeller d4e6e9597d ZTS: Remove bashisms from zfs-tests.sh
Bring zfs-tests.sh in to compliance with the other scripts
by converting it /bin/sh for to avoid a dependency on bash.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10640
2020-08-07 14:10:48 -07:00
Matthew Ahrens 0cab7970f9 Remove commented-out code
Remove dead code to make the implementation easier to understand.

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Ahrens <matt@delphix.com>
Closes #10650
2020-08-05 10:28:18 -07:00
Matthew Ahrens 74e994ec63 Remove KM_NODEBUG
Remove dead code to make the implementation easier to understand.

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Ahrens <matt@delphix.com>
Closes #10650
2020-08-05 10:28:13 -07:00
Matthew Ahrens c6f2b942be Remove KMC_NOMAGAZINE
Remove dead code to make the implementation easier to understand.

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Ahrens <matt@delphix.com>
Closes #10650
2020-08-05 10:28:07 -07:00
Matthew Ahrens 3c09f6949a Remove KMC_QCACHE
Remove dead code to make the implementation easier to understand.

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Ahrens <matt@delphix.com>
Closes #10650
2020-08-05 10:28:01 -07:00
Matthew Ahrens d519c10575 Remove KMC_NOHASH
Remove dead code to make the implementation easier to understand.

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Ahrens <matt@delphix.com>
Closes #10650
2020-08-05 10:27:56 -07:00
Matthew Ahrens f68af67a0c Remove KMC_NOTOUCH
Remove dead code to make the implementation easier to understand.

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Ahrens <matt@delphix.com>
Closes #10650
2020-08-05 10:27:46 -07:00
Matthew Ahrens 492db125dc Remove KMC_OFFSLAB
Remove dead code to make the implementation easier to understand.

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Ahrens <matt@delphix.com>
Closes #10650
2020-08-05 10:25:37 -07:00
Matthew Ahrens d87676a9fa Fix i/o error handling of livelists and zap iteration
Pool-wide metadata is stored in the MOS (Meta Object Set).  This
metadata is stored in triplicate, in addition to any pool-level
reduncancy (e.g. RAIDZ).  However, if all 3+ copies of this metadata are
not available, we can still get EIO/ECKSUM when reading from the MOS.
If we encounter such an error in syncing context, we have typically
already committed to making a change that we now can't do because of the
corrupt/missing metadata.  We typically "handle" this with a `VERIFY()`
or `zfs_panic_recover()`.  This prevents the system from continuing on
in an undefined state, while minimizing the amount of error-handling
code.

However, there are some code paths that ignore these i/o errors, or
`ASSERT()` that they don't happen.  Since assertions are disabled on
non-debug builds, they effectively ignore them as well.  This can lead
to ZFS continuing on in an incorrect state, potentially leading to
on-disk inconsistencies.

This commit adds handling for these i/o errors on MOS metadata,
typically with a `VERIFY()`:

* Handle error return from `zap_cursor_retrieve()` in 4 places in
`dsl_deadlist.c`.

* Handle error return from `zap_contains()` in `dsl_dir_hold_obj()`.
Turns out this call isn't necessary because we can always call
`zap_lookup()`.

* Handle error return from `zap_lookup()` in `dsl_fs_ss_limit_check()`.

* Handle error return from `zap_remove()` in `dsl_dir_rename_sync()`.

* Handle error return from `zap_lookup()` in
`dsl_dir_remove_livelist()`.

* Handle error return from `dsl_process_sub_livelist()` in
`spa_livelist_delete_cb()`.

Additionally:

* Augment the internal history log message for `zfs destroy` to note
which method is used (e.g. bptree, livelist, or, synchronous) and the
mintxg.

* Correct a comment in `dbuf_init()`.

* Correct indentation in `dsl_dir_remove_livelist()`.

Reviewed by: Sara Hartse <sara.hartse@delphix.com>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #10643
2020-08-05 10:22:09 -07:00
Matthew Macy 1b376d176e FreeBSD: Add support for lockless lookup
Authored-by: mjg <mjg@FreeBSD.org>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #10657
2020-08-05 10:19:51 -07:00
Matthew Macy 22dcf89181 Add missed thread_exit() to vdev_{autotrim,rebuild}_thread
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #10668
2020-08-05 10:17:07 -07:00
Pavel Snajdr 309d20882f Fix arc__wait__for__eviction tracepoint
3442c2a02d added new `arc_wait_for_eviction` tracepoint, which fails to
compile, when tracepoints are enabled.

The tracepoint definition begins with `DEFINE_ARC_WAIT_FOR_EVICTION_EVENT`
and is a multi-line definition, so this fixes the backslash
and parenthesis accordingly.

Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Pavel Snajdr <snajpa@snajpa.net>
Closes #10669
2020-08-04 10:04:00 -07:00
Jonathon f1de1600d1 Verify zfs module loaded before starting services
This is a minor change to the systemd service templates that verifies
the zfs kernel module is loaded by the kernel prior to attempting to
import any zpool.

The services check for the presence of /sys/module/zfs which indicates
the zfs is module is loaded. This uses the systemd built-in check
ConditionPathIsDirectory.

Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Thode <prometheanfire@gentoo.org>
Signed-off-by: Jonathon Fernyhough <jonathon.fernyhough@york.ac.uk>
Closes #10663
2020-08-01 17:13:15 -07:00
George Amanakis da60484db5 Fix logging in l2arc_rebuild()
In case the L2ARC rebuild was canceled, do not log to spa history
log as the pool may be in the process of being removed and a panic
may occur:

BUG: kernel NULL pointer dereference, address: 0000000000000018
RIP: 0010:spa_history_log_internal+0xb1/0x120 [zfs]
Call Trace:
 l2arc_rebuild+0x464/0x7c0 [zfs]
 l2arc_dev_rebuild_start+0x2d/0x130 [zfs]
 ? l2arc_rebuild+0x7c0/0x7c0 [zfs]
 thread_generic_wrapper+0x78/0xb0 [spl]
 kthread+0xfb/0x130
 ? IS_ERR+0x10/0x10 [spl]
 ? kthread_park+0x90/0x90
 ret_from_fork+0x35/0x40

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #10659
2020-08-01 11:17:18 -07:00
Ryan Moeller b6737193ee FreeBSD: Fix zfs jail and add a test
zfs_jail was not using zfs_ioctl so failed to map the IOC number
correctly.  Use zfs_ioctl to perform the jail ioctl and add a test 
case for FreeBSD.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10658
2020-08-01 08:44:54 -07:00
Matthew Macy fe628bc21d Fix page fault in zfsctl_snapdir_getattr
Must acquire the z_teardown_lock before accessing the zfsvfs_t object.
I can't reproduce this panic on demand, but this looks like the
correct solution.

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Authored-by: asomers <asomers@FreeBSD.org>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #10656
2020-08-01 08:42:55 -07:00
Allan Jude 8fb79fdddb Change the error handling for invalid property values
ZFS recv should return a useful error message when an invalid index
property value is provided in the send stream properties nvlist

With a compression= property outside of the understood range:

Before:
```
receiving full stream of zof/zstd_send@send2 into testpool/recv@send2
internal error: Invalid argument
Aborted (core dumped)
```
Note: the recv completes successfully, the abort() is likely just to
make it easier to track the unexpected error code.

After:
```
receiving full stream of zof/zstd_send@send2 into testpool/recv@send2
cannot receive compression property on testpool/recv: invalid property
value received 28.9M stream in 1 seconds (28.9M/sec)
```

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Closes #10631
2020-08-01 08:41:31 -07:00
Matthew Macy 47ed79ff60 Changes to make openzfs build within FreeBSD buildworld
A collection of header changes to enable FreeBSD to build
with vendored OpenZFS.

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #10635
2020-07-31 21:30:31 -07:00
Ryan Moeller 0cc3454821 Convert Linux-isms to FreeBSD-isms in platform zfs_debug.c
Change some comments copied from the Linux code to describe 
the appropriate methods on FreeBSD.

Convert some tunables to ZFS_MODULE_PARAM so they get created 
on FreeBSD.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10647
2020-07-31 21:25:35 -07:00
Ryan Moeller af524bf7ff ZTS: FreeBSD does have a l2arc.trim_ahead tunable
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10633
2020-07-31 21:18:32 -07:00
Matthew Ahrens 3442c2a02d Revise ARC shrinker algorithm
The ARC shrinker callback `arc_shrinker_count/_scan()` is invoked by the
kernel's shrinker mechanism when the system is running low on free
pages.  This happens via 2 code paths:

1. "direct reclaim": The system is attempting to allocate a page, but we
are low on memory.  The ARC shrinker callback is invoked from the
page-allocation code path.

2. "indirect reclaim": kswapd notices that there aren't many free pages,
so it invokes the ARC shrinker callback.

In both cases, the kernel's shrinker code requests that the ARC shrinker
callback release some of its cache, and then it measures how many pages
were released.  However, it's measurement of released pages does not
include pages that are freed via `__free_pages()`, which is how the ARC
releases memory (via `abd_free_chunks()`).  Rather, the kernel shrinker
code is looking for pages to be placed on the lists of reclaimable pages
(which is separate from actually-free pages).

Because the kernel shrinker code doesn't detect that the ARC has
released pages, it may call the ARC shrinker callback many times,
resulting in the ARC "collapsing" down to `arc_c_min`.  This has several
negative impacts:

1. ZFS doesn't use RAM to cache data effectively.

2. In the direct reclaim case, a single page allocation may wait a long
time (e.g. more than a minute) while we evict the entire ARC.

3. Even with the improvements made in 67c0f0dedc ("ARC shrinking blocks
reads/writes"), occasionally `arc_size` may stay above `arc_c` for the
entire time of the ARC collapse, thus blocking ZFS read/write operations
in `arc_get_data_impl()`.

To address these issues, this commit limits the ways that the ARC
shrinker callback can be used by the kernel shrinker code, and mitigates
the impact of arc_is_overflowing() on ZFS read/write operations.

With this commit:

1. We limit the amount of data that can be reclaimed from the ARC via
the "direct reclaim" shrinker.  This limits the amount of time it takes
to allocate a single page.

2. We do not allow the ARC to shrink via kswapd (indirect reclaim).
Instead we rely on `arc_evict_zthr` to monitor free memory and reduce
the ARC target size to keep sufficient free memory in the system.  Note
that we can't simply rely on limiting the amount that we reclaim at once
(as for the direct reclaim case), because kswapd's "boosted" logic can
invoke the callback an unlimited number of times (see
`balance_pgdat()`).

3. When `arc_is_overflowing()` and we want to allocate memory,
`arc_get_data_impl()` will wait only for a multiple of the requested
amount of data to be evicted, rather than waiting for the ARC to no
longer be overflowing.  This allows ZFS reads/writes to make progress
even while the ARC is overflowing, while also ensuring that the eviction
thread makes progress towards reducing the total amount of memory used
by the ARC.

4. The amount of memory that the ARC always tries to keep free for the
rest of the system, `arc_sys_free` is increased.

5. Now that the shrinker callback is able to provide feedback to the
kernel's shrinker code about our progress, we can safely enable
the kswapd hook. This will allow the arc to receive notifications
when memory pressure is first detected by the kernel. We also
re-enable the appropriate kstats to track these callbacks.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: George Wilson <george.wilson@delphix.com>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #10600
2020-07-31 21:10:52 -07:00
Ryan Moeller 18c624302d ZTS: zvol_misc_volmode is flaky on FreeBSD
Mark this as a known issue.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10655
2020-07-31 21:05:55 -07:00
Ryan Moeller 721ed01548 ZTS: Use POSIX-compatible space character class
FreeBSD recently integrated a change which causes \s in a regex to
throw an error instead of silently being misinterpreted as an s.

Change the regex in zpool_colors.ksh to use [[:space:]].

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #10651
2020-07-31 18:11:21 -07:00
Ryan Moeller 25499e2139 lua: Increase reserved stack space for FreeBSD in debug config
FreeBSD uses more stack space in debug configurations and can overflow
the stack while formatting the error message when the call depth limit
of 20 frames is reached.  This is readily reproduced by running the
gsub recursion test with increased kstack size.  I hit the panic with
16 pages per kstack, and noticed it go away when bumped to 17.

Reserve an additional 64 bytes on the stack when building for FreeBSD.
This is enough to avoid the panic with a deep stack while not wasting
too much space when the default stack size is used.

Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10634
2020-07-31 09:17:37 -07:00
Allan Jude 24f98ed383 When encountering EZFS_UNKNOWN, print the error text buffer anyway
Rather than just saying there was an internal error, provide any
context we might have to the user to help them understand the issue.

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Closes #10632
2020-07-31 09:07:37 -07:00
Allan Jude eabf270b2c Remove duplicate include of sys/zfeature.h in dmu_objset.c
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Closes #10636
2020-07-31 09:04:45 -07:00
Allan Jude 4cf6f10714 pyzfs: Add missing entry to zfs_errno
This was causing all later errno's to have the incorrect value.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Closes #10649
2020-07-31 09:01:41 -07:00
Matthew Ahrens 948423a3d1 zfs promote does not delete livelist of origin
When a clone is promoted, its livelist is no longer accurate, so it is
discarded.  If the clone's origin is also a clone (i.e. we are promoting
a clone of a clone), then the origin's livelist is also no longer
accurate, so it should be discarded, but the code doesn't actually do
that.

Consider a pool with:
* Filesystem A
* Clone B, a clone of A
* Clone C, a clone of B

If we promote C, it discards C's livelist.  It should discard B's
livelist, but that is not happening.  The impact is that when B is
destroyed, we use the livelist to find the blocks to free, but the
livelist is no longer correct so we end up freeing blocks that are still
in use by C.  The incorrectly-freed blocks can be reallocated causing
checksum errors.  And when C is destroyed it can double-free the
incorrectly-freed blocks.

The problem is that we remove the livelist of `origin_ds->ds_dir`, but
the origin snapshot has already been moved to the promoted dsl_dir.  So
this is actually trying to remove the livelist of the promoted dsl_dir,
which was already removed.  As explained in a comment in the beginning
of `dsl_dataset_promote_sync()`, we need to use the saved `odd` for the
origin's dsl_dir.

Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Wilson <gwilson@delphix.com>
Reviewed by: Sara Hartse <sara.hartse@delphix.com>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #10652
2020-07-31 08:59:00 -07:00
Don Brady a15c6f3310 ZTS: minor improvements to alloc_class_009_pos functional test
* Fixed a typo that cause one of the variations to be a no-op
* Added additional coverage for adding special vdev after pool create
* Added additional coverage for using 4K sector size

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Don Brady <don.brady@delphix.com>
Closes #10641
2020-07-30 09:11:05 -07:00
Ryan Moeller 2f571dbe06 Use correct prefix for share/pam-configs
Respect the configured install prefix.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Arvind Sankar <nivedita@alum.mit.edu>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10604
2020-07-30 09:09:46 -07:00
Matthew Ahrens 3a92552f75 Fix error handling of vdev_top_zap
In `vdev_load()`, we look up several entries in the `vdev_top_zap`
object.  In most cases, if we encounter an i/o error, it will be
returned to the caller.  However, when handling
`VDEV_TOP_ZAP_ALLOCATION_BIAS`, if we get an i/o error, we may continue
on, which in theory could cause us to not realize that a vdev should be
used only for `special` allocations.

In practice, if we encountered an i/o error while looking for
`VDEV_TOP_ZAP_ALLOCATION_BIAS` in the `vdev_top_zap`, we'd also get an
i/o error while looking for other entries in the same object, and thus
the zpool open/import would fail.  Therefore the impact of this problem
is negligible.

This commit adds error handling for i/o errors while accessing the
`vdev_top_zap`, so that we aren't relying on unrelated code to fail for
us.

Reviewed-by: Don Brady <don.brady@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #10637
2020-07-29 17:04:34 -07:00
Jonathon ae12b02308 Verify zfs module loaded before starting services
This is a minor change to the systemd service templates that verifies the zfs
kernel module is loaded by the kernel prior to attempting to import any zpool.

Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Jonathon Fernyhough <jonathon.fernyhough@york.ac.uk>
Closes #10627
2020-07-29 16:52:18 -07:00
Matthew Macy 27d96d2254 Rename refcount.h to zfs_refcount.h
Renamed to avoid conflicting with refcount.h when a different
implementation is already provided by the platform.

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #10620
2020-07-29 16:35:33 -07:00
Serapheim Dimitropoulos 843e9ca2e1 Introduce names for ZTHRs
When debugging issues or generally analyzing the runtime of
a system it would be nice to be able to tell the different
ZTHRs running by name rather than having to analyze their
stack.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Co-authored-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Closes #10630
2020-07-29 09:43:33 -07:00
Matthew Macy 5678d3f593 Prefix zfs internal endian checks with _ZFS
FreeBSD defines _BIG_ENDIAN BIG_ENDIAN _LITTLE_ENDIAN
LITTLE_ENDIAN on every architecture. Trying to do
cross builds whilst hiding this from ZFS has proven
extremely cumbersome.

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #10621
2020-07-28 13:02:49 -07:00
Matthew Ahrens 3eabed74c0 Fix lua stack overflow on recursive call to gsub()
The `zfs program` subcommand invokes a LUA interpreter to run ZFS
"channel programs".  This interpreter runs in a constrained environment,
with defined memory limits.  The LUA stack (used for LUA functions that
call each other) is allocated in the kernel's heap, and is limited by
the `-m MEMORY-LIMIT` flag and the `zfs_lua_max_memlimit` module
parameter.  The C stack is used by certain LUA features that are
implemented in C.  The C stack is limited by `LUAI_MAXCCALLS=20`, which
limits call depth.

Some LUA C calls use more stack space than others, and `gsub()` uses an
unusually large amount.  With a programming trick, it can be invoked
recursively using the C stack (rather than the LUA stack).  This
overflows the 16KB Linux kernel stack after about 11 iterations, less
than the limit of 20.

One solution would be to decrease `LUAI_MAXCCALLS`.  This could be made
to work, but it has a few drawbacks:

1. The existing test suite does not pass with `LUAI_MAXCCALLS=10`.

2. There may be other LUA functions that use a lot of stack space, and
the stack space may change depending on compiler version and options.

This commit addresses the problem by adding a new limit on the amount of
free space (in bytes) remaining on the C stack while running the LUA
interpreter: `LUAI_MINCSTACK=4096`.  If there is less than this amount
of stack space remaining, a LUA runtime error is generated.

Reviewed-by: George Wilson <gwilson@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Allan Jude <allanjude@freebsd.org>
Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #10611 
Closes #10613
2020-07-27 16:11:47 -07:00
Matthew Macy e64cc4954c Refactor ccompile.h to not include system headers
This is a step toward being able to vendor the OpenZFS code in FreeBSD.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #10625
2020-07-25 20:09:50 -07:00
Matthew Macy 6d8da84106 Make use of ZFS_DEBUG consistent within kmod sources
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #10623
2020-07-25 20:07:44 -07:00
Matthew Macy f5b189f937 FreeBSD: Fixes required to build ZFS on PowerPC
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #10622
2020-07-25 11:00:23 -07:00
Ryan Moeller d364de7a89 FreeBSD: Remove accidental ARC size limiter
i386 has some additional memory reservation logic that limits the size
of the reported available memory.  This was accidentally being used on
all arches due to a missing header.

Include machine/vmparam.h in freebsd/zfs/arc_os.c to pull in the
missing UMA_MD_SMALL_ALLOC definition.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10616
2020-07-25 10:49:49 -07:00
Ryan Moeller cb18d88060 FreeBSD: Implement arc_free_memory
This is only used for the kstat, but something other than 0 is nice.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10626
2020-07-25 10:47:18 -07:00
Brian Atkinson 6fba7bfd0e Add gang ABD child to parent gang ABD
By design a gang ABD can not have another gang ABD as a child. This is
to make sure the logical offset in a gang ABD is consistent with the
individual ABDS it contains as children. If a gang ABD is added as a
child of a gang ABD we will add the individual children of the gang ABD
to the parent gang ABD. This allows for a consistent view of offsets
within the parent gang ABD.

Reviewed-by: Mark Maybee <mmaybee@cray.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Brian Atkinson <batkinson@lanl.gov>
Closes #10430
2020-07-24 21:09:20 -07:00
Ryan Moeller 8348fac30c Limit dbuf cache sizes based only on ARC target size by default
Set the initial max sizes to ULONG_MAX to allow the caches to grow
with the ARC.

Recalculate the metadata cache size on demand so it can adapt, too.

Update descriptions in zfs-module-parameters(5).

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Ahrens <matt@delphix.com>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10563 
Closes #10610
2020-07-24 20:38:48 -07:00
Matthew Ahrens 4fbdb10c7b remove kmem_cache module parameter KMC_EXPIRE_AGE
By default, `spl_kmem_cache_expire` is `KMC_EXPIRE_MEM`, meaning that
objects will be removed from kmem cache magazines by
`spl_kmem_cache_reap_now()`.

There is also a module parameter to change this to `KMC_EXPIRE_AGE`,
which establishes a maximum lifetime for objects to stay in the
magazine.  This setting has rarely, if ever, been used, and is not
regularly tested.

This commit removes the code for `KMC_EXPIRE_AGE`, and associated module
parameters.

Additionally, the unused module parameter
`spl_kmem_cache_obj_per_slab_min` is removed.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #10608
2020-07-24 09:39:26 -07:00
tony-zfs 02fced3067 Add support to decode a resume token
Adding a new subcommand to zstream called token. This
now allows users to decode a resume token to retrieve the toname
field. This can be useful for tools that need this information.
The syntax works as follows zstream token <resume_token>.

Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Paul Zuchowski <pzuchowski@datto.com>
Signed-off-by: Tony Perkins <tperkins@datto.com>
Closes #10558
2020-07-23 17:44:03 -07:00
Kyle Evans bfafe1780a Annotate unused parameters on inline definitions as such
* libspl: umem: These are obviously and intentionally unused; annotate 
  them as such to appease -Wunused-parameter builds that include this 
  header.

* sys/dmu.h: In this case, clear_on_evict_dbufp is only used for 
  ZFS_DEBUG builds, so annotate it as __maybe_unused to appease 
  -Wunused-parameter.


Reviewed-By: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Kyle Evans <kevans@FreeBSD.org>
Closes #10606
2020-07-23 17:41:48 -07:00
Ryan Moeller f7a68f99d0 FreeBSD: Remove some code duplication in sysctl_os.c
Drop unnecessary redefinition's of several arcstat values.
Put missing extern declaration of arc_no_grow_shift in arc_impl.h.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10609
2020-07-23 17:35:34 -07:00
Kyle Evans b197457cd6 libzfs: const'ify path argument to zfs_path_to_zhandle
zfs_path_to_zhandle has no need to mutate the path argument,
most notably:

- zfs_open takes path as const
- getextmntent takes path as const
- fprintf most clearly doesn't need to mutate it

It's hard to foresee any reason that libzfs could conceivably
want to mutate it in the future, either, so const'ify it.

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Kyle Evans <kevans@FreeBSD.org>
Closes #10605
2020-07-22 11:14:20 -07:00
Ryan Moeller db5b3926e9 OpenZFSify CONTRIBUTING.md
Update stale references to "ZFS on Linux" to "OpenZFS" in
CONTRIBUTING.md.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10602
2020-07-22 11:09:04 -07:00
Ryan Moeller 0c79b070a7 ZTS: Fix devname2devid build on FreeBSD with libudev
When libudev is installed on FreeBSD, configure finds it and sets
WANT_DEVNAME2DEVID, but it isn't found by the linker because we
didn't specify where it is.

Use LIBUDEV_LIBS so the location of the library gets added to the
linker flags for devname2devid.
Also use LIBUDEV_CFLAGS here in case some other platform needs it.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Arvind Sankar <nivedita@alum.mit.edu>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10590
2020-07-22 10:49:22 -07:00
Arvind Sankar 59415fc9fb Add zfs_gitrev.h to the distributed sources
Commit 109d2c9310 ("Move zfs_gitrev.h to build directory") stopped
distributing zfs_gitrev.h, as it is a generated file. Add it back, with
some changes in behavior.

Change the logic for gitrev as follows
- if the source tree is a git repository, the behavior for build is
  unchanged. For make dist, append -dist to the git tag in the
  distributed version of zfs_gitrev.h.
- otherwise, check if the source tree contains zfs_gitrev.h, and use it
  if so, falling back to "unknown" if it doesn't exist.
- clean it only in make maintainer-clean, so we don't remove it from the
  source tree on make clean or make distclean.

This allows disted sources to track what git tag they originally came
from, with the -dist suffix indicating that the code wasn't built
directly from git and so might contain additional changes beyond the git
tag.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Eli Schwartz <eschwartz@archlinux.org>
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Closes #10595
2020-07-22 10:00:40 -07:00
Arvind Sankar d32a59fe2b Restore scripts/make_gitrev.sh
Commit 109d2c9310 ("Move zfs_gitrev.h to build directory") removed
scripts/make_gitrev.sh, putting the logic into the Makefile itself.

However, at least the Arch Linux packager wants the script so that the
file can be generated without having to run configure first, for
DKMS packaging purposes.

So move the make recipe back into the script.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Eli Schwartz <eschwartz@archlinux.org>
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Closes #10595
2020-07-22 09:59:36 -07:00
Matthew Ahrens 5dd92909c6 Adjust ARC terminology
The process of evicting data from the ARC is referred to as
`arc_adjust`.

This commit changes the term to `arc_evict`, which is more specific.

Reviewed-by: George Wilson <gwilson@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #10592
2020-07-22 09:51:47 -07:00
Evan Harris 317dbea173 Updated CONTRIBUTING doc links for new issue labels
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Evan Harris <eharris@puremagic.com>
Closes #10584
2020-07-19 10:22:44 -07:00
Arvind Sankar 5ad61b5b01 Disable shebang mangling on input files
The DKMS module installs the entire source tree, including the .in files
that will later be substituted when building. This makes
brp_mangle_shebangs complain about shebang lines in the .in files.

Exclude everything under /usr/src from shebang mangling in the DKMS
package.

The KMOD package doesn't contain any of the files it excludes from
mangling, so just drop the exclusion.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: João Carlos Mendes Luís <jonny@jonny.eng.br>
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Closes #10581 
Closes #10582
2020-07-19 10:19:08 -07:00
Ryan Moeller 0421f257b2 FreeBSD: Add legacy arc_min and arc_max
These tunables were renamed from vfs.zfs.arc_min and 
vfs.zfs.arc_max to vfs.zfs.arc.min and vfs.zfs.arc.max.
Add legacy compat tunables for the old names.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10579
2020-07-19 10:15:34 -07:00
Jean-Baptiste Lallement ceadc0dbbd Make unloading the key more robust
The unit was failing instead of stopping if someone manually unloaded
the key before stopping the unit (zfs unload-key is failing on an
unavailable key).
Follow a similar logic than for loading the key, checking for the key
status before unloading it.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Co-authored-by: Didier Roche <didrocks@ubuntu.com>
Signed-off-by: Didier Roche <didrocks@ubuntu.com>
Closes #10477
2020-07-19 10:04:14 -07:00
Jean-Baptiste Lallement b717f9b95e BindsTo dataset keyload unit to mount associate unit
We need a stronger dependency between the mount unit and its keyload unit
when we know that the dataset is encrypted.
If the keyload unit fails, Wants= will still try to mount the dataset,
which will then fail.
It’s better to show that the failure is due to a dependency failing, the
keyload unit, by tighting up the dependency. We can do this as we know
that we generate both units in the generator and so, it’s not an
optional dependency.
BindsTo enable as well that if the keyload unit fails at any point, the
associated mountpoint will be then unmounted.

Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Didier Roche <didrocks@ubuntu.com>
Signed-off-by: Didier Roche <didrocks@ubuntu.com>
Closes #10477
2020-07-19 10:03:48 -07:00
Jean-Baptiste Lallement de817cc7b7 Ensure mount unit pilots when its ZFS key is loaded
Drop Before=zfs.mount dependency explicity on generated key-load .service
unit.
Indeed, the associated mount unit is After=<dataset-key-load>.service.
This is thus the mount point which controls at what point it wants to be
mounted (Before=zfs-mount.service in stock generator), but this can be
an automount point, or triggered by another service.
This additional dependency from the key load service is not needed thus.

Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Didier Roche <didrocks@ubuntu.com>
Signed-off-by: Didier Roche <didrocks@ubuntu.com>
Closes #10477
2020-07-19 10:03:02 -07:00
Matthew Ahrens 026e529cb3 Remove skc_reclaim, hdr_recl, kmem_cache shrinker
The SPL kmem_cache implementation provides a mechanism, `skc_reclaim`,
whereby individual caches can register a callback to be invoked when
there is memory pressure.  This mechanism is used in only one place: the
ARC registers the `hdr_recl()` reclaim function.  This function wakes up
the `arc_reap_zthr`, whose job is to call `kmem_cache_reap()` and
`arc_reduce_target_size()`.

The `skc_reclaim` callbacks are invoked only by shrinker callbacks and
`arc_reap_zthr`, and only callback only wakes up `arc_reap_zthr`.  When
called from `arc_reap_zthr`, waking `arc_reap_zthr` is a no-op.  When
called from shrinker callbacks, we are already aware of memory pressure
and responding to it.  Therefore there is little benefit to ever calling
the `hdr_recl()` `skc_reclaim` callback.

The `arc_reap_zthr` also wakes once a second, and if memory is low when
allocating an ARC buffer.  Therefore, additionally waking it from the
shrinker calbacks has little benefit.

The shrinker callbacks can be invoked very frequently, e.g. 10,000 times
per second.  Additionally, for invocation of the shrinker callback,
skc_reclaim is invoked many times.  Therefore, this mechanism consumes
significant amounts of CPU time.

The kmem_cache shrinker calls `spl_kmem_cache_reap_now()`, which,
in addition to invoking `skc_reclaim()`, does two things to attempt to
free pages for use by the system:
 1. Return free objects from the magazine layer to the slab layer
 2. Return entirely-free slabs to the page layer (i.e. free pages)

These actions apply only to caches implemented by the SPL, not those
that use the underlying kernel SLAB/SLUB caches.  The SPL caches are
used for objects >=32KB, which are primarily linear ABD's cached in the
DBUF cache.

These actions (freeing objects from the magazine layer and returning
entirely-free slabs) are also taken whenever a `kmem_cache_free()` call
finds a full magazine.  So there would typically be zero entirely-free
slabs, and the number of objects in magazines is limited (typically no
more than 64 objects per magazine, and there's one magazine per CPU).
Therefore the benefit of `spl_kmem_cache_reap_now()`, while nonzero, is
modest.

We also call `spl_kmem_cache_reap_now()` from the `arc_reap_zthr`, when
memory pressure is detected.  Therefore, calling
`spl_kmem_cache_reap_now()` from the kmem_cache shrinker is not needed.

This commit removes the `skc_reclaim` mechanism, its only callback
`hdr_recl()`, and the kmem_cache shrinker callback.

Reviewed-By: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Wilson <gwilson@delphix.com>
Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #10576
2020-07-19 09:58:30 -07:00
Brian Behlendorf e862b7ecfc Linux 4.10 compat: has_capability()
Stock kernels older than 4.10 do not export the has_capability()
function which is required by commit e59a377.  To avoid breaking
the build on older kernels revert to the safe legacy behavior and
return EACCES when privileges cannot be checked.

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Matt Ahrens <matt@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #10565
Closes #10573
2020-07-19 09:56:21 -07:00
Matthew Ahrens 8fbf432ae2 anon_pages are not free/evictable
`arc_free_memory()` returns the amount of memory that the ARC considers
to be free.  This includes pages that are not actually free, but can be
evicted with essentially zero cost (without doing any i/o), for example
the page cache.  The ARC can "squeeze out" any pages included in this
calculation, leaving only `arc_sys_free` (1/64th of RAM) for these
free/evictable pages.

Included in the count of free/evictable pages is
`nr_inactive_anon_pages()`, which is described as "Anonymous memory that
has not been used recently and can be swapped out".  These pages would
have to be written out to disk (swap) in order to evict them, and they
are not included in `/proc/meminfo`'s `MemAvailable`.

Therefore it is not appropriate for `nr_inactive_anon_pages()` to be
included in the free/evictable memory returned by `arc_free_memory()`,
because the ARC shouldn't (intentionally) make the system swap.

This commit removes `nr_inactive_anon_pages()` from the memory returned
by `arc_free_memory()`.  This is a step towards enabling the ARC to
manage free memory by monitoring it and reducing the ARC size as we
notice that there is insufficient free memory (in the `arc_reap_zthr`),
rather than the current method of relying on the `arc_shrinker`
callback.

Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed-by: George Wilson <gwilson@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #10575
2020-07-16 10:11:26 -07:00
Matthew Macy 23c871671c FreeBSD: zfs commands backward compatibility
Update the zfs commands such that they're backwards compatible with
the version of ZFS is the base FreeBSD.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #10542
2020-07-15 21:32:50 -07:00
Brian Behlendorf f1b8153794 Update zts-report.py with additional tests
The following test cases have been observed to fail frequently
enough to be a problem when reporting CI results.  Until they can
be updated to be entirely reliable add them to the zts-report.py
script.

    alloc_class/alloc_class_011_neg
    cli_root/zpool_import/zpool_import_012_pos
    mmp/mmp_on_uberblocks
    rsend/send_partial_dataset

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #10578
2020-07-15 21:28:18 -07:00
Ryan Moeller 103bc5b957 ZTS: Fix nonportable use of stat in list_file_blocks
FreeBSD stat uses -f to specify the format string rather than -c.
list_file_blocks in blkdev.shlib uses stat -c %i to get a file's
object ID for zdb.  We already have a library function to do this
portably.

Use get_objnum to get the file's object ID.

Take log_must off of the call to list_free_blocks in
corrupt_blocks_at_level, which had masked the error.  It was not good
to pipe the output of log_must into the while-loop, anyway.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alek Pinchuk <apinchuk@datto.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10572
2020-07-15 21:26:39 -07:00
Romain Dolbeau 01a4852ecb Fix early include of <linux/percpu_compat.h>
Move/add include of <linux/percpu_compat.h> to satisfy missing
requirements.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Romain Dolbeau <romain@dolbeau.org>
Closes #10568 
Closes #10569
2020-07-15 15:58:15 -07:00
Matthew Ahrens 6774931dfa Extend zdb to print inconsistencies in livelists and metaslabs
Livelists and spacemaps are data structures that are logs of allocations
and frees.  Livelists entries are block pointers (blkptr_t). Spacemaps
entries are ranges of numbers, most often used as to track
allocated/freed regions of metaslabs/vdevs.

These data structures can become self-inconsistent, for example if a
block or range can be "double allocated" (two allocation records without
an intervening free) or "double freed" (two free records without an
intervening allocation).

ZDB (as well as zfs running in the kernel) can detect these
inconsistencies when loading livelists and metaslab.  However, it
generally halts processing when the error is detected.

When analyzing an on-disk problem, we often want to know the entire set
of inconsistencies, which is not possible with the current behavior.
This commit adds a new flag, `zdb -y`, which analyzes the livelist and
metaslab data structures and displays all of their inconsistencies.
Note that this is different from the leak detection performed by
`zdb -b`, which checks for inconsistencies between the spacemaps and the
tree of block pointers, but assumes the spacemaps are self-consistent.

The specific checks added are:

Verify livelists by iterating through each sublivelists and:
- report leftover FREEs
- report double ALLOCs and double FREEs
- record leftover ALLOCs together with their TXG [see Cross Check]

Verify spacemaps by iterating over each metaslab and:
- iterate over spacemap and then the metaslab's entries in the
  spacemap log, then report any double FREEs and double ALLOCs

Verify that livelists are consistenet with spacemaps.  The space
referenced by livelists (after using the FREE's to cancel out
corresponding ALLOCs) should be allocated, according to the spacemaps.

Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Sara Hartse <sara.hartse@delphix.com>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
External-issue: DLPX-66031
Closes #10515
2020-07-14 17:51:05 -07:00
Arvind Sankar 38e2e9ce83 Centralize variable substitution
A bunch of places need to edit files to incorporate the configured paths
i.e. bindir, sbindir etc. Move this logic into a common file.

Create arc_summary by copying arc_summary[23] as appropriate at build
time instead of install time.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Closes #10559
2020-07-14 17:33:44 -07:00
Arvind Sankar bdb518c13a Make RPM_DEFINE_KMOD conditional on CONFIG_KERNEL
The configure variables won't be defined if CONFIG_KERNEL is disabled
and defining empty macros causes errors. The spec files do provide some
defaults if the macros are undefined.

Remove config conditionals in the tgz Makefile.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Closes #10564
2020-07-14 17:32:21 -07:00
Arvind Sankar e6c093dd94 Fix parallel make srpm
When building srpm using make -j, each of the recursive makes invoked to
build srpm-{dkms,kmod,utils} will build the dist target. This is both
unnecessary, and also has a very good chance of breaking when they race
trying to build gitrev.

Fix this by make dist a prerequisite of srpm-{dkms,kmod,utils} instead
of srpm-common, so that it will be done once before invoking the
recursive makes.

Also, gitrev is not really required for make dist, so instead of adding
it to BUILT_SOURCES, just add it as a prerequisite of the all target.

Mark the individual package targets as PHONY.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Closes #10564
2020-07-14 17:31:45 -07:00
Alexander Motin 1743c737f5 Fix LOR between dp_config_rwlock and spa_props_lock
Our QE team during automated API testing hit deadlock in ZFS, caused
by lock order reversal.  From one side dsl_sync_task_sync() locks
dp_config_rwlock as writer and calls spa_sync_props(), which waits
for spa_props_lock.  From another spa_prop_get() locks spa_props_lock
and then calls dsl_pool_config_enter(), trying to lock dp_config_rwlock
as reader.

This patch makes spa_prop_get() lock dp_config_rwlock before
spa_props_lock, making the order consistent.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Closes #10553
2020-07-14 12:21:57 -07:00
Joao Carlos Mendes Luis 5f72109e5b Disable -Wl,-z,defs for ASAN builds
Commit af65916 added -Wl,-z,defs for the shared libraries. This
apparently does not work in some cases with --enable-asan, so only add
it for non-ASAN builds.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: João Carlos Mendes Luis <jonny@jonny.eng.br>
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Closes #10557
Closes #10560
2020-07-14 12:17:44 -07:00
Brian Atkinson e4d3d77684 Fixing gang ABD child removal race condition
On linux the list debug code has been setting off a failure when
checking that the node->next->prev value is pointing back at the node.
At times this check evaluates to 0xdead. When removing a child from a
gang ABD we must acquire the child's abd_mtx to make sure that the
same ABD is not being added to another gang ABD while it is being
removed from a gang ABD. This fixes a race condition when checking
if an ABDs link is already active and part of another gang ABD before
adding it to a gang.

Added additional debug code for the gang ABD in abd_verify() to make
sure each child ABD has active links. Also check to make sure another
gang ABD is not added to a gang ABD.

Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Ahrens <matt@delphix.com>
Signed-off-by: Brian Atkinson <batkinson@lanl.gov>
Closes #10511
2020-07-14 11:04:35 -07:00
George Wilson c15d36c674 Remove dependency on sharetab file and refactor sharing logic
== Motivation and Context

The current implementation of 'sharenfs' and 'sharesmb' relies on
the use of the sharetab file. The use of this file is os-specific
and not required by linux or freebsd. Currently the code must
maintain updates to this file which adds complexity and presents
a significant performance impact when sharing many datasets. In
addition, concurrently running 'zfs sharenfs' command results in
missing entries in the sharetab file leading to unexpected failures.

== Description

This change removes the sharetab logic from the linux and freebsd
implementation of 'sharenfs' and 'sharesmb'. It still preserves an
os-specific library which contains the logic required for sharing
NFS or SMB. The following entry points exist in the vastly simplified
libshare library:

- sa_enable_share -- shares a dataset but may not commit the change
- sa_disable_share -- unshares a dataset but may not commit the change
- sa_is_shared -- determine if a dataset is shared
- sa_commit_share -- notify NFS/SMB subsystem to commit the shares
- sa_validate_shareopts -- determine if sharing options are valid

The sa_commit_share entry point is provided as a performance enhancement
and is not required. The sa_enable_share/sa_disable_share may commit
the share as part of the implementation. Libshare provides a framework
for both NFS and SMB but some operating systems may not fully support
these protocols or all features of the protocol.

NFS Operation:
For linux, libshare updates /etc/exports.d/zfs.exports to add
and remove shares and then commits the changes by invoking
'exportfs -r'. This file, is automatically read by the kernel NFS
implementation which makes for better integration with the NFS systemd
service. For FreeBSD, libshare updates /etc/zfs/exports to add and
remove shares and then commits the changes by sending a SIGHUP to
mountd.

SMB Operation:
For linux, libshare adds and removes files in /var/lib/samba/usershares
by calling the 'net' command directly. There is no need to commit the
changes. FreeBSD does not support SMB.

== Performance Results

To test sharing performance we created a pool with an increasing number
of datasets and invoked various zfs actions that would enable and
disable sharing. The performance testing was limited to NFS sharing.
The following tests were performed on an 8 vCPU system with 128GB and
a pool comprised of 4 50GB SSDs:

Scale testing:
- Share all filesystems in parallel -- zfs sharenfs=on <dataset> &
- Unshare all filesystems in parallel -- zfs sharenfs=off <dataset> &

Functional testing:
- share each filesystem serially -- zfs share -a
- unshare each filesystem serially -- zfs unshare -a
- reset sharenfs property and unshare -- zfs inherit -r sharenfs <pool>

For 'zfs sharenfs=on' scale testing we saw an average reduction in time
of 89.43% and for 'zfs sharenfs=off' we saw an average reduction in time
of 83.36%.

Functional testing also shows a huge improvement:
- zfs share -- 97.97% reduction in time
- zfs unshare -- 96.47% reduction in time
- zfs inhert -r sharenfs -- 99.01% reduction in time

Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Bryant G. Ly <bryangly@gmail.com>
Signed-off-by: George Wilson <gwilson@delphix.com>
External-Issue: DLPX-68690
Closes #1603
Closes #7692
Closes #7943
Closes #10300
2020-07-13 09:19:18 -07:00
Matthew Ahrens e59a377a8f filesystem_limit/snapshot_limit is incorrectly enforced against root
The filesystem_limit and snapshot_limit properties limit the number of
filesystems or snapshots that can be created below this dataset.
According to the manpage, "The limit is not enforced if the user is
allowed to change the limit."  Two types of users are allowed to change
the limit:

1. Those that have been delegated the `filesystem_limit` or
`snapshot_limit` permission, e.g. with
`zfs allow USER filesystem_limit DATASET`.  This works properly.

2. A user with elevated system privileges (e.g. root).  This does not
work - the root user will incorrectly get an error when trying to create
a snapshot/filesystem, if it exceeds the `_limit` property.

The problem is that `priv_policy_ns()` does not work if the `cred_t` is
not that of the current process.  This happens when
`dsl_enforce_ds_ss_limits()` is called in syncing context (as part of a
sync task's check func) to determine the permissions of the
corresponding user process.

This commit fixes the issue by passing the `task_struct` (typedef'ed as
a `proc_t`) to syncing context, and then using `has_capability()` to
determine if that process is privileged.  Note that we still need to
pass the `cred_t` to syncing context so that we can check if the user
was delegated this permission with `zfs allow`.

This problem only impacts Linux.  Wrappers are added to FreeBSD but it
continues to use `priv_check_cred()`, which works on arbitrary `cred_t`.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #8226
Closes #10545
2020-07-11 17:18:02 -07:00
Ryan Moeller 217f48373f libzfs: Add error message for why creating mountpoint failed
When zfs_mount_at() fails to stat the mountpoint and can't create the
directory, we return an error with a message "failed to create
mountpoint" but there is no indication why it failed.

Add the error string from the syscall to the error aux message.

Update do_mount for Linux to return the errno instead of -1.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10550
2020-07-11 17:16:13 -07:00
Matthew Macy 3933305eac FreeBSD: Use a hash table for taskqid lookups
Previously a tqent could be recycled prematurely, update the
code to use a hash table for lookups to resolve this.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #10529
2020-07-11 17:13:45 -07:00
Serapheim Dimitropoulos 6f1db5f37e Unconditionally enable debugging for libzpool
We already enable -DDEBUG unconditionally (meaning regardless
of this is a debug build or a performance build) for zdb and
ztest as they are mostly used for development and debugging.

This patch enables -DDEBUG for libzpool extending the debugging
checks for zdb, ztest, and a couple of other test utilities.

In addition to passing -DDEBUG we also enable -DZFS_DEBUG so
all assertion checks work s expected. We do so not only in
libzpool but in every utility that links to it, even if the
utility doesn't directly use any functionality wrapped in
ZFS_DEBUG macro definitions. The reason is that these utilities
may still include headers that contain structs that have more
fields when ZFS_DEBUG is defined. This can be a problem as
enabling that flag for libzpool but not for zdb can lead into
random problems (e.g. segmentation faults) as zdb may be have
an incorrect view of a struct passed to it by libzpool.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Closes #10549
2020-07-10 15:30:31 -07:00
Arvind Sankar f040a7b0f8 Fix up FIND_SYSTEM_LIBRARY to work with cross-compiling
Make FIND_SYSTEM_LIBRARY respect a configured sysroot, otherwise it
might find headers from the build machine and assume the library is
available on the host/target.

Tighten up error checking: if pkg-config or the user specified _CFLAGS
or _LIBS but we can't find the header/library, issue a fatal error.

Fix the -L flag to /usr/local/lib instead of just /usr/local.

Clean out the _CFLAGS and _LIBS if we located something that we later
find doesn't work.

Rename FIND_SYSTEM_LIBRARY into the ZFS_AC_ scope.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Closes #10538
2020-07-10 14:26:40 -07:00
Arvind Sankar 3e597dee11 Use abs_top_builddir when referencing libraries
libtool stores absolute paths in the dependency_libs component of the
.la files. If the Makefile for a dependent library refers to the
libraries by relative path, some libraries end up duplicated on the link
command line.

As an example, libzfs specifies libzfs_core, libnvpair and libuutil as
dependencies to be linked in. The .la file for libzfs_core also
specifies libnvpair, but using an absolute path, with the result that
libnvpair is present twice in the linker command line for producing
libzfs.

While the only thing this causes is to slightly slow down the linking,
we can avoid it by using absolute paths everywhere, including for
convenience libraries just for consistency.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Closes #10538
2020-07-10 14:26:32 -07:00
Arvind Sankar af65916226 Add -z defs to LDFLAGS
This will make sure the installed libraries are linked with everything
they require.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Closes #10538
2020-07-10 14:26:22 -07:00
Arvind Sankar 1537105a8c Add config.rpath for AM_GNU_GETTEXT
Commit e8864b1b28 ("config: libintl/libiconv for gettext() detection")
added an empty config.rpath with a comment that the real one doesn't
work with libtool.

However, an empty config.rpath doesn't really work: eg. on FreeBSD,
where libintl is in /usr/local/lib, configure thinks that gettext
doesn't exist and NLS should be disabled, which currently isn't
supported in the source, and hence requires manual workaround to
directly link -lintl without relying on configure. config.rpath is
essential to let it be detected either in --prefix or using
--with-libintl-prefix.

I also don't see the mentioned issue with libtool flags applied to
compilation, it seems to work fine to pass LTLIBINTL to libtool. It's
unnecessary to include LTLIBICONV as the configure test will
automatically append that to LTLIBINTL if it is necessary to link with
libiconv.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Closes #10538
2020-07-10 14:26:12 -07:00
Arvind Sankar 4d61ade1a3 Clean up lib dependencies
libzutil is currently statically linked into libzfs, libzfs_core and
libzpool. Avoid the unnecessary duplication by removing it from libzfs
and libzpool, and adding libzfs_core to libzpool.

Remove a few unnecessary dependencies:
- libuutil from libzfs_core
- libtirpc from libspl
- keep only libcrypto in libzfs, as we don't use any functions from
  libssl
- librt is only used for clock_gettime, however on modern systems that's
  in libc rather than librt. Add a configure check to see if we actually
  need librt
- libdl from raidz_test

Add a few missing dependencies:
- zlib to libefi and libzfs
- libuuid to zpool, and libuuid and libudev to zed
- libnvpair uses assertions, so add assert.c to provide aok and
  libspl_assertf

Sort the LDADD for programs so that libraries that satisfy dependencies
come at the end rather than the beginning of the linker command line.

Revamp the configure tests for libaries to use FIND_SYSTEM_LIBRARY
instead. This can take advantage of pkg-config, and it also avoids
polluting LIBS.

List all the required dependencies in the pkgconfig files, and move the
one for libzfs_core into the latter's directory. Install pkgconfig files
in $(libdir)/pkgconfig on linux and $(prefix)/libdata/pkgconfig on
FreeBSD, instead of /usr/share/pkgconfig, as the more correct location
for library .pc files.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Closes #10538
2020-07-10 14:26:00 -07:00
Arvind Sankar b6437ea41c Move libspl_assertf into .c file
Variadic functions cannot be inlined. libspl_assertf ends up being
duplicated in every file that uses it.

Fix this by moving the function into a new assert.c. Also move the
definition of aok into the new file instead of zone.c.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Closes #10538
2020-07-10 14:25:24 -07:00
George Amanakis 2054f35e56 Fix a persistent L2ARC bug in l2arc_write_done()
In case l2arc_write_done() handles a zio that was not successful check
that the list of log block pointers is not empty when restoring them
in the device header. Otherwise zero them out. In any case perform the
actual write updating the device header after the zio of
l2arc_write_buffers() completes as l2arc_write_done() may have touched
the memory holding the log block pointers in the device header.

Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #10540 
Closes #10543
2020-07-10 14:10:03 -07:00
Gabriel A. Devenyi d2bce6d036 Add Makefile command to run checkbashisms on all /bin/sh scripts
Based on the shellcheck make target, add a target which checks
for violations of POSIX standards for shell scripts

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Gabriel A. Devenyi <gdevenyi@gmail.com>
Closes #10513
2020-07-09 20:04:49 -07:00
Ryan Moeller a2ec738c75 ZTS: Make bc conditional use compatible with new BSD bc
FreeBSD recently replaced the GNU bc and dc in the base system with
BSD licensed versions.  They are supposed to be compatible with all
the features present in the GNU versions, but it turns out they are
picky about `if` statements having a corresponding `else`.  ZTS uses
`echo "if ($x > $y) 1" | bc` in a few places, which causes tests to
fail unexpectedly with the new bc.

Change the two expressions in ZTS to `if ($x > $y) 1 else 0` for
compatibility with the new BSD bc.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10551
2020-07-09 17:49:02 -07:00
Ryan Moeller 659f4008be libzfs: Make zfs_cmd_t initialization consistent, use zfs_ioctl
The clang version 8.0.1 shipped in FreeBSD 12.1-RELEASE also oddly
throws a warning that is treated as an error on the initialization of
the zc struct in zpool_nextboot.

The zpool_nextboot code from FreeBSD was not updated to use zfs_ioctl.

Switch ioctl to zfs_ioctl in and use {"\0"} to initialize the struct.
Do a consistency pass for zfs_cmd_t initialization.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Wilson <gwilson@delphix.com>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10539
2020-07-09 17:47:12 -07:00
Ryan Moeller fb91f0367e Add zpool_nextboot, move zfs_jail to libzfs.h
FreeBSD has a zfsbootcfg command that wants zpool_nextboot in libzfs.

Add the function to FreeBSD's libzfs_compat.c, and while here move
the prototype for zfs_jail out of param.h in FreeBSD's SPL and into
libzfs.h under an ifdef for FreeBSD, where the prototype for
zpool_nextboot joins it.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10524
2020-07-06 11:57:24 -07:00
Mark Johnston cd32b4f5b7 Fix a deadlock in the FreeBSD getpages VOP
FreeBSD has a per-page "busy" lock which is held when handling a page
fault on a mapped file.  This lock is also acquired when copying data
from the DMU to the page cache in zfs_write().  File range locks are
also acquired in both of these paths, in the opposite order with respect
to the busy lock.

In the getpages VOP, the range lock is only used to determine the extent
of optional read-ahead and read-behind operations.  To resolve the lock
order reversal, modify the getpages VOP to avoid blocking on the range
lock.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes #10519
2020-07-06 11:53:57 -07:00
Mark Johnston 6e00561712 Add a "try" operation for range locks
zfs_rangelock_tryenter() bails immediately instead of waiting for the
lock to become available.  This will be used to resolve a deadlock in
the FreeBSD page-in code.  No functional change intended.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes #10519
2020-07-06 11:53:31 -07:00
winglq a4b0a74c7f Fix atomic_clear_long_excl wrong return
When clearing a bit, we should check whether that bit is 0.
Note atomic_clear_long_excl is not used.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Liu Qing <winglq@gmail.com>
Closes #10526
2020-07-06 11:46:17 -07:00
Ryan Moeller 8a3d9186ba Update zfs_freebsd_need_inactive to fix mmapped writes
`zfs_freebsd_need_inactive` appears to been based on an unfinished
version of https://reviews.freebsd.org/D22130 which had a bug where
files written via mmap wouldn't actually persist.

Update the function to match the final version committed to FreeBSD.

Authored-by: Mateusz Guzik <mjg@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10527 
Closes #10528
2020-07-03 11:30:04 -07:00
Brian Behlendorf 9a49d3f3d3 Add device rebuild feature
The device_rebuild feature enables sequential reconstruction when
resilvering.  Mirror vdevs can be rebuilt in LBA order which may
more quickly restore redundancy depending on the pools average block
size, overall fragmentation and the performance characteristics
of the devices.  However, block checksums cannot be verified
as part of the rebuild thus a scrub is automatically started after
the sequential resilver completes.

The new '-s' option has been added to the `zpool attach` and
`zpool replace` command to request sequential reconstruction
instead of healing reconstruction when resilvering.

    zpool attach -s <pool> <existing vdev> <new vdev>
    zpool replace -s <pool> <old vdev> <new vdev>

The `zpool status` output has been updated to report the progress
of sequential resilvering in the same way as healing resilvering.
The one notable difference is that multiple sequential resilvers
may be in progress as long as they're operating on different
top-level vdevs.

The `zpool wait -t resilver` command was extended to wait on
sequential resilvers.  From this perspective they are no different
than healing resilvers.

Sequential resilvers cannot be supported for RAIDZ, but are
compatible with the dRAID feature being developed.

As part of this change the resilver_restart_* tests were moved
in to the functional/replacement directory.  Additionally, the
replacement tests were renamed and extended to verify both
resilvering and rebuilding.

Original-patch-by: Isaac Huang <he.huang@intel.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: John Poduska <jpoduska@datto.com>
Co-authored-by: Mark Maybee <mmaybee@cray.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #10349
2020-07-03 11:05:50 -07:00
Matthew Macy 7ddb753d17 freebsd: changes necessary to coexist with dtrace in tree
Fix header conflicts when building zfs with openzfs as a vendor import.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #10497
2020-07-01 09:10:08 -07:00
Harald van Dijk 22831636c8 configure fixes
a+=b is not supported by all shells. It is equivalent to a=${a}b, so
just rewrite it as that.

This also fixes commit 9ea6c3d3, which intended to only make the
definitions of _dracutdir, _udevdir, and _udevruledir conditional, but
actually ensured that _initconfdir no longer got defined if _dracutdir
was defined, and defined _udevdir to the value that should have been
used for _udevruledir.

This also fixes the fact that the checks introduced by commit 9ea6c3d3
could never work: ZFS_AC_PACKAGE was called before the configuration
options were processed.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Harald van Dijk <harald@gigawatt.nl>
Closes #10518
2020-07-01 09:05:21 -07:00
Harald van Dijk 2ac6aa1176 pam: fix test usage in configure script
The standard test command does not support the == operator. Certain
shells, including bash, do support it, but in those shells it does
exactly the same thing as the standard = operator. Use that instead.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Harald van Dijk <harald@gigawatt.nl>
Closes #10509
2020-06-29 09:28:22 -07:00
Matthew Ahrens 3c42c9ed84 Clean up OS-specific ARC and kmem code
OS-specific code (e.g. under `module/os/linux`) does not need to share
its code structure with any other operating systems.  In particular, the
ARC and kmem code need not be similar to the code in illumos, because we
won't be syncing this OS-specific code between operating systems.  For
example, if/when illumos support is added to the common repo, we would
add a file `module/os/illumos/zfs/arc_os.c` for the illumos versions of
this code.

Therefore, we can simplify the code in the OS-specific ARC and kmem
routines.

These changes do not impact system behavior, they are purely code
cleanup.  The changes are:

Arenas are not used on Linux or FreeBSD (they are always `NULL`), so
`heap_arena`, `zio_arena`, and `zio_alloc_arena` can be removed, along
with code that uses them.

In `arc_available_memory()`:
 * `desfree` is unused, remove it
 * rename `freemem` to avoid conflict with pre-existing `#define`
 * remove checks related to arenas
 * use units of bytes, rather than converting from bytes to pages and
   then back to bytes

`SPL_KMEM_CACHE_REAP` is unused, remove it.

`skc_reap` is unused, remove it.

The `count` argument to `spl_kmem_cache_reap_now()` is unused, remove
it.

`vmem_size()` and associated type and macros are unused, remove them.

In `arc_memory_throttle()`, use a less confusing variable name to store
the result of `arc_free_memory()`.

Reviewed-by: George Wilson <gwilson@delphix.com>
Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #10499
2020-06-29 09:01:07 -07:00
Arvind Sankar 94a2dca6a0 Mark phony targets as phony.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Closes #10506
2020-06-27 17:40:15 -07:00
Arvind Sankar 2b5f3045ae paxcheck needs to run against the builddir not the srcdir
Otherwise it does nothing on an out-of-tree build.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Closes #10506
2020-06-27 17:40:15 -07:00
Arvind Sankar c79907f901 Move cppcheck suppressions out of .github
Move this file out of .github and add it to distribution.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Closes #10506
2020-06-27 17:40:15 -07:00
Arvind Sankar a86f0be8ef Remove config/suppressed-warnings.txt
This doesn't appear to be used by the buildbot any more, let's remove
it.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Closes #10506
2020-06-27 17:40:15 -07:00
Arvind Sankar 4d8e68c42f Avoid installing kernel headers on FreeBSD
The kernel headers are installed for DKMS on linux, so don't install
them unless we're building on linux.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Closes #10506
2020-06-27 17:40:14 -07:00
Brian Behlendorf 67b1362f04 Style fixes
* Fix cstyle issue in shrinker.h which exceeded 80 columns.
* Silence shellcheck warning in zpool.d/smart script.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2020-06-27 17:38:55 -07:00
Allan Jude 3bc92b9ef6 Make zstreamdump output the size of the payload for BEGIN records
This is helpful for determining the size of the nvlist of snapshots
and properties

Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Closes #10505
2020-06-27 10:29:47 -07:00
Matthew Ahrens 270ece24b6 Revise SPL wrapper for shrinker callbacks
The SPL provides a wrapper for the kernel's shrinker callbacks, which
enables the ZFS code to interface with multiple versions of the shrinker
API's from different kernel versions.  Specifically, Linux kernels 3.0 -
3.11 has a single "combined" callback, and Linux kernels 3.12 and later
have two "split" callbacks.  The SPL provides a wrapper function so that
the ZFS code only needs to implement one version of the callbacks.

Currently the SPL's wrappers are designed such that the ZFS code
implements the older, "combined" callback.  There are a few downsides to
this approach:

* The general design within ZFS is for the latest Linux kernel to be
considered the "first class" API.

* The newer, "split" callback API is easier to understand, because each
callback has one purpose.

* The current wrappers do not completely abstract out the differing
API's, so ZFS code needs `#ifdef` code to handle the differing return
values required for different kernel versions.

This commit addresses these drawbacks by having the ZFS code provide the
latest, "split" callbacks, and the SPL provides a wrapping function for
the older, "combined" API.

Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #10502
2020-06-27 10:27:02 -07:00
Serapheim Dimitropoulos ec1fea4516 Use percpu_counter for obj_alloc counter of Linux-backed caches
A previous commit enabled the tracking of object allocations
in Linux-backed caches from the SPL layer for debuggability.
The commit is: 9a170fc6fe54f1e852b6c39630fe5ef2bbd97c16

Unfortunately, it also introduced minor performance regressions
that were highlighted by the ZFS perf test-suite. Within Delphix
we found that the regression would be from -1%, all the way up
to -8% for some workloads.

This commit brings performance back up to par by creating a
separate counter for those caches and making it a percpu in
order to avoid lock-contention.

The initial performance testing was done by myself, and the
final round was conducted by @tonynguien who was also the one
that discovered the regression and highlighted the culprit.

Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Closes #10397
2020-06-26 18:06:50 -07:00
Matthew Ahrens 7b232e9354 arcstat: add 'avail', fix 'free'
The meaning of the `free` field is currently `zfs_arc_sys_free`, which
is the target amount of memory to leave free for the system, and is
constant after booting.

This commit changes the meaning of `free` to arc_free_memory(), the
amount of memory that the ARC considers to be free.

It also adds a new arcstat field `avail`, which tracks
`arc_available_memory()`.

Since `avail` can be negative, it also updates the arcstat script to
pretty-print negative values.

example output:

  $ arcstat -f time,miss,arcsz,c,grow,need,free,avail 1
      time  miss  arcsz     c  grow  need  free  avail
  15:03:02   39K   114G  114G     0     0  2.4G  407M
  15:03:03   42K   114G  114G     0     0  2.1G  120M
  15:03:04   40K   114G  114G     0     0  1.8G  -177M
  15:03:05   24K   113G  112G     0     0  1.7G  -269M
  15:03:06   29K   111G  110G     0     0  1.6G  -385M
  15:03:07   27K   110G  108G     0     0  1.4G  -535M
  15:03:08   13K   108G  108G     0     0  2.2G  239M
  15:03:09   33K   107G  107G     0     0  1.3G  -639M
  15:03:10   16K   105G  102G     0     0  2.6G  704M
  15:03:11  7.2K   102G  102G     0     0  5.1G  3.1G
  15:03:12   42K   103G  102G     0     0  4.8G  2.8G

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #10494
2020-06-26 18:05:28 -07:00
Robert Novak bfcbec6f5d Add block histogram to zdb
The block histogram tracks the changes to psize, lsize and asize
both in the count of the number of blocks (by blocksize) and the
total length of all of the blocks for that blocksize.  It also
keeps a running total of the cumulative size of all of the blocks
up to each size to help determine the size of caching SSDs to be
added to zfs hardware deployments.

The block history counts and lengths are summarized in bins
which are powers of two. Even rows with counts of zero are printed.

This change is accessed by specifying one of two options:

zdb -bbb pool
zdb -Pbbb pool

The first version prints the table in fixed size columns.
The second prints in "parseable" output that can be placed into
a CSV file.

Fixed Column, nicenum output sample:
  block   psize                lsize                asize
   size   Count Length   Cum.  Count Length   Cum.  Count Length   Cum.
    512:  3.50K  1.75M  1.75M  3.43K  1.71M  1.71M  3.41K  1.71M  1.71M
     1K:  3.65K  3.67M  5.43M  3.43K  3.44M  5.15M  3.50K  3.51M  5.22M
     2K:  3.45K  6.92M  12.3M  3.41K  6.83M  12.0M  3.59K  7.26M  12.5M
     4K:  3.44K  13.8M  26.1M  3.43K  13.7M  25.7M  3.49K  14.1M  26.6M
     8K:  3.42K  27.3M  53.5M  3.41K  27.3M  53.0M  3.44K  27.6M  54.2M
    16K:  3.43K  54.9M   108M  3.50K  56.1M   109M  3.42K  54.7M   109M
    32K:  3.44K   110M   219M  3.41K   109M   218M  3.43K   110M   219M
    64K:  3.41K   218M   437M  3.41K   218M   437M  3.44K   221M   439M
   128K:  3.41K   437M   874M  3.70K   474M   911M  3.41K   437M   876M
   256K:  3.41K   874M  1.71G  3.41K   874M  1.74G  3.41K   874M  1.71G
   512K:  3.41K  1.71G  3.41G  3.41K  1.71G  3.45G  3.41K  1.71G  3.42G
     1M:  3.41K  3.41G  6.82G  3.41K  3.41G  6.86G  3.41K  3.41G  6.83G
     2M:      0      0  6.82G      0      0  6.86G      0      0  6.83G
     4M:      0      0  6.82G      0      0  6.86G      0      0  6.83G
     8M:      0      0  6.82G      0      0  6.86G      0      0  6.83G
    16M:      0      0  6.82G      0      0  6.86G      0      0  6.83G

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Robert E. Novak <novak5@llnl.gov>
Closes: #9158 
Closes #10315
2020-06-26 15:09:20 -07:00
Arvind Sankar 6b99fc0620 Fixes for make dist
Reduce the usage of EXTRA_DIST. If files are conditionally included in
_SOURCES, _HEADERS etc, automake is smart enough to dist all files that
could possibly be included, but this does not apply to EXTRA_DIST,
resulting in make dist depending on the configuration.

Add some files that were missing altogether in various Makefile's.

The changes to disted files in this commit (excluding deleted files):

+./cmd/zed/agents/README.md
+./etc/init.d/README.md
+./lib/libspl/os/freebsd/getexecname.c
+./lib/libspl/os/freebsd/gethostid.c
+./lib/libspl/os/freebsd/getmntany.c
+./lib/libspl/os/freebsd/mnttab.c
-./lib/libzfs/libzfs_core.pc
-./lib/libzfs/libzfs.pc
+./lib/libzfs/os/freebsd/libzfs_compat.c
+./lib/libzfs/os/freebsd/libzfs_fsshare.c
+./lib/libzfs/os/freebsd/libzfs_ioctl_compat.c
+./lib/libzfs/os/freebsd/libzfs_zmount.c
+./lib/libzutil/os/freebsd/zutil_compat.c
+./lib/libzutil/os/freebsd/zutil_device_path_os.c
+./lib/libzutil/os/freebsd/zutil_import_os.c
+./module/lua/README.zfs
+./module/os/linux/spl/README.md
+./tests/README.md
+./tests/zfs-tests/tests/functional/cli_root/zfs_clone/zfs_clone_rm_nested.ksh
+./tests/zfs-tests/tests/functional/cli_root/zfs_send/zfs_send_encrypted_unloaded.ksh
+./tests/zfs-tests/tests/functional/inheritance/README.config
+./tests/zfs-tests/tests/functional/inheritance/README.state
+./tests/zfs-tests/tests/functional/rsend/rsend_016_neg.ksh
+./tests/zfs-tests/tests/perf/fio/sequential_readwrite.fio

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Closes #10501
2020-06-26 14:20:02 -07:00
Arvind Sankar 7c902a5178 Include FreeBSD sources in module dist
Add os/freebsd and Makefile.bsd into distdir target.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Closes #10501
2020-06-26 14:19:35 -07:00
Matthew Ahrens 67c0f0dedc ARC shrinking blocks reads/writes
ZFS registers a memory hook, `__arc_shrinker_func`, which is supposed to
allow the ARC to shrink when the kernel experiences memory pressure.
The ARC shrinker changes `arc_c` via a call to
`arc_reduce_target_size()`.  Before commit 3ec34e5527, the ARC
shrinker would also evict data from the ARC to bring `arc_size` down to
the new `arc_c`.  However, that commit (seemingly inadvertently) made it
so that the ARC shrinker no longer evicts any data or waits for eviction
to complete.

Repeated calls to the ARC shrinker can reduce `arc_c` drastically, often
all the way to `arc_c_min`.  Since it doesn't wait for the actual
eviction of data from the ARC, this creates a situation where `arc_size`
is more than `arc_c` for the several seconds/minutes it takes for
`arc_adjust_zthr` to evict data from the ARC.  During this time,
arc_get_data_impl() will block, so ZFS can't process read/write requests
(e.g. from iSCSI, NFS, or read/write syscalls).

To ensure that `arc_c` doesn't shrink faster than the adjust thread can
keep up, this commit makes the ARC shrinker wait for the eviction to
complete, resulting in similar behavior to what we had before commit
3ec34e5527.

Note: commit 3ec34e5527 is `OpenZFS 9284 - arc_reclaim_thread
has 2 jobs` and was integrated in December 2018, and is part of ZoL
0.8.x but not 0.7.x.

Additionally, when the ARC size is reduced drastically, the
`arc_adjust_zthr` can be on-CPU for many seconds without blocking.  Any
threads that are bound to the same CPU that arc_adjust_zthr is running
on will not able to run for a long time.

To ensure that CPU-bound threads can make progress, this commit changes
`arc_evict_state_impl()` make a voluntary preemption call,
`cond_resched()`.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Wilson <gwilson@delphix.com>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
External-issue: DLPX-70703
Closes #10496
2020-06-26 10:42:27 -07:00
felixdoerre 221e67040f pam: implement a zfs_key pam module
Implements a pam module for automatically loading zfs encryption keys 
for home datasets. The pam module:

  - loads a zfs key and mounts the dataset when a session opens.
  - unmounts the dataset and unloads the key when the session closes.
  - when the user is logged on and changes the password, the module
    changes the encryption key.

Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: @jengelh <jengelh@inai.de>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Felix Dörre <felix@dogcraft.de>
Closes #9886
Closes #9903
2020-06-24 18:45:44 -07:00
Arvind Sankar 7513807320 Drop unnecessary srcdir paths
There's no need to specify the srcdir explicitly in _HEADERS and
EXTRA_DIST.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Closes #10493
2020-06-24 18:20:18 -07:00
Arvind Sankar 5ca349f95d Fix check for sed --in-place
The test added in commit
  4313a5b4c5 ("Detect if sed supports --in-place")
doesn't work at least on my system (autoconfig-2.69).

The issue is that SED has already been found and cached before this
function is evaluated, with the result that the test is completely
skipped.

...
checking for a sed that does not truncate output... /usr/bin/sed
...
checking for sed --in-place... (cached) /usr/bin/sed

The first test is executed by libtool.m4. This looks to have been around
in libtool for at least 15 years or so, not sure why this was not
encountered at the time of the original commit.

Fix this by caching the value of the ac_inplace flag rather than the
path to SED. Also use $SED and add AC_REQUIRE to ensure that we use the
sed that was located by the standard configure test.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Closes #10493
2020-06-24 18:19:59 -07:00
Arvind Sankar 9642beef2b Add missing third-party license files to dist
These license files were added in commit
  31b160f0a6 ("ICP: Improve AES-GCM performance")
but not added to the distributed files.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Closes #10493
2020-06-24 18:19:51 -07:00
Arvind Sankar 0b03254830 Fix tags targets in module/Makefile.in + cleanup
These targets look to have been copied from an automake-generated
Makefile.in, and can't work since none of the auto-generated automake
variables are defined here.

Moreover, ctags has been overridden in the top-level Makefile, so the
target is pointless anyway, and gtags is not a recursive target.

Fix cscopelist by moving it to the top-level Makefile as well, in line
with ctags and etags.

Also, add -a to ctags command as well, otherwise it won't work if more
than one xargs invocation takes place.

Add assembler files to ctags/etags, prune all dotted-dirs, and restrict
the find to files only.

Cleanup: add .PHONY to module/Makefile.in, and fix one recipe with a
missing continuation character.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Closes #10493
2020-06-24 18:19:44 -07:00
Arvind Sankar 2989d1012a Fix libspl/asm-generic/atomic for VPATH build
Currently, asm-generic/atomic.c is compiled into a .S file, with a
comment saying this is to simplify the upper-level Makefile.

However, this doesn't work properly with a VPATH build, which would
require better logic to deal with generated sources correctly.

It also doesn't seem more complex to just specify the .c/.S source file,
depending on the cpu, instead of only the source directory in
lib/libspl/Makefile.am, which eliminates the need to do the intermediate
compilation.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Closes #10493
2020-06-24 18:19:36 -07:00
Arvind Sankar 109d2c9310 Move zfs_gitrev.h to build directory
Currently an out-of-tree build does not work with read-only source
directory because zfs_gitrev.h can't be created. Move this file to the
build directory, which is more appropriate for a generated file, and
drop the dist-hook for zfs_gitrev.h. There is no need to distribute this
file since it will be regenerated as part of the compilation in any
case.

scripts/make_gitrev.sh tries to avoid updating zfs_gitrev.h if there has
been no change, however this doesn't cover the case when the source
directory is not in git: in that case zfs_gitrev.h gets overwritten even
though it's always "unknown". Simplify the logic to always write out a
new version of zfs_gitrev.h, compare against the old and overwrite only
if different. This is now simple enough to just include in the
Makefile, so drop the script.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Closes #10493
2020-06-24 18:19:28 -07:00
Arvind Sankar 33982eb24c Support out-of-tree kmod build on FreeBSD
If srcdir != builddir, pass down MAKEOBJDIR to the FreeBSD make to
support out-of-tree builds.

Also allow passing all the gmake options that FreeBSD make understands
to support useful flags like -k, -n, -q etc, and detect the number of
CPUs if -j was specified without an argument.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Closes #10493
2020-06-24 18:18:41 -07:00
Kevin P. Fleming f21de6883f Add trim_finish notify script for ZED
Allow users to configure notifications when TRIM operations are
completed on pools. Unlike resilver_finish and scrub_finish,
the trim_finish event is generated for each vdev in the pool
which was trimmed, so the script will generate a notification
for each one.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Kevin P. Fleming <kevin@km6g.us>
Closes #10491
2020-06-24 16:57:13 -07:00
Ryan Moeller 9192f27c1d Add zfs_multihost_interval tunable handler for FreeBSD
This tunable required a handler to be implemented for
ZFS_MODULE_PARAM_CALL.

Add the handler so the tunable can be declared in common code.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10490
2020-06-23 13:32:42 -07:00
Prawn 2451a55368 zfs -V: Print userland version even if kernel module not loaded
Running zfs -V when the modules are not loaded would currently 
result in the following output:

    zfs_version_kernel() failed: No such file or directory

Note the lack of userland version output.  Reorder the code to
ensure the userland version is printed even when the kmods
are not loaded.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: InsanePrawn <insane.prawny@gmail.com>
Closes #10483
2020-06-22 09:56:29 -07:00
Jorgen Lundman 68301ba20e zed additional features
This commit adds two features to zed, that macOS desires. The first
is that when you unload the kernel module, zed would enter into a
cpubusy loop calling zfs_events_next() repeatedly. We now look for
ENODEV, returned by kernel, so zed can exit gracefully.

Second feature is -I (idle) (alas -P persist was taken) is for the
deamon to;

1; if started without ZFS kernel module, stick around waiting for it.
2; if kernel module is unloaded, go back to 1.

This is due to daemons in macOS is started by launchctl, and is
expected to stick around.

Currently, the busy loop only exists when errno is ENODEV. This is
to ensure that functionality that upstream expects is not changed.
It did not care about errors before, and it still does not. (with the
exception of ENODEV).

However, it is probably better that all errors
(ERESTART notwithstanding) exits the loop, and the issues complaining
about zed taking all CPU will go away.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Jorgen Lundman <lundman@lundman.net>
Closes #10476
2020-06-22 09:53:34 -07:00
Serapheim Dimitropoulos 42d8d1d66a Remove unnecessary terminology from error-injection in ztest
Rephrase error-injection comment in ztest to be more clear.

Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: Sara Hartse <sara.hartse@delphix.com>
Signed-off-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Closes #10482
2020-06-22 09:48:36 -07:00
Matthew Ahrens 540493ba4f Clarify comments in config/*.m4, vdev_geom.c, zfs_allow_*.ksh
Rephrase comments to be more clear.

Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #10481
2020-06-22 09:46:37 -07:00
Brian Behlendorf 745ace3f24 Update zts-report.py with additional tests
The following test cases may still occasionally fail and are being
added to the "maybe" list for Linux until they can be updated to be
entirely reliable.

  cli_root/zfs_rename/zfs_rename_002_pos.ksh
  cli_root/zpool_reopen/zpool_reopen_003_pos.ksh
  refreserv/refreserv_raidz

These 6 tests consistently fail only on Fedora 31+, the failures
are related to the kernel rescanning the partition table on loopback
devices which is no longer reliable unless partprobe is used.  In
order to enable the Fedora bot by default they are also being added
to the list until the tests can be updated.  Any significant regression
in functionality covered by these tests will still be detected by the
FreeBSD builders.

  alloc_class/alloc_class_009_pos
  alloc_class/alloc_class_010_pos
  cli_root/zpool_expand/zpool_expand_001_pos
  cli_root/zpool_expand/zpool_expand_005_pos
  rsend/rsend_007_pos
  rsend/rsend_010_pos
  rsend/rsend_011_pos
  snapshot/rollback_003_pos

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #10489
2020-06-22 09:44:49 -07:00
Ryan Moeller 1c08fa8b5b Fix copy-paste error breaking FreeBSD head
Resolve the FreeBSD head build failure.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10480
2020-06-19 15:12:34 -07:00
Andriy Gapon a8bd6dcf87 zfs allow/unallow should work with numeric uid/gid
And that should work even (especially) if there is no matching user or
group name.  The change is originally by Xin Lin <delphij@FreeBSD.org>.

Original-patch-by: Xin Li <delphij@FreeBSD.org>
Reviewed-by: Yuri Pankov <yuri.pankov@nexenta.com>
Reviewed-by: Andy Stormont <astormont@racktopsystems.com>
Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Andriy Gapon <avg@FreeBSD.org>
Closes #9792 
Closes #10280
2020-06-19 10:38:43 -07:00
Ryan Moeller 2e6af52b2e Match new vfs_checkexp KPI in FreeBSD head
KPI changed in FreeBSD, update accordingly.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10475
2020-06-18 13:45:36 -07:00
Arvind Sankar ae7b167a98 Enable -Wmissing-prototypes/-Wstrict-prototypes
Switch on warning flags to detect mismatch between declaration and
definition.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Closes #10470
2020-06-18 12:21:53 -07:00
Arvind Sankar c0673571d0 Switch off -Wmissing-prototypes for libgcc math functions
spl-generic.c defines some of the libgcc integer library functions on
32-bit. Don't bother checking -Wmissing-prototypes since nothing should
directly call these functions from C code.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Closes #10470
2020-06-18 12:21:46 -07:00
Arvind Sankar eebba5d8f4 Make Skein_{Get,Put}64_LSB_First inline functions
Turn the generic versions into inline functions and avoid
SKEIN_PORT_CODE trickery.

Also drop the PLATFORM_MUST_ALIGN check for using the fast bcopy
variants. bcopy doesn't assume alignment, and the userspace version is
currently different because the _ALIGNMENT_REQUIRED macro is only
defined by the kernelspace headers.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Closes #10470
2020-06-18 12:21:38 -07:00
Arvind Sankar 0ce2de637b Add prototypes
Add prototypes/move prototypes to header files.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Closes #10470
2020-06-18 12:21:32 -07:00
Arvind Sankar 60356b1a21 Add include files for prototypes
Include the header with prototypes in the file that provides definitions
as well, to catch any mismatch between prototype and definition.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Closes #10470
2020-06-18 12:21:25 -07:00
Arvind Sankar c3fe42aabd Remove dead code
Delete unused functions.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Closes #10470
2020-06-18 12:21:18 -07:00
Arvind Sankar 65c7cc49bf Mark functions as static
Mark functions used only in the same translation unit as static. This
only includes functions that do not have a prototype in a header file
either.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Closes #10470
2020-06-18 12:20:38 -07:00
Arvind Sankar 1fa5c7af33 Cleanup libzpool/kernel.c
Commit
  ec21397127 ("async zvol minor node creation interferes with receive")
replaced zvol_create_minors with zvol_create_minor and
zvol_create_minors_recursive, changing the prototype at the same time.

However the stub functions in libzpool/kernel.c were defined with the
old prototype. As the definitions are empty, this doesn't cause any
runtime issues, but an LTO build shows warnings because of the
mismatched prototypes.

Commit
  a0bd735adb ("Add support for asynchronous zvol minor operations")
removed the real zvol_remove_minor, but for some reason added a stub
implementation in libzpool/kernel.c with no references. Delete this dead
code.

Commit
  196bee4cfd ("Remove deduplicated send/receive code")
removed zfs_onexit_del_cb and zfs_onexit_cb_data. Drop the stubs as
well.

Add zvol.h include to provide prototypes, and sort the include
directives.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Closes #10470
2020-06-18 12:18:49 -07:00
adilger f734301d22 linux: add basic fallocate(mode=0/2) compatibility
Implement semi-compatible functionality for mode=0 (preallocation)
and mode=FALLOC_FL_KEEP_SIZE (preallocation beyond EOF) for ZPL.

Since ZFS does COW and snapshots, preallocating blocks for a file
cannot guarantee that writes to the file will not run out of space.
Even if the first overwrite was guaranteed, it would not handle any
later overwrite of blocks due to COW, so strict compliance is futile.
Instead, make a best-effort check that at least enough free space is
currently available in the pool (with a bit of margin), then create
a sparse file of the requested size and continue on with life.

This does not handle all cases (e.g. several fallocate() calls before
writing into the files when the filesystem is nearly full), which
would require a more complex mechanism to be implemented, probably
based on a modified version of dmu_prealloc(), but is usable as-is.

A new module option zfs_fallocate_reserve_percent is used to control
the reserve margin for any single fallocate call.  By default, this
is 110% of the requested preallocation size, so an additional 10% of
available space is reserved for overhead to allow the application a
good chance of finishing the write when the fallocate() succeeds.
If the heuristics of this basic fallocate implementation are not
desirable, the old non-functional behavior of returning EOPNOTSUPP
for calls can be restored by setting zfs_fallocate_reserve_percent=0.

The parameter of zfs_statvfs() is changed to take an inode instead
of a dentry, since no dentry is available in zfs_fallocate_common().

A few tests from @behlendorf cover basic fallocate functionality.

Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Arshad Hussain <arshad.super@gmail.com>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Co-authored-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Andreas Dilger <adilger@dilger.ca>
Issue #326
Closes #10408
2020-06-18 11:22:11 -07:00
Jorgen Lundman d553fb9b9e Avoid adding new primitives in zpool wait
zpool wait brought in sem_init() and family, which is a primitive set
not previously used in Open ZFS. It also happens to be deprecated
on macOS. Replace with phtread API calls.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Gallagher <john.gallagher@delphix.com>
Signed-off-by: Jorgen Lundman <lundman@lundman.net>
Closes #10468
2020-06-18 10:44:45 -07:00
Matthew Macy 8056a75672 Disambiguate condvar API contract
On Illumos callers of cv_timedwait and cv_timedwait_hires
can't distinguish between whether or not the cv was signaled
or the call timed out. Illumos handles this (for some definition
of handles) by calling cv_signal in the return path if we were
signaled but the return value indicates instead that we timed
out. This would make sense if it were possible to query the the
cv for its net signal disposition. However, this isn't possible
and, in spite of the fact that there are places in the code that
clearly take a different and incompatible path if a timeout value
is indicated, this distinction appears to be rather subtle to most
developers. This problem is further compounded by the fact that on
Linux, calling cv_signal in the return path wouldn't even do the
right thing unless there are other waiters.

Since it is possible for the caller to independently determine how
much time is remaining but it is not possible to query if the cv
was in fact signaled, prioritizing signalling over timeout seems
like a cleaner solution. In addition, judging from usage patterns
within the code itself, it is also less error prone.

Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #10471
2020-06-18 10:17:50 -07:00
Matthew Macy 7564073ed6 Add abd_cache_reap_now for abd_chunk_cache users
Apparently missed in the initial port integration was
the need to reap the abd_chunk_cache on FreeBSD. This
change addresses that oversight.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #10474
2020-06-17 21:44:13 -07:00
Jorgen Lundman 4458157bee zfs_ioctl: saved_poolname can be truncated
As it uses kmem_strdup() and kmem_strfree() which both rely on
strlen() being the same, but saved_poolname can be truncated causing:

SPL: kernel memory allocator:
buffer freed to wrong cache
SPL: buffer was allocated from kmem_alloc_16,
SPL: caller attempting free to kmem_alloc_8.
SPL: buffer=0xffffff90acc66a38  bufctl=0x0  cache: kmem_alloc_8

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Jorgen Lundman <lundman@lundman.net>
Closes #10469
2020-06-17 14:30:03 -07:00
Alexander Motin 17ca30185a Set initial arc_c to arc_c_min instead of arc_c_max
For at least 15 years since OpenSolaris arc_c was set by default to
arc_c_max, later decreased under memory pressure.  I've noticed that
if arc_c was set high enough to cause memory pressure as considered
by ZFS, setting of arc_no_grow to TRUE in arc_reap_cb_check() makes
no effect until both arc_kmem_reap_soon() and delay(reap_retry_ms)
return.  All that time ZFS can continue increasing its effective ARC
size, causing more memory pressure, potentially up to the point when
OS low memory handler activates and reduces arc_c, requesting fast
reclamation of just allocated memory.

The problem seems to be more serious on FreeBSD and I guess Linux,
since neither of them implement/use asynchronous kmem reclamation,
so arc_kmem_reap_soon() can take more time.  On older FreeBSD 11 not
supporting multiple memory domains system with lots of RAM can get
completely unresponsive for minutes due to heavy lock congestion
between ARC reclamation and page daemon kmem reclamation threads.
With this change to more conservative arc_c value ARC stops growing
just it time and does not need later reclamation.

Also while there, since now growing arc_c is a more often situation,
use aggsum_upper_bound() instead of aggsum_compare() in arc_adapt()
to reduce lock congestion.  It is also getting in sync with code in
arc_get_data_impl().

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Allan Jude <allanjude@freebsd.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #10437
2020-06-17 14:27:04 -07:00
Joao Carlos Mendes Luis fccaea454c Merge bash_completions changes from upstream
The current bash_completion contrib code in openzfs is very old, and
some changes have been added since.

The original repo is at https://github.com/Aneurin/zfs-bash

I've been using the original @Aneurin code since my first deploy of ZoL.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: João Carlos Mendes Luís <jonny@jonny.eng.br>
Closes #10456
2020-06-16 12:27:23 -07:00
Jorgen Lundman cd07d7c83f drr_begin: can't forward declare untagged struct
When compiling with Clang++ it does not allow for untagged structs, so
struct ddr_begin needs to be declared before the struct that uses it.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Jorgen Lundman <lundman@lundman.net>
Closes #10453
2020-06-16 11:57:04 -07:00
Ryan Moeller 86a0f49483 FreeBSD: Kernel module should depend on xdr not krpc after 1300092
Since https://reviews.freebsd.org/D24408 FreeBSD provides XDR functions
in the xdr module instead of krpc.

For FreeBSD 13, the MODULE_DEPEND should be changed to xdr

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10442 
Closes #10443
2020-06-16 11:47:04 -07:00
Jorgen Lundman d366c8fd7a Make struct vdev_disk_t be platform private
Linux defines different vdev_disk_t members to macOS, but they are
only used in vdev_disk.c so move the declaration there.

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Jorgen Lundman <lundman@lundman.net>
Closes #10452
2020-06-16 11:43:33 -07:00
Matthew Ahrens ba54b180a5 Remove refences to blacklist/whitelist
These terms reinforce the incorrect notion that black is bad and white
is good.

Replace this language with more specific terms which are also more clear
and don't rely on metaphor.  Specifically:

* When vdevs are specified on the command line, they are the "selected"
vdevs.

* Entries in /dev/ which should not be considered as possible disks are
"excluded" devices.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Wilson <gwilson@delphix.com>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #10457
2020-06-16 11:41:45 -07:00
Brian Atkinson 0a03495e3e Fixing ABD struct allocation for FreeBSD
In the event we are allocating a gang ABD in FreeBSD we are passing 0
to abd_alloc_struct(); however, this led to an allocation of ABD scatter
with 0 chunks. This left the gang ABD allocation 24 bytes smaller than
it should have been.

Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Matt Macy <mmacy@FreeBSD.org>
Signed-off-by: Brian Atkinson <batkinson@lanl.gov>
Closes #10431
2020-06-16 10:05:22 -07:00
Ryan Moeller c13facb9c4 Fix FreeBSD condvar semantics
We should return -1 instead of negative deltas, and 0 if signaled.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10460
2020-06-16 09:59:31 -07:00
Jorgen Lundman 883a40fff4 Add convenience wrappers for common uio usage
The macOS uio struct is opaque and the API must be used, this
makes the smallest changes to the code for all platforms.

Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Jorgen Lundman <lundman@lundman.net>
Closes #10412
2020-06-14 10:09:55 -07:00
Jorgen Lundman 4f73576ea1 Upstream: zil_commit_waiter() can stall forever
On macOS clock_t is unsigned, so when cv_timedwait_hires() returns -1
we loop forever. The conditional was tweaked to ignore signedness.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Jorgen Lundman <lundman@lundman.net>
Closes #10445
2020-06-14 10:08:21 -07:00
George Amanakis f2edc0078f Fix gcc10.1 truncation error
gcc10.1 complains with:

../../include/sys/dmu.h:373:24: error: ‘%s’ directive output may be
truncated writing up to 95 bytes into a region of size 75
[-Werror=format-truncation=]
  373 | #define DMU_POOL_DDT   "DDT-%s-%s-%s"
      |                        ^~~~~~~~~~~~~~
../../module/zfs/ddt.c:256:37: note: in expansion of macro
‘DMU_POOL_DDT’
  256 |  (void) snprintf(name, DDT_NAMELEN, DMU_POOL_DDT,
      |                                     ^~~~~~~~~~~~
../../include/sys/dmu.h:373:32: note: format string is defined here
  373 | #define DMU_POOL_DDT   "DDT-%s-%s-%s"
      |                                ^~
../../module/zfs/ddt.c:256:9: note: ‘snprintf’ output 7 or more bytes
(assuming 102) into a destination of size 80
  256 |  (void) snprintf(name, DDT_NAMELEN, DMU_POOL_DDT,
      |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  257 |      zio_checksum_table[ddt->ddt_checksum].ci_name,
      |      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  258 |      ddt_ops[type]->ddt_op_name, ddt_class_name[class]);
      |      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Increasing DTT_NAMELEN fixes it.

Reviewed-By: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #10433
2020-06-13 11:02:00 -07:00
Ryan Moeller 499dccd69b FreeBSD: Don't require zeroing new locks before init
This has not shown to be of use enough to justify the inconvenience.

Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Allan Jude <allanjude@freebsd.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10449
2020-06-13 10:58:10 -07:00
Brian Atkinson e08b993396 Removing ZERO_PAGE abd_alloc_zero_scatter
For MIPS architectures on Linux the ZERO_PAGE macro references
empty_zero_page, which is exported as a GPL symbol. The call to
ZERO_PAGE in abd_alloc_zero_scatter has been removed and a single
zero'd page is now allocated for each of the pages in abd_zero_scatter
in the kernel ABD code path.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Brian Atkinson <batkinson@lanl.gov>
Closes #10428
2020-06-10 17:54:11 -07:00
Grischa Zengel 059f7c20e3 man.8: Add bookmark to list of types
While checking bash_completion I missed bookmark as type.

```
# zfs get type zpool2#b
NAME      PROPERTY  VALUE     SOURCE
zpool2#b  type      bookmark  -
```

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Grischa Zengel <github.zfsonlinux@zengel.info>
Closes #10419
2020-06-10 17:53:07 -07:00
Grischa Zengel 2bc07c6dff bash_completion: add missing attributes
There a some attributes missing which are shown in man pages:
zfs list -t type
           A comma-separated list of types to display, where type is one of filesystem, snapshot, volume, *bookmark*, or all.  For example, specifying -t snapshot displays only snapshots.
zfs get -s source
           A comma-separated list of sources to display.  Those properties coming from a source other than those in this list are ignored.  Each source must be one of the following: local, default, inherited, temporary, *received*, and none.  The default value is all sources.
zfs get -t type
           A comma-separated list of types to display, where type is one of filesystem, snapshot, volume, bookmark, or all.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Grischa Zengel <github.zfsonlinux@zengel.info>
Closes #10418
2020-06-10 17:51:15 -07:00
Matthew Ahrens f66434268c Remove unnecessary references to slavery
The horrible effects of human slavery continue to impact society.  The
casual use of the term "slave" in computer software is an unnecessary
reference to a painful human experience.

This commit removes all possible references to the term "slave".

Implementation notes:

The zpool.d/slaves script is renamed to dm-deps, which uses the same
terminology as `dmsetup deps`.

References to the `/sys/class/block/$dev/slaves` directory remain.  This
directory name is determined by the Linux kernel.  Although
`dmsetup deps` provides the same information, it unfortunately requires
elevated privileges, whereas the `/sys/...` directory is world-readable.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #10435
2020-06-10 17:07:59 -07:00
Ryan Moeller feff3f69fc Fixup "Avoid the GEOM topology lock recursion when autoexpanding a pool"
The patch was applied to vdev_geom_open instead of vdev_geom_close by
mistake.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10427
2020-06-10 11:05:15 -07:00
Arvind Sankar 66786f7943 Fix VPATH builds for user config
cmd/zpool and lib/libzutil Makefile's use -I., which won't work with a
VPATH build. Replace it with -I$(srcdir) instead.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Closes #10379
Closes #10421
2020-06-10 09:25:37 -07:00
Arvind Sankar 71504277ae Cleanup linux module kbuild files
The linux module can be built either as an external module, or compiled
into the kernel, using copy-builtin. The source and build directories
are slightly different between the two cases, and currently, compiling
into the kernel still refers to some files from the configured ZFS
source tree, instead of the copies inside the kernel source tree. There
is also duplication between copy-builtin, which creates a Kbuild file to
build ZFS inside the kernel tree, and the top-level module/Makefile.in.

Fix this by moving the list of modules and the CFLAGS settings into a
new module/Kbuild.in, which will be used by the kernel kbuild
infrastructure, and using KBUILD_EXTMOD to distinguish the two cases
within the Makefiles, in order to choose appropriate include
directories etc.

Module CFLAGS setting is simplified by using subdir-ccflags-y (available
since 2.6.30) to set them in the top-level Kbuild instead of each
individual module. The disabling of -Wunused-but-set-variable is removed
from the lua and zfs modules. The variable that the Makefile uses is
actually not defined, so this has no effect; and the warning has long
been disabled by the kernel Makefile itself.

The target_cpu definition in module/{zfs,zcommon} is removed as it was
replaced by use of CONFIG_SPARC64 in
  commit 70835c5b75 ("Unify target_cpu handling")

os/linux/{spl,zfs} are removed from obj-m, as they are not modules in
themselves, but are included by the Makefile in the spl and zfs module
directories. The vestigial Makefiles in os and os/linux are removed.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Closes #10379
Closes #10421
2020-06-10 09:24:15 -07:00
Andrea Gelmini dd4bc569b9 Fix typos
Correct various typos in the comments and tests.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Andrea Gelmini <andrea.gelmini@gelma.net>
Closes #10423
2020-06-09 21:24:09 -07:00
Matthew Ahrens 7bcb7f0840 File incorrectly zeroed when receiving incremental stream that toggles -L
Background:

By increasing the recordsize property above the default of 128KB, a
filesystem may have "large" blocks.  By default, a send stream of such a
filesystem does not contain large WRITE records, instead it decreases
objects' block sizes to 128KB and splits the large blocks into 128KB
blocks, allowing the large-block filesystem to be received by a system
that does not support the `large_blocks` feature.  A send stream
generated by `zfs send -L` (or `--large-block`) preserves the large
block size on the receiving system, by using large WRITE records.

When receiving an incremental send stream for a filesystem with large
blocks, if the send stream's -L flag was toggled, a bug is encountered
in which the file's contents are incorrectly zeroed out.  The contents
of any blocks that were not modified by this send stream will be lost.
"Toggled" means that the previous send used `-L`, but this incremental
does not use `-L` (-L to no-L); or that the previous send did not use
`-L`, but this incremental does use `-L` (no-L to -L).

Changes:

This commit addresses the problem with several changes to the semantics
of zfs send/receive:

1. "-L to no-L" incrementals are rejected.  If the previous send used
`-L`, but this incremental does not use `-L`, the `zfs receive` will
fail with this error message:

    incremental send stream requires -L (--large-block), to match
    previous receive.

2. "no-L to -L" incrementals are handled correctly, preserving the
smaller (128KB) block size of any already-received files that used large
blocks on the sending system but were split by `zfs send` without the
`-L` flag.

3. A new send stream format flag is added, `SWITCH_TO_LARGE_BLOCKS`.
This feature indicates that we can correctly handle "no-L to -L"
incrementals.  This flag is currently not set on any send streams.  In
the future, we intend for incremental send streams of snapshots that
have large blocks to use `-L` by default, and these streams will also
have the `SWITCH_TO_LARGE_BLOCKS` feature set. This ensures that streams
from the default use of `zfs send` won't encounter the bug mentioned
above, because they can't be received by software with the bug.

Implementation notes:

To facilitate accessing the ZPL's generation number,
`zfs_space_delta_cb()` has been renamed to `zpl_get_file_info()` and
restructured to fill in a struct with ZPL-specific info including owner
and generation.

In the "no-L to -L" case, if this is a compressed send stream (from
`zfs send -cL`), large WRITE records that are being written to small
(128KB) blocksize files need to be decompressed so that they can be
written split up into multiple blocks.  The zio pipeline will recompress
each smaller block individually.

A new test case, `send-L_toggle`, is added, which tests the "no-L to -L"
case and verifies that we get an error for the "-L to no-L" case.

Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #6224 
Closes #10383
2020-06-09 10:41:01 -07:00
Igor K 6722be2823 ZTS: Fix add-o_ashift.ksh
Use option '-o' after action for compatibility

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Igor Kozhukhov <igor@dilos.org>
Closes #10426
2020-06-09 10:31:16 -07:00
George Amanakis b7654bd794 Trim L2ARC
The l2arc_evict() function is responsible for evicting buffers which
reference the next bytes of the L2ARC device to be overwritten. Teach
this function to additionally TRIM that vdev space before it is
overwritten if the device has been filled with data. This is done by
vdev_trim_simple() which trims by issuing a new type of TRIM,
TRIM_TYPE_SIMPLE.

We also implement a "Trim Ahead" feature. It is a zfs module parameter,
expressed in % of the current write size. This trims ahead of the
current write size. A minimum of 64MB will be trimmed. The default is 0
which disables TRIM on L2ARC as it can put significant stress to
underlying storage devices. To enable TRIM on L2ARC we set
l2arc_trim_ahead > 0.

We also implement TRIM of the whole cache device upon addition to a
pool, pool creation or when the header of the device is invalid upon
importing a pool or onlining a cache device. This is dependent on
l2arc_trim_ahead > 0. TRIM of the whole device is done with
TRIM_TYPE_MANUAL so that its status can be monitored by zpool status -t.
We save the TRIM state for the whole device and the time of completion
on-disk in the header, and restore these upon L2ARC rebuild so that
zpool status -t can correctly report them. Whole device TRIM is done
asynchronously so that the user can export of the pool or remove the
cache device while it is trimming (ie if it is too slow).

We do not TRIM the whole device if persistent L2ARC has been disabled by
l2arc_rebuild_enabled = 0 because we may not want to lose all cached
buffers (eg we may want to import the pool with
l2arc_rebuild_enabled = 0 only once because of memory pressure). If
persistent L2ARC has been disabled by setting the module parameter
l2arc_rebuild_blocks_min_l2size to a value greater than the size of the
cache device then the whole device is trimmed upon creation or import of
a pool if l2arc_trim_ahead > 0.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Adam D. Moss <c@yotes.com>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #9713
Closes #9789 
Closes #10224
2020-06-09 10:15:08 -07:00
Michael Niewöhner 32f26eaa70 Move GFP flags kernel compatibility code
Move the GFP flags kernel compat code from c file to kmem header.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Michael Niewöhner <foss@mniewoehner.de>
Closes #10424
2020-06-08 16:33:46 -07:00
Michael Niewöhner 080102a1b6 Linux 5.8 compat: __vmalloc()
The `pgprot` argument has been removed from `__vmalloc` in Linux 5.8,
being `PAGE_KERNEL` always now [1].

Detect this during configure and define a wrapper for older kernels.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/mm/vmalloc.c?h=next-20200605&id=88dca4ca5a93d2c09e5bbc6a62fbfc3af83c4fca

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Sebastian Gottschall <s.gottschall@dd-wrt.com>
Co-authored-by: Michael Niewöhner <foss@mniewoehner.de>
Signed-off-by: Sebastian Gottschall <s.gottschall@dd-wrt.com>
Signed-off-by: Michael Niewöhner <foss@mniewoehner.de>
Closes #10422
2020-06-08 16:32:02 -07:00
Pawel Jakub Dawidek 529246df96 Restore support for in-kernel ZFS ioctls
In Illumos it is possible to call ioctl functions from within the
kernel by passing the FKIOCTL flag. Neither FreeBSD nor Linux support
that, but it doesn't hurt to keep it around, as all the code is there.

Before this commit it was a dead code and zc_iflags was always zero.
Restore this functionality by allowing to pass a flag to the
zfsdev_ioctl_common() function.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Pawel Jakub Dawidek <pawel@dawidek.net>
Closes #10417
2020-06-08 13:57:22 -07:00
Pawel Jakub Dawidek 77b998fa70 Remove redundant includes
By removing excessive includes it takes us a small step close to
compiling this file in userland.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Pawel Jakub Dawidek <pawel@dawidek.net>
Closes #10415
2020-06-08 09:57:36 -07:00
Paul Dagnelie b2f3709c3e Don't erase final byte of envblock
When we copy the envblock's contents out, we currently treat it as 
a normal C string. However, this functionality is supposed to more
closely emulate interacting with a file. As a consequence, we were 
incorrectly truncating the contents of the envblock by replacing 
the final byte of the buffer with a null character.

Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #10405
2020-06-08 08:58:13 -07:00
Jorgen Lundman c9e319faae Replace sprintf()->snprintf() and strcpy()->strlcpy()
The strcpy() and sprintf() functions are deprecated on some platforms.
Care is needed to ensure correct size is used.  If some platforms
miss snprintf, we can add a #define to sprintf, likewise strlcpy().

The biggest change is adding a size parameter to zfs_id_to_fuidstr().

The various *_impl_get() functions are only used on linux and have
not yet been updated.

Reviewed by: Sean Eric Fagan <sef@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Jorgen Lundman <lundman@lundman.net>
Closes #10400
2020-06-07 11:42:12 -07:00
Ryan Moeller 60265072e0 Improve compatibility with C++ consumers
C++ is a little picky about not using keywords for names, or string
constness.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10409
2020-06-06 12:54:04 -07:00
Brian Behlendorf c1f3de18a4 ztest: Fix spa_open() ENOENT failures
The pool may not be imported when the previous pass is terminated.
In which case, spa_open() will return ENOENT to indicate the pool
is not currently imported.  Refactor to code slightly to handle
this case by importing the pool and then retrying the spa_open().

The ztest_import() function was moved before ztest_run() and the
import logic split in to a small internal helper function.  The
ztest_freeze() function was also moved but no changes were made.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #10407
2020-06-06 12:51:35 -07:00
alaviss 13dd63ff81 mkfile: include missing headers
Without these headers, compilation fails on musl libc with offset_t
being undeclared and MIN being implictly declared.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Hiếu Lê <leorize+oss@disroot.org>
Closes #10406
2020-06-05 17:22:10 -07:00
Brian Behlendorf 63761a8f1a zfsvfs_setup(): zap_stats_t may have undefined content when accessed (#10398)
Signed-off-by: Allan Jude <allanjude@klarasystems.com>

Co-authored-by: Allan Jude <allanjude@klarasystems.com>
2020-06-05 17:21:04 -07:00
Allan Jude 4547fc4e07 Connect dataset_kstats for FreeBSD
Expand the FreeBSD spl for kstats to support all current types

Move the dataset_kstats_t back to zvol_state_t from zfs_state_os_t
now that it is common once again

```
kstat.zfs/mypool.dataset.objset-0x10b.nunlinked: 0
kstat.zfs/mypool.dataset.objset-0x10b.nunlinks: 0
kstat.zfs/mypool.dataset.objset-0x10b.nread: 150528
kstat.zfs/mypool.dataset.objset-0x10b.reads: 48
kstat.zfs/mypool.dataset.objset-0x10b.nwritten: 134217728
kstat.zfs/mypool.dataset.objset-0x10b.writes: 1024
kstat.zfs/mypool.dataset.objset-0x10b.dataset_name: mypool/datasetname
```

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed by: Sean Eric Fagan <sef@ixsystems.com>
Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Closes #10386
2020-06-05 17:17:02 -07:00
Paul Dagnelie 99b281f1ae Fix double mutex_init bug in send code
It was possible to cause a kernel panic in the send code by 
initializing an already-initialized mutex, if a record was created 
with type DATA, destroyed with a different type (bypassing the 
mutex_destroy call) and then re-allocated as a DATA record again.

We tweak the logic to not change the type of a record once it has 
been created, avoiding the issue.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #10374
2020-06-03 19:53:21 -07:00
George Melikov 52998c7f36 Update wiki links with new address
Use direct links to new documentation resource.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #10395
2020-06-03 19:46:31 -07:00
Allan Jude 7da304bbce zfsvfs_setup(): zap_stats_t may have undefined content when accessed
Signed-off-by: Allan Jude <allanjude@klarasystems.com>
2020-06-03 22:20:27 +00:00
Ryan Moeller c0eb5c35ef FreeBSD: Simplify zvol and fix locking
zvol_geom_bio_strategy should handle its own use of the zvol
suspend reader lock and ensure the zilog exists when needed.

A few other places using the zvol zilog should use the suspend
reader lock as well.

Simplify consumers of zvol_geom_bio_strategy, fix the locking, and
while in here, use the boolean_t constants with doread.

Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10381
2020-06-03 10:45:12 -07:00
Ryan Moeller a9dcfac51c Periodically update ARC kstats
FreeBSD needs arc_adjust_zthr to run periodically for kstats to be
updated.  A comment in the code suggests this may have been the
original intent in illumos as well:

https://github.com/openzfs/zfs/blob/c946d5a91329b075fb9bda1ac703a2e85139cf1c/module/zfs/arc.c#L4697-L4700

Create the thread with a 1 second timer.

Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10371
2020-06-03 09:52:38 -07:00
Jorgen Lundman 081de9a86d Restore avl_update() calls and related functions
The macOS kmem implementation uses avl_update() and related
functions.  These same function exist in the Solaris AVL code but
were removed because they were unused.  Restore them.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Jorgen Lundman <lundman@lundman.net>
Closes #10390
2020-06-03 09:49:32 -07:00
Matthew Macy 3bf3b164ee Fix crypto build on FreeBSD HEAD
Update API usage to reflect recent change.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #10384
2020-05-30 12:54:57 -07:00
gregory-lee-bartholomew 9052e3d70b Add bootfs.snapshot and bootfs.rollback kernel parameters
Unlike other filesystems, snapshots and rollbacks of bootfs need to be
done from a rescue environment. This patch makes it possible to snap-
shot or rollback the bootfs simply by specifying bootfs.snapshot or
bootfs.rollback on the kernel command line. The operation will be
performed by dracut just before bootfs is mounted.

Reviewed-by: Antonio Russo <antonio.e.russo@gmail.com> 
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Gregory Bartholomew <gregory.lee.bartholomew@gmail.com>
Closes #10198
2020-05-29 21:16:08 -07:00
Brian Behlendorf 3d93161b01 ztest: Fix ztest_run_zdb() failure
It's possible for ztest to be killed while the pool is exported
which results in an empty cache file.  This is a valid state to
test, but the validation check performed by ztest_run_zdb()
depends on the pool being in the cache file.  If it's not the
following error is printed.

    zdb -bccsv -G -d -Y -U /tmp/zloop-run/zpool.cache ztest
    zdb: can't open '/tmp/zloop-run': No such file or directory

Resolve these failures by removing the dependency on the cache
file.  Functionally, we only care that the pool can be imported
and that the zdb verification passes.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #10385
2020-05-29 21:14:10 -07:00
allen-4 4d829ad9c1 Update zfs-functions.in
The init.d zfs-share script does not perform the intended 
action without having a variable set for ZFS_SHARE and 
ZFS_UNSHARE

Assign default values to ZFS_SHARE and ZFS_UNSHARE. Export 
the environment variables after sourcing the configuration 
file.

Reviewed-by: Richard Yao <ryao@gentoo.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Georgy Yakovlev <gyakovlev@gentoo.org>
Signed-off-by: Allen Holl <allen.m.holl@gmail.com>
Closes #10341 
Closes #10382
2020-05-29 12:01:57 -07:00
наб a1ba120927 Always use "%lld" for formatting time_ts
Given the following test program:
  #include <time.h>
  #include <stdio.h>
  #include <stdint.h>
  int main() {
    printf("time_t:    %d\n", sizeof(time_t));
    printf("long:      %d\n", sizeof(long));
    printf("long long: %d\n", sizeof(long long));
  }

These are output on various x86 architectures:
  x32$   time_t:    8
  x32$   long:      4
  x32$   long long: 8

  amd64$ time_t:    8
  amd64$ long:      8
  amd64$ long long: 8

  i386$  time_t:    4
  i386$  long:      4
  i386$  long long: 8

Therefore code using "%l[du]" to format time_ts produced warnings on x32

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@gmail.com>
Closes #10357
Closes #844
2020-05-28 10:29:58 -07:00
наб 6059f3a1f6 Correctly handle the x32 ABI
__x86_64__ && _ILP32 => don't forcibly define _LP64

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@gmail.com>
Closes #10357
Closes #844
2020-05-28 10:28:20 -07:00
John Gallagher 50ff632787 Rework error handling in zpool_trim()
When a manual trim is run against an entire pool, errors about
particular devices which don't support trim are suppressed. This changes
zpool_trim() in libzfs so that it doesn't return an error when the only
errors are suppressed ones. An exception is made when none of the
devices support trim, in which case an error is reported and a non-zero
status is returned.

This also fixes how the --wait flag works in the presence of suppressed
errors. In particular, suppressed errors no longer cause zpool_trim()
to skip the wait.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: John Gallagher <john.gallagher@delphix.com>
Closes #10263 
Closes #10372
2020-05-27 17:27:28 -07:00
Georgy Yakovlev 40f96f4643 etc/zfs/Makefile.am: set initconfdir
The initconfdir variable is not defined in etc/zfs/Makefile,
so the sed code does not perform the correct replacement.

Reviewed-by: Richard Yao <ryao@gentoo.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Georgy Yakovlev <gyakovlev@gentoo.org>
Issue #10375 
Closes #10376
2020-05-27 17:22:19 -07:00
Ryan Moeller 4f8b2356cc ZTS: Retry export/destroy when busy in zpool_import_012
It can take a moment for the NFS server to give up the mountpoint
after unsharing a filesystem.

Use log_must_busy to retry export/destroy a few times after switching
off sharenfs.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10380
2020-05-27 17:18:06 -07:00
Jorgen Lundman 70a5fc0530 Memory leak in dsl_destroy_snapshots_nvl error case
The dsl_destroy_snapshots_nvl() function has an early error out, 
and temporary nvlists were not freed.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Jorgen Lundman <lundman@lundman.net>
Closes #10366
2020-05-26 16:13:41 -07:00
Brian Behlendorf d1b84da8c1 Revert "Let zfs mount all tolerate in-progress mounts"
This reverts commit a9cd8bf which introduced a segfault when running
`zfs mount -a` multiple times when there are mountpoints which are
not empty.  This segfault is now seen frequently by the CI after
the mount code was updated to directly call mount(2).

The original reason this logic was added is described in #8881.
Since then the systemd `zfs-share.target` has been updated to run
"After" the `zfs-mount.server` which should avoid this issue.

Reviewed-by: Don Brady <don.brady@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #9560
Closes #10364
2020-05-26 16:07:50 -07:00
Marcel Schilling ce98ed25de Fix dead links http://list.zfsonlinux.org
Originally, I wanted to point to directly to
https://zfsonlinux.topicbox.com/groups/zfs-discuss
as the text refers to that specific mailing list, but George Melikov
requested to change it to the general to give users the overview.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Marcel Schilling <marcel.schilling@uni-luebeck.de>
Closes #10367
Closes #10369
2020-05-26 15:09:25 -07:00
Brian Behlendorf c946d5a913 ZTS: Fix zfs_mount.kshlib cleanup
Update cleanup_filesystem to use destroy_dataset when performing
cleanup.  This ensures the destroy is retried if the pool is busy
preventing occasional failures.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Giuseppe Di Natale <guss80@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #10358
2020-05-23 17:13:42 -07:00
Brian Atkinson fb822260b1 Gang ABD Type
Adding the gang ABD type, which allows for linear and scatter ABDs to
be chained together into a single ABD.

This can be used to avoid doing memory copies to/from ABDs. An example
of this can be found in vdev_queue.c in the vdev_queue_aggregate()
function.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Brian <bwa@clemson.edu>
Co-authored-by: Mark Maybee <mmaybee@cray.com>
Signed-off-by: Brian Atkinson <batkinson@lanl.gov>
Closes #10069
2020-05-20 18:06:09 -07:00
felixdoerre 501a1511ae mount: use the mount syscall directly
Allow zfs datasets to be mounted on Linux without relying on the
invocation of an external processes.  This is the same behavior
which is implemented for FreeBSD.

Use of the libmount library was originally considered because it 
provides functionality to properly lock and update the /etc/mtab 
file.  However, these days /etc/mtab is typically a symlink to 
/proc/self/mounts so there's nothing to updated.  Therefore, we
call mount(2) directly and avoid any additional dependencies. 

If required the legacy behavior can be enabled by setting the 
ZFS_MOUNT_HELPER environment variable.  This may be needed in
environments where SELinux in enabled and the zfs binary does  
not have mount permission.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Felix Dörre <felix@dogcraft.de>
#10294
2020-05-20 18:02:41 -07:00
DeHackEd 57434abae6 Use boot_ncpus in place of max_ncpus in taskq_create
Due to hotplug support or BIOS bugs sometimes max_ncpus can be
an absurdly high value. I have a system with 32 cores/threads
but reports max_ncpus == 440. This many threads potentially
cripples the system during arc_prune floods for example.

boot_ncpus is the number of working CPUs when called so use
that instead.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: DHE <git@dehacked.net>
Closes #10282
2020-05-20 10:07:21 -07:00
Paul Dagnelie de4f06c275 Small program that converts a dataset id and an object id to a path
Small program that converts a dataset id and an object id to a path

Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #10204
2020-05-20 10:05:33 -07:00
Ryan Moeller 7b0e39030c freebsd: Correct the order of arguments to copyin() for Q_SETQUOTA
Sponsored by: DARPA
External-issue: https://reviews.freebsd.org/D24656
FreeBSD-commit: freebsd/freebsd@a431c095d3

Authored by: jhb <jhb@FreeBSD.org>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Ported-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10344
2020-05-19 16:45:25 -07:00
George Amanakis 7cd723e685 Fix gcc 10.1 stringop-truncation error
As we do not expect the destination of these strncpy calls to be NULL
terminated, substitute them with memcpy.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #10346
2020-05-19 14:24:10 -07:00
Kyle Evans 5b090d57d4 freebsd: return EISDIR for read(2) on directories
This is arguably a change for internal consistency within OpenZFS, as the
Linux implementation will reject read(2) on directories with EISDIR. It's
not unreasonable for read(2) to do something here on FreeBSD, but we don't
currently copy out anything useful anyways so start rejecting it with the
appropriate error.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Kyle Evans <kevans@FreeBSD.org>
Closes #10338
2020-05-16 10:12:01 -07:00
Ryan Moeller d2782af461 Fix ZVOL_DIR
We only use ZVOL_DIR on FreeBSD, and on FreeBSD it isn't correct.

Move the definition to the file where it is needed, and define it as
/dev/zvol/.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10337
2020-05-16 10:10:38 -07:00
ColMelvin 4d6043f2b7 RPM: Remove old versions of DKMS on upgrade
Due to a mismatch between the text and a regex looking for that text,
the `%preuninstall` script would never run the `dkms remove` command
necessary to avoid corrupting the DKMS data configuration.  Increase
regex specificity to avoid this issue.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chris Lindee <chris.lindee+github@gmail.com>
Closes: #9891
Closes #10327
2020-05-14 20:51:33 -07:00
Matthew Ahrens 1b9cd1a9d9 Fix error handling in receive_writer_thread()
If `receive_writer_thread()` gets an error from `receive_process_record()`,
it should be saved in `rwa->err` so that we will stop processing records,
and the main thread will notice that the receive has failed.

When an error is first encountered, this happens correctly.  However, if
there are more records to dequeue, the next time through the loop we
will reset `rwa->err` to zero, allowing us to try to process the
following record (2 after the failed record).  Depending on what types
of records remain, we may incorrectly complete the receive
"successfully", but without actually having processed all the records.

The fix is to only set `rwa->err` if we got a *non-zero* error.

This bug was introduced by #10099 "Improve zfs receive performance by
batching writes".

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #10320
2020-05-14 20:48:29 -07:00
yparitcher cdcce2f019 Fix VN_OPEN_INVFS typo
The VN_OPEN_INVFS literal is in the wrong field.

Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: yparitcher <y@paritcher.com>
Closes #10322
2020-05-14 20:47:14 -07:00
Brian Behlendorf 2ade659eb4 Fix abd_enter/exit_critical wrappers
Commit fc551d7 introduced the wrappers abd_enter_critical() and
abd_exit_critical() to mark critical sections.  On Linux these are
implemented with the local_irq_save() and local_irq_restore() macros
which set the 'flags' argument when saving.  By wrapping them with
a function the local variable is no longer set by the macro and is
no longer properly restored.

Convert abd_enter_critical() and abd_exit_critical() to macros to
resolve this issue and ensure the flags are properly restored.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #10332
2020-05-14 20:45:16 -07:00
Jorgen Lundman eeb8fae9c7 Upstream: add missing thread_exit()
Undo FreeBSD wrapper for thread_create() added to call thread_exit.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Jorgen Lundman <lundman@lundman.net>
Closes #10314
2020-05-14 15:58:09 -07:00
Matthew Ahrens 8b240f14f9 remove unneeded member drc_err of dmu_recv_cookie_t
The member drc_err of dmu_recv_cookie_t is used only locally in
receive_read, so we can replace it with a local variable.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #10319
2020-05-14 12:10:29 -07:00
Brian Behlendorf c87f958668 flake8 E741 variable name warning
Update the zts-report.py script to conform to the flake8 E741 rule.

    "Variables named I, O, and l can be very hard to read. This is
    because the letter I and the letter l are easily confused, and
    the letter O and the number 0 can be easily confused."

- https://www.flake8rules.com/rules/E741.html

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #10323
2020-05-14 09:41:29 -07:00
John Wren Kennedy e1fcd940e7 ZTS: zpool_split_indirect deletes zfstest log file
The cleanup routine for this test attempts to remove some temporary
files with `rm -f $VDEV_*`, but VDEV_ is undefined. As a result, all
files in the current working directory (/var/tmp/test_results/current)
get removed instead. This includes the complete log file of all tests.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: George Amanakis <gamanakis@gmail.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: John Kennedy <john.kennedy@delphix.com>
Closes #10324
2020-05-14 09:39:47 -07:00
John Poduska 41035a0496 Resilver restarts unnecessarily when it encounters errors
When a resilver finishes, vdev_dtl_reassess is called to hopefully
excise DTL_MISSING (amongst other things). If there are errors during
the resilver, they are tracked in DTL_SCRUB, as spelled out in the
block comment in vdev.c. DTL_SCRUB is in-core only, so it can only
be used if the pool was online for the whole resilver. This state is
tracked with the spa_scrub_started flag, which only gets set when
the scan is initialized. Unfortunately, this flag gets cleared right
before vdev_dtl_reassess gets called, so if there are any errors
during the scan, DTL_MISSING will never get excised and the resilver
will just continually restart. This fix simply moves clearing that
flag until after the call to vdev_dtl_reasses.

In addition, if a pool is imported and already has scn_errors > 0,
this change will restart the resilver immediately instead of doing
the rest of the scan and then restarting it from the beginning. On
the other hand, if scn_errors == 0 at import, then no errors have
been encountered so far, so the spa_scrub_started flag can be safely
set.

A test has been added to verify that resilver does not restart when
relevant DTL's are available.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Paul Zuchowski <pzuchowski@datto.com>
Signed-off-by: John Poduska <jpoduska@datto.com>
Closes #10291
2020-05-13 10:54:27 -07:00
AJ Jordan b29e31d80d Fix outdated comment header
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: AJ Jordan <alex@strugee.net>
Closes #10288
2020-05-11 16:23:16 -07:00
AJ Jordan 5a04b17717 Fix up arcstat(1) to match our version
Turns out the illumos manpage, which is what this originates from, was
written for the original Perl version of the utility which is not the
version in the OpenZFS tree. *That* version originates from a Python
rewrite that was done for FreeNAS. So fix up the manpage to match what
we actually ship (and fix a few typos in the process).

Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: AJ Jordan <alex@strugee.net>
Closes #10288
2020-05-11 16:22:48 -07:00
AJ Jordan ac806a2559 Import the arcstat(1m) manpage from illumos
And move it from section 1m to section 1 for consistency.

Imported from illumos commit f34d737f.

Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: AJ Jordan <alex@strugee.net>
Closes #10288
2020-05-11 16:22:18 -07:00
AJ Jordan 2b21da4f76 Fix inconsistent capitalization in arcstat -v
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: AJ Jordan <alex@strugee.net>
Closes #10288
2020-05-11 16:21:08 -07:00
Richard Laager 7fcf82451c Change zfsunlock for better busybox compatibility
It turns out that there are two versions of Busybox, at least on Ubuntu
18.04.  If you have the busybox-static package installed, you get a
busybox that supports `ps a` and `head`.  If you only have
busybox-initramfs, you don't.  Either way, you have `awk`.

This change should also make this compatible with GNU ps, if you somehow
end up with that in the initramfs environment.

Reviewed-by: Tom Caputi <tcaputi@datto.com>
Reviewed-by: Andrey Prokopenko <job@terem.fr>
Signed-off-by: Richard Laager <rlaager@wiktel.com>
Closes #10307
2020-05-10 12:26:08 -07:00
Brian Atkinson fc551d7efb Combine OS-independent ABD Code into Common Source File
Reorganizing ABD code base so OS-independent ABD code has been placed
into a common abd.c file. OS-dependent ABD code has been left in each
OS's ABD source files, and these source files have been renamed to
abd_os.

The OS-independent ABD code is now under:
module/zfs/abd.c
With the OS-dependent code in:
module/os/linux/zfs/abd_os.c
module/os/freebsd/zfs/abd_os.c

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Brian Atkinson <batkinson@lanl.gov>
Closes #10293
2020-05-10 12:23:52 -07:00
Petros Koutoupis bd95f00d4b Fixed LDADD library links in Makefiles for cross compilation builds
When building on native dev system, there are no issues but when
cross-compiling for target system, some linker errors are observed.
The only way to avoid these errors is by adjusting the Makefile.am
of those various components to add the library dependencies.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Petros Koutoupis <petros@petroskoutoupis.com>
Closes #10304
2020-05-09 10:17:08 -07:00
Brian Behlendorf d775c86dd4 ZTS: refreserv_005_pos.ksh
When recursively destroying the dataset it's possible for the
dataset volume to be open by an unrelated process, like blkid.
Use the destroy_dataset() which will retry when this occurs.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #10305
2020-05-08 13:50:02 -07:00
Andrey Prokopenko 1cc635a2dd Unlock encrypted root partition over SSH
This commit add a new feature for Debian-based distributions to unlock
encrypted root partition over SSH.  This feature is very handy on
headless NAS or VPS cloud servers.  To use this feature, you will need
to install the dropbear-initramfs package.

Reviewed-By: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-By: Tom Caputi <tcaputi@datto.com>
Signed-off-by: Andrey Prokopenko <job@terem.fr>
Signed-off-by: Richard Laager <rlaager@wiktel.com>
Closes #10027
2020-05-07 16:41:16 -07:00
Richard Laager 746d22ee02 Rework README.initramfs.markdown
This file is listed as being in Markdown format, but it didn't really
use much Markdown.  I have added a fair amount of formatting.

I have reordered and reworded things to improve the flow of the text.

Reviewed-By: Andrey Prokopenko <job@terem.fr>
Reviewed-By: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-By: Tom Caputi <tcaputi@datto.com>
Signed-off-by: Richard Laager <rlaager@wiktel.com>
Closes #10027
2020-05-07 16:40:00 -07:00
Richard Laager 4cfd339ce4 Cleanup contrib/initramfs automake
The initramfs hook scripts depend on Makefile.  This way, if the
substitution code is changed, they should update.  This brings it in
line with etc/init.d (which was modified to match the example in the
automake docs).

The initramfs hook script cleaning now matches etc/init.d.

There was a mix of SUBDIRS recursion and custom install rules for files
in subdirectories.  This was duplicated for the "hooks" and "scripts"
subdirectories.  Now everything uses SUBDIRS.

I fixed the substitution of DEFAULT_INITCONF_DIR for hooks/zfs.

Reviewed-By: Andrey Prokopenko <job@terem.fr>
Reviewed-By: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-By: Tom Caputi <tcaputi@datto.com>
Signed-off-by: Richard Laager <rlaager@wiktel.com>
Closes #10027
2020-05-07 16:39:08 -07:00
George Amanakis 657fd33bcf Improvements on persistent L2ARC
Functional changes:

We implement refcounts of log blocks and their aligned size on the
cache device along with two corresponding arcstats. The refcounts are
reflected in the header of the device and provide valuable information
as to whether log blocks are accounted for correctly. These are
dynamically adjusted as log blocks are committed/evicted. zdb also uses
this information in the device header and compares it to the
corresponding values as reported by dump_l2arc_log_blocks() which
emulates l2arc_rebuild(). If the refcounts saved in the device header
report higher values, zdb exits with an error. For this feature to work
correctly there should be no active writes on the device. This is also
employed in the tests of persistent L2ARC. We extend the structure of
the cache device header by adding the two new variables mirroring the
refcounts after the existing variables to preserve backward
compatibility in terms of persistent L2ARC.

1) a new arcstat "l2_log_blk_asize" and refcount "l2ad_lb_asize" which
   reflect the total aligned size of log blocks on the device. This is
   also reflected in the header of the cache device as "dh_lb_asize".
2) a new arcstat "l2arc_log_blk_count" and refcount "l2ad_lb_count"
   which reflect the total number of L2ARC log blocks present on cache
   devices.  It is also reflected in the header of the cache device as
   "dh_lb_count".

In l2arc_rebuild_vdev() if the amount of committed log entries in a log
block is 0 and the device header is valid we update the device header.
This will facilitate trimming of the whole device in this case when
TRIM for L2ARC is implemented.

Improve loop protection in l2arc_rebuild() by using the starting offset
of the payload of each log block instead of the starting offset of the
log block.

If the zio in l2arc_write_buffers() fails, restore the lbps array in the
header of the device to its previous state in l2arc_write_done().

If l2arc_rebuild() ends the rebuild process without restoring any L2ARC
log blocks in ARC and without any other error, this means that the lbps
array in the header is pointing to non-existent or invalid log blocks.
Reset the device header in this case.

In l2arc_rebuild() change the zfs_dbgmsg messages to
spa_history_log_internal() making them user visible with zpool history
command.

Non-functional changes:

Make the first test in persistent L2ARC use `zdb -lll` to increase
coverage in `zdb.c`.

Rename psize with asize when referring to log blocks, since
L2ARC_SET_PSIZE stores the vdev aligned size for log blocks. Also
rename dh_log_blk_entries to dh_log_entries to make it clear that
it is a mirror of l2ad_log_entries. Added comments for both changes.

Fix inaccurate comments for example in l2arc_log_blk_restore().

Add asserts at the end in l2arc_evict() and l2arc_write_buffers().

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #10228
2020-05-07 16:34:03 -07:00
Paul Dagnelie 108a454a46 Add support for boot environment data to be stored in the label
Modern bootloaders leverage data stored in the root filesystem to 
enable some of their powerful features. GRUB specifically has a grubenv 
file which can store large amounts of configuration data that can be 
read and written at boot time and during normal operation. This allows 
sysadmins to configure useful features like automated failover after 
failed boot attempts. Unfortunately, due to the Copy-on-Write nature 
of ZFS, the standard behavior of these tools cannot handle writing to
ZFS files safely at boot time. We need an alternative way to store 
data that allows the bootloader to make changes to the data.

This work is very similar to work that was done on Illumos to enable 
similar functionality in the FreeBSD bootloader. This patch is different 
in that the data being stored is a raw grubenv file; this file can store 
arbitrary variables and values, and the scripting provided by grub is 
powerful enough that special structures are not required to implement 
advanced behavior.

We repurpose the second padding area in each label to store the grubenv 
file, protected by an embedded checksum. We add two ioctls to get and 
set this data, and libzfs_core and libzfs functions to access them more 
easily. There are no direct command line interfaces to these functions; 
these will be added directly to the bootloader utilities.

Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #10009
2020-05-07 09:36:33 -07:00
Philip Pokorny a36bad1759 Fix column width calculation issue with certain terminal widths
If the reported terminal width is 0 or less than 42, the signed variable
width was set to a negative number that was then assigned to the
unsigned column width becoming a huge number.

Add comments and change logic to better explain what's happening.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Philip Pokorny <ppokorny@mindspring.com>
Closes #10247
2020-05-06 17:17:38 -07:00
George Amanakis 1b664952ae Enable splitting mirrors with indirect vdevs
When a top-level vdev is removed from a pool it is converted to an
indirect vdev. Until now splitting such mirrored pools was not possible
with zpool split. This patch enables handling of indirect vdevs and
splitting of those pools with zpool split.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #10283
2020-05-06 10:32:28 -07:00
Ryan Moeller ddc7a2dd3b taskq: Don't leak system_delay_taskq on FreeBSD
Adds a missing taskq_destroy() call.

Reported by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10292
2020-05-05 09:36:41 -07:00
alaviss 3e5d41d807 config/kernel-inode-times: initialize timespec
Usage of this variable uninitialized triggers -Werror,-Wuninitialized
when compiled under clang for linux kernel 5.6, leading the build system
to believe that the function is not declared.

This commit initializes the variable to suppress the warning and fix the
build for kernel 5.6 with clang.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Hiếu Lê <leorize+oss@disroot.org>
Closes #10279
Closes #10281
2020-05-04 15:25:48 -07:00
Ryan Moeller 6f3e1a4828 Avoid the GEOM topology lock recursion when autoexpanding a pool
The steps to reproduce the problem:

        mdconfig -a -t swap -s 3g -u 0
        gpart create -s GPT md0
        gpart add -t freebsd-zfs -s 1g md0
        zpool create -o autoexpand=on foo md0p1
        gpart resize -i 1 -s 2g md0

Authored by: pjd <pjd@FreeBSD.org>
FreeBSD-commit: freebsd/freebsd@bccd2db598

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Ported-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10270
2020-05-04 15:10:41 -07:00
Ryan Moeller 639dfeb831 Update FreeBSD SPL atomics
Sync up with the following changes from FreeBSD:

ZFS: add emulation of atomic_swap_64 and atomic_load_64

Some 32-bit platforms do not provide 64-bit atomic operations that ZFS
requires, either in userland or at all.  We emulate those operations
for those platforms using a mutex.  That is not entirely correct and
it's very efficient.  Besides, the loads are plain loads, so torn
values are possible.

Nevertheless, the emulation seems to work for some definition of work.

This change adds atomic_swap_64, which is already used in ZFS code,
and atomic_load_64 that can be used to prevent torn reads.

Authored by: avg <avg@FreeBSD.org>
FreeBSD-commit: freebsd/freebsd@3458e5d1e6

cleanup of illumos compatibility atomics

atomic_cas_32 is implemented using atomic_fcmpset_32 on all platforms.
Ditto for atomic_cas_64 and atomic_fcmpset_64 on platforms that have
it.  The only exception is sparc64 that provides MD atomic_cas_32 and
atomic_cas_64.
This is slightly inefficient as fcmpset reports whether the operation
updated the target and that information is not needed for cas.
Nevertheless, there is less code to maintain and to add for new
platforms.  Also, the operations are done inline now as opposed to
function calls before.

atomic_add_64_nv is implemented using atomic_fetchadd_64 on platforms
that provide it.

casptr, cas32, atomic_or_8, atomic_or_8_nv are completely removed as
they have no users.

atomic_mtx that is used to emulate 64-bit atomics on platforms that
lack them is defined only on those platforms.

As a result, platform specific opensolaris_atomic.S files have lost
most of their code.  The only exception is i386 where the
compat+contrib code provides 64-bit atomics for userland use.  That
code assumes availability of cmpxchg8b instruction.  FreeBSD does not
have that assumption for i386 userland and does not provide 64-bit
atomics.  Hopefully, this can and will be fixed.

Authored by: avg <avg@FreeBSD.org>
FreeBSD-commit: freebsd/freebsd@e9642c209b

emulate illumos membar_producer with atomic_thread_fence_rel

membar_producer is supposed to be a store-store barrier.
Also, in the code that FreeBSD has ported from illumos membar_producer
is used only with regular stores to regular memory (with respect to
caching).

We do not have an MI primitive for the store-store barrier, so
atomic_thread_fence_rel is the closest we have as it provides
(load | store) -> store barrier.

Previously, membar_producer was an empty function call on all 32-bit
arm-s, 32-bit powerpc, riscv and all mips variants.  I think that it
was inadequate.
On other platforms, such as amd64, arm64, i386, powerpc64, sparc64,
membar_producer was implemented using stronger primitives than required
for a store-store barrier with respect to regular memory access.
For example, it used sfence on amd64 and lock-ed nop in i386 (despite
TSO).
On powerpc64 we now use recommended lwsync instead of eieio.
On sparc64 FreeBSD uses TSO mode.
On arm64/aarch64 we now use dmb sy instead of dmb ish.  Not sure if
this is an improvement, actually.

After this change we can drop opensolaris_atomic.S for aarch64, amd64,
powerpc64 and sparc64 as all required atomic operations have either
direct or light-weight mapping to FreeBSD native atomic operations.

Discussed with: kib
Authored by: avg <avg@FreeBSD.org>
FreeBSD-commit: freebsd/freebsd@50cdda62fc

fix up r353340, don't assume that fcmpset has strong semantics

fcmpset can have two kinds of semantics, weak and strong.
For practical purposes, strong semantics means that if fcmpset fails
then the reported current value is always different from the expected
value.  Weak semantics means that the reported current value may be the
same as the expected value even though fcmpset failed.  That's a so
called "sporadic" failure.

I originally implemented atomic_cas expecting strong semantics, but
many platforms actually have weak one.

Reported by:    pkubaj (not confirmed if same issue)
Discussed with: kib, mjg
Authored by: avg <avg@FreeBSD.org>
FreeBSD-commit: freebsd/freebsd@238787c74e

[PowerPC] [MIPS] Implement 32-bit kernel emulation of atomic64 operations

This is a lock-based emulation of 64-bit atomics for kernel use, split off
from an earlier patch by jhibbits.

This is needed to unblock future improvements that reduce the need for
locking on 64-bit platforms by using atomic updates.

The implementation allows for future integration with userland atomic64,
but as that implies going through sysarch for every use, the current
status quo of userland doing its own locking may be for the best.

Submitted by:   jhibbits (original patch), kevans (mips bits)
Reviewed by:    jhibbits, jeff, kevans
Authored by: bdragon <bdragon@FreeBSD.org>
Differential Revision:  https://reviews.freebsd.org/D22976
FreeBSD-commit: freebsd/freebsd@db39dab3a8

Remove sparc64 kernel support

Remove all sparc64 specific files
Remove all sparc64 ifdefs
Removee indireeect sparc64 ifdefs

Authored by: imp <imp@FreeBSD.org>
FreeBSD-commit: freebsd/freebsd@48b94864c5

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Ported-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10250
2020-05-04 15:07:04 -07:00
Ryan Moeller 6ed4391da9 ZTS: Count CKSUM for all vdevs in verify_pool
The verify_pool function should detect checksum errors on any vdev, but
it was only checking at the root of the pool.

Accumulate the errors for all vdevs to obtain the correct count.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10271
2020-04-30 17:50:16 -07:00
Ryan Moeller 154e48eac9 zdb: Fix ignored zfs_arc_max tuning
Running zdb -l $disk shows a warning that zfs_arc_max is being ignored.
zdb sets zfs_arc_max below zfs_arc_min, which causes the value to be
ignored by arc_tuning_update().

Set zfs_arc_min to the bare minimum in zdb, which is below zfs_arc_max.

Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Allan Jude <allanjude@freebsd.org>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10269
2020-04-30 17:48:58 -07:00
Paul B. Henson 0aeb0bed6f OpenZFS 6765 - zfs_zaccess_delete() comments do not accurately
reflect delete permissions for ACLs

Authored by: Kevin Crowe <kevin.crowe@nexenta.com>
Reviewed by: Gordon Ross <gwr@nexenta.com>
Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Approved by: Richard Lowe <richlowe@richlowe.net>
Ported-by: Paul B. Henson <henson@acm.org>

Porting Notes:
* Only comments are updated

OpenZFS-issue: https://www.illumos.org/issues/6765
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/da412744bc
Closes #10266
2020-04-30 11:24:55 -07:00
Paul B. Henson 235a856576 OpenZFS 6762 - POSIX write should imply DELETE_CHILD on directories
- and some additional considerations

Authored by: Kevin Crowe <kevin.crowe@nexenta.com>
Reviewed by: Gordon Ross <gwr@nexenta.com>
Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Approved by: Richard Lowe <richlowe@richlowe.net>
Ported-by: Paul B. Henson <henson@acm.org>

OpenZFS-issue: https://www.illumos.org/issues/6762
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/1eb4e906ec
Closes #10266
2020-04-30 11:24:38 -07:00
Paul B. Henson 99495ba6ab OpenZFS 8984 - fix for 6764 breaks ACL inheritance
Authored by: Dominik Hassler <hadfl@omniosce.org>
Reviewed by: Sam Zaydel <szaydel@racktopsystems.com>
Reviewed by: Paul B. Henson <henson@acm.org>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Approved by: Matthew Ahrens <mahrens@delphix.com>
Ported-by: Paul B. Henson <henson@acm.org>

OpenZFS-issue: https://www.illumos.org/issues/8984
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/e9bacc6d1a
Closes #10266
2020-04-30 11:24:27 -07:00
Paul B. Henson 5a2f527d4b OpenZFS 6764 - zfs issues with inheritance flags during chmod(2)
with aclmode=passthrough

Authored by: Albert Lee <trisk@nexenta.com>
Reviewed by: Gordon Ross <gwr@nexenta.com>
Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Approved by: Richard Lowe <richlowe@richlowe.net>
Ported-by: Paul B. Henson <henson@acm.org>

OpenZFS-issue: https://www.illumos.org/issues/6764
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/de0f1ddb59
Closes #10266
2020-04-30 11:24:14 -07:00
Paul B. Henson 7bf3e1fa0f OpenZFS 3254 - add support in zfs for aclmode=restricted
Authored-by: Paul B. Henson <henson@acm.org>
Reviewed by: Albert Lee <trisk@nexenta.com>
Reviewed by: Gordon Ross <gwr@nexenta.com>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Approved by: Richard Lowe <richlowe@richlowe.net>
Ported-by: Paul B. Henson <henson@acm.org>

OpenZFS-issue: https://www.illumos.org/issues/3254
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/71dbfc287c
Closes #10266
2020-04-30 11:23:59 -07:00
Paul B. Henson a1af567bb6 OpenZFS 742 - Resurrect the ZFS "aclmode" property OpenZFS 664 - Umask masking "deny" ACL entries OpenZFS 279 - Bug in the new ACL (post-PSARC/2010/029) semantics
Porting notes:
* Updated zfs_acl_chmod to take 'boolean_t isdir' as first parameter
  rather than 'zfsvfs_t *zfsvfs'
* zfs man pages changes mixed between zfs and new zfsprops man pages

Reviewed by: Aram Hvrneanu <aram@nexenta.com>
Reviewed by: Gordon Ross <gwr@nexenta.com>
Reviewed by: Robert Gordon <rbg@openrbg.com>
Reviewed by: Mark.Maybee@oracle.com
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Approved by: Garrett D'Amore <garrett@nexenta.com>
Ported-by: Paul B. Henson <henson@acm.org>

OpenZFS-issue: https://www.illumos.org/issues/742
OpenZFS-issue: https://www.illumos.org/issues/664
OpenZFS-issue: https://www.illumos.org/issues/279
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/a3c49ce110
Closes #10266
2020-04-30 11:22:45 -07:00
Adam D. Moss d7d4678fe6 Fix regression caused by c14ca14
The 'zfs load-key' command was broken for 'keyformat=passphrase'.
Use the correct output vars when stdin is an interactive terminal.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: adam moss <c@yotes.com>
Closes #10264 
Closes #10265
2020-04-29 17:33:33 -07:00
Brian Behlendorf ab16c87e55 Add longjmp support for Thumb-2
When a Thumb-2 kernel is being used, then longjmp must be implemented
using the Thumb-2 instruction set in module/lua/setjmp/setjmp_arm.S.

Original-patch-by: @jsrlabs
Reviewed-by: @awehrfritz
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #7408 
Closes #9957 
Closes #9967
2020-04-29 17:30:13 -07:00
Jason King c14ca1456e Support custom URI schemes for the keylocation property
Every platform has their own preferred methods for implementing URI 
schemes beyond the currently supported file scheme (e.g. 'https' on 
FreeBSD would likely use libfetch, while Linux distros and illumos
would probably use libcurl, etc). It would be helpful if libzfs can 
be extended to support additional schemes in a simple manner.

A table of (scheme, handler_function) pairs is added to libzfs_crypto.c, 
and the existing functions in libzfs_crypto.c so that when the key 
format is ZFS_KEYFORMAT_URI, the scheme from the URI string is 
extracted, and a matching handler it located in the aforementioned 
table (returning an error if no matching handler is found). The handler 
function is then invoked to retrieve the key material (in the format 
specified by the keyformat property) and the key is loaded or the 
handler can return an error to abort the key loading process.

Reviewed by: Sean Eric Fagan <sef@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Jason King <jason.king@joyent.com>
Closes #10218
2020-04-28 10:55:18 -07:00
Sara Hartse 89a6610ed0 Add more sanity testing for zdb input args
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: sara hartse <sara.hartse@delphix.com>
Closes #10243
2020-04-28 09:56:31 -07:00
George Amanakis fa25460538 Add missing zfs_refcount_destroy() in key_mapping_rele()
Otherwise when running with reference_tracking_enable=TRUE mounting
and unmounting an encrypted dataset panics with:

Call Trace:
 dump_stack+0x66/0x90
 slab_err+0xcd/0xf2
 ? __kmalloc+0x174/0x260
 ? __kmem_cache_shutdown+0x158/0x240
 __kmem_cache_shutdown.cold+0x1d/0x115
 shutdown_cache+0x11/0x140
 kmem_cache_destroy+0x210/0x230
 spl_kmem_cache_destroy+0x122/0x3e0 [spl]
 zfs_refcount_fini+0x11/0x20 [zfs]
 spa_fini+0x4b/0x120 [zfs]
 zfs_kmod_fini+0x6b/0xa0 [zfs]
 _fini+0xa/0x68c [zfs]
 __x64_sys_delete_module+0x19c/0x2b0
 do_syscall_64+0x5b/0x1a0
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

Reviewed-By: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-By: Tom Caputi <tcaputi@datto.com>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #10246
2020-04-28 09:53:45 -07:00
Ryan Moeller a8085184d6 Fix zlib leak on FreeBSD
zlib_inflateEnd was accidentally a wrapper for inflateInit instead of
inflateEnd, and hilarity ensues.

Fix the typo so we free memory instead of allocating more.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10225
Closes #10252
2020-04-28 09:14:30 -07:00
alex 47c9299fcc zfs_create: round up volume size to multiple of bs
Round up the volume size requested in `zfs create -V size` to the next
higher multiple of the volblocksize. Updates the man page and adds a
test to verify the new behavior.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reported-by: puffi <puffi@users.noreply.github.com>
Signed-off-by: Alex John <alex@stty.io>
Closes #8541 
Closes #10196
2020-04-24 19:04:34 -07:00
Tom Caputi aa646323db Fix missing ivset guid with resumed raw base recv
This patch corrects a bug introduced in 61152d1069. When
resuming a raw base receive, the dmu_recv code always sets
drc->drc_fromsnapobj to the object ID of the previous
snapshot. For incrementals, this is correct, but for base
sends, this should be left at 0. The presence of this ID
eventually allows a check to run which determines whether
or not the incoming stream and the previous snapshot have
matching IVset guids. This check fails becuase it is not
meant to run when there is no previous snapshot. When it
does fail, the user receives an error stating that the
incoming stream has the problem outlined in errata 4.

This patch corrects this issue by simply ensuring
drc->drc_fromsnapobj is left as 0 for base receives.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #10234 
Closes #10239
2020-04-24 19:00:32 -07:00
Brian Behlendorf 6de3e59bdd Fix unitialized variable in zstream redup command
Fix uninitialized variable in `zstream redup` command.  The compiler
may determine the 'stream_offset' variable can be uninitialized
because not all rdt_lookup() exit paths set it.  This should never
happen in practice as documented by the assert, but initialize it
regardless to resolve the warning.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #10241
Closes #10244
2020-04-23 15:54:38 -07:00
Matthew Ahrens 5d4ed9614f change libspl list member names to match kernel
This aids in debugging, so that we can use the same infrastructure to
walk zfs's list_t in the kernel module and in the userland libraries
(e.g. when debugging ztest).

Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #10236
2020-04-23 15:53:14 -07:00
Matthew Ahrens 196bee4cfd Remove deduplicated send/receive code
Deduplicated send streams (i.e. `zfs send -D` and `zfs receive` of such
streams) are deprecated.  Deduplicated send streams can be received by
first converting them to non-deduplicated with the `zstream redup`
command.

This commit removes the code for sending and receiving deduplicated send
streams.  `zfs send -D` will now print a warning, ignore the `-D` flag,
and generate a regular (non-deduplicated) send stream.  `zfs receive` of
a deduplicated send stream will print an error message and fail.

The resulting code simplification (especially in the kernel's support
for receiving dedup streams) should help enable future performance
enhancements.

Several new tests are added which leverage `zstream redup`.

Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Issue #7887
Issue #10117
Issue #10156
Closes #10212
2020-04-23 10:06:57 -07:00
Joao Carlos Mendes Luis 70e5ad31f6 Fix more leaks detected by ASAN
This commit fixes a bunch of missing free() calls in a10d50f99

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: João Carlos Mendes Luís <jonny@jonny.eng.br>
Closes #10219
2020-04-22 10:40:34 -07:00
Matthew Ahrens 32d805c3e2 Use a struct to organize metaslab-group-allocator fields
Each metaslab group (of which there is one per top-level vdev) has
several (4, by default) "metaslab group allocators".  Each "allocator"
has its own metaslab that it prefers to allocate from (the "primary"
allocator), and each can perform allocations concurrently with the other
allocators.  In addition to the primary metaslab, there are several
other fields that need to be tracked separately for each allocator.
These are currently stored as several arrays in the metaslab_group_t,
each array indexed by allocator number.

This change organizes all the metaslab-group-allocator-specific fields
into a new struct, metaslab_group_allocator_t.  The metaslab_group_t now
needs only one array indexed by the allocator number - which contains
the metaslab_group_allocator_t's.

Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #10213
2020-04-22 10:26:56 -07:00
Niklas Haas a84c92f933 Don't attempt trimming "hole" vdevs
On zpools containing hole vdevs (e.g. removed log devices), the `zpool
trim` (and presumably `zpool initialize`) commands will attempt calling
their respective functions on "hole", which fails, as this is not a real
vdev.

Avoid this by removing HOLE vdevs in zpool_collect_leaves.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Niklas Haas <git@haasn.xyz>
Closes #10227
2020-04-21 09:29:31 -07:00
Matthew Ahrens 1f043c8be1 Fix zfs send progress reporting
The progress of a send is supposed to be reported by `zfs send -v`, but
it is not.  This works by creating a new user thread (with
pthread_create()) which does ZFS_IOC_SEND_PROGRESS ioctls to check how
much progress has been made.  This IOCTL finds the specified send (since
there may be multiple concurrent sends in the system).  The IOCTL also
checks that the specified send was started by the current process.

On Linux, different threads of the same process are represented as
different `struct task_struct`s (and, confusingly, have different
PID's).  To check if if two threads are in the same process, we need to
check if they have the same `struct task_struct:group_leader`.

We used to to this correctly, but it was inadvertently changed by
30af21b025 (Redacted Send) to simply check if the current
`struct task_struct` is the one that started the send.

This commit changes the code back to checking if the send was started by
a `struct task_struct` with the same `group_leader` as the calling
thread.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Chris Wedgwood <cw@f00f.org>
Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #10215 
Closes #10216
2020-04-20 10:12:48 -07:00
Matthew Macy c614fd6e12 Use new FreeBSD API to largely eliminate object locking
Propagate changes in HEAD that mostly eliminate object locking.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #10205
2020-04-17 09:30:26 -07:00
George Amanakis 9249f1272e Persistent L2ARC minor fixes
Minor fixes on persistent L2ARC improving code readability and fixing 
a typo in zdb.c when byte-swapping a log block. It also improves the 
pesist_l2arc_007_pos.ksh test by giving it more time to retrieve log 
blocks on the cache device.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Adam D. Moss <c@yotes.com>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #10210
2020-04-17 09:27:40 -07:00
Ryan Moeller a7929f3137 Update FreeBSD tunables
Remove some obsolete legacy compat, rename some misnamed, and add some
missing tunables for FreeBSD.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10203
2020-04-15 11:14:47 -07:00
Ryan Moeller af99094dee Don't delete freebsd.run in distclean
Add a comment so the file is not empty.

The comment can be removed when FreeBSD-specific tests are added.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Sean Eric Fagan <sef@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10206
2020-04-15 09:21:40 -07:00
Ryan Moeller 813c8564ee Fix SC2086 note in zpool.d/smart
./cmd/zpool/zpool.d/smart:78:32:
note: Double quote to prevent globbing and word splitting. [SC2086]

Reported by latest shellcheck on FreeBSD.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10194
2020-04-14 13:18:23 -07:00
alaviss 6b1139e82c sys/mnttab.h: include sys/stat.h for stat64
Musl libc defined `stat64` as a macro, which causes the build to fail
upon compiling os/linux/getmntany.c due to conflicts between the forward
declaration and the implementation.

This commit fixes that by including <sys/stat.h> in "sys/mnttab.h"
directly.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Hiếu Lê <leorize+oss@disroot.org>
Closes #10195
2020-04-14 11:47:40 -07:00
Matthew Macy 9f0a21e641 Add FreeBSD support to OpenZFS
Add the FreeBSD platform code to the OpenZFS repository.  As of this
commit the source can be compiled and tested on FreeBSD 11 and 12.
Subsequent commits are now required to compile on FreeBSD and Linux.
Additionally, they must pass the ZFS Test Suite on FreeBSD which is
being run by the CI.  As of this commit 1230 tests pass on FreeBSD
and there are no unexpected failures.

Reviewed-by: Sean Eric Fagan <sef@ixsystems.com>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #898 
Closes #8987
2020-04-14 11:36:28 -07:00
Joao Carlos Mendes Luis 75c62019f3 Fix allocation errors, detected using ASAN
The test for VDEV_TYPE_INDIRECT is done after a memory allocation, and
could return from function without freeing it.  Since we don't need that
allocation yet, just postpone it.

Add a missing free() when buffer is no longer needed.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: João Carlos Mendes Luís <jonny@jonny.eng.br>
Closes #10193
2020-04-13 10:54:41 -07:00
Brian Behlendorf 791e480c6a Disable user space reference tracking
The memory and cpu cost of reference count tracking with the current
implementation is significant.  For this reason it has always been
disabled by default for the kmods.  Apply this same default to user
space so ztest doesn't always incur this performance penalty.

Our intention is to re-enable this by default for ztest once the code
has been optimized.  Since we expect to at some point provide a FUSE
implementation we wouldn't want this enabled by default for libzpool.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #10189
2020-04-13 10:51:44 -07:00
alex c602b35cf7 ZTS: Fix and change testcase cache_010_neg
Commit 379ca9c removed the requirement on aux devices to be block
devices only but the test case cache_010_neg was not updated, making it
fail consistently.

This change changes the test to check that cache devices _can_ be
anything that presents a block interface. The testcase is renamed to
cache_010_pos and the exceptions for known failure removed from the test
runner.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reported-by: Richard Elling <Richard.Elling@RichardElling.com>
Signed-off-by: Alex John <alex@stty.io>
Closes #10172
2020-04-13 10:50:41 -07:00
Matthew Ahrens 20f287855a zvol_write() can use dmu_tx_hold_write_by_dnode()
We can improve the performance of writes to zvols by using
dmu_tx_hold_write_by_dnode() instead of dmu_tx_hold_write().  This
reduces lock contention on the first block of the dnode object, and also
reduces the amount of CPU needed.  The benefit will be highest with
multi-threaded async writes (i.e. writes that don't call zil_commit()).

Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #10184
2020-04-10 21:14:01 -07:00
Brian Behlendorf 8080848254 Minor zstream redup command fixes
* Fix uninitialized variable in `zstream redup` command.  The
  'rdt.ddt_count' variable is uninitialized because it was
  allocated from the stack and not globally.  Initialize it.
  This was reported by gcc when compiling with debugging enabled.

    zstream_redup.c:157:16: error: 'rdt.ddt_count' may be used
    uninitialized in this function [-Werror=maybe-uninitialized]

* Remove the cmd/zstreamdump/.gitignore file.  It's no longer
  needed now that the zstreamdump command is a script.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #10192
2020-04-10 21:10:09 -07:00
Matthew Ahrens c618f87cd2 Add zstream redup command to convert deduplicated send streams
Deduplicated send and receive is deprecated.  To ease migration to the
new dedup-send-less world, the commit adds a `zstream redup` utility to
convert deduplicated send streams to normal streams, so that they can
continue to be received indefinitely.

The new `zstream` command also replaces the functionality of
`zstreamdump`, by way of the `zstream dump` subcommand.  The
`zstreamdump` command is replaced by a shell script which invokes
`zstream dump`.

The way that `zstream redup` works under the hood is that as we read the
send stream, we build up a hash table which maps from `<GUID, object,
offset> -> <file_offset>`.

Whenever we see a WRITE record, we add a new entry to the hash table,
which indicates where in the stream file to find the WRITE record for
this block. (The key is `drr_toguid, drr_object, drr_offset`.)

For entries other than WRITE_BYREF, we pass them through unchanged
(except for the running checksum, which is recalculated).

For WRITE_BYREF records, we change them to WRITE records.  We find the
referenced WRITE record by looking in the hash table (for the record
with key `drr_refguid, drr_refobject, drr_refoffset`), and then reading
the record header and payload from the specified offset in the stream
file.  This is why the stream can not be a pipe.  The found WRITE record
replaces the WRITE_BYREF record, with its `drr_toguid`, `drr_object`,
and `drr_offset` fields changed to be the same as the WRITE_BYREF's
(i.e. we are writing the same logical block, but with the data supplied
by the previous WRITE record).

This algorithm requires memory proportional to the number of WRITE
records (same as `zfs send -D`), but the size per WRITE record is
relatively low (40 bytes, vs. 72 for `zfs send -D`).  A 1TB send stream
with 8KB blocks (`recordsize=8k`) would use around 5GB of RAM to
"redup".

Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #10124 
Closes #10156
2020-04-10 10:39:55 -07:00
George Amanakis 77f6826b83 Persistent L2ARC
This commit makes the L2ARC persistent across reboots. We implement
a light-weight persistent L2ARC metadata structure that allows L2ARC
contents to be recovered after a reboot. This significantly eases the
impact a reboot has on read performance on systems with large caches.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: George Wilson <gwilson@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Saso Kiselkov <skiselkov@gmail.com>
Co-authored-by: Jorgen Lundman <lundman@lundman.net>
Co-authored-by: George Amanakis <gamanakis@gmail.com>
Ported-by: Yuxuan Shui <yshuiv7@gmail.com>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #925 
Closes #1823 
Closes #2672 
Closes #3744 
Closes #9582
2020-04-10 10:33:35 -07:00
Ryan Moeller 36a6e2335c Don't ignore zfs_arc_max below allmem/32
Set arc_c_min before arc_c_max so that when zfs_arc_min is set lower
than the default allmem/32 zfs_arc_max can also be set lower.

Add warning messages when tunables are being ignored.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10157
Closes #10158
2020-04-09 15:39:48 -07:00
Matthew Macy 8b27e08ed8 Add separate field for indicating that spa is in middle of split
By default it's not possible to open a device already owned by an
active vdev. It's necessary to make an exception to this for vdev
split. The FreeBSD platform code will make an exception if
spa_is splitting is set to to true.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #10178
2020-04-09 09:59:31 -07:00
Brian Behlendorf 68dde63d13 Linux 5.7 compat: blk_alloc_queue()
Commit https://github.com/torvalds/linux/commit/3d745ea5 simplified
the blk_alloc_queue() interface by updating it to take the request
queue as an argument.  Add a wrapper function which accepts the new
arguments and internally uses the available interfaces.

Other minor changes include increasing the Linux-Maximum to 5.6 now
that 5.6 has been released.  It was not bumped to 5.7 because this
release has not yet been finalized and is still subject to change.

Added local 'struct zvol_state_os *zso' variable to zvol_alloc.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #10181 
Closes #10187
2020-04-09 09:16:46 -07:00
Matthew Macy 01c4f2bf29 Use vn_io_fault_uiomove on FreeBSD to avoid potential deadlock
Added to prevent a possible deadlock, the following comments from
FreeBSD explain the issue.  The comment describing vn_io_fault_uiomove:

/*
 * Helper function to perform the requested uiomove operation using
 * the held pages for io->uio_iov[0].iov_base buffer instead of
 * copyin/copyout.  Access to the pages with uiomove_fromphys()
 * instead of iov_base prevents page faults that could occur due to
 * pmap_collect() invalidating the mapping created by
 * vm_fault_quick_hold_pages(), or pageout daemon, page laundry or
 * object cleanup revoking the write access from page mappings.
 *
 * Filesystems specified MNTK_NO_IOPF shall use vn_io_fault_uiomove()
 * instead of plain uiomove().
 */

This used for vn_io_fault which has the following motivation:

/*
 * The vn_io_fault() is a wrapper around vn_read() and vn_write() to
 * prevent the following deadlock:
 *
 * Assume that the thread A reads from the vnode vp1 into userspace
 * buffer buf1 backed by the pages of vnode vp2.  If a page in buf1 is
 * currently not resident, then system ends up with the call chain
 *   vn_read() -> VOP_READ(vp1) -> uiomove() -> [Page Fault] ->
 *     vm_fault(buf1) -> vnode_pager_getpages(vp2) -> VOP_GETPAGES(vp2)
 * which establishes lock order vp1->vn_lock, then vp2->vn_lock.
 * If, at the same time, thread B reads from vnode vp2 into buffer buf2
 * backed by the pages of vnode vp1, and some page in buf2 is not
 * resident, we get a reversed order vp2->vn_lock, then vp1->vn_lock.
 *
 * To prevent the lock order reversal and deadlock, vn_io_fault() does
 * not allow page faults to happen during VOP_READ() or VOP_WRITE().
 * Instead, it first tries to do the whole range i/o with pagefaults
 * disabled. If all pages in the i/o buffer are resident and mapped,
 * VOP will succeed (ignoring the genuine filesystem errors).
 * Otherwise, we get back EFAULT, and vn_io_fault() falls back to do
 * i/o in chunks, with all pages in the chunk prefaulted and held
 * using vm_fault_quick_hold_pages().
 *
 * Filesystems using this deadlock avoidance scheme should use the
 * array of the held pages from uio, saved in the curthread->td_ma,
 * instead of doing uiomove().  A helper function
 * vn_io_fault_uiomove() converts uiomove request into
 * uiomove_fromphys() over td_ma array.
 *
 * Since vnode locks do not cover the whole i/o anymore, rangelocks
 * make the current i/o request atomic with respect to other i/os and
 * truncations.
 */

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #10177
2020-04-08 10:30:27 -07:00
Ryan Moeller 7e3df9db12 Finish refactoring for ZFS_MODULE_PARAM_CALL
Linux and FreeBSD have different parameters for tunable proc handler.
This has prevented FreeBSD from implementing the ZFS_MODULE_PARAM_CALL
macro.

To complete the sharing of ZFS_MODULE_PARAM_CALL declarations, create
per-platform definitions of the parameter list, ZFS_MODULE_PARAM_ARGS.

With the declarations wired up we discovered an incorrect scope prefix
for spa_slop_shift, so this is now fixed.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10179
2020-04-07 10:06:22 -07:00
alex 2a15c6aab4 libzfs_pool: Remove unused check for ENOTBLK
Commit 379ca9c removed the check on aux devices to be block devices also
changing zfs_ioctl(hdl, ZFS_IOC_VDEV_ADD, ...) and
zfs_ioctl(hdl, ZFS_IOC_POOL_CREATE, ...) to never set ENOTBLK. This
change removes the dangling check for ENOTBLK that will never trigger.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reported-by: Richard Elling <Richard.Elling@RichardElling.com>
Signed-off-by: Alex John <alex@stty.io>
Closes #10173
2020-04-07 10:04:40 -07:00
Ryan Moeller 4a21ec0560 ZTS: Fix non-portable date format
The delegate tests use `date(1)` to generate snapshot names, using
the format '%F-%T-%N' to get nanosecond resolution (since multiple
snapshots may be taken in the same second).  '%N' is not portable, and
causes tests to fail on FreeBSD.

Since the only purpose these timestamps serve is to create a unique
name, simply use $RANDOM instead.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10170
2020-04-06 16:07:35 -07:00
Paul Dagnelie 5a42ef04fd Add 'zfs wait' command
Add a mechanism to wait for delete queue to drain.

When doing redacted send/recv, many workflows involve deleting files 
that contain sensitive data. Because of the way zfs handles file 
deletions, snapshots taken quickly after a rm operation can sometimes 
still contain the file in question, especially if the file is very 
large. This can result in issues for redacted send/recv users who 
expect the deleted files to be redacted in the send streams, and not 
appear in their clones.

This change duplicates much of the zpool wait related logic into a 
zfs wait command, which can be used to wait until the internal
deleteq has been drained.  Additional wait activities may be added 
in the future. 

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Gallagher <john.gallagher@delphix.com>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #9707
2020-04-01 10:02:06 -07:00
Fabio Scaccabarozzi c9e3efdb3a Bugfix/fix uio partial copies
In zfs_write(), the loop continues to the next iteration without
accounting for partial copies occurring in uiomove_iov when 
copy_from_user/__copy_from_user_inatomic return a non-zero status.
This results in "zfs: accessing past end of object..." in the 
kernel log, and the write failing.

Account for partial copies and update uio struct before returning
EFAULT, leave a comment explaining the reason why this is done.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: ilbsmart <wgqimut@gmail.com>
Signed-off-by: Fabio Scaccabarozzi <fsvm88@gmail.com>
Closes #8673 
Closes #10148
2020-04-01 09:48:54 -07:00
Matthew Ahrens 0929c4de39 Improve ZVOL sync write performance by using a taskq
== Summary ==

Prior to this change, sync writes to a zvol are processed serially.
This commit makes zvols process concurrently outstanding sync writes in
parallel, similar to how reads and async writes are already handled.
The result is that the throughput of sync writes is tripled.

== Background ==

When a write comes in for a zvol (e.g. over iscsi), it is processed by
calling `zvol_request()` to initiate the operation.  ZFS is expected to
later call `BIO_END_IO()` when the operation completes (possibly from a
different thread).  There are a limited number of threads that are
available to call `zvol_request()` - one one per iscsi client (unless
using MC/S).  Therefore, to ensure good performance, the latency of
`zvol_request()` is important, so that many i/o operations to the zvol
can be processed concurrently.  In other words, if the client has
multiple outstanding requests to the zvol, the zvol should have multiple
outstanding requests to the storage hardware (i.e. issue multiple
concurrent `zio_t`'s).

For reads, and async writes (i.e. writes which can be acknowledged
before the data reaches stable storage), `zvol_request()` achieves low
latency by dispatching the bulk of the work (including waiting for i/o
to disk) to a taskq.  The taskq callback (`zvol_read()` or
`zvol_write()`) blocks while waiting for the i/o to disk to complete.
The `zvol_taskq` has 32 threads (by default), so we can have up to 32
concurrent i/os to disk in service of requests to zvols.

However, for sync writes (i.e. writes which must be persisted to stable
storage before they can be acknowledged, by calling `zil_commit()`),
`zvol_request()` does not use `zvol_taskq`.  Instead it blocks while
waiting for the ZIL write to disk to complete.  This has the effect of
serializing sync writes to each zvol.  In other words, each zvol will
only process one sync write at a time, waiting for it to be written to
the ZIL before accepting the next request.

The same issue applies to FLUSH operations, for which `zvol_request()`
calls `zil_commit()` directly.

== Description of change ==

This commit changes `zvol_request()` to use
`taskq_dispatch_ent(zvol_taskq)` for sync writes, and FLUSh operations.
Therefore we can have up to 32 threads (the taskq threads)
simultaneously calling `zil_commit()`, for a theoretical performance
improvement of up to 32x.

To avoid the locking issue described in the comment (which this commit
removes), we acquire the rangelock from the taskq callback (e.g.
`zvol_write()`) rather than from `zvol_request()`.  This applies to all
writes (sync and async), reads, and discard operations.  This means that
multiple simultaneously-outstanding i/o's which access the same block
can complete in any order.  This was previously thought to be incorrect,
but a review of the block device interface requirements revealed that
this is fine - the order is inherently not defined.  The shorter hold
time of the rangelock should also have a slight performance improvement.

For an additional slight performance improvement, we use
`taskq_dispatch_ent()` instead of `taskq_dispatch()`, which avoids a
`kmem_alloc()` and eliminates a failure mode.  This applies to all
writes (sync and async), reads, and discard operations.

== Performance results ==

We used a zvol as an iscsi target (server) for a Windows initiator
(client), with a single connection (the default - i.e. not MC/S).

We used `diskspd` to generate a workload with 4 threads, doing 1MB
writes to random offsets in the zvol.  Without this change we get
231MB/s, and with the change we get 728MB/s, which is 3.15x the original
performance.

We ran a real-world workload, restoring a MSSQL database, and saw
throughput 2.5x the original.

We saw more modest performance wins (typically 1.5x-2x) when using MC/S
with 4 connections, and with different number of client threads (1, 8,
32).

Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #10163
2020-03-31 10:50:44 -07:00
George Amanakis 37c22948e5 Reset l2ad_hand and l2ad_first in l2arc_evict
Increasing l2arc_write_size or l2arc_write_boost can result in
l2arc_write_buffers() not having enough space to perform its writes and
panic zio_write_phys().

Instead of resetting l2ad_hand to l2ad_start at the end of
l2arc_write_buffers() and not taking into account a possible
user-mediated increase of l2arc_write_max, we do this in l2arc_evict(),
right after l2arc_write_size() has run. If there is not enough space to
evict (ie we will exceed l2ad_end) we evict to the end of the device,
reset l2ad_hand to l2ad_start, set l2ad_first to 0 and iterate
l2arc_evict(). We avoid infinite iteration of l2arc_evict() by making
sure in l2arc_write_size() that l2ad_start + size does not exceed
l2ad_end.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #10154
2020-03-31 10:46:48 -07:00
Ryan Moeller c96a32e1a3 ZTS: Skip udev actions in zvol_misc when not Linux
udev is only used on Linux.

Skip udev_wait and udev_cleanup when not on Linux.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10165
2020-03-31 10:35:14 -07:00
Ryan Moeller 9a51738b60 Let default arc_c_max be platform dependent
Linux changed the default max ARC size to 1/2 of physical memory to
deal with shortcomings of the Linux SLUB allocator.  Other platforms
do not require the same logic.

Implement an arc_default_max() function to determine a default max ARC
size in platform code.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10155
2020-03-27 09:14:46 -07:00
Matthew Ahrens 3f38797338 Compile cityhash code into libzfs
Make the cityhash code compile into libzfs, in preparation for the new
"zstream" command.

Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #10152
2020-03-27 09:11:22 -07:00
Ryan Moeller ef3331e703 ZTS: Wait for free space between quota tests
And in removal tests, sync the specific pool we are waiting on.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10146
2020-03-26 10:48:19 -07:00
Dirkjan Bussink 112c1bff94 Remove checks for null out value in encryption paths
These paths are never exercised, as the parameters given are always
different cipher and plaintext `crypto_data_t` pointers.

Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Attila Fueloep <attila@fueloep.org>
Signed-off-by: Dirkjan Bussink <d.bussink@gmail.com>
Closes #9661 
Closes #10015
2020-03-26 10:41:57 -07:00
alex 1d2ddb9bb9 zfs_get: change time format string from %k to %H
Issue #10090 reported that snapshots created between midnight and 1 AM
are missing a padded zero in the creation property

This change fixes the bug reported in issue #10090 where snapshots
created between midnight and 1 AM were missing a padded zero in the
creation timestamp output.

The leading zero was missing because the time format string used `%k`
which formats the hour as a decimal number from 0 to 23 where single
digits are preceded by blanks[0] and is fixed by changing it to `%H`
which formats the hour as 00-23.

The difference in output is as below

```
-Thu Mar 26  0:39 2020
+Thu Mar 26 00:39 2020
```

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Alex John <alex@stty.io>
Closes #10090 
Closes #10153
2020-03-26 08:28:22 -07:00
Matthew Ahrens 652bdc9b0e Deprecate deduplicated send streams
Dedup send can only deduplicate over the set of blocks in the send
command being invoked, and it does not take advantage of the dedup table
to do so. This is a very common misconception among not only users, but
developers, and makes the feature seem more useful than it is. As a
result, many users are using the feature but not getting any benefit
from it.

Dedup send requires a nontrivial expenditure of memory and CPU to
operate, especially if the dataset(s) being sent is (are) not already
using a dedup-strength checksum.

Dedup send adds developer burden. It expands the test matrix when
developing new features, causing bugs in released code, and delaying
development efforts by forcing more testing to be done.

As a result, we are deprecating the use of `zfs send -D` and receiving
of such streams.  This change adds a warning to the man page, and also
prints the warning whenever dedup send or receive are used.

In a future release, we plan to:
1. remove the kernel code for generating deduplicated streams
2. make `zfs send -D` generate regular, non-deduplicated streams
3. remove the kernel code for receiving deduplicated streams
4. make `zfs receive` of deduplicated streams process them in userland
   to "re-duplicate" them, so that they can still be received.

Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #7887 
Closes #10117
2020-03-18 13:31:10 -07:00
Ryan Moeller 22df2457a7 Avoid core dump on invalid redaction bookmark
libzfs aborts and dumps core on EINVAL from the kernel when trying to
do a redacted send with a bookmark that is not a redaction bookmark.

Move redacted bookmark validation into libzfs.

Check if the bookmark given for redactions is actually a redaction
bookmark.  Print an error message and exit gracefully if it is not.

Don't abort on EINVAL in zfs_send_one.

Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10138
2020-03-18 12:54:12 -07:00
Avatat 4df8b2c373 Changed decimals to integers in the arcstat script
Changed interval value type from decimal to integer,
because of deprecation warning in Python 3.8 and above.
Also changed kstat values type from decimal to integer,
because all the values are integers.

Fixed behavior of arcstat when run without args.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Bartosz Zieba <bartosz@zieba.pro>
Closes #10132 
Closes #10142
2020-03-18 11:50:45 -07:00
Brian Behlendorf 5351951274 Fix zfs_rmnode() unlink / rollback issue
If a has rollback has occurred while a file is open and unlinked.
Then when the file is closed post rollback it will not exist in the
rolled back version of the unlinked object.  Therefore, the call to
zap_remove_int() may correctly return ENOENT and should be allowed.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #6812 
Closes #9739
2020-03-18 11:47:07 -07:00
Brian Behlendorf 6b7028ec51 Fix cstyle warnings
Fix minor cstyle warnings accidentally introduced by 7145123b.

Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #10143
2020-03-17 15:42:27 -07:00
Paul Dagnelie 7145123b0a Separate warning for incomplete and corrupt streams
This change adds a separate return code to zfs_ioc_recv that is used 
for incomplete streams, in addition to the existing return code for 
streams that contain corruption.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #10122
2020-03-17 10:30:33 -07:00
Attila Fülöp 5b3b79559c ICP: gcm-avx: Support architectures lacking the MOVBE instruction
There are a couple of x86_64 architectures which support all needed
features to make the accelerated GCM implementation work but the
MOVBE instruction. Those are mainly Intel Sandy- and Ivy-Bridge
and AMD Bulldozer, Piledriver, and Steamroller.

By using MOVBE only if available and replacing it with a MOV
followed by a BSWAP if not, those architectures now benefit from
the new GCM routines and performance is considerably better
compared to the original implementation.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Adam D. Moss <c@yotes.com>
Signed-off-by: Attila Fülöp <attila@fueloep.org>
Followup #9749 
Closes #10029
2020-03-17 10:24:38 -07:00
Mariusz Zaborski a57d3d45d6 Add option for forcible unmounting dataset while receiving snapshot.
Currently when the dataset is in use we can't receive snapshots.

    zfs send test/1@asd | zfs recv -FM test/2
    cannot unmount '/test/2': Device busy

This commits add option 'M' which attempts to forcibly unmount the
dataset.  Thanks to this we can enforce receiving snapshots in a
single step.

Note that this functionality is not supported on Linux because the
VFS will prevent active mounted filesystems from being unmounted,
even with the force option.  This is the intended VFS behavior.

Test cases were added to verify the expected behavior based on
the platform.

Discussed-with: Pawel Jakub Dawidek <pjd@FreeBSD.org>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Allan Jude <allanjude@freebsd.org>
External-issue: https://reviews.freebsd.org/D22306
Closes #9904
2020-03-17 10:08:32 -07:00
Ryan Moeller 80d98a8f3a ZTS: Use default_cleanup_noexit where needed
And add log_pass appropriately.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10136
2020-03-17 09:55:18 -07:00
Ryan Moeller e0d3284bc9 Exit status 256+signum is actually baked in to ksh
While #10121 did fix the signal numbers for FreeBSD/Darwin, it
incorrectly changed the expected encoding of exit status for commands
that exited on a signal.  The encoding 256+signum is a feature of the
shell.  Only the signal numbers themselves are platform-dependent.

Always use the encoding 256+signum when checking exit status for
signal exits.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10137
2020-03-17 09:49:58 -07:00
Ryan Moeller 4d32abaa87 libzfs: Fix bounds checks for float parsing
UINT64_MAX is not exactly representable as a double.

The closest representation is UINT64_MAX + 1, so we can use a >=
comparison instead of > for the bounds check.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10127
2020-03-16 11:56:29 -07:00
Matthew Ahrens 7261fc2e81 Improve zfs receive performance by batching writes
For each WRITE record in the stream, `zfs receive` creates a DMU
transaction (`dmu_tx_create()`) and writes this block's data into the
object.  If per-block overheads (as opposed to per-byte overheads)
dominate performance (as is often the case with small recordsize), the
per-dmu-transaction overheads can be significant.  For example, in some
workloads the `receieve_writer` thread is 100% on CPU, and more than
half of its CPU time is in these per-tx routines (e.g.
dmu_tx_hold_write, dmu_tx_assign, dmu_tx_commit).

To improve performance of `zfs receive`, this commit batches WRITE
records which are to nearby offsets of the same object, and uses one DMU
transaction to write them all.  By default the batch size is 1MB, which
for recordsize=8K reduces the number of DMU transactions by 128x for
full send streams (incrementals will depend on how "clumpy" the changed
blocks are).

This commit improves the performance of `dd if=stream | zfs recv`
from 78,800 blocks/sec to 98,100 blocks/sec (25% improvement).

Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #10099
2020-03-16 11:51:56 -07:00
Brian Behlendorf c94fb10917 Remove CI builder customization from TEST
The default options are reasonable for all of the CI builders.

* TEST_XFSTESTS_SKIP=yes  - This is already the default.
* TEST_ZTEST_TIMEOUT=3600 - Increased ztest run time only increases
  code coverage by a small degree.  Default 900s runs are sufficient.
* Disabling certain tests on 32-bit builders is no longer needed.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #10129
2020-03-16 10:46:03 -07:00
Ryan Moeller d3fe62cb35 ZTS: Update flaky tests in zts-report
Some tests which pass on FreeBSD but fail on Linux had been put in the
"maybe" set.  Move these back to "known" under an "if Linux" check so
the expected outcome is clear.

Add some tests that have been found to be flaky on FreeBSD stable/12
to the "maybe" set.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10120
2020-03-13 09:29:10 -07:00
Matthew Ahrens 0fdd6106bb dmu_objset_from_ds must be called with dp_config_rwlock held
The normal lock order is that the dp_config_rwlock must be held before
the ds_opening_lock.  For example, dmu_objset_hold() does this.
However, dmu_objset_open_impl() is called with the ds_opening_lock held,
and if the dp_config_rwlock is not already held, it will attempt to
acquire it.  This may lead to deadlock, since the lock order is
reversed.

Looking at all the callers of dmu_objset_open_impl() (which is
principally the callers of dmu_objset_from_ds()), almost all callers
already have the dp_config_rwlock.  However, there are a few places in
the send and receive code paths that do not.  For example:
dsl_crypto_populate_key_nvlist, send_cb, dmu_recv_stream,
receive_write_byref, redact_traverse_thread.

This commit resolves the problem by requiring all callers ot
dmu_objset_from_ds() to hold the dp_config_rwlock.  In most cases, the
code has been restructured such that we call dmu_objset_from_ds()
earlier on in the send and receive processes, when we already have the
dp_config_rwlock, and save the objset_t until we need it in the middle
of the send or receive (similar to what we already do with the
dsl_dataset_t).  Thus we do not need to acquire the dp_config_rwlock in
many new places.

I also cleaned up code in dmu_redact_snap() and send_traverse_thread().

Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Paul Zuchowski <pzuchowski@datto.com>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #9662
Closes #10115
2020-03-12 10:55:02 -07:00
Alexander Motin fa130e010c Fix infinite scan on a pool with only special allocations
Attempt to run scrub or resilver on a new pool containing only special
allocations (special vdev added on creation) caused infinite loop
because of dsl_scan_should_clear() limiting memory usage to 5% of pool
size, which it calculated accounting only normal allocation class.

Addition of special and just in case dedup classes fixes the issue.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #10106 
Closes #8694
2020-03-12 10:52:03 -07:00
Ryan Moeller 94eb65b4c1 ZTS: Use correct signal numbers for status checks
Different operating systems encode exit status in different ways.
The logapi shell library assumes the Solaris meaning of exit codes,
which is not correct on other platforms.

Define the needed constants according to the platform we are running
on and use those to decode process exit status.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10121
2020-03-12 10:50:51 -07:00
Ryan Moeller cdbc34fc2b ZTS: Test boundary conditions in alloc_class_012
Issue #9142 describes an error in the checks for device removal that
can prevent removal of special allocation class vdevs in some
situations.

Enhance alloc_class/alloc_class_012_pos to check situations where this
bug occurs.

Update zts-report with knowledge of issue #9142.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10116 
Issue #9142
2020-03-12 10:50:01 -07:00
Ryan Moeller e70b127e05 ZTS: Wait for free space between write_dirs tests
Cleanup for write_dirs involves destroying a dataset filling a pool
and then recreating the dataset for the next test.  Due to the
asynchronous nature of free space accounting, recreating the dataset
can fail for lack of space, causing problems for the next test.

Add wait_freeing $TESTPOOL to wait for the space to be freed and then
sync_pool $TESTPOOL to update the space accounting before attempting
to recreate the test filesystem.

Only use a single disk to create the pool.  Make it a small file so it
does not take too long to fill.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10112
2020-03-12 10:48:46 -07:00
John Poduska e6b28efccc Prevent race condition in dnode_dest (#10101)
dnode_special_close() waits for the refcount of dn_holds to go to zero
without holding the dn_mtx. dnode_rele_and_unlock() does the final
remove to dn_holds with dn_mtx being held:

	refs = zfs_refcount_remove(&dn->dn_holds, tag);
	mutex_exit(&dn->dn_mtx);

So, there is a race condition after the remove until dn_mtx is
dropped. During that time, dnode_destroy() can get called, which ends
up in dnode_dest() calling mutex_destroy() and a panic since the lock
is still held.

This change adds a condvar to wait for the final dnode_rele_and_unlock()
to release the dn_mtx before calling dnode_destroy().

Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: John Poduska <jpoduska@datto.com>
Closes #7814
Closes #10101
2020-03-12 10:25:56 -07:00
Mark Roper 1e9231ada8 Prevent deadlock in arc_read in Linux memory reclaim callback
Using zfs with Lustre, an arc_read can trigger kernel memory allocation
that in turn leads to a memory reclaim callback and a deadlock within a
single zfs process. This change uses spl_fstrans_mark and
spl_trans_unmark to prevent the reclaim attempt and the deadlock
(https://zfsonlinux.topicbox.com/groups/zfs-devel/T4db2c705ec1804ba).
The stack trace observed is:

    __schedule at ffffffff81610f2e
    schedule at ffffffff81611558
    schedule_preempt_disabled at ffffffff8161184a
    __mutex_lock at ffffffff816131e8
    arc_buf_destroy at ffffffffa0bf37d7 [zfs]
    dbuf_destroy at ffffffffa0bfa6fe [zfs]
    dbuf_evict_one at ffffffffa0bfaa96 [zfs]
    dbuf_rele_and_unlock at ffffffffa0bfa561 [zfs]
    dbuf_rele_and_unlock at ffffffffa0bfa32b [zfs]
    osd_object_delete at ffffffffa0b64ecc [osd_zfs]
    lu_object_free at ffffffffa06d6a74 [obdclass]
    lu_site_purge_objects at ffffffffa06d7fc1 [obdclass]
    lu_cache_shrink_scan at ffffffffa06d81b8 [obdclass]
    shrink_slab at ffffffff811ca9d8
    shrink_node at ffffffff811cfd94
    do_try_to_free_pages at ffffffff811cfe63
    try_to_free_pages at ffffffff811d01c4
    __alloc_pages_slowpath at ffffffff811be7f2
    __alloc_pages_nodemask at ffffffff811bf3ed
    new_slab at ffffffff81226304
    ___slab_alloc at ffffffff812272ab
    __slab_alloc at ffffffff8122740c
    kmem_cache_alloc at ffffffff81227578
    spl_kmem_cache_alloc at ffffffffa048a1fd [spl]
    arc_buf_alloc_impl at ffffffffa0befba2 [zfs]
    arc_read at ffffffffa0bf0924 [zfs]
    dbuf_read at ffffffffa0bf9083 [zfs]
    dmu_buf_hold_by_dnode at ffffffffa0c04869 [zfs]

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mark Roper <markroper@gmail.com>
Closes #9987
2020-03-12 10:24:43 -07:00
Olaf Faaland 61871518dd zloop.sh should call ZDB with pool name
Commit 54007c79 introduced an error, changing the final
argument to $ZDB from ztest to $ZTEST.  This argument
indicates the pool name, not the script, and so should
not have been changed.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Closes #10118
2020-03-11 10:02:23 -07:00
Ryan Moeller ddd9ef3a4f ZTS: Add a failsafe callback to run after each test
Tests that get killed do not have an opportunity to clean up.

There are many bad states this can leave the system in, but of
particular gravity is when zinject has been used to induce bad
behavior for one or more of the test disks.

Create a failsafe mechanism in test-runner.py that runs a callback
script after every test. The script is common to all tests so all
tests benefit from the protection.

Add an obligatory `zinject -c all` to clear all zinject state after
every test case is run.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10096
2020-03-10 11:00:56 -07:00
Matthew Ahrens 1dc32a67e9 Improve zfs send performance by bypassing the ARC
When doing a zfs send on a dataset with small recordsize (e.g. 8K),
performance is dominated by the per-block overheads.  This is especially
true with `zfs send --compressed`, which further reduces the amount of
data sent, for the same number of blocks.  Several threads are involved,
but the limiting factor is the `send_prefetch` thread, which is 100% on
CPU.

The main job of the `send_prefetch` thread is to issue zio's for the
data that will be needed by the main thread.  It does this by calling
`arc_read(ARC_FLAG_PREFETCH)`.  This has an immediate cost of creating
an arc_hdr, which takes around 14% of one CPU.  It also induces later
costs by other threads:

 * Since the data was only prefetched, dmu_send()->dmu_dump_write() will
   need to call arc_read() again to get the data.  This will have to
   look up the arc_hdr in the hash table and copy the data from the
   scatter ABD in the arc_hdr to a linear ABD in arc_buf.  This takes
   27% of one CPU.

 * dmu_dump_write() needs to arc_buf_destroy()  This takes 11% of one
   CPU.

 * arc_adjust() will need to evict this arc_hdr, taking about 50% of one
   CPU.

All of these costs can be avoided by bypassing the ARC if the data is
not already cached.  This commit changes `zfs send` to check for the
data in the ARC, and if it is not found then we directly call
`zio_read()`, reading the data into a linear ABD which is used by
dmu_dump_write() directly.

The performance improvement is best expressed in terms of how many
blocks can be processed by `zfs send` in one second.  This change
increases the metric by 50%, from ~100,000 to ~150,000.  When the amount
of data per block is small (e.g. 2KB), there is a corresponding
reduction in the elapsed time of `zfs send >/dev/null` (from 86 minutes
to 58 minutes in this test case).

In addition to improving the performance of `zfs send`, this change
makes `zfs send` not pollute the ARC cache.  In most cases the data will
not be reused, so this allows us to keep caching useful data in the MRU
(hit-once) part of the ARC.

Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #10067
2020-03-10 10:51:04 -07:00
Ryan Moeller 9be70c3784 ZTS: Simplify some libtest functions
Don't echo the results of arithmetic expressions, it's not necessary.

Use hw.clockrate sysctl to get CPU freq instead of parsing dmesg.boot
for a line that might not even be there anymore.

Reduce bookkeeping in fill_fs, making it easier to follow.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10113
2020-03-10 10:44:14 -07:00
Richard Laager 5ecbb293c6 Fix zfs-functions packaging bug
This fixes a bug where the generated zfs-functions was being included
along with original zfs-functions.in in the make dist tarball.  This
caused an unfortunate series of events during build/packaging that
resulted in the RPM-installed /etc/zfs/zfs-functions listing the
paths as:

ZFS="/usr/local/sbin/zfs"
ZED="/usr/local/sbin/zed"
ZPOOL="/usr/local/sbin/zpool"

When they should have been:

ZFS="/sbin/zfs"
ZED="/sbin/zed"
ZPOOL="/sbin/zpool"

This affects init.d (non-systemd) distros like CentOS 6.

/etc/default/zfs and /etc/zfs/zfs-functions are also used by the
initramfs, so they need to be built even when init.d support is not.
They have been moved to the (new) etc/default and (existing) etc/zfs
source directories, respectively.

Fixes: #9443

Co-authored-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Richard Laager <rlaager@wiktel.com>
2020-03-10 09:53:20 -07:00
Richard Laager 01243e72a5 initramfs: Eliminate substitutions
These are now handled in zfs-functions, so this is all duplicative and
unnecessary.

Signed-off-by: Richard Laager <rlaager@wiktel.com>
2020-03-10 09:53:20 -07:00
Richard Laager 49afc91387 Delete built init scripts in make clean
Previously, they were being deleted in make distclean.  This brings it
in line with the example:
https://www.gnu.org/software/automake/manual/html_node/Scripts.html

Signed-off-by: Richard Laager <rlaager@wiktel.com>
2020-03-10 09:53:20 -07:00
Richard Laager dc4dd46728 Make init scripts depend on Makefile
This brings it in line with the example:
https://www.gnu.org/software/automake/manual/html_node/Scripts.html

This way, if the substitution code is changed, they should update.

Signed-off-by: Richard Laager <rlaager@wiktel.com>
2020-03-10 09:53:20 -07:00
InsanePrawn ff2f960b24 Systemd mount generator: don't fail keyload from file if already loaded
Previously the generated keyload units for encryption roots with
keylocation=file://* didn't contain the code to detect if the key
was already loaded and would be marked failed in such situations.

Move the code to check whether the key is already loaded
from keylocation=prompt handling to general key loading code.

Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: InsanePrawn <insane.prawny@gmail.com>
Closes #10103
2020-03-09 11:09:09 -07:00
Ryan Moeller 2b95e91132 ZTS: Another round of changes for FreeBSD
Highlights:
* is_linux -> is_illumos swaps
* make block_device_wait more clever when paths are given
* slightly optimize default_cleanup_noexit
* remove platform differences in user_run
* temporarily expect non-libfetch behavior for keylocation=/foo/bar
* fix sharenfs exceptions
* don't test multihost property
* fix misc broken platform checks
* clear zinjected faults in removal_resume_export callback

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10092
2020-03-06 09:31:32 -08:00
Ryan Moeller f5f6fb03b7 Change default to overlay=on
Filesystems allow overlay mounts by default on FreeBSD and Linux.

Respect the native convention by switching the default to overlay=on,
while retaining the option to turn the property off for compatibility
with other operating systems' conventions.

Update documentation and tests accordingly.

Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10030
2020-03-06 09:28:19 -08:00
Ryan Moeller 788398c562 ZTS: Update zts-report exceptions for FreeBSD
The new zfs_sync_trim_* tests are skipped on FreeBSD.
Both of the previously failing tests are now passing.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10105
2020-03-06 09:26:38 -08:00
Brian Behlendorf 5a1abc4b5b ZTS: Speed up write_dirs cleanup
The write_dirs tests fill a filesystem with a bunch of files until it
is full.  In cleanup the files are truncated and removed individually.
These tests already take a while to run.

It is quicker and easier to destroy the whole dataset and create a new
one to replace it in the cleanup functions.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10098
2020-03-04 15:12:12 -08:00
Brian Behlendorf fa23c5be88 ZTS: Add missing quotes
`default_setup` takes a disk list as the first argument and has
optional additional arguments that control secondary functionality.
A couple of test setups mistakenly call `default_setup $DISKS`.

Add quotes so the second and subsequent disks are correctly included
in the pool as vdevs rather than triggering unwanted behavior from
`default_setup`.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10097
2020-03-04 15:10:45 -08:00
Brian Behlendorf 4b06d05298 ZTS: Add zts-report exceptions for FreeBSD
There are three tests we expect to fail only on FreeBSD.
* link_count never exits and eventually times out:
 - @amotin tells me this test is probably not applicable to us
 - Skip on FreeBSD
* userobj feature does not activate immediately after pool upgrade
 - low impact; we are aware of this issue
* removal does not appear to condense on export
 - low impact; we are aware of this issue

Additionally removal_with_zdb passes on FreeBSD, so it is moved to
"maybe".

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10093
2020-03-04 15:09:40 -08:00
Brian Behlendorf f49db9b504 zio: dprintf_bp() if errors > 0 in zfs_blkptr_verify()
Also dprintf_bp() in case BLK_VERIFY_HALT of zfs_blkptr_verify_log()
since dprintf_bp() in zfs_blkptr_verify() will never be executed.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Paul Zuchowski <pzuchowski@datto.com>
Signed-off-by: Justin Keogh <commits@v6y.net>
Closes #10086
2020-03-04 15:08:41 -08:00
Brian Behlendorf d16c029f99 ZTS: Test the correct filesystem_limits behavior
See issue #8226: Property filesystem_limit does not work as documented

There have been previous attempts to fix the behavior on Linux, but so
far the issue is still open.  See PRs #8228, #8280.

The existing tests pass for the incorrect behavior.  This is a problem
on FreeBSD; we are failing the tests because we implement the feature
correctly.

I have adapted the tests based on the work by @loli10k in #8280 and
extended the changes to fix the snapshot_limit test as well.

Linux now fails these tests, so entries linking to the issue have been
added to the "maybe" group in zts-report.py.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10082
2020-03-04 15:07:52 -08:00
Brian Behlendorf 2288d41968 Add trim support to zpool wait
Manual trims fall into the category of long-running pool activities
which people might want to wait synchronously for. This change adds
support to 'zpool wait' for waiting for manual trim operations to
complete. It also adds a '-w' flag to 'zpool trim' which can be used to
turn 'zpool trim' into a synchronous operation.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Signed-off-by: John Gallagher <john.gallagher@delphix.com>
Closes #10071
2020-03-04 15:07:11 -08:00
Matthew Ahrens b3212d2fa6 Improve performance of zio_taskq_member
__zio_execute() calls zio_taskq_member() to determine if we are running
in a zio interrupt taskq, in which case we may need to switch to
processing this zio in a zio issue taskq.  The call to
zio_taskq_member() can become a performance bottleneck when we are
processing a high rate of zio's.

zio_taskq_member() calls taskq_member() on each of the zio interrupt
taskqs, of which there are 21.  This is slow because each call to
taskq_member() does tsd_get(taskq_tsd), which on Linux is relatively
slow.

This commit improves the performance of zio_taskq_member() by having it
cache the value of tsd_get(taskq_tsd), reducing the number of those
calls to 1/21th of the current behavior.

In a test case running `zfs send -c >/dev/null` of a filesystem with
small blocks (average 2.5KB/block), zio_taskq_member() was using 6.7% of
one CPU, and with this change it is reduced to 1.3%.  Overall time to
perform the `zfs send` reduced by 10% (~150,000 block/sec to ~165,000
blocks/sec).

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #10070
2020-03-03 10:29:38 -08:00
Ryan Moeller 0a0f9a7dc6 ZTS: Provide for nested cleanup routines
Shared test library functions lack a simple way to ensure proper
cleanup in the event of a failure.  The `log_onexit` cleanup pattern
cannot be used in library functions because it uses one global
variable to store the cleanup command.

An example of where this is a serious issue is when a tunable that
artifically stalls kernel progress gets activated and then some check
fails.  Unless the caller knows about the tunable and sets it back,
the system will be left in a bad state.

To solve this problem, turn the global cleanup variable into a stack.
Provide push and pop functions to add additional cleanup steps and
remove them after it is safe again.

The first use of this new functionality is in attempt_during_removal,
which sets REMOVAL_SUSPEND_PROGRESS.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10080
2020-03-03 10:28:09 -08:00
Ryan Moeller 9bb907bc3f Make spa_history_zone platform-dependent in kernel
This function should only return "linux" on Linux.

Move the kernel part of the function out of common code.
Fix the tests for FreeBSD.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10079
2020-03-02 09:43:30 -08:00
Ryan Moeller 1289fbb557 ZTS: Change issue URL template to OpenZFS org
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10081
2020-03-02 09:42:22 -08:00
Matthew Macy d32eff3a27 Don't open zfs control device exclusively
With the FreeBSD platform changes that were made for #10073
it is no longer necessary on FreeBSD to open the control device
exclusively to get onexit callbacks invoked.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #10076
2020-02-28 14:54:14 -08:00
Matthew Macy cf118ae8dc Don't call zrele on passed zp in zfs_xattr_owner_unlinked on FreeBSD
FreeBSD has a somewhat more cumbersome locking and refcounting
protocol for the platform counterpart to znode. We need to not call
zrele on the passed zp, but do need to do so on any intermediate zp.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #10075
2020-02-28 14:53:18 -08:00
Matthew Macy ae9f92f6f3 Re-share zfsdev_getminor and zfs_onexit_fd_hold
By adding a zfs_file_private accessor to the common
interfaces and some extensions to FreeBSD platform
code it is now possible to share the implementations
for the aforementioned functions.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #10073
2020-02-28 14:50:32 -08:00
Matthew Ahrens 9cdf7b1f6b Improve zfs destroy performance with zio_t-free zio_free()
When "zfs destroy" is run, it completes quickly, and in the background
we locate the blocks to free and free them.  This background activity
can be observed with `zpool get freeing` and `zpool wait -t free ...`.

This background activity is processed by a single thread (the spa_sync
thread) which calls zio_free() on each of the blocks to free.  With even
modest storage performance, the CPU consumption of zio_free() can be the
performance bottleneck.

Performance of zio_free() can be improved by not actually creating a
zio_t in the common case (non-dedup, non-gang), instead calling
metaslab_free() directly.  This avoids the CPU cost of allocating the
zio_t, and more importantly the cost of adding and later removing this
zio_t from the parent zio's child list.

The result is that performance of background freeing more than doubles,
from 0.6 million blocks per second to 1.3 million blocks per second.

Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Wilson <gwilson@delphix.com>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #10034
2020-02-28 14:49:44 -08:00
Ryan Moeller 6c0abcfddd ZTS: Fixup shebang in rsend_016, add to common.run
All other ksh scripts use /bin/ksh in the shebang.

Make rsend_016_neg consistent with the rest of the suite.

The test also was absent from any runfiles. Add it to common.run.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10051
2020-02-28 09:48:29 -08:00
Ryan Moeller f0410e9806 ZTS: Eliminate partitioning from zpool_add
Use file vdevs if we are short on $DISKS.
Also fixed vol recursion for FreeBSD in 004.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10060
2020-02-28 09:46:51 -08:00
Brian Behlendorf 3f99a3abc7 Fix CONFIG_MODULES=no Linux kernel config
When configuring as builtin (--enable-linux-builtin) for kernels
without loadable module support (CONFIG_MODULES=n) only the object
file is created.  Never a loadable kmod.

Update ZFS_LINUX_TRY_COMPILE to handle this in a manor similar to
the ZFS_LINUX_TEST_COMPILE_ALL macro.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #9887
Closes #10063
2020-02-28 09:23:48 -08:00
Brian Behlendorf bd0d24e09b Linux 5.5 compat: blkg_tryget()
Commit https://github.com/torvalds/linux/commit/9e8d42a0f accidentally
converted the static inline function blkg_tryget() to GPL-only for
kernels built with CONFIG_PREEMPT_RCU=y and CONFIG_BLK_CGROUP=y.

Resolve the build issue by providing our own equivalent functionality
when needed which uses rcu_read_lock_sched() internally as before.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #9745
Closes #10072
2020-02-28 08:58:39 -08:00
Ryan Moeller 2ce90dca91 arc_summary: Make get_descriptions per platform
Linux uses modinfo to get tunables descriptions, FreeBSD has to use
sysctl.

Move the existing function definition so it is defined that way on
Linux, and add a definition in terms of sysctl for FreeBSD.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10062
2020-02-27 17:15:06 -08:00
Ryan Moeller 276410387e pyzfs: Add constants for platform-specific errnos
FreeBSD doesn't have EBADE, ECHRNG, or ETIME.

Add constants for these and set them appropriately for the platform.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10061
2020-02-27 17:14:21 -08:00
Matthew Macy 13fac09868 Consolidate arc_buf allocation checks
The following check currently occurs in three separate locations
in dbuf.c.  This change consolidates those checks in to the
dbuf_alloc_arcbuf_from_arcbuf() function.

if (arc_is_encrypted(data)) {
...
} else if (compress_type != ZIO_COMPRESS_OFF) {
...
} else {
...
}

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #10057
2020-02-27 17:12:44 -08:00
Ryan Moeller 3d5ba1cf29 ZTS: Misc fixes for FreeBSD
* Set geom debug flags in corrupt_blocks_at_level
* Use the right time zone for history tests
* Add missing commands.cfg entry for diskinfo
* Rewrite get_last_txg_synced to use zdb
* Don't check ulimits for sparse files
* Suspend removal before removing a vdev, not after

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10054
2020-02-27 09:38:34 -08:00
Matthew Ahrens ab9646166d ZTS: Fix zfs_receive_004_neg
`zfs recv` of an incremental stream that already exists is ignored, with
a message like:

    receiving incremental stream of pool/fs@incsnap into pool/fs@incsnap
    snap testpool/testfs@incsnap already exists; ignoring

And the command exits successfully (exit code 0).

The zfs_receive_004_neg test is expecting that a this case will fail,
with nonzero exit code.

The fix is to remove this specific command from the test case.  This
lets us check that the remaining commands do in fact fail.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #10055
2020-02-27 09:37:34 -08:00
Brian Behlendorf 2c3a83701d Linux 5.6 compat: time_t
As part of the Linux kernel's y2038 changes the time_t type has been
fully retired.  Callers are now required to use the time64_t type.

Rather than move to the new type, I've removed the few remaining
places where a time_t is used in the kernel code.  They've been
replaced with a uint64_t which is already how ZFS internally
handled these values.

Going forward we should work towards updating the remaining user
space time_t consumers to the 64-bit interfaces.

Reviewed-by: Matthew Macy <mmacy@freebsd.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #10052
Closes #10064
2020-02-27 09:31:02 -08:00
Brian Behlendorf ff5587d651 Linux 5.6 compat: ktime_get_raw_ts64()
The getrawmonotonic() and getrawmonotonic64() interfaces have been
fully retired.  Update gethrtime() to use the replacement interface
ktime_get_raw_ts64() which was introduced in the 4.18 kernel.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #10052
Closes #10064
2020-02-27 09:30:45 -08:00
Matthew Macy 28caa74b19 Refactor dnode dirty context from dbuf_dirty
* Add dedicated donde_set_dirtyctx routine.
* Add empty dirty record on destroy assertion.
* Make much more extensive use of the SET_ERROR macro.

Reviewed-by: Will Andrews <wca@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9924
2020-02-26 16:09:17 -08:00
Ryan Moeller 647ff8e975 ZTS: Fix zfs_copies_002_pos
The function `get_used_prop` does not exist.

Use `get_prop used` instead.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10059
2020-02-26 14:29:12 -08:00
Ryan Moeller abef699866 ZTS: Adapt casenorm tests for FreeBSD
Several casenorm tests pass on FreeBSD but are expected to fail on
Linux.

Move the passing tests from "fail" to "maybe" so that passing on
FreeBSD is not unexpected.

Invert platform logic so FreeBSD doesn't use illumos-only zlook.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10050
2020-02-26 08:41:30 -08:00
Ryan Moeller 3a192f7d89 ZTS: Misc fixes for FreeBSD
* Check for mountd in is_shared to avoid timeout when not running
* Enhance robustness of some cleanup functions
* Simplify atime lookup
* Skip sharenfs validation for now
* Don't add mountpoint property to inheritance validation on FreeBSD

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10047
2020-02-25 16:23:27 -08:00
Ryan Moeller a33cb7e01a Add missing newline after zfs redact help message
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10045
2020-02-25 16:20:50 -08:00
Olaf Faaland 034313908a ZTS: zed_start should not fail if zed is already running
zed_start may be called in places where zed is not
typically already running, but this is not a requirement
of the tests.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Closes #9974
2020-02-25 16:02:10 -08:00
Matthew Macy c6a6b4d50a Remove dead code error handling from dsl_crypt.c
Sleepable (KM_SLEEP) allocations cannot fail. Hence
error handling for them is not useful.

Reviewed-By: Tom Caputi <tcaputi@datto.com>
Reviewed-By: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #10031
2020-02-25 15:59:29 -08:00
Ryan Moeller 2757010166 ZTS: Move atime_003 to linux.run
This test verifies relatime behavior, which is only present on Linux.

Move the test to linux.run

Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10046
2020-02-25 15:27:41 -08:00
Matthew Ahrens 521abcc49a Update README for OpenZFS
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #10053
2020-02-25 11:43:20 -08:00
Dirkjan Bussink 327000ce04 Remove zfs_getattr and convoff dead code
The `convoff` function is called only in one code path in `zfs_space`.
Each caller of `zfs_space` is called with a `flock64_t` that has
`l_whence` set to `SEEK_SET`. This means that `convoff` always results
in a no-op as the `bfp` parameter has `l_whence` set to `SEEK_SET` and
`int whence` is `SEEK_SET` as well.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by:  Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Dirkjan Bussink <d.bussink@gmail.com>
Closes #10006
2020-02-24 15:38:22 -08:00
Ryan Moeller 92bd4cabd6 ZTS: Misc fixes for FreeBSD
* Force UFS sync before snap in vol rollback tests
* rw is not a valid share option on FreeBSD, use ro instead
* zfs_unmount_nested: mountpoint is in the pool, rmdir *before* export
* Fix some more platform checks
* Fix disappearing group in delegate tests
* Don't try delegating for jailed, only root can set it

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10038
2020-02-24 10:17:55 -08:00
Matthew Ahrens 31a69fbccb Remove unused structs and members in dmu_send.c
There are several structs (and members of structs) related to redaction,
which are no longer used.  This commit removes them.

Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #10039
2020-02-24 09:50:14 -08:00
Ryan Moeller 24fcd9fc5c ZTS: Eliminate partitioning from zpool_destroy
The zpool destroy tests partition a single disk to create two pools.

This can be done using two disks and no partitioning instead.
And temporarily allow vol recursion for FreeBSD while in here.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10036
2020-02-21 16:00:22 -08:00
Ryan Moeller b7dbbf6aa7 ZTS: Refactor is_shared, fix impl on FreeBSD
FreeBSD doesn't have a `share` command.  It does have showmount.

Split the separate platform impls out of is_shared_impl.
Dispatch to the correct platform impl function from is_shared.
Eliminate the use of is_shared_impl from tests.  is_shared works.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10037
2020-02-21 15:59:20 -08:00
Ryan Moeller f5f438194d ZTS: Move privilege tests to sunos.run
These tests are unspported on FreeBSD and Linux for lack of pfexec.

Move the privilege tests to sunos.run and remove the platform checks.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10035
2020-02-21 08:52:44 -08:00
Ryan Moeller 6a60841631 ZTS: Don't use lsblk on FreeBSD
These tests use lsblk to find the sector size of a disk.
FreeBSD doesn't have lsblk.

Use diskinfo -v to get sector size on FreeBSD.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>\
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10033
2020-02-21 08:38:34 -08:00
Ryan Moeller ca7ea23f8a ZTS: Fix userquota_006_pos on FreeBSD
FreeBSD uses `pw` for account management. `userquota_006_pos`
erroneously invokes the non-existent `groupdel` command on FreeBSD.

Use `pw groupdel -n` instead of `groupdel` on FreeBSD.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10032
2020-02-20 08:14:24 -08:00
Ryan Moeller b11375d74a ZTS: Check the right mount options on FreeBSD
FreeBSD does not support the "devices" and "nodevices" mount options.

Do not check these options on FreeBSD.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10028
2020-02-20 08:12:23 -08:00
Ryan Moeller 325a551232 ZTS: Fix faulty slog_replay_fs_001 test
This test is supposed to verify zil operations. For TX_WRITE, writes
must be synchronous in order to be entered in the zil. Linux seems to
be doing sync writes even when they are not asked for, but on FreeBSD
the test does not do what is intended.

Use dd oflag=sync for the parts of this test that are supposed to
result in TX_WRITE zil entries.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10022
2020-02-20 08:11:51 -08:00
Arvind Sankar 65635c3403 Fix icp include directories for in-tree build
When zfs is built in-tree using --enable-linux-builtin, the compile
commands are executed from the kernel build directory. If the build
directory is different from the kernel source directory, passing
-Ifs/zfs/icp will not find the headers as they are not present in the
build directory.

Fix this by adding @abs_top_srcdir@ to pull the headers from the zfs
source tree instead.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Closes #10021
2020-02-20 08:10:47 -08:00
Ryan Moeller 8136956716 ZTS: Eliminate partitioning from zpool_create etc
These tests can be made to work without a bunch of complex
partitioning of physical disks.

Use the 3 disks directly, creating a few file disks if needed for a
compelling reason.

Reduce the use of shared variables that don't have a clear utility.

Catch the fallout in tests that include cfg/shlib from zpool_create.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10002
2020-02-20 08:10:13 -08:00
Ryan Moeller 873cd182de ZTS: Fix zpool_create/create-o_ashift on FreeBSD
For some unknown reason, egrep was misbehaving with this pattern on
FreeBSD.  The command works fine run interactively from a shell, but
in the test the output of egrep is empty.

Work around the issue by using a filter in the awk script instead.

While here, add a bit of diagnostic output and other simplifications
to the awk script as well.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10023
2020-02-19 10:27:23 -08:00
Ryan Moeller 392556f0ef ZTS: Avoid nonportable cmp flag
FreeBSD doesn't have the -n flag for cmp.

Read the area for the first four labels from the disk to a separate
file to compare instead of using the special flag to limit the size.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10024
2020-02-19 09:03:31 -08:00
Mariusz Zaborski 89adffb7bc Add notice that forcefully unmount is not supported on Linux
The Linux VFS will never allow a filesystem which is in use to
be unmounted.  This behavior differs from other platforms like
FreeBSD which allow a filesystem to be force unmounted.  This 
will result in errors being returned to applications actively
using the filesystem.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mariusz Zaborski <oshogbo@vexillium.org>
Closes #10013
2020-02-18 13:36:23 -08:00
Ryan Moeller 43849fdf3f ZTS: Move free to Linux commands list
FreeBSD does not have the free command. This command is only used by
Linux in a perf hostinfo function.

Move free from the list of common commands to the list of Linux
commands.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10011
2020-02-18 11:23:41 -08:00
Ryan Moeller 5f087dda78 Enable zpool events tunables and tests on FreeBSD
We have have made the necessary changes in our module code to expose
zevents through both devd and the zpool events ioctl. Now the tunables
can be exposed and zpool events tests can be enabled on both platforms.

A few minor tweaks to the tests were needed to accommodate the way wc
formats output on FreeBSD.

zed remains to be ported.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10008
2020-02-18 11:22:56 -08:00
Matthew Macy 8b3547a481 Factor out some dbuf subroutines and add state change tracing
Create dedicated dbuf_read_hole and dbuf_read_bonus.
Additionally, add a dtrace probe to allow state change tracing.

Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Will Andrews <wca@FreeBSD.org>
Reviewed by: Brad Lewis <brad.lewis@delphix.com>
Authored-by: Will Andrews <wca@FreeBSD.org>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9923
2020-02-18 11:21:37 -08:00
Richard Laager f244846462 Prefer org.openzfs for features and properties
Moving forward, we wish to use org.openzfs (no dash) rather than
org.open-zfs or org.zfsonlinux for feature GUIDs and property names.
The existing feature GUIDs cannot be changed.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Richard Laager <rlaager@wiktel.com>
Closes #10003
2020-02-18 09:36:50 -08:00
Ryan Moeller fb63fc0c03 ZTS: Move cksum to common system commands
The cksum command is used by delegate tests. We have it on FreeBSD,
so it should not have been moved to the Linux commands list.

Move it back to the common commands list.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10007
2020-02-16 12:49:49 -08:00
DeHackEd d09dc5980c Honour sync=disabled when relinking tpmfiles
Unlinked files don't respect synchronous flush commands, but when they get relinked
their state is unknown. Previously we force flushed all such files even when
sync=disabled. Correct this case.

Reviewed-by: Chunwei Chen <tuxoko@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: DHE <git@dehacked.net>
Closes #10005
2020-02-16 12:44:08 -08:00
InsanePrawn ecbbdac799 Systemd mount generator: Generate noauto units; add control properties
This commit refactors the systemd mount generators and makes the
following major changes:

- The generator now generates units for datasets marked canmount=noauto,
  too. These units are NOT WantedBy local-fs.target.
  If there are multiple noauto datasets for a path, no noauto unit will
  be created. Datasets with canmount=on are prioritized.

- Introduces handling of new user properties which are now included in
  the zfs-list.cache files:
    - org.openzfs.systemd:requires:
      List of units to require for this mount unit
    - org.openzfs.systemd:requires-mounts-for:
      List of mounts to require by this mount unit
    - org.openzfs.systemd:before:
      List of units to order after this mount unit
    - org.openzfs.systemd:after:
      List of units to order before this mount unit
    - org.openzfs.systemd:wanted-by:
      List of units to add a Wants dependency on this mount unit to
    - org.openzfs.systemd:required-by:
      List of units to add a Requires dependency on this mount unit to
    - org.openzfs.systemd:nofail:
      Toggles between a wants and a requires dependency.
    - org.openzfs.systemd:ignore:
      Do not generate a mount unit for this dataset.

  Consult the updated man page for detailed documentation.

- Restructures and extends the zfs-mount-generator(8) man page with the
  above properties, information on unit ordering and a license header.

Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Antonio Russo <antonio.e.russo@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: InsanePrawn <insane.prawny@gmail.com>
Closes #9649
2020-02-14 15:32:55 -08:00
InsanePrawn 9d2f3b7f94 Systemd mount generator: Silence shellcheck warnings
Silences a warning about an intentionally unquoted variable.
Fixes a warning caused by strings split across lines by slightly
refactoring keyloadcmd.

Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Antonio Russo <antonio.e.russo@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: InsanePrawn <insane.prawny@gmail.com>
Closes #9649
2020-02-14 15:32:33 -08:00
Jason King 13b5a4d5c0 Support setting user properties in a channel program
This adds support for setting user properties in a
zfs channel program by adding 'zfs.sync.set_prop'
and 'zfs.check.set_prop' to the ZFS LUA API.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Ahrens <matt@delphix.com>
Co-authored-by: Sara Hartse <sara.hartse@delphix.com>
Contributions-by: Jason King <jason.king@joyent.com>
Signed-off-by: Sara Hartse <sara.hartse@delphix.com>
Signed-off-by: Jason King <jason.king@joyent.com>
Closes #9950
2020-02-14 13:41:42 -08:00
Matthew Ahrens 4fe3a842bb Remove limit on number of async zio_frees of non-dedup blocks
The module parameter zfs_async_block_max_blocks limits the number of
blocks that can be freed by the background freeing of filesystems and
snapshots (from "zfs destroy"), in one TXG.  This is useful when freeing
dedup blocks, becuase each zio_free() of a dedup block can require an
i/o to read the relevant part of the dedup table (DDT), and will also
dirty that block.

zfs_async_block_max_blocks is set to 100,000 by default.  For the more
typical case where dedup is not used, this can have a negative
performance impact on the rate of background freeing (from "zfs
destroy").  For example, with recordsize=8k, and TXG's syncing once
every 5 seconds, we can free only 160MB of data per second, which may be
much less than the rate we can write data.

This change increases zfs_async_block_max_blocks to be unlimited by
default.  To address the dedup freeing issue, a new tunable is
introduced, zfs_max_async_dedup_frees, which limits the number of
zio_free()'s of dedup blocks done by background destroys, per txg.  The
default is 100,000 free's (same as the old zfs_async_block_max_blocks
default).

Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #10000
2020-02-14 08:39:46 -08:00
Ryan Moeller 0f1832106d Make zpool.d/iostat work on FreeBSD
There are slight differences in the iostat commands between FreeBSD and
Linux.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #9979
2020-02-14 08:37:40 -08:00
Andrew J. Hesford db0ad393b1 Use POSIX stdout/stderr redirect in configure macro
This PR fixes an issue wherein redirecting stdout and stderr when 
building kernel modules in configure tests relied on a bashism that 
does not work as expected when /bin/sh is not bash.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-By: Richard Laager <rlaager@wiktel.com>
Signed-off-by: Andrew J. Hesford <ajh@sideband.org>
Closes #9990 
Closes #9998
2020-02-14 08:30:29 -08:00
Ryan Moeller 4f4ddf98ee ZTS: Misc test fixes for FreeBSD
Add missing logic for FreeBSD to a few test scripts.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #9994
2020-02-13 13:52:34 -08:00
Ryan Moeller 71439163bb ZTS: Don't include zpool_create.shlib in zpool_add
The zpool_add tests include zpool_create.shlib for a few silly
variables.

Don't use those variables for the file names. Include zpool_add.kshlib
for whatever variables we still need.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #9997
2020-02-13 12:11:25 -08:00
Ryan Moeller 3bdc4f6314 ZTS: Eliminate partitioning from zpool_remove
These tests do not need to use partitions.

Get rid of the partitioning and just use the disks directly.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #9996
2020-02-13 12:10:35 -08:00
Ryan Moeller 3e725f0ad2 ZTS: Eliminate partitioning from write_dirs
These tests do not need to use partitions.

Get rid of the partitioning and just use the disks directly.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #9995
2020-02-13 12:08:59 -08:00
Ryan Moeller 5ceccda5cb ZTS: Cleanup some cleanup functions
Cleanup functions should make a best effort to clean up as much as
possible.

Do a consistency pass in a bunch of tests to make the cleanup
functions less prone to failure and fix a few typos here and there.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #9993
2020-02-13 12:05:32 -08:00
Ryan Moeller 523bc0d548 Fix a typo/whitespace in tests README
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #9991
2020-02-13 12:04:47 -08:00
Ryan Moeller b90b01cbc8 ZTS: Use ECKSUM instead of EBADE in libzfs test
Linux defines ECKSUM as EBADE, FreeBSD defines it as EINTEGRITY.

Test for ECKSUM instead of EBADE so we don't have to define EBADE for
this test on FreeBSD.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #9992
2020-02-13 12:03:01 -08:00
Richard Laager d1d65bb367 zfs-mount-generator: Fix escaping for /
The correct name for the mount unit for / is "-.mount", not ".mount".

Reviewed-by: InsanePrawn <insane.prawny@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Antonio Russo <antonio.e.russo@gmail.com>
Signed-off-by: Richard Laager <rlaager@wiktel.com>
Closes #9970
2020-02-13 11:55:59 -08:00
Matthew Ahrens f49b7a0d8e fix zstreamdump -C
zstreamdump -C always fails.  It is not calculating the checksums, but
it's still trying to verify that the (non-calculated) checksum matches
the one stored in the send stream.

This change makes zstreamdump -C not verify checksums.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #9983
2020-02-13 11:24:57 -08:00
Matthew Ahrens 2adc6b35ae Missed wakeup when growing kmem cache
When growing the size of a (VMEM or KVMEM) kmem cache, spl_cache_grow()
always does taskq_dispatch(spl_cache_grow_work), and then waits for the
KMC_BIT_GROWING to be cleared by the taskq thread.

The taskq thread (spl_cache_grow_work()) does:
1. allocate new slab and add to list
2. wake_up_all(skc_waitq)
3. clear_bit(KMC_BIT_GROWING)

Therefore, the waiting thread can wake up before GROWING has been
cleared.  It will see that the growing has not yet completed, and go
back to sleep until it hits the 100ms timeout.

This can have an extreme performance impact on workloads that alloc/free
more than fits in the (statically-sized) magazines.  These workloads
allocate and free slabs with high frequency.

The problem can be observed with `funclatency spl_cache_grow`, which on
some workloads shows that 99.5% of the time it takes <64us to allocate
slabs, but we spend ~70% of our time in outliers, waiting for the 100ms
timeout.

The fix is to do `clear_bit(KMC_BIT_GROWING)` before
`wake_up_all(skc_waitq)`.

A future investigation should evaluate if we still actually need to
taskq_dispatch() at all, and if so on which kernel versions.

Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Wilson <gwilson@delphix.com>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #9989
2020-02-13 11:23:02 -08:00
Alexander Motin 465e4e795e Remove duplicate dbufs accounting
Since AVL already has embedded element counter, use dn_dbufs_count
only for dbufs not counted there (bonus buffers) and just add them.
This removes two atomics per dbuf life cycle.

According to profiler it reduces time spent by dbuf_destroy() inside
bottlenecked dbuf_evict_thread() from 13.36% to 9.20% of the core.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Ahrens <matt@delphix.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #9949
2020-02-13 11:20:42 -08:00
Ryan Moeller 610eec452d ZTS: Move user_namespace test to linux.run
Namespaces is a Linux feature not available on other platforms.

Move the user_namespace test out of common.run to linux.run.

Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #9982
2020-02-12 13:06:00 -08:00
Ryan Moeller 834f274fbf ZTS: Interpret env vars in faketty on FreeBSD
This was missed in review. On FreeBSD, script does not understand
environment variables being passed as a command.

Use env to make faketty handle env vars on FreeBSD.

Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #9981
2020-02-12 13:04:51 -08:00
Ryan Moeller 9c536b9a78 ZTS: Fix zdb_display_block on FreeBSD
Missed this in the review, but wc output on FreeBSD is indented,
so string comparisons mismatch when comparing to an unindented number.

Compare counts as integers instead of strings.

Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Paul Zuchowski <pzuchowski@datto.com>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #9980
2020-02-12 13:01:55 -08:00
Ryan Moeller e7be5c47bd Move zfs_version_kernel to platform code
Linux uses sysfs to determine the module version, FreeBSD uses a
different method.

Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #9978
2020-02-12 13:00:19 -08:00
Ryan Moeller 1bbeb6d755 ZTS: Move zpool_split_wholedisks to linux.run
This test uses the scsi_debug Linux kernel module.

Move the test to linux.run until we have an alternative to scsi_debug
worked out on FreeBSD.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #9984
2020-02-12 12:59:28 -08:00
Justin Keogh 12f7b90c93 zdb: Always print symlink target
When zdb is printing paths, also print the symlink target if it exists.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Ahrens <matt@delphix.com>
Signed-off-by: Justin Keogh <commits@v6y.net>
Closes #9925
2020-02-12 11:36:05 -08:00
Christian Schwarz 948f0c4419 zcp: add zfs.sync.bookmark
Add support for bookmark creation and cloning.

Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Christian Schwarz <me@cschwarz.com>
Closes #9571
2020-02-11 13:19:17 -08:00
Christian Schwarz a73f361fdb Implement bookmark copying
This feature allows copying existing bookmarks using

    zfs bookmark fs#target fs#newbookmark

There are some niche use cases for such functionality,
e.g. when using bookmarks as markers for replication progress.

Copying redaction bookmarks produces a normal bookmark that
cannot be used for redacted send (we are not duplicating
the redaction object).

ZCP support for bookmarking (both creation and copying) will be
implemented in a separate patch based on this work.

Overview:

- Terminology:
    - source = existing snapshot or bookmark
    - new/bmark = new bookmark
- Implement bookmark copying in `dsl_bookmark.c`
  - create new bookmark node
  - copy source's `zbn_phys` to new's `zbn_phys`
  - zero-out redaction object id in copy
- Extend existing bookmark ioctl nvlist schema to accept
  bookmarks as sources
  - => `dsl_bookmark_create_nvl_validate` is authoritative
- use `dsl_dataset_is_before` check for both snapshot
  and bookmark sources
- Adjust CLI
  - refactor shortname expansion logic in `zfs_do_bookmark`
- Update man pages
  - warn about redaction bookmark handling
- Add test cases
  - CLI
  - pyyzfs libzfs_core bindings

Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Christian Schwarz <me@cschwarz.com>
Closes #9571
2020-02-11 13:19:12 -08:00
Matthew Macy 7b49bbc816 Address Coverity warnings in #9902
Coverity reports the variable may be NULL, but due to the
way the dirty records are handled this cannot be the case.
Add a comment and VERIFY to make this clear and silence
the warning.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9962
2020-02-11 13:12:41 -08:00
Brian Behlendorf dceeca5bbd Add missing dmu_buf_unlock_parent() calls to dbuf_read_impl()
As explained by the comment in dbuf_read() and above dbuf_read_impl().
Under all circumstances the parent lock specified by dblt should be
dropped when existing dbuf_read_impl().  This was not being done for
two exist paths.  Additionally, ensure the mutex is unlocked before
dropping the parent lock.

Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #9968
2020-02-10 14:54:12 -08:00
Paul Zuchowski bc67cba7c0 Fix zdb -R with 'b' flag
zdb -R :b fails due to the indirect block being compressed,
and the 'b' and 'd' flag not working in tandem when specified.
Fix the flag parsing code and create a zfs test for zdb -R
block display.  Also fix the zio flags where the dotted notation
for the vdev portion of DVA (i.e. 0.0:offset:length) fails.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Zuchowski <pzuchowski@datto.com>
Closes #9640
Closes #9729
2020-02-10 14:00:05 -08:00
Graham Christensen dda702fd16 bash scripts: use /usr/bin/env for bash shebangs
Not all systems / distros have a `/bin/bash`, and these scripts are
more difficult to run at development time.

For example, my system is NixOS which doesn't have a /bin/bash. This
is not a problem for NixOS building ZFS as a package: the build
environment automatically replaces these shebangs with corrected
paths.

The problem is much more annoying at development time: either the
scripts don't run, or I correct them for my local machine and deal with
a perpetually dirty work tree.

Before committing this patch I confirmed there are existing scripts
which use `/usr/bin/env` to locate bash, so I am thinking this is a
safe transformation.

There are a handful of other shebangs in this repository which don't
work on my system. This patch is useful on its own specifically for
`commitcheck.sh`, otherwise I can't validate my commits before
submission.

Here are the remaining shebangs which NixOS systems won't have:

       1274 #!/bin/ksh -p
         91 #!/bin/ksh
         89 #! /bin/ksh -p
          2 #!/bin/sed -f
          1 #!/usr/bin/perl -w
          1 #!/usr/bin/ksh
          1 #!/bin/nawk -f

plus this which will create an invalid shebang in
`tests/zfs-tests/tests/functional/mv_files/mv_files_common.kshlib`:

        echo "#!/bin/ksh" > $TEST_BASE_DIR/exitsZero.ksh

I chose to leave those alone for now, and gauge the interest in this
much smaller patch first.

The fixes for these are easy enough by simply using `/usr/bin/env ksh`:

         91 #!/bin/ksh
          1 #!/usr/bin/ksh

The fix for the other set is much trickier. Quoting the GNU coreutils
manual:

    Most operating systems (e.g. GNU/Linux, BSDs) treat all text after
    the first space as a single argument. When using env in a script it
    is thus not possible to specify multiple arguments.

and not all `env`'s support arguments.

Mine (GNU Coreutils 8.31) does, though this feature is new since
April 2018, GNU Coreutils 8.30:
https://git.savannah.gnu.org/cgit/coreutils.git/commit/?id=668306ed86c8c79b0af0db8b9c882654ebb66db2

and worse, requires the -S argument:

    -S, --split-string=S  process and split S into separate arguments;
                          used to pass multiple arguments on shebang
                          lines

Example:

    $ seq 1 2 | $(nix-build '<nixpkgs>' -A coreutils)/bin/env "sort -nr"
    /nix/[...]-coreutils-8.31/bin/env: ‘sort -nr’: No such file or directory
    /nix/[...]-coreutils-8.31/bin/env: use -[v]S to pass options in shebang lines

    $ seq 1 2 | $(nix-build '<nixpkgs>' -A coreutils)/bin/env "-S sort -nr"
    2
    1

GNU Coreutils says FreeBSD's `env` does, though I wonder if FreeBSD's
would be unhappy with the `-S`:
https://www.gnu.org/software/coreutils/manual/html_node/env-invocation.html#env-invocation

BusyBox v1.30.1 does not, and does not have a `-S`-like option:

    $ seq 1 2 | $(nix-build '<nixpkgs>' -A busybox)/bin/env "sort -nr"
    env: can't execute 'sort -nr': No such file or directory

Toybox 0.8.1 also does not, and also does not have a `-S` option:

    $ seq 1 2 | $(nix-build '<nixpkgs>' -A toybox)/bin/env "sort -nr"
    env: exec sort -nr: No such file or directory

---

At any rate, if this patch merges and the remaining ~1,500 are updated,
the much larger patch should probably include a checkstyle-like test
asserting all new shebangs use `/usr/bin/env`. I also don't mind
dealing with NixOS weirdness if the project would prefer that.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Graham Christensen <graham@grahamc.com>
Closes #9893
2020-02-10 13:13:46 -08:00
Ryan Moeller 57940b435c Share some code for spa deadman tunables
We need to do the same thing to update all spas on any OS for these
tunables, so let's share the code.

While here let's match the types of the literals initializing the
variables with the type of the variable.

Reviewed-by: Allan Jude <allanjude@freebsd.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Olaf Faaland <faaland1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #9964
2020-02-10 13:11:30 -08:00
Ryan Moeller 572b5b302a ZTS: Test zvol I/O in different volmodes
We found that our zvol code had some issues with volmode=dev that were
not revealed by ZTS.

Add some basic I/O operations to exercise more code paths in the
volmode test.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #9953
2020-02-10 13:10:25 -08:00
Attila Fülöp 31b160f0a6 ICP: Improve AES-GCM performance
Currently SIMD accelerated AES-GCM performance is limited by two
factors:

a. The need to disable preemption and interrupts and save the FPU
state before using it and to do the reverse when done. Due to the
way the code is organized (see (b) below) we have to pay this price
twice for each 16 byte GCM block processed.

b. Most processing is done in C, operating on single GCM blocks.
The use of SIMD instructions is limited to the AES encryption of the
counter block (AES-NI) and the Galois multiplication (PCLMULQDQ).
This leads to the FPU not being fully utilized for crypto
operations.

To solve (a) we do crypto processing in larger chunks while owning
the FPU. An `icp_gcm_avx_chunk_size` module parameter was introduced
to make this chunk size tweakable. It defaults to 32 KiB. This step
alone roughly doubles performance. (b) is tackled by porting and
using the highly optimized openssl AES-GCM assembler routines, which
do all the processing (CTR, AES, GMULT) in a single routine. Both
steps together result in up to 32x reduction of the time spend in
the en/decryption routines, leading up to approximately 12x
throughput increase for large (128 KiB) blocks.

Lastly, this commit changes the default encryption algorithm from
AES-CCM to AES-GCM when setting the `encryption=on` property.

Reviewed-By: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-By: Jason King <jason.king@joyent.com>
Reviewed-By: Tom Caputi <tcaputi@datto.com>
Reviewed-By: Richard Laager <rlaager@wiktel.com>
Signed-off-by: Attila Fülöp <attila@fueloep.org>
Closes #9749
2020-02-10 12:59:50 -08:00
Matthew Macy fa3922df75 Factor out dbuf_sync_bonus
Factor the portion of dbuf_sync_leaf() responsible for handling bonus
buffers out in to its own dbuf_sync_bonus() helper function.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Ahrens <matt@delphix.com>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9909
2020-02-07 14:22:29 -08:00
Igor K 818d4a87fd ZTS: Add an is_dilos function for future ZTS updates
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Igor Kozhukhov <igor@dilos.org>
Closes #9960
2020-02-07 12:32:52 -08:00
Ryan Moeller 9825e7ad1d ZTS: Use wc -c instead of --bytes for portability
FreeBSD does not have the long opts for wc.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #9963
2020-02-07 12:31:38 -08:00
Brian Behlendorf 795699a6cc Linux 5.6 compat: timestamp_truncate()
The timestamp_truncate() function was added, it replaces the existing
timespec64_trunc() function.  This change renames our wrapper function
to be consistent with the upstream name and updates the compatibility
code for older kernels accordingly.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #9956
Closes #9961
2020-02-07 11:04:32 -08:00
Brian Behlendorf 0dd7364853 Linux 5.6 compat: struct proc_ops
The proc_ops structure was introduced to replace the use of of the
file_operations structure when registering proc handlers.  This
change creates a new kstat_proc_op_t typedef for compatibility
which can be used to pass around the correct structure.

This change additionally adds the 'const' keyword to all of the
existing proc operations structures.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #9961
2020-02-07 11:03:53 -08:00
Alexander Motin e0ce98d57c Reduce number of atomic_add() calls in aggsum
Previous code used 4 atomics to do aggsum_flush_bucket() and 2 more to
re-borrow after the flush.  But since asc_borrowed and asc_delta are
accessed only while holding asc_lock, it makes no any sense to modify
as_lower_bound and as_upper_bound in multiple steps.  Instead of that
the new code uses only 2 atomics in all the cases, one per as_*_bound
variable.  I think even that is overkill, simple atomic store and
load could be used here, since all modifications are done under the
as_lock, but there are no such primitives in ZFS code now.

While there, make borrow code consider previous borrow value, so that
on mixed request patterns reduce chance of needing to borrow again if
much larger request follows tiny one that needed borrow.

Also reduce as_numbuckets from uint64_t to u_int.  It makes no sense
to use so large division operation on every aggsum_add().

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #9930
2020-02-06 13:21:06 -08:00
Romain Dolbeau 77122f9d68 Replace static per-cpu with dynamic per-cpu data
This solves the issue of loading the spl module on RISC-V.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Romain Dolbeau <romain.dolbeau@european-processor-initiative.eu>
Closes #9942
2020-02-06 09:26:13 -08:00
Romain Dolbeau af09c050e9 Fix static data to link with -fno-common
-fno-common is the new default in GCC 10, replacing -fcommon in
GCC <= 9, so static data must only be allocated once.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Romain Dolbeau <romain.dolbeau@european-processor-initiative.eu>
Closes #9943
2020-02-06 09:25:29 -08:00
Ryan Moeller 5b7e6a3617 Fix unknown cc flag -fno-ipa-sra
Clang does not recognize -fno-ipa-sra, so add a check for it.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #9946
2020-02-06 09:24:37 -08:00
Gerardwx 81acb1edcb Suggest using visudo to edit
Suggest visudo which allows editing the sudoers file in a safe fashion.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Gerardwx <gerardw@alum.mit.edu>
Closes #9918
2020-02-05 11:31:53 -08:00
Alexander Motin cbd8f5b759 Few microoptimizations to dbuf layer
Move db_link into the same cache line as db_blkid and db_level.
It allows significantly reduce avl_add() time in dbuf_create() on
systems with large RAM and huge number of dbufs per dnode.

Avoid few accesses to dbuf_caches[].size, which is highly congested
under high IOPS and never stays in cache for a long time.  Use local
value we are receiving from zfs_refcount_add_many() any way.

Remove cache_size_bytes_max bump from dbuf_evict_one().  I don't see
a point to do it on dbuf eviction after we done it on insertion in
dbuf_rele_and_unlock().

Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #9931
2020-02-05 11:08:44 -08:00
Matthew Macy cccbed9f98 Convert dbuf dirty record record list to a list_t
Additionally pull in state machine comments about
upcoming async cow work.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Ahrens <matt@delphix.com>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9902
2020-02-05 11:07:19 -08:00
Alexander Motin 741db5a346 Prepare ks_data before calling kstat_install()
It violated sequence described in kstat.h, and at least on FreeBSD
kstat_install() uses provided names to create the sysctls.  If the
names are not available at the time, it ends up bad.

Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #9933
2020-02-04 08:49:12 -08:00
韩朴宇 52c487a0e4 Use -Werror to check if the compiler supports specific options
Be default, clang treats unknown warning option as warning.
We need to use -Werror to make it an error.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: 12101111 <w12101111@gmail.com>
Closes #9927
2020-02-04 08:47:37 -08:00
Ryan Moeller 8c4987c489 Restore aclmode and remove acltype on FreeBSD
This replaces the placeholder ZFS_PROP_PRIVATE with ZFS_PROP_ACLMODE,
matching what is done in the NFSv4 ACLs PR (#9709).

On FreeBSD we hide ZFS_PROP_ACLTYPE, while on Linux we hide
ZFS_PROP_ACLMODE.

The tests already assume this arrangement.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #9913
2020-02-04 08:40:07 -08:00
Ryan Moeller 07bc2bc231 Fix const-correctness in raidz math
Clang warns (errors) that "cast from 'const void *' to 'struct v *'
drops const qualifier."

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #9917
2020-02-03 10:52:41 -08:00
Matthew Ahrens ec21397127 async zvol minor node creation interferes with receive
When we finish a zfs receive, dmu_recv_end_sync() calls
zvol_create_minors(async=TRUE).  This kicks off some other threads that
create the minor device nodes (in /dev/zvol/poolname/...).  These async
threads call zvol_prefetch_minors_impl() and zvol_create_minor(), which
both call dmu_objset_own(), which puts a "long hold" on the dataset.
Since the zvol minor node creation is asynchronous, this can happen
after the `ZFS_IOC_RECV[_NEW]` ioctl and `zfs receive` process have
completed.

After the first receive ioctl has completed, userland may attempt to do
another receive into the same dataset (e.g. the next incremental
stream).  This second receive and the asynchronous minor node creation
can interfere with one another in several different ways, because they
both require exclusive access to the dataset:

1. When the second receive is finishing up, dmu_recv_end_check() does
dsl_dataset_handoff_check(), which can fail with EBUSY if the async
minor node creation already has a "long hold" on this dataset.  This
causes the 2nd receive to fail.

2. The async udev rule can fail if zvol_id and/or systemd-udevd try to
open the device while the the second receive's async attempt at minor
node creation owns the dataset (via zvol_prefetch_minors_impl).  This
causes the minor node (/dev/zd*) to exist, but the udev-generated
/dev/zvol/... to not exist.

3. The async minor node creation can silently fail with EBUSY if the
first receive's zvol_create_minor() trys to own the dataset while the
second receive's zvol_prefetch_minors_impl already owns the dataset.

To address these problems, this change synchronously creates the minor
node.  To avoid the lock ordering problems that the asynchrony was
introduced to fix (see #3681), we create the minor nodes from open
context, with no locks held, rather than from syncing contex as was
originally done.

Implementation notes:

We generally do not need to traverse children or prefetch anything (e.g.
when running the recv, snapshot, create, or clone subcommands of zfs).
We only need recursion when importing/opening a pool and when loading
encryption keys.  The existing recursive, asynchronous, prefetching code
is preserved for use in these cases.

Channel programs may need to create zvol minor nodes, when creating a
snapshot of a zvol with the snapdev property set.  We figure out what
snapshots are created when running the LUA program in syncing context.
In this case we need to remember what snapshots were created, and then
try to create their minor nodes from open context, after the LUA code
has completed.

There are additional zvol use cases that asynchronously own the dataset,
which can cause similar problems.  E.g. changing the volmode or snapdev
properties.  These are less problematic because they are not recursive
and don't touch datasets that are not involved in the operation, there
is still potential for interference with subsequent operations.  In the
future, these cases should be similarly converted to create the zvol
minor node synchronously from open context.

The async tasks of removing and renaming minors do not own the objset,
so they do not have this problem.  However, it may make sense to also
convert these operations to happen synchronously from open context, in
the future.

Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
External-issue: DLPX-65948
Closes #7863
Closes #9885
2020-02-03 09:33:14 -08:00
Prakash Surya 147622b2f1 Disable "-fipa-sra" when using "--enable-debuginfo"
To better enable dynamic tracing tools (e.g. "bpftrace") this change
disables the "-fipa-sra" compilation optimization. This way, function
signatures are not changed by the compiler, which allows us to better
attach to kprobes and kretprobes with dynamic tracing tools. Otherwise,
the compiler may append ".isra" to the function name, and possibly
change the function arguments as well.

Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Prakash Surya <prakash.surya@delphix.com>
Closes #9921
2020-02-03 09:21:05 -08:00
Ryan Moeller 591505dc27 ZTS: Only use ext4 on Linux
And while here, add a workaround for FreeBSD to ensure dirty data is
synced before taking a snapshot, by remounting read-only, then remount
again read-write after the snapshot.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #9914
2020-01-31 09:00:47 -08:00
Ryan Moeller fe7c15985b Left-align index props
Index type props display as strings, which should be aligned to the
left not to the right.

Before:
```
FreeBSD-13_0-CURRENT-r356528 ➜  ~ zfs list -ro name,aclmode,mountpoint
NAME        ACLMODE  MOUNTPOINT
p0      passthrough  /p0
p0/foo      discard  /p0/foo
```

After:
```
FreeBSD-13_0-CURRENT-r356528 ➜  ~ zfs list -ro name,aclmode,mountpoint
NAME    ACLMODE      MOUNTPOINT
p0      passthrough  /p0
p0/foo  discard      /p0/foo
```

Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #9912
2020-01-31 08:55:51 -08:00
Ryan Moeller 12430796d6 Use the correct spelling of 'jailed' in tests
FreeBSD has jails, not zones.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #9899
2020-01-31 08:52:29 -08:00
Ryan Moeller a3bddd49f8 ZTS: Fix a few defaults
Linux was missing a default value for DEV_DSKDIR. Set it to /dev.
Fix resulting fallout.

SLICE_PREFIX seems like a good candidate for including in the defaults.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #9898
2020-01-31 08:51:23 -08:00
Ryan Moeller 9d8ce2457d ZTS: FreeBSD tunable for DISABLE_IVSET_GUID_CHECK
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #9907
2020-01-29 11:25:15 -08:00
Ryan Moeller 395a6b19d3 ZTS: Simplify zero_partitions on FreeBSD
We can avoid a great deal of `sleep 3` by simply destroying the whole
partition table in one shot with gpart.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #9908
2020-01-29 11:24:18 -08:00
Ryan Moeller 74b4d349b3 ZTS: Reverse constrained path lookup order
FreeBSD base system zfs utils are in /sbin. ZoF utils install to
/usr/local/sbin.

Ensure we link to the ZoF utils not the base utils when searching for
utils to constrain paths to for the tests.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #9906
2020-01-29 11:23:20 -08:00
Ryan Moeller 0ecd910923 ZTS: Don't use edonr on FreeBSD
FreeBSD doesn't support feature@edonr.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #9901
2020-01-28 08:38:02 -08:00
Ryan Moeller 7a298ae975 ZTS: Eliminate random and shuf, consolidate code
Both GNU and FreeBSD sort have -R to randomize input.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #9900
2020-01-28 08:36:33 -08:00
Graham Christensen 25df8fb42f Add an .editorconfig; document git whitespace settings
Most of the projects I work on don't use tabs, and while authoring my
first patch I had to wrestle with my editor to not introduce
whitespace editors.

The `.editorconfig` file is supported by a large number of editors
out of the box, and many more with plugins.

As a first-time contributor, I can't say for certain these settings
are totally correct, but thus far git and my editor are satisfied
enough.

I considered adding `git config --local format.signOff true` but
wanted to respect the warning:

    format.signOff
        A boolean value which lets you enable the -s/--signoff
        option of format-patch by default.  Note: Adding the
        Signed-off-by: line to a patch should be a conscious act and
        means that you certify you have the rights to submit this
        work under the same open source license. Please see the
        SubmittingPatches document for further discussion.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Graham Christensen <graham@grahamc.com>
Closes #9892
2020-01-27 13:32:52 -08:00
Ryan Moeller e4c439021f ZTS: Move more tests out of common.run
These tests won't run on all platforms as currently implemented:
 * add_nested_replacing_spare (needs zed)
 * fault (needs zed)
 * mmp (needs multihost_history)
 * umount_unlink_drained (needs procfs)
 * zpool_expand (needs udev events and zed)
 * zpool_reopen (needs scsi_debug)
 * zvol_swap_003_pos (needs to modify vfstab)
 * zvol_swap_00[56]_pos (needs swaphigh/swaplen)

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
Closes #9871
2020-01-27 13:29:25 -08:00
Ned Bass a3403164d7 zdb: add support for object ranges for zdb -d
Allow a range of object identifiers to dump with -d. This may
be useful when dumping a large dataset and you want to break
it up into multiple phases, or to resume where a previous scan
left off. Object type selection flags are supported to reduce
the performance overhead of verbosely dumping unwanted objects,
and to reduce the amount of post-processing work needed to
filter out unwanted objects from zdb output.

This change extends existing syntax in a backward-compatible
way. That is, the base case of a range is to specify a single
object identifier to dump. Ranges and object identifiers can
be intermixed as command line parameters.

Usage synopsis:

    Object ranges take the form <start>:<end>[:<flags>]
        start    Starting object number
        end      Ending object number, or -1 for no upper bound
        flags    Optional flags to select object types:
         A    All objects (this is the default)
         d    ZFS directories
         f    ZFS files
         m    SPA space maps
         z    ZAPs
         -    Negate effect of next flag

Examples:

 # Dump all file objects
 zdb -dd tank/fish 0:-1:f

 # Dump all file and directory objects
 zdb -dd tank/fish 0:-1:fd

 # Dump all types except file and directory objects
 zdb -dd tank/fish 0:-1:A-f-d

 # Dump object IDs in a specific range
 zdb -dd tank/fish 1000:2000

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Paul Zuchowski <pzuchowski@datto.com>
Signed-off-by: Ned Bass <bass6@llnl.gov>
Closes #9832
2020-01-24 11:00:46 -08:00
Tony Nguyen 8e9e90bba3 Performance tests, some variables missing PERF_ prefix
Adding the expected PERF_ prefix to RANDSEED, COMPPERCENT,
and COMPCHUNK.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Nguyen <tony.nguyen@delphix.com>
Closes #9877
2020-01-23 21:18:02 -08:00
John Wren Kennedy 0d37c2bb2e ZTS: zpool offline requires whole disk name
When used with non-loop devices, zdb_004_pos fails because the disk
argument provided is the partition rather than the expected whole disk
name.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: John Kennedy <john.kennedy@delphix.com>
Closes #9876
2020-01-23 21:14:55 -08:00
Christian Schwarz 20ea8540a6 dsl_bookmark_create_check: fix NULL pointer deref if dbca_errors == NULL
Discovered in preparation of zcp support for creating bookmarks.
Handle the case where dbca_errors is NULL.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Christian Schwarz <me@cschwarz.com>
Closes #9880
2020-01-23 21:13:42 -08:00
Christian Schwarz 3aea3c9d54 entity_namecheck: doc comment: include space as allowed character
The helper function valid_char already allows it but
the doc comment was out of date.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Christian Schwarz <me@cschwarz.com>
Closes #9879
2020-01-23 21:11:54 -08:00
Christian Schwarz 603059e41a scripts/zfs-test.sh: example for -t
Add an example for running a single test case.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Christian Schwarz <me@cschwarz.com>
Closes #9878
2020-01-23 21:11:05 -08:00
Ryan Moeller dbfec5cc09 ZTS: Get xattr tests running on FreeBSD
This mostly involves reworking platform checks to make illumos the
exception (thanks to their unusual way of exposing xattrs). Other
platforms are able to take advantage of the recently added xattr
wrappers in libtest.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
Closes #9872
2020-01-23 17:14:40 -08:00
Romain Dolbeau 35b07497c6 Add AltiVec RAID-Z
Implements the RAID-Z function using AltiVec SIMD.
This is basically the NEON code translated to AltiVec.

Note that the 'fletcher' algorithm requires 64-bits
operations, and the initial implementations of AltiVec
(PPC74xx a.k.a. G4, PPC970 a.k.a. G5) only has up to
32-bits operations, so no 'fletcher'.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Romain Dolbeau <romain.dolbeau@european-processor-initiative.eu>
Closes #9539
2020-01-23 11:01:24 -08:00
Ryan Moeller 1a69856034 ZTS: Add missing export for DEV_DSKDIR default
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
Closes #9873
2020-01-23 09:38:09 -08:00
Christian Schwarz 0ea03c7c82 dmu_send: redacted: fix memory leak on invalid redaction/from bookmark
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Ahrens <matt@delphix.com>
Signed-off-by: Christian Schwarz <me@cschwarz.com>
Closes #9867
2020-01-23 09:34:31 -08:00
Christian Schwarz f658f61c72 cmd/zfs: redact: better error message for common usage errors
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Ahrens <matt@delphix.com>
Signed-off-by: Christian Schwarz <me@cschwarz.com>
Closes #9867
2020-01-23 09:33:53 -08:00
Christian Schwarz 7b53e2e5a9 cmd/zfs: send: meaningful error message for incorrect redaction bookmark
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Ahrens <matt@delphix.com>
Signed-off-by: Christian Schwarz <me@cschwarz.com>
Closes #9867
2020-01-23 09:33:10 -08:00
Matthew Macy 3d91490f7c Simplify FreeBSD's locking requirements in zfs_replay.c
Now that the FreeBSD zfs_vnops code avoids asserting that
a vnode lock is held when z_replay is true we can limit
the FreeBSD specific changes to the couple of changes
where it is necessary to drop the vnode locks because
a function returns with it held.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9865
2020-01-22 17:55:56 -08:00
Jason King e2ef1cbf04 Support inheriting properties in channel programs
This adds support in channel programs to inherit properties analogous
to `zfs inherit` by adding `zfs.sync.inherit` and `zfs.check.inherit`
functions to the ZFS LUA API.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Jason King <jason.king@joyent.com>
Closes #9738
2020-01-22 17:03:17 -08:00
Richard Laager 79add96766 Order zfs-import-*.service after multipathd
If someone is using both multipathd and ZFS, they are probably using
them together.  Ordering the zpool imports after multipathd is ready
fixes import issues for multipath configurations.

Tested-by: Mike Pastore <mike@oobak.org>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Laager <rlaager@wiktel.com>
Closes #9863
2020-01-22 12:45:25 -08:00
Matthew Macy 5206b8228e Disable get_numeric_property for xattr on FreeBSD
FreeBSD doesn't have a mount flag for determining the
disposition of xattr. Disable so that it is fetched
by the default route so that 'zfs get xattr' returns
the correct value.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9862
2020-01-21 15:06:10 -08:00
Matthew Macy af26a86958 Update tunable macro usage for disable_ivset_guid_check
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9861
2020-01-21 15:05:23 -08:00
Matthew Macy d3c1e45b7a Re-consolidate zio_delay_interrupt
With recent SPL changes there is no longer any need for a per
platform version.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9860
2020-01-21 15:04:13 -08:00
Ryan Moeller 09436c5d88 ZTS: Eliminate partitioning in zpool_import setup
There doesn't seem to be a need for this complexity.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
Closes #9854
2020-01-17 12:41:45 -08:00
Ryan Moeller 914bd02f29 ZTS: Make DVA pattern in zdb tests more robust
Ensure the capture ends at the first DVA in case there are multiple
DVAs on the same line by only capturing up to the first '>' character.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
Closes #9851
2020-01-17 12:41:03 -08:00
Brian Behlendorf 70835c5b75 Unify target_cpu handling
Over the years several slightly different approaches were used
in the Makefiles to determine the target architecture.  This
change updates both the build system and Makefile to handle
this in a consistent fashion.

TARGET_CPU is set to i386, x86_64, powerpc, aarch6 or sparc64
and made available in the Makefiles to be used as appropriate.

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #9848
2020-01-17 12:40:09 -08:00
Brian Behlendorf e5030fbc28 ZTS: Enable zpool_create_008_pos.ksh
Remove the blkid version check from zpool_create_008_pos.ksh
so the test case will not be skipped.

All versions of blkid tested by the CI are either new enough
to not suffer from this issue, or have been patched as is
the case with CentOS 7 (libblkid-2.23.2-61).

Additionally, add a block_device_wait after device partitioning
to ensure the expected partitions will exist.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #9853
2020-01-17 12:38:13 -08:00
Ryan Moeller 31712a7ea8 ZTS: Fix incorrect is_physical_device usage
This check isn't meant to be used for command substitution.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
Closes #9844
2020-01-16 13:26:26 -08:00
Paul Zuchowski f12e42cccf zdb -d should accept the numeric objset id
As an alternative to the dataset name, zdb now allows the decimal 
or hexadecimal objset ID to be specified.  When permanent errors
are reported as 2 hexadecimal numbers (objset ID : object ID) in 
zpool status; you can now use 'zdb <pool>[/objset ID] object' to
determine the names of the objset and object which have the error.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Paul Zuchowski <pzuchowski@datto.com>
Closes #9733
2020-01-16 09:22:49 -08:00
Tony Nguyen 1b64627e73 ZFS performance suite should use JSON fio output
Making the default FIO output format be JSON thus easier to post process
performance results. To get previous 'normal' output format,
PERF_FIO_FORMAT can be set prior to invoking zfs-tests.sh. For example:

'PERF_FIO_FORMAT=normal ./zfs-tests.sh -T perf -r ./runfiles/perf.run'

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Nguyen <tony.nguyen@delphix.com>
Closes #9847
2020-01-16 09:16:16 -08:00
Ryan Moeller 1bb5f5e2b4 ZTS: Fix ksh path in btree tests
Every other ksh script has /bin/ksh in the shebang.
If we're going to assume a path, we can at least be consistent.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
Closes #9845
2020-01-15 16:23:29 -08:00
Ryan Moeller 6f481612f3 ZTS: Avoid using PCRE with grep in zdb tests
On FreeBSD grep does not support Perl extensions

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
Closes #9841
2020-01-15 09:27:22 -08:00
Ryan Moeller 83c30248ef ZTS: Fix is_physical_device on FreeBSD
This should have been using egrep.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
Closes #9840
2020-01-15 09:26:26 -08:00
Ryan Moeller 900010daf1 ZTS: Remove some path workarounds for FreeBSD
These are no longer needed after fixing device name matching for whole
disks in libzutil on FreeBSD.

Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
Closes #9839
2020-01-15 09:24:55 -08:00
Ryan Moeller 2476f10306 ZTS: Catalog tunable names for tests in tunables.cfg
Update tests to use the variables for tunable names.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
Closes #9831
2020-01-14 14:57:28 -08:00
Tom Caputi 61152d1069 Fix errata #4 handling for resuming streams
Currently, the handling for errata #4 has two issues which allow
the checks for this issue to be bypassed using resumable sends.
The first issue is that drc->drc_fromsnapobj is not set in the
resuming code as it is in the non-resuming code. This causes
dsl_crypto_recv_key_check() to skip its checks for the
from_ivset_guid. The second issue is that resumable sends do not
clean up their on-disk state if they fail the checks in
dmu_recv_stream() that happen before any data is received.

As a result of these two bugs, a user can attempt a resumable send
of a dataset without a from_ivset_guid. This will fail the initial
dmu_recv_stream() checks, leaving a valid resume state. The send
can then be resumed, which skips those checks, allowing the receive
to be completed.

This commit fixes these issues by setting drc->drc_fromsnapobj in
the resuming receive path and by ensuring that resumablereceives
are properly cleaned up if they fail the initial dmu_recv_stream()
checks.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #9818 
Closes #9829
2020-01-14 12:25:20 -08:00
Richard Laager f744f36ce5 Document zfs change-key caveats
As discussed on the 2019-01-07 OpenZFS Leadership Meeting, we need to be
clear about the limitations of `zfs change-key`.  Changing the user key
does not change the master key, nor does it currently overwrite the old
wrapped master key on disk.

Reviewed-by: Tom Caputi <tcaputi@datto.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Garrett Fields <ghfields@gmail.com>
Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Signed-off-by: Richard Laager <rlaager@wiktel.com>
Closes #9819
2020-01-14 10:11:07 -08:00
loli10K 7e2da7786e KMC_KVMEM disrupts kv_alloc() memory alignment expectations
On kernels with KASAN enabled the following failure can be observed as
soon as the zfs module is loaded:

  VERIFY(IS_P2ALIGNED(ptr, PAGE_SIZE)) failed
  PANIC at spl-kmem-cache.c:228:kv_alloc()

The problem is kmalloc() has never guaranteed aligned allocations; this
requirement resulted in zfsonlinux/spl@8b45dda which removed all
kmalloc() usage in kv_alloc().

Until a GFP_ALIGNED flag (or equivalent functionality) is provided by
the kernel this commit partially reverts 66955885 and 6d948c35 to
prevent k(v)malloc() allocations in kv_alloc().

Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Reviewed-by: Michael Niewöhner <foss@mniewoehner.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #9813
2020-01-14 09:09:59 -08:00
Kyle Evans 68a192e4b7 libzfs: add zfs_mount_at() function
zfs_mount_at() mounts a dataset at an arbitrary mountpoint rather than
at the configured mountpoint. This may be used by consumers that wish to
temporarily expose a dataset at another mountpoint without altering
dataset/pool properties.

This will be used by FreeBSD's libbe be_mount(), which mounts a boot
environment at an arbitrary mountpoint.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Kyle Evans <kevans@FreeBSD.org>
Closes #9833
2020-01-14 08:49:54 -08:00
Brian Behlendorf e458fcca75 Change http://zfsonlinux.org links to https://zfsonlinux.org
Update the project website links contained in to repository to
reference the secure https://zfsonlinux.org address.

Reviewed-By: Richard Laager <rlaager@wiktel.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Garrett Fields <ghfields@gmail.com>
Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #9837
2020-01-13 16:43:59 -08:00
Brian Behlendorf a4c3e3c74f ZTS: Remove obsolete zts-report.py exceptions
The disk_reason and udev_reason exceptions can be removed since
they apply to now unsupported kernel versions (<v3.10).

The checks in the test cases were kept for the purposes of
documentation and as useful sanity checks for the test environment.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #9828
2020-01-13 16:42:23 -08:00
Ryan Moeller 602f667285 ZTS: Clean up properties.shlib a bit
Fixes the last property having an empty value on FreeBSD and makes the
code a bit more readable.

Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
Closes #9834
2020-01-13 15:21:42 -08:00
Ryan Moeller 6e1c594d64 ZTS: Create xattr helpers to hide platform
Create xattr helpers to hide platform and update usage in tests.

This does not generally aim to enable all xattr tests yet, but it is a
necessary step in that direction.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
Closes #9826
2020-01-10 13:24:59 -08:00
Tom Caputi ba0ba69e50 Add 'zfs send --saved' flag
This commit adds the --saved (-S) to the 'zfs send' command.
This flag allows a user to send a partially received dataset,
which can be useful when migrating a backup server to new
hardware. This flag is compatible with resumable receives, so
even if the saved send is interrupted, it can be resumed.
The flag does not require any user / kernel ABI changes or any
new feature flags in the send stream format.

Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Alek Pinchuk <apinchuk@datto.com>
Reviewed-by: Paul Zuchowski <pzuchowski@datto.com>
Reviewed-by: Christian Schwarz <me@cschwarz.com>
Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #9007
2020-01-10 10:16:58 -08:00
Brian Behlendorf 9ab6109fb5 ZTS: Improve zts-auto_offline_001_pos
The zts-auto_offline_001_pos test could exceed the 10 minute test
limit and be KILLED by the test infrastructure.  To prevent this
speed up the test case by:

* Removing redundant pool configurations.  Each of the following
  vdev types is tested once: mirror, raidz, cache, and special.

* The block_device_wait function need only wait on the block
  device which has been removed as part of the test.

Reviewed-by: Paul Zuchowski <pzuchowski@datto.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #9827
2020-01-10 09:14:49 -08:00
Kjeld Schouten-Lebbing 36e5b4a94b Performance tests, fio enhancements
- Set fixed chunk pattern, for sane compression
- Adjust buffer to blocksize, for cross blocksize repeatability
- Use fixed seed, for improved repeatability
- Move comp-percent and comp-chunk to variables
- set variables (mostly) to old defaults

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl>
Closes #9793
2020-01-09 11:21:33 -08:00
Ryan Moeller 90ae48733c ZTS: Provide an alternative to shuf for FreeBSD
There was a shuf package but the upstream for the port has recently
disappeared, so it is no longer available.

Create a function to hide the usage of shuf. Implement using seq|random
on FreeBSD.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
Closes #9824
2020-01-09 09:31:17 -08:00
Nick Black 4abd7d80b2 Remove parameter names from libzfs.h signatures
Most of libzfs.h doesn't provide names for the parameters
in its signatures. These few functions included them. That
wouldn't be a problem, per se, but the 'lines' parameter
conflicts with the 'lines' #define from terminfo's term.h,
present for at least a decade. This makes it difficult to
compile code making use of both ZFS and terminfo.

Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Signed-off-by: Nick Black <dankamongmen@gmail.com>
Closes #9821
2020-01-08 17:50:05 -08:00
Ryan Moeller f8d55b95a5 ZTS: Eliminate functions named 'random'
The name overlaps with a command needed by FreeBSD.
There is also no sense having two 'random' functions that do nearly
the same thing, so consolidate to just the more general one and name
it 'random_int_between'.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
Closes #9820
2020-01-08 09:08:30 -08:00
lorenz 028e3b3b1a Avoid here-documents in systemd mount generator
On some systems - openSUSE, for example - there is not yet a writeable
temporary file system available, so bash bails out with an error,

  'cannot create temp file for here-document: Read-only file system',

on the here documents in zfs-mount-generator. The simple fix is to
change these into a multi-line echo statement.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-By: Richard Laager <rlaager@wiktel.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Lorenz Hüdepohl <dev@stellardeath.org>
Closes #9802
2020-01-07 17:03:00 -08:00
DeHackEd 67709516db zfs-module-parameters(5): add all missing items
I ran a report against the output of `modinfo zfs.ko`. This commit adds
everything missing and corrects a few renamed module parameters.
Specifically:

* zfs_checksums_per second renamed in ad796b8a3
* vdev_ms_count_limit renamed in c853f382d

Also fixes some variable type inconsistencies (unsigned int => uint)

Reviewed-by: George Amanakis <gamanakis@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: DHE <git@dehacked.net>
Closes #9809
2020-01-07 12:41:53 -08:00
Ryan Moeller 798b2b8477 ZTS: Add sunos.run to dist data
This was missed when the file was introduced.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
Closes #9815
2020-01-07 12:38:12 -08:00
loli10K c24fa4b19a Fix "zpool add -n" for dedup, special and log devices
For dedup, special and log devices "zpool add -n" does not print
correctly their vdev type:

~# zpool add -n pool dedup /tmp/dedup special /tmp/special log /tmp/log
would update 'pool' to the following configuration:
	pool
	  /tmp/normal
	  /tmp/dedup
	  /tmp/special
	  /tmp/log

This could lead storage administrators to modify their ZFS pools to
unexpected and unintended vdev configurations.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #9783 
Closes #9390
2020-01-06 15:40:06 -08:00
Brian Behlendorf bc9cef11fd Fix QAT allocation failure return value
When qat_compress() fails to allocate the required contiguous memory
it mistakenly returns success.  This prevents the fallback software
compression from taking over and (un)compressing the block.

Resolve the issue by correctly setting the local 'status' variable
on all exit paths.  Furthermore, initialize it to CPA_STATUS_FAIL
to ensure qat_compress() always fails safe to guard against any
similar bugs in the future.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #9784
Closes #9788
2020-01-06 11:17:53 -08:00
Brian Behlendorf 581ca28169 ZTS: Cleanup partition tables
The cleanup_devices function should remove any partitions created
on the device and force the partition table to be reread.  This
is needed to ensure that blkid has an up to date version of what
devices and partitions are used by zfs.

The cleanup_devices call was removed from inuse_008_pos.ksh since
it operated on partitions instead of devices and was not needed.

Lastly ddidecode may be called by parted and was therefore added
to the constrained path.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #9806
2020-01-06 11:14:19 -08:00
Brian Behlendorf 33dc49e0c0 Fix ZPOOL_VDEV_NAME_PATH option description
The corresponding zpool status option is -P and not -p.  Update
this description to reference the correct option.

Reviewed-by: loli10K <ezomori.nozomu@gmail.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #9803
2020-01-06 10:43:32 -08:00
Ryan Moeller 665684d721 ZTS: Add helper for disk device check
Replace `test -b` and equivalents with `is_disk_device`, so that `-c`
is used instead on FreeBSD which has no block cache layer for devices.

Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
Closes #9795
2020-01-03 09:10:17 -08:00
Ryan Moeller d7164b27be ZTS: Move dumpdev tests to sunos.run
Neither FreeBSD nor Linux support dumping to zvols.

DilOS still uses these tests, so the files are kept and the tests have
been relocated to sunos.run.

An `is_illumos` function was added to libtest.shlib to eliminate some
awkward platform checks.

A few functions that are not expected to be used outside of illumos
have been sanitized of extraneous FreeBSD adaptations.

Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
Closes #9794
2020-01-03 09:08:23 -08:00
Brian Behlendorf cc618d179e Static symbols exported by ICP
The crypto_cipher_init_prov and crypto_cipher_init are declared static
and should not be exported by the ICP.  This resolves the following
warnings observed when building with the 5.4 kernel.

WARNING: "crypto_cipher_init" [.../icp] is a static EXPORT_SYMBOL
WARNING: "crypto_cipher_init_prov" [.../icp] is a static EXPORT_SYMBOL

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #9791
2020-01-02 18:08:45 -08:00
Ryan Moeller 54007c791f Add FreeBSD core handling in zloop.sh
And use the correct path to libtool and ztest.

Reviewed-By: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
Closes #9790
2020-01-02 13:48:06 -08:00
Kjeld Schouten-Lebbing 82e996c261 Improve Pull Request guidelines
- Splits PR advice into two sections.
- Add "co-authored-by" instructions.
- Add description of draft PR and when using it is appropriate.
- Reword ZFS Test Suite checklist question.
- Link to zfs-tests.sh and zloop.sh.

Reviewed-By: Marcel Schilling <marcel.schilling@mdc-berlin.de>
Reviewed-By: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-By: Richard Laager <rlaager@wiktel.com>
Co-Authored-By: Marcel Schilling <marcel.schilling@mdc-berlin.de>
Co-Authored-By: Brian Behlendorf <behlendorf1@llnl.gov>
Co-Authored-By: Richard Laager <rlaager@wiktel.com>
Signed-off-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl>
Closes #9753
2019-12-30 09:24:41 -08:00
Ned Bass 8b3438e503 zdb: print block checksums with 6 d's of verbosity
Include checksums in the output of 'zdb -dddddd' along
with other indirect block information already displayed.

Example output follows (with long lines trimmed):

$ zdb -dddddd tank/fish 128
Dataset tank/fish [ZPL], ID 259, cr_txg 10, 16.2M, 93 objects, rootbp DV

    Object  lvl   iblk   dblk  dsize  dnsize  lsize   %full  type
       128    2   128K   128K   634K     512     1M  100.00  ZFS plain f
                                               168   bonus  System attri
    dnode flags: USED_BYTES USERUSED_ACCOUNTED USEROBJUSED_ACCOUNTED
    dnode maxblkid: 7
    path    /c
    uid     0
    gid     0
    atime    Sat Dec 21 10:49:26 2019
    mtime    Sat Dec 21 10:49:26 2019
    ctime    Sat Dec 21 10:49:26 2019
    crtime    Sat Dec 21 10:49:26 2019
    gen    41
    mode    100755
    size    964592
    parent    34
    links    1
    pflags    40800000104
Indirect blocks:
               0 L1  0:2c0000:400 0:c021e00:400 20000L/400P F=8 B=41/41
               0  L0 0:227800:13800 20000L/13800P F=1 B=41/41 cksum=167a
           20000  L0 0:25ec00:17c00 20000L/17c00P F=1 B=41/41 cksum=2312
           40000  L0 0:276800:18400 20000L/18400P F=1 B=41/41 cksum=24e0
           60000  L0 0:2a7800:18800 20000L/18800P F=1 B=41/41 cksum=25be
           80000  L0 0:28ec00:18c00 20000L/18c00P F=1 B=41/41 cksum=2579
           a0000  L0 0:24d000:11c00 20000L/11c00P F=1 B=41/41 cksum=140a
           c0000  L0 0:23b000:12000 20000L/12000P F=1 B=41/41 cksum=164e
           e0000  L0 0:221e00:5a00 20000L/5a00P F=1 B=41/41 cksum=9de790

        segment [0000000000000000, 0000000000100000) size    1M

Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ned Bass <bass6@llnl.gov>
Closes #9765
2019-12-30 09:14:40 -08:00
Ben Cordero 153db76197 zfs-load-key.sh: ${ZFS} is not the zfs binary
A change[1] was merged yesterday that should refer
to the zfs binary in the initramfs, but is actually
an unset shell variable.

This commit changes this line to call `zfs` directly
like the surrounding code.

[1]: cb5b875b273235a4a3ed28e16f416d5bb8865166

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Garrett Fields <ghfields@gmail.com>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Signed-off-by: Ben Cordero <bencord0@condi.me>
Closes #9780
2019-12-29 11:25:00 -08:00
Brian Behlendorf 2c47bd575a ZTS: Fix pool_state cleanup
The externally faulted vdev should be brought back online and have
its errors cleared before the pool is destroyed.  Failure to do so
will leave a vdev with a valid active label.  This vdev may then
not be used to create a new pool without the -f flag potentially
leading to subsequent test failures.

Additionally remove an unreachable log_pass from setup.ksh.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #9777
2019-12-28 08:43:23 -08:00
Brian Behlendorf edb24bec3b ZTS: Replace /var/tmp with $TEST_BASE_DIR
Remove a few hardcoded instances of /var/tmp.  This should use
the $TEST_BASE_DIR in order to allow the ZTS to be optionally
run in an alternate directory using `zfs-tests.sh -d <path>`.

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #9775
2019-12-27 16:41:16 -08:00
Brian Behlendorf a6f6ef8bda ZTS: zfs_program_json
As of Python 3.5 the default behavior of json.tool was changed to
preserve the input order rather than lexical order.  The test case
expects the output to be sorted so apply the --sort-keys option
to the json.tool command when using Python 3.5 and the option is
supported.

    https://docs.python.org/3/library/json.html#module-json.tool

Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #9774
2019-12-27 12:12:41 -08:00
Brian Behlendorf 590ff61a55 ZTS: devices_001_pos and devices_002_neg
Update the devices_001_pos and devices_002_neg test cases such that the
special block device file created is backed by a ZFS volume.  Specifying
a specific device allows the major and minor numbers to be easily
determined.  Furthermore, this avoids the potentially dangerous behavior
of opening the first block device we happen to find under /dev/.

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #9773
2019-12-27 12:11:27 -08:00
Steve Mokris d5c97f3de7 Avoid some crashes when importing a pool with corrupt metadata
- Skip invalid DVAs when importing pools in readonly mode
  (in addition to when the config is untrusted).

- Upon encountering a DVA with a null VDEV, fail gracefully
  instead of panicking with a NULL pointer dereference.

Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Steve Mokris <smokris@softpixel.com>
Closes #9022
2019-12-26 10:57:05 -08:00
sam-lunt ad353e2147 In initramfs, do not prompt if keylocation is "file://"
If the encryption key is stored in a file, the initramfs should not
prompt for the password. For example, this could be the case if the boot
partition is stored on removable media that is only present at boot time

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Garrett Fields <ghfields@gmail.com>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Signed-off-by: Sam Lunt <samuel.j.lunt@gmail.com>
Closes #9764
2019-12-26 10:55:20 -08:00
Nick Black 8cda5c5ce9 libspl: declare aok extern in header
Rather than defining a new instance of 'aok' in every compilation
unit which includes this header, there is a single instance
defined in zone.c, and the header now only declares an extern.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Paul Zuchowski <pzuchowski@datto.com>
Signed-off-by: Nick Black <dankamongmen@gmail.com>
Closes #9752
2019-12-26 10:52:14 -08:00
Brian Behlendorf 635a01aafd Cancel initialize and TRIM before vdev_metaslab_fini()
Any running 'zpool initialize' or TRIM must be cancelled prior
to the vdev_metaslab_fini() call in spa_vdev_remove_log() which
will unload the metaslabs and set ms->ms_group == NULL.

Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #8602
Closes #9751
2019-12-26 10:50:23 -08:00
Brian Behlendorf 80bde2c4ba ZTS: Test case failures
* large_dnode_008_pos - Force a pool sync before invoking zdb to
  ensure the updated dnode blocks have been persisted to disk.

* refreserv_raidz - Wait for the /dev/zvol links to be both created
  and removed, this is important because the same device volume
  names are being used repeatedly.

* btree_test - Add missing .gitignore file for btree_test binary.

Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #9769
2019-12-26 10:49:07 -08:00
Brian Behlendorf c587b2c7d8 Update maximum kernel version to 5.4
Increase the maximum supported kernel version to 5.4.  This was
verified using the Fedora 5.4.2-300.fc31.x86_64 kernel.

Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #9754
Closes #9759
2019-12-23 14:24:36 -08:00
Ryan Moeller 54aefa6abf Fix test pattern in zts-report.py
The pattern was not updated to match when the test output changed to
include a platform identifier for platform specific tests.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
Closes #9750
2019-12-20 09:31:59 -08:00
Tony Hutter 9fb2771aa5 Colorize zpool status output
If the ZFS_COLOR env variable is set, then use ANSI color
output in zpool status:

- Column headers are bold
- Degraded or offline pools/vdevs are yellow
- Non-zero error counters and faulted vdevs/pools are red
- The 'status:' and 'action:' sections are yellow if they're
  displaying a warning.

This also includes a new 'faketty' function in libtest.shlib that is
compatible with FreeBSD (code provided by @freqlabs).

Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #9340
2019-12-19 16:26:07 -08:00
Brian Behlendorf 5e8ac05590 ZTS: Various test case fixes
* devices_001_pos and devices_002_neg - Failing after FreeBSD ZTS
  merged due to missing 'function' keyword for create_dev_file_linux.

* pool_state - Occasionally fails due to an insufficient delay
  before checking 'zpool status'.  Increasing the delay from 1 to 3
  seconds resolved the issue in local testing.

* procfs_list_basic - Fails when run in-tree because the logged
  command is actually 'lt-zfs'.  Updated the regex accordingly.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #9748
2019-12-19 15:32:56 -08:00
John Wren Kennedy 523fc80069 Tests for btree implementation used by range trees
Additional test cases for the btree implementation, see #9181.

Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: John Kennedy <john.kennedy@delphix.com>
Closes #9717
2019-12-19 11:53:54 -08:00
Ryan Moeller a3640486ff Update zfs.sh work on FreeBSD
Extend the zfs.sh script to load and unload zfs kmods on FreeBSD.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
Closes #9746
2019-12-19 09:31:16 -08:00
Brian Behlendorf d16a207f2e cppcheck: (warning) Possible null pointer dereference: nvh
Move the 'nvh = (void *)buf' assignment after the 'buf == NULL'
check to resolve the warning.  Interestingly, cppcheck 1.88
correctly determines that the existing code is safe, while
cppcheck 1.86 reports the warning.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #9732
2019-12-18 17:25:57 -08:00
Brian Behlendorf 487bddad67 cppcheck: (error) Address of local auto-variable assigned
Suppress autoVariables warnings in the lua interpreter.  The usage
here while unconventional in intentional and the same as upstream.

[module/lua/ldebug.c:327]: (error) Address of local auto-variable
    assigned to a function parameter.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #9732
2019-12-18 17:25:42 -08:00
Brian Behlendorf 1e49b288cb cppcheck: (error) Null pointer dereference: who_perm
As indicated by the VERIFY the local who_perm variable can never
be NULL in parse_fs_perm().  Due to the existence of the is_set
conditional, which is always true, cppcheck 1.88 was reporting
a possible NULL reference.  Resolve the issue by removing the
extraneous is_set variable.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #9732
2019-12-18 17:25:23 -08:00
Brian Behlendorf 070402f112 cppcheck: (warning) Possible null pointer dereference: dnp
The dnp argument can only be set to NULL when the DNODE_DRY_RUN flag
is set.  In which case, an early return path will be executed and a
NULL pointer dereference at the given location is impossible.  Add
an additional ASSERT to silence the cppcheck warning and document
that dbp must never be NULL at the point in the function.

[module/zfs/dnode.c:1566]: (warning) Possible null pointer deref: dnp

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #9732
2019-12-18 17:25:13 -08:00
Brian Behlendorf fe20400db5 cppcheck: (error) Memory leak: vtoc
Resolve the reported memory leak by using a dedicated local vptr
variable to store the pointer reported by calloc().  Only assign the
passed **vtoc function argument on success, in all other cases vptr
is freed.

[lib/libefi/rdwr_efi.c:403]: (error) Memory leak: vtoc
[lib/libefi/rdwr_efi.c:422]: (error) Memory leak: vtoc
[lib/libefi/rdwr_efi.c:440]: (error) Memory leak: vtoc
[lib/libefi/rdwr_efi.c:454]: (error) Memory leak: vtoc
[lib/libefi/rdwr_efi.c:470]: (error) Memory leak: vtoc

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #9732
2019-12-18 17:25:04 -08:00
Ubuntu abfdb83607 cppcheck: (error) Shifting signed 64-bit value by 63 bits
As of cppcheck 1.82 surpress the warning regarding shifting too many
bits for __divdi3() implemention.  The algorithm used here is correct.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #9732
2019-12-18 17:24:42 -08:00
Ubuntu 7cf1fe6331 cppcheck: (error) Uninitialized variable
As of cppcheck 1.82 warnings are issued when using the list_for_each_*
functions with an uninitialized variable.  Functionally, this is fine
but to resolve the warning initialize these variables.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #9732
2019-12-18 17:24:29 -08:00
Ubuntu 5215fdd43c cppcheck: (error) Uninitialized variable
Resolve the following uninitialized variable warnings.  In practice
these were unreachable due to the goto.  Replacing the goto with a
return resolves the warning and yields more readable code.

[module/icp/algs/modes/ccm.c:892]: (error) Uninitialized variable: ccm_param
[module/icp/algs/modes/ccm.c:893]: (error) Uninitialized variable: ccm_param
[module/icp/algs/modes/gcm.c:564]: (error) Uninitialized variable: gcm_param
[module/icp/algs/modes/gcm.c:565]: (error) Uninitialized variable: gcm_param
[module/icp/algs/modes/gcm.c:599]: (error) Uninitialized variable: gmac_param
[module/icp/algs/modes/gcm.c:600]: (error) Uninitialized variable: gmac_param

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #9732
2019-12-18 17:23:54 -08:00
Garrett Fields b8a899e94f Exchanged two "${ZFS} get -H -o value" commands
Initramfs uses "get_fs_value()" elsewhere.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl>
Signed-off-by: Garrett Fields <ghfields@gmail.com>
Closes #9736
2019-12-18 12:32:31 -08:00
Matthew Macy 7839c4b5e1 Update ZTS to work on FreeBSD
Update the common ZTS scripts and individual test cases as needed 
in order to allow them to be run on FreeBSD.  The high level goal
is to provide compatibility wrappers whenever possible to minimize
changes to individual test cases.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
Closes #9692
2019-12-18 12:29:43 -08:00
Romain Dolbeau 118fc3ef07 Minor performance fix for NEON RAID-Z
The NEON code replicates too closely the SSE code, including
a masked 16-bits shift. But NEON, like AltiVec (#9539), has
unsigned 8-bits shift, so use that instead and drop the masking.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Romain Dolbeau <romain.dolbeau@european-processor-initiative.eu>
Closes #9725
2019-12-17 19:34:52 -08:00
Thomas Geppert fe564845c0 Create symbolic links in /dev/disk/by-vdev for nvme disk devices
The existing rules miss nvme disk devices because of the trailing
digits in the KERNEL device name, e.g. nvme0n1. Partitions of nvme
disk devices are already properly handled by the existing rule for
ENV{DEVTYPE}=="partition".

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Signed-off-by: Thomas Geppert <geppi@digitx.de>
Closes #9730
2019-12-17 17:50:20 -08:00
Kjeld Schouten-Lebbing 3035f14d82 Moves Codecov Ignore to LCOV
Rely on ax_code_coverage to exclude test directories.

- Removes broken codecov ignore
- Places ignore section in ax_code_coverage
- Forwards users from codecov to LCOV for ignores

Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Signed-off-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl>
Closes #9726
2019-12-17 17:47:58 -08:00
Garrett Fields e8b199faec Force systems with kernel option "quiet" to display prompt for password
On systems that utilize TTY for password entry, if the kernel 
option "quiet" is set, the system would appear to freeze on a 
blank screen, when in fact it is waiting for password entry 
from the user.

Since TTY is the fallback method, this has no effect on systemd 
or plymouth password prompting.

By temporarily setting "printk" to "7", running the command, 
then resuming with the original "printk" state, the user can 
see the password prompt.

Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Garrett Fields <ghfields@gmail.com>
Closes #9731
2019-12-17 17:45:06 -08:00
Richard Laager ad97643773 initramfs: setup keymapping and video for prompts
From Steve Langasek <steve.langasek@canonical.com>:
> The poorly-named 'FRAMEBUFFER' option in initramfs-tools controls
> whether the console_setup and plymouth scripts are included and used
> in the initramfs. These are required for any initramfs which will be
> prompting for user input: console_setup because without it the user's
> configured keymap will not be set up, and plymouth because you are
> not guaranteed to have working video output in the initramfs without
> it (e.g. some nvidia+UEFI configurations with the default GRUB
> behavior).

> The zfs initramfs script may need to prompt the user for passphrases
> for encrypted zfs datasets, and we don't know definitively whether
> this is the case or not at the time the initramfs is constructed (and
> it's difficult to dynamically populate initramfs config variables
> anyway), therefore the zfs-initramfs package should just set
> FRAMEBUFFER=yes in a conf snippet the same way that the
> cryptsetup-initramfs package does
> (/usr/share/initramfs-tools/conf-hooks.d/cryptsetup).

https://bugs.launchpad.net/ubuntu/+source/zfs-linux/+bug/1856408

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Signed-off-by: Steve Langasek <steve.langasek@canonical.com>
Signed-off-by: Richard Laager <rlaager@wiktel.com>
Closes #9723
2019-12-16 16:54:51 -08:00
Matthew Macy ba434b18ec Fix zfs_xattr_owner_unlinked on FreeBSD and comment
Explain FreeBSD VFS' unfortunate idiosyncratic locking requirements.
There is no functional change for other platforms.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9720
2019-12-16 09:49:05 -08:00
Tomohiro Kusumi ddb4e69db5 Don't fail to apply umask for O_TMPFILE files
Apply umask to `mode` which will eventually be applied to inode.
This is needed since VFS doesn't apply umask for O_TMPFILE files.

(Note that zpl_init_acl() applies `ip->i_mode &= ~current_umask();`
only when POSIX ACL is used.)

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@gmail.com>
Closes #8997 
Closes #8998
2019-12-13 15:02:23 -08:00
Tom Caputi c317c8c811 Allow empty ds_props_obj to be destroyed
Currently, 'zfs list' and 'zfs get' commands can be slow when
working with snapshots that have a ds_props_obj. This is
because the code that discovers all of the properties for these
snapshots needs to read this object for each snapshot, which
almost always ends up causing an extra random synchronous read
for each snapshot. This performance penalty exists even if the
properties on that snapshot have been unset because the object
is normally only freed when the snapshot is freed, even though
it is only created when it is needed.

This patch allows the user to regain 'zfs list' performance on
these snapshots by destroying the ds_props_obj when it no longer
has any entries left. In practice on a production machine, this
optimization seems to make 'zfs list' about 55% faster.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Paul Zuchowski <pzuchowski@datto.com>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #9704
2019-12-13 11:51:39 -08:00
Matthew Macy 13a9a6f5e8 Make zfs_replay.c work on FreeBSD
FreeBSD's vfs currently doesn't permit file systems
to do their own locking. To avoid having to have
duplicate zfs functions with and without locking add
locking here. With luck these changes can be removed
in the future.

Reviewed-by: Sean Eric Fagan <sef@ixsystems.com>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9715
2019-12-13 07:54:10 -08:00
Matthew Ahrens 9bb0d89c5c Fix use-after-free of vd_path in spa_vdev_remove()
After spa_vdev_remove_aux() is called, the config nvlist is no longer
valid, as it's been replaced by the new one (with the specified device
removed).  Therefore any pointers into the nvlist are no longer valid.
So we can't save the result of
`fnvlist_lookup_string(nv, ZPOOL_CONFIG_PATH)` (in vd_path) across the
call to spa_vdev_remove_aux().

Instead, use spa_strdup() to save a copy of the string before calling
spa_vdev_remove_aux.

Found by AddressSanitizer:

ERROR: AddressSanitizer: heap-use-after-free on address ...
READ of size 34 at 0x608000a1fcd0 thread T686
    #0 0x7fe88b0c166d  (/usr/lib/x86_64-linux-gnu/libasan.so.4+0x5166d)
    #1 0x7fe88a5acd6e in spa_strdup spa_misc.c:1447
    #2 0x7fe88a688034 in spa_vdev_remove vdev_removal.c:2259
    #3 0x55ffbc7748f8 in ztest_vdev_aux_add_remove ztest.c:3229
    #4 0x55ffbc769fba in ztest_execute ztest.c:6714
    #5 0x55ffbc779a90 in ztest_thread ztest.c:6761
    #6 0x7fe889cbc6da in start_thread
    #7 0x7fe8899e588e in __clone

0x608000a1fcd0 is located 48 bytes inside of 88-byte region
freed by thread T686 here:
    #0 0x7fe88b14e7b8 in __interceptor_free
    #1 0x7fe88ae541c5 in nvlist_free nvpair.c:874
    #2 0x7fe88ae543ba in nvpair_free nvpair.c:844
    #3 0x7fe88ae57400 in nvlist_remove_nvpair nvpair.c:978
    #4 0x7fe88a683c81 in spa_vdev_remove_aux vdev_removal.c:185
    #5 0x7fe88a68857c in spa_vdev_remove vdev_removal.c:2221
    #6 0x55ffbc7748f8 in ztest_vdev_aux_add_remove ztest.c:3229
    #7 0x55ffbc769fba in ztest_execute ztest.c:6714
    #8 0x55ffbc779a90 in ztest_thread ztest.c:6761
    #9 0x7fe889cbc6da in start_thread

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #9706
2019-12-11 15:38:21 -08:00
Ryan Moeller 957c7aa23c Relocate common quota functions to shared code
The quota functions are common to all implementations and can be
moved to common code.  As a simplification they were moved to the
Linux platform code in the initial refactoring.

Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
Closes #9710
2019-12-11 12:12:08 -08:00
Matthew Macy 4bc721965f Add FreeBSD jail support hooks
Add the 'zfs jail/unjail' subcommands along with the relevant 
documentation from FreeBSD.  This feature is not supported on
Linux and still requires the match kernel ioctls which will
be included when the FreeBSD platform code is integrated.

Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
Closes #9686
2019-12-11 11:58:37 -08:00
Matthew Macy 657ce25357 Eliminate Linux specific inode usage from common code
Change many of the znops routines to take a znode rather
than an inode so that zfs_replay code can be largely shared
and in the future the much of the znops code may be shared.

Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9708
2019-12-11 11:53:57 -08:00
Paul Zuchowski f0bf435176 zio_decompress_data always ASSERTs successful decompression
This interferes with zdb_read_block trying all the decompression
algorithms when the 'd' flag is specified, as some are
expected to fail.  Also control the output when guessing
algorithms, try the more common compression types first, allow
specifying lsize/psize, and fix an uninitialized variable.

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Zuchowski <pzuchowski@datto.com>
Closes #9612 
Closes #9630
2019-12-10 15:51:58 -08:00
Fabian-Gruenbichler b119e2c6f1 SIMD: Use alloc_pages_node to force alignment
fxsave and xsave require the target address to be 16-/64-byte aligned.

kmalloc(_node) does not (yet) offer such fine-grained control over
alignment[0,1], even though it does "the right thing" most of the time
for power-of-2 sizes. unfortunately, alignment is completely off when
using certain debugging or hardening features/configs, such as KASAN,
slub_debug=Z or the not-yet-upstream SLAB_CANARY.

Use alloc_pages_node() instead which allows us to allocate page-aligned
memory. Since fpregs_state is padded to a full page anyway, and this
code is only relevant for x86 which has 4k pages, this approach should
not allocate any unnecessary memory but still guarantee the needed
alignment.

0: https://lwn.net/Articles/787740/
1: https://lore.kernel.org/linux-block/20190826111627.7505-1-vbabka@suse.cz/

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #9608 
Closes #9674
2019-12-10 12:53:25 -08:00
Matthew Macy 362ae8d11f Abstract away platform specific superblock references
The zfsvfs->z_sb field is Linux specified and should be abstracted.

Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9697
2019-12-10 09:21:07 -08:00
John Poduska 7bda69a1a9 ZTS: Fixes for spurious failures of resilver_restart_001 test
The resilver restart test was reported as failing about 2% of the
time. Two issues were found:

- The event log wasn't large enough, so resilver events were missing
- One 'zpool sync' wasn't enough for resilver to start after zinject

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Signed-off-by: John Poduska <jpoduska@datto.com>
Issue #9588 
Closes #9677 
Closes #9703
2019-12-10 09:10:36 -08:00
Matthew Macy 3c502d3b75 Exclude data from cores unconditionally and metadata conditionally
This change allows us to align the code dump logic across platforms.

Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Don Brady <don.brady@delphix.com>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9691
2019-12-09 12:29:56 -08:00
Matthew Macy ea79e90f99 Mark dsl_dataset_deactivate_feature_impl static
The dsl_dataset_deactivate_feature_impl() function is private and
should be marked as such.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9696
2019-12-09 12:26:33 -08:00
Brian Behlendorf a25861dcae ZTS: Fix zpool_reopen_001_pos
Update the vdev_disk_open() retry logic to use a specified number
of milliseconds to be more robust.  Additionally, on failure log
both the time waited and requested timeout to the internal log.

The default maximum allowed open retry time has been increased
from 500ms to 1000ms.

Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #9680
2019-12-09 11:09:14 -08:00
Kjeld Schouten 5927917987 Fixes typo from #9681
Going from #9672 to 9681 I made a typo.  This removes that typo.

    The pattern folder /* will not match recursively in the folder.
    Please use this folder /**/*

    source: https://docs.codecov.io/docs/ignoring-paths

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Signed-off-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl>
Closes #9700
2019-12-09 10:10:55 -08:00
Matthew Macy 0dcef9b966 Disable sysfs feature checks on FreeBSD
The sysfs infrastructure for reporting supported features and
properties is Linux specific.  Disable it on FreeBSD until it can
be extended to be more portable.

Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9684
2019-12-06 09:44:29 -08:00
Kjeld Schouten 69b7e3f92d Set send_realloc_files.ksh to use properties.shlib
This sets send_realloc_files.ksh to use properties.shlib
(like the other compression related tests)

It was missing from #9645

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl>
Issue #9645
Closes #9679
2019-12-06 09:37:48 -08:00
Attila Fülöp 3ac34ca375 ICP: Fix out of bounds write
If gcm_mode_encrypt_contiguous_blocks() is called more than once
in succession, with the accumulated lengths being less than
blocksize, ctx->copy_to will be incorrectly advanced. Later, if
out is NULL, the bcopy at line 114 will overflow
ctx->gcm_copy_to since ctx->gcm_remainder_len is larger than the
ctx->gcm_copy_to buffer can hold.

The fix is to set ctx->copy_to only if it's not already set.

For ZoL the issue may be academic, since in all my testing I wasn't
able to hit neither of both conditions needed to trigger it, but
other consumers can easily do so.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tom Caputi <tcaputi@datto.com>
Signed-off-by: Attila Fülöp <attila@fueloep.org>
Closes #9660
2019-12-06 09:36:19 -08:00
Kjeld Schouten f784828416 Fix codecov ignore, wrong syntax
The current codecov ignore syntax is incorrect.
Corrected it.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl>
Closes #9681
2019-12-06 09:35:02 -08:00
Matthew Macy 1f654753ba Remove stale ASSERTV comment
Followup for #9671, remove stale comment.

Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Issue #9671 
Closes #9682
2019-12-06 09:33:27 -08:00
Matthew Macy f95704ca5e Disable EDONR on FreeBSD
FreeBSD uses its own crypto framework in-kernel which, at this time,
has no EDONR implementation.

Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Allan Jude <allanjude@freebsd.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
Closes #9664
2019-12-05 13:10:29 -08:00
Matthew Macy 054a049841 Add ZFS_IOC offsets for FreeBSD
FreeBSD requires three additional ioctls, they are ZFS_IOC_NEXTBOOT,
ZFS_IOC_JAIL, and ZFS_IOC_UNJAIL.  These have been added after the
Linux-specific ioctls.  The range 0x80-0xFF has been reserved for 
future optional platform-specific ioctls.  Any platform may choose
to implement these as appropriate.

None of the existing ioctl numbers have been changed to maintain
compatibility.  For Linux no vectors have been registered for the
new ioctls and they are reported as unsupported.

Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Allan Jude <allanjude@freebsd.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9667
2019-12-05 13:06:51 -08:00
Matthew Macy e64e84eca5 Refactor deadman set failmode to be cross platform
Update zfs_deadman_failmode to use the ZFS_MODULE_PARAM_CALL
wrapper, and split the common and platform specific portions.

Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9670
2019-12-05 12:40:45 -08:00
Matthew Macy 2a8ba608d3 Replace ASSERTV macro with compiler annotation
Remove the ASSERTV macro and handle suppressing unused 
compiler warnings for variables only in ASSERTs using the 
__attribute__((unused)) compiler annotation.  The annotation
is understood by both gcc and clang.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9671
2019-12-05 12:37:00 -08:00
George Amanakis 12395c7b0b Fix reporting of L2ARC hits/misses in arc_summary3
arc_summary3 reports L2ARC hits and misses as Bytes, whereas they
should be reported as events. arc_summary2 reports these correctly.

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #9669
2019-12-04 13:24:55 -08:00
Matthew Macy be627fc847 Refactor zfs_context.h to build on FreeBSD
- on Linux move Linux specific headers to zfs_context_os.h
- on FreeBSD move FreeBSD specific definitions to zfs_context_os.h
- remove duplicate tsd_ definitions
- remove unused AT_TYPE

Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Don Brady <don.brady@delphix.com>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9668
2019-12-04 13:12:57 -08:00
Kjeld Schouten 618b6adfbf Refactor compression algorithm selection for tests
- Moves compression algorithms for tests to properties.shlib
- Removes all compression algorithms levels from general tests
- Replaces on with lz4 for compression tests
- Removes random algorithm selection, if not needed
- Cleans copyright header formatting

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Michael Niewöhner <foss@mniewoehner.de>
Signed-off-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl>
Closes #9645
2019-12-04 13:10:12 -08:00
Paul Zuchowski 5a08977374 Fix zdb_read_block using zio after it is destroyed
The checksum display code of zdb_read_block uses a zio
to read in the block and then calls zio_checksum_compute.
Use a new zio in the call to zio_checksum_compute not the zio
from the read which has been destroyed by zio_wait.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Signed-off-by: Paul Zuchowski <pzuchowski@datto.com>
Closes #9644
Closes #9657
2019-12-03 14:37:15 -08:00
Attila Fülöp 54c8366e39 ICP: Fix null pointer dereference and use after free
In gcm_mode_decrypt_contiguous_blocks(), if vmem_alloc() fails,
bcopy is called with a NULL pointer destination and a length > 0.
This results in undefined behavior. Further ctx->gcm_pt_buf is
freed but not set to NULL, leading to a potential write after
free and a double free due to missing return value handling in
crypto_update_uio(). The code as is may write to ctx->gcm_pt_buf
in gcm_decrypt_final() and may free ctx->gcm_pt_buf again in
aes_decrypt_atomic().

The fix is to slightly rework error handling and check the return
value in crypto_update_uio().

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tom Caputi <tcaputi@datto.com>
Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Signed-off-by: Attila Fülöp <attila@fueloep.org>
Closes #9659
2019-12-03 10:28:47 -08:00
Kjeld Schouten 7af72863fd Codecov tweaks
Modify the Codecov settings to provide a more realistic and stable
report.  The following change were made:

- Precision has been limited to whole percents only, but will round
  to nearest. This means 0.0-0.49 will round to zero (no change) and
  0.51 will round to 1%.

- Exclude the tests/zfs-tests directory from the report.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl>
Closes #9650
2019-12-03 10:23:48 -08:00
Alexander Motin 5ff2249fa5 Fix use-after-free in case of L2ARC prefetch failure
In case L2ARC read failed, l2arc_read_done() creates _different_ ZIO
to read data from the original storage device.  Unfortunately pointer
to the failed ZIO remains in hdr->b_l1hdr.b_acb->acb_zio_head, and if
some other read try to bump the ZIO priority, it will crash.

The problem is reproducible by corrupting L2ARC content and reading
some data with prefetch if l2arc_noprefetch tunable is changed to 0.
With the default setting the issue is probably not reproducible now.

Reviewed-by: Tom Caputi <tcaputi@datto.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #9648
2019-12-03 09:59:30 -08:00
Brian Behlendorf 624222ae31 Increase allowed 'special_small_blocks' maximum value
There may be circumstances where it's desirable that all blocks
in a specified dataset be stored on the special device.  Relax
the artificial 128K limit and allow the special_small_blocks
property to be set up to 1M.  When blocks >1MB have been enabled
via the zfs_max_recordsize module option, this limit is increased
accordingly.

Reviewed-by: Don Brady <don.brady@delphix.com>
Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #9131
Closes #9355
2019-12-03 09:58:03 -08:00
Matthew Macy b3673342c7 Wrap module_param_call() routines under __linux__
The module_param_call() functionality is currently still
Linux-specific and should be wrapped accordingly.

Reviewed-by: Allan Jude <allanjude@freebsd.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9666
2019-12-03 09:56:15 -08:00
Matthew Macy bff8fb395b Mark write_record static
The write_record() function is private and should be marked as such.

Reviewed-by: Allan Jude <allanjude@freebsd.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9665
2019-12-03 09:51:44 -08:00
Matthew Macy 74d1d74959 Move linux qsort def to platform header
Moving qsort to the platform header allows each platform to
provide an appropriate sorting implementation.

Reviewed-by: Allan Jude <allanjude@freebsd.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9663
2019-12-03 09:49:40 -08:00
Michael Niewöhner e69bb31b71 Adapt gitignore for modules
Remove the specific gitignore rules for module left-overs and add a
generic one in modules/.

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Signed-off-by: Michael Niewöhner <foss@mniewoehner.de>
Closes #9656
2019-12-02 13:23:47 -08:00
Matthew Macy 5142032106 Move zfs_cmd_t copyin/copyout to platform code
FreeBSD needs to cope with multiple version of the zfs_cmd_t
structure. Allowing the platform code to pre and post
process the cmd structure makes it possible to work with
legacy tooling.

Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9624
2019-12-02 10:08:27 -08:00
Matthew Macy 42a826eed3 Add FreeBSD required defines to mntent.h
Linux and FreeBSD use different names for suid / setuid.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9632
2019-11-30 15:49:09 -08:00
Matthew Macy 758699b6f1 Restructure nvlist_nv_alloc to work on FreeBSD
KM_PUSHPAGE is an Illumosism - On FreeBSD it's
aliased to the same malloc flag as KM_SLEEP.
The compiler naturally rejects multiple case
statements with the same value.  This is effectively
a no-op since all callers pass a specific KM_* flag.

Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9643
2019-11-30 15:45:06 -08:00
Ryan Moeller 101f9b1771 Add FreeBSD code to arc_summary and arcstat
Adding the FreeBSD code allows arc_summary and arcstat
to be used on FreeBSD.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
Closes #9641
2019-11-30 15:43:23 -08:00
Matthew Macy c54687cd23 Make asm-x86_64/atomic.S build on FreeBSD
Include the required headers for FreeBSD.

Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9634
2019-11-30 15:42:00 -08:00
Matthew Macy f348c78f97 Mark Linux fallocate extensions as specific to Linux
fallocate(2) is a Linux-specific system call which in unavailable
on other platforms.

Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9633
2019-11-30 15:40:22 -08:00
Matthew Macy 77323bcf53 Add FreeBSD support to zio_crypto.h
Minimal compatibility changes for FreeBSD.

Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9631
2019-11-30 15:38:16 -08:00
Matthew Macy a5b762ab1d Resolve ZoF differences in zfs_ioctl.h
FreeBSD needs to be able to pass the jail id to the jail/unjail ioctls
and the struct file in the device structure is unused.

Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9625
2019-11-30 15:35:54 -08:00
Igor K a7c358845b ZTS: Fix attach-o_ashift.ksh for multiple platforms
The `-o ashift` option must appear after attach to be properly
interpreted by getopt(3) on all platforms.

Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com>
Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Igor Kozhukhov <igor@dilos.org>
Closes #9636
2019-11-27 14:55:43 -08:00
Matthew Macy d6f67df63c Minor diff reduction with ZoF in include/sys
- move linux/ includes to platform headers
- add void * io_bio to zio for tracking the underlying bio
- add freebsd specific fields to abd_scatter

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9615
2019-11-27 11:11:03 -08:00
InsanePrawn c940bf0c37 Fix encryption logic in systemd mount generator
Previously the generator would skip a dataset if it wasn't mountable by
'zfs mount -a' (legacy/none mountpoint, canmount off/noauto). This also
skipped the generation of key-load units for such datasets, breaking
the dependency handling for mountable child datasets.

Reviewed-by: Antonio Russo <antonio.e.russo@gmail.com>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Signed-off-by: InsanePrawn <insane.prawny@gmail.com>
Closes #9611
2019-11-27 10:54:49 -08:00
InsanePrawn 70d2dd922b Fix non-absolute path in systemd mount generator
Systemd will ignore units that try to execute programs from non-absolute
paths. Use hardcoded /bin/sh instead.

Reviewed-by: Antonio Russo <antonio.e.russo@gmail.com>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Signed-off-by: InsanePrawn <insane.prawny@gmail.com>
Closes #9611
2019-11-27 10:54:24 -08:00
InsanePrawn d8ce455c1e Fix small typo in systemd mount generator
Reviewed-by: Antonio Russo <antonio.e.russo@gmail.com>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Signed-off-by: InsanePrawn <insane.prawny@gmail.com>
Closes #9611
2019-11-27 10:53:37 -08:00
Paul Zuchowski 7c1bf0cf27 Implement -A (ignore ASSERTs) for zdb
The command line switch -A (ignore ASSERTs) has always been available 
in zdb but was never connected up to the correct global variable.

There are times when you need zdb to ignore asserts and keep dumping 
out whatever information it can get despite the ASSERT(s) failing. 
It was always intended to be part of zdb but was incomplete.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Zuchowski <pzuchowski@datto.com>
Closes #9610
2019-11-27 10:45:56 -08:00
Brian Behlendorf 9e17e6f254 Remove zfs_vdev_elevator module option
As described in commit f81d5ef6 the zfs_vdev_elevator module
option is being removed.  Users who require this functionality
should update their systems to set the disk scheduler using a
udev rule.

Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: loli10K <ezomori.nozomu@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #8664
Closes #9417
Closes #9609
2019-11-27 10:35:49 -08:00
jwpoduska 3c819a2c7d Prevent unnecessary resilver restarts
If a device is participating in an active resilver, then it will have a
non-empty DTL. Operations like vdev_{open,reopen,probe}() can cause the
resilver to be restarted (or deferred to be restarted later), which is
unnecessary if the DTL is still covered by the current scan range. This
is similar to the logic in vdev_dtl_should_excise() where the DTL can
only be excised if it's max txg is in the resilvered range.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Gallagher <john.gallagher@delphix.com>
Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Signed-off-by: John Poduska <jpoduska@datto.com>
Issue #840 
Closes #9155
Closes #9378
Closes #9551
Closes #9588
2019-11-27 10:15:01 -08:00
Paul Zuchowski 894f6696b4 Add display of checksums to zdb -R
The function zdb_read_block (zdb -R) was always intended to have a :c 
flag which would read the DVA and length supplied by the user, and 
display the checksum. Since we don't know which checksum goes with 
the data, we should calculate and display them all.

For each checksum in the table, read in the data at the supplied 
DVA:length, calculate the checksum, and display it. Update the man 
page and create a zfs test for the new feature.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Signed-off-by: Paul Zuchowski <pzuchowski@datto.com>
Closes #9607
2019-11-27 10:08:18 -08:00
Mauricio Faria de Oliveira 0c46813805 Check for unlinked znodes after igrab()
The changes in commit 41e1aa2a / PR #9583 introduced a regression on
tmpfile_001_pos: fsetxattr() on a O_TMPFILE file descriptor started
to fail with errno ENODATA:

    openat(AT_FDCWD, "/test", O_RDWR|O_TMPFILE, 0666) = 3
    <...>
    fsetxattr(3, "user.test", <...>, 64, 0) = -1 ENODATA

The originally proposed change on PR #9583 is not susceptible to it,
so just move the code/if-checks around back in that way, to fix it.

Reviewed-by: Pavel Snajdr <snajpa@snajpa.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Original-patch-by: Heitor Alves de Siqueira <halves@canonical.com>
Signed-off-by: Mauricio Faria de Oliveira <mfo@canonical.com>
Closes #9602
2019-11-21 12:24:03 -08:00
Matthew Macy da92d5cbb3 Add zfs_file_* interface, remove vnodes
Provide a common zfs_file_* interface which can be implemented on all 
platforms to perform normal file access from either the kernel module
or the libzpool library.

This allows all non-portable vnode_t usage in the common code to be 
replaced by the new portable zfs_file_t.  The associated vnode and
kobj compatibility functions, types, and macros have been removed
from the SPL.  Moving forward, vnodes should only be used in platform
specific code when provided by the native operating system.

Reviewed-by: Sean Eric Fagan <sef@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9556
2019-11-21 09:32:57 -08:00
Brian Behlendorf 67a6c3bc9f ZTS: tst.terminate_by_signal increase test threshold
The tst.terminate_by_signal test case may occasionally fail when
running in a less consistent virtual environment.  For all observed
failures the process was terminated correctly but it took longer than
expected resulting in too many snapshot being created.

To minimize the likelyhood of this occuring increase the threshold
from 50 to 90 snapshots.  The larger limit will still verifiy that
the channel program was correctly terminated early.

Reviewed-by: Don Brady <don.brady@delphix.com>
Reviewed-by: Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #9601
2019-11-20 17:26:32 -08:00
George Melikov 540312df7f ZTS: Casenorm fix unicode interpretation
Use `printf` to properly interpret unicode characters.

Illumos uses a utility called `zlook` to allow additional flags to be 
provided to readdir and lookup for testing.  This functionality could 
be ported to Linux, but even without it several of the tests can be 
enabled by instead using the standard `test` command.

Additional, work is required to enable the remaining test cases.

Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Issue #7633 
Closes #8812
2019-11-19 16:23:27 -08:00
InsanePrawn 8221bcf1e4 Remove requirement for -d 1 for zfs list and zfs get with bookmarks
df58307 removed the need to specify -d 1 when zfs list and zfs get are
called with -t snapshot on a datset. This commit extends the same
behaviour to -t bookmark.

This commit also introduces the 'snap' shorthand for snapshots from
zfs list to zfs get.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tom Caputi <tcaputi@datto.com>
Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Signed-off-by: InsanePrawn <insane.prawny@gmail.com>
Closes #9589
2019-11-18 16:44:28 -08:00
Brian Behlendorf 7ae3f8dc8f Partially revert 5a6ac4c
Reinstate the zpl_revalidate() functionality to resolve a regression
where dentries for open files during a rollback are not invalidated.

The unrelated functionality for automatically unmounting .zfs/snapshots
was not reverted.  Nor was the addition of shrink_dcache_sb() to the
zfs_resume_fs() function.

This issue was not immediately caught by the CI because the test case
intended to catch it was included in the list of ZTS tests which may
occasionally fail for unrelated reasons.  Remove all of the rollback
tests from this list to help identify the frequency of any spurious
failures.

The rollback_003_pos.ksh test case exposes a real issue with the
long standing code which needs to be investigated.  Regardless,
it has been enable with a small workaround in the test case itself.

Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed-by: Pavel Snajdr <snajpa@snajpa.net>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #9587
Closes #9592
2019-11-18 13:05:56 -08:00
Heitor Alves de Siqueira 41e1aa2a06 Break out of zfs_zget early if unlinked znode
If zp->z_unlinked is set, we're working with a znode that has been
marked for deletion. If that's the case, we can skip the "goto again"
loop and return ENOENT, as the znode should not be discovered.

Reviewed-by: Richard Yao <ryao@gentoo.org>
Reviewed-by: Matt Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Heitor Alves de Siqueira <halves@canonical.com>
Closes #9583
2019-11-15 09:56:05 -08:00
InsanePrawn cc1a1e17d9 Remove inappropiate error message suggesting to use '-r'
Removes an incorrect error message from libzfs that suggests applying
'-r' when a zfs subcommand is called with a filesystem path while
expecting either a snapshot or bookmark path.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Signed-off-by: InsanePrawn <insane.prawny@gmail.com>
Closes #9574
2019-11-15 09:52:11 -08:00
Kjeld Schouten 64c77c4bad Change zed.service to zfs-zed.service in man page
zed.service does not exist
replaced with correct service name in man.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl>
Closes #9581
2019-11-13 10:23:23 -08:00
loli10K 7ba964cc3f Prevent NULL pointer dereference in blkg_tryget() on EL8 kernels
blkg_tryget() as shipped in EL8 kernels does not seem to handle NULL
@blkg as input; this is different from its mainline counterpart where
NULL is accepted.  To prevent dereferencing a NULL pointer when dealing
with block devices which do not set a root_blkg on the request queue
perform the NULL check in vdev_bio_associate_blkg().

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #9546 
Closes #9577
2019-11-13 10:19:06 -08:00
Michael Niewöhner 8aaa10a9a0 Check for __GFP_RECLAIM instead of GFP_KERNEL
Check for __GFP_RECLAIM instead of GFP_KERNEL because zfs modifies
IO and FS flags which breaks the check for GFP_KERNEL.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed-by: Sebastian Gottschall <s.gottschall@dd-wrt.com>
Signed-off-by: Michael Niewöhner <foss@mniewoehner.de>
Closes #9034
2019-11-13 10:05:23 -08:00
Michael Niewöhner 6d948c3519 Add kmem_cache flag for forcing kvmalloc
This adds a new KMC_KVMEM flag was added to enforce use of the
kvmalloc allocator in kmem_cache_create even for large blocks, which
may also increase performance in some specific cases (e.g. zstd), too.

Default to KVMEM instead of VMEM in spl_kmem_cache_create.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Ahrens <matt@delphix.com>
Signed-off-by: Sebastian Gottschall <s.gottschall@dd-wrt.com>
Signed-off-by: Michael Niewöhner <foss@mniewoehner.de>
Closes #9034
2019-11-13 10:05:23 -08:00
Michael Niewöhner 66955885e2 Make use of kvmalloc if available and fix vmem_alloc implementation
This patch implements use of kvmalloc for GFP_KERNEL allocations, which
may increase performance if the allocator is able to allocate physical
memory, if kvmalloc is available as a public kernel interface (since
v4.12). Otherwise it will simply fall back to virtual memory (vmalloc).

Also fix vmem_alloc implementation which can lead to slow allocations
since the first attempt with kmalloc does not make use of the noretry
flag but tells the linux kernel to retry several times before it fails.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Ahrens <matt@delphix.com>
Signed-off-by: Sebastian Gottschall <s.gottschall@dd-wrt.com>
Signed-off-by: Michael Niewöhner <foss@mniewoehner.de>
Closes #9034
2019-11-13 10:05:10 -08:00
Michael Niewöhner c025008df5 Add missing documentation for some KMC flags
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Ahrens <matt@delphix.com>
Signed-off-by: Sebastian Gottschall <s.gottschall@dd-wrt.com>
Signed-off-by: Michael Niewöhner <foss@mniewoehner.de>
Closes #9034
2019-11-13 09:34:51 -08:00
Brian Behlendorf 94a570e39c Fix zpool create -o <property> error message
When `zpool create -o <property>` is run without root permissions
and the pool property requested is not specifically enumerated in
zpool_valid_proplist().  Then an incorrect error message referring
to an invalid property is printed rather than the expected permission
denied error.

Specifying a pool property at create time should be handled the same
way as filesystem properties in zfs_valid_proplist().  There should
not be default zfs_error_aux() set for properties which are not
listed.

Reviewed-by: loli10K <ezomori.nozomu@gmail.com>
Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #9550 
Closes #9568
2019-11-13 09:23:14 -08:00
Ross Williams c5ebfbbe19 Reorganize zpool(8) man page into sections
Moved subcommand topics into individual manpages. Reordered and 
grouped the list of subcommands by topic.

Moved concepts overview to `zpoolconcepts.8` and the long list of
available pool properties to `zpoolprops.8`.

Internal cross-references copied from `zpool.8` needed to be 
converted to `.Xr` external references to new subcommand manual 
pages.

Move `autotrim` into lexical order, autotrim tacked onto the end
of a list. Now it is in alphabetical order.

Clarify attach/detach description. Description was too specific to
command syntax. Overview clarifies reason for attaching or detaching
a device.

Clarify replace description, don't refer to subcommand arguments.

Clarify split command description, say what split actually does and
why you'd want to do it.

Clarify description of upgrade, and simplify the zpool.8 wording of
the zpool-upgrade(8) description.

Clarify description of import, detail what zpool-import(8) actually 
does.

Add appropriate SEE ALSO sections. Divided zpool subcommand manual 
pages need their own SEE ALSO sections. Also modified fsck.zfs.8 
to point directly to zfs-scrub.8 and zed.8.in to include a direct
reference to zfs-events.8

Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ross Williams <ross@ross-williams.net>
Closes #9564
2019-11-13 09:21:07 -08:00
Ross Williams 870fc32fc9 Reorganize zfs(8) man page into sections
Most subcommands got their own manpages (e.g. create). Some related
commands grouped into a single manpage and symlinks created (e.g. set, 
get, and inherit). I did this when topics were either too short to 
warrant their own file or so interrelated that a user would want to
refer between commands in the same file.

Corrected .Sx internal references to .Xr cross refs; lots of .Sx 
references from when text was all in zfs.8 needed to be changed to 
.Xr zfs-$SUBCOMMAND 8 cross references.

Divided subcommand list in zfs(8) into sections of related 
functionality. This required writing new descriptions for some 
commands.

Preserved ".Os Linux", `.Os` macro parsing behavior differs between 
mandoc from the "BSD" mandoc package (available on Ubuntu) and man 
from Ubuntu's man-db package, which calls groff to format the manpages.

Groff handles the `.Os` macro differently and wrongly, defaulting 
it to "BSD" in `/usr/share/groff/*/tmac/mdoc/doc-common`, instead of
getting the default from `uname`.

A future set of changes will introduce build-time preprocessing of
manpages for platform-specific documentation and can insert the
correct operating system name.

Added SEE ALSO sections, the newly-divided zfs-*.8 subcommand man
pages needed their own SEE ALSO sections pointing to related
subcommands and, in some cases, documentation from other packages
(e.g. zfs-share.8).

Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Reviewed-by: Sean Eric Fagan <sef@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ross Williams <ross@ross-williams.net>
Closes #9559
2019-11-12 11:17:40 -08:00
Matthew Macy 1f2f46be95 Add wrapper stub for zfs_cmd ioctl to libzpool
FreeBSD needs a wrapper for handling zfs_cmd ioctls.
In libzfs this is handled by zfs_ioctl. However, here
we need to wrap the call directly.

Reviewed-by: Allan Jude <allanjude@freebsd.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9511
2019-11-12 10:40:39 -08:00
Brian Behlendorf 066e825221 Linux compat: Minimum kernel version 3.10
Increase the minimum supported kernel version from 2.6.32 to 3.10.
This removes support for the following Linux enterprise distributions.

    Distribution     | Kernel | End of Life
    ---------------- | ------ | -------------
    Ubuntu 12.04 LTS | 3.2    | Apr 28, 2017
    SLES 11          | 3.0    | Mar 32, 2019
    RHEL / CentOS 6  | 2.6.32 | Nov 30, 2020

The following changes were made as part of removing support.

* Updated `configure` to enforce a minimum kernel version as
  specified in the META file (Linux-Minimum: 3.10).

    configure: error:
        *** Cannot build against kernel version 2.6.32.
        *** The minimum supported kernel version is 3.10.

* Removed all `configure` kABI checks and matching C code for
  interfaces which solely predate the Linux 3.10 kernel.

* Updated all `configure` kABI checks to fail when an interface is
  missing which was in the 3.10 kernel up to the latest 5.1 kernel.
  Removed the HAVE_* preprocessor defines for these checks and
  updated the code to unconditionally use the verified interface.

* Inverted the detection logic in several kABI checks to match
  the new interface as it appears in 3.10 and newer and not the
  legacy interface.

* Consolidated the following checks in to individual files. Due
  the large number of changes in the checks it made sense to handle
  this now.  It would be desirable to group other related checks in
  the same fashion, but this as left as future work.

  - config/kernel-blkdev.m4 - Block device kABI checks
  - config/kernel-blk-queue.m4 - Block queue kABI checks
  - config/kernel-bio.m4 - Bio interface kABI checks

* Removed the kABI checks for sops->nr_cached_objects() and
  sops->free_cached_objects().  These interfaces are currently unused.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #9566
2019-11-12 08:59:06 -08:00
Ryan Moeller 035ebb3653 Allow platform dependent path stripping for vdevs
On Linux the full path preceding devices is stripped when formatting
vdev names. On FreeBSD we only want to strip "/dev/". Hide the
implementation details of path stripping behind zfs_strip_path().

Make zfs_strip_partition_path() static in Linux implementation while
here, since it is never used outside of the file it is defined in.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
Closes #9565
2019-11-11 12:15:44 -08:00
Pavel Snajdr 5a6ac4cffc Remove zpl_revalidate
This patch removes the need for zpl_revalidate altogether.

There were 3 main reasons why we used d_revalidate:

1. periodic automounted snapshots umount deferral
2. negative dentries created before snapshot rollback
3. stale inodes referenced by dentry cache after snapshot rollback

Periodic snapshots deferral solution introduces zfs_exit_fs function,
which is called as a part of ZFS_EXIT(zfsvfs_t) macro.

Negative dentries and stale inodes are solved by flushing the dcache
for the particular dataset on zfs_resume_fs call.

This patch also removes now unused HAVE_S_D_OP configure test.

Reviewed-by: Aleksa Sarai <cyphar@cyphar.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Pavel Snajdr <snajpa@snajpa.net>
Closes #8774 
Closes #9549
2019-11-11 09:34:21 -08:00
Alexander Motin f15d6a5457 Improve logging of 128KB writes
Before my ZIL space optimization few years ago 128KB writes were logged
as two 64KB+ records in two 128KB log blocks.  After that change it
became ~127KB+/1KB+ in two 128KB log blocks to free space in the second
block for another record.  Unfortunately in case of 128KB only writes,
when space in the second block remained unused, that change increased
write latency by unbalancing checksum computation and write times
between parallel threads.  It also didn't help with SLOG space
efficiency in that case.

This change introduces new 68KB log block size, used for both writes
below 67KB and 128KB-sharp writes.  Writes of 68-127KB are still using
one 128KB block to not increase processing overhead.  Writes above
131KB are still using full 128KB blocks, since possible saving there
is small.  Mixed loads will likely also fall back to previous 128KB,
since code uses maximum of the last 16 requested block sizes.

Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:  Alexander Motin <mav@FreeBSD.org>
Closes #9409
2019-11-11 09:27:59 -08:00
Ryan Moeller 2f1ca8a32a Isolate code specific to Linux in cmd/
Use sys.platform to choose the correct implementation of functions and
values of variables for the platform being run on.

Reword some comments to avoid describing implementation details in the
wrong places.

Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
Closes #9561
2019-11-11 09:24:04 -08:00
Witaut Bajaryn 6c7023a532 Skip loading already loaded key
Don't ask for the password / try to load the key if the key for the 
encryptionroot is already loaded.  The user might have loaded the key 
manually or by other means before the scripts get called.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tom Caputi <tcaputi@datto.com>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Signed-off-by: Witaut Bajaryn <vitaut.bayaryn@gmail.com>
Closes #9495
Closes #9529
2019-11-08 14:34:07 -08:00
M. Zhou 734de7ced1 Add a notice in /etc/defaults/zfs for systemd users
Some systemd users may want to change configurations in 
/etc/defaults/zfs, but these settings won't affect systemd 
services.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mo Zhou <cdluminate@gmail.com>
Closes #9544
2019-11-06 11:36:33 -08:00
Romain Dolbeau 4254e40729 Preliminary support for RV64G
This adds basic support for RISC-V, specifically RV64G.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Romain Dolbeau <romain.dolbeau@european-processor-initiative.eu>
Closes #9540
2019-11-06 10:56:09 -08:00
Matthew Macy 27ece2ee4d Move platform specific parts of zfs_znode.h to platform code
Some of the znode fields are different and functions
consuming an inode don't exist on FreeBSD.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9536
2019-11-06 10:54:25 -08:00
Pavel Zakharov 1c47c2c42c zvol_wait should ignore redacted zvols
zvol_wait waits for zvol links to be created under /dev/zvol for each zvol.
Links are not created for redacted zvols so we should ignore those.

Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Closes #9545
2019-11-06 10:51:19 -08:00
Prakash Surya ae38e00968 Add tracepoints for taskq entry lifetime events
This adds some new DTRACE_PROBE* endpoints so that we can observe taskq
latencies on a system. Additionally, a new "taskqlatency.bt" script is
added to do this observation via "bpftrace". Lastly, a "zfs-trace.sh"
script is added to wrap "bpftrace" with the proper options required to
run and use "taskqlatency.bt".

For example, with these changes in place, a user can run the following:

    $ cd ./contrib/bpftrace
    $ sudo ./zfs-trace.sh taskqlatency.bt
    Attaching 6 probes...
    ^C

Here's some example output, showing latency information for time spent
executing the taskq entry's function:

    @exec_lat_us[dp_sync_taskq, userquota_updates_task]:
    [2, 4)                 5 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
    [4, 8)                 0 |                                                    |
    [8, 16)                1 |@@@@@@@@@@                                          |
    [16, 32)               2 |@@@@@@@@@@@@@@@@@@@@                                |

    @exec_lat_us[z_wr_int_h, zio_execute]:
    [8, 16)               16 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
    [16, 32)               2 |@@@@@@                                              |

    @exec_lat_us[z_wr_iss_h, zio_execute]:
    [16, 32)               4 |@@@@@@@@@@@@@@@@                                    |
    [32, 64)              13 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
    [64, 128)              1 |@@@@                                                |

    @exec_lat_us[z_ioctl_int, zio_execute]:
    [2, 4)                 1 |@@@@                                                |
    [4, 8)                11 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
    [8, 16)                8 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@               |

    @exec_lat_us[dp_sync_taskq, sync_dnodes_task]:
    [2, 4)                 1 |@@@@@@                                              |
    [4, 8)                 7 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@       |
    [8, 16)                8 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
    [16, 32)               2 |@@@@@@@@@@@@@                                       |
    [32, 64)               4 |@@@@@@@@@@@@@@@@@@@@@@@@@@                          |
    [64, 128)              1 |@@@@@@                                              |
    [128, 256)             0 |                                                    |
    [256, 512)             1 |@@@@@@

Here's some example output, showing latency information for time spent
waiting on the taskq, prior to starting execution of entry's function:

    @queue_lat_us[dp_sync_taskq]:
    [2, 4)                 1 |@@@@                                                |
    [4, 8)                 7 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@                      |
    [8, 16)                2 |@@@@@@@@                                            |
    [16, 32)               3 |@@@@@@@@@@@@@                                       |
    [32, 64)              12 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
    [64, 128)              6 |@@@@@@@@@@@@@@@@@@@@@@@@@@                          |
    [128, 256)             0 |                                                    |
    [256, 512)             1 |@@@@                                                |

    @queue_lat_us[z_wr_iss]:
    [4, 8)                 4 |@@@@                                                |
    [8, 16)               13 |@@@@@@@@@@@@@@@                                     |
    [16, 32)               6 |@@@@@@@                                             |
    [32, 64)               2 |@@                                                  |
    [64, 128)             12 |@@@@@@@@@@@@@@                                      |
    [128, 256)            15 |@@@@@@@@@@@@@@@@@@                                  |
    [256, 512)            33 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@             |
    [512, 1K)             27 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@                    |
    [1K, 2K)               7 |@@@@@@@@                                            |
    [2K, 4K)              14 |@@@@@@@@@@@@@@@@                                    |
    [4K, 8K)              14 |@@@@@@@@@@@@@@@@                                    |
    [8K, 16K)             23 |@@@@@@@@@@@@@@@@@@@@@@@@@@@                         |
    [16K, 32K)            43 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|

    @queue_lat_us[z_wr_int]:
    [2, 4)                10 |@@@@@                                               |
    [4, 8)                71 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@           |
    [8, 16)               88 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
    [16, 32)              50 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@                       |
    [32, 64)              65 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@              |
    [64, 128)             43 |@@@@@@@@@@@@@@@@@@@@@@@@@                           |
    [128, 256)            19 |@@@@@@@@@@@                                         |
    [256, 512)             3 |@                                                   |
    [512, 1K)              1 |                                                    |

Reviewed by: Brad Lewis <brad.lewis@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Prakash Surya <prakash.surya@delphix.com>
Closes #9525
2019-11-01 13:14:54 -07:00
Prakash Surya e5d1c27e30 Enable use of DTRACE_PROBE* macros in "spl" module
This change modifies some of the infrastructure for enabling the use of
the DTRACE_PROBE* macros, such that we can use tehm in the "spl" module.

Currently, when the DTRACE_PROBE* macros are used, they get expanded to
create new functions, and these dynamically generated functions become
part of the "zfs" module.

Since the "spl" module does not depend on the "zfs" module, the use of
DTRACE_PROBE* in the "spl" module would result in undefined symbols
being used in the "spl" module. Specifically, DTRACE_PROBE* would turn
into a function call, and the function being called would be a symbol
only contained in the "zfs" module; which results in a linker and/or
runtime error.

Thus, this change adds the necessary logic to the "spl" module, to
mirror the tracing functionality available to the "zfs" module. After
this change, we'll have a "trace_zfs.h" header file which defines the
probes available only to the "zfs" module, and a "trace_spl.h" header
file which defines the probes available only to the "spl" module.

Reviewed by: Brad Lewis <brad.lewis@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Prakash Surya <prakash.surya@delphix.com>
Closes #9525
2019-11-01 13:13:43 -07:00
Matthew Macy 4a2ed90013 Wrap Linux module macros
MODULE_VERSION is already defined on FreeBSD. Wrap all of the
used MODULE_* macros for the sake of consistency and portability.

Add a user space noop version to reduce the need for _KERNEL ifdefs.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9542
2019-11-01 10:41:03 -07:00
Matthew Macy bd4dde8ef7 Prefix struct rangelock
A struct rangelock already exists on FreeBSD.  Add a zfs_ prefix as
per our convention to prevent any conflict with existing symbols.
This change is a follow up to 2cc479d0.

Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9534
2019-11-01 10:37:33 -07:00
Matthew Macy bbc18de83a Remove ECKSUM alias in zinject
The custom ECKSUM errno is defined as appropriate by the
platform specific os/linux/spl/sys/errno.h header.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9537
2019-11-01 10:31:42 -07:00
Matthew Macy 32682b0c03 Fix icp build on FreeBSD
- ROTATE_LEFT is not used by amd64, move it down within
  the scope it's used to silence a clang warning.

- __unused is an alias for the compiler annotation
  __attribute__((__unused__)) on FreeBSD.  Rename the
  field to ____unused.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9538
2019-11-01 10:27:53 -07:00
Matthew Macy 156f74fc03 Return an error code from zfs_acl_chmod_setattr
The FreeBSD implementation can fail, allow this function to
fail and add the required error handling for Linux.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9541
2019-11-01 10:19:11 -07:00
Matthew Macy c4ae27c763 Move sha2.h to platform code
FreeBSD has its own sha routines that the port uses.

Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9530
2019-10-31 15:45:58 -07:00
Brian Behlendorf ab44be142a ZTS: Fix removal_with_errors
The removal_with_errors.ksh test case could occasionally complete
the removal process instead of canceling due to an injected error.

To prevent this false positive, export and import the pool between
test phases to flush the ARC cache.  Furthermore, double the amount
of data in the pool to increase the removal time.

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #9528
2019-10-31 12:57:42 -07:00
Matthew Macy a5308e252d Don't cast away const
Follow up to 511fce6b which missed a cast.

Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9533
2019-10-31 10:38:03 -07:00
Matthew Macy 59055a0164 Include prototypes for vdev_initialize
Address two prototype related warnings emitted by clang.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9535
2019-10-31 10:09:01 -07:00
Matthew Macy 2a3aa5a109 Factor Linux specific code out of spa_misc.c
Move these Linux module parameter get/set helpers in to
platform specific code.

Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9457
2019-10-31 09:52:22 -07:00
alaviss 936e2d6d3e dracut/zfs-load-key.sh: properly remove prefixes
Removes the 'ZFS=' prefix from $BOOTFS instead of $root. This makes sure
that the 'zfs:' prefix remains stripped so that users with
'root=zfs:dataset' cmdline can have key loaded on boot again.

Reviewed-by: Garrett Fields <ghfields@gmail.com>
Reviewed-by: Dacian Reece-Stremtan <dacianstremtan@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Hiếu Lê <leorize+oss@disroot.org>
Closes #9520
2019-10-30 14:38:41 -07:00
Brian Behlendorf f63e2fe132 Fix contrib/zcp/Makefile.am
Remove the stray leading + from the Makefile.  This was
preventing the autosnap.lua channel program from being
properly included by `make dist`.

Reviewed-by: Giuseppe Di Natale <guss80@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #9527
2019-10-30 12:37:49 -07:00
Romain Dolbeau 0b2a642351 Add AVX512BW variant of fletcher
It is much faster than AVX512F when byteswapping on Skylake-SP
and newer, as we can do the byteswap in a single vshufb instead
of many instructions.

Reviewed by: Gvozden Neskovic <neskovic@gmail.com>
Reviewed-by: Chunwei Chen <tuxoko@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Romain Dolbeau <romain.dolbeau@atos.net>
Closes #9517
2019-10-30 12:26:14 -07:00
Tom Caputi bae11ba8dc Fix 'zfs change-key' with unencrypted child
Currently, when you call 'zfs change-key' on an encrypted dataset
that has an unencrypted child, the code will trigger a VERIFY.
This VERIFY is leftover from before we allowed unencrypted
datasets to exist underneath encrypted ones. This patch fixes the
issue by simply replacing the VERIFY with an early return when
recursing through datasets.

Reviewed by: Jason King <jason.brian.king@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #9524
2019-10-30 11:27:28 -07:00
Matthew Macy d46f0deb03 Add wrapper for Linux BLKFLSBUF ioctl
FreeBSD has no analog. Buffered block devices were removed a decade
plus ago.

Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9508
2019-10-28 09:53:39 -07:00
Matthew Macy 4a22ba5be0 Minor spa portability fixes
- FreeBSD's rootpool import code uses spa_config_parse
- Move the zvol_create_minors call out from under the
  spa_namespace_lock in spa_import. It isn't needed and it causes
  a lock order reversal on FreeBSD.

Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9499
2019-10-28 09:51:53 -07:00
Chunwei Chen 7125a109dc Fix zpool history unbounded memory usage
In original implementation, zpool history will read the whole history
before printing anything, causing memory usage goes unbounded. We fix
this by breaking it into read-print iterations.

Reviewed-by: Tom Caputi <tcaputi@datto.com>
Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chunwei Chen <david.chen@nutanix.com>
Closes #9516
2019-10-28 09:49:44 -07:00
loli10K e35704647e Fix for ARC sysctls ignored at runtime
This change leverage module_param_call() to run arc_tuning_update()
immediately after the ARC tunable has been updated as suggested in
cffa8372 code review.

A simple test case is added to the ZFS Test Suite to prevent future
regressions in functionality.

Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #9487  
Closes #9489
2019-10-26 15:22:19 -07:00
Matthew Macy 6963414d70 Remove unneeded header from libzpool/kernel.c
The sys/signal.h header doesn't exist on FreeBSD, nor is
it needed on Linux.

Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9510
2019-10-26 10:07:59 -07:00
Matthew Macy 9e2ca0b3c6 Fix header guard typo
Fix header guard typo accidentally introduced by #9497.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9514
2019-10-26 10:04:10 -07:00
Ryan Moeller fdf5593232 ZTS: Move more tests to linux.run
Tests that rely on special filesystems that are specific to Linux
should only be run on Linux.

Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Giuseppe Di Natale <guss80@gmail.com>
Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
Closes #9512
2019-10-25 20:03:46 -07:00
Matthew Macy 5f3eceb98f Remove unused headers from uu_string.c
The malloc.h include is gratuitous and runs in to the following error
on FreeBSD:

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9509
2019-10-25 13:53:50 -07:00
Matthew Macy 24cf9f4eb2 Use zfs_ioctl() in zinject.c
Consistently use the `zfs_ioctl()` wrapper since `ioctl()` cannot be
called directly due to differing semantics between platforms.

Follow up PR to #9492.

Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9507
2019-10-25 13:50:34 -07:00
Matthew Macy 0ee89a1252 Remove non-portable pointer is valid assert
This assert makes non portable assumptions about the state of memory
returned by the memory allocator.

Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9506
2019-10-25 13:46:07 -07:00
Matthew Macy c392c5aec0 Move final zvol_remove_minors to common code
This logic is not platform dependent and should reside in the
common code.

Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9505
2019-10-25 13:42:54 -07:00
Matthew Macy 1952fe0e25 Move platform dependent errno aliases
EBADE, EBADR, and ENOANO do not exist on FreeBSD

The libspl errno.h is similarly platform dependent.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9498
2019-10-25 13:40:50 -07:00
Matthew Macy 68a1b1589a Remove sdt.h
It's mostly a noop on ZoL and it conflicts with platforms that 
support dtrace.  Remove this header to resolve the conflict.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9497
2019-10-25 13:38:37 -07:00
Tom Caputi b4238327b4 Fix incremental recursive encrypted receive
Currently, incremental recursive encrypted receives fail to work
for any snapshot after the first. The reason for this is because
the check in zfs_setup_cmdline_props() did not properly realize
that when the user attempts to use '-x encryption' in this
situation, they are not really overriding the existing encryption
property and instead are attempting to prevent it from changing.
This resulted in an error message stating: "encryption property
'encryption' cannot be set or excluded for raw or incremental
streams".

This problem is fixed by updating the logic to expect this use
case.

Reviewed-by: loli10K <ezomori.nozomu@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #9494
2019-10-24 10:51:01 -07:00
Ryan Moeller 28f7427ab1 ZTS: Move tmpfile tests to linux.run
O_TMPFILE is not available on FreeBSD.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #9503
2019-10-24 10:49:19 -07:00
Ryan Moeller 4c6225b688 ZTS: Consistency pass for .ksh extensions
* Use .ksh extension for ksh scripts, not .sh
* Remove .ksh extension from tests in common.run

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #9502
2019-10-24 10:47:47 -07:00
Brian Behlendorf 10fa254539 Linux 4.14, 4.19, 5.0+ compat: SIMD save/restore
Contrary to initial testing we cannot rely on these kernels to
invalidate the per-cpu FPU state and restore the FPU registers.
Nor can we guarantee that the kernel won't modify the FPU state
which we saved in the task struck.

Therefore, the kfpu_begin() and kfpu_end() functions have been
updated to save and restore the FPU state using our own dedicated
per-cpu FPU state variables.

This has the additional advantage of allowing us to use the FPU
again in user threads.  So we remove the code which was added to
use task queues to ensure some functions ran in kernel threads.

Reviewed-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #9346
Closes #9403
2019-10-24 10:17:33 -07:00
Matthew Macy b834b58ae6 Use zfs_ioctl with zfs_cmd_t in libzfs
Consistently use the `zfs_ioctl()` wrapper since `ioctl()` cannot be
called directly due to differing semantics between platforms.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9492
2019-10-23 17:29:43 -07:00
chrisrd 05d07ba9a7 Don't call arc_buf_destroy on unallocated arc_buf
Fixes an obvious issue of calling arc_buf_destroy() on an
unallocated arc_buf.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Signed-off-by: Chris Dunlop <chris@onthe.net.au>
Closes #9453
2019-10-23 15:26:17 -07:00
Matthew Macy 64b2e7d7ec Use platform independent error code for libzfs_run_process_impl
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9493
2019-10-23 13:48:31 -07:00
Matthew Macy 685abd595c Use correct format string when printing int8
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9486
2019-10-20 20:37:30 -07:00
Matthew Macy 8b2d097c17 Remove gratuitous Linux only include in ztest & zdb
We don't need to include stdio_ext.h

Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9483
2019-10-19 17:08:19 -07:00
John Wren Kennedy 4063440b73 ZTS: Written props test fails with 4k disks
With 4k disks, this test will fail in the last section because the
expected human readable value of 20.0M is reported as 20.1M. Rather than
use the human readable property, switch to the parsable property and
verify that the values are reasonably close.

Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: John Kennedy <john.kennedy@delphix.com>
Closes #9477
2019-10-18 13:27:02 -04:00
Serapheim Dimitropoulos 9f3c72a2a8 Name anonymous enum of KMC_BIT constants
Giving a name to this enum makes it discoverable from
debugging tools like DRGN and SDB. For example, with
the name proposed on this patch we can iterate over
these values in DRGN:
```
>>> prog.type('enum kmc_bit').enumerators
(('KMC_BIT_NOTOUCH', 0), ('KMC_BIT_NODEBUG', 1),
('KMC_BIT_NOMAGAZINE', 2), ('KMC_BIT_NOHASH', 3),
('KMC_BIT_QCACHE', 4), ('KMC_BIT_KMEM', 5),
('KMC_BIT_VMEM', 6), ('KMC_BIT_SLAB', 7),
...
```
This enables SDB to easily pretty-print the flags of
the spl_kmem_caches in the system like this:
```
> spl_kmem_caches -o "name,flags,total_memory"
name                                       flags total_memory
------------------------ ----------------------- ------------
abd_t                    KMC_NOMAGAZINE|KMC_SLAB        4.5MB
arc_buf_hdr_t_full       KMC_NOMAGAZINE|KMC_SLAB       12.3MB
... <cropped> ...
ddt_cache                               KMC_VMEM      583.7KB
ddt_entry_cache          KMC_NOMAGAZINE|KMC_SLAB         0.0B
... <cropped> ...
zio_buf_1048576             KMC_NODEBUG|KMC_VMEM         0.0B
... <cropped> ...
```

Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Closes #9478
2019-10-18 13:25:44 -04:00
Serapheim Dimitropoulos 851eda3566 Update skc_obj_alloc for spl kmem caches that are backed by Linux
Currently, for certain sizes and classes of allocations we use
SPL caches that are backed by caches in the Linux Slab allocator
to reduce fragmentation and increase utilization of memory. The
way things are implemented for these caches as of now though is
that we don't keep any statistics of the allocations that we
make from these caches.

This patch enables the tracking of allocated objects in those
SPL caches by making the trade-off of grabbing the cache lock
at every object allocation and free to update the respective
counter.

Additionally, this patch makes those caches visible in the
/proc/spl/kmem/slab special file.

As a side note, enabling the specific counter for those caches
enables SDB to create a more user-friendly interface than
/proc/spl/kmem/slab that can also cross-reference data from
slabinfo. Here is for example the output of one of those
caches in SDB that outputs the name of the underlying Linux
cache, the memory of SPL objects allocated in that cache,
and the percentage of those objects compared to all the
objects in it:
```
> spl_kmem_caches | filter obj.skc_name == "zio_buf_512" | pp
name        ...            source total_memory util
----------- ... ----------------- ------------ ----
zio_buf_512 ... kmalloc-512[SLUB]       16.9MB    8
```

Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Closes #9474
2019-10-18 13:24:28 -04:00
Matthew Macy c9c9c1e213 OpenZFS restructuring - ARC memory pressure
Factor Linux specific memory pressure handling out of ARC.  Each
platform will have different available interfaces for managing memory
pressure.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9472
2019-10-18 13:23:19 -04:00
Ryan Moeller 4313a5b4c5 Detect if sed supports --in-place
Not all versions of sed have the --in-place flag. Detect support for
the flag during ./configure and provide a fallback mechanism for those
systems where sed's behavior differs. The autoconf variable
${ac_inplace} can be used to choose the correct flags for editing a
file in place with sed.

Replace violating usages in Makefile.am with ${ac_inplace}.

Reviewed-by: Chris Dunlop <chris@onthe.net.au>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
Closes #9463
2019-10-16 19:19:48 -07:00
Matthew Macy 08f530c699 Make zfsdev_getminor signature cross platform
Only pass the file descriptor to make zfsdev_get_miror() portable.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9466
2019-10-16 18:43:52 -07:00
Matthew Macy 0e939e434a Move linux specific mmp module_param_call handler to platform code
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9465
2019-10-16 18:37:31 -07:00
Matthew Macy cf2eba666e Add default case to lua kernel code
Some platforms, e.g. FreeBSD, support user space setjmp
semantics in kernel.

Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9450
2019-10-16 18:36:15 -07:00
Matthew Macy 47d57dbccf Fix signature for private functions without header declarations
Clang will complain if a function has no prior declaration

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9467
2019-10-16 18:34:19 -07:00
Matthew Macy 177c79dfbe Remove dead code and cleanup scoping in dmu_send.c
This addresses a number of problems with dmu_send.c:

* bp_span is unused which makes clang complain
* dump_write conflicts with FreeBSD's existing core dump code
* range_alloc is private to the file and not declared in any headers
  causing clang to complain

Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9432
2019-10-13 19:25:19 -07:00
Paul Dagnelie 511fce6b1f Don't call sizeof on void
We get the sizeof the appropriate type, and don't cast away const.

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #9455
2019-10-13 19:21:51 -07:00
Matthew Macy cdbba101f4 Move zfs_onexit_fd_hold to platform code
FreeBSD has a very different implementation.

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9442
2019-10-13 19:19:39 -07:00
Matthew Macy c324701332 Move zio_delay_interrupt to platform code
FreeBSD has its own implementation as do other platforms.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9439
2019-10-13 19:15:27 -07:00
Brian Behlendorf d292307212 Modify sharenfs=on default behavior
While it may sometimes be convenient to export an NFS filesystem with
no_root_squash it should not be the default behavior.  Align the
default behavior with the Linux NFS server defaults.  To restore
the previous behavior use 'zfs set sharenfs="no_root_squash,..."'.

Reviewed-by: loli10K <ezomori.nozomu@gmail.com>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #9397
Closes #9425
2019-10-13 19:13:26 -07:00
chrisrd 2c6fa6eafb Typo fix in comment: dso_dryrun
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chris Dunlop <chris@onthe.net.au>
Closes #9452
2019-10-11 10:33:13 -07:00
Brian Behlendorf ea25ed2365 ZTS: Fix zpool_status_-s
After commit 5e74ac51 which split and reordered the run files the
`zpool_status_-s` test began failing.  The new ordering placed
the test after a previous test which used `zpool replace` to replace
a disk but did not clear its label.  This resulted in the next test,
`zpool_status_-s`, failing because of the potentially active
pool being detected on the replaced vdev.

    /dev/loop0 is part of potentially active pool 'testpool'

Use the default_mirror_setup_noexit() and default_cleanup_noexit()
functions to create the pool in `zpool_status_-s`.  They use the -f
flag by default.

In the `scrub_after_resilver` test wipe the label during cleanup
to prevent future failures if the tests are again reordered.

Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #9451
2019-10-11 10:22:20 -07:00
Paul Dagnelie 516a83f886 Function name and comment updates
Rename certain functions for more consistency when they share common 
features. Make comments clearer about what arguments should be passed 
to the insert and add functions.

Reviewed by: Sara Hartse <sara.hartse@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Ahrens <matt@delphix.com>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #9441
2019-10-11 10:13:21 -07:00
Matthew Macy bce795ad7a Remove linux/mod_compat.h from common code
It is no longer necessary; mod_compat.h is included from zfs_context.h.

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9449
2019-10-11 10:10:20 -07:00
Matthew Macy af1698f59b Expose dmu_buf_hold_array_by_dnode to platform code
FreeBSD uses this in its pager ops routines

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9431
2019-10-11 10:06:18 -07:00
Richard Yao 803884217f Implement ZPOOL_IMPORT_UDEV_TIMEOUT_MS
Since 0.7.0, zpool import would unconditionally block on udev for 30
seconds. This introduced a regression in initramfs environments that
lack udev (particularly mdev based environments), yet use a zfs userland
tools intended for the system that had been built against udev. Gentoo's
genkernel is the main example, although custom user initramfs
environments would be similarly impacted unless special builds of the
ZFS userland utilities were done for them.  Such environments already
have their own mechanisms for blocking until device nodes are ready
(such as genkernel's scandelay parameter), so it is unnecessary for
zpool import to block on a non-existent udev until a timeout is reached
inside of them.

Rather than trying to intelligently determine whether udev is available
on the system to avoid unnecessarily blocking in such environments, it
seems best to just allow the environment to override the timeout.  I
propose that we add an environment variable called
ZPOOL_IMPORT_UDEV_TIMEOUT_MS. Setting it to 0 would restore the 0.6.x
behavior that was more desirable in mdev based initramfs environments.
This allows the system user land utilities to be reused when building
mdev-based initramfs archives.

Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Georgy Yakovlev <gyakovlev@gentoo.org>
Signed-off-by: Richard Yao <ryao@gentoo.org>
Closes #9436
2019-10-11 09:52:48 -07:00
Ryan Moeller d96635a5c7 Clarify loop variable name in zfs copies test
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #9445
2019-10-11 09:50:46 -07:00
Ryan Moeller 767f65cf73 Fix some style nits in tests
Mostly whitespace changes, no functional changes intended.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #9447
2019-10-11 09:49:48 -07:00
loli10K 715c996d3b Fix pool creation with feature@allocation_classes disabled
When "feature@allocation_classes" is not enabled on the pool no vdev
with "special" or "dedup" allocation type should be allowed to exist in
the vdev tree.

Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #9427 
Closes #9429
2019-10-10 16:39:41 -07:00
Matthew Macy 2516a87821 Move get_temporary_prop to platform code
Temporary property handling at the VFS layer requires
platform specific code.

Reviewed-by: Sean Eric Fagan <sef@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9401
2019-10-10 15:59:34 -07:00
Matthew Macy 6501906280 Add kmem cache accessors
Make the metaslab platform agnostic again by adding
accessor functions which can be implemented by each
platform.

Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9404
2019-10-10 15:45:52 -07:00
Matthew Macy eedb3a62b9 Make zil_async_to_sync visible to platform code
FreeBSD's zvol platform code requires access to the
zil_async_to_sync() function.

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9440
2019-10-10 15:39:44 -07:00
Brian Behlendorf f3dc4a85e9 Update zfs program command usage
Update the zfs(8) man page to clearly describe that arguments for
channel programs are to be listed after the -- sentinel which
terminates argument processing.  This behavior is supported by
getopt on Linux, FreeBSD, and Illumos according to each platforms
respective man pages.

    zfs program [-jn] [-t instruction-limit] [-m memory-limit]
        pool script [--] arg1 ...

Reviewed-by: Clint Armstrong <clint@clintarmstrong.net>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: loli10K <ezomori.nozomu@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #9056 
Closes #9428
2019-10-10 09:49:45 -07:00
Matthew Macy e4f5fa1229 Fix strdup conflict on other platforms
In the FreeBSD kernel the strdup signature is:

```
char	*strdup(const char *__restrict, struct malloc_type *);
```

It's unfortunate that the developers have chosen to change
the signature of libc functions - but it's what I have to
deal with.

Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9433
2019-10-10 09:47:06 -07:00
Matthew Macy c5858ff946 Make clang happy with vdev_raidz_ code
The macros are used to generate code for conditions without a
corresponding branch. This is not a problem in practice, but
clang has no way of knowing that. Add a default branch with a
VERIFY(0) to indicate that it "can't happen"

```
In file included from \
/usr/home/mmacy/devel/ZoF/module/zfs/vdev_raidz_math_sse2.c:607:
/usr/home/mmacy/devel/ZoF/module/zfs/vdev_raidz_math_impl.h:281:3: \
error: no case matching constant switch condition '3' [-Werror]
```

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9434
2019-10-10 09:45:37 -07:00
Ryan Moeller 381d91de7d Sort list of generated files in configure.ac
No functional change, just a quality of life improvement.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
Closes #9426
2019-10-09 10:54:22 -07:00
Ryan Moeller 5e74ac51c7 Move platform independent tests to a shared runfile
Tests that aren't limited to running on Linux can be moved to a common
runfile to be shared with other platforms.

The test runner and wrapper script are enhanced to allow specifying
multiple runfiles as a comma-separated list. The default runfiles are
now "common.run,PLATFORM.run" where PLATFORM is determined at run time.

Sections in runfiles that share a path with another runfile can append
a colon separator and an identifier to the path in the section
name, ie `[tests/functional/atime:Linux]`, to avoid overriding the tests
specified by other runfiles.

Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
Closes #9391
2019-10-09 10:39:26 -07:00
Paul Dagnelie ca5777793e Reduce loaded range tree memory usage
This patch implements a new tree structure for ZFS, and uses it to 
store range trees more efficiently.

The new structure is approximately a B-tree, though there are some 
small differences from the usual characterizations. The tree has core 
nodes and leaf nodes; each contain data elements, which the elements 
in the core nodes acting as separators between its children. The 
difference between core and leaf nodes is that the core nodes have an 
array of children, while leaf nodes don't. Every node in the tree may 
be only partially full; in most cases, they are all at least 50% full 
(in terms of element count) except for the root node, which can be 
less full. Underfull nodes will steal from their neighbors or merge to 
remain full enough, while overfull nodes will split in two. The data 
elements are contained in tree-controlled buffers; they are copied 
into these on insertion, and overwritten on deletion. This means that 
the elements are not independently allocated, which reduces overhead, 
but also means they can't be shared between trees (and also that 
pointers to them are only valid until a side-effectful tree operation 
occurs). The overhead varies based on how dense the tree is, but is 
usually on the order of about 50% of the element size; the per-node 
overheads are very small, and so don't make a significant difference. 
The trees can accept arbitrary records; they accept a size and a 
comparator to allow them to be used for a variety of purposes.

The new trees replace the AVL trees used in the range trees today. 
Currently, the range_seg_t structure contains three 8 byte integers 
of payload and two 24 byte avl_tree_node_ts to handle its storage in 
both an offset-sorted tree and a size-sorted tree (total size: 64 
bytes). In the new model, the range seg structures are usually two 4 
byte integers, but a separate one needs to exist for the size-sorted 
and offset-sorted tree. Between the raw size, the 50% overhead, and 
the double storage, the new btrees are expected to use 8*1.5*2 = 24 
bytes per record, or 33.3% as much memory as the AVL trees (this is 
for the purposes of storing metaslab range trees; for other purposes, 
like scrubs, they use ~50% as much memory).

We reduced the size of the payload in the range segments by teaching 
range trees about starting offsets and shifts; since metaslabs have a 
fixed starting offset, and they all operate in terms of disk sectors, 
we can store the ranges using 4-byte integers as long as the size of 
the metaslab divided by the sector size is less than 2^32. For 512-byte
sectors, this is a 2^41 (or 2TB) metaslab, which with the default
settings corresponds to a 256PB disk. 4k sector disks can handle 
metaslabs up to 2^46 bytes, or 2^63 byte disks. Since we do not 
anticipate disks of this size in the near future, there should be 
almost no cases where metaslabs need 64-byte integers to store their 
ranges. We do still have the capability to store 64-byte integer ranges 
to account for cases where we are storing per-vdev (or per-dnode) trees, 
which could reasonably go above the limits discussed. We also do not 
store fill information in the compact version of the node, since it 
is only used for sorted scrub.

We also optimized the metaslab loading process in various other ways
to offset some inefficiencies in the btree model. While individual
operations (find, insert, remove_from) are faster for the btree than 
they are for the avl tree, remove usually requires a find operation, 
while in the AVL tree model the element itself suffices. Some clever 
changes actually caused an overall speedup in metaslab loading; we use 
approximately 40% less cpu to load metaslabs in our tests on Illumos.

Another memory and performance optimization was achieved by changing 
what is stored in the size-sorted trees. When a disk is heavily 
fragmented, the df algorithm used by default in ZFS will almost always 
find a number of small regions in its initial cursor-based search; it 
will usually only fall back to the size-sorted tree to find larger 
regions. If we increase the size of the cursor-based search slightly, 
and don't store segments that are smaller than a tunable size floor 
in the size-sorted tree, we can further cut memory usage down to 
below 20% of what the AVL trees store. This also results in further 
reductions in CPU time spent loading metaslabs.

The 16KiB size floor was chosen because it results in substantial memory 
usage reduction while not usually resulting in situations where we can't 
find an appropriate chunk with the cursor and are forced to use an 
oversized chunk from the size-sorted tree. In addition, even if we do 
have to use an oversized chunk from the size-sorted tree, the chunk 
would be too small to use for ZIL allocations, so it isn't as big of a 
loss as it might otherwise be. And often, more small allocations will 
follow the initial one, and the cursor search will now find the 
remainder of the chunk we didn't use all of and use it for subsequent 
allocations. Practical testing has shown little or no change in 
fragmentation as a result of this change.

If the size-sorted tree becomes empty while the offset sorted one still 
has entries, it will load all the entries from the offset sorted tree 
and disregard the size floor until it is unloaded again. This operation 
occurs rarely with the default setting, only on incredibly thoroughly 
fragmented pools.

There are some other small changes to zdb to teach it to handle btrees, 
but nothing major.
                                           
Reviewed-by: George Wilson <gwilson@delphix.com>
Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed by: Sebastien Roy seb@delphix.com
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #9181
2019-10-09 10:36:03 -07:00
Igor K d0a84ba92b ZTS: Fix mmp_hostid test
Correctly use the `mntpnt_fs` variable, and include additional
logic to ensure the /etc/hostid is correct set up and cleaned up.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Igor Kozhukhov <igor@dilos.org>
Closes #9349
2019-10-08 13:40:17 -07:00
George Melikov 7b50929851 module/Makefile.in: don't run xargs if empty
If stdin if empty - don't run xargs command,
otherwise we can get `cp: missing file operand`
error.

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #9418
2019-10-08 10:10:23 -07:00
Brian Behlendorf 94bcf6f5e3 ZTS: Fix trim/trim_config and trim/autotrim_config
There have been occasional CI failures which occur when the trimmed
vdev size exactly matches the target size.  Resolve this by slightly
relaxing the conditional and checking for -ge rather than -gt.  In
all of the cases observer, the values match exactly.  For example:

    Failure /mnt/trim-vdev1 is 768 MB which is not -gt than 768 MB

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #9399
2019-10-04 12:38:07 -07:00
Brian Behlendorf 6bd4f4545d Fix automount for root filesystems
Commit 093bb64 resolved an automount failures for chroot'd processes
but inadvertently broke automounting for root filesystems where the
vfs_mntpoint is NULL.  Resolve the issue by checking for NULL in order
to generate the correct path.

Reviewed-by: Tom Caputi <tcaputi@datto.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #9381
Closes #9384
2019-10-04 12:30:51 -07:00
Matthew Macy 2cc479d049 Rename rangelock_ functions to zfs_rangelock_
A rangelock KPI already exists on FreeBSD.  Add a zfs_ prefix as
per our convention to prevent any conflict with existing symbols.

Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9402
2019-10-03 15:54:29 -07:00
Tony Nguyen 64b6c47d90 dbuf_hold_impl() cleanup to improve cached read performance
Currently every dbuf_hold_impl() incurs kmem_alloc() and kmem_free()
which can be costly for cached read performance.

This change reverts the dbuf_hold_impl() fix stack commit, i.e.
fc5bb51f08 to eliminate the extra
kmem_alloc() and kmem_free() operations and improve cached read
performance. With the change, each dbuf_hold_impl() frame uses 40 bytes
more, total of 800 for 20 recursive levels. Linux kernel stack sizes are
8K and 16K for 32bit and 64bit, respectively, so stack overrun risk is
limited.

Sample stack output comparisons with 50 PB file and recordsize=512
Current code
 11)     2240      64   arc_alloc_buf+0x4a/0xd0 [zfs]
 12)     2176     264   dbuf_read_impl.constprop.16+0x2e3/0x7f0 [zfs]
 13)     1912     120   dbuf_read+0xe5/0x520 [zfs]
 14)     1792      56   dbuf_hold_impl_arg+0x572/0x630 [zfs]
 15)     1736      64   dbuf_hold_impl_arg+0x508/0x630 [zfs]
 16)     1672      64   dbuf_hold_impl_arg+0x508/0x630 [zfs]
 17)     1608      40   dbuf_hold_impl+0x23/0x40 [zfs]
 18)     1568      40   dbuf_hold_level+0x32/0x60 [zfs]
 19)     1528      16   dbuf_hold+0x16/0x20 [zfs]

dbuf_hold_impl() cleanup
 11)     2320      64   arc_alloc_buf+0x4a/0xd0 [zfs]
 12)     2256     264   dbuf_read_impl.constprop.17+0x2e3/0x7f0 [zfs]
 13)     1992     120   dbuf_read+0xe5/0x520 [zfs]
 14)     1872      96   dbuf_hold_impl+0x50f/0x5e0 [zfs]
 15)     1776     104   dbuf_hold_impl+0x4df/0x5e0 [zfs]
 16)     1672     104   dbuf_hold_impl+0x4df/0x5e0 [zfs]
 17)     1568      40   dbuf_hold_level+0x32/0x60 [zfs]
 18)     1528      16   dbuf_hold+0x16/0x20 [zfs]

Performance observations on 8K recordsize filesystem:
- 8/128/1024K at 1-128 sequential cached read, ~3% improvement

Testing done on Ubuntu 18.04 with 4.15 kernel, 8vCPUs and SSD storage on
VMware ESX.

Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Nguyen <tony.nguyen@delphix.com>
Closes #9351
2019-10-03 15:33:38 -07:00
Matthew Macy 73cdcc6323 OpenZFS restructuring - libzfs
Factor Linux specific functionality out of libzfs.

Reviewed-by: Allan Jude <allanjude@freebsd.org>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Macy <mmacy@FreeBSD.org>
Closes #9377
2019-10-03 10:33:16 -07:00
Matthew Macy 7c5eff9400 OpenZFS restructuring - libzutil
Factor Linux specific functionality out of libzutil.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
Closes #9356
2019-10-03 10:20:44 -07:00
Brian Behlendorf e1c216fb0c ZTS: Fix upgrade_readonly_pool
Update cleanup_upgrade to use destroy_dataset and destroy_pool
when performing cleanup.  These wrappers retry if the pool is busy
preventing occasional failures like those observed when running
tests upgrade_readonly_pool.  For example:

    SUCCESS: test enabled == enabled
    User accounting upgrade is not executed on readonly pool
    NOTE: Performing local cleanup via log_onexit (cleanup_upgrade)
    cannot destroy 'testpool': pool is busy
    ERROR: zpool destroy testpool exited 1

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #9400
2019-10-03 09:39:13 -07:00
Didier Roche 8ae8b2a144 Workaround to avoid a race when /var/lib is a persistent dataset
If /var/lib is a dataset not under <pool>/ROOT/<root_dataset>, as
proposed in the ubuntu root on zfs upstream guide
(https://github.com/zfsonlinux/zfs/wiki/Ubuntu-18.04-Root-on-ZFS),
we end up with a race where some services, like systemd-random-seed
are writing under /var/lib, while zfs-mount is called. zfs mount will
then potentially fail because of /var/lib isn't empty and so, can't be
mounted.
Order those 2 units for now (more may be needed) as we can't declare
virtually a provide mount point to match
"RequiresMountsFor=/var/lib/systemd/random-seed" from
systemd-random-seed.service.
The optional generator for zfs 0.8 fixes it, but it's not enabled
by default nor necessarily required.

Example:
- rpool/ROOT/ubuntu (mountpoint = /)
- rpool/var/ (mountpoint = /var)
- rpool/var/lib  (mountpoint = /var/lib)

Both zfs-mount.service and systemd-random-seed.service are starting
After=systemd-remount-fs.service. zfs-mount.service should be done
before local-fs.target while systemd-random-seed.service should finish
before sysinit.target (which is a later target).
Ideally, we would have a way for zfs mount -a unit to declare all paths
or move systemd-random-seed after local-fs.target.

Reviewed-by: Antonio Russo <antonio.e.russo@gmail.com>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Didier Roche <didrocks@ubuntu.com>
Closes #9360
2019-10-02 10:51:55 -07:00
Matthew Macy d31277abb1 OpenZFS restructuring - libspl
Factor Linux specific pieces out of libspl.

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Sean Eric Fagan <sef@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9336
2019-10-02 10:39:48 -07:00
Matthew Macy 6360e2779e Add inode accessors to common code
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9389
2019-10-02 09:15:12 -07:00
Matthew Macy 13a4027a7c OpenZFS restructuring - arc_stats
Make arc_stats visible to platform code.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9386
2019-10-01 16:35:05 -07:00
Matthew Macy 7111c86ca3 Enable clang to use intrinsics for lz4
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9385
2019-10-01 13:17:32 -07:00
dacianstremtan bd76e6817c Fix for zfs-dracut regression
Line 31 and 32 overwrote the ${root} variable which broke mount-zfs.sh
We have create a new variable for the dataset instead of overwriting the
${root} variable in zfs-load-key.sh${root} variable in zfs-load-key.sh

Reviewed-by: Kash Pande <kash@tripleback.net>
Reviewed-by: Garrett Fields <ghfields@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Dacian Reece-Stremtan <dacianstremtan@gmail.com>
Closes #8913 
Closes #9379
2019-10-01 12:54:27 -07:00
Brian Behlendorf 608f8749a1 Perform KABI checks in parallel
Reduce the time required for ./configure to perform the needed
KABI checks by allowing kbuild to compile multiple test cases in
parallel.  This was accomplished by splitting each test's source
code from the logic handling whether that code could be compiled
or not.

By introducing this split it's possible to minimize the number of
times kbuild needs to be invoked.  As importantly, it means all of
the tests can be built in parallel.  This does require a little extra
care since we expect some tests to fail, so the --keep-going (-k)
option must be provided otherwise some tests may not get compiled.
Furthermore, since a failure during the kbuild modpost phase will
result in an early exit; the final linking phase is limited to tests
which passed the initial compilation and produced an object file.

Once everything has been built the configure script proceeds as
previously.  The only significant difference is that it now merely
needs to test for the existence of a .ko file to determine the
result of a given test.  This vastly speeds up the entire process.

New test cases should use ZFS_LINUX_TEST_SRC to declare their test
source code and ZFS_LINUX_TEST_RESULT to check the result.  All of
the existing kernel-*.m4 files have been updated accordingly, see
config/kernel-current-time.m4 for a basic example.  The legacy
ZFS_LINUX_TRY_COMPILE macro has been kept to handle special cases
but it's use is not encouraged.

                  master (secs)   patched (secs)
                  -------------   ----------------
autogen.sh        61              68
configure         137             24  (~17% of current run time)
make -j $(nproc)  44              44
make rpms         287             150

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #8547 
Closes #9132
Closes #9341
2019-10-01 12:50:34 -07:00
Prakash Surya 99573cc053 Timeout waiting for ZVOL device to be created
We've seen cases where after creating a ZVOL, the ZVOL device node in
"/dev" isn't generated after 20 seconds of waiting, which is the point
at which our applications gives up on waiting and reports an error.

The workload when this occurs is to "refresh" 400+ ZVOLs roughly at the
same time, based on a policy set by the user. This refresh operation
will destroy the ZVOL, and re-create it based on a snapshot.

When this occurs, we see many hundreds of entries on the "z_zvol" taskq
(based on inspection of the /proc/spl/taskq-all file). Many of the
entries on the taskq end up in the "zvol_remove_minors_impl" function,
and I've measured the latency of that function:

Function = zvol_remove_minors_impl
msecs               : count     distribution
  0 -> 1          : 0        |                                        |
  2 -> 3          : 0        |                                        |
  4 -> 7          : 1        |                                        |
  8 -> 15         : 0        |                                        |
 16 -> 31         : 0        |                                        |
 32 -> 63         : 0        |                                        |
 64 -> 127        : 1        |                                        |
128 -> 255        : 45       |****************************************|
256 -> 511        : 5        |****                                    |

That data is from a 10 second sample, using the BCC "funclatency" tool.
As we can see, in this 10 second sample, most calls took 128ms at a
minimum. Thus, some basic math tells us that in any 20 second interval,
we could only process at most about 150 removals, which is much less
than the 400+ that'll occur based on the workload.

As a result of this, and since all ZVOL minor operations will go through
the single threaded "z_zvol" taskq, the latency for creating a single
ZVOL device can be unreasonably large due to other ZVOL activity on the
system. In our case, it's large enough to cause the application to
generate an error and fail the operation.

When profiling the "zvol_remove_minors_impl" function, I saw that most
of the time in the function was spent off-cpu, blocked in the function
"taskq_wait_outstanding". How this works, is "zvol_remove_minors_impl"
will dispatch calls to "zvol_free" using the "system_taskq", and then
the "taskq_wait_outstanding" function is used to wait for all of those
dispatched calls to occur before "zvol_remove_minors_impl" will return.

As far as I can tell, "zvol_remove_minors_impl" doesn't necessarily have
to wait for all calls to "zvol_free" to occur before it returns. Thus,
this change removes the call to "taskq_wait_oustanding", so that calls
to "zvol_free" don't affect the latency of "zvol_remove_minors_impl".

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Gallagher <john.gallagher@delphix.com>
Signed-off-by: Prakash Surya <prakash.surya@delphix.com>
Closes #9380
2019-10-01 12:33:12 -07:00
Matthew Macy 3283f137d7 OpenZFS restructuring - zpool
Factor Linux specific functions out of the zpool command.
    
Reviewed-by: Allan Jude <allanjude@freebsd.org>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Sean Eric Fagan <sef@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: loli10K <ezomori.nozomu@gmail.com>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9333
2019-09-30 12:16:06 -07:00
Matthew Macy 7bb0c29468 OpenZFS restructuring - zfs_ioctl
Refactor the zfs ioctls in to platform dependent and independent bits.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Sean Eric Fagan <sef@ixsystems.com>
Signed-off-by: Matthew Macy <mmacy@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
Closes #9301
2019-09-27 10:46:28 -07:00
Ben McGough 3768db24ab Adding slack notifier
Allow ZED notification via slack incoming webhook.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com>
Signed-off-by: Ben McGough <bmcgough@fredhutch.org>
Closes #9076
Closes #9350
2019-09-26 09:52:10 -07:00
Tom Caputi bb61cc3185 Fix encryption hierarchy issues with zfs recv -d
Currently, the recv_fix_encryption_hierarchy() function accepts
'destsnap' as one of its parameters. Originally, this was intended
to be the top-level dataset of a receive (whether or not the
receive was recursive). Unfortunately, this parameter actually is
simply the input that is passed in from the command line. When
the user specifies 'zfs recv -d', this string is actually only the
name of the receiving pool since the rest of the name is derived
from the send stream. This causes the function to fail, leaving
some datasets with an invalid encryption hierarchy.

This patch resolves this problem by passing in the top_zfs variable
instead. In order to make this work, this patch also includes some
changes that ensure the value is always present when we need it.

Reviewed-by: loli10K <ezomori.nozomu@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #9273
Closes #9309
2019-09-25 17:02:32 -07:00
Ryan Moeller 479d7d3ca6 ZTS: Fix typos in zfs_destroy tests cleanup
lot_must -> log_must

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed by: Sara Hartse <sara.hartse@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
Closes #9362
2019-09-25 11:42:04 -07:00
Brian Behlendorf 5e5cefbaee ZTS: harden xattr/cleanup.ksh
When the xattr/cleanup.ksh script is unable to remove the test group
due to an active process then it will not call default_cleanup.  This
will result in a zvol_ENOSPC/setup failure when attempting to create
the /mnt/testdir directory which will already exist.

Resolve the issue by performing the default_cleanup before removing
the test user and group to ensure this step always happens.  Also
allow one more retry to further minimize the likelihood of the
cleanup failing.

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #9358
2019-09-25 09:24:45 -07:00
Brian Behlendorf f81d5ef686 Add warning for zfs_vdev_elevator option removal
Originally the zfs_vdev_elevator module option was added as a
convenience so the requested elevator would be automatically set
on the underlying block devices.  At the time this was simple
because the kernel provided an API function which did exactly this.

This API was then removed in the Linux 4.12 kernel which prompted
us to add compatibly code to set the elevator via a usermodehelper.
While well intentioned this introduced a bug which could cause a
system hang, that issue was subsequently fixed by commit 2a0d4188.

In order to avoid future bugs in this area, and to simplify the code,
this functionality is being deprecated.  A console warning has been
added to notify any existing consumers and the documentation updated
accordingly.  This option will remain for the lifetime of the 0.8.x
series for compatibility but if planned to be phased out of master.

Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: loli10K <ezomori.nozomu@gmail.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #8664
Closes #9317
2019-09-25 09:23:29 -07:00
Matthew Macy 5df7e9d85c OpenZFS restructuring - zvol
Refactor the zvol in to platform dependent and independent bits.

Reviewed-by: Allan Jude <allanjude@freebsd.org>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9295
2019-09-25 09:20:30 -07:00
loli10K d359e99c38 diff_cb() does not handle large dnodes
Trying to 'zfs diff' a snapshot with large dnodes will incorrectly try
to access its interior slots when dnodesize > sizeof(dnode_phys_t).
This is normally not an issue because the interior slots are
zero-filled, which report_dnode() handles calling
report_free_dnode_range(). However this is not the case for encrypted
large dnodes or filesystem using many SA based xattrs where the extra
data past the legacy dnode size boundary is interpreted as a
dnode_phys_t.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tom Caputi <tcaputi@datto.com>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #7678 
Closes #8931 
Closes #9343
2019-09-24 12:01:37 -07:00
Ryan Moeller 73d7820bba Use signed types to prevent subtraction overflow
The difference between the sizes could be positive or negative. Leaving
the types as unsigned means the result overflows when the difference is
negative and removing the labs() means we'll have introduced a bug. The
subtraction results in the correct value when the unsigned integer is
interpreted as a signed integer by labs().

Clang doesn't see that we're doing a subtraction and abusing the types.
It sees the result of the subtraction, an unsigned value, being passed
to an absolute value function and emits a warning which we treat as an
error.

Reviewed by: Youzhong Yang <youzhong@gmail.com>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
Closes #9355
2019-09-22 15:27:53 -07:00
Kody A Kantor d49d7336dd Disabled resilver_defer feature leads to looping resilvers
When a disk is replaced with another on a pool with the resilver_defer
feature present, but not enabled the resilver activity restarts during
each spa_sync. This patch checks to make sure that the resilver_defer
feature is first enabled before requesting a deferred resilver.

This was originally fixed in illumos-joyent as OS-7982.

Reviewed-by: Chris Dunlop <chris@onthe.net.au>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tom Caputi <tcaputi@datto.com>
Reviewed by: Jerry Jelinek <jerry.jelinek@joyent.com>
Signed-off-by: Kody A Kantor <kody@kkantor.com>
External-issue: illumos-joyent OS-7982
Closes #9299 
Closes #9338
2019-09-22 15:25:39 -07:00
Ryan Moeller afc8f0a6ff Refactor libzfs_error_init newlines
Move the trailing newlines from the error message strings to the format
strings to more closely match the other error messages.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
Closes #9330
2019-09-18 09:05:57 -07:00
Andriy Gapon dd262c9681 Fix dsl_scan_ds_clone_swapped logic
The was incorrect with respect to swapping dataset IDs both in the
on-disk ZAP object and the in-memory queue.

In both cases, if ds1 was already present, then it would be first
replaced with ds2 and then ds would be replaced back with ds1.
Also, both cases did not properly handle a situation where both ds1 and
ds2 are already queued.  A duplicate insertion would be attempted and
its failure would result in a panic.

Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed-by: Tom Caputi <tcaputi@datto.com>
Signed-off-by: Andriy Gapon <avg@FreeBSD.org>
Closes #9140 
Closes #9163
2019-09-18 09:04:45 -07:00
loli10K fcd37b622b Device removal of indirect vdev panics the kernel
This commit fixes a NULL pointer dereference triggered in
spa_vdev_remove_top_check() by trying to "zpool remove" an indirect
vdev.

Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #9327
2019-09-16 10:46:59 -07:00
loli10K b24771a8c9 Prevent gcc -Werror=maybe-uninitialized warnings in spa_wait_common()
This commit fixes the following build failure detected on Debian9
(GCC 6.3.0):

     CC [M]  module/zfs/spa.o
   module/zfs/spa.c: In function ‘spa_wait_common.part.31’:
   module/zfs/spa.c:9468:6: error: ‘in_progress’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
      if (!in_progress || spa->spa_waiters_cancel || error)
         ^
   cc1: all warnings being treated as errors

Reviewed-by: Chris Dunlop <chris@onthe.net.au>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Gallagher <john.gallagher@delphix.com>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #9326
2019-09-16 10:46:02 -07:00
loli10K 7e15647ce9 ZTS: Fix /usr/bin/env: 'python2': No such file or directory
Since 4f342e45 env(1) must be able to find a "python2" executable in
the "constrained path" on systems configured with --with-python=2.x
otherwise the ZFS Test Suite won't be able to use Python scripts.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #9325
2019-09-16 10:44:51 -07:00
Tom Caputi 637f0c6019 Fix clone handling with encryption roots
Currently, spa_keystore_change_key_sync_impl() does not recurse
into clones when updating encryption roots for either a call to
'zfs promote' or 'zfs change-key'. This can cause children of
these clones to end up in a state where they point to the wrong
dataset as the encryption root. It can also trigger ASSERTs in
some cases where the code checks reference counts on wrapping
keys. This patch fixes this issue by ensuring that this function
properly recurses into clones during processing.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alek Pinchuk <apinchuk@datto.com>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #9267 
Closes #9294
2019-09-16 10:07:33 -07:00
loli10K 2a0d41889e Scrubbing root pools may deadlock on kernels without elevator_change() (#9321)
Originally the zfs_vdev_elevator module option was added as a
convenience so the requested elevator would be automatically set
on the underlying block devices. At the time this was simple
because the kernel provided an API function which did exactly this.

This API was then removed in the Linux 4.12 kernel which prompted
us to add compatibly code to set the elevator via a usermodehelper.

Unfortunately changing the evelator via usermodehelper requires reading
some userland binaries, most notably modprobe(8) or sh(1), from a zfs
dataset on systems with root-on-zfs. This can deadlock the system if
used during the following call path because it may need, if the data
is not already cached in the ARC, reading directly from disk while
holding the spa config lock as a writer:

  zfs_ioc_pool_scan()
    -> spa_scan()
      -> spa_scan()
        -> vdev_reopen()
          -> vdev_elevator_switch()
            -> call_usermodehelper()

While the usermodehelper waits sh(1), modprobe(8) is blocked in the
ZIO pipeline trying to read from disk:

  INFO: task modprobe:2650 blocked for more than 10 seconds.
       Tainted: P           OE     5.2.14
  modprobe        D    0  2650    206 0x00000000
  Call Trace:
   ? __schedule+0x244/0x5f0
   schedule+0x2f/0xa0
   cv_wait_common+0x156/0x290 [spl]
   ? do_wait_intr_irq+0xb0/0xb0
   spa_config_enter+0x13b/0x1e0 [zfs]
   zio_vdev_io_start+0x51d/0x590 [zfs]
   ? tsd_get_by_thread+0x3b/0x80 [spl]
   zio_nowait+0x142/0x2f0 [zfs]
   arc_read+0xb2d/0x19d0 [zfs]
   ...
   zpl_iter_read+0xfa/0x170 [zfs]
   new_sync_read+0x124/0x1b0
   vfs_read+0x91/0x140
   ksys_read+0x59/0xd0
   do_syscall_64+0x4f/0x130
   entry_SYSCALL_64_after_hwframe+0x44/0xa9

This commit changes how we use the usermodehelper functionality from
synchronous (UMH_WAIT_PROC) to asynchronous (UMH_NO_WAIT) which prevents
scrubs, and other vdev_elevator_switch() consumers, from triggering the
aforementioned issue.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Issue #8664 
Closes #9321
2019-09-13 18:09:59 -07:00
John Gallagher e60e158eff Add subcommand to wait for background zfs activity to complete
Currently the best way to wait for the completion of a long-running
operation in a pool, like a scrub or device removal, is to poll 'zpool
status' and parse its output, which is neither efficient nor convenient.

This change adds a 'wait' subcommand to the zpool command. When invoked,
'zpool wait' will block until a specified type of background activity
completes. Currently, this subcommand can wait for any of the following:

 - Scrubs or resilvers to complete
 - Devices to initialized
 - Devices to be replaced
 - Devices to be removed
 - Checkpoints to be discarded
 - Background freeing to complete

For example, a scrub that is in progress could be waited for by running

    zpool wait -t scrub <pool>

This also adds a -w flag to the attach, checkpoint, initialize, replace,
remove, and scrub subcommands. When used, this flag makes the operations
kicked off by these subcommands synchronous instead of asynchronous.

This functionality is implemented using a new ioctl. The type of
activity to wait for is provided as input to the ioctl, and the ioctl
blocks until all activity of that type has completed. An ioctl was used
over other methods of kernel-userspace communiction primarily for the
sake of portability.

Porting Notes:
This is ported from Delphix OS change DLPX-44432. The following changes
were made while porting:

 - Added ZoL-style ioctl input declaration.
 - Reorganized error handling in zpool_initialize in libzfs to integrate
   better with changes made for TRIM support.
 - Fixed check for whether a checkpoint discard is in progress.
   Previously it also waited if the pool had a checkpoint, instead of
   just if a checkpoint was being discarded.
 - Exposed zfs_initialize_chunk_size as a ZoL-style tunable.
 - Updated more existing tests to make use of new 'zpool wait'
   functionality, tests that don't exist in Delphix OS.
 - Used existing ZoL tunable zfs_scan_suspend_progress, together with
   zinject, in place of a new tunable zfs_scan_max_blks_per_txg.
 - Added support for a non-integral interval argument to zpool wait.

Future work:
ZoL has support for trimming devices, which Delphix OS does not. In the
future, 'zpool wait' could be extended to add the ability to wait for
trim operations to complete.

Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: John Gallagher <john.gallagher@delphix.com>
Closes #9162
2019-09-13 18:09:06 -07:00
Chengfei ZHu 7238cbd4d3 QAT related bug fixes
1. Fix issue:  Kernel BUG with QAT during decompression  #9276.
   Now it is uninterruptible for a specific given QAT request,
   but Ctrl-C interrupt still works in user-space process.

2. Copy the digest result to the buffer only when doing encryption,
   and vise-versa for decryption.

Reviewed-by: Tom Caputi <tcaputi@datto.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chengfei Zhu <chengfeix.zhu@intel.com>
Closes #9276 
Closes #9303
2019-09-12 13:33:44 -07:00
Ryan Moeller 4f342e45be Canonicalize Python shebangs
/usr/bin/env python3 is the suggested[1] shebang for Python in general
(likewise for python2) and is conventional across platforms. This eases
development on systems where python is not installed in /usr/bin
(FreeBSD for example) and makes it possible to develop in virtual
environments (venv) for isolating dependencies.

Many packaging guidelines discourage the use of /usr/bin/env, but since
this is the canonical way of writing shebangs in the Python community,
many packaging scripts are already equipped to handle substituting the
appropriate absolute path to python automatically.

Some RPM package builders lacking brp-mangle-shebangs need a small
fallback mechanism in the package spec to stamp the appropriate shebang
on installed Python scripts.

[1]: https://docs.python.org/3/using/unix.html?#miscellaneous

Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
Closes #9314
2019-09-12 13:32:32 -07:00
Matthew Macy b01a6574ae Move objnode handling to common code
objnode is OS agnostic and used only by dmu_redact.c.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9315
2019-09-12 13:31:09 -07:00
Matthew Macy 74756182d2 Enable compiler to typecheck logging
Annotate spa logging declarations with printflike
Workaround gcc bug (non disable-able warning) by
replacing "" with " "

Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9316
2019-09-12 13:28:26 -07:00
Matthew Macy d66620681d OpenZFS restructuring - move linux tracing code to platform directories
Move Linux specific tracing headers and source to platform directories
and update the build system.

Reviewed-by: Allan Jude <allanjude@freebsd.org>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed by: Brad Lewis <brad.lewis@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9290
2019-09-11 14:25:53 -07:00
Tom Caputi 5815f7ac30 Fix stalled txg with repeated noop scans
Currently, the DSL scan code figures out when it should suspend
processing and allow a txg to continue by calling the function
dsl_scan_check_suspend(). Unfortunately, this function only
allows the scan to suspend at a level 0 block. In the event that
the system is scanning a bunch of empty snapshots or a resilver
is running with a high enough scn_cur_min_txg, the scan will
stop processing each dataset at the root level, deciding it
has nothing left to do. This means that the check_suspend
function is never called and the txg remains stuck until a
dataset is found that has data to scan.

This patch fixes the problem by allowing scans to suspend at
the root level of the objset. For backwards compatibility, we
use the bookmark <objsetid, 0, 0, 0> when we suspend here so
that older versions of the code will work as intended.

Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #9300
2019-09-11 11:16:48 -07:00
Brian Behlendorf 5b51c15861 kmodtool: depmod path
Determine the location of depmod on the system, either /sbin/depmod or
/usr/sbin/depmod.  Then use that path when generating the specfile.

Additionally, update the Requires lines to reference the package which
provides depmod rather than the binary itself.  For CentOS/RHEL 7+8
and all supported Fedora releases this is the kmod package, and for
CentOS/RHEL 6 it is the module-init-tools package.

Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #8724
Closes #9310
2019-09-11 11:14:50 -07:00
Brian Behlendorf 490e23cdf4 copy-builtin: SPL must be in Kbuild first (again)
Commit bced7e3 accidentally reintroduced issue #7595 which was
previously addressed by 517d247.  Re-apply the original fix to
resolve the issue and include a comment to make it clear the
ordering is important.

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Matthew Thode <prometheanfire@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #9302
Closes #9208
2019-09-11 11:09:50 -07:00
Brian Behlendorf 25f06d677a Fix /etc/hostid on root pool deadlock
Accidentally introduced by dc04a8c which now takes the SCL_VDEV lock
as a reader in zfs_blkptr_verify().  A deadlock can occur if the
/etc/hostid file resides on a dataset in the same pool.  This is
because reading the /etc/hostid file may occur while the caller is
holding the SCL_VDEV lock as a writer.  For example, to perform a
`zpool attach` as shown in the abbreviated stack below.

To resolve the issue we cache the system's hostid when initializing
the spa_t, or when modifying the multihost property.  The cached
value is then relied upon for subsequent accesses.

Call Trace:
    spa_config_enter+0x1e8/0x350 [zfs]
    zfs_blkptr_verify+0x33c/0x4f0 [zfs] <--- trying read lock
    zio_read+0x6c/0x140 [zfs]
    ...
    vfs_read+0xfc/0x1e0
    kernel_read+0x50/0x90
    ...
    spa_get_hostid+0x1c/0x38 [zfs]
    spa_config_generate+0x1a0/0x610 [zfs]
    vdev_label_init+0xa0/0xc80 [zfs]
    vdev_create+0x98/0xe0 [zfs]
    spa_vdev_attach+0x14c/0xb40 [zfs] <--- grabbed write lock

Reviewed-by: loli10K <ezomori.nozomu@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #9256 
Closes #9285
2019-09-10 13:42:30 -07:00
Ryan Moeller 562e1c0327 Add/generalize abstractions in arc_summary3
Code for interfacing with procfs for kstats and tunables is Linux-
specific. A more generic interface can be used for the abstractions of
loading kstats and various tunable parameters, allowing other platforms
to implement the functions cleanly. In a similar vein, determining the
ZFS/SPL version can be abstracted away in order for other platforms to
provide their own implementations of this function.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
Closes #9279
2019-09-10 13:27:53 -07:00
Ryan Moeller 6122948b3b Add/generalize abstraction in arc_summary2
A more generic interface can be used for the abstraction of loading
kstats, allowing other platforms to implement the function cleanly.

In a similar vein, loading tunables can be abstracted away in order for
other platforms to provide their own implementations of this function.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
Closes #9277
2019-09-10 12:17:54 -07:00
Brian Behlendorf b88ca2acf5 Enable SIMD for encryption
When adding the SIMD compatibility code in e5db313 the decryption of a
dataset wrapping key was left in a user thread context.  This was done
intentionally since it's a relatively infrequent operation.  However,
this also meant that the encryption context templates were initialized
using the generic operations.  Therefore, subsequent encryption and
decryption operations would use the generic implementation even when
executed by an I/O pipeline thread.

Resolve the issue by initializing the context templates in an I/O
pipeline thread.  And by updating zio_do_crypt_uio() to dispatch any
encryption operations to a pipeline thread when called from the user
context.  For example, when performing a read from the ARC.

Tested-by: Attila Fülöp <attila@fueloep.org>
Reviewed-by: Tom Caputi <tcaputi@datto.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #9215
Closes #9296
2019-09-10 10:45:46 -07:00
John Wren Kennedy b63e2d881f ZTS: Introduce targeted corruption in file blocks
filetest_001_pos verifies that various checksum algorithms detect
corruption by overwriting the underlying vdev on which a file resides.
It is possible for the overwrite to miss the blocks of a file, causing a
spurious failure. This change introduces a function to corrupt the
individual blocks of a file as determined by zdb.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: John Kennedy <john.kennedy@delphix.com>
Closes #9288
2019-09-09 16:11:07 -07:00
Ryan Moeller 43f4495bde Clean up do_vol_test in zfs_copies tests
Get rid of the `get_used_prop` function. `get_prop used` works fine.

Fix the comment describing the function parameters. The type does not
have a default, and mntp is also used for ext2.

Rename the variable for the number of copies from `copy` to `copies`.

Use a `case` statement to match the type parameter, order the cases
alphabetically, and add a little sanity checking for good measure.

Use eval to make sure the output of commands is silenced rather than
the log messages when redirecting output to /dev/null.

Simplify cases where zfs requires special behavior.

Don't allow the test to loop forever in the event space usage does not
change. Bail out of the loop and fail after an arbitrary number of
iterations.

Add more information to the log message when the test fails, to help
debugging.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
Closes #9286
2019-09-09 16:04:05 -07:00
Olaf Faaland 4bbf0477a7 BuildRequires libtirpc-devel needed for RHEL 8
Building against RHEL 8 requires libtirpc-devel, as with fedora 28.
Add rhel8 and centos8 options to the test, to account for that.

BuildRequires Originally added for fedora 28 via commit
1a62a305be

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Closes #9289
2019-09-06 11:30:07 -07:00
Matthew Macy bced7e3aaa OpenZFS restructuring - move platform specific sources
Move platform specific Linux source under module/os/linux/
and update the build system accordingly.  Additional code
restructuring will follow to make the common code fully
portable.
    
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Macy <mmacy@FreeBSD.org>
Closes #9206
2019-09-06 11:26:26 -07:00
Tom Caputi 870e7a52c1 Fix noop receive of raw send stream
Currently, the noop receive code fails to work with raw send streams
and resuming send streams. This happens because zfs_receive_impl()
reads the DRR_BEGIN payload without reading the payload itself.
Normally, the kernel expects to read this itself, but in this case
the recv_skip() code runs instead and it is not prepared to handle
the stream being left at any place other than the beginning of a
record.

This patch resolves this issue by manually reading the DRR_BEGIN
payload in the dry-run case. This patch also includes a number of
small fixups in this code path.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #9221
Closes #9173
2019-09-05 16:22:05 -07:00
Ryan Moeller 8e2c502cf3 Clean up zfs_clone_010_pos
Remove a lot of unnecessary setting and incrementing of `i`.

Remove unused variable `j`.

Instead of calling out to Python in a loop to generate the same string
repeatedly, generate the string once using shell constructs before
entering the loop.

Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com>
Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
Closes #9284
2019-09-05 16:20:09 -07:00
Matthew Macy 03fdcb9adc Make module tunables cross platform
Adds ZFS_MODULE_PARAM to abstract module parameter
setting to operating systems other than Linux.

Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
Closes #9230
2019-09-05 14:49:49 -07:00
Serapheim Dimitropoulos 65a91b166e metaslab_verify_weight_and_frag() shouldn't cause side-effects
`metaslab_verify_weight_and_frag()` a verification function and
by the end of it there shouldn't be any side-effects.

The function calls `metaslab_weight()` which in turn calls
`metaslab_set_fragmentation()`. The latter can dirty and otherwise
not dirty metaslab fro the next TXGand set `metaslab_condense_wanted`
if the spacemaps were just upgraded (meaning we just enabled the
SPACEMAP_HISTOGRAM feature through upgrade).

This patch adds a new flag as a parameter to `metaslab_weight()` and
`metaslab_set_fragmentation()` making the dirtying of the metaslab
optional.

Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Closes #9185 
Closes #9282
2019-09-05 09:57:55 -07:00
Ryan Moeller 240c015ac6 Refactor checksum operations in tests
md5sum in particular but also sha256sum to a lesser extent is used
in several areas of the test suite for computing checksums. The vast
majority of invocations are followed by `| awk '{ print $1 }'`.

Introduce functions to wrap up `md5sum $file | awk '{ print $1 }'` and
likewise for sha256sum. These also serve as a convenient interface for
alternative implementations on other platforms.

Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
Closes #9280
2019-09-05 09:51:59 -07:00
Matthew Macy 006e9a4088 OpenZFS restructuring - move platform specific headers
Move platform specific Linux headers under include/os/linux/.
Update the build system accordingly to detect the platform.
This lays some of the initial groundwork to supporting building
for other platforms.

As part of this change it was necessary to create both a user
and kernel space sys/simd.h header which can be included in
either context.  No functional change, the source has been
refactored and the relevant #include's updated.

Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Signed-off-by: Matthew Macy <mmacy@FreeBSD.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #9198
2019-09-05 09:34:54 -07:00
loli10K d02186ee2b Fix zpool subcommands error message with some unsupported options
Both 'detach' and 'online' zpool subcommands, when provided with an
unsupported option, forget to print it in the error message:

   # zpool online -t rpool vda3
   invalid option ''
   usage:
      online [-e] <pool> <device> ...

This changes fixes the error message in order to include the actual
option that is not supported.

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #9270
2019-09-04 13:36:25 -07:00
loli10K a49dbbbe06 Fix zfs-dkms .deb package warning in prerm script
Debian zfs-dkms package generated by alien doesn't call the prerm script
(rpm's %preun) with an integer as first parameter, which results in the
following warning when the package is uninstalled:

   "zfs-dkms.prerm: line 3: [: remove: integer expression expected"

Modify the if-condition to avoid the warning.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #9271
2019-09-03 15:20:39 -07:00
Ryan Moeller a6a2877774 Use the right booleans
TRUE and FALSE happen to be defined, but we should use B_TRUE and
B_FALSE for the sake of consistency.

Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
Closes #9264
2019-09-03 12:44:08 -07:00
Igor K e242b67cee Fix panic on DilOS with kstat per dataset statistics
Account for ZFS_MAX_DATASET_NAME_LEN in kstat data size.  This value
is ignored in the Linux kstat code but resolves the issue for other
platforms.

Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Igor Kozhukhov <igor@dilos.org>
Closes #9254 
Closes #9151
2019-09-03 12:12:31 -07:00
Pavel Zakharov a91e4790a6 zvol_wait script should ignore partially received zvols
Partially received zvols won't have links in /dev/zvol.

Reviewed-by: Sebastien Roy <sebastien.roy@delphix.com>
Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Closes #9260
2019-09-03 11:29:52 -07:00
Andriy Gapon ebeb6f23bf Always refuse receving non-resume stream when resume state exists
This fixes a hole in the situation where the resume state is left from
receiving a new dataset and, so, the state is set on the dataset itself
(as opposed to %recv child).

Additionally, distinguish incremental and resume streams in error
messages.

Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed-by: Tom Caputi <tcaputi@datto.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Andriy Gapon <avg@FreeBSD.org>
Closes #9252
2019-09-03 10:56:55 -07:00
Igor K 1a504d27df ZTS: Fix removal_cancel.ksh
Create a larger file to extend the time required to perform the 
removal.  Occasional failures were observed due to the removal
completing before the cancel could be requested.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Igor Kozhukhov <igor@dilos.org>
Closes #9259
2019-09-03 10:46:40 -07:00
loli10K 6988f3ed9a Fix Intel QAT / ZFS compatibility on v4.7.1+ kernels
This change use the compat code introduced in 9cc1844a.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #9268 
Closes #9269
2019-09-03 10:36:33 -07:00
George Wilson 1e52716257 maxinflight can overflow in spa_load_verify_cb()
When running on larger memory systems, we can overflow the value of
maxinflight. This can result in maxinflight having a value of 0 causing
the system to hang.

Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Wilson <george.wilson@delphix.com>
Closes #9272
2019-09-02 19:17:51 -07:00
Andrea Gelmini a57c82fc50 Fix typos
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Andrea Gelmini <andrea.gelmini@gelma.net>
Closes #9251
2019-09-02 18:17:39 -07:00
Andrea Gelmini c6e457dffb Fix typos in tests/
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Andrea Gelmini <andrea.gelmini@gelma.net>
Closes #9250
2019-09-02 18:14:53 -07:00
Andrea Gelmini cb14aa4ca9 Fix typos in tests/
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Andrea Gelmini <andrea.gelmini@gelma.net>
Closes #9249
2019-09-02 18:13:19 -07:00
Andrea Gelmini 4001f09055 Fix typos in tests/
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Andrea Gelmini <andrea.gelmini@gelma.net>
Closes #9247
2019-09-02 18:12:01 -07:00
Andrea Gelmini 220dd4ae84 Fix typos in tests/
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Andrea Gelmini <andrea.gelmini@gelma.net>
Closes #9246
2019-09-02 18:10:31 -07:00
Andrea Gelmini 24739cd5b0 Fix typos in tests/
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Andrea Gelmini <andrea.gelmini@gelma.net>
Closes #9244
2019-09-02 18:08:56 -07:00
Andrea Gelmini 37e42197a4 Fix typos in tests/
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Andrea Gelmini <andrea.gelmini@gelma.net>
Closes #9243
2019-09-02 18:07:35 -07:00
Andrea Gelmini ade306a9d4 Fix typos in tests/
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Andrea Gelmini <andrea.gelmini@gelma.net>
Closes #9242
2019-09-02 17:58:26 -07:00
Andrea Gelmini e1cfd73f7f Fix typos in module/zfs/
Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Andrea Gelmini <andrea.gelmini@gelma.net>
Closes #9240
2019-09-02 17:56:41 -07:00
Andrea Gelmini 7859537768 Fix typos in lib/
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Andrea Gelmini <andrea.gelmini@gelma.net>
Closes #9237
2019-09-02 17:53:27 -07:00
Andrea Gelmini c953960048 Fix typos in tests/
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Andrea Gelmini <andrea.gelmini@gelma.net>
Closes #9248
2019-08-30 16:53:48 -07:00
Andrea Gelmini b520706f29 Fix typos in tests/
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Andrea Gelmini <andrea.gelmini@gelma.net>
Closes #9245
2019-08-30 16:52:00 -07:00
Andrea Gelmini 9f5c1bc609 Fix typos in module/
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Andrea Gelmini <andrea.gelmini@gelma.net>
Closes #9241
2019-08-30 14:32:18 -07:00
Andrea Gelmini 9d40bdf414 Fix typos in modules/icp/
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Andrea Gelmini <andrea.gelmini@gelma.net>
Closes #9239
2019-08-30 14:26:07 -07:00
Andrea Gelmini cf7c5a030e Fix typos in include/
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Andrea Gelmini <andrea.gelmini@gelma.net>
Closes #9238
2019-08-30 09:53:15 -07:00
Andrea Gelmini 0463c95501 Fix typos in etc/
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Andrea Gelmini <andrea.gelmini@gelma.net>
Closes #9236
2019-08-30 09:46:52 -07:00
Andrea Gelmini cd6b910b64 Fix typos in contrib/
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Andrea Gelmini <andrea.gelmini@gelma.net>
Closes #9235
2019-08-30 09:44:43 -07:00
Andrea Gelmini ad0b23b14a Fix typos in cmd/
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Andrea Gelmini <andrea.gelmini@gelma.net>
Closes #9234
2019-08-30 09:43:30 -07:00
Andrea Gelmini ac3d4d0cf6 Fix typos in man/
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Andrea Gelmini <andrea.gelmini@gelma.net>
Closes #9233
2019-08-30 09:41:35 -07:00
Andrea Gelmini 2b96f77423 Fix typos in config/
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Andrea Gelmini <andrea.gelmini@gelma.net>
Closes #9232
2019-08-30 09:40:30 -07:00
Igor K d39c71d365 Fix refquota_007_neg.ksh
Must use 'zfs' instead of '$ZFS' which is undefined.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Igor Kozhukhov <igor@dilos.org>
Closes #9257
2019-08-30 09:32:25 -07:00
Paul Dagnelie 475aa97cab Prevent metaslab_sync panic due to spa_final_dirty_txg
If a pool enables the SPACEMAP_HISTOGRAM feature shortly before being
exported, we can enter a situation that causes a kernel panic. Any metaslabs
that are loaded during the final dirty txg and haven't already been condensed
will cause metaslab_sync to proceed after the final dirty txg so that the
condense can be performed, which there are assertions to prevent. Because of
the nature of this issue, there are a number of ways we can enter this
state. Rather than try to prevent each of them one by one, potentially missing
some edge cases, we instead cut it off at the point of intersection; by
preventing metaslab_sync from proceeding if it would only do so to perform a
condense and we're past the final dirty txg, we preserve the utility of the
existing asserts while preventing this particular issue.

Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #9185
Closes #9186
Closes #9231
Closes #9253
2019-08-30 09:28:31 -07:00
Georgy Yakovlev e2fcfa70e3 etc/init.d/zfs-functions.in: remove arch warning
Remove the x86_64 warning, it's no longer the case that this is the
only supported architecture.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Georgy Yakovlev <gyakovlev@gentoo.org>
Closes: #9177
2019-08-29 13:14:48 -07:00
Ryan Moeller 815a645692 Simplify deleting partitions in libtest
Eliminate unnecessary code duplication. We can use a for-loop instead
of a while-loop. There is no need to echo $DISKSARRAY in a subshell or
return 0. Declare all variables with typeset.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
Closes #9224
2019-08-29 13:11:29 -07:00
Ryan Moeller f66ad580cc Use compatible arg order in tests
BSD getopt() and getopt_long() want options before arguments.
Reorder arguments to zfs/zpool in tests to put all the options first.

Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
Closes #9228
2019-08-29 11:03:09 -07:00
Paul Dagnelie eef0f4d84e Keep more metaslabs loaded
With the other metaslab changes loaded onto a system, we can 
significantly reduce the memory usage of each loaded metaslab and 
unload them on demand if there is memory pressure. However, none 
of those changes actually result in us keeping more metaslabs loaded. 
If we don't keep more metaslabs loaded, we will still have to wait 
for demand-loading to finish when no loaded metaslab can satisfy our 
allocation, which can cause ZIL performance issues. In addition,
performance is traditionally measured by IOs per unit time, while 
unloading is currently done on a txg-count basis. Txgs can take a 
widely varying range of times, from tenths of a second to several 
seconds. This can result in confusing, hard to predict behavior.

This change simply adds a time-based component to metaslab unloading. 
A metaslab will remain loaded for one minute and 8 txgs (by default) 
after it was last used, unless it is evicted due to memory pressure.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Ahrens <mahrens@delphix.com>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
External-issue: DLPX-65016
External-issue: DLPX-65047
Closes #9197
2019-08-29 10:20:36 -07:00
Pavel Zakharov e6cebbf86e zfs_handle used after being closed/freed in change_one callback
This is a typical case of use after free. We would call zfs_close(zhp) 
which would free the handle, and then call zfs_iter_children() on that 
handle later.  This change ensures that the zfs_handle is only closed 
when we are ready to return.

Running `zfs inherit -r sharenfs pool` was failing with an error
code without any error messages. After some debugging I've pinpointed 
the issue to be memory corruption, which would cause zfs to try to 
issue an ioctl to the wrong device and receive ENOTTY.

Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: George Wilson <gwilson@delphix.com>
Reviewed-by: Sebastien Roy <sebastien.roy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alek Pinchuk <apinchuk@datto.com>
Signed-off-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Issue #7967 
Closes #9165
2019-08-28 15:02:58 -07:00
Tony Nguyen 8d04284281 Use smaller default slack/delta value for schedule_hrtimeout_range()
For interrupt coalescing, cv_timedwait_hires() uses a 100us slack/delta
for calls to schedule_hrtimeout_range(). This 100us slack can be costly
for small writes.

This change improves small write performance by passing resolution `res`
parameter to schedule_hrtimeout_range() to be used as delta/slack. A new
tunable `spl_schedule_hrtimeout_slack_us` is added to preserve old
behavior when desired.

Performance observations on 8K recordsize filesystem:
- 8K random writes at 1-64 threads, up to 60% improvement for one thread
  and smaller gains as thread count increases. At >64 threads, 2-5%
  decrease in performance was observed.
- 8K sequential writes, similar 60% improvement for one thread and
  leveling out around 64 threads. At >64 threads, 5-10% decrease in
  performance was observed.
- 128K sequential write sees 1-5 for the 128K. No observed regression at
  high thread count.

Testing done on Ubuntu 18.04 with 4.15 kernel, 8vCPUs and SSD storage on
VMware ESX.

Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Ahrens <mahrens@delphix.com>
Signed-off-by: Tony Nguyen <tony.nguyen@delphix.com>
Closes #9217
2019-08-28 14:56:54 -07:00
Brian Behlendorf 07a328dde4 ZTS: Temporarily disable several upgrade tests
Until issues #9185 and #9186 have been resolved the following zpool
upgrade tests are being disabled to prevent CI failures.

  zpool_upgrade_002_pos,
  zpool_upgrade_003_pos,
  zpool_upgrade_004_pos,
  zpool_upgrade_007_pos,
  zpool_upgrade_008_pos

Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #9185
Issue #9186
Closes #9225
2019-08-28 14:52:08 -07:00
Don Brady 28c91ab66d Tag ABD pages for exclusion in kernel crash dumps
Tag the ABD data pages so that they can be identified for exclusion 
from kernel crash dumps. Eliminating the zfs file data allows for 
significantly smaller crash dump files. Note that ZFS in illumos has 
always excluded the zfs data pages from a kernel crash dump.

This change tags ARC scatter data pages so they can be identified from 
the makedumpfile(8) command. That command is used to create smaller 
dump files by ignoring some memory regions and using compression. It 
already filters file data from the VFS page cache and will now be able 
to exclude ZFS file data pages from the dump file.

A corresponding change to makeumpfile(8) is required to identify ZFS 
data pages.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Signed-off-by: Don Brady <don.brady@delphix.com>
Closes #8899
2019-08-28 10:44:46 -07:00
Chunwei Chen 035e96118b Fix zil replay panic when TX_REMOVE followed by TX_CREATE
If TX_REMOVE is followed by TX_CREATE on the same object id, we need to
make sure the object removal is completely finished before creation. The
current implementation relies on dnode_hold_impl with
DNODE_MUST_BE_ALLOCATED returning ENOENT. While this check seems to work
fine before, in current version it does not guarantee the object removal
is completed.

We fix this by checking if DNODE_MUST_BE_FREE returns successful
instead. Also add test and remove dead code in dnode_hold_impl.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tom Caputi <tcaputi@datto.com>
Signed-off-by: Chunwei Chen <david.chen@nutanix.com>
Closes #7151
Closes #8910
Closes #9123
Closes #9145
2019-08-28 10:42:02 -07:00
Ryan Moeller 9c9dcd6e04 Prefer for (;;) to while (TRUE)
Defining a special constant to make an infinite loop is excessive,
especially when the name clashes with symbols commonly defined on
some platforms (ie FreeBSD).

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: John Kennedy <john.kennedy@delphix.com
Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
Closes #9219
2019-08-28 10:38:40 -07:00
Andriy Gapon e6203d288a zfs_ioc_snapshot: check user-prop permissions on snapshotted datasets
Previously, the permissions were checked on the pool which was obviously
incorrect.

After this change, zfs_check_userprops() only validates the properties
without any permission checks.  The permissions are checked individually
for each snapshotted dataset.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Ahrens <mahrens@delphix.com>
Signed-off-by: Andriy Gapon <avg@FreeBSD.org>
Closes #9179 
Closes #9180
2019-08-27 13:45:53 -07:00
Richard Allen f335b8ffe1 Fix Plymouth passphrase prompt in initramfs script
Entering the ZFS encryption passphrase under Plymouth wasn't working
because in the ZFS initrd script, Plymouth was calling zfs via
"--command", which wasn't passing through the filesystem argument to
zfs load-key properly (it was passing through the single quotes around
the filesystem name intended to handle spaces literally,
which zfs load-key couldn't understand).

Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Garrett Fields <ghfields@gmail.com>
Signed-off-by: Richard Allen <belperite@gmail.com>
Issue #9193
Closes #9202
2019-08-27 13:44:02 -07:00
Tom Caputi e7a2fa70c3 Fix deadlock in 'zfs rollback'
Currently, the 'zfs rollback' code can end up deadlocked due to
the way the kernel handles unreferenced inodes on a suspended fs.
Essentially, the zfs_resume_fs() code path may cause zfs to spawn
new threads as it reinstantiates the suspended fs's zil. When a
new thread is spawned, the kernel may attempt to free memory for
that thread by freeing some unreferenced inodes. If it happens to
select inodes that are a a part of the suspended fs a deadlock
will occur because freeing inodes requires holding the fs's
z_teardown_inactive_lock which is still held from the suspend.

This patch corrects this issue by adding an additional reference
to all inodes that are still present when a suspend is initiated.
This prevents them from being freed by the kernel for any reason.

Reviewed-by: Alek Pinchuk <apinchuk@datto.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #9203
2019-08-27 09:55:51 -07:00
Ryan Moeller 142f84dd19 Restore :: in Makefile.am
The double-colon looked like a typo, but it's actually an obscure
feature. Rules with :: may appear multiple times and are run
independently of one another in the order they appear. The use of ::
for distclean-local was conventional, not accidental.

Add comments to indicate the intentional use of double-colon rules.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
Closes #9210
2019-08-26 11:48:31 -07:00
Paul Dagnelie 95f0144675 Add regression test for "zpool list -p"
Other than this test, zpool list -p is not well tested by any of the 
automated tests.  Add a test for zpool list -p.

Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #9134
2019-08-25 18:33:03 -07:00
Ryan Moeller a18f8bce5c Split argument list, satisfy shellcheck SC2086
Split the arguments for ${TEST_RUNNER} across multiple lines for
clarity. Also added quotes in the message to match the invoked command.

Unquoted variables in argument lists are subject to splitting. In this
particular case we can't quote the variable because it is an optional
argument. Use the method suggested in the description linked below,
instead.

The technique is to use an unquoted variable with an alternate value.

https://github.com/koalaman/shellcheck/wiki/SC2086

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Giuseppe Di Natale <guss80@gmail.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
Closes #9212
2019-08-25 18:30:39 -07:00
Brian Behlendorf 4302698be1 ZTS: Fix in-tree dbufstats test case
Commit a887d653 updated the dbufstats such that escalated privileges
are required.  Since all tests under cli_user are run with normal
privileges move this test case to a location where it will be run
required privileges.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Michael Niewöhner <foss@mniewoehner.de>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #9118 
Closes #9196
2019-08-22 17:37:48 -07:00
Brian Behlendorf d1d1f8c37d Fix install error introduced by #9089 (#9205)
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
2019-08-22 17:31:38 -07:00
Ryan Moeller 97c54ea818 Make slog test setup more robust
The slog tests fail when attempting to create pools using file vdevs
that already exist from previous test runs. Remove these files in the
setup for the test.

Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
Closes #9194
2019-08-22 17:26:50 -07:00
yshui 19d61d63fa zfs-mount-genrator: dependencies should be space-separated
Reviewed-by: Antonio Russo <antonio.e.russo@gmail.com>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Signed-off-by: Yuxuan Shui <yshuiv7@gmail.com>
Closes #9174
2019-08-22 17:11:17 -07:00
Paul Dagnelie d1484fb189 Fix install error introduced by #9089
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
2019-08-22 12:01:41 -07:00
Brian Behlendorf 31b548ffb9 ZTS: Use decimal values when setting tunables
The mdb_set_uint32 function requires that the values passed in be
decimal.  This was overlooked initially because the matching Linux
function accepts both decimal and hexadecimal values.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed by: Sara Hartse <sara.hartse@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Igor Kozhukhov <igor@dilos.org>
Closes #9125
Closes #9195
2019-08-22 10:36:57 -07:00
Brian Behlendorf a3ba6e589f Fix automake program name transformations (#9190)
Automake can perform program name transformations at install time.
However, arc_summary has its own name transformation taking place,
which interferes with the automake transforms. The automake transforms
must be taken into account in order to resolve the conflict.

Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
2019-08-22 09:53:45 -07:00
Mauricio Faria de Oliveira 2f74950c5e Document ZFS_DKMS_ENABLE_DEBUGINFO in userland configuration
Document the ZFS_DKMS_ENABLE_DEBUGINFO option in the userland
configuration file, as done with the other ZFS_DKMS_* options.

It has been introduced with commit e45c1734a6 ("dkms: Enable
debuginfo option to be set with zfs sysconfig file") but isn't
mentioned anywhere other than the 'dkms.conf' file (generated).

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mauricio Faria de Oliveira <mfo@canonical.com>
Closes #9191
2019-08-22 09:48:48 -07:00
Ryan Moeller 0154a1e539 Dedup IOC enum values in libzfs_input_check
Reuse enum value ZFS_IOC_BASE for `('Z' << 8)`.
This is helpful on FreeBSD where ZFS_IOC_BASE has a different value and
`('Z' << 8)` is wrong.

Reviewed-by: Chris Dunlop <chris@onthe.net.au>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
Closes #9188
2019-08-22 09:46:09 -07:00
Ryan Moeller f591a581d6 Enhance ioctl number checks
When checking ZFS_IOC_* numbers, print which numbers are wrong rather
than silently failing.

Reviewed-by: Chris Dunlop <chris@onthe.net.au>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
Closes #9187
2019-08-22 09:44:11 -07:00
Brian Behlendorf 20f7b917aa ZTS: Fix vdev_zaps_005_pos on CentOS 6
The ancient version of blkid (v2.17.2) used in CentOS 6 will not
detect the newly created pool unless it has been written to.
Force a pool sync so `zpool import` will detect the newly created
pool.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #9199
2019-08-22 08:53:44 -07:00
Tony Hutter a9ebdfdd43 Linux 5.3: Fix switch() fall though compiler errors
Fix some switch() fall-though compiler errors:

    abd.c:1504:9: error: this statement may fall through

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #9170
2019-08-21 09:29:23 -07:00
Ryan Moeller f66a1f88fb Minor cleanup in Makefile.am
Split long lines where adding license info to dist archive.

Remove extra colon from target line.

Reviewed-by: Chris Dunlop <chris@onthe.net.au>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
Closes #9189
2019-08-21 09:01:59 -07:00
Alexey Smirnoff c759b33a51 zfs-functions.in: in_mtab() always returns 1
$fs used with the wrong sed command where should be $mntpnt instead
to match a variable exported by read_mtab()

The fix is mostly to reuse the sed command found in read_mtab()

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Michael Niewöhner <foss@mniewoehner.de>
Signed-off-by: Alexey Smirnoff <fling@member.fsf.org>
Closes #9168
2019-08-20 16:26:19 -07:00
Ryan Moeller 92a9e1da60 Fix automake program name transformations
Automake can perform program name transformations at install time.
However, arc_summary has its own name transformation taking place,
which interferes with the automake transforms. The automake transforms
must be taken into account in order to resolve the conflict.

Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
2019-08-20 17:46:40 -04:00
Matthew Ahrens 325d288c5d Add fast path for zfs_ioc_space_snaps() handling of empty_bpobj
When there are many snapshots, calls to zfs_ioc_space_snaps() (e.g. from
`zfs destroy -nv pool/fs@snap1%snap10000`) can be very slow, resulting
in poor performance because we are holding the dp_config_rwlock the
entire time, blocking spa_sync() from continuing.  With around ten
thousand snapshots, we've seen up to 500 seconds in this ioctl,
iterating over up to 50,000,000 bpobjs, ~99% of which are the empty
bpobj.

By creating a fast path for zfs_ioc_space_snaps() handling of the
empty_bpobj, we can achieve a ~5x performance improvement of this ioctl
(when there are many snapshots, and the deadlist is mostly
empty_bpobj's).

Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
External-issue: DLPX-58348
Closes #8744
2019-08-20 11:34:52 -07:00
jdike 3beb0a7694 Fix lockdep circular locking false positive involving sa_lock
There are two different deadlock scenarios, but they share a common
link, which is
thread 1 holding sa_lock and trying to get zap->zap_rwlock:
    zap_lockdir_impl+0x858/0x16c0 [zfs]
    zap_lockdir+0xd2/0x100 [zfs]
    zap_lookup_norm+0x7f/0x100 [zfs]
    zap_lookup+0x12/0x20 [zfs]
    sa_setup+0x902/0x1380 [zfs]
    zfsvfs_init+0x3d6/0xb20 [zfs]
    zfsvfs_create+0x5dd/0x900 [zfs]
    zfs_domount+0xa3/0xe20 [zfs]

and thread 2 trying to get sa_lock, either in sa_setup:
   sa_setup+0x742/0x1380 [zfs]
   zfsvfs_init+0x3d6/0xb20 [zfs]
   zfsvfs_create+0x5dd/0x900 [zfs]
   zfs_domount+0xa3/0xe20 [zfs]
or in sa_build_index:
   sa_build_index+0x13d/0x790 [zfs]
   sa_handle_get_from_db+0x368/0x500 [zfs]
   zfs_znode_sa_init.isra.0+0x24b/0x330 [zfs]
   zfs_znode_alloc+0x3da/0x1a40 [zfs]
   zfs_zget+0x39a/0x6e0 [zfs]
   zfs_root+0x101/0x160 [zfs]
   zfs_domount+0x91f/0xea0 [zfs]

From there, there are different locking paths back to something
holding zap->zap_rwlock.

The deadlock scenarios involve multiple different ZFS filesystems
being mounted.  sa_lock is common to these scenarios, and the sa
struct involved is private to a mount.  Therefore, these must be
referring to different sa_lock instances and these deadlocks can't
occur in practice.

The fix, from Brian Behlendorf, is to remove sa_lock from lockdep
coverage by initializing it with MUTEX_NOLOCKDEP.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Jeff Dike <jdike@akamai.com>
Closes #9110
2019-08-19 16:04:26 -07:00
Dominic Pearson ff4b68eedc Linux 5.3 compat: Makefile subdir-m no longer supported
Uses obj-m instead, due to kernel changes.

See LKML: Masahiro Yamada, Tue, 6 Aug 2019 19:03:23 +0900

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Dominic Pearson <dsp@technoanimal.net>
Closes #9169
2019-08-19 15:22:52 -07:00
colmbuckley f6fbe25664 Set "none" scheduler if available (initramfs)
Existing zfs initramfs script logic will attempt to set the 'noop' 
scheduler if it's available on the vdev block devices. Newer kernels 
have the similar 'none' scheduler on multiqueue devices; this change 
alters the initramfs script logic to also attempt to set this scheduler 
if it's available.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Garrett Fields <ghfields@gmail.com>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Signed-off-by: Colm Buckley <colm@tuatha.org>
Closes #9042
2019-08-19 15:11:46 -07:00
Paul Dagnelie 1a26cb6160 Add more refquota tests
It used to be possible for zfs receive (and other operations related 
to clone swap) to bypass refquotas. This can cause a number of issues, 
and there should be an automated test for it.

Added tests for rollback and receive not overriding refquota.

Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #9139
2019-08-19 15:06:53 -07:00
Paul Dagnelie f09fda5071 Cap metaslab memory usage
On systems with large amounts of storage and high fragmentation, a huge 
amount of space can be used by storing metaslab range trees. Since 
metaslabs are only unloaded during a txg sync, and only if they have 
been inactive for 8 txgs, it is possible to get into a state where all 
of the system's memory is consumed by range trees and metaslabs, and 
txgs cannot sync. While ZFS knows how to evict ARC data when needed, 
it has no such mechanism for range tree data. This can result in boot 
hangs for some system configurations.

First, we add the ability to unload metaslabs outside of syncing 
context. Second, we store a multilist of all loaded metaslabs, sorted 
by their selection txg, so we can quickly identify the oldest 
metaslabs.  We use a multilist to reduce lock contention during heavy 
write workloads. Finally, we add logic that will unload a metaslab 
when we're loading a new metaslab, if we're using more than a certain 
fraction of the available memory on range trees.

Reviewed-by: Matt Ahrens <mahrens@delphix.com>
Reviewed-by: George Wilson <gwilson@delphix.com>
Reviewed-by: Sebastien Roy <sebastien.roy@delphix.com>
Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #9128
2019-08-16 09:08:21 -06:00
Michael Niewöhner 9323aad14d initramfs: fixes for (debian) initramfs
* contrib/initramfs: include /etc/default/zfs and /etc/zfs/zfs-functions
At least debian needs /etc/default/zfs and /etc/zfs/zfs-functions for
its initramfs. Include both in build when initramfs is configured.

* contrib/initramfs: include 60-zvol.rules and zvol_id
Include 60-zvol.rules and zvol_id and set udev as predependency instead
of debians zdev. This makes debians additional zdev hook unneeded.

* Correct initconfdir substitution for some distros
Not every Linux distro is using @sysconfdir@/default but @initconfdir@
which is already determined by configure. Let's use it.

* systemd: prevent possible conflict between systemd and sysvinit
Systemd will not load a sysvinit service if a unit exists with the same
name. This prevents conflicts between sysvinit and systemd.
In ZFS there is one sysvinit service that does not have a systemd
service but a target counterpart, zfs-import.target.
Usually it does not make any sense to install both but it is possisble.
Let's prevent any conflict by masking zfs-import.service by default.
This does not harm even if init.d/zfs-import does not exist.

Reviewed-by: Chris Wedgwood <cw@f00f.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Tested-by: Alex Ingram <reimu@reimuhakurei.net>
Tested-by: Dreamcat4 <dreamcat4@gmail.com>
Signed-off-by: Michael Niewöhner <foss@mniewoehner.de>
Closes #7904 
Closes #9089
2019-08-16 09:02:32 -06:00
Serapheim Dimitropoulos 0f8ff49eb6 dmu_tx_wait() hang likely due to cv_signal() in dsl_pool_dirty_delta()
Even though the bug's writeup (Github issue #9136) is very detailed,
we still don't know exactly how we got to that state, thus I wasn't
able to reproduce the bug. That said, we can make an educated guess
combining the information on filled issue with the code.

From the fact that `dp_dirty_total` was 0 (which is less than
`zfs_dirty_data_max`) we know that there was one thread that set it to
0 and then signaled one of the waiters of `dp_spaceavail_cv` [see
`dsl_pool_dirty_delta()` which is also the only place that
`dp_dirty_total` is changed].  Thus, the only logical explaination
then for the bug being hit is that the waiter that just got awaken
didn't go through `dsl_pool_dirty_data()`. Given that this function
is only called by `dsl_pool_dirty_space()` or `dsl_pool_undirty_space()`
I can only think of two possible ways of the above scenario happening:

[1] The waiter didn't call into any of the two functions - which I
    find highly unlikely (i.e. why wait on `dp_spaceavail_cv` to begin
    with?).
[2] The waiter did call in one of the above function but it passed 0 as
    the space/delta to be dirtied (or undirtied) and then the callee
    returned immediately (e.g both `dsl_pool_dirty_space()` and
    `dsl_pool_undirty_space()` return immediately when space is 0).

In any case and no matter how we got there, the easy fix would be to
just broadcast to all waiters whenever `dp_dirty_total` hits 0. That
said and given that we've never hit this before, it would make sense
to think more on why the above situation occured.

Attempting to mimic what Prakash was doing in the issue filed, I
created a dataset with `sync=always` and started doing contiguous
writes in a file within that dataset. I observed with DTrace that even
though we update the pool's dirty data accounting when we would dirty
stuff, the accounting wouldn't be decremented incrementally as we were
done with the ZIOs of those writes (the reason being that
`dbuf_write_physdone()` isn't be called as we go through the override
code paths, and thus `dsl_pool_undirty_space()` is never called). As a
result we'd have to wait until we get to `dsl_pool_sync()` where we
zero out all dirty data accounting for the pool and the current TXG's
metadata.

In addition, as Matt noted and I later verified, the same issue would
arise when using dedup.

In both cases (sync & dedup) we shouldn't have to wait until
`dsl_pool_sync()` zeros out the accounting data. According to the
comment in that part of the code, the reasons why we do the zeroing,
have nothing to do with what we observe:
````
/*
 * We have written all of the accounted dirty data, so our
 * dp_space_towrite should now be zero.  However, some seldom-used
 * code paths do not adhere to this (e.g. dbuf_undirty(), also
 * rounding error in dbuf_write_physdone).
 * Shore up the accounting of any dirtied space now.
 */
dsl_pool_undirty_space(dp, dp->dp_dirty_pertxg[txg & TXG_MASK], txg);
````

Ideally what we want to do is to undirty in the accounting exactly what
we dirty (I use the word ideally as we can still have rounding errors).
This would make the behavior of the system more clear and predictable.

Another interesting issue that I observed with DTrace was that we
wouldn't update any of the pool's dirty data accounting whenever we
would dirty and/or undirty MOS data. In addition, every time we would
change the size of a dbuf through `dbuf_new_size()` we wouldn't update
the accounted space dirtied in the appropriate dirty record, so when
ZIOs are done we would undirty less that we dirtied from the pool's
accounting point of view.

For the first two issues observed (sync & dedup) this patch ensures
that we still update the pool's accounting when we undirty data,
regardless of the write being physical or not.

For changes in the MOS, we first ensure to zero out the pool's dirty
data accounting in `dsl_pool_sync()` after we synced the MOS. Then we
can go ahead and enable the update of the pool's dirty data accounting
wheneve we change MOS data.

Another fix is that we now update the accounting explicitly for
counting errors in `dbuf_write_done()`.

Finally, `dbuf_new_size()` updates the accounted space of the
appropriate dirty record correctly now.

The problem is that we still don't know how the bug came up in the
issue filled. That said the issues fixed seem to be very relevant, so
instead of going with the broadcasting solution right away,
I decided to leave this patch as is.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Signed-off-by: Serapheim Dimitropoulos <serapheim@delphix.com>
External-issue: DLPX-47285
Closes #9137
2019-08-15 17:53:53 -06:00
Tony Nguyen c8bbf7c00b Improve write performance by using dmu_read_by_dnode()
In zfs_log_write(), we can use dmu_read_by_dnode() rather than
dmu_read() thus avoiding unnecessary dnode_hold() calls.

We get a 2-5% performance gain for large sequential_writes tests, >=128K
writes to files with recordsize=8K.

Testing done on Ubuntu 18.04 with 4.15 kernel, 8vCPUs and SSD storage on
VMware ESX.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Nguyen <tony.nguyen@delphix.com>
Closes #9156
2019-08-15 17:36:24 -06:00
Serapheim Dimitropoulos 0e37a0f4f3 Assert that a dnode's bonuslen never exceeds its recorded size
This patch introduces an assertion that can catch pitfalls in
development where there is a mismatch between the size of
reads and writes between a *_phys structure and its respective
in-core structure when bonus buffers are used.

This debugging-aid should be complementary to the verification
done by ztest in ztest_verify_dnode_bt().

A side to this patch is that we now clear out any extra bytes
past a bonus buffer's new size when the buffer is shrinking.

Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tom Caputi <tcaputi@datto.com>
Signed-off-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Closes #8348
2019-08-15 08:44:57 -06:00
Paul Zuchowski e2b31b58e8 Make txg_wait_synced conditional in zfsvfs_teardown
The call to txg_wait_synced in zfsvfs_teardown should
be made conditional on the objset having dirty data.
This can prevent unnecessary txg_wait_synced during
some unmount operations.

Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Zuchowski <pzuchowski@datto.com>
Closes #9115
2019-08-15 08:27:13 -06:00
Paul Dagnelie dc04a8c757 Prevent race in blkptr_verify against device removal
When we check the vdev of the blkptr in zfs_blkptr_verify, we can run 
into a race condition where that vdev is temporarily unavailable. This 
happens when a device removal operation and the old vdev_t has been 
removed from the array, but the new indirect vdev has not yet been 
inserted.

We hold the spa_config_lock while doing our sensitive verification. 
To ensure that we don't deadlock, we only grab the lock if we don't 
have config_writer held. In addition, I had to const the tags of the 
refcounts and the spa_config_lock arguments.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #9112
2019-08-13 21:24:43 -06:00
Chunwei Chen 8e556c5ebc Fix out-of-order ZIL txtype lost on hardlinked files
We should only call zil_remove_async when an object is removed. However,
in current implementation, it is called whenever TX_REMOVE is called. In
the case of hardlinked file, every unlink will generate TX_REMOVE and
causing operations to be dropped even when the object is not removed.

We fix this by only calling zil_remove_async when the file is fully
unlinked.

Reviewed-by: George Wilson <gwilson@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Signed-off-by: Chunwei Chen <david.chen@nutanix.com>
Closes #8769
Closes #9061
2019-08-13 21:21:27 -06:00
Prakash Surya 475ebd763a Fix device expansion when VM is powered off
When running on an ESXi based VM, I've found that "zpool online -e" will
not expand the zpool, if the disk was expanded in ESXi while the VM was
powered off.

For example, take the following scenario:

 1. VM running on top of VMware ESXi
 2. ZFS pool created with a given device "sda" of size 8GB
 3. VM powered off
 4. Device "sda" size expanded to 16GB
 5. VM powered on
 6. "zpool online -e" used on device "sda"

In this situation, after (2) the zpool will be roughly 8GB in size.
After (6), the expectation is the zpool's size will expand to roughly
16GB in size; i.e. expand to the new size of the "sda" device.
Unfortunately, I've seen that after (6), the zpool size does not change.

What's happening is after (5), the EFI label of the "sda" device will be
such that fields "efi_last_u_lba", "efi_last_lba", and "efi_altern_lba"
all reflect the new size of the disk; i.e. "33554398", "33554431", and
"33554431" respectively.

Thus, the check that we perform in "efi_use_whole_disk":

    if ((efi_label->efi_altern_lba == 1) || (efi_label->efi_altern_lba
        >= efi_label->efi_last_lba)) {

This will return true, and then we return from the function without
having expanded the size of the zpool/device.

In contrast, if we remove steps (3) and (5) in the sequence above, i.e.
the device is expanded while the VM is powered on, things change. In
that case, the fields "efi_last_u_lba" and "efi_altern_lba" do not
change (i.e. they still reflect the old 8GB device size), but the
"efi_last_lba" field does change (i.e. it now reflects the new 16GB
device size). Thus, when we evaluate the same conditional in
"efi_use_whole_disk", it'll return false, so the zpool is expanded.

Taking all of this into account, this PR updates "efi_use_whole_disk" to
properly expand the zpool when the underlying disk is expanded while the
VM is powered off.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Wilson <gwilson@delphix.com>
Reviewed-by: Don Brady <don.brady@delphix.com>
Signed-off-by: Prakash Surya <prakash.surya@delphix.com>
Closes #9111
2019-08-13 21:18:53 -06:00
Allan Jude d2a32912b9 Mark dsl_livelist_should_disable() static
This function is not used outside of dsl_dataset.c

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed by: Sara Hartse <sara.hartse@delphix.com>
Signed-off-by: Allan Jude <allanjude@freebsd.org>
Closes #9154
2019-08-13 21:16:23 -06:00
George Wilson c8242a96ba spa_load_verify() may consume too much memory
When a pool is imported it will scan the pool to verify the integrity 
of the data and metadata. The amount it scans will depend on the 
import flags provided. On systems with small amounts of memory or 
when importing a pool from the crash kernel, it's possible for 
spa_load_verify to issue too many I/Os that it consumes all the memory 
of the system resulting in an OOM message or a hang.

To prevent this, we limit the amount of memory that the initial pool
scan can consume. This change will, by default, use 1/16th of the ARC
for scan I/Os to prevent running the system out of memory during import.

Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Signed-off-by: George Wilson george.wilson@delphix.com
External-issue: DLPX-65237
External-issue: DLPX-65238
Closes #9146
2019-08-13 08:11:57 -06:00
Tomohiro Kusumi a43570c5f3 Change boolean-like uint8_t fields in znode_t to boolean_t
Given znode_t is an in-core structure, it's more readable to have
them as boolean. Also co-locate existing boolean fields with them
for space efficiency (expecting 8 booleans to be packed/aligned).

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@gmail.com>
Closes #9092
2019-08-13 07:58:02 -06:00
Richard Yao fccbd1d6e2 Drop KMC_NOEMERGENCY
This is not implemented. If it were implemented, using it would risk
deadlocks on pre-3.18 kernels. Lets just drop it.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Michael Niewöhner <foss@mniewoehner.de>
Signed-off-by: Richard Yao <ryao@gentoo.org>
Closes #9119
2019-08-13 07:46:12 -06:00
Serapheim Dimitropoulos 3b9edd7b17 Introduce getting holds and listing bookmarks through ZCP
Consumers of ZFS Channel Programs can now list bookmarks,
and get holds from datasets. A minor-refactoring was also
applied to distinguish between user and system properties
in ZCP.

Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Ahrens <mahrens@delphix.com>
Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Ported-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Signed-off-by: Dan Kimmel <dan.kimmel@delphix.com>

OpenZFS-issue: https://illumos.org/issues/8862
Closes #7902
2019-08-12 10:02:34 -07:00
Serapheim Dimitropoulos 2081db7982 Sort log spacemap tunables in alphabetical order
Beside the whole commit being a nit in reality it should
bring the diffs of the spa_log_spacemap.c source file
between ZoL and delphix/zfs to 0.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Chris Dunlop <chris@onthe.net.au>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Closes #9143
2019-08-12 09:49:07 -07:00
Paul Dagnelie c81f1790e2 Metaslab max_size should be persisted while unloaded
When we unload metaslabs today in ZFS, the cached max_size value is
discarded. We instead use the histogram to determine whether or not we
think we can satisfy an allocation from the metaslab. This can result in
situations where, if we're doing I/Os of a size not aligned to a
histogram bucket, a metaslab is loaded even though it cannot satisfy the
allocation we think it can. For example, a metaslab with 16 entries in
the 16k-32k bucket may have entirely 16kB entries. If we try to allocate
a 24kB buffer, we will load that metaslab because we think it should be
able to handle the allocation. Doing so is expensive in CPU time, disk
reads, and average IO latency. This is exacerbated if the write being
attempted is a sync write.

This change makes ZFS cache the max_size after the metaslab is
unloaded. If we ever get a free (or a coalesced group of frees) larger
than the max_size, we will update it. Otherwise, we leave it as is. When
attempting to allocate, we use the max_size as a lower bound, and
respect it unless we are in try_hard. However, we do age the max_size
out at some point, since we expect the actual max_size to increase as we
do more frees. A more sophisticated algorithm here might be helpful, but
this works reasonably well.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Ahrens <mahrens@delphix.com>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #9055
2019-08-05 14:34:27 -07:00
DeHackEd 99e755d653 Don't wakeup unnecessarily in 'zpool events -f'
ZED can prevent CPU's from properly sleeping.

Rather than periodically waking up in the zevents code, just go to sleep and wait for a wakeup.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: DHE <git@dehacked.net>
Closes #9091
2019-08-05 11:35:47 -07:00
Serapheim Dimitropoulos 8098465558 Test cancelling a removal in ZTS
This patch adds a new test that sanity checks cancelling a removal.

Reviewed-by: Matt Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Closes #9101
2019-08-05 10:50:20 -07:00
jdike 48be0dfba1 lockdep false positive - move txg_kick() outside of ->dp_lock
This fixes a lockdep warning by breaking a link between ->tx_sync_lock
and ->dp_lock.

The deadlock envisioned by lockdep is this:
    thread 1 holds db->db_mtx and tries to get dp->dp_lock:
	dsl_pool_dirty_space+0x70/0x2d0 [zfs]
	dbuf_dirty+0x778/0x31d0 [zfs]

    thread 2 holds bpo->bpo_lock and tries to get db->db_mtx:
        dmu_buf_will_dirty_impl
        dmu_buf_will_dirty+0x6b/0x6c0 [zfs]
        bpobj_iterate_impl+0xbe6/0x1410 [zfs]

    thread 3 holds tx->tx_sync_lock and tries to get bpo->bpo_lock:
        bpobj_space+0x63/0x470 [zfs]
        dsl_scan_active+0x340/0x3d0 [zfs]
        txg_sync_thread+0x3f2/0x1370 [zfs]

    thread 4 holds dp->dp_lock and tries to get tx->tx_sync_lock
       txg_kick+0x61/0x420 [zfs]
       dsl_pool_need_dirty_delay+0x1c7/0x3f0 [zfs]

This patch is orginally from Brian Behlendorf and slightly simplified
by me.

It breaks this cycle in thread 4 by moving the call from
dsl_pool_need_dirty_delay to txg_kick outside the section controlled
by dp->dp_lock.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Ahrens <mahrens@delphix.com>
Signed-off-by: Jeff Dike <jdike@akamai.com>
Closes #9094
2019-07-31 14:53:39 -07:00
Serapheim Dimitropoulos cae97c88c4 List log_spacemap feature in zpool-features.5 manual
Update zpool-features.5 manpage to describe the log_spacemap feature.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Signed-off-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Closes #9096
2019-07-31 09:29:01 -07:00
Clint Armstrong f489458ad7 Add channel program for property based snapshots
Channel programs that many users find useful should be included with zfs
in the /contrib directory. This is the first of these contributions. A
channel program to recursively take snapshots of datasets with the
property com.sun:auto-snapshot=true.

Reviewed-by: Kash Pande <kash@tripleback.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Clint Armstrong <clint@clintarmstrong.net>
Closes #8443 
Closes #9050
2019-07-30 16:02:19 -07:00
Serapheim Dimitropoulos 1ba4f3e7b4 9072 handle error of zap_cursor_retrieve() for log spacemap zap
In spa_ld_log_sm_metadata(), it is possible for zap_cursor_retrieve()
to return errors other than the expected ENOENT (e.g. when we are at
the end of the zap). Ensure that these error cases are handled
correctly by the import path.

Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: Sara Hartse <sara.hartse@delphix.com>
Reviewed by: Matt Ahrens <matt@delphix.com>
Signed-off-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Closes #9074
2019-07-30 13:20:01 -07:00
Serapheim Dimitropoulos 2fcf4481a6 mismerged log spacemap comment for metaslab_verify_weight_and_frag
When the log spacemap commit was merged in ZoL, the
metaslab_verify_unflushed_changes() debugging function
was deleted as the feature was pretty much stable by
then. Unfortunately though there was a reference to
it from a comment in metaslab_verify_weight_and_frag().

This patch deletes the reference and pastes that
comment as is.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Ahrens <mahrens@delphix.com>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Closes #9097
2019-07-30 10:13:44 -07:00
Michael Niewöhner a6c8289473 install path fixes
* rpm: correct pkgconfig path

pkconfig files get installed to $datarootdir/pkgconfig but rpm expects
them to be at $datadir. This works when $datarootdir==$datadir which is
the case most of the time but will fail when they differ.

* install: make initramfs-tools path static

Since initramfs-tools' path is nothing we can control as it is an
external package it does not make any sense to install zfs additions
anywhere else. Simply use /usr/share/initramfs-tools as path.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Signed-off-by: Michael Niewöhner <foss@mniewoehner.de>
Closes #9087
2019-07-30 10:06:09 -07:00
Michael Niewöhner 85ce79bbc8 Increase default zcmd allocation to 256K
When creating hundreds of clones (for example using containers with
LXD) cloning slows down as the number of clones increases over time.
The reason for this is that the fetching of the clone information
using a small zcmd buffer requires two ioctl calls, one to determine
the size and a second to return the data. However, this requires
gathering the data twice, once to determine the size and again to
populate the zcmd buffer to return it to userspace.
These are expensive ioctl() calls, so instead, make the default buffer
size much larger: 256K.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Michael Niewöhner <foss@mniewoehner.de>
Closes #9084
2019-07-30 09:59:38 -07:00
Matthew Ahrens 0eb8ba6ab6 Improve performance by using dmu_tx_hold_*_by_dnode()
In zfs_write() and dmu_tx_hold_sa(), we can use dmu_tx_hold_*_by_dnode()
instead of dmu_tx_hold_*(), since we already have a dbuf from the target
dnode in hand.  This eliminates some calls to dnode_hold(), which can be
expensive.  This is especially impactful if several threads are
accessing objects that are in the same block of dnodes, because they
will contend for that dbuf's lock.

We are seeing 10-20% performance wins for the sequential_writes tests in
the performance test suite, when doing >=128K writes to files with
recordsize=8K.

This also removes some unnecessary casts that are in the area.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #9081
2019-07-30 09:18:30 -07:00
Brian Behlendorf 1e620c9872 Revert "Develop tests for issues #5866 and #8858"
This reverts commit 693c1fc478.  This
change resulted in a kmem leak being observed in existing code which
needs to be identified and addressed.

Reviewed-by: Paul Zuchowski <pzuchowski@datto.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #8978
Closes #9090
2019-07-29 12:46:56 -07:00
Brian Behlendorf adf495e239 Fix channel programs on s390x
When adapting the original sources for s390x the JMP_BUF_CNT was
mistakenly halved due to an incorrect assumption of the size of
a unsigned long.  They are 8 bytes for the s390x architecture.
Increase JMP_BUF_CNT accordingly.

Authored-by: Don Brady <don.brady@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reported-by: Colin Ian King <canonical.com>
Tested-by: Colin Ian King <canonical.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #8992
Closes #9080
2019-07-28 18:15:26 -07:00
George Wilson 453bb4791e Race between zfs-share and zfs-mount services
When a system boots the zfs-mount.service and the
zfs-share.service can start simultaneously. What may be
unclear is that sharing a filesystem will first mount
the filesystem if it's not already mounted. This means
that both service can race to mount the same fileystem.
This race can result in a SEGFAULT or EBUSY conditions.

This change explicitly defines the start ordering between the
two services such that the zfs-mount.service is solely
responsible for mounting filesystems eliminating the race
between "zfs mount -a" and "zfs share -a" commands.

Reviewed-by: Sebastien Roy <sebastien.roy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Wilson <george.wilson@delphix.com>
Closes #9083
2019-07-28 18:13:56 -07:00
Paul Zuchowski 693c1fc478 Develop tests for issues #5866 and #8858
Provide zfstest coverage for these two issues which
were a panic accessing extended attributes and
a problem comparing 64 bit and 32 bit generation
numbers.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Zuchowski <pzuchowski@datto.com>
Issue #5866
Issue #8858 
Closes #8978
2019-07-26 17:52:13 -07:00
Tomohiro Kusumi 9fb6abe5ad Implement secpolicy_vnode_setid_retain()
Don't unconditionally return 0 (i.e. retain SUID/SGID).
Test CAP_FSETID capability.

https://github.com/pjd/pjdfstest/blob/master/tests/chmod/12.t
which expects SUID/SGID to be dropped on write(2) by non-owner fails
without this. Most filesystems make this decision within VFS by using
a generic file write for fops.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@gmail.com>
Closes #9035 
Closes #9043
2019-07-26 13:52:30 -07:00
Matthew Ahrens 4b5c9d9f97 zed crashes when devid not present
zed core dumps due to a NULL pointer in zfs_agent_iter_vdev(). The
gs_devid is NULL, but the nvl has a "devid" entry.

zfs_agent_post_event() checks that ZFS_EV_VDEV_GUID or DEV_IDENTIFIER is
present in nvl, but then later it and zfs_agent_iter_vdev() assume that
DEV_IDENTIFIER is present and thus gs_devid is set.

Typically this is not a problem because usually either all vdevs have
devid's, or none of them do. Since zfs_agent_iter_vdev() first checks if
the vdev has devid before dereferencing gs_devid, the problem isn't
typically encountered. However, if some vdevs have devid's and some do
not, then the problem is easily reproduced.  This can happen if the pool
has been moved from a system that has devid's to one that does not.

The fix is for zfs_agent_iter_vdev() to only try to match the devid's if
both nvl and gsp have devid's present.

Reviewed-by: Prashanth Sreenivasa <pks@delphix.com>
Reviewed-by: Don Brady <don.brady@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: loli10K <ezomori.nozomu@gmail.com>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
External-issue: DLPX-65090
Closes #9054
Closes #9060
2019-07-26 12:07:48 -07:00
Sara Hartse 37f03da8ba Fast Clone Deletion
Deleting a clone requires finding blocks are clone-only, not shared
with the snapshot. This was done by traversing the entire block tree
which results in a large performance penalty for sparsely
written clones.

This is new method keeps track of clone blocks when they are
modified in a "Livelist" so that, when it’s time to delete,
the clone-specific blocks are already at hand.

We see performance improvements because now deletion work is
proportional to the number of clone-modified blocks, not the size
of the original dataset.

Reviewed-by: Sean Eric Fagan <sef@ixsystems.com>
Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Signed-off-by: Sara Hartse <sara.hartse@delphix.com>
Closes #8416
2019-07-26 10:54:14 -07:00
Tomohiro Kusumi d274ac5460 Don't directly cast unsigned long to void*
Cast to uintptr_t first for portability on integer to/from pointer
conversion.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@gmail.com>
Closes #9065
2019-07-25 11:59:20 -07:00
Matthew Ahrens 1ff46825e2 Replace zf_rwlock with a mutex
The rwlock implementation on linux does not perform as well as mutexes.
We can realize a performance benefit by replacing the zf_rwlock with a
mutex.  Local microbenchmarks show ~50% improvement, and over NFS we see
~5% improvement on several of the ZFS Performance Tests cases,
especially randwrite and seq_write.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Reviewed-by: Olaf Faaland <faaland1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #9062
2019-07-25 11:57:58 -07:00
Tomohiro Kusumi 09276fde1c Fix module_param() type for zfs_read_chunk_size
zfs_read_chunk_size is unsigned long.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@gmail.com>
Closes #9051
2019-07-19 11:23:56 -07:00
Tony Hutter d84c5a120e Move some tests to cli_user/zpool_status
The tests in tests/functional/cli_root/zpool_status should all require
root. However, linux.run has "user =" specified for those tests, which
means they run as a normal user.  When I removed that line to run them
as root, the following tests did not pass:

zpool_status_003_pos
zpool_status_-c_disable
zpool_status_-c_homedir
zpool_status_-c_searchpath

These tests need to be run as a normal user.  To fix this, move these
tests to a new tests/functional/cli_user/zpool_status directory.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Giuseppe Di Natale <guss80@gmail.com>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #9057
2019-07-19 11:21:54 -07:00
Serapheim Dimitropoulos 7f31908913 Tricky semantics of ms_max_size in metaslab_should_allocate()
metaslab_should_allocate() is used in two places:
[1] When trying to select a metaslab to allocate from
[2] When trying to allocate from a metaslab

In [2] we always expect the metaslab to be loaded, and after
the refactoring of the log spacemap changes, whenever we load
a metaslab we set ms_max_size to the biggest range in the
ms_allocatable tree. Thus, when it is used in [2], if that
field is 0, it means that the metaslab doesn't have any
segments that can be used for allocations now (though it may
have some free space but that space can be in the freeing,
freed, or deferred trees).

In [1] a metaslab can be loaded or unloaded at which point 0
can either mean the metaslab doesn't have any space or the
metaslab is just not loaded thus we go ahead and try to make
an estimation based on its weight.

The issue here is when we call the above function for [2] and
the metaslab doesn't have any allocatable space, we still go
ahead and check its ms_weight which may be out of date because
we haven't ran metaslab_sync_done() yet. At that point we are
allowing an allocation to be attempted even though we know
there is no range that is allocatable.

This patch fixes this issue by explicitly checking if the
metaslab is loaded and if it is, the ms_max_size is used.

Reviewed-by: Matt Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Closes #9045
2019-07-19 11:19:50 -07:00
Serapheim Dimitropoulos 43a8536260 Race condition between spa async threads and export
In the past we've seen multiple race conditions that have
to do with open-context threads async threads and concurrent
calls to spa_export()/spa_destroy() (including the one
referenced in issue #9015).

This patch ensures that only one thread can execute the
main body of spa_export_common() at a time, with subsequent
threads returning with a new error code created just for
this situation, eliminating this way any race condition
bugs introduced by concurrent calls to this function.

Reviewed by: Matt Ahrens <matt@delphix.com>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Closes #9015 
Closes #9044
2019-07-18 13:02:33 -07:00
Serapheim Dimitropoulos 1c44a5c97f hdr_recl calls zthr_wakeup() on destroyed zthr
There exists a race condition were hdr_recl() calls
zthr_wakeup() on a destroyed zthr. The timeline is the
following:

[1] hdr_recl() runs first and goes intro zthr_wakeup()
    because arc_initialized is set.
[2] arc_fini() is called by another thread, zeroes
    that flag, destroying the zthr, and goes into
    buf_init().
[3] hdr_recl() tries to enter the destroyed mutex
    and we blow up.

This patch ensures that the ARC's zthrs are not offloaded
any new work once arc_initialized is set and then destroys
them after all of the ARC state has been deleted.

Reviewed by: Matt Ahrens <matt@delphix.com>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Closes #9047
2019-07-18 12:55:29 -07:00
Serapheim Dimitropoulos bac15c1198 zdb: don't print log spacemap stats in pools without the feature
Creating a pool with not features enabled and running
`zdb -mmmmmm on` it before the patch:

```
Log Space Maps in Pool:

Log Space Map Obsolete Entry Statistics:
0        valid entries out of 0        - txg 0
0        valid entries out of 0        - total
```

After this patch the above output goes away.

Reviewed by: Matt Ahrens <matt@delphix.com>
Reviewed by: Sara Hartse <sara.hartse@delphix.com>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Closes #9048
2019-07-18 12:54:03 -07:00
Tomohiro Kusumi f79121d114 Fix wrong comment on zcr_blksz_{min,max}
These aren't tunable; illumos has this comment fixed in
"3742 zfs comments need cleaner, more consistent style",
so sync with that.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@gmail.com>
Closes #9052
2019-07-18 12:48:46 -07:00
Pavel Zakharov 26b6047469 New service that waits on zvol links to be created
The zfs-volume-wait.service scans existing zvols and waits for their
links under /dev to be created. Any service that depends on zvol
links to be there should add a dependency on zfs-volumes.target.
By default, this target is not enabled.

Reviewed-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Reviewed-by: Antonio Russo <antonio.e.russo@gmail.com>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: loli10K <ezomori.nozomu@gmail.com>
Reviewed-by: John Gallagher <john.gallagher@delphix.com>
Reviewed-by: George Wilson <gwilson@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Pavel Zakharov <pzakharov@delphix.com>
Closes #8975
2019-07-17 15:33:05 -07:00
Brian Behlendorf d64dd3b62a Retire unused spl_{mutex,rwlock}_{init_fini}
These functions are unused and can be removed along
with the spl-mutex.c and spl-rwlock.c source files.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tomohiro Kusumi <kusumi.tomohiro@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #9029
2019-07-17 15:13:53 -07:00
Brian Behlendorf e7a99dab2b Linux 5.3 compat: retire rw_tryupgrade()
The Linux kernel's rwsem's have never provided an interface to
allow a reader to be upgraded to a writer.  Historically, this
functionality has been implemented by a SPL wrapper function.
However, this approach depends on internal knowledge of the
rw_semaphore and is therefore rather brittle.

Since the ZFS code must always be able to fallback to rw_exit()
and rw_enter() when an rw_tryupgrade() fails; this functionality
isn't critical.  Furthermore, the only potentially performance
sensitive consumer is dmu_zfetch() and no decrease in performance
was observed with this change applied.  See the PR comments for
additional testing details.

Therefore, it is being retired to make the build more robust and
to simplify the rwlock implementation.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tomohiro Kusumi <kusumi.tomohiro@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #9029
2019-07-17 15:08:56 -07:00
Brian Behlendorf 041205afee Linux 5.3 compat: rw_semaphore owner
Commit https://github.com/torvalds/linux/commit/94a9717b updated the
rwsem's owner field to contain additional flags describing the rwsem's
state.  Rather then update the wrappers to mask out these bits, the
code no longer relies on the owner stored by the kernel.  This does
increase the size of a krwlock_t but it makes the implementation
less sensitive to future kernel changes.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tomohiro Kusumi <kusumi.tomohiro@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #9029
2019-07-17 15:07:46 -07:00
jdike a649768a17 Fix lockdep recursive locking false positive in dbuf_destroy
lockdep reports a possible recursive lock in dbuf_destroy.

It is true that dbuf_destroy is acquiring the dn_dbufs_mtx
on one dnode while holding it on another dnode.  However,
it is impossible for these to be the same dnode because,
among other things,dbuf_destroy checks MUTEX_HELD before
acquiring the mutex.

This fix defines a class NESTED_SINGLE == 1 and changes
that lock to call mutex_enter_nested with a subclass of
NESTED_SINGLE.

In order to make the userspace code compile,
include/sys/zfs_context.h now defines mutex_enter_nested and
NESTED_SINGLE.

This is the lockdep report:

[  122.950921] ============================================
[  122.950921] WARNING: possible recursive locking detected
[  122.950921] 4.19.29-4.19.0-debug-d69edad5368c1166 #1 Tainted: G           O
[  122.950921] --------------------------------------------
[  122.950921] dbu_evict/1457 is trying to acquire lock:
[  122.950921] 0000000083e9cbcf (&dn->dn_dbufs_mtx){+.+.}, at: dbuf_destroy+0x3c0/0xdb0 [zfs]
[  122.950921]
               but task is already holding lock:
[  122.950921] 0000000055523987 (&dn->dn_dbufs_mtx){+.+.}, at: dnode_evict_dbufs+0x90/0x740 [zfs]
[  122.950921]
               other info that might help us debug this:
[  122.950921]  Possible unsafe locking scenario:

[  122.950921]        CPU0
[  122.950921]        ----
[  122.950921]   lock(&dn->dn_dbufs_mtx);
[  122.950921]   lock(&dn->dn_dbufs_mtx);
[  122.950921]
                *** DEADLOCK ***

[  122.950921]  May be due to missing lock nesting notation

[  122.950921] 1 lock held by dbu_evict/1457:
[  122.950921]  #0: 0000000055523987 (&dn->dn_dbufs_mtx){+.+.}, at: dnode_evict_dbufs+0x90/0x740 [zfs]
[  122.950921]
               stack backtrace:
[  122.950921] CPU: 0 PID: 1457 Comm: dbu_evict Tainted: G           O      4.19.29-4.19.0-debug-d69edad5368c1166 #1
[  122.950921] Hardware name: Supermicro H8SSL-I2/H8SSL-I2, BIOS 080011  03/13/2009
[  122.950921] Call Trace:
[  122.950921]  dump_stack+0x91/0xeb
[  122.950921]  __lock_acquire+0x2ca7/0x4f10
[  122.950921]  lock_acquire+0x153/0x330
[  122.950921]  dbuf_destroy+0x3c0/0xdb0 [zfs]
[  122.950921]  dbuf_evict_one+0x1cc/0x3d0 [zfs]
[  122.950921]  dbuf_rele_and_unlock+0xb84/0xd60 [zfs]
[  122.950921]  dnode_evict_dbufs+0x3a6/0x740 [zfs]
[  122.950921]  dmu_objset_evict+0x7a/0x500 [zfs]
[  122.950921]  dsl_dataset_evict_async+0x70/0x480 [zfs]
[  122.950921]  taskq_thread+0x979/0x1480 [spl]
[  122.950921]  kthread+0x2e7/0x3e0
[  122.950921]  ret_from_fork+0x27/0x50

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Jeff Dike <jdike@akamai.com>
Closes #8984
2019-07-17 09:18:24 -07:00
Brian Behlendorf 095b5412b3 Fix CONFIG_X86_DEBUG_FPU build failure
When CONFIG_X86_DEBUG_FPU is defined the alternatives_patched symbol
is pulled in as a dependency which results in a build failure.  To
prevent this undefine CONFIG_X86_DEBUG_FPU to disable the WARN_ON_FPU()
macro and rely on WARN_ON_ONCE debugging checks which were previously
added.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #9041
Closes #9049
2019-07-17 09:14:36 -07:00
Michael Niewöhner 5784c7c36e Add missing __GFP_HIGHMEM flag to vmalloc
Make use of __GFP_HIGHMEM flag in vmem_alloc, which is required for
some 32-bit systems to make use of full available memory.
While kernel versions >=4.12-rc1 add this flag implicitly, older
kernels do not.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Sebastian Gottschall <s.gottschall@dd-wrt.com>
Signed-off-by: Michael Niewöhner <foss@mniewoehner.de>
Closes #9031
2019-07-17 09:09:22 -07:00
Tomohiro Kusumi c8802ba08d Use zfsctl_snapshot_hold() wrapper
zfs_refcount_*() are to be wrapped by zfsctl_snapshot_*() in this file.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@gmail.com>
Closes #9039
2019-07-17 09:07:53 -07:00
Brian Behlendorf 8062b7686a Minor style cleanup
Resolve an assortment of style inconsistencies including
use of white space, typos, capitalization, and line wrapping.
There is no functional change.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #9030
2019-07-16 17:22:31 -07:00
Brian Behlendorf 3b03ff2276 Fix get_special_prop() build failure
The cast of the size_t returned by strlcpy() to a uint64_t by the
VERIFY3U can result in a build failure when CONFIG_FORTIFY_SOURCE
is set.  This is due to the additional hardening.  Since the token
is expected to always fit in strval the VERIFY3U has been removed.
If somehow it doesn't, it will still be safely truncated.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Don Brady <don.brady@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #8999
Closes #9020
2019-07-16 14:14:12 -07:00
Mike Gerdts d45d7f08fa Add zfs create dryrun
Adds the ability to sanity check zfs create arguments and to see the
value of any additional properties that will local to the dataset.  For
example, automation that may need to adjust quota on a parent filesystem
before creating a volume may call `zfs create -nP -V <size> <volume>` to
obtain the value of refreservation.  This adds the following options to
zfs create:

- -n dry-run (no-op)
- -v verbose
- -P parseable (implies verbose)

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed-by: Jerry Jelinek <jerry.jelinek@joyent.com>
Signed-off-by: Mike Gerdts <mike.gerdts@joyent.com>
Closes #8974
2019-07-16 11:19:24 -07:00
Serapheim Dimitropoulos 93e28d661e Log Spacemap Project
= Motivation

At Delphix we've seen a lot of customer systems where fragmentation
is over 75% and random writes take a performance hit because a lot
of time is spend on I/Os that update on-disk space accounting metadata.
Specifically, we seen cases where 20% to 40% of sync time is spend
after sync pass 1 and ~30% of the I/Os on the system is spent updating
spacemaps.

The problem is that these pools have existed long enough that we've
touched almost every metaslab at least once, and random writes
scatter frees across all metaslabs every TXG, thus appending to
their spacemaps and resulting in many I/Os. To give an example,
assuming that every VDEV has 200 metaslabs and our writes fit within
a single spacemap block (generally 4K) we have 200 I/Os. Then if we
assume 2 levels of indirection, we need 400 additional I/Os and
since we are talking about metadata for which we keep 2 extra copies
for redundancy we need to triple that number, leading to a total of
1800 I/Os per VDEV every TXG.

We could try and decrease the number of metaslabs so we have less
I/Os per TXG but then each metaslab would cover a wider range on
disk and thus would take more time to be loaded in memory from disk.
In addition, after it's loaded, it's range tree would consume more
memory.

Another idea would be to just increase the spacemap block size
which would allow us to fit more entries within an I/O block
resulting in fewer I/Os per metaslab and a speedup in loading time.
The problem is still that we don't deal with the number of I/Os
going up as the number of metaslabs is increasing and the fact
is that we generally write a lot to a few metaslabs and a little
to the rest of them. Thus, just increasing the block size would
actually waste bandwidth because we won't be utilizing our bigger
block size.

= About this patch

This patch introduces the Log Spacemap project which provides the
solution to the above problem while taking into account all the
aforementioned tradeoffs. The details on how it achieves that can
be found in the references sections below and in the code (see
Big Theory Statement in spa_log_spacemap.c).

Even though the change is fairly constraint within the metaslab
and lower-level SPA codepaths, there is a side-change that is
user-facing. The change is that VDEV IDs from VDEV holes will no
longer be reused. To give some background and reasoning for this,
when a log device is removed and its VDEV structure was replaced
with a hole (or was compacted; if at the end of the vdev array),
its vdev_id could be reused by devices added after that. Now
with the pool-wide space maps recording the vdev ID, this behavior
can cause problems (e.g. is this entry referring to a segment in
the new vdev or the removed log?). Thus, to simplify things the
ID reuse behavior is gone and now vdev IDs for top-level vdevs
are truly unique within a pool.

= Testing

The illumos implementation of this feature has been used internally
for a year and has been in production for ~6 months. For this patch
specifically there don't seem to be any regressions introduced to
ZTS and I have been running zloop for a week without any related
problems.

= Performance Analysis (Linux Specific)

All performance results and analysis for illumos can be found in
the links of the references. Redoing the same experiments in Linux
gave similar results. Below are the specifics of the Linux run.

After the pool reached stable state the percentage of the time
spent in pass 1 per TXG was 64% on average for the stock bits
while the log spacemap bits stayed at 95% during the experiment
(graph: sdimitro.github.io/img/linux-lsm/PercOfSyncInPassOne.png).

Sync times per TXG were 37.6 seconds on average for the stock
bits and 22.7 seconds for the log spacemap bits (related graph:
sdimitro.github.io/img/linux-lsm/SyncTimePerTXG.png). As a result
the log spacemap bits were able to push more TXGs, which is also
the reason why all graphs quantified per TXG have more entries for
the log spacemap bits.

Another interesting aspect in terms of txg syncs is that the stock
bits had 22% of their TXGs reach sync pass 7, 55% reach sync pass 8,
and 20% reach 9. The log space map bits reached sync pass 4 in 79%
of their TXGs, sync pass 7 in 19%, and sync pass 8 at 1%. This
emphasizes the fact that not only we spend less time on metadata
but we also iterate less times to convergence in spa_sync() dirtying
objects.
[related graphs:
stock- sdimitro.github.io/img/linux-lsm/NumberOfPassesPerTXGStock.png
lsm- sdimitro.github.io/img/linux-lsm/NumberOfPassesPerTXGLSM.png]

Finally, the improvement in IOPs that the userland gains from the
change is approximately 40%. There is a consistent win in IOPS as
you can see from the graphs below but the absolute amount of
improvement that the log spacemap gives varies within each minute
interval.
sdimitro.github.io/img/linux-lsm/StockVsLog3Days.png
sdimitro.github.io/img/linux-lsm/StockVsLog10Hours.png

= Porting to Other Platforms

For people that want to port this commit to other platforms below
is a list of ZoL commits that this patch depends on:

Make zdb results for checkpoint tests consistent
db587941c5

Update vdev_is_spacemap_addressable() for new spacemap encoding
419ba59145

Simplify spa_sync by breaking it up to smaller functions
8dc2197b7b

Factor metaslab_load_wait() in metaslab_load()
b194fab0fb

Rename range_tree_verify to range_tree_verify_not_present
df72b8bebe

Change target size of metaslabs from 256GB to 16GB
c853f382db

zdb -L should skip leak detection altogether
21e7cf5da8

vs_alloc can underflow in L2ARC vdevs
7558997d2f

Simplify log vdev removal code
6c926f426a

Get rid of space_map_update() for ms_synced_length
425d3237ee

Introduce auxiliary metaslab histograms
928e8ad47d

Error path in metaslab_load_impl() forgets to drop ms_sync_lock
8eef997679

= References

Background, Motivation, and Internals of the Feature
- OpenZFS 2017 Presentation:
youtu.be/jj2IxRkl5bQ
- Slides:
slideshare.net/SerapheimNikolaosDim/zfs-log-spacemaps-project

Flushing Algorithm Internals & Performance Results
(Illumos Specific)
- Blogpost:
sdimitro.github.io/post/zfs-lsm-flushing/
- OpenZFS 2018 Presentation:
youtu.be/x6D2dHRjkxw
- Slides:
slideshare.net/SerapheimNikolaosDim/zfs-log-spacemap-flushing-algorithm

Upstream Delphix Issues:
DLPX-51539, DLPX-59659, DLPX-57783, DLPX-61438, DLPX-41227, DLPX-59320
DLPX-63385

Reviewed-by: Sean Eric Fagan <sef@ixsystems.com>
Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed-by: George Wilson <gwilson@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Closes #8442
2019-07-16 10:11:49 -07:00
Antonio Russo df834a7ccc Enable zfs-mount-generator by default
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Antonio Russo <antonio.e.russo@gmail.com>
Closes #8750
Closes #8848
2019-07-15 16:31:57 -07:00
Antonio Russo f88d069cbb systemd encryption key support
Modify zfs-mount-generator to produce a dependency on new
zfs-import-key-*.service units, dynamically created at boot to call
zfs load-key for the encryption root, before attempting to mount any
encrypted datasets.

These units are created by zfs-mount-generator, and RequiresMountsFor on
the keyfile, if present, or call systemd-ask-password if a passphrase is
requested.

This patch includes suggestions from @Fabian-Gruenbichler, @ryanjaeb and
@rlaager, as well an adaptation of @rlaager's script to retry on
incorrect password entry.

Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Antonio Russo <antonio.e.russo@gmail.com>
Closes #8750
Closes #8848
2019-07-15 16:31:47 -07:00
Tomohiro Kusumi 6993e01202 Drop redundant POSIX ACL check in zpl_init_acl()
ZFS_ACLTYPE_POSIXACL has already been tested in zpl_init_acl(),
so no need to test again on POSIX ACL access.

Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@gmail.com>
Closes #9009
2019-07-15 16:26:52 -07:00
Brian Behlendorf 64f3d39ae4 Export dnode symbols
External consumers such as Lustre require access to the dnode
interfaces in order to correctly manipulate dnodes.

Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Olaf Faaland <faaland1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #8994
Closes #9027
2019-07-15 16:11:55 -07:00
Tom Caputi 9949b856a0 Ensure dsl_destroy_head() decrypts objsets
This patch corrects a small issue where the dsl_destroy_head()
code that runs when the async_destroy feature is disabled would
not properly decrypt the dataset before beginning processing.
If the dataset is not able to be decrypted, the optimization
code now simply does not run and the dataset is completely
destroyed in the DSL sync task.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #9021
2019-07-15 16:08:42 -07:00
Tomohiro Kusumi ff9630d1a8 Disable unused pathname::pn_path* (unneeded in Linux)
struct pathname is originally from Solaris VFS, and it has been used
in ZoL to merely call VOP from Linux VFS interface without API change,
therefore pathname::pn_path* are unused and unneeded. Technically,
struct pathname is a wrapper for C string in ZoL.

Saves stack a bit on lookup and unlink.

(#if0'd members instead of removing since comments refer to them.)

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@gmail.com>
Closes #9025
2019-07-15 13:57:56 -07:00
Brian Behlendorf e5db313494 Linux 5.0 compat: SIMD compatibility
Restore the SIMD optimization for 4.19.38 LTS, 4.14.120 LTS,
and 5.0 and newer kernels.  This is accomplished by leveraging
the fact that by definition dedicated kernel threads never need
to concern themselves with saving and restoring the user FPU state.
Therefore, they may use the FPU as long as we can guarantee user
tasks always restore their FPU state before context switching back
to user space.

For the 5.0 and 5.1 kernels disabling preemption and local
interrupts is sufficient to allow the FPU to be used.  All non-kernel
threads will restore the preserved user FPU state.

For 5.2 and latter kernels the user FPU state restoration will be
skipped if the kernel determines the registers have not changed.
Therefore, for these kernels we need to perform the additional
step of saving and restoring the FPU registers.  Invalidating the
per-cpu global tracking the FPU state would force a restore but
that functionality is private to the core x86 FPU implementation
and unavailable.

In practice, restricting SIMD to kernel threads is not a major
restriction for ZFS.  The vast majority of SIMD operations are
already performed by the IO pipeline.  The remaining cases are
relatively infrequent and can be handled by the generic code
without significant impact.  The two most noteworthy cases are:

  1) Decrypting the wrapping key for an encrypted dataset,
     i.e. `zfs load-key`.  All other encryption and decryption
     operations will use the SIMD optimized implementations.

  2) Generating the payload checksums for a `zfs send` stream.

In order to avoid making any changes to the higher layers of ZFS
all of the `*_get_ops()` functions were updated to take in to
consideration the calling context.  This allows for the fastest
implementation to be used as appropriate (see kfpu_allowed()).

The only other notable instance of SIMD operations being used
outside a kernel thread was at module load time.  This code
was moved in to a taskq in order to accommodate the new kernel
thread restriction.

Finally, a few other modifications were made in order to further
harden this code and facilitate testing.  They include updating
each implementations operations structure to be declared as a
constant.  And allowing "cycle" to be set when selecting the
preferred ops in the kernel as well as user space.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #8754 
Closes #8793 
Closes #8965
2019-07-12 09:31:20 -07:00
Nick Mattis d230a65c3b Fixes: #8934 Large kmem_alloc
Large allocation over the spl_kmem_alloc_warn value was being performed.
Switched to vmem_alloc interface as specified for large allocations.
Changed the subsequent frees to match.

Reviewed-by: Tom Caputi <tcaputi@datto.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: nmattis <nickm970@gmail.com>
Closes #8934
Closes #9011
2019-07-10 15:54:49 -07:00
Attila Fülöp c3fba9091b Fix ZTS killed processes detection
log_neg_expect was using the wrong exit status to detect if a process
got killed by SIGSEGV or SIGBUS, resulting in false positives.

Reviewed-by: loli10K <ezomori.nozomu@gmail.com>
Reviewed by: John Kennedy <john.kennedy@delphix.com>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Attila Fülöp <attila@fueloep.org>
Closes #9003
2019-07-10 11:44:52 -07:00
Shaun Tancheff 02fad9260a pkg-utils python sitelib for SLES15
Use python -Esc to set __python_sitelib.

Reviewed-by: Neal Gompa <ngompa@datto.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Shaun Tancheff <stancheff@cray.com>
Closes #8969
2019-07-09 13:02:40 -07:00
Tomohiro Kusumi ab5036df1c Fix race in parallel mount's thread dispatching algorithm
Strategy of parallel mount is as follows.

1) Initial thread dispatching is to select sets of mount points that
 don't have dependencies on other sets, hence threads can/should run
 lock-less and shouldn't race with other threads for other sets. Each
 thread dispatched corresponds to top level directory which may or may
 not have datasets to be mounted on sub directories.

2) Subsequent recursive thread dispatching for each thread from 1)
 is to mount datasets for each set of mount points. The mount points
 within each set have dependencies (i.e. child directories), so child
 directories are processed only after parent directory completes.

The problem is that the initial thread dispatching in
zfs_foreach_mountpoint() can be multi-threaded when it needs to be
single-threaded, and this puts threads under race condition. This race
appeared as mount/unmount issues on ZoL for ZoL having different
timing regarding mount(2) execution due to fork(2)/exec(2) of mount(8).
`zfs unmount -a` which expects proper mount order can't unmount if the
mounts were reordered by the race condition.

There are currently two known patterns of input list `handles` in
`zfs_foreach_mountpoint(..,handles,..)` which cause the race condition.

1) #8833 case where input is `/a /a /a/b` after sorting.
 The problem is that libzfs_path_contains() can't correctly handle an
 input list with two same top level directories.
 There is a race between two POSIX threads A and B,
  * ThreadA for "/a" for test1 and "/a/b"
  * ThreadB for "/a" for test0/a
 and in case of #8833, ThreadA won the race. Two threads were created
 because "/a" wasn't considered as `"/a" contains "/a"`.

2) #8450 case where input is `/ /var/data /var/data/test` after sorting.
 The problem is that libzfs_path_contains() can't correctly handle an
 input list containing "/".
 There is a race between two POSIX threads A and B,
  * ThreadA for "/" and "/var/data/test"
  * ThreadB for "/var/data"
 and in case of #8450, ThreadA won the race. Two threads were created
 because "/var/data" wasn't considered as `"/" contains "/var/data"`.
 In other words, if there is (at least one) "/" in the input list,
 the initial thread dispatching must be single-threaded since every
 directory is a child of "/", meaning they all directly or indirectly
 depend on "/".

In both cases, the first non_descendant_idx() call fails to correctly
determine "path1-contains-path2", and as a result the initial thread
dispatching creates another thread when it needs to be single-threaded.
Fix a conditional in libzfs_path_contains() to consider above two.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: Sebastien Roy <sebastien.roy@delphix.com>
Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@gmail.com>
Closes #8450
Closes #8833
Closes #8878
2019-07-09 09:31:46 -07:00
loli10K a54f92b4b1 Fix dracut Debian/Ubuntu packaging
This commit ensures make(1) targets that build .deb packages fail if
alien(1) can't convert all .rpm files; additionally it also updates
the zfs-dracut package name which was changed to "noarch" in ca4e5a7.

Reviewed-by: Neal Gompa <ngompa@datto.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Olaf Faaland <faaland1@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #8990 
Closes #8991
2019-07-09 09:28:05 -07:00
loli10K 1d20b763bb zfs send does not handle invalid input gracefully
Due to some changes introduced in 30af21b 'zfs send' can crash when
provided with invalid inputs: this change attempts to add more checks
to the affected code paths.

Reviewed-by: Attila Fülöp <attila@fueloep.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #9001
2019-07-08 15:10:23 -07:00
Paul Dagnelie f664f1ee7f Decrease contention on dn_struct_rwlock
Currently, sequential async write workloads spend a lot of time 
contending on the dn_struct_rwlock. This lock is responsible for 
protecting the entire block tree below it; this naturally results 
in some serialization during heavy write workloads. This can be 
resolved by having per-dbuf locking, which will allow multiple 
writers in the same object at the same time.

We introduce a new rwlock, the db_rwlock. This lock is responsible 
for protecting the contents of the dbuf that it is a part of; when 
reading a block pointer from a dbuf, you hold the lock as a reader. 
When writing data to a dbuf, you hold it as a writer. This allows 
multiple threads to write to different parts of a file at the same 
time.

Reviewed by: Brad Lewis <brad.lewis@delphix.com>
Reviewed by: Matt Ahrens matt@delphix.com
Reviewed by: George Wilson george.wilson@delphix.com
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
External-issue: DLPX-52564
External-issue: DLPX-53085
External-issue: DLPX-57384
Closes #8946
2019-07-08 13:18:50 -07:00
Brad Lewis cb70964221 8659 static dtrace probes unavailable on non-GPL modules
ZFS tracing efforts are hampered by the inability to access zfs static
probes(probes using DTRACE_PROBE macros). The probes are available via
tracepoints for GPL modules only.  The build could be modified to
generate a function for each unique DTRACE_PROBE invocation. These could
be then accessed via kprobes.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com>
Signed-off-by: Brad Lewis <brad.lewis@delphix.com>
Closes #8659 
Closes #8663
2019-07-08 11:20:53 -07:00
Brian Behlendorf 1086f54219 Revert "Fail early on bio corruption confirmed on 5.2-rc1"
This reverts commit aa7aab6c45.
The change is not compatible with CentOS 6's 2.6.32 based kernel
due to differnces in the bio layer.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #8961
2019-07-05 20:38:56 -07:00
Tom Caputi 52a83dc694 Remove VERIFY from dsl_dataset_crypt_stats()
This patch fixes an issue where dsl_dataset_crypt_stats() would
VERIFY that it was able to hold the encryption root. This function
should instead silently continue without populating the related
field in the nvlist, as is the convention for this code.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #8976
2019-07-05 16:53:14 -07:00
Paul Dagnelie fe0ea84812 Don't activate metaslabs with weight 0
We return ENOSPC in metaslab_activate if the metaslab has weight 0, 
to avoid activating a metaslab with no space available.  For sanity 
checking, we also assert that there is no free space in the range 
tree in that case.

Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed by: Matt Ahrens <matt@delphix.com>
Reviewed by: Serapheim Dimitropoulos <serapheim.dimitro@delphix.com>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #8968
2019-07-05 16:45:20 -07:00
loli10K 3b5fe2c351 Fix zfs "redact" misc issues
* zfs redact error messages do not end with newline character
 * 30af21b0 inadvertently removed some ZFS_PROP comments
 * man/zfs: zfs redact <redaction_snapshot> is not optional

Reviewed-by: Giuseppe Di Natale <guss80@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #8988
2019-07-05 16:38:17 -07:00
Mike Gerdts 341166c843 OpenZFS 9318 - vol_volsize_to_reservation does not account for raidz skip blocks
When a volume is created in a pool with raidz vdevs and
volblocksize != 128k, the volume can reference more space than is
reserved with the automatically calculated refreservation.  There
are two deficiencies in vol_volsize_to_reservation that contribute
to this:

  1) Skip blocks may be added to keep each allocation a multiple
     of parity + 1. This is the dominating factor when volblocksize
     is close to 2^ashift.

  2) raidz deflation for 128 KB blocks is different for most other
     block sizes.

See "The theory of raidz space accounting" comment in
libzfs_dataset.c for a full explanation.

Authored by: Mike Gerdts <mike.gerdts@joyent.com>
Reviewed by: Richard Elling <Richard.Elling@RichardElling.com>
Reviewed by: Sanjay Nadkarni <sanjay.nadkarni@nexenta.com>
Reviewed by: Jerry Jelinek <jerry.jelinek@joyent.com>
Reviewed by: Matt Ahrens <matt@delphix.com>
Reviewed by: Kody Kantor <kody.kantor@joyent.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Approved by: Dan McDonald <danmcd@joyent.com>
Ported-by: Mike Gerdts <mike.gerdts@joyent.com>

Porting Notes:
* ZTS: wait for zvols to exist before writing
* ZTS: use log_must_busy with {zpool|zfs} destroy

OpenZFS-issue: https://www.illumos.org/issues/9318
OpenZFS-commit: https://github.com/illumos/illumos-gate/commit/b73ccab0
Closes #8973
2019-07-05 15:35:15 -07:00
Paul Zuchowski 6dbca94f0c Improve "Unable to automount" error message.
Having the mountpoint and dataset name both in the message made it
confusing to read.  Additionally, convert this to a zfs_dbgmsg rather than
sending it to the console.

Reviewed-by: Tom Caputi <tcaputi@datto.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Zuchowski <pzuchowski@datto.com>
Closes #8959
2019-07-03 13:05:02 -07:00
Tomohiro Kusumi aa7aab6c45 Fail early on bio corruption confirmed on 5.2-rc1
Unable to import zpool with "Large kmem_alloc" warning due to
corrupted bio's with invalid # of page vectors.
See #8867 for details.

Fail early with ENOMEM.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@gmail.com>
Closes #8867 
Closes #8961
2019-07-03 13:03:22 -07:00
Brian Behlendorf 46db9d6161 Check b_freeze_cksum under ZFS_DEBUG_MODIFY conditional
The b_freeze_cksum field can only have data when ZFS_DEBUG_MODIFY
is set.  Therefore, the EQUIV check must be wrapped accordingly.
For the same reason the ASSERT in arc_buf_fill() in unsafe.
However, since it's largely redundant it has simply been removed.

Reviewed-by: George Wilson <gwilson@delphix.com>
Reviewed-by: Allan Jude <allanjude@freebsd.org>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #8979
2019-07-03 13:01:54 -07:00
Matthew Ahrens aa79910726 Fix typo in zpool-features.5, section bookmark_written
The full property name includes "delphix", not "delphxi".

Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #8985
2019-07-03 12:57:05 -07:00
Tom Caputi 2ba59fa9f1 Fix error text for EINVAL in zfs_receive_one()
This small patch fixes the EINVAL case for zfs_receive_one(). A
missing 'else' has been added to the two possible cases, which
will ensure the intended error message is printed.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: loli10K <ezomori.nozomu@gmail.com>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #8977
2019-07-02 17:29:59 -07:00
Tomohiro Kusumi df358db7c3 Don't use d_path() for automount mount point for chroot'd process
Chroot'd process fails to automount snapshots due to realpath(3)
failure in mount.zfs(8).

Construct a mount point path from sb of the ctldir inode and dirent
name, instead of from d_path(), so that chroot'd process doesn't get
affected by its view of fs.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@gmail.com>
Closes #8903 
Closes #8966
2019-07-02 08:25:23 -07:00
loli10K ac6d54160a Fix zfs(8) mandoc errors
This was accidentally introduced in 765d1f06:

mandoc: ./man/man8/zfs.8: ERROR: skipping item outside list: It Ar filesystem Ns | Ns Ar mountpoint
mandoc: ./man/man8/zfs.8: ERROR: skipping item outside list: It Xo
mandoc: ./man/man8/zfs.8: ERROR: skipping end of block that is not open: Xc
mandoc: ./man/man8/zfs.8: ERROR: skipping item outside list: It Xo
mandoc: ./man/man8/zfs.8: ERROR: skipping end of block that is not open: Xc

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #8980
2019-07-02 08:19:55 -07:00
George Wilson 681a85cb01 nopwrites on dmu_sync-ed blocks can result in a panic
After device removal, performing nopwrites on a dmu_sync-ed block
will result in a panic. This panic can show up in two ways:
1. an attempt to issue an IOCTL in vdev_indirect_io_start()
2. a failed comparison of zio->io_bp and zio->io_bp_orig in
   zio_done()
To resolve both of these panics, nopwrites of blocks on indirect
vdevs should be ignored and new allocations should be performed on
concrete vdevs.

Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Don Brady <don.brady@delphix.com>
Signed-off-by: George Wilson <gwilson@delphix.com>
Closes #8957
2019-06-28 12:40:23 -07:00
Tom Caputi 765d1f0644 Add 'zfs umount -u' for encrypted datasets
This patch adds the ability for the user to unload keys for
datasets as they are being unmounted. This is analogous to
'zfs mount -l'.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alek Pinchuk <apinchuk@datto.com>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes: #8917
Closes: #8952
2019-06-28 12:38:37 -07:00
Paul Dagnelie 679b0f2abf Concurrent small allocation defeats large allocation
With the new parallel allocators scheme, there is a possibility for 
a problem where two threads, allocating from the same allocator at 
the same time, conflict with each other. There are two primary cases 
to worry about. First, another thread working on another allocator
activates the same metaslab that the first thread was trying to
activate. This results in the first thread needing to go back and
reselect a new metaslab, even though it may have waited a long time
for this metaslab to load. Second, another thread working on the same
allocator may have activated a different metaslab while the first
thread was waiting for its metaslab to load. Both of these cases
can cause the first thread to be significantly delayed in issuing 
its IOs. The second case can also cause metaslab load/unload churn; 
because the metaslab is loaded but not fully activated, we never set 
the selected_txg, which results in the metaslab being immediately 
unloaded again. This process can repeat many times, wasting disk and 
cpu resources. This is more likely to happen when the IO of the first 
thread is a larger one (like a ZIL write) and the other thread is 
doing a smaller write, because it is more likely to find an 
acceptable metaslab quickly.

There are two primary changes. The first is to always proceed with 
the allocation when returning from metaslab_activate if we were 
preempted in either of the ways described in the previous section. 
The second change is to set the selected_txg before we do the call 
to activate so that even if the metaslab is not used for an 
allocation, we won't immediately attempt to unload it.

Reviewed by: Jerry Jelinek <jerry.jelinek@joyent.com>
Reviewed by: Matt Ahrens <matt@delphix.com>
Reviewed by: Serapheim Dimitropoulos <serapheim.dimitro@delphix.com>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
External-issue: DLPX-61314
Closes #8843
2019-06-26 11:00:12 -07:00
Paul Dagnelie 3fab4d9e08 zdb -vvvvv on ztest pool dies with "out of memory"
ztest creates some extremely large files as part of its 
operation. When zdb tries to dump a large enough file, it 
can run out of memory or spend an extremely long time 
attempting to print millions or billions of uint64_ts.

We cap the amount of data from a uint64 object that we 
are willing to read and print.

Reviewed-by: Don Brady <don.brady@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
External-issue: DLPX-53814
Closes #8947
2019-06-25 12:50:37 -07:00
Alexander Motin fc7546777b Avoid extra taskq_dispatch() calls by DMU
DMU sync code calls taskq_dispatch() for each sublist of os_dirty_dnodes
and os_synced_dnodes.  Since the number of sublists by default is equal
to number of CPUs, it will dispatch equal, potentially large, number of
tasks, waking up many CPUs to handle them, even if only one or few of
sublists actually have any work to do.

This change adds check for empty sublists to avoid this.

Reviewed by: Sean Eric Fagan <sef@ixsystems.com>
Reviewed by: Matt Ahrens <matt@delphix.com>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:  Alexander Motin <mav@FreeBSD.org>
Closes #8909
2019-06-25 12:03:38 -07:00
loli10K 5279ae918b Redacted Send/Receive causes zdb to dump core
When used with verbosity >= 4 zdb fails an assertion in dump_bookmarks()
because it expects snprintf() to retun 0 on success.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #8948
2019-06-24 18:06:26 -07:00
loli10K 746d4a451e Fix bp_embedded_type enum definition
With the addition of BP_EMBEDDED_TYPE_REDACTED in 30af21b0 a couple of
codepaths make wrong assumptions and could potentially result in errors.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Chris Dunlop <chris@onthe.net.au>
Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #8951
2019-06-24 18:02:17 -07:00
Igor K 13d454c6ca -Y option for zdb is valid
The -Y option was added for ztest to test split block reconstruction.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com>
Signed-off-by: Igor Kozhukhov <igor@dilos.org>
Closes #8926
2019-06-24 17:58:11 -07:00
Matthew Ahrens 59ec30a329 Remove code for zfs remap
The "zfs remap" command was disabled by
6e91a72fe3, because it has little utility
and introduced some tricky bugs.  This commit removes the code for it,
the associated ZFS_IOC_REMAP ioctl, and tests.

Note that the ioctl and property will remain, but have no functionality.
This allows older software to fail gracefully if it attempts to use
these, and avoids a backwards incompatibility that would be introduced if
we renumbered the later ioctls/props.

Reviewed-by: Tom Caputi <tcaputi@datto.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #8944
2019-06-24 16:44:01 -07:00
Tom Caputi 53864800f6 Fix error message on promoting encrypted dataset
This patch corrects the error message reported when attempting
to promote a dataset outside of its encryption root.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #8905 
Closes #8935
2019-06-24 16:42:52 -07:00
Brian Behlendorf 8f12a4f8d2 Fix out-of-tree build failures
Resolve the incorrect use of srcdir and builddir references for
various files in the build system.  These have crept in over time
and went unnoticed because when building in the top level directory
srcdir and builddir are identical.

With this change it's again possible to build in a subdirectory.

    $ mkdir obj
    $ cd obj
    $ ../configure
    $ make

Reviewed-by: loli10K <ezomori.nozomu@gmail.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Don Brady <don.brady@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #8921 
Closes #8943
2019-06-24 09:32:47 -07:00
Tomohiro Kusumi cc9625c47c Add missing .gitignore from "Implement Redacted Send/Receive"
30af21b025 needed to ignore get_diff executable.

Reviewed-by: loli10K <ezomori.nozomu@gmail.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@gmail.com>
Closes #8950
2019-06-23 11:51:29 -07:00
Don Brady 186898bbb5 OpenZFS 9425 - channel programs can be interrupted
Problem Statement
=================
ZFS Channel program scripts currently require a timeout, so that hung or
long-running scripts return a timeout error instead of causing ZFS to get
wedged. This limit can currently be set up to 100 million Lua instructions.
Even with a limit in place, it would be desirable to have a sys admin
(support engineer) be able to cancel a script that is taking a long time.

Proposed Solution
=================
Make it possible to abort a channel program by sending an interrupt signal.In
the underlying txg_wait_sync function, switch the cv_wait to a cv_wait_sig to
catch the signal. Once a signal is encountered, the dsl_sync_task function can
install a Lua hook that will get called before the Lua interpreter executes a
new line of code. The dsl_sync_task can resume with a standard txg_wait_sync
call and wait for the txg to complete.  Meanwhile, the hook will abort the
script and indicate that the channel program was canceled. The kernel returns
a EINTR to indicate that the channel program run was canceled.

Porting notes: Added missing return value from cv_wait_sig()

Authored by: Don Brady <don.brady@delphix.com>
Reviewed by: Sebastien Roy <sebastien.roy@delphix.com>
Reviewed by: Serapheim Dimitropoulos <serapheim.dimitro@delphix.com>
Reviewed by: Matt Ahrens <matt@delphix.com>
Reviewed by: Sara Hartse <sara.hartse@delphix.com>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Approved by: Robert Mustacchi <rm@joyent.com>
Ported-by: Don Brady <don.brady@delphix.com>
Signed-off-by: Don Brady <don.brady@delphix.com>

OpenZFS-issue: https://www.illumos.org/issues/9425
OpenZFS-commit: https://github.com/illumos/illumos-gate/commit/d0cb1fb926
Closes #8904
2019-06-22 16:51:46 -07:00
Matthew Ahrens cb9e5b7e84 dn_struct_rwlock can not be held in dmu_tx_try_assign()
The thread calling dmu_tx_try_assign() can't hold the dn_struct_rwlock
while assigning the tx, because this can lead to deadlock. Specifically,
if this dnode is already assigned to an earlier txg, this thread may
need to wait for that txg to sync (the ERESTART case below).  The other
thread that has assigned this dnode to an earlier txg prevents this txg
from syncing until its tx can complete (calling dmu_tx_commit()), but it
may need to acquire the dn_struct_rwlock to do so (e.g. via
dmu_buf_hold*()).

This commit adds an assertion to dmu_tx_try_assign() to ensure that this
deadlock is not inadvertently introduced.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #8929
2019-06-22 16:48:54 -07:00
gordan-bobic ca4e5a785f Remove arch and relax version dependency
Remove arch and relax version dependency for zfs-dracut
package.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Gordan Bobic <gordan@redsleeve.org>
Issue #8913 
Closes #8914
2019-06-22 16:47:18 -07:00
Harry Mallon 8b14cb46bf Add libnvpair to libzfs pkg-config
Functions such as `fnvlist_lookup_nvlist` need libnvpair to be linked.
Default pkg-config file did not contain it.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Harry Mallon <hjmallon@gmail.com>
Closes #8919
2019-06-22 16:43:11 -07:00
Don Brady a9cd8bfde7 Let zfs mount all tolerate in-progress mounts
The zfs-mount service can unexpectedly fail to start when zfs 
encounters a mount that is in progress. This service uses 
zfs mount -a, which has a window between the time it checks if 
the dataset was mounted and when the actual mount (via mount.zfs 
binary) occurs.

The reason for the racing mounts is that both zfs-mount.target 
and zfs-share.target are allowed to execute concurrently after 
the import.  This is more of an issue with the relatively recent 
addition of parallel mounting, and we should consider serializing
the mount and share targets.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Allan Jude <allanjude@freebsd.org>
Signed-off-by: Don Brady <don.brady@delphix.com>
Closes #8881
2019-06-22 16:41:21 -07:00
Allan Jude fb6e6f1ffb zstreamdump: add per-record-type counters and an overhead counter
Count the bytes of payload for each replication record type

Count the bytes of overhead (replication records themselves)

Include these counters in the output summary at the end of the run.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Ahrens <mahrens@delphix.com>
Signed-off-by: Allan Jude <allanjude@freebsd.org>
Sponsored-By: Klara Systems and Catalogic
Closes #8432
2019-06-22 16:33:44 -07:00
Paul Dagnelie 2b09628b59 Fix comments on zfs_bookmark_phys
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Ahrens <mahrens@delphix.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #8945
2019-06-22 16:32:26 -07:00
Tomohiro Kusumi d5bf1cf179 Fix build break by "Implement Redacted Send/Receive"
30af21b025 broke build on Fedora. gcc can detect potential overflow
on compile-time. Consider strlen of already copied string.

Also change strn to strl variants per suggestion from @behlendorf
and @ofaaland.

--
libzfs_input_check.c: In function 'test_redact':
libzfs_input_check.c:711:2: error: 'strncat' specified bound 288 equals
 destination size [-Werror=stringop-overflow=]
  strncat(bookmark, "#testbookmark", sizeof (bookmark));
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Olaf Faaland <faaland1@llnl.gov>
Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@gmail.com>
Closes #8939
2019-06-22 16:30:59 -07:00
Paul Dagnelie a370182fed Add SCSI_PASSTHROUGH to zvols to enable UNMAP support
When exporting ZVOLs as SCSI LUNs, by default Windows will not
issue them UNMAP commands. This reduces storage efficiency in
many cases.

We add the SCSI_PASSTHROUGH flag to the zvol's device queue,
which lets the SCSI target logic know that it can handle SCSI
commands.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: John Gallagher <john.gallagher@delphix.com>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #8933
2019-06-21 09:40:56 -07:00
loli10K 3976fd65d3 Redacted Send/Receive broke zfs(8) help message
Since 30af21b0 was merged 'zfs send' help message format is broken
and lists "-r" as a valid option: this commit corrects these
small issues.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #8942
2019-06-21 09:38:15 -07:00
Tomohiro Kusumi 9585497208 Prevent pointer to an out-of-scope local variable
`show_str` could be a pointer to a local variable in stack
which is out-of-scope by the time
`return (snprintf(buf, buflen, "%s\n", show_str));`
is called.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@gmail.com>
Closes #8924 
Closes #8940
2019-06-20 18:31:52 -07:00
Matthew Ahrens accd6d9dc4 dedup=verify doesn't clear the blkptr's dedup flag
The logic to handle strong checksum collisions where the data doesn't
match is incorrect. It is not clearing the dedup bit of the blkptr,
which can cause a panic later in zio_ddt_free() due to the dedup table
not matching what is in the blkptr.

Reviewed-by: Tom Caputi <tcaputi@datto.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
External-issue: DLPX-48097
Closes #8936
2019-06-20 18:30:40 -07:00
Igor K a64f8276c7 Update vdev_ops_t from illumos
Align vdev_ops_t from illumos for better compatibility.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Igor Kozhukhov <igor@dilos.org>
Closes #8925
2019-06-20 18:29:02 -07:00
Tom Caputi da68988708 Allow unencrypted children of encrypted datasets
When encryption was first added to ZFS, we made a decision to
prevent users from creating unencrypted children of encrypted
datasets. The idea was to prevent users from inadvertently
leaving some of their data unencrypted. However, since the
release of 0.8.0, some legitimate reasons have been brought up
for this behavior to be allowed. This patch simply removes this
limitation from all code paths that had checks for it and updates
the tests accordingly.

Reviewed-by: Jason King <jason.king@joyent.com>
Reviewed-by: Sean Eric Fagan <sef@ixsystems.com>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #8737 
Closes #8870
2019-06-20 12:29:51 -07:00
dacianstremtan 84b4201f32 Replace whereis with type in zfs-lib.sh
The whereis command should not be used since it may not exist 
in the initramfs.  The dracut plymouth module also uses the type
command instead of whereis.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Garrett Fields <ghfields@gmail.com>
Signed-off-by: Dacian Reece-Stremtan <dacianstremtan@gmail.com>
Closes #8920 
Closes #8938
2019-06-20 12:27:14 -07:00
Matthew Ahrens 050d720c43 Remove dedupditto functionality
If dedup is in use, the `dedupditto` property can be set, causing ZFS to
keep an extra copy of data that is referenced many times (>100x).  The
idea was that this data is more important than other data and thus we
want to be really sure that it is not lost if the disk experiences a
small amount of random corruption.

ZFS (and system administrators) rely on the pool-level redundancy to
protect their data (e.g. mirroring or RAIDZ).  Since the user/sysadmin
doesn't have control over what data will be offered extra redundancy by
dedupditto, this extra redundancy is not very useful.  The bulk of the
data is still vulnerable to loss based on the pool-level redundancy.
For example, if particle strikes corrupt 0.1% of blocks, you will either
be saved by mirror/raidz, or you will be sad.  This is true even if
dedupditto saved another 0.01% of blocks from being corrupted.

Therefore, the dedupditto functionality is rarely enabled (i.e. the
property is rarely set), and it fulfills its promise of increased
redundancy even more rarely.

Additionally, this feature does not work as advertised (on existing
releases), because scrub/resilver did not repair the extra (dedupditto)
copy (see https://github.com/zfsonlinux/zfs/pull/8270).

In summary, this seldom-used feature doesn't work, and even if it did it
wouldn't provide useful data protection.  It has a non-trivial
maintenance burden (again see https://github.com/zfsonlinux/zfs/pull/8270).

We should remove the dedupditto functionality.  For backwards
compatibility with the existing CLI, "zpool set dedupditto" will still
"succeed" (exit code zero), but won't have any effect.  For backwards
compatibility with existing pools that had dedupditto enabled at some
point, the code will still be able to understand dedupditto blocks and
free them when appropriate.  However, ZFS won't write any new dedupditto
blocks.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Alek Pinchuk <apinchuk@datto.com>
Issue #8270 
Closes #8310
2019-06-19 14:54:02 -07:00
Tomohiro Kusumi fb0be12d7b Use ZFS_DEV macro instead of literals
The rest of the code/comments use ZFS_DEV, so sync with that.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com>
Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@gmail.com>
Closes #8912
2019-06-19 12:27:31 -07:00
Michael Niewöhner 0b755ec3d5 Fix memory leak in check_disk()
Reviewed-by: Allan Jude <allanjude@freebsd.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com>
Signed-off-by: Michael Niewöhner <foss@mniewoehner.de>
Closes #8897  
Closes #8911
2019-06-19 11:53:37 -07:00
Olaf Faaland c308b1dd63 kmod-zfs-devel rpm should provide kmod-spl-devel
When configure is run with --with-spec=redhat, and rpms are built, the
kmod-zfs-devel package is missing

Provides: kmod-spl-devel = %{version}

which is required by software such as Lustre which builds against zfs
kmods.  Adding it makes it easier for such software to build against
both zfs-0.7 (where SPL is separate and may be missing) and zfs-0.8.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Closes #8930
2019-06-19 11:44:44 -07:00
Brian Behlendorf 4ca457b065 ZTS: Fix mmp_interval failure
The mmp_interval test case was failing on Fedora 30 due to the built-in
'echo' command terminating the script when it was unable to write to
the sysfs module parameter.  This change in behavior was observed with
ksh-2020.0.0-alpha1.  Resolve the issue by using the external cat
command which fails gracefully as expected.

Additionally, remove some incorrect quotes around the $? return values.

Reviewed-by: Giuseppe Di Natale <guss80@gmail.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Olaf Faaland <faaland1@llnl.gov>
Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #8906
2019-06-19 10:39:28 -07:00
Paul Dagnelie 30af21b025 Implement Redacted Send/Receive
Redacted send/receive allows users to send subsets of their data to 
a target system. One possible use case for this feature is to not 
transmit sensitive information to a data warehousing, test/dev, or 
analytics environment. Another is to save space by not replicating 
unimportant data within a given dataset, for example in backup tools 
like zrepl.

Redacted send/receive is a three-stage process. First, a clone (or 
clones) is made of the snapshot to be sent to the target. In this 
clone (or clones), all unnecessary or unwanted data is removed or
modified. This clone is then snapshotted to create the "redaction 
snapshot" (or snapshots). Second, the new zfs redact command is used 
to create a redaction bookmark. The redaction bookmark stores the 
list of blocks in a snapshot that were modified by the redaction 
snapshot(s). Finally, the redaction bookmark is passed as a parameter 
to zfs send. When sending to the snapshot that was redacted, the
redaction bookmark is used to filter out blocks that contain sensitive 
or unwanted information, and those blocks are not included in the send 
stream.  When sending from the redaction bookmark, the blocks it 
contains are considered as candidate blocks in addition to those 
blocks in the destination snapshot that were modified since the 
creation_txg of the redaction bookmark.  This step is necessary to 
allow the target to rehydrate data in the case where some blocks are 
accidentally or unnecessarily modified in the redaction snapshot.

The changes to bookmarks to enable fast space estimation involve 
adding deadlists to bookmarks. There is also logic to manage the 
life cycles of these deadlists.

The new size estimation process operates in cases where previously 
an accurate estimate could not be provided. In those cases, a send 
is performed where no data blocks are read, reducing the runtime 
significantly and providing a byte-accurate size estimate.

Reviewed-by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed-by: Matt Ahrens <mahrens@delphix.com>
Reviewed-by: Prashanth Sreenivasa <pks@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Reviewed-by: Chris Williamson <chris.williamson@delphix.com>
Reviewed-by: Pavel Zhakarov <pavel.zakharov@delphix.com>
Reviewed-by: Sebastien Roy <sebastien.roy@delphix.com>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #7958
2019-06-19 09:48:12 -07:00
Alexander Motin c1b5801bb5 Minimize aggsum_compare(&arc_size, arc_c) calls.
For busy ARC situation when arc_size close to arc_c is desired.  But
then it is quite likely that aggsum_compare(&arc_size, arc_c) will need
to flush per-CPU buckets to find exact comparison result.  Doing that
often in a hot path penalizes whole idea of aggsum usage there, since it
replaces few simple atomic additions with dozens of lock acquisitions.

Replacing aggsum_compare() with aggsum_upper_bound() in code increasing
arc_p when ARC is growing (arc_size < arc_c) according to PMC profiles
allows to save ~5% of CPU time in aggsum code during sequential write
to 12 ZVOLs with 16KB block size on large dual-socket system.

I suppose there some minor arc_p behavior change due to lower precision
of the new code, but I don't think it is a big deal, since it should
affect only very small window in time (aggsum buckets are flushed every
second) and in ARC size (buckets are limited to 10 average ARC blocks
per CPU).

Reviewed-by: Chris Dunlop <chris@onthe.net.au>
Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Allan Jude <allanjude@freebsd.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:  Alexander Motin <mav@FreeBSD.org>
Closes #8901
2019-06-14 14:07:33 -07:00
Ryan Moeller b1b4ac2708 Python config cleanup
Don't require Python at configure/build unless building pyzfs.
Move ZFS_AC_PYTHON_MODULE to always-pyzfs.m4 where it is used.
Make test syntax more consistent.

Sponsored by: iXsystems, Inc.
Reviewed-by: Neal Gompa <ngompa@datto.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
Closes #8895
2019-06-13 13:15:46 -07:00
Matthew Ahrens 7218b29e4b lz4_decompress_abd declared but not defined
`lz4_decompress_abd` is declared in zio_compress.h but it is not defined
anywhere. The declaration should be removed.

Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed-by: Allan Jude <allanjude@freebsd.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
External-issue: DLPX-47477
Closes #8894
2019-06-13 13:14:34 -07:00
Matthew Ahrens 53dce5acc6 panic in removal_remap test on 4K devices
If the zfs_remove_max_segment tunable is changed to be not a multiple of
the sector size, then the device removal code will malfunction and try
to create mappings that are smaller than one sector, leading to a panic.

On debug bits this assertion will fail in spa_vdev_copy_segment():
    ASSERT3U(DVA_GET_ASIZE(&dst), ==, size);

On nondebug, the system panics with a stack like:
    metaslab_free_concrete()
    metaslab_free_impl()
    metaslab_free_impl_cb()
    vdev_indirect_remap()
    free_from_removing_vdev()
    metaslab_free_impl()
    metaslab_free_dva()
    metaslab_free()

Fortunately, the default for zfs_remove_max_segment is 1MB, so this
can't occur by default.  We hit it during this test because
removal_remap.ksh changes zfs_remove_max_segment to 1KB. When testing on
4KB-sector disks, we hit the bug.

This change makes the zfs_remove_max_segment tunable more robust,
automatically rounding it up to a multiple of the sector size. We also
turn some key assertions into VERIFY's so that similar bugs would be
caught before they are encoded on disk (and thus avoid a
panic-reboot-loop).

Reviewed-by: Sean Eric Fagan <sef@ixsystems.com>
Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed-by: Sebastien Roy <sebastien.roy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
External-issue: DLPX-61342
Closes #8893
2019-06-13 13:12:39 -07:00
Matthew Ahrens be89734a29 compress metadata in later sync passes
Starting in sync pass 5 (zfs_sync_pass_dont_compress), we disable
compression (including of metadata).  Ostensibly this helps the sync
passes to converge (i.e. for a sync pass to not need to allocate
anything because it is 100% overwrites).

However, in practice it increases the average number of sync passes,
because when we turn compression off, a lot of block's size will change
and thus we have to re-allocate (not overwrite) them.  It also increases
the number of 128KB allocations (e.g. for indirect blocks and spacemaps)
because these will not be compressed.  The 128K allocations are
especially detrimental to performance on highly fragmented systems,
which may have very few free segments of this size, and may need to load
new metaslabs to satisfy 128K allocations.

We should increase zfs_sync_pass_dont_compress.  In practice on a highly
fragmented system we see a few 5-pass txg's, a tiny number of 6-pass
txg's, and no txg's with more than 6 passes.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
External-issue: DLPX-63431
Closes #8892
2019-06-13 13:10:18 -07:00
Alexander Motin ae5c78e0b1 Move write aggregation memory copy out of vq_lock
Memory copy is too heavy operation to do under the congested lock.
Moving it out reduces congestion by many times to almost invisible.
Since the original zio removed from the queue, and the child zio is
not executed yet, I don't see why would the copy need protection.
My guess it just remained like this from the time when lock was not
dropped here, which was added later to fix lock ordering issue.

Multi-threaded sequential write tests with both HDD and SSD pools
with ZVOL block sizes of 4KB, 16KB, 64KB and 128KB all show major
reduction of lock congestion, saving from 15% to 35% of CPU time
and increasing throughput from 10% to 40%.

Reviewed-by: Richard Yao <ryao@gentoo.org>
Reviewed-by: Matt Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:  Alexander Motin <mav@FreeBSD.org>
Closes #8890
2019-06-13 13:08:24 -07:00
Matthew Ahrens d3230d761a looping in metaslab_block_picker impacts performance on fragmented pools
On fragmented pools with high-performance storage, the looping in
metaslab_block_picker() can become the performance-limiting bottleneck.
When looking for a larger block (e.g. a 128K block for the ZIL), we may
search through many free segments (up to hundreds of thousands) to find
one that is large enough to satisfy the allocation. This can take a long
time (up to dozens of ms), and is done while holding the ms_lock, which
other threads may spin waiting for.

When this performance problem is encountered, profiling will show
high CPU time in metaslab_block_picker, as well as in mutex_enter from
various callers.

The problem is very evident on a test system with a sync write workload
with 8K writes to a recordsize=8k filesystem, with 4TB of SSD storage,
84% full and 88% fragmented. It has also been observed on production
systems with 90TB of storage, 76% full and 87% fragmented.

The fix is to change metaslab_df_alloc() to search only up to 16MB from
the previous allocation (of this alignment). After that, we will pick a
segment that is of the exact size requested (or larger). This reduces
the number of iterations to a few hundred on fragmented pools (a ~100x
improvement).

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
External-issue: DLPX-62324
Closes #8877
2019-06-13 13:06:15 -07:00
Tulsi Jain 9c7da9a95a Restrict filesystem creation if name referred either '.' or '..'
This change restricts filesystem creation if the given name
contains either '.' or '..'

Reviewed-by: Matt Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com>
Signed-off-by: TulsiJain <tulsi.jain@delphix.com>
Closes #8842 
Closes #8564
2019-06-13 08:56:15 -07:00
Matthew Ahrens 3475724ea4 ztest: dmu_tx_assign() gets ENOSPC in spa_vdev_remove_thread()
When running zloop, we occasionally see the following crash:

    dmu_tx_assign(tx, TXG_WAIT) == 0 (0x1c == 0)
    ASSERT at ../../module/zfs/vdev_removal.c:1507:spa_vdev_remove_thread()/sbin/ztest(+0x89c3)[0x55faf567b9c3]

The error value 0x1c is ENOSPC.

The transaction used by spa_vdev_remove_thread() should not be able to
fail due to being out of space. i.e. we should not call
dmu_tx_hold_space().  This will allow the removal thread to schedule its
work even when the pool is low on space.  The "slop space" will provide
enough free space to sync out the txg.

Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
External-issue: DLPX-37853
Closes #8889
2019-06-13 08:48:42 -07:00
Tomohiro Kusumi daddbdc7cc Fix lockdep warning on insmod
sysfs_attr_init() is required to make lockdep happy for dynamically
allocated sysfs attributes. This fixed #8868 on Fedora 29 running
kernel-debug.

This requirement was introduced in 2.6.34.
See include/linux/sysfs.h for what it actually does.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Olaf Faaland <faaland1@llnl.gov>
Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@gmail.com>
Closes #8868 
Closes #8884
2019-06-12 17:15:06 -07:00
Matthew Ahrens d9b4bf0665 fat zap should prefetch when iterating
When iterating over a ZAP object, we're almost always certain to iterate
over the entire object. If there are multiple leaf blocks, we can
realize a performance win by issuing reads for all the leaf blocks in
parallel when the iteration begins.

For example, if we have 10,000 snapshots, "zfs destroy -nv
pool/fs@1%9999" can take 30 minutes when the cache is cold. This change
provides a >3x performance improvement, by issuing the reads for all ~64
blocks of each ZAP object in parallel.

Reviewed-by: Andreas Dilger <andreas.dilger@whamcloud.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
External-issue: DLPX-58347
Closes #8862
2019-06-12 13:13:09 -07:00
Matthew Ahrens d9cd66e45f Target ARC size can get reduced to arc_c_min
Sometimes the target ARC size is reduced to arc_c_min, which impacts
performance.  We've seen this happen as part of the random_reads
performance regression test, where the ARC size is reduced before the
reads test starts which impacts how long it takes for system to reach
good IOPS performance.

We call arc_reduce_target_size when arc_reap_cb_check() returns TRUE,
and arc_available_memory() is less than arc_c>>arc_shrink_shift.

However, arc_available_memory() could easily be low, even when arc_c is
low, because we can have tons of unused bufs in the abd kmem cache. This
would be especially true just after the DMU requests a bunch of stuff be
evicted from the ARC (e.g. due to "zpool export").

To fix this, the ARC should reduce arc_c by the requested amount, not
all the way down to arc_size (or arc_c_min), which can be very small.

Reviewed-by: Tim Chase <tim@chase2k.com>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
External-issue: DLPX-59431
Closes #8864
2019-06-12 13:06:55 -07:00
bnjf 10269e02f9 Fix typo in vdev_raidz_math.c
Fix typo in vdev_raidz_math.c

Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brad Forschinger <github@bnjf.id.au>
Closes #8875 
Closes #8880
2019-06-12 13:03:33 -07:00
Matthew Ahrens 5662fd5794 single-chunk scatter ABDs can be treated as linear
Scatter ABD's are allocated from a number of pages.  In contrast to
linear ABD's, these pages are disjoint in the kernel's virtual address
space, so they can't be accessed as a contiguous buffer.  Therefore
routines that need a linear buffer (e.g. abd_borrow_buf() and friends)
must allocate a separate linear buffer (with zio_buf_alloc()), and copy
the contents of the pages to/from the linear buffer.  This can have a
measurable performance overhead on some workloads.

https://github.com/zfsonlinux/zfs/commit/87c25d567fb7969b44c7d8af63990e
("abd_alloc should use scatter for >1K allocations") increased the use
of scatter ABD's, specifically switching 1.5K through 4K (inclusive)
buffers from linear to scatter.  For workloads that access blocks whose
compressed sizes are in this range, that commit introduced an additional
copy into the read code path.  For example, the
sequential_reads_arc_cached tests in the test suite were reduced by
around 5% (this is doing reads of 8K-logical blocks, compressed to 3K,
which are cached in the ARC).

This commit treats single-chunk scattered buffers as linear buffers,
because they are contiguous in the kernel's virtual address space.

All single-page (4K) ABD's can be represented this way.  Some multi-page
ABD's can also be represented this way, if we were able to allocate a
single "chunk" (higher-order "page" which represents a power-of-2 series
of physically-contiguous pages).  This is often the case for 2-page (8K)
ABD's.

Representing a single-entry scatter ABD as a linear ABD has the
performance advantage of avoiding the copy (and allocation) in
abd_borrow_buf_copy / abd_return_buf_copy.  A performance increase of
around 5% has been observed for ARC-cached reads (of small blocks which
can take advantage of this), fixing the regression introduced by
87c25d567.

Note that this optimization is only possible because all physical memory
is always mapped into the kernel's address space.  This is not the case
for HIGHMEM pages, so the optimization can not be made on 32-bit
systems.

Reviewed-by: Chunwei Chen <tuxoko@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #8580
2019-06-11 09:02:31 -07:00
Matthew Ahrens b8738257c2 make zil max block size tunable
We've observed that on some highly fragmented pools, most metaslab
allocations are small (~2-8KB), but there are some large, 128K
allocations.  The large allocations are for ZIL blocks.  If there is a
lot of fragmentation, the large allocations can be hard to satisfy.

The most common impact of this is that we need to check (and thus load)
lots of metaslabs from the ZIL allocation code path, causing sync writes
to wait for metaslabs to load, which can take a second or more.  In the
worst case, we may not be able to satisfy the allocation, in which case
the ZIL will resort to txg_wait_synced() to ensure the change is on
disk.

To provide a workaround for this, this change adds a tunable that can
reduce the size of ZIL blocks.

External-issue: DLPX-61719
Reviewed-by: George Wilson <george.wilson@delphix.com>
Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #8865
2019-06-10 11:48:42 -07:00
Alexander Motin 5a902f5aaa Fix comparison signedness in arc_is_overflowing()
When ARC size is very small, aggsum_lower_bound(&arc_size) may return
negative values, that due to unsigned comparison caused delays, waiting
for arc_adjust() to "fix" it by calling aggsum_value(&arc_size).  Use
of signed comparison there fixes the problem.

Reviewed-by: Matt Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Closes #8873
2019-06-10 09:52:25 -07:00
Tom Caputi c08c30ed13 Fix incorrect error message for raw receive
This patch fixes an incorrect error message that comes up when
doing a non-forcing, raw, incremental receive into a dataset
that has a newer snapshot than the "from" snapshot. In this
case, the current code prints a confusing message about an IVset
guid mismatch.

This functionality is supported by non-raw receives as an
undocumented feature, but was never supported by the raw receive
code. If this is desired in the future, we can probably figure
out a way to make it work.

Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Issue #8758 
Closes #8863
2019-06-10 09:45:08 -07:00
Richard Elling cfc16f8ba8 Improve ZTS block_device_wait debugging
The udevadm settle timeout can be 120 or 180 seconds by default
for some distributions. If a long delay is experienced, it could
be due to some strangeness in a malfunctioning device that isn't
related to the devices under test. To help debug this condition,
a notice is given if settle takes too long.

Arguments can now be passed to block_device_wait. The expected
arguments are block device pathnames.

Reviewed by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Giuseppe Di Natale <guss80@gmail.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Elling <Richard.Elling@RichardElling.com>
Closes #8839
2019-06-10 09:21:19 -07:00
Richard Elling 4cb1b541d4 Block_device_wait does not return an error code
Reviewed by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Giuseppe Di Natale <guss80@gmail.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Elling <Richard.Elling@RichardElling.com>
Closes #8839
2019-06-10 09:21:08 -07:00
Richard Elling bef70afaa6 Remove redundant redundant remove
Reviewed by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Giuseppe Di Natale <guss80@gmail.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Elling <Richard.Elling@RichardElling.com>
Closes #8839
2019-06-10 09:20:42 -07:00
Richard Elling 2ff615b2bf Fix logic error in setpartition function
Reviewed by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Giuseppe Di Natale <guss80@gmail.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Elling <Richard.Elling@RichardElling.com>
Closes #8839
2019-06-10 09:19:32 -07:00
Eli Schwartz 215e4fe4d2 arc_summary: prefer python3 version and install when there is no python
This matches the behavior of other python scripts, such as arcstat and
dbufstat, which are always installed but whose install-exec-hook actions
will simply touch up the shebang if a python interpreter was configured
*and* that interpreter is a python2 interpreter.

Fixes installation in a minimal build chroot without python available.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@freqlabs.com>
Signed-off-by: Eli Schwartz <eschwartz@archlinux.org>
Closes #8851
2019-06-10 09:08:53 -07:00
Samuel VERSCHELDE 01d1e88b1a Fix %post and %postun generation in kmodtool
During zfs-kmod RPM build, $(uname -r) gets unintentionally evaluated on
the build host, once and for all. It should be evaluated during the
execution of the scriptlets on the installation host. Escaping the $
character avoids evaluating it during build.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Olaf Faaland <faaland1@llnl.gov>
Reviewed-by: Neal Gompa <ngompa@datto.com>
Signed-off-by: Samuel Verschelde <stormi-xcp@ylix.fr>
Closes #8866
2019-06-10 09:06:58 -07:00
Paul Dagnelie 893a6d62c1 Allow metaslab to be unloaded even when not freed from
On large systems, the memory used by loaded metaslabs can become
a concern. While range trees are a fairly efficient data structure, 
on heavily fragmented pools they can still consume a significant 
amount of memory. This problem is amplified when we fail to unload 
metaslabs that we aren't using. Currently, we only unload a metaslab 
during metaslab_sync_done; in order for that function to be called 
on a given metaslab in a given txg, we have to have dirtied that 
metaslab in that txg. If the dirtying was the result of an allocation, 
we wouldn't be unloading it (since it wouldn't be 8 txgs since it 
was selected), so in effect we only unload a metaslab during txgs 
where it's being freed from.

We move the unload logic from sync_done to a new function, and 
call that function on all metaslabs in a given vdev during 
vdev_sync_done().

Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #8837
2019-06-06 19:10:43 -07:00
Jorgen Lundman 876d76be34 Avoid updating zfs_gitrev.h when rev is unchanged
Build process would always re-compile spa_history.c due to touching
zfs_gitrev.h - avoid if no change in gitrev.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Chris Dunlop <chris@onthe.net.au>
Reviewed-by: Allan Jude <allanjude@freebsd.org>
Signed-off-by: Jorgen Lundman <lundman@lundman.net>
Closes #8860
2019-06-06 19:01:41 -07:00
Tom Caputi fd7a657bff Reinstate raw receive check when truncating
This patch re-adds a check that was removed in 369aa50. The check
confirms that a raw receive is not occuring before truncating an
object's dn_maxblkid. At the time, it was believed that all cases
that would hit this code path would be handled in other places,
but that was not the case.

Reviewed-by: Matt Ahrens <mahrens@delphix.com>
Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #8852 
Closes #8857
2019-06-06 13:47:33 -07:00
Allan Jude b7109a413c l2arc_apply_transforms: Fix typo in comment
Reviewed-by: Chris Dunlop <chris@onthe.net.au>
Reviewed-by: Matt Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Signed-off-by: Allan Jude <allanjude@freebsd.org>
Closes #8822
2019-06-06 13:14:48 -07:00
Serapheim Dimitropoulos cb020f0d86 Reduced IOPS when all vdevs are in the zfs_mg_fragmentation_threshold
Historically while doing performance testing we've noticed that IOPS
can be significantly reduced when all vdevs in the pool are hitting
the zfs_mg_fragmentation_threshold percentage. Specifically in a
hypothetical pool with two vdevs, what can happen is the following:
Vdev A would go above that threshold and only vdev B would be used.
Then vdev B would pass that threshold but vdev A would go below it
(we've been freeing from A to allocate to B). The allocations would
go back and forth utilizing one vdev at a time with IOPS taking a hit.

Empirically, we've seen that our vdev selection for allocations is
good enough that fragmentation increases uniformly across all vdevs
the majority of the time. Thus we set the threshold percentage high
enough to avoid hitting the speed bump on pools that are being pushed
to the edge. We effectively disable its effect in the majority of the
cases but we don't remove (at least for now) just in case we hit any
weird behavior in the future.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Ahrens <mahrens@delphix.com>
Signed-off-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Closes #8859
2019-06-06 13:08:41 -07:00
Garrett Fields 627f5117a3 If $ZFS_BOOTFS contains guid, replace the guid portion with $pool
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Signed-off-by: Garrett Fields <ghfields@gmail.com>
Closes #8356
2019-06-06 13:04:35 -07:00
Tom Caputi 3ce85b5e60 Fix integer overflow of ZTOI(zp)->i_generation
The ZFS on-disk format stores each inode's generation ID as a 64
bit number on disk and in-core. However, the Linux kernel's inode
is only a 32 bit number. In most places, the code handles this
correctly, but the cast is missing in zfs_rezget(). For many pools,
this isn't an issue since the generation ID is computed as the
current txg when the inode is created and many pools don't have
more than 2^32 txgs.

For the pools that have more txgs, this issue causes any inode with
a high enough generation number to report IO errors after a call to
"zfs rollback" while holding the file or directory open. This patch
simply adds the missing cast.

Reviewed-by: Alek Pinchuk <apinchuk@datto.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #8858
2019-06-06 12:59:39 -07:00
Don Brady 8e91c5ba6a hkdf_test binary should only have one icp instance
The build for test binary hkdf_test was linking both against libicp 
and libzpool. This results in two instances of libicp inside the 
binary but the call to icp_init() only initializes one of them!

Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Don Brady <don.brady@delphix.com>
Closes #8850
2019-06-05 14:21:25 -07:00
Tomohiro Kusumi 1a6889947e Drop objid argument in zfs_znode_alloc() (sync with OpenZFS)
Since zfs_znode_alloc() already takes dmu_buf_t*, taking another
uint64_t argument for objid is redundant. inode's ->i_ino does and
needs to match znode's ->z_id.

zfs_znode_alloc() in FreeBSD and illumos doesn't have this argument
since vnode doesn't have vnode# in VFS (hence ->z_id exists).

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@osnexus.com>
Closes #8841
2019-06-05 14:18:46 -07:00
Peter Wirdemo 2ecc2020ef Fixed a small typo in man/man1/raidz_test.1
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Chris Dunlop <chris@onthe.net.au>
Signed-off-by: Peter Wirdemo <peter.wirdemo@gmail.com>
Closes #8855
2019-06-05 09:09:17 -07:00
Torsten Wörtwein 1ec7eda167 Allow TRIM_UNUSED_KSYM when build as a builtin-module
If ZFS is built with enable_linux_builtin, it seems to be possible
to compile the kernel with TRIM_UNUSED_KSYM.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Torsten Wörtwein <twoertwein@gmail.com>
Closes #8820
2019-06-04 18:12:16 -07:00
Ryan Moeller 1a132f0638 Make Python detection optional and more portable
Previously, --without-python would cause ./configure to fail. Now it is
able to proceed, and the Python scripts will not be built.

Use portable parameter expansion matching instead of nonstandard
substring matching to detect the Python version.  This test is
duplicated in several places, so define a function for it.

Don't assume the full path to binaries, since different platforms do
install things in different places.  Use AC_CHECK_PROGS instead.

When building without Python, also build without pyzfs.

Sponsored by: iXsystems, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Eli Schwartz <eschwartz93@gmail.com>
Signed-off-by: Ryan Moeller <ryan@freqlabs.com>
Closes #8809 
Closes #8731
2019-06-04 18:05:46 -07:00
DeHackEd df24bcf00a Wait in 'S' state when send/recv pipe is blocking
Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: DHE <git@dehacked.net>
Closes #8733 
Closes #8752
2019-06-03 20:54:43 -07:00
TulsiJain a3c98d5728 Make zfs_async_block_max_blocks handle zero correctly
Reviewed-by: Matt Ahrens <mahrens@delphix.com>
Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: TulsiJain <tulsi.jain@delphix.com>
Closes #8829
Closes #8289
2019-06-03 09:40:48 -07:00
Brian Behlendorf 2531ce3720 Revert "Report holes when there are only metadata changes"
This reverts commit ec4f9b8f30 which introduced a narrow race which
can lead to lseek(, SEEK_DATA) incorrectly returning ENXIO.  Resolve
the issue by revering this change to restore the previous behavior
which depends solely on checking the dirty list.

Reviewed-by: Olaf Faaland <faaland1@llnl.gov>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #8816 
Closes #8834
2019-05-30 17:13:18 -07:00
Tomohiro Kusumi 1608985a41 Add link count test for root inode
Add tests for
97aa3ba44("Fix link count of root inode when snapdir is visible")
as suggested in #8727.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@osnexus.com>
Closes #8732
2019-05-29 16:26:46 -07:00
Tomohiro Kusumi fe0c9f409a Remove vn_set_fs_pwd()/vn_set_pwd() (no need to be at / during insmod)
Per suggestion from @behlendorf in #8777, remove vn_set_fs_pwd() and
vn_set_pwd() which are only used in zfs_ioctl.c:_init() while loading
zfs.ko.

The rest of initialization functions being called here after cwd set
to / don't depend on cwd of the process except for spa_config_load().
spa_config_load() uses a relative path ".//etc/zfs/zpool.cache" when
`rootdir` is non-NULL, which is "/etc/zfs/zpool.cache" given cwd is /,
so just unconditionally use the absolute path without "./", so that
`vn_set_pwd("/")` as well as the entire functions can be removed.
This is also what FreeBSD does.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@osnexus.com>
Closes #8826
2019-05-29 16:18:14 -07:00
Brian Behlendorf 1e724f4f34 Exclude log device ashift from normal class
When opening a log device during import its allocation bias will
not yet have been set by vdev_load().  This results in the log
device's ashift being incorrectly applied to the maximum ashift
of the vdevs in the normal class.  Which in turn prevents the
removal of any top-level devices due to the ashift check in the
spa_vdev_remove_top_check() function.

This issue is resolved by including vdev_islog in the check since
it will be set correctly during vdev_open().

Reviewed-by: Matt Ahrens <mahrens@delphix.com>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #8735
2019-05-29 11:35:50 -07:00
madz ec4afd27f1 Fix integer overflow in get_next_chunk()
dn->dn_datablksz type is uint32_t and need to be casted to uint64_t
to avoid an overflow when the record size is greater than 4 MiB.

Reviewed-by: Tom Caputi <tcaputi@datto.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Olivier Mazouffre <olivier.mazouffre@ims-bordeaux.fr>
Closes #8778 
Closes #8797
2019-05-29 10:17:25 -07:00
Josh Soref 46df7e6cc9 grammar: it is / plural agreement
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Matt Ahrens <mahrens@delphix.com>
Reviewed-by: Chris Dunlop <chris@onthe.net.au>
Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>
Closes #8818
2019-05-28 15:58:32 -07:00
Tomohiro Kusumi 5691b86ce5 Refactor parent dataset handling in libzfs zfs_rename()
For recursive renaming, simplify the code by moving `zhrp` and
`parentname` to inner scope. `zhrp` is only used to test existence
of a parent dataset for recursive dataset dir scan since ba6a24026c.

Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Giuseppe Di Natale <guss80@gmail.com>
Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@osnexus.com>
Closes #8815
2019-05-28 15:31:38 -07:00
loli10K ab1a9705f8 Double-free of encryption wrapping key due to invalid pool properties
This commits fixes a double-free in zfs_ioc_pool_create() triggered by
specifying an unsupported combination of properties when creating a pool
with encryption enabled.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tom Caputi <tcaputi@datto.com>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #8791
2019-05-28 15:19:50 -07:00
Ryan Moeller 227d379385 Update comments to match code
s/get_vdev_spec/make_root_vdev

The former doesn't exist anymore.

Sponsored by: iXsystems, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tom Caputi <tcaputi@datto.com>
Signed-off-by: Ryan Moeller <ryan@freqlabs.com>
Closes #8759
2019-05-28 15:18:31 -07:00
Stoiko Ivanov 10e1a0112e tests: fix cosmetic permission issues during make install
files in dist_*_SCRIPTS get installed with 0755, those in dist_*_DATA
with 0644. This commit moves all .kshlib, .shlib and .cfg files in the
testsuite to dist_pkgdata_DATA, and removes the shebang from
zpool_import.kshlib.

This ensures that the files are installed with appropriate permissions
and silences some warnings from lintian

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: loli10K <ezomori.nozomu@gmail.com>
Reviewed by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com>
Closes #8803
2019-05-28 15:07:24 -07:00
Stoiko Ivanov 3ba8cd6d0e test-runner.py: change shebang to python3
In commit 6e72a5b9b6 python scripts which
work with python2 and python3 changed the shebang from /usr/bin/python
to /usr/bin/python3. This gets adapted by the build-system on systems
which don't provide python3.
This commit changes test-runner.py to also use /usr/bin/python3,
enabling the change during buildtime and fixing a minor lintian issue
for those Debian packages, which depend on a specific python version
(python3/python2).

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: loli10K <ezomori.nozomu@gmail.com>
Reviewed by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com>
Closes #8803
2019-05-28 15:06:07 -07:00
loli10K 0869b74a1e Endless loop in zpool_do_remove() on platforms with unsigned char
On systems where "char" is an unsigned type the value returned by
getopt() will never be negative (-1), leading to an endless loop:
this issue prevents both 'zpool remove' and 'zstreamdump' for
working on some systems.

Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Chris Dunlop <chris@onthe.net.au>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #8789
2019-05-28 11:14:58 -07:00
Tomohiro Kusumi 841a7a98fc Update descriptions for vnops
These descriptions are not uptodate with the code.

Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@gmail.com>
Closes #8767
2019-05-25 14:29:10 -07:00
Tom Caputi 3b61ca3e57 Fix embedded bp accounting in count_block()
Currently, count_block() does not correctly account for the
possibility that the bp that is passed to it could be embedded.
These blocks shouldn't be counted since the work of scanning
these blocks in already handled when the containing block is
scanned. This patch simply resolves this issue by returning
early in this case.

Reviewed by: Allan Jude <allanjude@freebsd.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Authored-by: Bill Sommerfeld <sommerfeld@alum.mit.edu>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #8800 
Closes #8766
2019-05-25 13:52:23 -07:00
Tom Caputi 8e3c3ed1b3 Disable parallel processing for 'zfs mount -l'
Currently, 'zfs mount -a' will always attempt to parallelize
work related to mounting as best it can. Unfortunately, when
the user passes the '-l' option to load keys, this causes
all threads to prompt the user for their keys at once,
resulting in a confusing and racy user experience. This patch
simply disables parallel mounting when using the '-l' flag.

Reviewed by: Sebastien Roy <sebastien.roy@delphix.com>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #8762 
Closes #8811
2019-05-25 13:46:32 -07:00
Tomohiro Kusumi bfd5a709e7 Linux 5.2 compat: Directly call wait_on_page_bit()
wait_on_page_writeback() was made GPL only in torvalds/linux@19343b5bdd.

Directly call wait_on_page_bit() without using wait_on_page_writeback()
interface, given zfs_putpage() is the only caller for now.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: loli10K <ezomori.nozomu@gmail.com>
Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@osnexus.com>
Closes #8794
2019-05-25 13:42:09 -07:00
Tomohiro Kusumi 36c110f994 Linux 5.2 compat: Fix config/kernel-shrink.m4 test failure
"whether ->count_objects callback exists" test failed with
"error: error" message for using an incomplete function shrinker_cb().

This is caused by torvalds/linux@83da1bed86. It's configurable,
but we would want to be able to compile with default kbuild setting.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: loli10K <ezomori.nozomu@gmail.com>
Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@osnexus.com>
Closes #8776
2019-05-25 13:40:46 -07:00
Tomohiro Kusumi 4bb17ebfe2 Linux 5.2 compat: Remove config/kernel-set-fs-pwd.m4
This failed on 5.2-rc1 with "error: unknown" message, for set_fs_pwd()
not being visible in both const and non-const tests.

This is caused by torvalds/linux@83da1bed86. It's configurable,
but we would want to be able to compile with default kbuild setting.

set_fs_pwd() has never been exported with exception of some distro
kernels, and set_fs_pwd() wasn't used in ZoL to begin with. The test
result was used for a spl function vn_set_fs_pwd().

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: loli10K <ezomori.nozomu@gmail.com>
Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@osnexus.com>
Closes #8777
2019-05-25 13:28:55 -07:00
Tomohiro Kusumi c3e5907fdb Drop local definition of MOUNT_BUSY
It's accessible via <sys/mntent.h>.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tom Caputi <tcaputi@datto.com>
Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@osnexus.com>
Closes #8765
2019-05-24 16:43:23 -07:00
loli10K 43977948aa zpool: status -t is not documented in help message
This commit adds the undocumented "-t" option to zpool(8) help message.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Chris Dunlop <chris@onthe.net.au>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #8782
2019-05-24 14:16:00 -07:00
loli10K fe609530f2 zfs-tests: fix warnings when packaging some .shlib files
This change prevents the following warning when packaging some zfs-tests
files:

   *** WARNING: ./usr/src/zfs-0.8.0/tests/zfs-tests/include/zpool_script.shlib
   is executable but has empty or no shebang, removing executable bit

Reviewed by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Giuseppe Di Natale <guss80@gmail.com>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #8787
2019-05-24 14:12:14 -07:00
loli10K d28b492ab3 VERIFY3P() message is missing a space character
This commit just reintroduces a [space] character inadvertently removed
in a887d653.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Chris Dunlop <chris@onthe.net.au>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #8786
2019-05-24 14:06:53 -07:00
loli10K 2696e86f2b zfs-tests: verify zfs(8) and zpool(8) help message is under 80 columns
This commit updates the ZFS Test Suite to detect incorrect wrapping of
both zfs(8) and zpool(8) help message

Reviewed by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #8785
2019-05-24 14:04:08 -07:00
loli10K b868525bf7 zfs: don't pretty-print objsetid property
The objsetid property, while being stored as a number, is a dataset
identifier and should not be pretty-printed.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Chris Dunlop <chris@onthe.net.au>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #8784
2019-05-24 13:58:12 -07:00
loli10K 62b2152eca zfs: missing newline character in zfs_do_channel_program() error message
This commit simply adds a missing newline ("\n") character to the error
message printed by the zfs command when the provided pool parameter
can't be found.

Reviewed-by: Chris Dunlop <chris@onthe.net.au>
Reviewed-by: Giuseppe Di Natale <guss80@gmail.com>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #8783
2019-05-24 13:54:36 -07:00
siv0 0ff5cd0772 Fix ksh-path for random_readwrite_fixed.ksh
The test in zfs-tests/tests/perf/regression/random_readwrite_fixed.ksh
is the only file to use /usr/bin/ksh in the shebang.
Change it to /bin/ksh for consistency.

Reviewed by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Giuseppe Di Natale <guss80@gmail.com>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com>
Closes #8779
2019-05-24 13:17:50 -07:00
Tomohiro Kusumi 8708fd888f Linux 2.6.39 compat: Test if kstrtoul() exists
kstrtoul() exists only after torvalds/linux@33ee3b2e2e in 2.6.39.
Use strict_strtoul() if kstrtoul() doesn't exist.
Note that strict_strtoul() has existed as an alias for kstrtoul()
for a while, but removed in torvalds/linux@3db2e9cdc0.

It looks like RHEL6 (2.6.32 based) has backported kstrtoul(),
and this caused build CI to pass compilation test.
It should fail on vanilla < 2.6.39 kernels or distro kernels without
backport as reported in #8760.

--
 # grep "kstrtoul(" /lib/modules/2.6.32-754.12.1.el6.x86_64/build/ \
 include/linux/kernel.h >/dev/null
 # echo $?
 0

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: loli10K <ezomori.nozomu@gmail.com>
Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@gmail.com>
Closes #8760 
Closes #8761
2019-05-24 12:26:18 -07:00
loli10K 2e8c315fc6 Device removal panics on 32-bit systems
The issue is caused by an incorrect usage of the sizeof() operator
in vdev_obsolete_sm_object(): on 64-bit systems this is not an issue
since both "uint64_t" and "uint64_t*" are 8 bytes in size. However on
32-bit systems pointers are 4 bytes long which is not supported by
zap_lookup_impl(). Trying to remove a top-level vdev on a 32-bit system
will cause the following failure:

VERIFY3(0 == vdev_obsolete_sm_object(vd, &obsolete_sm_object)) failed (0 == 22)
PANIC at vdev_indirect.c:833:vdev_indirect_sync_obsolete()
Showing stack for process 1315
CPU: 6 PID: 1315 Comm: txg_sync Tainted: P           OE   4.4.69+ #2
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-2.fc27 04/01/2014
 c1abc6e7 0ae10898 00000286 d4ac3bc0 c14397bc da4cd7d8 d4ac3bf0 d4ac3bd0
 d790e7ce d7911cc1 00000523 d4ac3d00 d790e7d7 d7911ce4 da4cd7d8 00000341
 da4ce664 da4cd8c0 da33fa6e 49524556 28335946 3d3d2030 65647620 626f5f76
Call Trace:
 [<>] dump_stack+0x58/0x7c
 [<>] spl_dumpstack+0x23/0x27 [spl]
 [<>] spl_panic.cold.0+0x5/0x41 [spl]
 [<>] ? dbuf_rele+0x3e/0x90 [zfs]
 [<>] ? zap_lookup_norm+0xbe/0xe0 [zfs]
 [<>] ? zap_lookup+0x57/0x70 [zfs]
 [<>] ? vdev_obsolete_sm_object+0x102/0x12b [zfs]
 [<>] vdev_indirect_sync_obsolete+0x3e1/0x64d [zfs]
 [<>] ? txg_verify+0x1d/0x160 [zfs]
 [<>] ? dmu_tx_create_dd+0x80/0xc0 [zfs]
 [<>] vdev_sync+0xbf/0x550 [zfs]
 [<>] ? mutex_lock+0x10/0x30
 [<>] ? txg_list_remove+0x9f/0x1a0 [zfs]
 [<>] ? zap_contains+0x4d/0x70 [zfs]
 [<>] spa_sync+0x9f1/0x1b10 [zfs]
 ...
 [<>] ? kthread_stop+0x110/0x110

This commit simply corrects the "integer_size" parameter used to lookup
the vdev's ZAP object.

Reviewed-by: Giuseppe Di Natale <guss80@gmail.com>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #8790
2019-05-24 12:17:52 -07:00
loli10K e55db32ad0 zpool: trim -p is not a valid option
This commit removes the documented but not handled "-p" option from
zpool(8) help message.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Chris Dunlop <chris@onthe.net.au>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #8781
2019-05-24 09:40:46 -07:00
loli10K 75c09c5060 Fix coverity defects: CID 186143
CID 186143: Memory - illegal accesses (USE_AFTER_FREE)

This patch fixes an use-after-free in spa_import_progress_destroy()
moving the kmem_free() call at the end of the function.

Reviewed-by: Chris Dunlop <chris@onthe.net.au>
Reviewed-by: Giuseppe Di Natale <guss80@gmail.com>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #8788
2019-05-23 19:17:00 -07:00
Igor K 2d0077da6d Rename reservation tests from *.sh to *.ksh
Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Igor Kozhukhov <igor@dilos.org>
Closes #8729
2019-05-23 15:42:03 -07:00
Rafael Kitover 8b8b44d06f kernel timer API rework
In `config/kernel-timer.m4` refactor slightly to check more generally
for the new `timer_setup()` APIs, but also check the callback signature
because some kernels (notably 4.14) have the new `timer_setup()` API but
use the old callback signature. Also add a check for a `flags` member in
`struct timer_list`, which was added in 4.1-rc8.

Add compatibility shims to `include/spl/sys/timer.h` to allow using the
new timer APIs with the only two caveats being that the callback
argument type must be declared as `spl_timer_list_t` and an explicit
assignment is required to get the timer variable for the `timer_of()`
macro. So the callback would look like this:

```c
__cv_wakeup(spl_timer_list_t t)
{
        struct timer_list *tmr = (struct timer_list *)t;
	struct thing *parent = from_timer(parent, tmr,
		parent_timer_field);
	... /* do stuff with parent */
```

Make some minor changes to `spl-condvar.c` and `spl-taskq.c` to use the
new timer APIs instead of conditional code.

Reviewed-by: Tomohiro Kusumi <kusumi.tomohiro@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rafael Kitover <rkitover@gmail.com>
Closes #8647
2019-05-23 14:40:28 -07:00
Richard Elling 78fac8d925 Fix kstat state update during pool transition
When reading kstats, the health (aka state) of the pool is stored into
/proc/spl/kstat/zfs/POOLNAME/state via spa_state_to_name().
However, during import/export there is a case where the spa exists,
but the root vdev does not exist. This fix checks that case and sets
the state to "TRANSITIONING"

Unfortunately, it is not easy to reproduce a test for this. It was
detected randomly during ZTS runs while kstats were also being sampled
regularly. After this change, further testing did not trip on the case
and the TRANSITIONING state was collected at least once by the kstats.

For posterity, the backtrace prior to this fix is:
[Mon May 13 17:21:00 2019] RIP: 0010:spa_state_to_name+0x10/0xb0 [zfs]
...
Mon May 13 17:21:00 2019] Call Trace:
[Mon May 13 17:21:00 2019]  spa_state_data+0x1a/0x40 [zfs]
[Mon May 13 17:21:00 2019]  kstat_seq_show+0x117/0x440 [spl]
[Mon May 13 17:21:00 2019]  seq_read+0xe5/0x430
[Mon May 13 17:21:00 2019]  proc_reg_read+0x45/0x70
[Mon May 13 17:21:00 2019]  __vfs_read+0x1b/0x40
[Mon May 13 17:21:00 2019]  vfs_read+0x8e/0x130
[Mon May 13 17:21:00 2019]  SyS_read+0x55/0xc0
[Mon May 13 17:21:00 2019]  ? SyS_fcntl+0x5d/0xb0
[Mon May 13 17:21:00 2019]  do_syscall_64+0x73/0x130
[Mon May 13 17:21:00 2019]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Richard Elling <Richard.Elling@RichardElling.com>
Closes #8746
2019-05-23 14:28:53 -07:00
Brian Behlendorf bff2361aeb Linux 5.2 compat: rw_tryupgrade()
Commit torvalds/linux@46ad0840b has removed the architecture specific
rwsem source and headers leaving only the generic version.  As part
of this change the RWSEM_ACTIVE_READ_BIAS and RWSEM_ACTIVE_WRITE_BIAS
macros were moved to the private kernel/locking/rwsem.h header.
This results in a build failure because these macros were required
to implement the rw_tryupgrade() compatibility function.

In practice, this isn't a major problem because there are only a
few consumers of rw_tryupgrade() and because consumers of rw_tryupgrade
should be written to retry using rw_enter(RW_WRITER).

After auditing all of the callers only dmu_zfetch() was determined
not to perform a retry.  It has been updated in this commit to
resolve this issue.

That said, the rw_tryupgrade() functionality should be considered
for possible removal in a future release due to the difficultly
in supporting the interface.

Reviewed-by: Tomohiro Kusumi <kusumi.tomohiro@gmail.com>
Reviewed-by: Chunwei Chen <tuxoko@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #8730
2019-05-23 13:46:33 -07:00
2527 changed files with 247234 additions and 67915 deletions
+10
View File
@@ -0,0 +1,10 @@
root = true
[*]
end_of_line = lf
insert_final_newline = true
trim_trailing_whitespace = true
[*.{c,h}]
tab_width = 8
indent_style = tab
+83 -94
View File
@@ -1,10 +1,12 @@
# Contributing to ZFS on Linux
<p align="center"><img src="http://zfsonlinux.org/images/zfs-linux.png"/></p>
# Contributing to OpenZFS
<p align="center">
<img alt="OpenZFS Logo"
src="https://openzfs.github.io/openzfs-docs/_static/img/logo/480px-Open-ZFS-Secondary-Logo-Colour-halfsize.png"/>
</p>
*First of all, thank you for taking the time to contribute!*
By using the following guidelines, you can help us make ZFS on Linux even
better.
By using the following guidelines, you can help us make OpenZFS even better.
## Table Of Contents
[What should I know before I get
@@ -32,17 +34,17 @@ started?](#what-should-i-know-before-i-get-started)
Helpful resources
* [ZFS on Linux wiki](https://github.com/zfsonlinux/zfs/wiki)
* [OpenZFS Documentation](http://open-zfs.org/wiki/Developer_resources)
* [Git and GitHub for beginners](https://github.com/zfsonlinux/zfs/wiki/Git-and-GitHub-for-beginners)
* [OpenZFS Documentation](https://openzfs.github.io/openzfs-docs/)
* [OpenZFS Developer Resources](http://open-zfs.org/wiki/Developer_resources)
* [Git and GitHub for beginners](https://openzfs.github.io/openzfs-docs/Developer%20Resources/Git%20and%20GitHub%20for%20beginners.html)
## What should I know before I get started?
### Get ZFS
You can build zfs packages by following [these
instructions](https://github.com/zfsonlinux/zfs/wiki/Building-ZFS),
instructions](https://openzfs.github.io/openzfs-docs/Developer%20Resources/Building%20ZFS.html),
or install stable packages from [your distribution's
repository](https://github.com/zfsonlinux/zfs/wiki/Getting-Started).
repository](https://openzfs.github.io/openzfs-docs/Getting%20Started/index.html).
### Debug ZFS
A variety of methods and tools are available to aid ZFS developers.
@@ -51,29 +53,30 @@ configure option should be set. This will enable additional correctness
checks and all the ASSERTs to help quickly catch potential issues.
In addition, there are numerous utilities and debugging files which
provide visibility in to the inner workings of ZFS. The most useful
of these tools are discussed in detail on the [debugging ZFS wiki
page](https://github.com/zfsonlinux/zfs/wiki/Debugging).
provide visibility into the inner workings of ZFS. The most useful
of these tools are discussed in detail on the [Troubleshooting
page](https://openzfs.github.io/openzfs-docs/Basic%20Concepts/Troubleshooting.html).
### Where can I ask for help?
[The zfs-discuss mailing list or IRC](http://list.zfsonlinux.org)
are the best places to ask for help. Please do not file support requests
on the GitHub issue tracker.
The [zfs-discuss mailing
list](https://openzfs.github.io/openzfs-docs/Project%20and%20Community/Mailing%20Lists.html)
or IRC are the best places to ask for help. Please do not file
support requests on the GitHub issue tracker.
## How Can I Contribute?
### Reporting Bugs
*Please* contact us via the [zfs-discuss mailing
list or IRC](http://list.zfsonlinux.org) if you aren't
certain that you are experiencing a bug.
list](https://openzfs.github.io/openzfs-docs/Project%20and%20Community/Mailing%20Lists.html)
or IRC if you aren't certain that you are experiencing a bug.
If you run into an issue, please search our [issue
tracker](https://github.com/zfsonlinux/zfs/issues) *first* to ensure the
tracker](https://github.com/openzfs/zfs/issues) *first* to ensure the
issue hasn't been reported before. Open a new issue only if you haven't
found anything similar to your issue.
You can open a new issue and search existing issues using the public [issue
tracker](https://github.com/zfsonlinux/zfs/issues).
tracker](https://github.com/openzfs/zfs/issues).
#### When opening a new issue, please include the following information at the top of the issue:
* What distribution (with version) you are using.
@@ -105,13 +108,13 @@ information like:
* Stack traces which may be logged to `dmesg`.
### Suggesting Enhancements
ZFS on Linux is a widely deployed production filesystem which is under
active development. The team's primary focus is on fixing known issues,
improving performance, and adding compelling new features.
OpenZFS is a widely deployed production filesystem which is under active
development. The team's primary focus is on fixing known issues, improving
performance, and adding compelling new features.
You can view the list of proposed features
by filtering the issue tracker by the ["Feature"
label](https://github.com/zfsonlinux/zfs/issues?q=is%3Aopen+is%3Aissue+label%3AFeature).
by filtering the issue tracker by the ["Type: Feature"
label](https://github.com/openzfs/zfs/issues?q=is%3Aopen+is%3Aissue+label%3A%22Type%3A+Feature%22).
If you have an idea for a feature first check this list. If your idea already
appears then add a +1 to the top most comment, this helps us gauge interest
in that feature.
@@ -120,8 +123,11 @@ Otherwise, open a new issue and describe your proposed feature. Why is this
feature needed? What problem does it solve?
### Pull Requests
* All pull requests must be based on the current master branch and apply
without conflicts.
#### General
* All pull requests, except backports and releases, must be based on the current master branch
and should apply without conflicts.
* Please attempt to limit pull requests to a single commit which resolves
one specific issue.
* Make sure your commit messages are in the correct format. See the
@@ -133,16 +139,28 @@ logically independent patches which build on each other. This makes large
changes easier to review and approve which speeds up the merging process.
* Try to keep pull requests simple. Simple code with comments is much easier
to review and approve.
* All proposed changes must be approved by an OpenZFS organization member.
* If you have an idea you'd like to discuss or which requires additional testing, consider opening it as a draft pull request.
Once everything is in good shape and the details have been worked out you can remove its draft status.
Any required reviews can then be finalized and the pull request merged.
#### Tests and Benchmarks
* Every pull request will by tested by the buildbot on multiple platforms by running the [zfs-tests.sh and zloop.sh](
https://openzfs.github.io/openzfs-docs/Developer%20Resources/Building%20ZFS.html#running-zloop-sh-and-zfs-tests-sh) test suites.
* To verify your changes conform to the [style guidelines](
https://github.com/openzfs/zfs/blob/master/.github/CONTRIBUTING.md#style-guides
), please run `make checkstyle` and resolve any warnings.
* Static code analysis of each pull request is performed by the buildbot; run `make lint` to check your changes.
* Test cases should be provided when appropriate.
This includes making sure new features have adequate code coverage.
* If your pull request improves performance, please include some benchmarks.
* The pull request must pass all required [ZFS
Buildbot](http://build.zfsonlinux.org/) builders before
being accepted. If you are experiencing intermittent TEST
builder failures, you may be experiencing a [test suite
issue](https://github.com/zfsonlinux/zfs/issues?q=is%3Aissue+is%3Aopen+label%3A%22Test+Suite%22).
There are also various [buildbot options](https://github.com/zfsonlinux/zfs/wiki/Buildbot-Options)
issue](https://github.com/openzfs/zfs/issues?q=is%3Aissue+is%3Aopen+label%3A%22Type%3A+Test+Suite%22).
There are also various [buildbot options](https://openzfs.github.io/openzfs-docs/Developer%20Resources/Buildbot%20Options.html)
to control how changes are tested.
* All proposed changes must be approved by a ZFS on Linux organization member.
### Testing
All help is appreciated! If you're in a position to run the latest code
@@ -152,16 +170,41 @@ range of realistic workloads, configurations and architectures we're better
able quickly identify and resolve potential issues.
Users can also run the [ZFS Test
Suite](https://github.com/zfsonlinux/zfs/tree/master/tests) on their systems
Suite](https://github.com/openzfs/zfs/tree/master/tests) on their systems
to verify ZFS is behaving as intended.
## Style Guides
### Repository Structure
OpenZFS uses a standardised branching structure.
- The "development and main branch", is the branch all development should be based on.
- "Release branches" contain the latest released code for said version.
- "Staging branches" contain selected commits prior to being released.
**Branch Names:**
- Development and Main branch: `master`
- Release branches: `zfs-$VERSION-release`
- Staging branches: `zfs-$VERSION-staging`
`$VERSION` should be replaced with the `major.minor` version number.
_(This is the version number without the `.patch` version at the end)_
### Coding Conventions
We currently use [C Style and Coding Standards for
SunOS](http://www.cis.upenn.edu/%7Elee/06cse480/data/cstyle.ms.pdf) as our
coding convention.
This repository has an `.editorconfig` file. If your editor [supports
editorconfig](https://editorconfig.org/#download), it will
automatically respect most of this project's whitespace preferences.
Additionally, Git can help warn on whitespace problems as well:
```
git config --local core.whitespace trailing-space,space-before-tab,indent-with-non-tab,-tab-in-indent
```
### Commit Message Formats
#### New Changes
Commit messages for new changes must meet the following guidelines:
@@ -187,70 +230,6 @@ attempting to solve.
Signed-off-by: Contributor <contributor@email.com>
```
#### OpenZFS Patch Ports
If you are porting OpenZFS patches, the commit message must meet
the following guidelines:
* The first line must be the summary line from the most important OpenZFS commit being ported.
It must begin with `OpenZFS dddd, dddd - ` where `dddd` are OpenZFS issue numbers.
* Provides a `Authored by:` line to attribute each patch for each original author.
* Provides the `Reviewed by:` and `Approved by:` lines from each original
OpenZFS commit.
* Provides a `Ported-by:` line with the developer's name followed by
their email for each OpenZFS commit.
* Provides a `OpenZFS-issue:` line with link for each original illumos
issue.
* Provides a `OpenZFS-commit:` line with link for each original OpenZFS commit.
* If necessary, provide some porting notes to describe any deviations from
the original OpenZFS commits.
An example OpenZFS patch port commit message for a single patch is provided
below.
```
OpenZFS 1234 - Summary from the original OpenZFS commit
Authored by: Original Author <original@email.com>
Reviewed by: Reviewer One <reviewer1@email.com>
Reviewed by: Reviewer Two <reviewer2@email.com>
Approved by: Approver One <approver1@email.com>
Ported-by: ZFS Contributor <contributor@email.com>
Provide some porting notes here if necessary.
OpenZFS-issue: https://www.illumos.org/issues/1234
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/abcd1234
```
If necessary, multiple OpenZFS patches can be combined in a single port.
This is useful when you are porting a new patch and its subsequent bug
fixes. An example commit message is provided below.
```
OpenZFS 1234, 5678 - Summary of most important OpenZFS commit
1234 Summary from original OpenZFS commit for 1234
Authored by: Original Author <original@email.com>
Reviewed by: Reviewer Two <reviewer2@email.com>
Approved by: Approver One <approver1@email.com>
Ported-by: ZFS Contributor <contributor@email.com>
Provide some porting notes here for 1234 if necessary.
OpenZFS-issue: https://www.illumos.org/issues/1234
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/abcd1234
5678 Summary from original OpenZFS commit for 5678
Authored by: Original Author2 <original2@email.com>
Reviewed by: Reviewer One <reviewer1@email.com>
Approved by: Approver Two <approver2@email.com>
Ported-by: ZFS Contributor <contributor@email.com>
Provide some porting notes here for 5678 if necessary.
OpenZFS-issue: https://www.illumos.org/issues/5678
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/efgh5678
```
#### Coverity Defect Fixes
If you are submitting a fix to a
[Coverity defect](https://scan.coverity.com/projects/zfsonlinux-zfs),
@@ -290,3 +269,13 @@ Git can append the `Signed-off-by` line to your commit messages. Simply
provide the `-s` or `--signoff` option when performing a `git commit`.
For more information about writing commit messages, visit [How to Write
a Git Commit Message](https://chris.beams.io/posts/git-commit/).
#### Co-authored By
If someone else had part in your pull request, please add the following to the commit:
`Co-authored-by: Name <gitregistered@email.address>`
This is useful if their authorship was lost during squashing, rebasing, etc.,
but may be used in any situation where there are co-authors.
The email address used here should be the same as on the GitHub profile of said user.
If said user does not have their email address public, please use the following instead:
`Co-authored-by: Name <[username]@users.noreply.github.com>`
-48
View File
@@ -1,48 +0,0 @@
<!-- Please fill out the following template, which will help other contributors address your issue. -->
<!--
Thank you for reporting an issue.
*IMPORTANT* - Please search our issue tracker *before* making a new issue.
If you cannot find a similar issue, then create a new issue.
https://github.com/zfsonlinux/zfs/issues
*IMPORTANT* - This issue tracker is for *bugs* and *issues* only.
Please search the wiki and the mailing list archives before asking
questions on the mailing list.
https://github.com/zfsonlinux/zfs/wiki/Mailing-Lists
Please fill in as much of the template as possible.
-->
### System information
<!-- add version after "|" character -->
Type | Version/Name
--- | ---
Distribution Name |
Distribution Version |
Linux Kernel |
Architecture |
ZFS Version |
SPL Version |
<!--
Commands to find ZFS/SPL versions:
modinfo zfs | grep -iw version
modinfo spl | grep -iw version
-->
### Describe the problem you're observing
### Describe how to reproduce the problem
### Include any warning/errors/backtraces from the system logs
<!--
*IMPORTANT* - Please mark logs and text output from terminal commands
or else Github will not display them correctly.
An example is provided below.
Example:
```
this is an example how log text should be marked (wrap it with ```)
```
-->
+55
View File
@@ -0,0 +1,55 @@
---
name: Bug report
about: Create a report to help us improve OpenZFS
title: ''
labels: 'Type: Defect'
assignees: ''
---
<!-- Please fill out the following template, which will help other contributors address your issue. -->
<!--
Thank you for reporting an issue.
*IMPORTANT* - Please check our issue tracker before opening a new issue.
Additional valuable information can be found in the OpenZFS documentation
and mailing list archives.
Please fill in as much of the template as possible.
-->
### System information
<!-- add version after "|" character -->
Type | Version/Name
--- | ---
Distribution Name |
Distribution Version |
Kernel Version |
Architecture |
OpenZFS Version |
<!--
Command to find OpenZFS version:
zfs version
Commands to find kernel version:
uname -r # Linux
freebsd-version -r # FreeBSD
-->
### Describe the problem you're observing
### Describe how to reproduce the problem
### Include any warning/errors/backtraces from the system logs
<!--
*IMPORTANT* - Please mark logs and text output from terminal commands
or else Github will not display them correctly.
An example is provided below.
Example:
```
this is an example how log text should be marked (wrap it with ```)
```
-->
+14
View File
@@ -0,0 +1,14 @@
blank_issues_enabled: false
contact_links:
- name: OpenZFS Questions
url: https://github.com/openzfs/zfs/discussions/new
about: Ask the community for help
- name: OpenZFS Community Support Mailing list (Linux)
url: https://zfsonlinux.topicbox.com/groups/zfs-discuss
about: Get community support for OpenZFS on Linux
- name: FreeBSD Community Support Mailing list
url: https://lists.freebsd.org/mailman/listinfo/freebsd-fs
about: Get community support for OpenZFS on FreeBSD
- name: OpenZFS on IRC
url: https://web.libera.chat/#openzfs
about: Use IRC to get community support for OpenZFS
+33
View File
@@ -0,0 +1,33 @@
---
name: Feature request
about: Suggest a feature for OpenZFS
title: ''
labels: 'Type: Feature'
assignees: ''
---
<!--
Thank you for suggesting a feature.
Please check our issue tracker before opening a new feature request.
Filling out the following template will help other contributors better understand your proposed feature.
-->
### Describe the feature would like to see added to OpenZFS
<!--
Provide a clear and concise description of the feature.
-->
### How will this feature improve OpenZFS?
<!--
What problem does this feature solve?
-->
### Additional context
<!--
Any additional information you can add about the proposal?
-->
+8 -6
View File
@@ -4,7 +4,7 @@
<!---
Documentation on ZFS Buildbot options can be found at
https://github.com/zfsonlinux/zfs/wiki/Buildbot-Options
https://openzfs.github.io/openzfs-docs/Developer%20Resources/Buildbot%20Options.html
-->
### Motivation and Context
@@ -19,6 +19,7 @@ https://github.com/zfsonlinux/zfs/wiki/Buildbot-Options
<!--- Include details of your testing environment, and the tests you ran to -->
<!--- see how your change affects other areas of the code, etc. -->
<!--- If your change is a performance enhancement, please provide benchmarks here. -->
<!--- Please think about using the draft PR feature if appropriate -->
### Types of changes
<!--- What types of changes does your code introduce? Put an `x` in all the boxes that apply: -->
@@ -27,14 +28,15 @@ https://github.com/zfsonlinux/zfs/wiki/Buildbot-Options
- [ ] Performance enhancement (non-breaking change which improves efficiency)
- [ ] Code cleanup (non-breaking change which makes code smaller or more readable)
- [ ] Breaking change (fix or feature that would cause existing functionality to change)
- [ ] Library ABI change (libzfs, libzfs\_core, libnvpair, libuutil and libzfsbootenv)
- [ ] Documentation (a change to man pages or other documentation)
### Checklist:
<!--- Go over all the following points, and put an `x` in all the boxes that apply. -->
<!--- If you're unsure about any of these, don't hesitate to ask. We're here to help! -->
- [ ] My code follows the ZFS on Linux [code style requirements](https://github.com/zfsonlinux/zfs/blob/master/.github/CONTRIBUTING.md#coding-conventions).
- [ ] My code follows the OpenZFS [code style requirements](https://github.com/openzfs/zfs/blob/master/.github/CONTRIBUTING.md#coding-conventions).
- [ ] I have updated the documentation accordingly.
- [ ] I have read the [**contributing** document](https://github.com/zfsonlinux/zfs/blob/master/.github/CONTRIBUTING.md).
- [ ] I have added [tests](https://github.com/zfsonlinux/zfs/tree/master/tests) to cover my changes.
- [ ] All new and existing tests passed.
- [ ] All commit messages are properly formatted and contain [`Signed-off-by`](https://github.com/zfsonlinux/zfs/blob/master/.github/CONTRIBUTING.md#signed-off-by).
- [ ] I have read the [**contributing** document](https://github.com/openzfs/zfs/blob/master/.github/CONTRIBUTING.md).
- [ ] I have added [tests](https://github.com/openzfs/zfs/tree/master/tests) to cover my changes.
- [ ] I have run the ZFS Test Suite with this change applied.
- [ ] All commit messages are properly formatted and contain [`Signed-off-by`](https://github.com/openzfs/zfs/blob/master/.github/CONTRIBUTING.md#signed-off-by).
+4 -1
View File
@@ -4,7 +4,8 @@ codecov:
after_n_builds: 2 # user and kernel
coverage:
precision: 2 # 2 digits of precision
precision: 0 # 0 decimals of precision
round: nearest # Round to nearest precision point
range: "50...90" # red -> yellow -> green
status:
@@ -20,3 +21,5 @@ comment:
layout: "reach, diff, flags, footer"
behavior: once # update if exists; post new; skip if deleted
require_changes: yes # only post when coverage changes
# ignore: Please place any ignores in config/ax_code_coverage.m4 instead
+13
View File
@@ -0,0 +1,13 @@
# Configuration for probot-no-response - https://github.com/probot/no-response
# Number of days of inactivity before an Issue is closed for lack of response
daysUntilClose: 31
# Label requiring a response
responseRequiredLabel: "Status: Feedback requested"
# Comment to post when closing an Issue for lack of response. Set to `false` to disable
closeComment: >
This issue has been automatically closed because there has been no response
to our request for more information from the original author. With only the
information that is currently in the issue, we don't have enough information
to take action. Please reach out if you have or find the answers we need so
that we can investigate further.
+26
View File
@@ -0,0 +1,26 @@
# Number of days of inactivity before an issue becomes stale
daysUntilStale: 365
# Number of days of inactivity before a stale issue is closed
daysUntilClose: 90
# Limit to only `issues` or `pulls`
only: issues
# Issues with these labels will never be considered stale
exemptLabels:
- "Type: Feature"
- "Bot: Not Stale"
- "Status: Work in Progress"
# Set to true to ignore issues in a project (defaults to false)
exemptProjects: true
# Set to true to ignore issues in a milestone (defaults to false)
exemptMilestones: true
# Set to true to ignore issues with an assignee (defaults to false)
exemptAssignees: true
# Label to use when marking an issue as stale
staleLabel: "Status: Stale"
# Comment to post when marking an issue as stale. Set to `false` to disable
markComment: >
This issue has been automatically marked as "stale" because it has not had
any activity for a while. It will be closed in 90 days if no further activity occurs.
Thank you for your contributions.
# Limit the number of actions per hour, from 1-30. Default is 30
limitPerRun: 6
-3
View File
@@ -1,3 +0,0 @@
preprocessorErrorDirective:./module/zfs/vdev_raidz_math_avx512f.c:243
preprocessorErrorDirective:./module/zfs/vdev_raidz_math_sse2.c:266
+50
View File
@@ -0,0 +1,50 @@
name: checkstyle
on:
push:
pull_request:
jobs:
checkstyle:
runs-on: ubuntu-20.04
steps:
- uses: actions/checkout@v2
with:
ref: ${{ github.event.pull_request.head.sha }}
- name: Install dependencies
run: |
sudo apt-get update
sudo apt-get install --yes -qq build-essential autoconf libtool gawk alien fakeroot linux-headers-$(uname -r)
sudo apt-get install --yes -qq zlib1g-dev uuid-dev libattr1-dev libblkid-dev libselinux-dev libudev-dev libssl-dev python-dev python-setuptools python-cffi python3 python3-dev python3-setuptools python3-cffi
# packages for tests
sudo apt-get install --yes -qq parted lsscsi ksh attr acl nfs-kernel-server fio
sudo apt-get install --yes -qq mandoc cppcheck pax-utils devscripts
sudo -E pip --quiet install flake8
- name: Prepare
run: |
sh ./autogen.sh
./configure
make -j$(nproc)
- name: Checkstyle
run: |
make checkstyle
- name: Lint
run: |
make lint
- name: CheckABI
id: CheckABI
run: |
sudo docker run -v $(pwd):/source ghcr.io/openzfs/libabigail make checkabi
- name: StoreABI
if: failure() && steps.CheckABI.outcome == 'failure'
run: |
sudo docker run -v $(pwd):/source ghcr.io/openzfs/libabigail make storeabi
- name: Prepare artifacts
if: failure() && steps.CheckABI.outcome == 'failure'
run: |
find -name *.abi | tar -cf abi_files.tar -T -
- uses: actions/upload-artifact@v2
if: failure() && steps.CheckABI.outcome == 'failure'
with:
name: New ABI files (use only if you're sure about interface changes)
path: abi_files.tar
@@ -0,0 +1,81 @@
name: zfs-tests-functional
on:
push:
pull_request:
jobs:
tests-functional-ubuntu:
strategy:
fail-fast: false
matrix:
os: [18.04, 20.04]
runs-on: ubuntu-${{ matrix.os }}
steps:
- uses: actions/checkout@v2
with:
ref: ${{ github.event.pull_request.head.sha }}
- name: Install dependencies
run: |
sudo apt-get update
sudo apt-get install --yes -qq build-essential autoconf libtool gdb lcov \
git alien fakeroot wget curl bc fio acl \
sysstat mdadm lsscsi parted gdebi attr dbench watchdog ksh \
nfs-kernel-server samba rng-tools xz-utils \
zlib1g-dev uuid-dev libblkid-dev libselinux-dev \
xfslibs-dev libattr1-dev libacl1-dev libudev-dev libdevmapper-dev \
libssl-dev libffi-dev libaio-dev libelf-dev libmount-dev \
libpam0g-dev pamtester python-dev python-setuptools python-cffi \
python3 python3-dev python3-setuptools python3-cffi python3-packaging
- name: Autogen.sh
run: |
sh autogen.sh
- name: Configure
run: |
./configure --enable-debug --enable-debuginfo
- name: Make
run: |
make --no-print-directory -s pkg-utils pkg-kmod
- name: Install
run: |
sudo dpkg -i *.deb
# Update order of directories to search for modules, otherwise
# Ubuntu will load kernel-shipped ones.
sudo sed -i.bak 's/updates/extra updates/' /etc/depmod.d/ubuntu.conf
sudo depmod
sudo modprobe zfs
# Workaround for cloud-init bug
# see https://github.com/openzfs/zfs/issues/12644
FILE=/lib/udev/rules.d/10-cloud-init-hook-hotplug.rules
if [ -r "${FILE}" ]; then
HASH=$(md5sum "${FILE}" | awk '{ print $1 }')
if [ "${HASH}" = "121ff0ef1936cd2ef65aec0458a35772" ]; then
# Just shove a zd* exclusion right above the hotplug hook...
sudo sed -i -e s/'LABEL="cloudinit_hook"'/'KERNEL=="zd*", GOTO="cloudinit_end"\n&'/ "${FILE}"
sudo udevadm control --reload-rules
fi
fi
# Workaround to provide additional free space for testing.
# https://github.com/actions/virtual-environments/issues/2840
sudo rm -rf /usr/share/dotnet
sudo rm -rf /opt/ghc
sudo rm -rf "/usr/local/share/boost"
sudo rm -rf "$AGENT_TOOLSDIRECTORY"
- name: Tests
run: |
/usr/share/zfs/zfs-tests.sh -vR -s 3G
- name: Prepare artifacts
if: failure()
run: |
RESULTS_PATH=$(readlink -f /var/tmp/test_results/current)
sudo dmesg > $RESULTS_PATH/dmesg
sudo cp /var/log/syslog $RESULTS_PATH/
sudo chmod +r $RESULTS_PATH/*
# Replace ':' in dir names, actions/upload-artifact doesn't support it
for f in $(find /var/tmp/test_results -name '*:*'); do mv "$f" "${f//:/__}"; done
- uses: actions/upload-artifact@v2
if: failure()
with:
name: Test logs Ubuntu-${{ matrix.os }}
path: /var/tmp/test_results/20*/
if-no-files-found: ignore
+77
View File
@@ -0,0 +1,77 @@
name: zfs-tests-sanity
on:
push:
pull_request:
jobs:
tests:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
with:
ref: ${{ github.event.pull_request.head.sha }}
- name: Install dependencies
run: |
sudo apt-get update
sudo apt-get install --yes -qq build-essential autoconf libtool gdb lcov \
git alien fakeroot wget curl bc fio acl \
sysstat mdadm lsscsi parted gdebi attr dbench watchdog ksh \
nfs-kernel-server samba rng-tools xz-utils \
zlib1g-dev uuid-dev libblkid-dev libselinux-dev \
xfslibs-dev libattr1-dev libacl1-dev libudev-dev libdevmapper-dev \
libssl-dev libffi-dev libaio-dev libelf-dev libmount-dev \
libpam0g-dev pamtester python-dev python-setuptools python-cffi \
python3 python3-dev python3-setuptools python3-cffi python3-packaging
- name: Autogen.sh
run: |
sh autogen.sh
- name: Configure
run: |
./configure --enable-debug --enable-debuginfo
- name: Make
run: |
make --no-print-directory -s pkg-utils pkg-kmod
- name: Install
run: |
sudo dpkg -i *.deb
# Update order of directories to search for modules, otherwise
# Ubuntu will load kernel-shipped ones.
sudo sed -i.bak 's/updates/extra updates/' /etc/depmod.d/ubuntu.conf
sudo depmod
sudo modprobe zfs
# Workaround for cloud-init bug
# see https://github.com/openzfs/zfs/issues/12644
FILE=/lib/udev/rules.d/10-cloud-init-hook-hotplug.rules
if [ -r "${FILE}" ]; then
HASH=$(md5sum "${FILE}" | awk '{ print $1 }')
if [ "${HASH}" = "121ff0ef1936cd2ef65aec0458a35772" ]; then
# Just shove a zd* exclusion right above the hotplug hook...
sudo sed -i -e s/'LABEL="cloudinit_hook"'/'KERNEL=="zd*", GOTO="cloudinit_end"\n&'/ "${FILE}"
sudo udevadm control --reload-rules
fi
fi
# Workaround to provide additional free space for testing.
# https://github.com/actions/virtual-environments/issues/2840
sudo rm -rf /usr/share/dotnet
sudo rm -rf /opt/ghc
sudo rm -rf "/usr/local/share/boost"
sudo rm -rf "$AGENT_TOOLSDIRECTORY"
- name: Tests
run: |
/usr/share/zfs/zfs-tests.sh -vR -s 3G -r sanity
- name: Prepare artifacts
if: failure()
run: |
RESULTS_PATH=$(readlink -f /var/tmp/test_results/current)
sudo dmesg > $RESULTS_PATH/dmesg
sudo cp /var/log/syslog $RESULTS_PATH/
sudo chmod +r $RESULTS_PATH/*
# Replace ':' in dir names, actions/upload-artifact doesn't support it
for f in $(find /var/tmp/test_results -name '*:*'); do mv "$f" "${f//:/__}"; done
- uses: actions/upload-artifact@v2
if: failure()
with:
name: Test logs
path: /var/tmp/test_results/20*/
if-no-files-found: ignore
+67
View File
@@ -0,0 +1,67 @@
name: zloop
on:
push:
pull_request:
jobs:
tests:
runs-on: ubuntu-latest
env:
TEST_DIR: /var/tmp/zloop
steps:
- uses: actions/checkout@v2
with:
ref: ${{ github.event.pull_request.head.sha }}
- name: Install dependencies
run: |
sudo apt-get update
sudo apt-get install --yes -qq build-essential autoconf libtool gdb \
git alien fakeroot \
zlib1g-dev uuid-dev libblkid-dev libselinux-dev \
xfslibs-dev libattr1-dev libacl1-dev libudev-dev libdevmapper-dev \
libssl-dev libffi-dev libaio-dev libelf-dev libmount-dev \
libpam0g-dev \
python-dev python-setuptools python-cffi python-packaging \
python3 python3-dev python3-setuptools python3-cffi python3-packaging
- name: Autogen.sh
run: |
sh autogen.sh
- name: Configure
run: |
./configure --enable-debug --enable-debuginfo
- name: Make
run: |
make --no-print-directory -s pkg-utils pkg-kmod
- name: Install
run: |
sudo dpkg -i *.deb
# Update order of directories to search for modules, otherwise
# Ubuntu will load kernel-shipped ones.
sudo sed -i.bak 's/updates/extra updates/' /etc/depmod.d/ubuntu.conf
sudo depmod
sudo modprobe zfs
- name: Tests
run: |
sudo mkdir -p $TEST_DIR
# run for 20 minutes to have a total runner time of 30 minutes
sudo /usr/share/zfs/zloop.sh -t 1200 -l -m1 -- -T 120 -P 60
- name: Prepare artifacts
if: failure()
run: |
sudo chmod +r -R $TEST_DIR/
- uses: actions/upload-artifact@v2
if: failure()
with:
name: Logs
path: |
/var/tmp/zloop/*/
!/var/tmp/zloop/*/vdev/
if-no-files-found: ignore
- uses: actions/upload-artifact@v2
if: failure()
with:
name: Pool files
path: |
/var/tmp/zloop/*/vdev/
if-no-files-found: ignore
+5
View File
@@ -36,6 +36,7 @@ Makefile.in
# Top level generated files specific to this top level dir
#
/bin
/build
/configure
/config.log
/config.status
@@ -61,5 +62,9 @@ cscope.*
*.patch
*.orig
*.log
*.tmp
venv
*.so
*.so.debug
*.so.full
+1 -1
View File
@@ -1,3 +1,3 @@
[submodule "scripts/zfs-images"]
path = scripts/zfs-images
url = https://github.com/zfsonlinux/zfs-images
url = https://github.com/openzfs/zfs-images
-38
View File
@@ -1,38 +0,0 @@
language: c
sudo: required
env:
global:
# Travis limits maximum log size, we have to cut tests output
- ZFS_TEST_TRAVIS_LOG_MAX_LENGTH=800
matrix:
# tags are mainly in ascending order
- ZFS_TEST_TAGS='acl,atime,bootfs,cachefile,casenorm,chattr,checksum,clean_mirror,compression,ctime,delegate,devices,events,exec,fault,features,grow_pool,zdb,zfs,zfs_bookmark,zfs_change-key,zfs_clone,zfs_copies,zfs_create,zfs_diff,zfs_get,zfs_inherit,zfs_load-key,zfs_rename'
- ZFS_TEST_TAGS='cache,history,hkdf,inuse,zfs_property,zfs_receive,zfs_reservation,zfs_send,zfs_set,zfs_share,zfs_snapshot,zfs_unload-key,zfs_unmount,zfs_unshare,zfs_upgrade,zpool,zpool_add,zpool_attach,zpool_clear,zpool_create,zpool_destroy,zpool_detach'
- ZFS_TEST_TAGS='grow_replicas,mv_files,cli_user,zfs_mount,zfs_promote,zfs_rollback,zpool_events,zpool_expand,zpool_export,zpool_get,zpool_history,zpool_import,zpool_labelclear,zpool_offline,zpool_online,zpool_remove,zpool_reopen,zpool_replace,zpool_scrub,zpool_set,zpool_status,zpool_sync,zpool_upgrade'
- ZFS_TEST_TAGS='zfs_destroy,large_files,largest_pool,link_count,migration,mmap,mmp,mount,nestedfs,no_space,nopwrite,online_offline,pool_names,poolversion,privilege,quota,raidz,redundancy,rsend'
- ZFS_TEST_TAGS='inheritance,refquota,refreserv,rename_dirs,replacement,reservation,rootpool,scrub_mirror,slog,snapshot,snapused,sparse,threadsappend,tmpfile,truncate,upgrade,userquota,vdev_zaps,write_dirs,xattr,zvol,libzfs'
before_install:
- sudo apt-get -qq update
- sudo apt-get install --yes -qq build-essential autoconf libtool gawk alien fakeroot linux-headers-$(uname -r)
- sudo apt-get install --yes -qq zlib1g-dev uuid-dev libattr1-dev libblkid-dev libselinux-dev libudev-dev libssl-dev
# packages for tests
- sudo apt-get install --yes -qq parted lsscsi ksh attr acl nfs-kernel-server fio
install:
- git clone --depth=1 https://github.com/zfsonlinux/spl
- cd spl
- git checkout master
- sh autogen.sh
- ./configure
- make --no-print-directory -s pkg-utils pkg-kmod
- sudo dpkg -i *.deb
- cd ..
- sh autogen.sh
- ./configure
- make --no-print-directory -s pkg-utils pkg-kmod
- sudo dpkg -i *.deb
script:
- travis_wait 50 /usr/share/zfs/zfs-tests.sh -v -T $ZFS_TEST_TAGS
after_failure:
- find /var/tmp/test_results/current/log -type f -name '*' -printf "%f\n" -exec cut -c -$ZFS_TEST_TRAVIS_LOG_MAX_LENGTH {} \;
after_success:
- find /var/tmp/test_results/current/log -type f -name '*' -printf "%f\n" -exec cut -c -$ZFS_TEST_TRAVIS_LOG_MAX_LENGTH {} \;
+4
View File
@@ -83,6 +83,7 @@ CONTRIBUTORS:
Christopher Voltz <cjunk@voltz.ws>
Chunwei Chen <david.chen@nutanix.com>
Clemens Fruhwirth <clemens@endorphin.org>
Coleman Kane <ckane@colemankane.org>
Colin Ian King <colin.king@canonical.com>
Craig Loomis <cloomis@astro.princeton.edu>
Craig Sanders <github@taz.net.au>
@@ -181,6 +182,7 @@ CONTRIBUTORS:
Keith M Wesolowski <wesolows@foobazco.org>
Kevin Tanguy <kevin.tanguy@ovh.net>
KireinaHoro <i@jsteward.moe>
Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl>
Kohsuke Kawaguchi <kk@kohsuke.org>
Kyle Blatter <kyleblatter@llnl.gov>
Kyle Fuller <inbox@kylefuller.co.uk>
@@ -209,6 +211,7 @@ CONTRIBUTORS:
Michael Gebetsroither <michael@mgeb.org>
Michael Kjorling <michael@kjorling.se>
Michael Martin <mgmartin.mgm@gmail.com>
Michael Niewöhner <foss@mniewoehner.de>
Mike Gerdts <mike.gerdts@joyent.com>
Mike Harsch <mike@harschsystems.com>
Mike Leddy <mike.leddy@gmail.com>
@@ -257,6 +260,7 @@ CONTRIBUTORS:
Saso Kiselkov <saso.kiselkov@nexenta.com>
Scot W. Stevenson <scot.stevenson@gmail.com>
Sean Eric Fagan <sef@ixsystems.com>
Sebastian Gottschall <s.gottschall@dd-wrt.com>
Sen Haerens <sen@senhaerens.be>
Serapheim Dimitropoulos <serapheim@delphix.com>
Seth Forshee <seth.forshee@canonical.com>
+1 -1
View File
@@ -1,2 +1,2 @@
The [OpenZFS Code of Conduct](http://www.open-zfs.org/wiki/Code_of_Conduct)
applies to spaces associated with the ZFS on Linux project, including GitHub.
applies to spaces associated with the OpenZFS project, including GitHub.
+5 -1
View File
@@ -19,7 +19,11 @@ notable exceptions and their respective licenses include:
* AES Implementation: module/icp/asm-x86_64/aes/THIRDPARTYLICENSE.gladman
* AES Implementation: module/icp/asm-x86_64/aes/THIRDPARTYLICENSE.openssl
* PBKDF2 Implementation: lib/libzfs/THIRDPARTYLICENSE.openssl
* SPL Implementation: module/spl/THIRDPARTYLICENSE.gplv2
* SPL Implementation: module/os/linux/spl/THIRDPARTYLICENSE.gplv2
* GCM Implementation: module/icp/asm-x86_64/modes/THIRDPARTYLICENSE.cryptogams
* GCM Implementation: module/icp/asm-x86_64/modes/THIRDPARTYLICENSE.openssl
* GHASH Implementation: module/icp/asm-x86_64/modes/THIRDPARTYLICENSE.cryptogams
* GHASH Implementation: module/icp/asm-x86_64/modes/THIRDPARTYLICENSE.openssl
This product includes software developed by the OpenSSL Project for use
in the OpenSSL Toolkit (http://www.openssl.org/)
+4 -4
View File
@@ -1,10 +1,10 @@
Meta: 1
Name: zfs
Branch: 1.0
Version: 0.8.0
Version: 2.1.2
Release: 1
Release-Tags: relext
License: CDDL
Author: OpenZFS on Linux
Linux-Maximum: 5.1
Linux-Minimum: 2.6.32
Author: OpenZFS
Linux-Maximum: 5.15
Linux-Minimum: 3.10
+125 -55
View File
@@ -1,12 +1,17 @@
include $(top_srcdir)/config/Shellcheck.am
ACLOCAL_AMFLAGS = -I config
include config/rpm.am
include config/deb.am
include config/tgz.am
SUBDIRS = include
if BUILD_LINUX
SUBDIRS += rpm
endif
SUBDIRS = include rpm
if CONFIG_USER
SUBDIRS += udev etc man scripts lib tests cmd contrib
SUBDIRS += man scripts lib tests cmd etc contrib
if BUILD_LINUX
SUBDIRS += udev
endif
endif
if CONFIG_KERNEL
SUBDIRS += module
@@ -14,33 +19,51 @@ SUBDIRS += module
extradir = $(prefix)/src/zfs-$(VERSION)
extra_HEADERS = zfs.release.in zfs_config.h.in
if BUILD_LINUX
kerneldir = $(prefix)/src/zfs-$(VERSION)/$(LINUX_VERSION)
nodist_kernel_HEADERS = zfs.release zfs_config.h module/$(LINUX_SYMBOLS)
endif
endif
AUTOMAKE_OPTIONS = foreign
EXTRA_DIST = autogen.sh copy-builtin
EXTRA_DIST += config/config.awk config/rpm.am config/deb.am config/tgz.am
EXTRA_DIST += META AUTHORS COPYRIGHT LICENSE NEWS NOTICE README.md
EXTRA_DIST += CODE_OF_CONDUCT.md
EXTRA_DIST += AUTHORS CODE_OF_CONDUCT.md COPYRIGHT LICENSE META NEWS NOTICE
EXTRA_DIST += README.md RELEASES.md
EXTRA_DIST += module/lua/README.zfs module/os/linux/spl/README.md
# Include all the extra licensing information for modules
EXTRA_DIST += module/icp/algs/skein/THIRDPARTYLICENSE module/icp/algs/skein/THIRDPARTYLICENSE.descrip
EXTRA_DIST += module/icp/asm-x86_64/aes/THIRDPARTYLICENSE.gladman module/icp/asm-x86_64/aes/THIRDPARTYLICENSE.gladman.descrip
EXTRA_DIST += module/icp/asm-x86_64/aes/THIRDPARTYLICENSE.openssl module/icp/asm-x86_64/aes/THIRDPARTYLICENSE.openssl.descrip
EXTRA_DIST += module/spl/THIRDPARTYLICENSE.gplv2 module/spl/THIRDPARTYLICENSE.gplv2.descrip
EXTRA_DIST += module/zfs/THIRDPARTYLICENSE.cityhash module/zfs/THIRDPARTYLICENSE.cityhash.descrip
EXTRA_DIST += module/icp/algs/skein/THIRDPARTYLICENSE
EXTRA_DIST += module/icp/algs/skein/THIRDPARTYLICENSE.descrip
EXTRA_DIST += module/icp/asm-x86_64/aes/THIRDPARTYLICENSE.gladman
EXTRA_DIST += module/icp/asm-x86_64/aes/THIRDPARTYLICENSE.gladman.descrip
EXTRA_DIST += module/icp/asm-x86_64/aes/THIRDPARTYLICENSE.openssl
EXTRA_DIST += module/icp/asm-x86_64/aes/THIRDPARTYLICENSE.openssl.descrip
EXTRA_DIST += module/icp/asm-x86_64/modes/THIRDPARTYLICENSE.cryptogams
EXTRA_DIST += module/icp/asm-x86_64/modes/THIRDPARTYLICENSE.cryptogams.descrip
EXTRA_DIST += module/icp/asm-x86_64/modes/THIRDPARTYLICENSE.openssl
EXTRA_DIST += module/icp/asm-x86_64/modes/THIRDPARTYLICENSE.openssl.descrip
EXTRA_DIST += module/os/linux/spl/THIRDPARTYLICENSE.gplv2
EXTRA_DIST += module/os/linux/spl/THIRDPARTYLICENSE.gplv2.descrip
EXTRA_DIST += module/zfs/THIRDPARTYLICENSE.cityhash
EXTRA_DIST += module/zfs/THIRDPARTYLICENSE.cityhash.descrip
@CODE_COVERAGE_RULES@
.PHONY: gitrev
gitrev:
-${top_srcdir}/scripts/make_gitrev.sh
GITREV = include/zfs_gitrev.h
BUILT_SOURCES = gitrev
PHONY = gitrev
gitrev:
$(AM_V_GEN)$(top_srcdir)/scripts/make_gitrev.sh $(GITREV)
all: gitrev
# Double-colon rules are allowed; there are multiple independent definitions.
maintainer-clean-local::
-$(RM) $(GITREV)
distclean-local::
-$(RM) -R autom4te*.cache
-$(RM) -R autom4te*.cache build
-find . \( -name SCCS -o -name BitKeeper -o -name .svn -o -name CVS \
-o -name .pc -o -name .hg -o -name .git \) -prune -o \
\( -name '*.orig' -o -name '*.rej' -o -name '*~' \
@@ -52,13 +75,15 @@ distclean-local::
-type f -print | xargs $(RM)
all-local:
-${top_srcdir}/scripts/zfs-tests.sh -c
-[ -x ${top_builddir}/scripts/zfs-tests.sh ] && \
${top_builddir}/scripts/zfs-tests.sh -c
dist-hook: gitrev
cp ${top_srcdir}/include/zfs_gitrev.h $(distdir)/include; \
sed -i 's/Release:[[:print:]]*/Release: $(RELEASE)/' \
dist-hook:
$(AM_V_GEN)$(top_srcdir)/scripts/make_gitrev.sh -D $(distdir) $(GITREV)
$(SED) ${ac_inplace} -e 's/Release:[[:print:]]*/Release: $(RELEASE)/' \
$(distdir)/META
if BUILD_LINUX
# For compatibility, create a matching spl-x.y.z directly which contains
# symlinks to the updated header and object file locations. These
# compatibility links will be removed in the next major release.
@@ -75,74 +100,102 @@ install-data-hook:
ln -fs zfs_config.h spl_config.h && \
ln -fs zfs.release spl.release
endif
endif
codecheck: cstyle shellcheck flake8 mancheck testscheck vcscheck
PHONY += codecheck
codecheck: cstyle shellcheck checkbashisms flake8 mancheck testscheck vcscheck
PHONY += checkstyle
checkstyle: codecheck commitcheck
PHONY += commitcheck
commitcheck:
@if git rev-parse --git-dir > /dev/null 2>&1; then \
${top_srcdir}/scripts/commitcheck.sh; \
fi
PHONY += cstyle
cstyle:
@find ${top_srcdir} -name '*.[hc]' ! -name 'zfs_config.*' \
! -name '*.mod.c' -type f \
@find ${top_srcdir} -name build -prune \
-o -type f -name '*.[hc]' \
! -name 'zfs_config.*' ! -name '*.mod.c' \
! -name 'opt_global.h' ! -name '*_if*.h' \
! -path './module/zstd/lib/*' \
-exec ${top_srcdir}/scripts/cstyle.pl -cpP {} \+
shellcheck:
@if type shellcheck > /dev/null 2>&1; then \
shellcheck --exclude=SC1090 --format=gcc \
$$(find ${top_srcdir}/scripts/*.sh -type f) \
$$(find ${top_srcdir}/cmd/zed/zed.d/*.sh -type f) \
$$(find ${top_srcdir}/cmd/zpool/zpool.d/* -executable); \
else \
echo "skipping shellcheck because shellcheck is not installed"; \
fi
filter_executable = -exec test -x '{}' \; -print
SHELLCHECKDIRS = cmd contrib etc scripts tests
SHELLCHECKSCRIPTS = autogen.sh
PHONY += checkabi storeabi
checklibabiversion:
libabiversion=`abidw -v | $(SED) 's/[^0-9]//g'`; \
if test $$libabiversion -lt "200"; then \
/bin/echo -e "\n" \
"*** Please use libabigail 2.0.0 version or newer;\n" \
"*** otherwise results are not consistent!\n" \
"(or see https://github.com/openzfs/libabigail-docker )\n"; \
exit 1; \
fi;
checkabi: checklibabiversion lib
$(MAKE) -C lib checkabi
storeabi: checklibabiversion lib
$(MAKE) -C lib storeabi
PHONY += mancheck
mancheck:
@if type mandoc > /dev/null 2>&1; then \
find ${top_srcdir}/man/man8 -type f -name 'zfs.8' \
-o -name 'zpool.8' -o -name 'zdb.8' \
-o -name 'zgenhostid.8' | \
xargs mandoc -Tlint -Werror; \
else \
echo "skipping mancheck because mandoc is not installed"; \
fi
${top_srcdir}/scripts/mancheck.sh ${top_srcdir}/man ${top_srcdir}/tests/test-runner/man
if BUILD_LINUX
stat_fmt = -c '%A %n'
else
stat_fmt = -f '%Sp %N'
endif
PHONY += testscheck
testscheck:
@find ${top_srcdir}/tests/zfs-tests/tests -type f \
\( -name '*.ksh' -not -executable \) -o \
\( -name '*.kshlib' -executable \) -o \
\( -name '*.cfg' -executable \) | \
xargs -r stat -c '%A %n' | \
@find ${top_srcdir}/tests/zfs-tests -type f \
\( -name '*.ksh' -not ${filter_executable} \) -o \
\( -name '*.kshlib' ${filter_executable} \) -o \
\( -name '*.shlib' ${filter_executable} \) -o \
\( -name '*.cfg' ${filter_executable} \) | \
xargs -r stat ${stat_fmt} | \
awk '{c++; print} END {if(c>0) exit 1}'
PHONY += vcscheck
vcscheck:
@if git rev-parse --git-dir > /dev/null 2>&1; then \
git ls-files . --exclude-standard --others | \
awk '{c++; print} END {if(c>0) exit 1}' ; \
fi
PHONY += lint
lint: cppcheck paxcheck
cppcheck:
@if type cppcheck > /dev/null 2>&1; then \
cppcheck --quiet --force --error-exitcode=2 --inline-suppr \
--suppressions-list=.github/suppressions.txt \
-UHAVE_SSE2 -UHAVE_AVX512F -UHAVE_UIO_ZEROCOPY \
${top_srcdir}; \
CPPCHECKDIRS = cmd lib module
PHONY += cppcheck
cppcheck: $(CPPCHECKDIRS)
@if test -n "$(CPPCHECK)"; then \
set -e ; for dir in $(CPPCHECKDIRS) ; do \
$(MAKE) -C $$dir cppcheck ; \
done \
else \
echo "skipping cppcheck because cppcheck is not installed"; \
fi
PHONY += paxcheck
paxcheck:
@if type scanelf > /dev/null 2>&1; then \
${top_srcdir}/scripts/paxcheck.sh ${top_srcdir}; \
${top_srcdir}/scripts/paxcheck.sh ${top_builddir}; \
else \
echo "skipping paxcheck because scanelf is not installed"; \
fi
PHONY += flake8
flake8:
@if type flake8 > /dev/null 2>&1; then \
flake8 ${top_srcdir}; \
@@ -150,17 +203,34 @@ flake8:
echo "skipping flake8 because flake8 is not installed"; \
fi
PHONY += ctags
ctags:
$(RM) tags
find $(top_srcdir) -name .git -prune -o -name '*.[hc]' | xargs ctags
find $(top_srcdir) -name '.?*' -prune \
-o -type f -name '*.[hcS]' -print | xargs ctags -a
PHONY += etags
etags:
$(RM) TAGS
find $(top_srcdir) -name .pc -prune -o -name '*.[hc]' | xargs etags -a
find $(top_srcdir) -name '.?*' -prune \
-o -type f -name '*.[hcS]' -print | xargs etags -a
PHONY += cscopelist
cscopelist:
find $(top_srcdir) -name '.?*' -prune \
-o -type f -name '*.[hc]' -print >cscope.files
PHONY += tags
tags: ctags etags
PHONY += pkg pkg-dkms pkg-kmod pkg-utils
pkg: @DEFAULT_PACKAGE@
pkg-dkms: @DEFAULT_PACKAGE@-dkms
pkg-kmod: @DEFAULT_PACKAGE@-kmod
pkg-utils: @DEFAULT_PACKAGE@-utils
include config/rpm.am
include config/deb.am
include config/tgz.am
.PHONY: $(PHONY)
+1 -1
View File
@@ -1,3 +1,3 @@
Descriptions of all releases can be found on github:
https://github.com/zfsonlinux/zfs/releases
https://github.com/openzfs/zfs/releases
+16 -12
View File
@@ -1,31 +1,35 @@
![img](http://zfsonlinux.org/images/zfs-linux.png)
![img](https://openzfs.github.io/openzfs-docs/_static/img/logo/480px-Open-ZFS-Secondary-Logo-Colour-halfsize.png)
ZFS on Linux is an advanced file system and volume manager which was originally
OpenZFS is an advanced file system and volume manager which was originally
developed for Solaris and is now maintained by the OpenZFS community.
This repository contains the code for running OpenZFS on Linux and FreeBSD.
[![codecov](https://codecov.io/gh/zfsonlinux/zfs/branch/master/graph/badge.svg)](https://codecov.io/gh/zfsonlinux/zfs)
[![coverity](https://scan.coverity.com/projects/1973/badge.svg)](https://scan.coverity.com/projects/zfsonlinux-zfs)
[![codecov](https://codecov.io/gh/openzfs/zfs/branch/master/graph/badge.svg)](https://codecov.io/gh/openzfs/zfs)
[![coverity](https://scan.coverity.com/projects/1973/badge.svg)](https://scan.coverity.com/projects/openzfs-zfs)
# Official Resources
* [Site](http://zfsonlinux.org)
* [Wiki](https://github.com/zfsonlinux/zfs/wiki)
* [Mailing lists](https://github.com/zfsonlinux/zfs/wiki/Mailing-Lists)
* [OpenZFS site](http://open-zfs.org/)
* [Documentation](https://openzfs.github.io/openzfs-docs/) - for using and developing this repo
* [ZoL Site](https://zfsonlinux.org) - Linux release info & links
* [Mailing lists](https://openzfs.github.io/openzfs-docs/Project%20and%20Community/Mailing%20Lists.html)
* [OpenZFS site](http://open-zfs.org/) - for conference videos and info on other platforms (illumos, OSX, Windows, etc)
# Installation
Full documentation for installing ZoL on your favorite Linux distribution can
be found at [our site](http://zfsonlinux.org/).
Full documentation for installing OpenZFS on your favorite operating system can
be found at the [Getting Started Page](https://openzfs.github.io/openzfs-docs/Getting%20Started/index.html).
# Contribute & Develop
We have a separate document with [contribution guidelines](./.github/CONTRIBUTING.md).
We have a [Code of Conduct](./CODE_OF_CONDUCT.md).
# Release
ZFS on Linux is released under a CDDL license.
OpenZFS is released under a CDDL license.
For more details see the NOTICE, LICENSE and COPYRIGHT files; `UCRL-CODE-235197`
# Supported Kernels
* The `META` file contains the officially recognized supported kernel versions.
* The `META` file contains the officially recognized supported Linux kernel versions.
* Supported FreeBSD versions are any supported branches and releases starting from 12.2-RELEASE.
+37
View File
@@ -0,0 +1,37 @@
OpenZFS uses the MAJOR.MINOR.PATCH versioning scheme described here:
* MAJOR - Incremented at the discretion of the OpenZFS developers to indicate
a particularly noteworthy feature or change. An increase in MAJOR number
does not indicate any incompatible on-disk format change. The ability
to import a ZFS pool is controlled by the feature flags enabled on the
pool and the feature flags supported by the installed OpenZFS version.
Increasing the MAJOR version is expected to be an infrequent occurrence.
* MINOR - Incremented to indicate new functionality such as a new feature
flag, pool/dataset property, zfs/zpool sub-command, new user/kernel
interface, etc. MINOR releases may introduce incompatible changes to the
user space library APIs (libzfs.so). Existing user/kernel interfaces are
considered to be stable to maximize compatibility between OpenZFS releases.
Additions to the user/kernel interface are backwards compatible.
* PATCH - Incremented when applying documentation updates, important bug
fixes, minor performance improvements, and kernel compatibility patches.
The user space library APIs and user/kernel interface are considered to
be stable. PATCH releases for a MAJOR.MINOR are published as needed.
Two release branches are maintained for OpenZFS, they are:
* OpenZFS LTS - A designated MAJOR.MINOR release with periodic PATCH
releases that incorporate important changes backported from newer OpenZFS
releases. This branch is intended for use in environments using an
LTS, enterprise, or similarly managed kernel (RHEL, Ubuntu LTS, Debian).
Minor changes to support these distribution kernels will be applied as
needed. New kernel versions released after the OpenZFS LTS release are
not supported. LTS releases will receive patches for at least 2 years.
The current LTS release is OpenZFS 2.1.
* OpenZFS current - Tracks the newest MAJOR.MINOR release. This branch
includes support for the latest OpenZFS features and recently releases
kernels. When a new MINOR release is tagged the previous MINOR release
will no longer be maintained (unless it is an LTS release). New MINOR
releases are planned to occur roughly annually.
-61
View File
@@ -48,64 +48,3 @@
#TEST_ZFSSTRESS_VDEV="/var/tmp/vdev"
#TEST_ZFSSTRESS_DIR="/$TEST_ZFSSTRESS_POOL/$TEST_ZFSSTRESS_FS"
#TEST_ZFSSTRESS_OPTIONS=""
### per-builder customization
#
# BB_NAME=builder-name <distribution-version-architecture-type>
# - distribution=Amazon,Debian,Fedora,RHEL,SUSE,Ubuntu
# - version=x.y
# - architecture=x86_64,i686,arm,aarch64
# - type=build,test
#
case "$BB_NAME" in
Amazon*)
# ZFS enabled xfstests fails to build
TEST_XFSTESTS_SKIP="yes"
;;
CentOS-7*)
# ZFS enabled xfstests fails to build
TEST_XFSTESTS_SKIP="yes"
;;
CentOS-6*)
;;
Debian*)
;;
Fedora*)
;;
RHEL*)
;;
SUSE*)
;;
Ubuntu-16.04*)
# ZFS enabled xfstests fails to build
TEST_XFSTESTS_SKIP="yes"
;;
Ubuntu*)
;;
*)
;;
esac
###
#
# Run ztest longer on the "coverage" builders to gain more code coverage
# data out of ztest, libzpool, etc.
#
case "$BB_NAME" in
*coverage*)
TEST_ZTEST_TIMEOUT=3600
;;
*)
TEST_ZTEST_TIMEOUT=900
;;
esac
###
#
# Disable the following test suites on 32-bit systems.
#
if [ $(getconf LONG_BIT) = "32" ]; then
TEST_ZTEST_SKIP="yes"
TEST_XFSTESTS_SKIP="yes"
TEST_ZFSSTRESS_SKIP="yes"
fi
+27 -3
View File
@@ -1,3 +1,27 @@
SUBDIRS = zfs zpool zdb zhack zinject zstreamdump ztest
SUBDIRS += mount_zfs fsck_zfs zvol_id vdev_id arcstat dbufstat zed
SUBDIRS += arc_summary raidz_test zgenhostid
include $(top_srcdir)/config/Shellcheck.am
SUBDIRS = zfs zpool zdb zhack zinject zstream ztest
SUBDIRS += fsck_zfs vdev_id raidz_test zfs_ids_to_path
SUBDIRS += zpool_influxdb
CPPCHECKDIRS = zfs zpool zdb zhack zinject zstream ztest
CPPCHECKDIRS += raidz_test zfs_ids_to_path zpool_influxdb
# TODO: #12084: SHELLCHECKDIRS = fsck_zfs vdev_id zpool
SHELLCHECKDIRS = fsck_zfs zpool
if USING_PYTHON
SUBDIRS += arcstat arc_summary dbufstat
endif
if BUILD_LINUX
SUBDIRS += mount_zfs zed zgenhostid zvol_id zvol_wait
CPPCHECKDIRS += mount_zfs zed zgenhostid zvol_id
SHELLCHECKDIRS += zed
endif
PHONY = cppcheck
cppcheck: $(CPPCHECKDIRS)
set -e ; for dir in $(CPPCHECKDIRS) ; do \
$(MAKE) -C $$dir cppcheck ; \
done
+1
View File
@@ -0,0 +1 @@
arc_summary
+8 -8
View File
@@ -1,13 +1,13 @@
bin_SCRIPTS = arc_summary
CLEANFILES = arc_summary
EXTRA_DIST = arc_summary2 arc_summary3
if USING_PYTHON_2
dist_bin_SCRIPTS = arc_summary2
install-exec-hook:
mv $(DESTDIR)$(bindir)/arc_summary2 $(DESTDIR)$(bindir)/arc_summary
SCRIPT = arc_summary2
else
SCRIPT = arc_summary3
endif
if USING_PYTHON_3
dist_bin_SCRIPTS = arc_summary3
install-exec-hook:
mv $(DESTDIR)$(bindir)/arc_summary3 $(DESTDIR)$(bindir)/arc_summary
endif
arc_summary: $(SCRIPT)
cp $< $@
+190 -91
View File
@@ -1,4 +1,4 @@
#!/usr/bin/python2
#!/usr/bin/env python2
#
# $Id: arc_summary.pl,v 388:e27800740aa2 2011-07-08 02:53:29Z jhell $
#
@@ -42,7 +42,7 @@
Provides basic information on the ARC, its efficiency, the L2ARC (if present),
the Data Management Unit (DMU), Virtual Devices (VDEVs), and tunables. See the
in-source documentation and code at
https://github.com/zfsonlinux/zfs/blob/master/module/zfs/arc.c for details.
https://github.com/openzfs/zfs/blob/master/module/zfs/arc.c for details.
"""
import getopt
@@ -54,46 +54,64 @@ import errno
from subprocess import Popen, PIPE
from decimal import Decimal as D
if sys.platform.startswith('freebsd'):
# Requires py27-sysctl on FreeBSD
import sysctl
def is_value(ctl):
return ctl.type != sysctl.CTLTYPE_NODE
def load_kstats(namespace):
"""Collect information on a specific subsystem of the ARC"""
base = 'kstat.zfs.misc.%s.' % namespace
fmt = lambda kstat: (kstat.name, D(kstat.value))
kstats = sysctl.filter(base)
return [fmt(kstat) for kstat in kstats if is_value(kstat)]
def load_tunables():
ctls = sysctl.filter('vfs.zfs')
return dict((ctl.name, ctl.value) for ctl in ctls if is_value(ctl))
elif sys.platform.startswith('linux'):
def load_kstats(namespace):
"""Collect information on a specific subsystem of the ARC"""
kstat = 'kstat.zfs.misc.%s.%%s' % namespace
path = '/proc/spl/kstat/zfs/%s' % namespace
with open(path) as f:
entries = [line.strip().split() for line in f][2:] # Skip header
return [(kstat % name, D(value)) for name, _, value in entries]
def load_tunables():
basepath = '/sys/module/zfs/parameters'
tunables = {}
for name in os.listdir(basepath):
if not name:
continue
path = '%s/%s' % (basepath, name)
with open(path) as f:
value = f.read()
tunables[name] = value.strip()
return tunables
show_tunable_descriptions = False
alternate_tunable_layout = False
def handle_Exception(ex_cls, ex, tb):
if ex is IOError:
if ex.errno == errno.EPIPE:
sys.exit()
if ex is KeyboardInterrupt:
sys.exit()
sys.excepthook = handle_Exception
def get_Kstat():
"""Collect information on the ZFS subsystem from the /proc virtual
file system. The name "kstat" is a holdover from the Solaris utility
of the same name.
"""
def load_proc_kstats(fn, namespace):
"""Collect information on a specific subsystem of the ARC"""
kstats = [line.strip() for line in open(fn)]
del kstats[0:2]
for kstat in kstats:
kstat = kstat.strip()
name, _, value = kstat.split()
Kstat[namespace + name] = D(value)
Kstat = {}
load_proc_kstats('/proc/spl/kstat/zfs/arcstats',
'kstat.zfs.misc.arcstats.')
load_proc_kstats('/proc/spl/kstat/zfs/zfetchstats',
'kstat.zfs.misc.zfetchstats.')
load_proc_kstats('/proc/spl/kstat/zfs/vdev_cache_stats',
'kstat.zfs.misc.vdev_cache_stats.')
Kstat.update(load_kstats('arcstats'))
Kstat.update(load_kstats('zfetchstats'))
Kstat.update(load_kstats('vdev_cache_stats'))
return Kstat
@@ -195,12 +213,30 @@ def get_arc_summary(Kstat):
deleted = Kstat["kstat.zfs.misc.arcstats.deleted"]
mutex_miss = Kstat["kstat.zfs.misc.arcstats.mutex_miss"]
evict_skip = Kstat["kstat.zfs.misc.arcstats.evict_skip"]
evict_l2_cached = Kstat["kstat.zfs.misc.arcstats.evict_l2_cached"]
evict_l2_eligible = Kstat["kstat.zfs.misc.arcstats.evict_l2_eligible"]
evict_l2_eligible_mfu = Kstat["kstat.zfs.misc.arcstats.evict_l2_eligible_mfu"]
evict_l2_eligible_mru = Kstat["kstat.zfs.misc.arcstats.evict_l2_eligible_mru"]
evict_l2_ineligible = Kstat["kstat.zfs.misc.arcstats.evict_l2_ineligible"]
evict_l2_skip = Kstat["kstat.zfs.misc.arcstats.evict_l2_skip"]
# ARC Misc.
output["arc_misc"] = {}
output["arc_misc"]["deleted"] = fHits(deleted)
output["arc_misc"]['mutex_miss'] = fHits(mutex_miss)
output["arc_misc"]['evict_skips'] = fHits(evict_skip)
output["arc_misc"]["mutex_miss"] = fHits(mutex_miss)
output["arc_misc"]["evict_skips"] = fHits(evict_skip)
output["arc_misc"]["evict_l2_skip"] = fHits(evict_l2_skip)
output["arc_misc"]["evict_l2_cached"] = fBytes(evict_l2_cached)
output["arc_misc"]["evict_l2_eligible"] = fBytes(evict_l2_eligible)
output["arc_misc"]["evict_l2_eligible_mfu"] = {
'per': fPerc(evict_l2_eligible_mfu, evict_l2_eligible),
'num': fBytes(evict_l2_eligible_mfu),
}
output["arc_misc"]["evict_l2_eligible_mru"] = {
'per': fPerc(evict_l2_eligible_mru, evict_l2_eligible),
'num': fBytes(evict_l2_eligible_mru),
}
output["arc_misc"]["evict_l2_ineligible"] = fBytes(evict_l2_ineligible)
# ARC Sizing
arc_size = Kstat["kstat.zfs.misc.arcstats.size"]
@@ -316,8 +352,26 @@ def _arc_summary(Kstat):
sys.stdout.write("\tDeleted:\t\t\t\t%s\n" % arc['arc_misc']['deleted'])
sys.stdout.write("\tMutex Misses:\t\t\t\t%s\n" %
arc['arc_misc']['mutex_miss'])
sys.stdout.write("\tEvict Skips:\t\t\t\t%s\n" %
sys.stdout.write("\tEviction Skips:\t\t\t\t%s\n" %
arc['arc_misc']['evict_skips'])
sys.stdout.write("\tEviction Skips Due to L2 Writes:\t%s\n" %
arc['arc_misc']['evict_l2_skip'])
sys.stdout.write("\tL2 Cached Evictions:\t\t\t%s\n" %
arc['arc_misc']['evict_l2_cached'])
sys.stdout.write("\tL2 Eligible Evictions:\t\t\t%s\n" %
arc['arc_misc']['evict_l2_eligible'])
sys.stdout.write("\tL2 Eligible MFU Evictions:\t%s\t%s\n" % (
arc['arc_misc']['evict_l2_eligible_mfu']['per'],
arc['arc_misc']['evict_l2_eligible_mfu']['num'],
)
)
sys.stdout.write("\tL2 Eligible MRU Evictions:\t%s\t%s\n" % (
arc['arc_misc']['evict_l2_eligible_mru']['per'],
arc['arc_misc']['evict_l2_eligible_mru']['num'],
)
)
sys.stdout.write("\tL2 Ineligible Evictions:\t\t%s\n" %
arc['arc_misc']['evict_l2_ineligible'])
sys.stdout.write("\n")
# ARC Sizing
@@ -653,6 +707,11 @@ def get_l2arc_summary(Kstat):
l2_writes_done = Kstat["kstat.zfs.misc.arcstats.l2_writes_done"]
l2_writes_error = Kstat["kstat.zfs.misc.arcstats.l2_writes_error"]
l2_writes_sent = Kstat["kstat.zfs.misc.arcstats.l2_writes_sent"]
l2_mfu_asize = Kstat["kstat.zfs.misc.arcstats.l2_mfu_asize"]
l2_mru_asize = Kstat["kstat.zfs.misc.arcstats.l2_mru_asize"]
l2_prefetch_asize = Kstat["kstat.zfs.misc.arcstats.l2_prefetch_asize"]
l2_bufc_data_asize = Kstat["kstat.zfs.misc.arcstats.l2_bufc_data_asize"]
l2_bufc_metadata_asize = Kstat["kstat.zfs.misc.arcstats.l2_bufc_metadata_asize"]
l2_access_total = (l2_hits + l2_misses)
output['l2_health_count'] = (l2_writes_error + l2_cksum_bad + l2_io_error)
@@ -675,7 +734,7 @@ def get_l2arc_summary(Kstat):
output["io_errors"] = fHits(l2_io_error)
output["l2_arc_size"] = {}
output["l2_arc_size"]["adative"] = fBytes(l2_size)
output["l2_arc_size"]["adaptive"] = fBytes(l2_size)
output["l2_arc_size"]["actual"] = {
'per': fPerc(l2_asize, l2_size),
'num': fBytes(l2_asize)
@@ -684,6 +743,26 @@ def get_l2arc_summary(Kstat):
'per': fPerc(l2_hdr_size, l2_size),
'num': fBytes(l2_hdr_size),
}
output["l2_arc_size"]["mfu_asize"] = {
'per': fPerc(l2_mfu_asize, l2_asize),
'num': fBytes(l2_mfu_asize),
}
output["l2_arc_size"]["mru_asize"] = {
'per': fPerc(l2_mru_asize, l2_asize),
'num': fBytes(l2_mru_asize),
}
output["l2_arc_size"]["prefetch_asize"] = {
'per': fPerc(l2_prefetch_asize, l2_asize),
'num': fBytes(l2_prefetch_asize),
}
output["l2_arc_size"]["bufc_data_asize"] = {
'per': fPerc(l2_bufc_data_asize, l2_asize),
'num': fBytes(l2_bufc_data_asize),
}
output["l2_arc_size"]["bufc_metadata_asize"] = {
'per': fPerc(l2_bufc_metadata_asize, l2_asize),
'num': fBytes(l2_bufc_metadata_asize),
}
output["l2_arc_evicts"] = {}
output["l2_arc_evicts"]['lock_retries'] = fHits(l2_evict_lock_retry)
@@ -748,7 +827,7 @@ def _l2arc_summary(Kstat):
sys.stdout.write("\n")
sys.stdout.write("L2 ARC Size: (Adaptive)\t\t\t\t%s\n" %
arc["l2_arc_size"]["adative"])
arc["l2_arc_size"]["adaptive"])
sys.stdout.write("\tCompressed:\t\t\t%s\t%s\n" % (
arc["l2_arc_size"]["actual"]["per"],
arc["l2_arc_size"]["actual"]["num"],
@@ -759,11 +838,36 @@ def _l2arc_summary(Kstat):
arc["l2_arc_size"]["head_size"]["num"],
)
)
sys.stdout.write("\tMFU Alloc. Size:\t\t%s\t%s\n" % (
arc["l2_arc_size"]["mfu_asize"]["per"],
arc["l2_arc_size"]["mfu_asize"]["num"],
)
)
sys.stdout.write("\tMRU Alloc. Size:\t\t%s\t%s\n" % (
arc["l2_arc_size"]["mru_asize"]["per"],
arc["l2_arc_size"]["mru_asize"]["num"],
)
)
sys.stdout.write("\tPrefetch Alloc. Size:\t\t%s\t%s\n" % (
arc["l2_arc_size"]["prefetch_asize"]["per"],
arc["l2_arc_size"]["prefetch_asize"]["num"],
)
)
sys.stdout.write("\tData (buf content) Alloc. Size:\t%s\t%s\n" % (
arc["l2_arc_size"]["bufc_data_asize"]["per"],
arc["l2_arc_size"]["bufc_data_asize"]["num"],
)
)
sys.stdout.write("\tMetadata (buf content) Size:\t%s\t%s\n" % (
arc["l2_arc_size"]["bufc_metadata_asize"]["per"],
arc["l2_arc_size"]["bufc_metadata_asize"]["num"],
)
)
sys.stdout.write("\n")
if arc["l2_arc_evicts"]['lock_retries'] != '0' or \
arc["l2_arc_evicts"]["reading"] != '0':
sys.stdout.write("L2 ARC Evicts:\n")
sys.stdout.write("L2 ARC Evictions:\n")
sys.stdout.write("\tLock Retries:\t\t\t\t%s\n" %
arc["l2_arc_evicts"]['lock_retries'])
sys.stdout.write("\tUpon Reading:\t\t\t\t%s\n" %
@@ -921,14 +1025,7 @@ def _tunable_summary(Kstat):
global show_tunable_descriptions
global alternate_tunable_layout
names = os.listdir("/sys/module/zfs/parameters/")
values = {}
for name in names:
with open("/sys/module/zfs/parameters/" + name) as f:
value = f.read()
values[name] = value.strip()
tunables = load_tunables()
descriptions = {}
if show_tunable_descriptions:
@@ -966,22 +1063,17 @@ def _tunable_summary(Kstat):
sys.stderr.write("Tunable descriptions will be disabled.\n")
sys.stdout.write("ZFS Tunables:\n")
names.sort()
if alternate_tunable_layout:
fmt = "\t%s=%s\n"
else:
fmt = "\t%-50s%s\n"
for name in names:
if not name:
continue
for name in sorted(tunables.keys()):
if show_tunable_descriptions and name in descriptions:
sys.stdout.write("\t# %s\n" % descriptions[name])
sys.stdout.write(fmt % (name, values[name]))
sys.stdout.write(fmt % (name, tunables[name]))
unSub = [
@@ -1033,48 +1125,55 @@ def main():
global alternate_tunable_layout
try:
opts, args = getopt.getopt(
sys.argv[1:],
"adp:h", ["alternate", "description", "page=", "help"]
)
except getopt.error as e:
sys.stderr.write("Error: %s\n" % e.msg)
usage()
sys.exit(1)
args = {}
for opt, arg in opts:
if opt in ('-a', '--alternate'):
args['a'] = True
if opt in ('-d', '--description'):
args['d'] = True
if opt in ('-p', '--page'):
args['p'] = arg
if opt in ('-h', '--help'):
usage()
sys.exit(0)
Kstat = get_Kstat()
alternate_tunable_layout = 'a' in args
show_tunable_descriptions = 'd' in args
pages = []
if 'p' in args:
try:
pages.append(unSub[int(args['p']) - 1])
except IndexError:
sys.stderr.write('the argument to -p must be between 1 and ' +
str(len(unSub)) + '\n')
opts, args = getopt.getopt(
sys.argv[1:],
"adp:h", ["alternate", "description", "page=", "help"]
)
except getopt.error as e:
sys.stderr.write("Error: %s\n" % e.msg)
usage()
sys.exit(1)
else:
pages = unSub
zfs_header()
for page in pages:
page(Kstat)
sys.stdout.write("\n")
args = {}
for opt, arg in opts:
if opt in ('-a', '--alternate'):
args['a'] = True
if opt in ('-d', '--description'):
args['d'] = True
if opt in ('-p', '--page'):
args['p'] = arg
if opt in ('-h', '--help'):
usage()
sys.exit(0)
Kstat = get_Kstat()
alternate_tunable_layout = 'a' in args
show_tunable_descriptions = 'd' in args
pages = []
if 'p' in args:
try:
pages.append(unSub[int(args['p']) - 1])
except IndexError:
sys.stderr.write('the argument to -p must be between 1 and ' +
str(len(unSub)) + '\n')
sys.exit(1)
else:
pages = unSub
zfs_header()
for page in pages:
page(Kstat)
sys.stdout.write("\n")
except IOError as ex:
if (ex.errno == errno.EPIPE):
sys.exit(0)
raise
except KeyboardInterrupt:
sys.exit(0)
if __name__ == '__main__':
+241 -130
View File
@@ -1,4 +1,4 @@
#!/usr/bin/python3
#!/usr/bin/env python3
#
# Copyright (c) 2008 Ben Rockwood <benr@cuddletech.com>,
# Copyright (c) 2010 Martin Matuska <mm@FreeBSD.org>,
@@ -32,7 +32,7 @@
Provides basic information on the ARC, its efficiency, the L2ARC (if present),
the Data Management Unit (DMU), Virtual Devices (VDEVs), and tunables. See
the in-source documentation and code at
https://github.com/zfsonlinux/zfs/blob/master/module/zfs/arc.c for details.
https://github.com/openzfs/zfs/blob/master/module/zfs/arc.c for details.
The original introduction to arc_summary can be found at
http://cuddletech.com/?p=454
"""
@@ -42,13 +42,17 @@ import os
import subprocess
import sys
import time
import errno
DECRIPTION = 'Print ARC and other statistics for ZFS on Linux'
# We can't use env -S portably, and we need python3 -u to handle pipes in
# the shell abruptly closing the way we want to, so...
import io
if isinstance(sys.__stderr__.buffer, io.BufferedWriter):
os.execv(sys.executable, [sys.executable, "-u"] + sys.argv)
DESCRIPTION = 'Print ARC and other statistics for OpenZFS'
INDENT = ' '*8
LINE_LENGTH = 72
PROC_PATH = '/proc/spl/kstat/zfs/'
SPL_PATH = '/sys/module/spl/parameters/'
TUNABLES_PATH = '/sys/module/zfs/parameters/'
DATE_FORMAT = '%a %b %d %H:%M:%S %Y'
TITLE = 'ZFS Subsystem Report'
@@ -61,11 +65,10 @@ SECTION_PATHS = {'arc': 'arcstats',
'dmu': 'dmu_tx',
'l2arc': 'arcstats', # L2ARC stuff lives in arcstats
'vdev': 'vdev_cache_stats',
'xuio': 'xuio_stats',
'zfetch': 'zfetchstats',
'zil': 'zil'}
parser = argparse.ArgumentParser(description=DECRIPTION)
parser = argparse.ArgumentParser(description=DESCRIPTION)
parser.add_argument('-a', '--alternate', action='store_true', default=False,
help='use alternate formatting for tunables and SPL',
dest='alt')
@@ -83,6 +86,172 @@ parser.add_argument('-s', '--section', dest='section', help=SECTION_HELP)
ARGS = parser.parse_args()
if sys.platform.startswith('freebsd'):
# Requires py36-sysctl on FreeBSD
import sysctl
VDEV_CACHE_SIZE = 'vdev.cache_size'
def is_value(ctl):
return ctl.type != sysctl.CTLTYPE_NODE
def namefmt(ctl, base='vfs.zfs.'):
# base is removed from the name
cut = len(base)
return ctl.name[cut:]
def load_kstats(section):
base = 'kstat.zfs.misc.{section}.'.format(section=section)
fmt = lambda kstat: '{name} : {value}'.format(name=namefmt(kstat, base),
value=kstat.value)
kstats = sysctl.filter(base)
return [fmt(kstat) for kstat in kstats if is_value(kstat)]
def get_params(base):
ctls = sysctl.filter(base)
return {namefmt(ctl): str(ctl.value) for ctl in ctls if is_value(ctl)}
def get_tunable_params():
return get_params('vfs.zfs')
def get_vdev_params():
return get_params('vfs.zfs.vdev')
def get_version_impl(request):
# FreeBSD reports versions for zpl and spa instead of zfs and spl.
name = {'zfs': 'zpl',
'spl': 'spa'}[request]
mib = 'vfs.zfs.version.{}'.format(name)
version = sysctl.filter(mib)[0].value
return '{} version {}'.format(name, version)
def get_descriptions(_request):
ctls = sysctl.filter('vfs.zfs')
return {namefmt(ctl): ctl.description for ctl in ctls if is_value(ctl)}
elif sys.platform.startswith('linux'):
KSTAT_PATH = '/proc/spl/kstat/zfs'
SPL_PATH = '/sys/module/spl/parameters'
TUNABLES_PATH = '/sys/module/zfs/parameters'
VDEV_CACHE_SIZE = 'zfs_vdev_cache_size'
def load_kstats(section):
path = os.path.join(KSTAT_PATH, section)
with open(path) as f:
return list(f)[2:] # Get rid of header
def get_params(basepath):
"""Collect information on the Solaris Porting Layer (SPL) or the
tunables, depending on the PATH given. Does not check if PATH is
legal.
"""
result = {}
for name in os.listdir(basepath):
path = os.path.join(basepath, name)
with open(path) as f:
value = f.read()
result[name] = value.strip()
return result
def get_spl_params():
return get_params(SPL_PATH)
def get_tunable_params():
return get_params(TUNABLES_PATH)
def get_vdev_params():
return get_params(TUNABLES_PATH)
def get_version_impl(request):
# The original arc_summary called /sbin/modinfo/{spl,zfs} to get
# the version information. We switch to /sys/module/{spl,zfs}/version
# to make sure we get what is really loaded in the kernel
try:
with open("/sys/module/{}/version".format(request)) as f:
return f.read().strip()
except:
return "(unknown)"
def get_descriptions(request):
"""Get the descriptions of the Solaris Porting Layer (SPL) or the
tunables, return with minimal formatting.
"""
if request not in ('spl', 'zfs'):
print('ERROR: description of "{0}" requested)'.format(request))
sys.exit(1)
descs = {}
target_prefix = 'parm:'
# We would prefer to do this with /sys/modules -- see the discussion at
# get_version() -- but there isn't a way to get the descriptions from
# there, so we fall back on modinfo
command = ["/sbin/modinfo", request, "-0"]
# The recommended way to do this is with subprocess.run(). However,
# some installed versions of Python are < 3.5, so we offer them
# the option of doing it the old way (for now)
info = ''
try:
if 'run' in dir(subprocess):
info = subprocess.run(command, stdout=subprocess.PIPE,
universal_newlines=True)
raw_output = info.stdout.split('\0')
else:
info = subprocess.check_output(command,
universal_newlines=True)
raw_output = info.split('\0')
except subprocess.CalledProcessError:
print("Error: Descriptions not available",
"(can't access kernel module)")
sys.exit(1)
for line in raw_output:
if not line.startswith(target_prefix):
continue
line = line[len(target_prefix):].strip()
name, raw_desc = line.split(':', 1)
desc = raw_desc.rsplit('(', 1)[0]
if desc == '':
desc = '(No description found)'
descs[name.strip()] = desc.strip()
return descs
def handle_unraisableException(exc_type, exc_value=None, exc_traceback=None,
err_msg=None, object=None):
handle_Exception(exc_type, object, exc_traceback)
def handle_Exception(ex_cls, ex, tb):
if ex_cls is KeyboardInterrupt:
sys.exit()
if ex_cls is BrokenPipeError:
# It turns out that while sys.exit() triggers an exception
# not handled message on Python 3.8+, os._exit() does not.
os._exit(0)
if ex_cls is OSError:
if ex.errno == errno.ENOTCONN:
sys.exit()
raise ex
if hasattr(sys,'unraisablehook'): # Python 3.8+
sys.unraisablehook = handle_unraisableException
sys.excepthook = handle_Exception
def cleanup_line(single_line):
"""Format a raw line of data from /proc and isolate the name value
part, returning a tuple with each. Currently, this gets rid of the
@@ -238,139 +407,48 @@ def format_raw_line(name, value):
if ARGS.alt:
result = '{0}{1}={2}'.format(INDENT, name, value)
else:
spc = LINE_LENGTH-(len(INDENT)+len(value))
result = '{0}{1:<{spc}}{2}'.format(INDENT, name, value, spc=spc)
# Right-align the value within the line length if it fits,
# otherwise just separate it from the name by a single space.
fit = LINE_LENGTH - len(INDENT) - len(name)
overflow = len(value) + 1
w = max(fit, overflow)
result = '{0}{1}{2:>{w}}'.format(INDENT, name, value, w=w)
return result
def get_kstats():
"""Collect information on the ZFS subsystem from the /proc Linux virtual
file system. The step does not perform any further processing, giving us
the option to only work on what is actually needed. The name "kstat" is a
holdover from the Solaris utility of the same name.
"""Collect information on the ZFS subsystem. The step does not perform any
further processing, giving us the option to only work on what is actually
needed. The name "kstat" is a holdover from the Solaris utility of the same
name.
"""
result = {}
secs = SECTION_PATHS.values()
for section in secs:
with open(PROC_PATH+section, 'r') as proc_location:
lines = [line for line in proc_location]
del lines[0:2] # Get rid of header
result[section] = lines
for section in SECTION_PATHS.values():
if section not in result:
result[section] = load_kstats(section)
return result
def get_spl_tunables(PATH):
"""Collect information on the Solaris Porting Layer (SPL) or the
tunables, depending on the PATH given. Does not check if PATH is
legal.
"""
result = {}
parameters = os.listdir(PATH)
for name in parameters:
with open(PATH+name, 'r') as para_file:
value = para_file.read()
result[name] = value.strip()
return result
def get_descriptions(request):
"""Get the decriptions of the Solaris Porting Layer (SPL) or the
tunables, return with minimal formatting.
"""
if request not in ('spl', 'zfs'):
print('ERROR: description of "{0}" requested)'.format(request))
sys.exit(1)
descs = {}
target_prefix = 'parm:'
# We would prefer to do this with /sys/modules -- see the discussion at
# get_version() -- but there isn't a way to get the descriptions from
# there, so we fall back on modinfo
command = ["/sbin/modinfo", request, "-0"]
# The recommended way to do this is with subprocess.run(). However,
# some installed versions of Python are < 3.5, so we offer them
# the option of doing it the old way (for now)
info = ''
try:
if 'run' in dir(subprocess):
info = subprocess.run(command, stdout=subprocess.PIPE,
universal_newlines=True)
raw_output = info.stdout.split('\0')
else:
info = subprocess.check_output(command, universal_newlines=True)
raw_output = info.split('\0')
except subprocess.CalledProcessError:
print("Error: Descriptions not available (can't access kernel module)")
sys.exit(1)
for line in raw_output:
if not line.startswith(target_prefix):
continue
line = line[len(target_prefix):].strip()
name, raw_desc = line.split(':', 1)
desc = raw_desc.rsplit('(', 1)[0]
if desc == '':
desc = '(No description found)'
descs[name.strip()] = desc.strip()
return descs
def get_version(request):
"""Get the version number of ZFS or SPL on this machine for header.
Returns an error string, but does not raise an error, if we can't
get the ZFS/SPL version via modinfo.
get the ZFS/SPL version.
"""
if request not in ('spl', 'zfs'):
error_msg = '(ERROR: "{0}" requested)'.format(request)
return error_msg
# The original arc_summary called /sbin/modinfo/{spl,zfs} to get
# the version information. We switch to /sys/module/{spl,zfs}/version
# to make sure we get what is really loaded in the kernel
command = ["cat", "/sys/module/{0}/version".format(request)]
req = request.upper()
version = "(Can't get {0} version)".format(req)
# The recommended way to do this is with subprocess.run(). However,
# some installed versions of Python are < 3.5, so we offer them
# the option of doing it the old way (for now)
info = ''
if 'run' in dir(subprocess):
info = subprocess.run(command, stdout=subprocess.PIPE,
universal_newlines=True)
version = info.stdout.strip()
else:
info = subprocess.check_output(command, universal_newlines=True)
version = info.strip()
return version
return get_version_impl(request)
def print_header():
"""Print the initial heading with date and time as well as info on the
Linux and ZFS versions. This is not called for the graph.
kernel and ZFS versions. This is not called for the graph.
"""
# datetime is now recommended over time but we keep the exact formatting
@@ -534,6 +612,20 @@ def section_arc(kstats_dict):
prt_i1('Deleted:', f_hits(arc_stats['deleted']))
prt_i1('Mutex misses:', f_hits(arc_stats['mutex_miss']))
prt_i1('Eviction skips:', f_hits(arc_stats['evict_skip']))
prt_i1('Eviction skips due to L2 writes:',
f_hits(arc_stats['evict_l2_skip']))
prt_i1('L2 cached evictions:', f_bytes(arc_stats['evict_l2_cached']))
prt_i1('L2 eligible evictions:', f_bytes(arc_stats['evict_l2_eligible']))
prt_i2('L2 eligible MFU evictions:',
f_perc(arc_stats['evict_l2_eligible_mfu'],
arc_stats['evict_l2_eligible']),
f_bytes(arc_stats['evict_l2_eligible_mfu']))
prt_i2('L2 eligible MRU evictions:',
f_perc(arc_stats['evict_l2_eligible_mru'],
arc_stats['evict_l2_eligible']),
f_bytes(arc_stats['evict_l2_eligible_mru']))
prt_i1('L2 ineligible evictions:',
f_bytes(arc_stats['evict_l2_ineligible']))
print()
@@ -672,15 +764,30 @@ def section_l2arc(kstats_dict):
prt_i2('Header size:',
f_perc(arc_stats['l2_hdr_size'], arc_stats['l2_size']),
f_bytes(arc_stats['l2_hdr_size']))
prt_i2('MFU allocated size:',
f_perc(arc_stats['l2_mfu_asize'], arc_stats['l2_asize']),
f_bytes(arc_stats['l2_mfu_asize']))
prt_i2('MRU allocated size:',
f_perc(arc_stats['l2_mru_asize'], arc_stats['l2_asize']),
f_bytes(arc_stats['l2_mru_asize']))
prt_i2('Prefetch allocated size:',
f_perc(arc_stats['l2_prefetch_asize'], arc_stats['l2_asize']),
f_bytes(arc_stats['l2_prefetch_asize']))
prt_i2('Data (buffer content) allocated size:',
f_perc(arc_stats['l2_bufc_data_asize'], arc_stats['l2_asize']),
f_bytes(arc_stats['l2_bufc_data_asize']))
prt_i2('Metadata (buffer content) allocated size:',
f_perc(arc_stats['l2_bufc_metadata_asize'], arc_stats['l2_asize']),
f_bytes(arc_stats['l2_bufc_metadata_asize']))
print()
prt_1('L2ARC breakdown:', f_hits(l2_access_total))
prt_i2('Hit ratio:',
f_perc(arc_stats['l2_hits'], l2_access_total),
f_bytes(arc_stats['l2_hits']))
f_hits(arc_stats['l2_hits']))
prt_i2('Miss ratio:',
f_perc(arc_stats['l2_misses'], l2_access_total),
f_bytes(arc_stats['l2_misses']))
f_hits(arc_stats['l2_misses']))
prt_i1('Feeds:', f_hits(arc_stats['l2_feeds']))
print()
@@ -691,13 +798,13 @@ def section_l2arc(kstats_dict):
prt_i2('Done ratio:',
f_perc(arc_stats['l2_writes_done'],
arc_stats['l2_writes_sent']),
f_bytes(arc_stats['l2_writes_done']))
f_hits(arc_stats['l2_writes_done']))
prt_i2('Error ratio:',
f_perc(arc_stats['l2_writes_error'],
arc_stats['l2_writes_sent']),
f_bytes(arc_stats['l2_writes_error']))
f_hits(arc_stats['l2_writes_error']))
else:
prt_i2('Writes sent:', '100 %', f_bytes(arc_stats['l2_writes_sent']))
prt_i2('Writes sent:', '100 %', f_hits(arc_stats['l2_writes_sent']))
print()
print('L2ARC evicts:')
@@ -708,10 +815,14 @@ def section_l2arc(kstats_dict):
def section_spl(*_):
"""Print the SPL parameters, if requested with alternative format
and/or decriptions. This does not use kstats.
and/or descriptions. This does not use kstats.
"""
spls = get_spl_tunables(SPL_PATH)
if sys.platform.startswith('freebsd'):
# No SPL support in FreeBSD
return
spls = get_spl_params()
keylist = sorted(spls.keys())
print('Solaris Porting Layer (SPL):')
@@ -725,7 +836,7 @@ def section_spl(*_):
try:
print(INDENT+'#', descriptions[key])
except KeyError:
print(INDENT+'# (No decription found)') # paranoid
print(INDENT+'# (No description found)') # paranoid
print(format_raw_line(key, value))
@@ -734,10 +845,10 @@ def section_spl(*_):
def section_tunables(*_):
"""Print the tunables, if requested with alternative format and/or
decriptions. This does not use kstasts.
descriptions. This does not use kstasts.
"""
tunables = get_spl_tunables(TUNABLES_PATH)
tunables = get_tunable_params()
keylist = sorted(tunables.keys())
print('Tunables:')
@@ -751,7 +862,7 @@ def section_tunables(*_):
try:
print(INDENT+'#', descriptions[key])
except KeyError:
print(INDENT+'# (No decription found)') # paranoid
print(INDENT+'# (No description found)') # paranoid
print(format_raw_line(key, value))
@@ -763,11 +874,11 @@ def section_vdev(kstats_dict):
# Currently [Nov 2017] the VDEV cache is disabled, because it is actually
# harmful. When this is the case, we just skip the whole entry. See
# https://github.com/zfsonlinux/zfs/blob/master/module/zfs/vdev_cache.c
# https://github.com/openzfs/zfs/blob/master/module/zfs/vdev_cache.c
# for details
tunables = get_spl_tunables(TUNABLES_PATH)
tunables = get_vdev_params()
if tunables['zfs_vdev_cache_size'] == '0':
if tunables[VDEV_CACHE_SIZE] == '0':
print('VDEV cache disabled, skipping section\n')
return
@@ -789,7 +900,7 @@ def section_vdev(kstats_dict):
def section_zil(kstats_dict):
"""Collect information on the ZFS Intent Log. Some of the information
taken from https://github.com/zfsonlinux/zfs/blob/master/include/sys/zil.h
taken from https://github.com/openzfs/zfs/blob/master/include/sys/zil.h
"""
zil_stats = isolate_section('zil', kstats_dict)
+1
View File
@@ -0,0 +1 @@
arcstat
+4 -12
View File
@@ -1,13 +1,5 @@
dist_bin_SCRIPTS = arcstat
include $(top_srcdir)/config/Substfiles.am
#
# The arcstat script is compatibile with both Python 2.6 and 3.4.
# As such the python 3 shebang can be replaced at install time when
# targeting a python 2 system. This allows us to maintain a single
# version of the source.
#
if USING_PYTHON_2
install-exec-hook:
sed --in-place 's|^#!/usr/bin/python3|#!/usr/bin/python2|' \
$(DESTDIR)$(bindir)/arcstat
endif
bin_SCRIPTS = arcstat
SUBSTFILES += $(bin_SCRIPTS)
+148 -64
View File
@@ -1,20 +1,25 @@
#!/usr/bin/python3
#!/usr/bin/env @PYTHON_SHEBANG@
#
# Print out ZFS ARC Statistics exported via kstat(1)
# For a definition of fields, or usage, use arctstat.pl -v
# For a definition of fields, or usage, use arcstat -v
#
# This script is a fork of the original arcstat.pl (0.1) by
# Neelakanth Nadgir, originally published on his Sun blog on
# This script was originally a fork of the original arcstat.pl (0.1)
# by Neelakanth Nadgir, originally published on his Sun blog on
# 09/18/2007
# http://blogs.sun.com/realneel/entry/zfs_arc_statistics
#
# This version aims to improve upon the original by adding features
# and fixing bugs as needed. This version is maintained by
# Mike Harsch and is hosted in a public open source repository:
# A new version aimed to improve upon the original by adding features
# and fixing bugs as needed. This version was maintained by Mike
# Harsch and was hosted in a public open source repository:
# http://github.com/mharsch/arcstat
#
# Comments, Questions, or Suggestions are always welcome.
# Contact the maintainer at ( mike at harschsystems dot com )
# but has since moved to the illumos-gate repository.
#
# This Python port was written by John Hixson for FreeNAS, introduced
# in commit e2c29f:
# https://github.com/freenas/freenas
#
# and has been improved by many people since.
#
# CDDL HEADER START
#
@@ -51,16 +56,16 @@ import getopt
import re
import copy
from decimal import Decimal
from signal import signal, SIGINT, SIGWINCH, SIG_DFL
cols = {
# HDR: [Size, Scale, Description]
"time": [8, -1, "Time"],
"hits": [4, 1000, "ARC reads per second"],
"miss": [4, 1000, "ARC misses per second"],
"read": [4, 1000, "Total ARC accesses per second"],
"hit%": [4, 100, "ARC Hit percentage"],
"hit%": [4, 100, "ARC hit percentage"],
"miss%": [5, 100, "ARC miss percentage"],
"dhit": [4, 1000, "Demand hits per second"],
"dmis": [4, 1000, "Demand misses per second"],
@@ -75,13 +80,20 @@ cols = {
"mread": [5, 1000, "Metadata accesses per second"],
"mh%": [3, 100, "Metadata hit percentage"],
"mm%": [3, 100, "Metadata miss percentage"],
"arcsz": [5, 1024, "ARC Size"],
"c": [4, 1024, "ARC Target Size"],
"mfu": [4, 1000, "MFU List hits per second"],
"mru": [4, 1000, "MRU List hits per second"],
"mfug": [4, 1000, "MFU Ghost List hits per second"],
"mrug": [4, 1000, "MRU Ghost List hits per second"],
"arcsz": [5, 1024, "ARC size"],
"size": [4, 1024, "ARC size"],
"c": [4, 1024, "ARC target size"],
"mfu": [4, 1000, "MFU list hits per second"],
"mru": [4, 1000, "MRU list hits per second"],
"mfug": [4, 1000, "MFU ghost list hits per second"],
"mrug": [4, 1000, "MRU ghost list hits per second"],
"eskip": [5, 1000, "evict_skip per second"],
"el2skip": [7, 1000, "evict skip, due to l2 writes, per second"],
"el2cach": [7, 1024, "Size of L2 cached evictions per second"],
"el2el": [5, 1024, "Size of L2 eligible evictions per second"],
"el2mfu": [6, 1024, "Size of L2 eligible MFU evictions per second"],
"el2mru": [6, 1024, "Size of L2 eligible MRU evictions per second"],
"el2inel": [7, 1024, "Size of L2 ineligible evictions per second"],
"mtxmis": [6, 1000, "mutex_miss per second"],
"dread": [5, 1000, "Demand accesses per second"],
"pread": [5, 1000, "Prefetch accesses per second"],
@@ -90,17 +102,29 @@ cols = {
"l2read": [6, 1000, "Total L2ARC accesses per second"],
"l2hit%": [6, 100, "L2ARC access hit percentage"],
"l2miss%": [7, 100, "L2ARC access miss percentage"],
"l2pref": [6, 1024, "L2ARC prefetch allocated size"],
"l2mfu": [5, 1024, "L2ARC MFU allocated size"],
"l2mru": [5, 1024, "L2ARC MRU allocated size"],
"l2data": [6, 1024, "L2ARC data allocated size"],
"l2meta": [6, 1024, "L2ARC metadata allocated size"],
"l2pref%": [7, 100, "L2ARC prefetch percentage"],
"l2mfu%": [6, 100, "L2ARC MFU percentage"],
"l2mru%": [6, 100, "L2ARC MRU percentage"],
"l2data%": [7, 100, "L2ARC data percentage"],
"l2meta%": [7, 100, "L2ARC metadata percentage"],
"l2asize": [7, 1024, "Actual (compressed) size of the L2ARC"],
"l2size": [6, 1024, "Size of the L2ARC"],
"l2bytes": [7, 1024, "bytes read per second from the L2ARC"],
"grow": [4, 1000, "ARC Grow disabled"],
"need": [4, 1024, "ARC Reclaim need"],
"free": [4, 1024, "ARC Free memory"],
"l2bytes": [7, 1024, "Bytes read per second from the L2ARC"],
"grow": [4, 1000, "ARC grow disabled"],
"need": [4, 1024, "ARC reclaim need"],
"free": [4, 1024, "ARC free memory"],
"avail": [5, 1024, "ARC available memory"],
"waste": [5, 1024, "Wasted memory due to round up to pagesize"],
}
v = {}
hdr = ["time", "read", "miss", "miss%", "dmis", "dm%", "pmis", "pm%", "mmis",
"mm%", "arcsz", "c"]
"mm%", "size", "c", "avail"]
xhdr = ["time", "mfu", "mru", "mfug", "mrug", "eskip", "mtxmis", "dread",
"pread", "read"]
sint = 1 # Default interval is 1 second
@@ -110,12 +134,56 @@ opfile = None
sep = " " # Default separator is 2 spaces
version = "0.4"
l2exist = False
cmd = ("Usage: arcstat [-hvx] [-f fields] [-o file] [-s string] [interval "
cmd = ("Usage: arcstat [-havxp] [-f fields] [-o file] [-s string] [interval "
"[count]]\n")
cur = {}
d = {}
out = None
kstat = None
pretty_print = True
if sys.platform.startswith('freebsd'):
# Requires py-sysctl on FreeBSD
import sysctl
def kstat_update():
global kstat
k = [ctl for ctl in sysctl.filter('kstat.zfs.misc.arcstats')
if ctl.type != sysctl.CTLTYPE_NODE]
if not k:
sys.exit(1)
kstat = {}
for s in k:
if not s:
continue
name, value = s.name, s.value
# Trims 'kstat.zfs.misc.arcstats' from the name
kstat[name[24:]] = int(value)
elif sys.platform.startswith('linux'):
def kstat_update():
global kstat
k = [line.strip() for line in open('/proc/spl/kstat/zfs/arcstats')]
if not k:
sys.exit(1)
del k[0:2]
kstat = {}
for s in k:
if not s:
continue
name, unused, value = s.split()
kstat[name] = int(value)
def detailed_usage():
@@ -131,6 +199,7 @@ def detailed_usage():
def usage():
sys.stderr.write("%s\n" % cmd)
sys.stderr.write("\t -h : Print this help message\n")
sys.stderr.write("\t -a : Print all possible stats\n")
sys.stderr.write("\t -v : List all possible field headers and definitions"
"\n")
sys.stderr.write("\t -x : Print extended stats\n")
@@ -138,6 +207,7 @@ def usage():
sys.stderr.write("\t -o : Redirect output to the specified file\n")
sys.stderr.write("\t -s : Override default field separator with custom "
"character or string\n")
sys.stderr.write("\t -p : Disable auto-scaling of numerical fields\n")
sys.stderr.write("\nExamples:\n")
sys.stderr.write("\tarcstat -o /tmp/a.log 2 10\n")
sys.stderr.write("\tarcstat -s \",\" -o /tmp/a.log 2 10\n")
@@ -148,25 +218,6 @@ def usage():
sys.exit(1)
def kstat_update():
global kstat
k = [line.strip() for line in open('/proc/spl/kstat/zfs/arcstats')]
if not k:
sys.exit(1)
del k[0:2]
kstat = {}
for s in k:
if not s:
continue
name, unused, value = s.split()
kstat[name] = Decimal(value)
def snap_stats():
global cur
global kstat
@@ -197,7 +248,7 @@ def prettynum(sz, scale, num=0):
elif 0 < num < 1:
num = 0
while num > scale and index < 5:
while abs(num) > scale and index < 5:
save = num
num = num / scale
index += 1
@@ -205,7 +256,7 @@ def prettynum(sz, scale, num=0):
if index == 0:
return "%*d" % (sz, num)
if (save / scale) < 10:
if abs(save / scale) < 10:
return "%*.1f%s" % (sz - 1, num, suffix[index])
else:
return "%*d%s" % (sz - 1, num, suffix[index])
@@ -215,12 +266,14 @@ def print_values():
global hdr
global sep
global v
global pretty_print
for col in hdr:
sys.stdout.write("%s%s" % (
prettynum(cols[col][0], cols[col][1], v[col]),
sep
))
if pretty_print:
fmt = lambda col: prettynum(cols[col][0], cols[col][1], v[col])
else:
fmt = lambda col: v[col]
sys.stdout.write(sep.join(fmt(col) for col in hdr))
sys.stdout.write("\n")
sys.stdout.flush()
@@ -228,9 +281,14 @@ def print_values():
def print_header():
global hdr
global sep
global pretty_print
for col in hdr:
sys.stdout.write("%*s%s" % (cols[col][0], col, sep))
if pretty_print:
fmt = lambda col: "%*s" % (cols[col][0], col)
else:
fmt = lambda col: col
sys.stdout.write(sep.join(fmt(col) for col in hdr))
sys.stdout.write("\n")
@@ -267,8 +325,10 @@ def init():
global sep
global out
global l2exist
global pretty_print
desired_cols = None
aflag = False
xflag = False
hflag = False
vflag = False
@@ -277,14 +337,16 @@ def init():
try:
opts, args = getopt.getopt(
sys.argv[1:],
"xo:hvs:f:",
"axo:hvs:f:p",
[
"all",
"extended",
"outfile",
"help",
"verbose",
"separator",
"columns"
"columns",
"parsable"
]
)
except getopt.error as msg:
@@ -293,6 +355,8 @@ def init():
opts = None
for opt, arg in opts:
if opt in ('-a', '--all'):
aflag = True
if opt in ('-x', '--extended'):
xflag = True
if opt in ('-o', '--outfile'):
@@ -308,19 +372,13 @@ def init():
if opt in ('-f', '--columns'):
desired_cols = arg
i += 1
if opt in ('-p', '--parsable'):
pretty_print = False
i += 1
argv = sys.argv[i:]
sint = Decimal(argv[0]) if argv else sint
count = int(argv[1]) if len(argv) > 1 else count
if len(argv) > 1:
sint = Decimal(argv[0])
count = int(argv[1])
elif len(argv) > 0:
sint = Decimal(argv[0])
count = 0
sint = int(argv[0]) if argv else sint
count = int(argv[1]) if len(argv) > 1 else (0 if len(argv) > 0 else 1)
if hflag or (xflag and desired_cols):
usage()
@@ -360,6 +418,12 @@ def init():
incompat)
usage()
if aflag:
if l2exist:
hdr = cols.keys()
else:
hdr = [col for col in cols.keys() if not col.startswith("l2")]
if opfile:
try:
out = open(opfile, "w")
@@ -408,12 +472,19 @@ def calculate():
v["mm%"] = 100 - v["mh%"] if v["mread"] > 0 else 0
v["arcsz"] = cur["size"]
v["size"] = cur["size"]
v["c"] = cur["c"]
v["mfu"] = d["mfu_hits"] / sint
v["mru"] = d["mru_hits"] / sint
v["mrug"] = d["mru_ghost_hits"] / sint
v["mfug"] = d["mfu_ghost_hits"] / sint
v["eskip"] = d["evict_skip"] / sint
v["el2skip"] = d["evict_l2_skip"] / sint
v["el2cach"] = d["evict_l2_cached"] / sint
v["el2el"] = d["evict_l2_eligible"] / sint
v["el2mfu"] = d["evict_l2_eligible_mfu"] / sint
v["el2mru"] = d["evict_l2_eligible_mru"] / sint
v["el2inel"] = d["evict_l2_ineligible"] / sint
v["mtxmis"] = d["mutex_miss"] / sint
if l2exist:
@@ -427,9 +498,22 @@ def calculate():
v["l2size"] = cur["l2_size"]
v["l2bytes"] = d["l2_read_bytes"] / sint
v["l2pref"] = cur["l2_prefetch_asize"]
v["l2mfu"] = cur["l2_mfu_asize"]
v["l2mru"] = cur["l2_mru_asize"]
v["l2data"] = cur["l2_bufc_data_asize"]
v["l2meta"] = cur["l2_bufc_metadata_asize"]
v["l2pref%"] = 100 * v["l2pref"] / v["l2asize"]
v["l2mfu%"] = 100 * v["l2mfu"] / v["l2asize"]
v["l2mru%"] = 100 * v["l2mru"] / v["l2asize"]
v["l2data%"] = 100 * v["l2data"] / v["l2asize"]
v["l2meta%"] = 100 * v["l2meta"] / v["l2asize"]
v["grow"] = 0 if cur["arc_no_grow"] else 1
v["need"] = cur["arc_need_free"]
v["free"] = cur["arc_sys_free"]
v["free"] = cur["memory_free_bytes"]
v["avail"] = cur["memory_available_bytes"]
v["waste"] = cur["abd_chunk_waste_size"]
def main():
+1
View File
@@ -0,0 +1 @@
dbufstat
+4 -12
View File
@@ -1,13 +1,5 @@
dist_bin_SCRIPTS = dbufstat
include $(top_srcdir)/config/Substfiles.am
#
# The dbufstat script is compatibile with both Python 2.6 and 3.4.
# As such the python 3 shebang can be replaced at install time when
# targeting a python 2 system. This allows us to maintain a single
# version of the source.
#
if USING_PYTHON_2
install-exec-hook:
sed --in-place 's|^#!/usr/bin/python3|#!/usr/bin/python2|' \
$(DESTDIR)$(bindir)/dbufstat
endif
bin_SCRIPTS = dbufstat
SUBSTFILES += $(bin_SCRIPTS)
@@ -1,4 +1,4 @@
#!/usr/bin/python3
#!/usr/bin/env @PYTHON_SHEBANG@
#
# Print out statistics for all cached dmu buffers. This information
# is available through the dbufs kstat and may be post-processed as
@@ -113,10 +113,25 @@ cmd = ("Usage: dbufstat [-bdhnrtvx] [-i file] [-f fields] [-o file] "
raw = 0
if sys.platform.startswith("freebsd"):
import io
# Requires py-sysctl on FreeBSD
import sysctl
def default_ifile():
dbufs = sysctl.filter("kstat.zfs.misc.dbufs")[0].value
sys.stdin = io.StringIO(dbufs)
return "-"
elif sys.platform.startswith("linux"):
def default_ifile():
return "/proc/spl/kstat/zfs/dbufs"
def print_incompat_helper(incompat):
cnt = 0
for key in sorted(incompat):
if cnt is 0:
if cnt == 0:
sys.stderr.write("\t")
elif cnt > 8:
sys.stderr.write(",\n\t")
@@ -343,7 +358,7 @@ def get_compstring(c):
"ZIO_COMPRESS_GZIP_6", "ZIO_COMPRESS_GZIP_7",
"ZIO_COMPRESS_GZIP_8", "ZIO_COMPRESS_GZIP_9",
"ZIO_COMPRESS_ZLE", "ZIO_COMPRESS_LZ4",
"ZIO_COMPRESS_FUNCTION"]
"ZIO_COMPRESS_ZSTD", "ZIO_COMPRESS_FUNCTION"]
# If "-rr" option is used, don't convert to string representation
if raw > 1:
@@ -645,9 +660,9 @@ def main():
sys.exit(1)
if not ifile:
ifile = '/proc/spl/kstat/zfs/dbufs'
ifile = default_ifile()
if ifile is not "-":
if ifile != "-":
try:
tmp = open(ifile, "r")
sys.stdin = tmp
+1
View File
@@ -0,0 +1 @@
/fsck.zfs
+5
View File
@@ -1 +1,6 @@
include $(top_srcdir)/config/Substfiles.am
include $(top_srcdir)/config/Shellcheck.am
dist_sbin_SCRIPTS = fsck.zfs
SUBSTFILES += $(dist_sbin_SCRIPTS)
-9
View File
@@ -1,9 +0,0 @@
#!/bin/sh
#
# fsck.zfs: A fsck helper to accomidate distributions that expect
# to be able to execute a fsck on all filesystem types. Currently
# this script does nothing but it could be extended to act as a
# compatibility wrapper for 'zpool scrub'.
#
exit 0
+44
View File
@@ -0,0 +1,44 @@
#!/bin/sh
#
# fsck.zfs: A fsck helper to accommodate distributions that expect
# to be able to execute a fsck on all filesystem types.
#
# This script simply bubbles up some already-known-about errors,
# see fsck.zfs(8)
#
if [ "$#" = "0" ]; then
echo "Usage: $0 [options] dataset…" >&2
exit 16
fi
ret=0
for dataset in "$@"; do
case "$dataset" in
-*)
continue
;;
*)
;;
esac
pool="${dataset%%/*}"
case "$(@sbindir@/zpool list -Ho health "$pool")" in
DEGRADED)
ret=$(( ret | 4 ))
;;
FAULTED)
awk '!/^([[:space:]]*#.*)?$/ && $1 == "'"$dataset"'" && $3 == "zfs" {exit 1}' /etc/fstab || \
ret=$(( ret | 8 ))
;;
"")
# Pool not found, error printed by zpool(8)
ret=$(( ret | 8 ))
;;
*)
;;
esac
done
exit "$ret"
+7 -6
View File
@@ -1,9 +1,5 @@
include $(top_srcdir)/config/Rules.am
DEFAULT_INCLUDES += \
-I$(top_srcdir)/include \
-I$(top_srcdir)/lib/libspl/include
#
# Ignore the prefix for the mount helper. It must be installed in /sbin/
# because this path is hardcoded in the mount(8) for security reasons.
@@ -17,5 +13,10 @@ mount_zfs_SOURCES = \
mount_zfs.c
mount_zfs_LDADD = \
$(top_builddir)/lib/libnvpair/libnvpair.la \
$(top_builddir)/lib/libzfs/libzfs.la
$(abs_top_builddir)/lib/libzfs/libzfs.la \
$(abs_top_builddir)/lib/libzfs_core/libzfs_core.la \
$(abs_top_builddir)/lib/libnvpair/libnvpair.la
mount_zfs_LDADD += $(LTLIBINTL)
include $(top_srcdir)/config/CppCheck.am
+47 -299
View File
@@ -42,247 +42,46 @@
libzfs_handle_t *g_zfs;
typedef struct option_map {
const char *name;
unsigned long mntmask;
unsigned long zfsmask;
} option_map_t;
static const option_map_t option_map[] = {
/* Canonicalized filesystem independent options from mount(8) */
{ MNTOPT_NOAUTO, MS_COMMENT, ZS_COMMENT },
{ MNTOPT_DEFAULTS, MS_COMMENT, ZS_COMMENT },
{ MNTOPT_NODEVICES, MS_NODEV, ZS_COMMENT },
{ MNTOPT_DIRSYNC, MS_DIRSYNC, ZS_COMMENT },
{ MNTOPT_NOEXEC, MS_NOEXEC, ZS_COMMENT },
{ MNTOPT_GROUP, MS_GROUP, ZS_COMMENT },
{ MNTOPT_NETDEV, MS_COMMENT, ZS_COMMENT },
{ MNTOPT_NOFAIL, MS_COMMENT, ZS_COMMENT },
{ MNTOPT_NOSUID, MS_NOSUID, ZS_COMMENT },
{ MNTOPT_OWNER, MS_OWNER, ZS_COMMENT },
{ MNTOPT_REMOUNT, MS_REMOUNT, ZS_COMMENT },
{ MNTOPT_RO, MS_RDONLY, ZS_COMMENT },
{ MNTOPT_RW, MS_COMMENT, ZS_COMMENT },
{ MNTOPT_SYNC, MS_SYNCHRONOUS, ZS_COMMENT },
{ MNTOPT_USER, MS_USERS, ZS_COMMENT },
{ MNTOPT_USERS, MS_USERS, ZS_COMMENT },
/* acl flags passed with util-linux-2.24 mount command */
{ MNTOPT_ACL, MS_POSIXACL, ZS_COMMENT },
{ MNTOPT_NOACL, MS_COMMENT, ZS_COMMENT },
{ MNTOPT_POSIXACL, MS_POSIXACL, ZS_COMMENT },
#ifdef MS_NOATIME
{ MNTOPT_NOATIME, MS_NOATIME, ZS_COMMENT },
#endif
#ifdef MS_NODIRATIME
{ MNTOPT_NODIRATIME, MS_NODIRATIME, ZS_COMMENT },
#endif
#ifdef MS_RELATIME
{ MNTOPT_RELATIME, MS_RELATIME, ZS_COMMENT },
#endif
#ifdef MS_STRICTATIME
{ MNTOPT_STRICTATIME, MS_STRICTATIME, ZS_COMMENT },
#endif
#ifdef MS_LAZYTIME
{ MNTOPT_LAZYTIME, MS_LAZYTIME, ZS_COMMENT },
#endif
{ MNTOPT_CONTEXT, MS_COMMENT, ZS_COMMENT },
{ MNTOPT_FSCONTEXT, MS_COMMENT, ZS_COMMENT },
{ MNTOPT_DEFCONTEXT, MS_COMMENT, ZS_COMMENT },
{ MNTOPT_ROOTCONTEXT, MS_COMMENT, ZS_COMMENT },
#ifdef MS_I_VERSION
{ MNTOPT_IVERSION, MS_I_VERSION, ZS_COMMENT },
#endif
#ifdef MS_MANDLOCK
{ MNTOPT_NBMAND, MS_MANDLOCK, ZS_COMMENT },
#endif
/* Valid options not found in mount(8) */
{ MNTOPT_BIND, MS_BIND, ZS_COMMENT },
#ifdef MS_REC
{ MNTOPT_RBIND, MS_BIND|MS_REC, ZS_COMMENT },
#endif
{ MNTOPT_COMMENT, MS_COMMENT, ZS_COMMENT },
#ifdef MS_NOSUB
{ MNTOPT_NOSUB, MS_NOSUB, ZS_COMMENT },
#endif
#ifdef MS_SILENT
{ MNTOPT_QUIET, MS_SILENT, ZS_COMMENT },
#endif
/* Custom zfs options */
{ MNTOPT_XATTR, MS_COMMENT, ZS_COMMENT },
{ MNTOPT_NOXATTR, MS_COMMENT, ZS_COMMENT },
{ MNTOPT_ZFSUTIL, MS_COMMENT, ZS_ZFSUTIL },
{ NULL, 0, 0 } };
/*
* Break the mount option in to a name/value pair. The name is
* validated against the option map and mount flags set accordingly.
* Opportunistically convert a target string into a pool name. If the
* string does not represent a block device with a valid zfs label
* then it is passed through without modification.
*/
static int
parse_option(char *mntopt, unsigned long *mntflags,
unsigned long *zfsflags, int sloppy)
static void
parse_dataset(const char *target, char **dataset)
{
const option_map_t *opt;
char *ptr, *name, *value = NULL;
int error = 0;
name = strdup(mntopt);
if (name == NULL)
return (ENOMEM);
for (ptr = name; ptr && *ptr; ptr++) {
if (*ptr == '=') {
*ptr = '\0';
value = ptr+1;
VERIFY3P(value, !=, NULL);
break;
}
}
for (opt = option_map; opt->name != NULL; opt++) {
if (strncmp(name, opt->name, strlen(name)) == 0) {
*mntflags |= opt->mntmask;
*zfsflags |= opt->zfsmask;
error = 0;
goto out;
}
}
if (!sloppy)
error = ENOENT;
out:
/* If required further process on the value may be done here */
free(name);
return (error);
}
/*
* Translate the mount option string in to MS_* mount flags for the
* kernel vfs. When sloppy is non-zero unknown options will be ignored
* otherwise they are considered fatal are copied in to badopt.
*/
static int
parse_options(char *mntopts, unsigned long *mntflags, unsigned long *zfsflags,
int sloppy, char *badopt, char *mtabopt)
{
int error = 0, quote = 0, flag = 0, count = 0;
char *ptr, *opt, *opts;
opts = strdup(mntopts);
if (opts == NULL)
return (ENOMEM);
*mntflags = 0;
opt = NULL;
/*
* Scan through all mount options which must be comma delimited.
* We must be careful to notice regions which are double quoted
* and skip commas in these regions. Each option is then checked
* to determine if it is a known option.
* Prior to util-linux 2.36.2, if a file or directory in the
* current working directory was named 'dataset' then mount(8)
* would prepend the current working directory to the dataset.
* Check for it and strip the prepended path when it is added.
*/
for (ptr = opts; ptr && !flag; ptr++) {
if (opt == NULL)
opt = ptr;
if (*ptr == '"')
quote = !quote;
if (quote)
continue;
if (*ptr == '\0')
flag = 1;
if ((*ptr == ',') || (*ptr == '\0')) {
*ptr = '\0';
error = parse_option(opt, mntflags, zfsflags, sloppy);
if (error) {
strcpy(badopt, opt);
goto out;
}
if (!(*mntflags & MS_REMOUNT) &&
!(*zfsflags & ZS_ZFSUTIL)) {
if (count > 0)
strlcat(mtabopt, ",", MNT_LINE_MAX);
strlcat(mtabopt, opt, MNT_LINE_MAX);
count++;
}
opt = NULL;
}
}
out:
free(opts);
return (error);
}
/*
* Return the pool/dataset to mount given the name passed to mount. This
* is expected to be of the form pool/dataset, however may also refer to
* a block device if that device contains a valid zfs label.
*/
static char *
parse_dataset(char *dataset)
{
char cwd[PATH_MAX];
struct stat64 statbuf;
int error;
int len;
/*
* We expect a pool/dataset to be provided, however if we're
* given a device which is a member of a zpool we attempt to
* extract the pool name stored in the label. Given the pool
* name we can mount the root dataset.
*/
error = stat64(dataset, &statbuf);
if (error == 0) {
nvlist_t *config;
char *name;
int fd;
fd = open(dataset, O_RDONLY);
if (fd < 0)
goto out;
error = zpool_read_label(fd, &config, NULL);
(void) close(fd);
if (error)
goto out;
error = nvlist_lookup_string(config,
ZPOOL_CONFIG_POOL_NAME, &name);
if (error) {
nvlist_free(config);
} else {
dataset = strdup(name);
nvlist_free(config);
return (dataset);
}
if (getcwd(cwd, PATH_MAX) == NULL) {
perror("getcwd");
return;
}
out:
/*
* If a file or directory in your current working directory is
* named 'dataset' then mount(8) will prepend your current working
* directory to the dataset. There is no way to prevent this
* behavior so we simply check for it and strip the prepended
* patch when it is added.
*/
if (getcwd(cwd, PATH_MAX) == NULL)
return (dataset);
int len = strlen(cwd);
if (strncmp(cwd, target, len) == 0)
target += len;
len = strlen(cwd);
/* Assume pool/dataset is more likely */
strlcpy(*dataset, target, PATH_MAX);
/* Do not add one when cwd already ends in a trailing '/' */
if (strncmp(cwd, dataset, len) == 0)
return (dataset + len + (cwd[len-1] != '/'));
int fd = open(target, O_RDONLY | O_CLOEXEC);
if (fd < 0)
return;
return (dataset);
nvlist_t *cfg = NULL;
if (zpool_read_label(fd, &cfg, NULL) == 0) {
char *nm = NULL;
if (!nvlist_lookup_string(cfg, ZPOOL_CONFIG_POOL_NAME, &nm))
strlcpy(*dataset, nm, PATH_MAX);
nvlist_free(cfg);
}
if (close(fd))
perror("close");
}
/*
@@ -326,8 +125,8 @@ mtab_update(char *dataset, char *mntpoint, char *type, char *mntopts)
if (!fp) {
(void) fprintf(stderr, gettext(
"filesystem '%s' was mounted, but /etc/mtab "
"could not be opened due to error %d\n"),
dataset, errno);
"could not be opened due to error: %s\n"),
dataset, strerror(errno));
return (MOUNT_FILEIO);
}
@@ -335,8 +134,8 @@ mtab_update(char *dataset, char *mntpoint, char *type, char *mntopts)
if (error) {
(void) fprintf(stderr, gettext(
"filesystem '%s' was mounted, but /etc/mtab "
"could not be updated due to error %d\n"),
dataset, errno);
"could not be updated due to error: %s\n"),
dataset, strerror(errno));
return (MOUNT_FILEIO);
}
@@ -345,34 +144,6 @@ mtab_update(char *dataset, char *mntpoint, char *type, char *mntopts)
return (MOUNT_SUCCESS);
}
static void
append_mntopt(const char *name, const char *val, char *mntopts,
char *mtabopt, boolean_t quote)
{
char tmp[MNT_LINE_MAX];
snprintf(tmp, MNT_LINE_MAX, quote ? ",%s=\"%s\"" : ",%s=%s", name, val);
if (mntopts)
strlcat(mntopts, tmp, MNT_LINE_MAX);
if (mtabopt)
strlcat(mtabopt, tmp, MNT_LINE_MAX);
}
static void
zfs_selinux_setcontext(zfs_handle_t *zhp, zfs_prop_t zpt, const char *name,
char *mntopts, char *mtabopt)
{
char context[ZFS_MAXPROPLEN];
if (zfs_prop_get(zhp, zpt, context, sizeof (context),
NULL, NULL, 0, B_FALSE) == 0) {
if (strcmp(context, "none") != 0)
append_mntopt(name, context, mntopts, mtabopt, B_TRUE);
}
}
int
main(int argc, char **argv)
{
@@ -383,12 +154,13 @@ main(int argc, char **argv)
char badopt[MNT_LINE_MAX] = { '\0' };
char mtabopt[MNT_LINE_MAX] = { '\0' };
char mntpoint[PATH_MAX];
char *dataset;
char dataset[PATH_MAX], *pdataset = dataset;
unsigned long mntflags = 0, zfsflags = 0, remount = 0;
int sloppy = 0, fake = 0, verbose = 0, nomtab = 0, zfsutil = 0;
int error, c;
(void) setlocale(LC_ALL, "");
(void) setlocale(LC_NUMERIC, "C");
(void) textdomain(TEXT_DOMAIN);
opterr = 0;
@@ -413,10 +185,11 @@ main(int argc, char **argv)
break;
case 'h':
case '?':
(void) fprintf(stderr, gettext("Invalid option '%c'\n"),
optopt);
if (optopt)
(void) fprintf(stderr,
gettext("Invalid option '%c'\n"), optopt);
(void) fprintf(stderr, gettext("Usage: mount.zfs "
"[-sfnv] [-o options] <dataset> <mountpoint>\n"));
"[-sfnvh] [-o options] <dataset> <mountpoint>\n"));
return (MOUNT_USAGE);
}
}
@@ -438,18 +211,18 @@ main(int argc, char **argv)
return (MOUNT_USAGE);
}
dataset = parse_dataset(argv[0]);
parse_dataset(argv[0], &pdataset);
/* canonicalize the mount point */
if (realpath(argv[1], mntpoint) == NULL) {
(void) fprintf(stderr, gettext("filesystem '%s' cannot be "
"mounted at '%s' due to canonicalization error %d.\n"),
dataset, argv[1], errno);
"mounted at '%s' due to canonicalization error: %s\n"),
dataset, argv[1], strerror(errno));
return (MOUNT_SYSERR);
}
/* validate mount options and set mntflags */
error = parse_options(mntopts, &mntflags, &zfsflags, sloppy,
error = zfs_parse_mount_options(mntopts, &mntflags, &zfsflags, sloppy,
badopt, mtabopt);
if (error) {
switch (error) {
@@ -489,7 +262,7 @@ main(int argc, char **argv)
zfsutil = 1;
if ((g_zfs = libzfs_init()) == NULL) {
(void) fprintf(stderr, "%s", libzfs_error_init(errno));
(void) fprintf(stderr, "%s\n", libzfs_error_init(errno));
return (MOUNT_SYSERR);
}
@@ -502,32 +275,7 @@ main(int argc, char **argv)
return (MOUNT_USAGE);
}
/*
* Checks to see if the ZFS_PROP_SELINUX_CONTEXT exists
* if it does, create a tmp variable in case it's needed
* checks to see if the selinux context is set to the default
* if it is, allow the setting of the other context properties
* this is needed because the 'context' property overrides others
* if it is not the default, set the 'context' property
*/
if (zfs_prop_get(zhp, ZFS_PROP_SELINUX_CONTEXT, prop, sizeof (prop),
NULL, NULL, 0, B_FALSE) == 0) {
if (strcmp(prop, "none") == 0) {
zfs_selinux_setcontext(zhp, ZFS_PROP_SELINUX_FSCONTEXT,
MNTOPT_FSCONTEXT, mntopts, mtabopt);
zfs_selinux_setcontext(zhp, ZFS_PROP_SELINUX_DEFCONTEXT,
MNTOPT_DEFCONTEXT, mntopts, mtabopt);
zfs_selinux_setcontext(zhp,
ZFS_PROP_SELINUX_ROOTCONTEXT, MNTOPT_ROOTCONTEXT,
mntopts, mtabopt);
} else {
append_mntopt(MNTOPT_CONTEXT, prop,
mntopts, mtabopt, B_TRUE);
}
}
/* A hint used to determine an auto-mounted snapshot mount point */
append_mntopt(MNTOPT_MNTPOINT, mntpoint, mntopts, NULL, B_FALSE);
zfs_adjust_mount_options(zhp, mntpoint, mntopts, mtabopt);
/* treat all snapshots as legacy mount points */
if (zfs_get_type(zhp) == ZFS_TYPE_SNAPSHOT)
@@ -620,7 +368,7 @@ main(int argc, char **argv)
"mount the filesystem again.\n"), dataset);
return (MOUNT_SYSERR);
}
/* fallthru */
fallthrough;
#endif
default:
(void) fprintf(stderr, gettext("filesystem "
+6 -7
View File
@@ -4,11 +4,7 @@ include $(top_srcdir)/config/Rules.am
AM_CFLAGS += $(FRAME_LARGER_THAN)
# Unconditionally enable ASSERTs
AM_CPPFLAGS += -DDEBUG -UNDEBUG
DEFAULT_INCLUDES += \
-I$(top_srcdir)/include \
-I$(top_srcdir)/lib/libspl/include
AM_CPPFLAGS += -DDEBUG -UNDEBUG -DZFS_DEBUG
bin_PROGRAMS = raidz_test
@@ -18,6 +14,9 @@ raidz_test_SOURCES = \
raidz_bench.c
raidz_test_LDADD = \
$(top_builddir)/lib/libzpool/libzpool.la
$(abs_top_builddir)/lib/libzpool/libzpool.la \
$(abs_top_builddir)/lib/libzfs_core/libzfs_core.la
raidz_test_LDADD += -lm -ldl
raidz_test_LDADD += -lm
include $(top_srcdir)/config/CppCheck.am
+23 -8
View File
@@ -31,8 +31,6 @@
#include <sys/vdev_raidz_impl.h>
#include <stdio.h>
#include <sys/time.h>
#include "raidz_test.h"
#define GEN_BENCH_MEMORY (((uint64_t)1ULL)<<32)
@@ -83,8 +81,17 @@ run_gen_bench_impl(const char *impl)
/* create suitable raidz_map */
ncols = rto_opts.rto_dcols + fn + 1;
zio_bench.io_size = 1ULL << ds;
rm_bench = vdev_raidz_map_alloc(&zio_bench,
BENCH_ASHIFT, ncols, fn+1);
if (rto_opts.rto_expand) {
rm_bench = vdev_raidz_map_alloc_expanded(
zio_bench.io_abd,
zio_bench.io_size, zio_bench.io_offset,
rto_opts.rto_ashift, ncols+1, ncols,
fn+1, rto_opts.rto_expand_offset);
} else {
rm_bench = vdev_raidz_map_alloc(&zio_bench,
BENCH_ASHIFT, ncols, fn+1);
}
/* estimate iteration count */
iter_cnt = GEN_BENCH_MEMORY;
@@ -113,7 +120,7 @@ run_gen_bench_impl(const char *impl)
}
}
void
static void
run_gen_bench(void)
{
char **impl_name;
@@ -163,8 +170,16 @@ run_rec_bench_impl(const char *impl)
(1ULL << BENCH_ASHIFT))
continue;
rm_bench = vdev_raidz_map_alloc(&zio_bench,
BENCH_ASHIFT, ncols, PARITY_PQR);
if (rto_opts.rto_expand) {
rm_bench = vdev_raidz_map_alloc_expanded(
zio_bench.io_abd,
zio_bench.io_size, zio_bench.io_offset,
BENCH_ASHIFT, ncols+1, ncols,
PARITY_PQR, rto_opts.rto_expand_offset);
} else {
rm_bench = vdev_raidz_map_alloc(&zio_bench,
BENCH_ASHIFT, ncols, PARITY_PQR);
}
/* estimate iteration count */
iter_cnt = (REC_BENCH_MEMORY);
@@ -197,7 +212,7 @@ run_rec_bench_impl(const char *impl)
}
}
void
static void
run_rec_bench(void)
{
char **impl_name;
+285 -46
View File
@@ -77,16 +77,20 @@ static void print_opts(raidz_test_opts_t *opts, boolean_t force)
(void) fprintf(stdout, DBLSEP "Running with options:\n"
" (-a) zio ashift : %zu\n"
" (-o) zio offset : 1 << %zu\n"
" (-e) expanded map : %s\n"
" (-r) reflow offset : %llx\n"
" (-d) number of raidz data columns : %zu\n"
" (-s) size of DATA : 1 << %zu\n"
" (-S) sweep parameters : %s \n"
" (-v) verbose : %s \n\n",
opts->rto_ashift, /* -a */
ilog2(opts->rto_offset), /* -o */
opts->rto_dcols, /* -d */
ilog2(opts->rto_dsize), /* -s */
opts->rto_sweep ? "yes" : "no", /* -S */
verbose); /* -v */
opts->rto_ashift, /* -a */
ilog2(opts->rto_offset), /* -o */
opts->rto_expand ? "yes" : "no", /* -e */
(u_longlong_t)opts->rto_expand_offset, /* -r */
opts->rto_dcols, /* -d */
ilog2(opts->rto_dsize), /* -s */
opts->rto_sweep ? "yes" : "no", /* -S */
verbose); /* -v */
}
}
@@ -104,6 +108,8 @@ static void usage(boolean_t requested)
"\t[-S parameter sweep (default: %s)]\n"
"\t[-t timeout for parameter sweep test]\n"
"\t[-B benchmark all raidz implementations]\n"
"\t[-e use expanded raidz map (default: %s)]\n"
"\t[-r expanded raidz map reflow offset (default: %llx)]\n"
"\t[-v increase verbosity (default: %zu)]\n"
"\t[-h (print help)]\n"
"\t[-T test the test, see if failure would be detected]\n"
@@ -114,6 +120,8 @@ static void usage(boolean_t requested)
o->rto_dcols, /* -d */
ilog2(o->rto_dsize), /* -s */
rto_opts.rto_sweep ? "yes" : "no", /* -S */
rto_opts.rto_expand ? "yes" : "no", /* -e */
(u_longlong_t)o->rto_expand_offset, /* -r */
o->rto_v); /* -d */
exit(requested ? 0 : 1);
@@ -128,7 +136,7 @@ static void process_options(int argc, char **argv)
bcopy(&rto_opts_defaults, o, sizeof (*o));
while ((opt = getopt(argc, argv, "TDBSvha:o:d:s:t:")) != -1) {
while ((opt = getopt(argc, argv, "TDBSvha:er:o:d:s:t:")) != -1) {
value = 0;
switch (opt) {
@@ -136,6 +144,12 @@ static void process_options(int argc, char **argv)
value = strtoull(optarg, NULL, 0);
o->rto_ashift = MIN(13, MAX(9, value));
break;
case 'e':
o->rto_expand = 1;
break;
case 'r':
o->rto_expand_offset = strtoull(optarg, NULL, 0);
break;
case 'o':
value = strtoull(optarg, NULL, 0);
o->rto_offset = ((1ULL << MIN(12, value)) >> 9) << 9;
@@ -179,25 +193,34 @@ static void process_options(int argc, char **argv)
}
}
#define DATA_COL(rm, i) ((rm)->rm_col[raidz_parity(rm) + (i)].rc_abd)
#define DATA_COL_SIZE(rm, i) ((rm)->rm_col[raidz_parity(rm) + (i)].rc_size)
#define DATA_COL(rr, i) ((rr)->rr_col[rr->rr_firstdatacol + (i)].rc_abd)
#define DATA_COL_SIZE(rr, i) ((rr)->rr_col[rr->rr_firstdatacol + (i)].rc_size)
#define CODE_COL(rm, i) ((rm)->rm_col[(i)].rc_abd)
#define CODE_COL_SIZE(rm, i) ((rm)->rm_col[(i)].rc_size)
#define CODE_COL(rr, i) ((rr)->rr_col[(i)].rc_abd)
#define CODE_COL_SIZE(rr, i) ((rr)->rr_col[(i)].rc_size)
static int
cmp_code(raidz_test_opts_t *opts, const raidz_map_t *rm, const int parity)
{
int i, ret = 0;
int r, i, ret = 0;
VERIFY(parity >= 1 && parity <= 3);
for (i = 0; i < parity; i++) {
if (abd_cmp(CODE_COL(rm, i), CODE_COL(opts->rm_golden, i))
!= 0) {
ret++;
LOG_OPT(D_DEBUG, opts,
"\nParity block [%d] different!\n", i);
for (r = 0; r < rm->rm_nrows; r++) {
raidz_row_t * const rr = rm->rm_row[r];
raidz_row_t * const rrg = opts->rm_golden->rm_row[r];
for (i = 0; i < parity; i++) {
if (CODE_COL_SIZE(rrg, i) == 0) {
VERIFY0(CODE_COL_SIZE(rr, i));
continue;
}
if (abd_cmp(CODE_COL(rr, i),
CODE_COL(rrg, i)) != 0) {
ret++;
LOG_OPT(D_DEBUG, opts,
"\nParity block [%d] different!\n", i);
}
}
}
return (ret);
@@ -206,16 +229,26 @@ cmp_code(raidz_test_opts_t *opts, const raidz_map_t *rm, const int parity)
static int
cmp_data(raidz_test_opts_t *opts, raidz_map_t *rm)
{
int i, ret = 0;
int dcols = opts->rm_golden->rm_cols - raidz_parity(opts->rm_golden);
int r, i, dcols, ret = 0;
for (i = 0; i < dcols; i++) {
if (abd_cmp(DATA_COL(opts->rm_golden, i), DATA_COL(rm, i))
!= 0) {
ret++;
for (r = 0; r < rm->rm_nrows; r++) {
raidz_row_t *rr = rm->rm_row[r];
raidz_row_t *rrg = opts->rm_golden->rm_row[r];
dcols = opts->rm_golden->rm_row[0]->rr_cols -
raidz_parity(opts->rm_golden);
for (i = 0; i < dcols; i++) {
if (DATA_COL_SIZE(rrg, i) == 0) {
VERIFY0(DATA_COL_SIZE(rr, i));
continue;
}
LOG_OPT(D_DEBUG, opts,
"\nData block [%d] different!\n", i);
if (abd_cmp(DATA_COL(rrg, i),
DATA_COL(rr, i)) != 0) {
ret++;
LOG_OPT(D_DEBUG, opts,
"\nData block [%d] different!\n", i);
}
}
}
return (ret);
@@ -236,12 +269,13 @@ init_rand(void *data, size_t size, void *private)
static void
corrupt_colums(raidz_map_t *rm, const int *tgts, const int cnt)
{
int i;
raidz_col_t *col;
for (i = 0; i < cnt; i++) {
col = &rm->rm_col[tgts[i]];
abd_iterate_func(col->rc_abd, 0, col->rc_size, init_rand, NULL);
for (int r = 0; r < rm->rm_nrows; r++) {
raidz_row_t *rr = rm->rm_row[r];
for (int i = 0; i < cnt; i++) {
raidz_col_t *col = &rr->rr_col[tgts[i]];
abd_iterate_func(col->rc_abd, 0, col->rc_size,
init_rand, NULL);
}
}
}
@@ -288,10 +322,22 @@ init_raidz_golden_map(raidz_test_opts_t *opts, const int parity)
VERIFY0(vdev_raidz_impl_set("original"));
opts->rm_golden = vdev_raidz_map_alloc(opts->zio_golden,
opts->rto_ashift, total_ncols, parity);
rm_test = vdev_raidz_map_alloc(zio_test,
opts->rto_ashift, total_ncols, parity);
if (opts->rto_expand) {
opts->rm_golden =
vdev_raidz_map_alloc_expanded(opts->zio_golden->io_abd,
opts->zio_golden->io_size, opts->zio_golden->io_offset,
opts->rto_ashift, total_ncols+1, total_ncols,
parity, opts->rto_expand_offset);
rm_test = vdev_raidz_map_alloc_expanded(zio_test->io_abd,
zio_test->io_size, zio_test->io_offset,
opts->rto_ashift, total_ncols+1, total_ncols,
parity, opts->rto_expand_offset);
} else {
opts->rm_golden = vdev_raidz_map_alloc(opts->zio_golden,
opts->rto_ashift, total_ncols, parity);
rm_test = vdev_raidz_map_alloc(zio_test,
opts->rto_ashift, total_ncols, parity);
}
VERIFY(opts->zio_golden);
VERIFY(opts->rm_golden);
@@ -312,6 +358,187 @@ init_raidz_golden_map(raidz_test_opts_t *opts, const int parity)
return (err);
}
/*
* If reflow is not in progress, reflow_offset should be UINT64_MAX.
* For each row, if the row is entirely before reflow_offset, it will
* come from the new location. Otherwise this row will come from the
* old location. Therefore, rows that straddle the reflow_offset will
* come from the old location.
*
* NOTE: Until raidz expansion is implemented this function is only
* needed by raidz_test.c to the multi-row raid_map_t functionality.
*/
raidz_map_t *
vdev_raidz_map_alloc_expanded(abd_t *abd, uint64_t size, uint64_t offset,
uint64_t ashift, uint64_t physical_cols, uint64_t logical_cols,
uint64_t nparity, uint64_t reflow_offset)
{
/* The zio's size in units of the vdev's minimum sector size. */
uint64_t s = size >> ashift;
uint64_t q, r, bc, devidx, asize = 0, tot;
/*
* "Quotient": The number of data sectors for this stripe on all but
* the "big column" child vdevs that also contain "remainder" data.
* AKA "full rows"
*/
q = s / (logical_cols - nparity);
/*
* "Remainder": The number of partial stripe data sectors in this I/O.
* This will add a sector to some, but not all, child vdevs.
*/
r = s - q * (logical_cols - nparity);
/* The number of "big columns" - those which contain remainder data. */
bc = (r == 0 ? 0 : r + nparity);
/*
* The total number of data and parity sectors associated with
* this I/O.
*/
tot = s + nparity * (q + (r == 0 ? 0 : 1));
/* How many rows contain data (not skip) */
uint64_t rows = howmany(tot, logical_cols);
int cols = MIN(tot, logical_cols);
raidz_map_t *rm = kmem_zalloc(offsetof(raidz_map_t, rm_row[rows]),
KM_SLEEP);
rm->rm_nrows = rows;
for (uint64_t row = 0; row < rows; row++) {
raidz_row_t *rr = kmem_alloc(offsetof(raidz_row_t,
rr_col[cols]), KM_SLEEP);
rm->rm_row[row] = rr;
/* The starting RAIDZ (parent) vdev sector of the row. */
uint64_t b = (offset >> ashift) + row * logical_cols;
/*
* If we are in the middle of a reflow, and any part of this
* row has not been copied, then use the old location of
* this row.
*/
int row_phys_cols = physical_cols;
if (b + (logical_cols - nparity) > reflow_offset >> ashift)
row_phys_cols--;
/* starting child of this row */
uint64_t child_id = b % row_phys_cols;
/* The starting byte offset on each child vdev. */
uint64_t child_offset = (b / row_phys_cols) << ashift;
/*
* We set cols to the entire width of the block, even
* if this row is shorter. This is needed because parity
* generation (for Q and R) needs to know the entire width,
* because it treats the short row as though it was
* full-width (and the "phantom" sectors were zero-filled).
*
* Another approach to this would be to set cols shorter
* (to just the number of columns that we might do i/o to)
* and have another mechanism to tell the parity generation
* about the "entire width". Reconstruction (at least
* vdev_raidz_reconstruct_general()) would also need to
* know about the "entire width".
*/
rr->rr_cols = cols;
rr->rr_bigcols = bc;
rr->rr_missingdata = 0;
rr->rr_missingparity = 0;
rr->rr_firstdatacol = nparity;
rr->rr_abd_empty = NULL;
rr->rr_nempty = 0;
for (int c = 0; c < rr->rr_cols; c++, child_id++) {
if (child_id >= row_phys_cols) {
child_id -= row_phys_cols;
child_offset += 1ULL << ashift;
}
rr->rr_col[c].rc_devidx = child_id;
rr->rr_col[c].rc_offset = child_offset;
rr->rr_col[c].rc_orig_data = NULL;
rr->rr_col[c].rc_error = 0;
rr->rr_col[c].rc_tried = 0;
rr->rr_col[c].rc_skipped = 0;
rr->rr_col[c].rc_need_orig_restore = B_FALSE;
uint64_t dc = c - rr->rr_firstdatacol;
if (c < rr->rr_firstdatacol) {
rr->rr_col[c].rc_size = 1ULL << ashift;
rr->rr_col[c].rc_abd =
abd_alloc_linear(rr->rr_col[c].rc_size,
B_TRUE);
} else if (row == rows - 1 && bc != 0 && c >= bc) {
/*
* Past the end, this for parity generation.
*/
rr->rr_col[c].rc_size = 0;
rr->rr_col[c].rc_abd = NULL;
} else {
/*
* "data column" (col excluding parity)
* Add an ASCII art diagram here
*/
uint64_t off;
if (c < bc || r == 0) {
off = dc * rows + row;
} else {
off = r * rows +
(dc - r) * (rows - 1) + row;
}
rr->rr_col[c].rc_size = 1ULL << ashift;
rr->rr_col[c].rc_abd = abd_get_offset_struct(
&rr->rr_col[c].rc_abdstruct,
abd, off << ashift, 1 << ashift);
}
asize += rr->rr_col[c].rc_size;
}
/*
* If all data stored spans all columns, there's a danger that
* parity will always be on the same device and, since parity
* isn't read during normal operation, that that device's I/O
* bandwidth won't be used effectively. We therefore switch
* the parity every 1MB.
*
* ...at least that was, ostensibly, the theory. As a practical
* matter unless we juggle the parity between all devices
* evenly, we won't see any benefit. Further, occasional writes
* that aren't a multiple of the LCM of the number of children
* and the minimum stripe width are sufficient to avoid pessimal
* behavior. Unfortunately, this decision created an implicit
* on-disk format requirement that we need to support for all
* eternity, but only for single-parity RAID-Z.
*
* If we intend to skip a sector in the zeroth column for
* padding we must make sure to note this swap. We will never
* intend to skip the first column since at least one data and
* one parity column must appear in each row.
*/
if (rr->rr_firstdatacol == 1 && rr->rr_cols > 1 &&
(offset & (1ULL << 20))) {
ASSERT(rr->rr_cols >= 2);
ASSERT(rr->rr_col[0].rc_size == rr->rr_col[1].rc_size);
devidx = rr->rr_col[0].rc_devidx;
uint64_t o = rr->rr_col[0].rc_offset;
rr->rr_col[0].rc_devidx = rr->rr_col[1].rc_devidx;
rr->rr_col[0].rc_offset = rr->rr_col[1].rc_offset;
rr->rr_col[1].rc_devidx = devidx;
rr->rr_col[1].rc_offset = o;
}
}
ASSERT3U(asize, ==, tot << ashift);
/* init RAIDZ parity ops */
rm->rm_ops = vdev_raidz_math_get_ops();
return (rm);
}
static raidz_map_t *
init_raidz_map(raidz_test_opts_t *opts, zio_t **zio, const int parity)
{
@@ -330,8 +557,15 @@ init_raidz_map(raidz_test_opts_t *opts, zio_t **zio, const int parity)
(*zio)->io_abd = raidz_alloc(alloc_dsize);
init_zio_abd(*zio);
rm = vdev_raidz_map_alloc(*zio, opts->rto_ashift,
total_ncols, parity);
if (opts->rto_expand) {
rm = vdev_raidz_map_alloc_expanded((*zio)->io_abd,
(*zio)->io_size, (*zio)->io_offset,
opts->rto_ashift, total_ncols+1, total_ncols,
parity, opts->rto_expand_offset);
} else {
rm = vdev_raidz_map_alloc(*zio, opts->rto_ashift,
total_ncols, parity);
}
VERIFY(rm);
/* Make sure code columns are destroyed */
@@ -420,7 +654,7 @@ run_rec_check_impl(raidz_test_opts_t *opts, raidz_map_t *rm, const int fn)
if (fn < RAIDZ_REC_PQ) {
/* can reconstruct 1 failed data disk */
for (x0 = 0; x0 < opts->rto_dcols; x0++) {
if (x0 >= rm->rm_cols - raidz_parity(rm))
if (x0 >= rm->rm_row[0]->rr_cols - raidz_parity(rm))
continue;
/* Check if should stop */
@@ -445,10 +679,11 @@ run_rec_check_impl(raidz_test_opts_t *opts, raidz_map_t *rm, const int fn)
} else if (fn < RAIDZ_REC_PQR) {
/* can reconstruct 2 failed data disk */
for (x0 = 0; x0 < opts->rto_dcols; x0++) {
if (x0 >= rm->rm_cols - raidz_parity(rm))
if (x0 >= rm->rm_row[0]->rr_cols - raidz_parity(rm))
continue;
for (x1 = x0 + 1; x1 < opts->rto_dcols; x1++) {
if (x1 >= rm->rm_cols - raidz_parity(rm))
if (x1 >= rm->rm_row[0]->rr_cols -
raidz_parity(rm))
continue;
/* Check if should stop */
@@ -475,14 +710,15 @@ run_rec_check_impl(raidz_test_opts_t *opts, raidz_map_t *rm, const int fn)
} else {
/* can reconstruct 3 failed data disk */
for (x0 = 0; x0 < opts->rto_dcols; x0++) {
if (x0 >= rm->rm_cols - raidz_parity(rm))
if (x0 >= rm->rm_row[0]->rr_cols - raidz_parity(rm))
continue;
for (x1 = x0 + 1; x1 < opts->rto_dcols; x1++) {
if (x1 >= rm->rm_cols - raidz_parity(rm))
if (x1 >= rm->rm_row[0]->rr_cols -
raidz_parity(rm))
continue;
for (x2 = x1 + 1; x2 < opts->rto_dcols; x2++) {
if (x2 >=
rm->rm_cols - raidz_parity(rm))
if (x2 >= rm->rm_row[0]->rr_cols -
raidz_parity(rm))
continue;
/* Check if should stop */
@@ -700,6 +936,8 @@ run_sweep(void)
opts->rto_dcols = dcols_v[d];
opts->rto_offset = (1 << ashift_v[a]) * rand();
opts->rto_dsize = size_v[s];
opts->rto_expand = rto_opts.rto_expand;
opts->rto_expand_offset = rto_opts.rto_expand_offset;
opts->rto_v = 0; /* be quiet */
VERIFY3P(thread_create(NULL, 0, sweep_thread, (void *) opts,
@@ -732,6 +970,7 @@ exit:
return (sweep_state == SWEEP_ERROR ? SWEEP_ERROR : 0);
}
int
main(int argc, char **argv)
{
@@ -757,7 +996,7 @@ main(int argc, char **argv)
process_options(argc, argv);
kernel_init(FREAD);
kernel_init(SPA_MODE_READ);
/* setup random data because rand() is not reentrant */
rand_data = (int *)umem_alloc(SPA_MAXBLOCKSIZE, UMEM_NOFAIL);
+9 -1
View File
@@ -38,18 +38,21 @@ static const char *raidz_impl_names[] = {
"avx512bw",
"aarch64_neon",
"aarch64_neonx2",
"powerpc_altivec",
NULL
};
typedef struct raidz_test_opts {
size_t rto_ashift;
size_t rto_offset;
uint64_t rto_offset;
size_t rto_dcols;
size_t rto_dsize;
size_t rto_v;
size_t rto_sweep;
size_t rto_sweep_timeout;
size_t rto_benchmark;
size_t rto_expand;
uint64_t rto_expand_offset;
size_t rto_sanity;
size_t rto_gdb;
@@ -68,6 +71,8 @@ static const raidz_test_opts_t rto_opts_defaults = {
.rto_v = 0,
.rto_sweep = 0,
.rto_benchmark = 0,
.rto_expand = 0,
.rto_expand_offset = -1ULL,
.rto_sanity = 0,
.rto_gdb = 0,
.rto_should_stop = B_FALSE
@@ -112,4 +117,7 @@ void init_zio_abd(zio_t *zio);
void run_raidz_benchmark(void);
struct raidz_map *vdev_raidz_map_alloc_expanded(abd_t *, uint64_t, uint64_t,
uint64_t, uint64_t, uint64_t, uint64_t, uint64_t);
#endif /* RAIDZ_TEST_H */
+2
View File
@@ -1 +1,3 @@
include $(top_srcdir)/config/Shellcheck.am
dist_udev_SCRIPTS = vdev_id
+309 -126
View File
@@ -79,6 +79,34 @@
# channel 86:00.0 1 A
# channel 86:00.0 0 B
# #
# # Example vdev_id.conf - multipath / multijbod-daisychaining
# #
#
# multipath yes
# multijbod yes
#
# # PCI_ID HBA PORT CHANNEL NAME
# channel 85:00.0 1 A
# channel 85:00.0 0 B
# channel 86:00.0 1 A
# channel 86:00.0 0 B
# #
# # Example vdev_id.conf - multipath / mixed
# #
#
# multipath yes
# slot mix
#
# # PCI_ID HBA PORT CHANNEL NAME
# channel 85:00.0 3 A
# channel 85:00.0 2 B
# channel 86:00.0 3 A
# channel 86:00.0 2 B
# channel af:00.0 0 C
# channel af:00.0 1 C
# #
# # Example vdev_id.conf - alias
# #
@@ -92,9 +120,10 @@ PATH=/bin:/sbin:/usr/bin:/usr/sbin
CONFIG=/etc/zfs/vdev_id.conf
PHYS_PER_PORT=
DEV=
MULTIPATH=
TOPOLOGY=
BAY=
ENCL_ID=""
UNIQ_ENCL_ID=""
usage() {
cat << EOF
@@ -102,71 +131,153 @@ Usage: vdev_id [-h]
vdev_id <-d device> [-c config_file] [-p phys_per_port]
[-g sas_direct|sas_switch|scsi] [-m]
-c specify name of alernate config file [default=$CONFIG]
-c specify name of an alternative config file [default=$CONFIG]
-d specify basename of device (i.e. sda)
-e Create enclose device symlinks only (/dev/by-enclosure)
-g Storage network topology [default="$TOPOLOGY"]
-m Run in multipath mode
-j Run in multijbod mode
-p number of phy's per switch port [default=$PHYS_PER_PORT]
-h show this summary
EOF
exit 0
exit 1
# exit with error to avoid processing usage message by a udev rule
}
map_slot() {
local LINUX_SLOT=$1
local CHANNEL=$2
local MAPPED_SLOT=
LINUX_SLOT=$1
CHANNEL=$2
MAPPED_SLOT=`awk "\\$1 == \"slot\" && \\$2 == ${LINUX_SLOT} && \
\\$4 ~ /^${CHANNEL}$|^$/ { print \\$3; exit }" $CONFIG`
MAPPED_SLOT=$(awk -v linux_slot="$LINUX_SLOT" -v channel="$CHANNEL" \
'$1 == "slot" && $2 == linux_slot && \
($4 ~ "^"channel"$" || $4 ~ /^$/) { print $3; exit}' $CONFIG)
if [ -z "$MAPPED_SLOT" ] ; then
MAPPED_SLOT=$LINUX_SLOT
fi
printf "%d" ${MAPPED_SLOT}
printf "%d" "${MAPPED_SLOT}"
}
map_channel() {
local MAPPED_CHAN=
local PCI_ID=$1
local PORT=$2
MAPPED_CHAN=
PCI_ID=$1
PORT=$2
case $TOPOLOGY in
"sas_switch")
MAPPED_CHAN=`awk "\\$1 == \"channel\" && \\$2 == ${PORT} \
{ print \\$3; exit }" $CONFIG`
MAPPED_CHAN=$(awk -v port="$PORT" \
'$1 == "channel" && $2 == port \
{ print $3; exit }' $CONFIG)
;;
"sas_direct"|"scsi")
MAPPED_CHAN=`awk "\\$1 == \"channel\" && \
\\$2 == \"${PCI_ID}\" && \\$3 == ${PORT} \
{ print \\$4; exit }" $CONFIG`
MAPPED_CHAN=$(awk -v pciID="$PCI_ID" -v port="$PORT" \
'$1 == "channel" && $2 == pciID && $3 == port \
{print $4}' $CONFIG)
;;
esac
printf "%s" ${MAPPED_CHAN}
printf "%s" "${MAPPED_CHAN}"
}
get_encl_id() {
set -- $(echo $1)
count=$#
i=1
while [ $i -le $count ] ; do
d=$(eval echo '$'{$i})
id=$(cat "/sys/class/enclosure/${d}/id")
ENCL_ID="${ENCL_ID} $id"
i=$((i + 1))
done
}
get_uniq_encl_id() {
for uuid in ${ENCL_ID}; do
found=0
for count in ${UNIQ_ENCL_ID}; do
if [ $count = $uuid ]; then
found=1
break
fi
done
if [ $found -eq 0 ]; then
UNIQ_ENCL_ID="${UNIQ_ENCL_ID} $uuid"
fi
done
}
# map_jbod explainer: The bsg driver knows the difference between a SAS
# expander and fanout expander. Use hostX instance along with top-level
# (whole enclosure) expander instances in /sys/class/enclosure and
# matching a field in an array of expanders, using the index of the
# matched array field as the enclosure instance, thereby making jbod IDs
# dynamic. Avoids reliance on high overhead userspace commands like
# multipath and lsscsi and instead uses existing sysfs data. $HOSTCHAN
# variable derived from devpath gymnastics in sas_handler() function.
map_jbod() {
DEVEXP=$(ls -l "/sys/block/$DEV/device/" | grep enclos | awk -F/ '{print $(NF-1) }')
DEV=$1
# Use "set --" to create index values (Arrays)
set -- $(ls -l /sys/class/enclosure | grep -v "^total" | awk '{print $9}')
# Get count of total elements
JBOD_COUNT=$#
JBOD_ITEM=$*
# Build JBODs (enclosure) id from sys/class/enclosure/<dev>/id
get_encl_id "$JBOD_ITEM"
# Different expander instances for each paths.
# Filter out and keep only unique id.
get_uniq_encl_id
# Identify final 'mapped jbod'
j=0
for count in ${UNIQ_ENCL_ID}; do
i=1
j=$((j + 1))
while [ $i -le $JBOD_COUNT ] ; do
d=$(eval echo '$'{$i})
id=$(cat "/sys/class/enclosure/${d}/id")
if [ "$d" = "$DEVEXP" ] && [ $id = $count ] ; then
MAPPED_JBOD=$j
break
fi
i=$((i + 1))
done
done
printf "%d" "${MAPPED_JBOD}"
}
sas_handler() {
if [ -z "$PHYS_PER_PORT" ] ; then
PHYS_PER_PORT=`awk "\\$1 == \"phys_per_port\" \
{print \\$2; exit}" $CONFIG`
PHYS_PER_PORT=$(awk '$1 == "phys_per_port" \
{print $2; exit}' $CONFIG)
fi
PHYS_PER_PORT=${PHYS_PER_PORT:-4}
if ! echo $PHYS_PER_PORT | grep -q -E '^[0-9]+$' ; then
if ! echo "$PHYS_PER_PORT" | grep -q -E '^[0-9]+$' ; then
echo "Error: phys_per_port value $PHYS_PER_PORT is non-numeric"
exit 1
fi
if [ -z "$MULTIPATH_MODE" ] ; then
MULTIPATH_MODE=`awk "\\$1 == \"multipath\" \
{print \\$2; exit}" $CONFIG`
MULTIPATH_MODE=$(awk '$1 == "multipath" \
{print $2; exit}' $CONFIG)
fi
if [ -z "$MULTIJBOD_MODE" ] ; then
MULTIJBOD_MODE=$(awk '$1 == "multijbod" \
{print $2; exit}' $CONFIG)
fi
# Use first running component device if we're handling a dm-mpath device
if [ "$MULTIPATH_MODE" = "yes" ] ; then
# If udev didn't tell us the UUID via DM_NAME, check /dev/mapper
if [ -z "$DM_NAME" ] ; then
DM_NAME=`ls -l --full-time /dev/mapper |
awk "/\/$DEV$/{print \\$9}"`
DM_NAME=$(ls -l --full-time /dev/mapper |
grep "$DEV"$ | awk '{print $9}')
fi
# For raw disks udev exports DEVTYPE=partition when
@@ -176,28 +287,50 @@ sas_handler() {
# we have to append the -part suffix directly in the
# helper.
if [ "$DEVTYPE" != "partition" ] ; then
PART=`echo $DM_NAME | awk -Fp '/p/{print "-part"$2}'`
# Match p[number], remove the 'p' and prepend "-part"
PART=$(echo "$DM_NAME" |
awk 'match($0,/p[0-9]+$/) {print "-part"substr($0,RSTART+1,RLENGTH-1)}')
fi
# Strip off partition information.
DM_NAME=`echo $DM_NAME | sed 's/p[0-9][0-9]*$//'`
DM_NAME=$(echo "$DM_NAME" | sed 's/p[0-9][0-9]*$//')
if [ -z "$DM_NAME" ] ; then
return
fi
# Get the raw scsi device name from multipath -ll. Strip off
# leading pipe symbols to make field numbering consistent.
DEV=`multipath -ll $DM_NAME |
awk '/running/{gsub("^[|]"," "); print $3 ; exit}'`
# Utilize DM device name to gather subordinate block devices
# using sysfs to avoid userspace utilities
# If our DEVNAME is something like /dev/dm-177, then we may be
# able to get our DMDEV from it.
DMDEV=$(echo $DEVNAME | sed 's;/dev/;;g')
if [ ! -e /sys/block/$DMDEV/slaves/* ] ; then
# It's not there, try looking in /dev/mapper
DMDEV=$(ls -l --full-time /dev/mapper | grep $DM_NAME |
awk '{gsub("../", " "); print $NF}')
fi
# Use sysfs pointers in /sys/block/dm-X/slaves because using
# userspace tools creates lots of overhead and should be avoided
# whenever possible. Use awk to isolate lowest instance of
# sd device member in dm device group regardless of string
# length.
DEV=$(ls "/sys/block/$DMDEV/slaves" | awk '
{ len=sprintf ("%20s",length($0)); gsub(/ /,0,str); a[NR]=len "_" $0; }
END {
asort(a)
print substr(a[1],22)
}')
if [ -z "$DEV" ] ; then
return
fi
fi
if echo $DEV | grep -q ^/devices/ ; then
if echo "$DEV" | grep -q ^/devices/ ; then
sys_path=$DEV
else
sys_path=`udevadm info -q path -p /sys/block/$DEV 2>/dev/null`
sys_path=$(udevadm info -q path -p "/sys/block/$DEV" 2>/dev/null)
fi
# Use positional parameters as an ad-hoc array
@@ -207,84 +340,104 @@ sas_handler() {
# Get path up to /sys/.../hostX
i=1
while [ $i -le $num_dirs ] ; do
d=$(eval echo \${$i})
while [ $i -le "$num_dirs" ] ; do
d=$(eval echo '$'{$i})
scsi_host_dir="$scsi_host_dir/$d"
echo $d | grep -q -E '^host[0-9]+$' && break
i=$(($i + 1))
echo "$d" | grep -q -E '^host[0-9]+$' && break
i=$((i + 1))
done
if [ $i = $num_dirs ] ; then
# Lets grab the SAS host channel number and save it for JBOD sorting later
HOSTCHAN=$(echo "$d" | awk -F/ '{ gsub("host","",$NF); print $NF}')
if [ $i = "$num_dirs" ] ; then
return
fi
PCI_ID=$(eval echo \${$(($i -1))} | awk -F: '{print $2":"$3}')
PCI_ID=$(eval echo '$'{$((i -1))} | awk -F: '{print $2":"$3}')
# In sas_switch mode, the directory four levels beneath
# /sys/.../hostX contains symlinks to phy devices that reveal
# the switch port number. In sas_direct mode, the phy links one
# directory down reveal the HBA port.
port_dir=$scsi_host_dir
case $TOPOLOGY in
"sas_switch") j=$(($i + 4)) ;;
"sas_direct") j=$(($i + 1)) ;;
"sas_switch") j=$((i + 4)) ;;
"sas_direct") j=$((i + 1)) ;;
esac
i=$(($i + 1))
i=$((i + 1))
while [ $i -le $j ] ; do
port_dir="$port_dir/$(eval echo \${$i})"
i=$(($i + 1))
port_dir="$port_dir/$(eval echo '$'{$i})"
i=$((i + 1))
done
PHY=`ls -d $port_dir/phy* 2>/dev/null | head -1 | awk -F: '{print $NF}'`
PHY=$(ls -vd "$port_dir"/phy* 2>/dev/null | head -1 | awk -F: '{print $NF}')
if [ -z "$PHY" ] ; then
PHY=0
fi
PORT=$(( $PHY / $PHYS_PER_PORT ))
PORT=$((PHY / PHYS_PER_PORT))
# Look in /sys/.../sas_device/end_device-X for the bay_identifier
# attribute.
end_device_dir=$port_dir
while [ $i -lt $num_dirs ] ; do
d=$(eval echo \${$i})
while [ $i -lt "$num_dirs" ] ; do
d=$(eval echo '$'{$i})
end_device_dir="$end_device_dir/$d"
if echo $d | grep -q '^end_device' ; then
if echo "$d" | grep -q '^end_device' ; then
end_device_dir="$end_device_dir/sas_device/$d"
break
fi
i=$(($i + 1))
i=$((i + 1))
done
# Add 'mix' slot type for environments where dm-multipath devices
# include end-devices connected via SAS expanders or direct connection
# to SAS HBA. A mixed connectivity environment such as pool devices
# contained in a SAS JBOD and spare drives or log devices directly
# connected in a server backplane without expanders in the I/O path.
SLOT=
case $BAY in
"bay")
SLOT=`cat $end_device_dir/bay_identifier 2>/dev/null`
SLOT=$(cat "$end_device_dir/bay_identifier" 2>/dev/null)
;;
"mix")
if [ $(cat "$end_device_dir/bay_identifier" 2>/dev/null) ] ; then
SLOT=$(cat "$end_device_dir/bay_identifier" 2>/dev/null)
else
SLOT=$(cat "$end_device_dir/phy_identifier" 2>/dev/null)
fi
;;
"phy")
SLOT=`cat $end_device_dir/phy_identifier 2>/dev/null`
SLOT=$(cat "$end_device_dir/phy_identifier" 2>/dev/null)
;;
"port")
d=$(eval echo \${$i})
SLOT=`echo $d | sed -e 's/^.*://'`
d=$(eval echo '$'{$i})
SLOT=$(echo "$d" | sed -e 's/^.*://')
;;
"id")
i=$(($i + 1))
d=$(eval echo \${$i})
SLOT=`echo $d | sed -e 's/^.*://'`
i=$((i + 1))
d=$(eval echo '$'{$i})
SLOT=$(echo "$d" | sed -e 's/^.*://')
;;
"lun")
i=$(($i + 2))
d=$(eval echo \${$i})
SLOT=`echo $d | sed -e 's/^.*://'`
i=$((i + 2))
d=$(eval echo '$'{$i})
SLOT=$(echo "$d" | sed -e 's/^.*://')
;;
"ses")
# look for this SAS path in all SCSI Enclosure Services
# (SES) enclosures
sas_address=`cat $end_device_dir/sas_address 2>/dev/null`
enclosures=`lsscsi -g | \
sed -n -e '/enclosu/s/^.* \([^ ][^ ]*\) *$/\1/p'`
sas_address=$(cat "$end_device_dir/sas_address" 2>/dev/null)
enclosures=$(lsscsi -g | \
sed -n -e '/enclosu/s/^.* \([^ ][^ ]*\) *$/\1/p')
for enclosure in $enclosures; do
set -- $(sg_ses -p aes $enclosure | \
set -- $(sg_ses -p aes "$enclosure" | \
awk "/device slot number:/{slot=\$12} \
/SAS address: $sas_address/\
{print slot}")
@@ -299,42 +452,55 @@ sas_handler() {
return
fi
CHAN=`map_channel $PCI_ID $PORT`
SLOT=`map_slot $SLOT $CHAN`
if [ -z "$CHAN" ] ; then
return
if [ "$MULTIJBOD_MODE" = "yes" ] ; then
CHAN=$(map_channel "$PCI_ID" "$PORT")
SLOT=$(map_slot "$SLOT" "$CHAN")
JBOD=$(map_jbod "$DEV")
if [ -z "$CHAN" ] ; then
return
fi
echo "${CHAN}"-"${JBOD}"-"${SLOT}${PART}"
else
CHAN=$(map_channel "$PCI_ID" "$PORT")
SLOT=$(map_slot "$SLOT" "$CHAN")
if [ -z "$CHAN" ] ; then
return
fi
echo "${CHAN}${SLOT}${PART}"
fi
echo ${CHAN}${SLOT}${PART}
}
scsi_handler() {
if [ -z "$FIRST_BAY_NUMBER" ] ; then
FIRST_BAY_NUMBER=`awk "\\$1 == \"first_bay_number\" \
{print \\$2; exit}" $CONFIG`
FIRST_BAY_NUMBER=$(awk '$1 == "first_bay_number" \
{print $2; exit}' $CONFIG)
fi
FIRST_BAY_NUMBER=${FIRST_BAY_NUMBER:-0}
if [ -z "$PHYS_PER_PORT" ] ; then
PHYS_PER_PORT=`awk "\\$1 == \"phys_per_port\" \
{print \\$2; exit}" $CONFIG`
PHYS_PER_PORT=$(awk '$1 == "phys_per_port" \
{print $2; exit}' $CONFIG)
fi
PHYS_PER_PORT=${PHYS_PER_PORT:-4}
if ! echo $PHYS_PER_PORT | grep -q -E '^[0-9]+$' ; then
if ! echo "$PHYS_PER_PORT" | grep -q -E '^[0-9]+$' ; then
echo "Error: phys_per_port value $PHYS_PER_PORT is non-numeric"
exit 1
fi
if [ -z "$MULTIPATH_MODE" ] ; then
MULTIPATH_MODE=`awk "\\$1 == \"multipath\" \
{print \\$2; exit}" $CONFIG`
MULTIPATH_MODE=$(awk '$1 == "multipath" \
{print $2; exit}' $CONFIG)
fi
# Use first running component device if we're handling a dm-mpath device
if [ "$MULTIPATH_MODE" = "yes" ] ; then
# If udev didn't tell us the UUID via DM_NAME, check /dev/mapper
if [ -z "$DM_NAME" ] ; then
DM_NAME=`ls -l --full-time /dev/mapper |
awk "/\/$DEV$/{print \\$9}"`
DM_NAME=$(ls -l --full-time /dev/mapper |
grep "$DEV"$ | awk '{print $9}')
fi
# For raw disks udev exports DEVTYPE=partition when
@@ -344,28 +510,30 @@ scsi_handler() {
# we have to append the -part suffix directly in the
# helper.
if [ "$DEVTYPE" != "partition" ] ; then
PART=`echo $DM_NAME | awk -Fp '/p/{print "-part"$2}'`
# Match p[number], remove the 'p' and prepend "-part"
PART=$(echo "$DM_NAME" |
awk 'match($0,/p[0-9]+$/) {print "-part"substr($0,RSTART+1,RLENGTH-1)}')
fi
# Strip off partition information.
DM_NAME=`echo $DM_NAME | sed 's/p[0-9][0-9]*$//'`
DM_NAME=$(echo "$DM_NAME" | sed 's/p[0-9][0-9]*$//')
if [ -z "$DM_NAME" ] ; then
return
fi
# Get the raw scsi device name from multipath -ll. Strip off
# leading pipe symbols to make field numbering consistent.
DEV=`multipath -ll $DM_NAME |
awk '/running/{gsub("^[|]"," "); print $3 ; exit}'`
DEV=$(multipath -ll "$DM_NAME" |
awk '/running/{gsub("^[|]"," "); print $3 ; exit}')
if [ -z "$DEV" ] ; then
return
fi
fi
if echo $DEV | grep -q ^/devices/ ; then
if echo "$DEV" | grep -q ^/devices/ ; then
sys_path=$DEV
else
sys_path=`udevadm info -q path -p /sys/block/$DEV 2>/dev/null`
sys_path=$(udevadm info -q path -p "/sys/block/$DEV" 2>/dev/null)
fi
# expect sys_path like this, for example:
@@ -378,44 +546,47 @@ scsi_handler() {
# Get path up to /sys/.../hostX
i=1
while [ $i -le $num_dirs ] ; do
d=$(eval echo \${$i})
while [ $i -le "$num_dirs" ] ; do
d=$(eval echo '$'{$i})
scsi_host_dir="$scsi_host_dir/$d"
echo $d | grep -q -E '^host[0-9]+$' && break
i=$(($i + 1))
echo "$d" | grep -q -E '^host[0-9]+$' && break
i=$((i + 1))
done
if [ $i = $num_dirs ] ; then
if [ $i = "$num_dirs" ] ; then
return
fi
PCI_ID=$(eval echo \${$(($i -1))} | awk -F: '{print $2":"$3}')
PCI_ID=$(eval echo '$'{$((i -1))} | awk -F: '{print $2":"$3}')
# In scsi mode, the directory two levels beneath
# /sys/.../hostX reveals the port and slot.
port_dir=$scsi_host_dir
j=$(($i + 2))
j=$((i + 2))
i=$(($i + 1))
i=$((i + 1))
while [ $i -le $j ] ; do
port_dir="$port_dir/$(eval echo \${$i})"
i=$(($i + 1))
port_dir="$port_dir/$(eval echo '$'{$i})"
i=$((i + 1))
done
set -- $(echo $port_dir | sed -e 's/^.*:\([^:]*\):\([^:]*\)$/\1 \2/')
set -- $(echo "$port_dir" | sed -e 's/^.*:\([^:]*\):\([^:]*\)$/\1 \2/')
PORT=$1
SLOT=$(($2 + $FIRST_BAY_NUMBER))
SLOT=$(($2 + FIRST_BAY_NUMBER))
if [ -z "$SLOT" ] ; then
return
fi
CHAN=`map_channel $PCI_ID $PORT`
SLOT=`map_slot $SLOT $CHAN`
CHAN=$(map_channel "$PCI_ID" "$PORT")
SLOT=$(map_slot "$SLOT" "$CHAN")
if [ -z "$CHAN" ] ; then
return
fi
echo ${CHAN}${SLOT}${PART}
echo "${CHAN}${SLOT}${PART}"
}
# Figure out the name for the enclosure symlink
@@ -426,7 +597,7 @@ enclosure_handler () {
# Get the enclosure ID ("0:0:0:0")
ENC=$(basename $(readlink -m "/sys/$DEVPATH/../.."))
if [ ! -d /sys/class/enclosure/$ENC ] ; then
if [ ! -d "/sys/class/enclosure/$ENC" ] ; then
# Not an enclosure, bail out
return
fi
@@ -434,14 +605,14 @@ enclosure_handler () {
# Get the long sysfs device path to our enclosure. Looks like:
# /devices/pci0000:00/0000:00:03.0/0000:05:00.0/host0/port-0:0/ ... /enclosure/0:0:0:0
ENC_DEVICE=$(readlink /sys/class/enclosure/$ENC)
ENC_DEVICE=$(readlink "/sys/class/enclosure/$ENC")
# Grab the full path to the hosts port dir:
# /devices/pci0000:00/0000:00:03.0/0000:05:00.0/host0/port-0:0
PORT_DIR=$(echo $ENC_DEVICE | grep -Eo '.+host[0-9]+/port-[0-9]+:[0-9]+')
PORT_DIR=$(echo "$ENC_DEVICE" | grep -Eo '.+host[0-9]+/port-[0-9]+:[0-9]+')
# Get the port number
PORT_ID=$(echo $PORT_DIR | grep -Eo "[0-9]+$")
PORT_ID=$(echo "$PORT_DIR" | grep -Eo "[0-9]+$")
# The PCI directory is two directories up from the port directory
# /sys/devices/pci0000:00/0000:00:03.0/0000:05:00.0
@@ -452,7 +623,7 @@ enclosure_handler () {
# Name our device according to vdev_id.conf (like "L0" or "U1").
NAME=$(awk "/channel/{if (\$1 == \"channel\" && \$2 == \"$PCI_ID\" && \
\$3 == \"$PORT_ID\") {print \$4int(count[\$4])}; count[\$4]++}" $CONFIG)
\$3 == \"$PORT_ID\") {print \$4\$3}}" $CONFIG)
echo "${NAME}"
}
@@ -487,10 +658,12 @@ alias_handler () {
# digits as partitions, causing alias creation to fail. This
# ambiguity seems unavoidable, so devices using this facility
# must not use such names.
local DM_PART=
if echo $DM_NAME | grep -q -E 'p[0-9][0-9]*$' ; then
DM_PART=
if echo "$DM_NAME" | grep -q -E 'p[0-9][0-9]*$' ; then
if [ "$DEVTYPE" != "partition" ] ; then
DM_PART=`echo $DM_NAME | awk -Fp '/p/{print "-part"$2}'`
# Match p[number], remove the 'p' and prepend "-part"
DM_PART=$(echo "$DM_NAME" |
awk 'match($0,/p[0-9]+$/) {print "-part"substr($0,RSTART+1,RLENGTH-1)}')
fi
fi
@@ -498,21 +671,25 @@ alias_handler () {
for link in $DEVLINKS ; do
# Remove partition information to match key of top-level device.
if [ -n "$DM_PART" ] ; then
link=`echo $link | sed 's/p[0-9][0-9]*$//'`
link=$(echo "$link" | sed 's/p[0-9][0-9]*$//')
fi
# Check both the fully qualified and the base name of link.
for l in $link `basename $link` ; do
alias=`awk "\\$1 == \"alias\" && \\$3 == \"${l}\" \
{ print \\$2; exit }" $CONFIG`
if [ -n "$alias" ] ; then
echo ${alias}${DM_PART}
return
for l in $link $(basename "$link") ; do
if [ ! -z "$l" ]; then
alias=$(awk -v var="$l" '($1 == "alias") && \
($3 == var) \
{ print $2; exit }' $CONFIG)
if [ -n "$alias" ] ; then
echo "${alias}${DM_PART}"
return
fi
fi
done
done
}
while getopts 'c:d:eg:mp:h' OPTION; do
# main
while getopts 'c:d:eg:jmp:h' OPTION; do
case ${OPTION} in
c)
CONFIG=${OPTARG}
@@ -525,7 +702,9 @@ while getopts 'c:d:eg:mp:h' OPTION; do
# create the enclosure device symlinks only. We also need
# "enclosure_symlinks yes" set in vdev_id.config to actually create the
# symlink.
ENCLOSURE_MODE=$(awk '{if ($1 == "enclosure_symlinks") print $2}' $CONFIG)
ENCLOSURE_MODE=$(awk '{if ($1 == "enclosure_symlinks") \
print $2}' "$CONFIG")
if [ "$ENCLOSURE_MODE" != "yes" ] ; then
exit 0
fi
@@ -536,6 +715,9 @@ while getopts 'c:d:eg:mp:h' OPTION; do
p)
PHYS_PER_PORT=${OPTARG}
;;
j)
MULTIJBOD_MODE=yes
;;
m)
MULTIPATH_MODE=yes
;;
@@ -545,34 +727,35 @@ while getopts 'c:d:eg:mp:h' OPTION; do
esac
done
if [ ! -r $CONFIG ] ; then
exit 0
if [ ! -r "$CONFIG" ] ; then
echo "Error: Config file \"$CONFIG\" not found"
exit 1
fi
if [ -z "$DEV" -a -z "$ENCLOSURE_MODE" ] ; then
if [ -z "$DEV" ] && [ -z "$ENCLOSURE_MODE" ] ; then
echo "Error: missing required option -d"
exit 1
fi
if [ -z "$TOPOLOGY" ] ; then
TOPOLOGY=`awk "\\$1 == \"topology\" {print \\$2; exit}" $CONFIG`
TOPOLOGY=$(awk '($1 == "topology") {print $2; exit}' "$CONFIG")
fi
if [ -z "$BAY" ] ; then
BAY=`awk "\\$1 == \"slot\" {print \\$2; exit}" $CONFIG`
BAY=$(awk '($1 == "slot") {print $2; exit}' "$CONFIG")
fi
TOPOLOGY=${TOPOLOGY:-sas_direct}
# Should we create /dev/by-enclosure symlinks?
if [ "$ENCLOSURE_MODE" = "yes" -a "$TOPOLOGY" = "sas_direct" ] ; then
if [ "$ENCLOSURE_MODE" = "yes" ] && [ "$TOPOLOGY" = "sas_direct" ] ; then
ID_ENCLOSURE=$(enclosure_handler)
if [ -z "$ID_ENCLOSURE" ] ; then
exit 0
fi
# Just create the symlinks to the enclosure devices and then exit.
ENCLOSURE_PREFIX=$(awk '/enclosure_symlinks_prefix/{print $2}' $CONFIG)
ENCLOSURE_PREFIX=$(awk '/enclosure_symlinks_prefix/{print $2}' "$CONFIG")
if [ -z "$ENCLOSURE_PREFIX" ] ; then
ENCLOSURE_PREFIX="enc"
fi
@@ -582,16 +765,16 @@ if [ "$ENCLOSURE_MODE" = "yes" -a "$TOPOLOGY" = "sas_direct" ] ; then
fi
# First check if an alias was defined for this device.
ID_VDEV=`alias_handler`
ID_VDEV=$(alias_handler)
if [ -z "$ID_VDEV" ] ; then
BAY=${BAY:-bay}
case $TOPOLOGY in
sas_direct|sas_switch)
ID_VDEV=`sas_handler`
ID_VDEV=$(sas_handler)
;;
scsi)
ID_VDEV=`scsi_handler`
ID_VDEV=$(scsi_handler)
;;
*)
echo "Error: unknown topology $TOPOLOGY"
+6 -7
View File
@@ -1,11 +1,7 @@
include $(top_srcdir)/config/Rules.am
# Unconditionally enable debugging for zdb
AM_CPPFLAGS += -DDEBUG -UNDEBUG
DEFAULT_INCLUDES += \
-I$(top_srcdir)/include \
-I$(top_srcdir)/lib/libspl/include
AM_CPPFLAGS += -DDEBUG -UNDEBUG -DZFS_DEBUG
sbin_PROGRAMS = zdb
@@ -15,5 +11,8 @@ zdb_SOURCES = \
zdb.h
zdb_LDADD = \
$(top_builddir)/lib/libnvpair/libnvpair.la \
$(top_builddir)/lib/libzpool/libzpool.la
$(abs_top_builddir)/lib/libzpool/libzpool.la \
$(abs_top_builddir)/lib/libzfs_core/libzfs_core.la \
$(abs_top_builddir)/lib/libnvpair/libnvpair.la
include $(top_srcdir)/config/CppCheck.am
+2847 -334
View File
File diff suppressed because it is too large Load Diff
+21 -20
View File
@@ -62,9 +62,9 @@ print_log_bp(const blkptr_t *bp, const char *prefix)
/* ARGSUSED */
static void
zil_prt_rec_create(zilog_t *zilog, int txtype, void *arg)
zil_prt_rec_create(zilog_t *zilog, int txtype, const void *arg)
{
lr_create_t *lr = arg;
const lr_create_t *lr = arg;
time_t crtime = lr->lr_crtime[0];
char *name, *link;
lr_attr_t *lrattr;
@@ -98,9 +98,9 @@ zil_prt_rec_create(zilog_t *zilog, int txtype, void *arg)
/* ARGSUSED */
static void
zil_prt_rec_remove(zilog_t *zilog, int txtype, void *arg)
zil_prt_rec_remove(zilog_t *zilog, int txtype, const void *arg)
{
lr_remove_t *lr = arg;
const lr_remove_t *lr = arg;
(void) printf("%sdoid %llu, name %s\n", tab_prefix,
(u_longlong_t)lr->lr_doid, (char *)(lr + 1));
@@ -108,9 +108,9 @@ zil_prt_rec_remove(zilog_t *zilog, int txtype, void *arg)
/* ARGSUSED */
static void
zil_prt_rec_link(zilog_t *zilog, int txtype, void *arg)
zil_prt_rec_link(zilog_t *zilog, int txtype, const void *arg)
{
lr_link_t *lr = arg;
const lr_link_t *lr = arg;
(void) printf("%sdoid %llu, link_obj %llu, name %s\n", tab_prefix,
(u_longlong_t)lr->lr_doid, (u_longlong_t)lr->lr_link_obj,
@@ -119,9 +119,9 @@ zil_prt_rec_link(zilog_t *zilog, int txtype, void *arg)
/* ARGSUSED */
static void
zil_prt_rec_rename(zilog_t *zilog, int txtype, void *arg)
zil_prt_rec_rename(zilog_t *zilog, int txtype, const void *arg)
{
lr_rename_t *lr = arg;
const lr_rename_t *lr = arg;
char *snm = (char *)(lr + 1);
char *tnm = snm + strlen(snm) + 1;
@@ -148,11 +148,11 @@ zil_prt_rec_write_cb(void *data, size_t len, void *unused)
/* ARGSUSED */
static void
zil_prt_rec_write(zilog_t *zilog, int txtype, void *arg)
zil_prt_rec_write(zilog_t *zilog, int txtype, const void *arg)
{
lr_write_t *lr = arg;
const lr_write_t *lr = arg;
abd_t *data;
blkptr_t *bp = &lr->lr_blkptr;
const blkptr_t *bp = &lr->lr_blkptr;
zbookmark_phys_t zb;
int verbose = MAX(dump_opt['d'], dump_opt['i']);
int error;
@@ -211,9 +211,9 @@ out:
/* ARGSUSED */
static void
zil_prt_rec_truncate(zilog_t *zilog, int txtype, void *arg)
zil_prt_rec_truncate(zilog_t *zilog, int txtype, const void *arg)
{
lr_truncate_t *lr = arg;
const lr_truncate_t *lr = arg;
(void) printf("%sfoid %llu, offset 0x%llx, length 0x%llx\n", tab_prefix,
(u_longlong_t)lr->lr_foid, (longlong_t)lr->lr_offset,
@@ -222,9 +222,9 @@ zil_prt_rec_truncate(zilog_t *zilog, int txtype, void *arg)
/* ARGSUSED */
static void
zil_prt_rec_setattr(zilog_t *zilog, int txtype, void *arg)
zil_prt_rec_setattr(zilog_t *zilog, int txtype, const void *arg)
{
lr_setattr_t *lr = arg;
const lr_setattr_t *lr = arg;
time_t atime = (time_t)lr->lr_atime[0];
time_t mtime = (time_t)lr->lr_mtime[0];
@@ -268,15 +268,15 @@ zil_prt_rec_setattr(zilog_t *zilog, int txtype, void *arg)
/* ARGSUSED */
static void
zil_prt_rec_acl(zilog_t *zilog, int txtype, void *arg)
zil_prt_rec_acl(zilog_t *zilog, int txtype, const void *arg)
{
lr_acl_t *lr = arg;
const lr_acl_t *lr = arg;
(void) printf("%sfoid %llu, aclcnt %llu\n", tab_prefix,
(u_longlong_t)lr->lr_foid, (u_longlong_t)lr->lr_aclcnt);
}
typedef void (*zil_prt_rec_func_t)(zilog_t *, int, void *);
typedef void (*zil_prt_rec_func_t)(zilog_t *, int, const void *);
typedef struct zil_rec_info {
zil_prt_rec_func_t zri_print;
const char *zri_name;
@@ -309,7 +309,7 @@ static zil_rec_info_t zil_rec_info[TX_MAX_TYPE] = {
/* ARGSUSED */
static int
print_log_record(zilog_t *zilog, lr_t *lr, void *arg, uint64_t claim_txg)
print_log_record(zilog_t *zilog, const lr_t *lr, void *arg, uint64_t claim_txg)
{
int txtype;
int verbose = MAX(dump_opt['d'], dump_opt['i']);
@@ -343,7 +343,8 @@ print_log_record(zilog_t *zilog, lr_t *lr, void *arg, uint64_t claim_txg)
/* ARGSUSED */
static int
print_log_block(zilog_t *zilog, blkptr_t *bp, void *arg, uint64_t claim_txg)
print_log_block(zilog_t *zilog, const blkptr_t *bp, void *arg,
uint64_t claim_txg)
{
char blkbuf[BP_SPRINTF_LEN + 10];
int verbose = MAX(dump_opt['d'], dump_opt['i']);
+11 -59
View File
@@ -1,11 +1,10 @@
include $(top_srcdir)/config/Rules.am
include $(top_srcdir)/config/Shellcheck.am
DEFAULT_INCLUDES += \
-I$(top_srcdir)/include \
-I$(top_srcdir)/lib/libspl/include
AM_CFLAGS += $(LIBUDEV_CFLAGS) $(LIBUUID_CFLAGS)
EXTRA_DIST = zed.d/README \
zed.d/history_event-zfs-list-cacher.sh.in
SUBDIRS = zed.d
SHELLCHECKDIRS = $(SUBDIRS)
sbin_PROGRAMS = zed
@@ -41,61 +40,14 @@ FMA_SRC = \
zed_SOURCES = $(ZED_SRC) $(FMA_SRC)
zed_LDADD = \
$(top_builddir)/lib/libnvpair/libnvpair.la \
$(top_builddir)/lib/libuutil/libuutil.la \
$(top_builddir)/lib/libzfs/libzfs.la
$(abs_top_builddir)/lib/libzfs/libzfs.la \
$(abs_top_builddir)/lib/libzfs_core/libzfs_core.la \
$(abs_top_builddir)/lib/libnvpair/libnvpair.la \
$(abs_top_builddir)/lib/libuutil/libuutil.la
zed_LDADD += -lrt
zed_LDADD += -lrt $(LIBATOMIC_LIBS) $(LIBUDEV_LIBS) $(LIBUUID_LIBS)
zed_LDFLAGS = -pthread
zedconfdir = $(sysconfdir)/zfs/zed.d
EXTRA_DIST = agents/README.md
dist_zedconf_DATA = \
zed.d/zed-functions.sh \
zed.d/zed.rc
zedexecdir = $(zfsexecdir)/zed.d
dist_zedexec_SCRIPTS = \
zed.d/all-debug.sh \
zed.d/all-syslog.sh \
zed.d/data-notify.sh \
zed.d/generic-notify.sh \
zed.d/resilver_finish-notify.sh \
zed.d/scrub_finish-notify.sh \
zed.d/statechange-led.sh \
zed.d/statechange-notify.sh \
zed.d/vdev_clear-led.sh \
zed.d/vdev_attach-led.sh \
zed.d/pool_import-led.sh \
zed.d/resilver_finish-start-scrub.sh
nodist_zedexec_SCRIPTS = zed.d/history_event-zfs-list-cacher.sh
$(nodist_zedexec_SCRIPTS): %: %.in
-$(SED) -e 's,@bindir\@,$(bindir),g' \
-e 's,@runstatedir\@,$(runstatedir),g' \
-e 's,@sbindir\@,$(sbindir),g' \
-e 's,@sysconfdir\@,$(sysconfdir),g' \
$< >'$@'
zedconfdefaults = \
all-syslog.sh \
data-notify.sh \
resilver_finish-notify.sh \
scrub_finish-notify.sh \
statechange-led.sh \
statechange-notify.sh \
vdev_clear-led.sh \
vdev_attach-led.sh \
pool_import-led.sh \
resilver_finish-start-scrub.sh
install-data-hook:
$(MKDIR_P) "$(DESTDIR)$(zedconfdir)"
for f in $(zedconfdefaults); do \
test -f "$(DESTDIR)$(zedconfdir)/$${f}" -o \
-L "$(DESTDIR)$(zedconfdir)/$${f}" || \
ln -s "$(zedexecdir)/$${f}" "$(DESTDIR)$(zedconfdir)"; \
done
chmod 0600 "$(DESTDIR)$(zedconfdir)/zed.rc"
include $(top_srcdir)/config/CppCheck.am
+1 -1
View File
@@ -25,7 +25,7 @@
*/
/*
* This file imlements the minimal FMD module API required to support the
* This file implements the minimal FMD module API required to support the
* fault logic modules in ZED. This support includes module registration,
* memory allocation, module property accessors, basic case management,
* one-shot timers and SERD engines.
+1 -1
View File
@@ -281,7 +281,7 @@ fmd_serd_eng_empty(fmd_serd_eng_t *sgp)
void
fmd_serd_eng_reset(fmd_serd_eng_t *sgp)
{
serd_log_msg(" SERD Engine: reseting %s", sgp->sg_name);
serd_log_msg(" SERD Engine: resetting %s", sgp->sg_name);
while (sgp->sg_count != 0)
fmd_serd_eng_discard(sgp, list_head(&sgp->sg_list));
+21 -10
View File
@@ -13,6 +13,7 @@
/*
* Copyright (c) 2016, Intel Corporation.
* Copyright (c) 2018, loli10K <ezomori.nozomu@gmail.com>
* Copyright (c) 2021 Hewlett Packard Enterprise Development LP
*/
#include <libnvpair.h>
@@ -116,7 +117,8 @@ zfs_agent_iter_vdev(zpool_handle_t *zhp, nvlist_t *nvl, void *arg)
/*
* On a devid match, grab the vdev guid and expansion time, if any.
*/
if ((nvlist_lookup_string(nvl, ZPOOL_CONFIG_DEVID, &path) == 0) &&
if (gsp->gs_devid != NULL &&
(nvlist_lookup_string(nvl, ZPOOL_CONFIG_DEVID, &path) == 0) &&
(strcmp(gsp->gs_devid, path) == 0)) {
(void) nvlist_lookup_uint64(nvl, ZPOOL_CONFIG_GUID,
&gsp->gs_vdev_guid);
@@ -176,10 +178,12 @@ zfs_agent_post_event(const char *class, const char *subclass, nvlist_t *nvl)
}
/*
* On ZFS on Linux, we don't get the expected FM_RESOURCE_REMOVED
* ereport from vdev_disk layer after a hot unplug. Fortunately we
* get a EC_DEV_REMOVE from our disk monitor and it is a suitable
* On Linux, we don't get the expected FM_RESOURCE_REMOVED ereport
* from the vdev_disk layer after a hot unplug. Fortunately we do
* get an EC_DEV_REMOVE from our disk monitor and it is a suitable
* proxy so we remap it here for the benefit of the diagnosis engine.
* Starting in OpenZFS 2.0, we do get FM_RESOURCE_REMOVED from the spa
* layer. Processing multiple FM_RESOURCE_REMOVED events is not harmful.
*/
if ((strcmp(class, EC_DEV_REMOVE) == 0) &&
(strcmp(subclass, ESC_DISK) == 0) &&
@@ -208,12 +212,18 @@ zfs_agent_post_event(const char *class, const char *subclass, nvlist_t *nvl)
* For multipath, spare and l2arc devices ZFS_EV_VDEV_GUID or
* ZFS_EV_POOL_GUID may be missing so find them.
*/
(void) nvlist_lookup_string(nvl, DEV_IDENTIFIER,
&search.gs_devid);
(void) zpool_iter(g_zfs_hdl, zfs_agent_iter_pool, &search);
pool_guid = search.gs_pool_guid;
vdev_guid = search.gs_vdev_guid;
devtype = search.gs_vdev_type;
if (pool_guid == 0 || vdev_guid == 0) {
if ((nvlist_lookup_string(nvl, DEV_IDENTIFIER,
&search.gs_devid) == 0) &&
(zpool_iter(g_zfs_hdl, zfs_agent_iter_pool, &search)
== 1)) {
if (pool_guid == 0)
pool_guid = search.gs_pool_guid;
if (vdev_guid == 0)
vdev_guid = search.gs_vdev_guid;
devtype = search.gs_vdev_type;
}
}
/*
* We want to avoid reporting "remove" events coming from
@@ -382,6 +392,7 @@ zfs_agent_init(libzfs_handle_t *zfs_hdl)
list_destroy(&agent_events);
zed_log_die("Failed to initialize agents");
}
pthread_setname_np(g_agents_tid, "agents");
}
void
+54 -20
View File
@@ -63,13 +63,10 @@
* If the device could not be replaced, then the second online attempt will
* trigger the FMA fault that we skipped earlier.
*
* ZFS on Linux porting notes:
* Linux udev provides a disk insert for both the disk and the partition
*
* On Linux udev provides a disk insert for both the disk and the partition.
*/
#include <ctype.h>
#include <devid.h>
#include <fcntl.h>
#include <libnvpair.h>
#include <libzfs.h>
@@ -157,7 +154,7 @@ zfs_unavail_pool(zpool_handle_t *zhp, void *data)
* 1. physical match with no fs, no partition
* tag it top, partition disk
*
* 2. physical match again, see partion and tag
* 2. physical match again, see partition and tag
*
*/
@@ -192,8 +189,8 @@ zfs_process_add(zpool_handle_t *zhp, nvlist_t *vdev, boolean_t labeled)
char rawpath[PATH_MAX], fullpath[PATH_MAX];
char devpath[PATH_MAX];
int ret;
int is_dm = 0;
int is_sd = 0;
boolean_t is_dm = B_FALSE;
boolean_t is_sd = B_FALSE;
uint_t c;
vdev_stat_t *vs;
@@ -221,8 +218,8 @@ zfs_process_add(zpool_handle_t *zhp, nvlist_t *vdev, boolean_t labeled)
is_dm = zfs_dev_is_dm(path);
zed_log_msg(LOG_INFO, "zfs_process_add: pool '%s' vdev '%s', phys '%s'"
" wholedisk %d, dm %d (%llu)", zpool_get_name(zhp), path,
physpath ? physpath : "NULL", wholedisk, is_dm,
" wholedisk %d, %s dm (guid %llu)", zpool_get_name(zhp), path,
physpath ? physpath : "NULL", wholedisk, is_dm ? "is" : "not",
(long long unsigned int)guid);
/*
@@ -267,7 +264,7 @@ zfs_process_add(zpool_handle_t *zhp, nvlist_t *vdev, boolean_t labeled)
* testing)
*/
if (physpath != NULL && strcmp("scsidebug", physpath) == 0)
is_sd = 1;
is_sd = B_TRUE;
/*
* If the pool doesn't have the autoreplace property set, then use
@@ -438,7 +435,15 @@ zfs_process_add(zpool_handle_t *zhp, nvlist_t *vdev, boolean_t labeled)
return;
}
ret = zpool_vdev_attach(zhp, fullpath, path, nvroot, B_TRUE);
/*
* Prefer sequential resilvering when supported (mirrors and dRAID),
* otherwise fallback to a traditional healing resilver.
*/
ret = zpool_vdev_attach(zhp, fullpath, path, nvroot, B_TRUE, B_TRUE);
if (ret != 0) {
ret = zpool_vdev_attach(zhp, fullpath, path, nvroot,
B_TRUE, B_FALSE);
}
zed_log_msg(LOG_INFO, " zpool_vdev_replace: %s with %s (%s)",
fullpath, path, (ret == 0) ? "no errors" :
@@ -534,7 +539,7 @@ zfs_iter_vdev(zpool_handle_t *zhp, nvlist_t *nvl, void *data)
(dp->dd_func)(zhp, nvl, dp->dd_islabeled);
}
void
static void
zfs_enable_ds(void *arg)
{
unavailpool_t *pool = (unavailpool_t *)arg;
@@ -635,6 +640,27 @@ devid_iter(const char *devid, zfs_process_func_t func, boolean_t is_slice)
return (data.dd_found);
}
/*
* Given a device guid, find any vdevs with a matching guid.
*/
static boolean_t
guid_iter(uint64_t pool_guid, uint64_t vdev_guid, const char *devid,
zfs_process_func_t func, boolean_t is_slice)
{
dev_data_t data = { 0 };
data.dd_func = func;
data.dd_found = B_FALSE;
data.dd_pool_guid = pool_guid;
data.dd_vdev_guid = vdev_guid;
data.dd_islabeled = is_slice;
data.dd_new_devid = devid;
(void) zpool_iter(g_zfshdl, zfs_iter_pool, &data);
return (data.dd_found);
}
/*
* Handle a EC_DEV_ADD.ESC_DISK event.
*
@@ -658,15 +684,18 @@ static int
zfs_deliver_add(nvlist_t *nvl, boolean_t is_lofi)
{
char *devpath = NULL, *devid;
uint64_t pool_guid = 0, vdev_guid = 0;
boolean_t is_slice;
/*
* Expecting a devid string and an optional physical location
* Expecting a devid string and an optional physical location and guid
*/
if (nvlist_lookup_string(nvl, DEV_IDENTIFIER, &devid) != 0)
return (-1);
(void) nvlist_lookup_string(nvl, DEV_PHYS_PATH, &devpath);
(void) nvlist_lookup_uint64(nvl, ZFS_EV_POOL_GUID, &pool_guid);
(void) nvlist_lookup_uint64(nvl, ZFS_EV_VDEV_GUID, &vdev_guid);
is_slice = (nvlist_lookup_boolean(nvl, DEV_IS_PART) == 0);
@@ -674,15 +703,19 @@ zfs_deliver_add(nvlist_t *nvl, boolean_t is_lofi)
devid, devpath ? devpath : "NULL", is_slice);
/*
* Iterate over all vdevs looking for a match in the folllowing order:
* Iterate over all vdevs looking for a match in the following order:
* 1. ZPOOL_CONFIG_DEVID (identifies the unique disk)
* 2. ZPOOL_CONFIG_PHYS_PATH (identifies disk physical location).
*
* For disks, we only want to pay attention to vdevs marked as whole
* disks or are a multipath device.
* 3. ZPOOL_CONFIG_GUID (identifies unique vdev).
*/
if (!devid_iter(devid, zfs_process_add, is_slice) && devpath != NULL)
(void) devphys_iter(devpath, devid, zfs_process_add, is_slice);
if (devid_iter(devid, zfs_process_add, is_slice))
return (0);
if (devpath != NULL && devphys_iter(devpath, devid, zfs_process_add,
is_slice))
return (0);
if (vdev_guid != 0)
(void) guid_iter(pool_guid, vdev_guid, devid, zfs_process_add,
is_slice);
return (0);
}
@@ -892,7 +925,7 @@ zfs_enum_pools(void *arg)
*
* sent messages from zevents or udev monitor
*
* For now, each agent has it's own libzfs instance
* For now, each agent has its own libzfs instance
*/
int
zfs_slm_init()
@@ -913,6 +946,7 @@ zfs_slm_init()
return (-1);
}
pthread_setname_np(g_zfs_tid, "enum-pools");
list_create(&g_device_list, sizeof (struct pendingdev),
offsetof(struct pendingdev, pd_node));
+18 -9
View File
@@ -219,12 +219,18 @@ replace_with_spare(fmd_hdl_t *hdl, zpool_handle_t *zhp, nvlist_t *vdev)
* replace it.
*/
for (s = 0; s < nspares; s++) {
char *spare_name;
boolean_t rebuild = B_FALSE;
char *spare_name, *type;
if (nvlist_lookup_string(spares[s], ZPOOL_CONFIG_PATH,
&spare_name) != 0)
continue;
/* prefer sequential resilvering for distributed spares */
if ((nvlist_lookup_string(spares[s], ZPOOL_CONFIG_TYPE,
&type) == 0) && strcmp(type, VDEV_TYPE_DRAID_SPARE) == 0)
rebuild = B_TRUE;
/* if set, add the "ashift" pool property to the spare nvlist */
if (source != ZPROP_SRC_DEFAULT)
(void) nvlist_add_uint64(spares[s],
@@ -237,7 +243,7 @@ replace_with_spare(fmd_hdl_t *hdl, zpool_handle_t *zhp, nvlist_t *vdev)
dev_name, basename(spare_name));
if (zpool_vdev_attach(zhp, dev_name, spare_name,
replacement, B_TRUE) == 0) {
replacement, B_TRUE, rebuild) == 0) {
free(dev_name);
nvlist_free(replacement);
return (B_TRUE);
@@ -319,12 +325,16 @@ zfs_retire_recv(fmd_hdl_t *hdl, fmd_event_t *ep, nvlist_t *nvl,
fmd_hdl_debug(hdl, "zfs_retire_recv: '%s'", class);
nvlist_lookup_uint64(nvl, FM_EREPORT_PAYLOAD_ZFS_VDEV_STATE, &state);
/*
* If this is a resource notifying us of device removal then simply
* check for an available spare and continue unless the device is a
* l2arc vdev, in which case we just offline it.
*/
if (strcmp(class, "resource.fs.zfs.removed") == 0) {
if (strcmp(class, "resource.fs.zfs.removed") == 0 ||
(strcmp(class, "resource.fs.zfs.statechange") == 0 &&
(state == VDEV_STATE_REMOVED || state == VDEV_STATE_FAULTED))) {
char *devtype;
char *devname;
@@ -347,9 +357,8 @@ zfs_retire_recv(fmd_hdl_t *hdl, fmd_event_t *ep, nvlist_t *nvl,
zpool_vdev_offline(zhp, devname, B_TRUE);
} else if (!fmd_prop_get_int32(hdl, "spare_on_remove") ||
replace_with_spare(hdl, zhp, vdev) == B_FALSE) {
/* Could not handle with spare: offline the device */
fmd_hdl_debug(hdl, "zpool_vdev_offline '%s'", devname);
zpool_vdev_offline(zhp, devname, B_TRUE);
/* Could not handle with spare */
fmd_hdl_debug(hdl, "no spare for '%s'", devname);
}
free(devname);
@@ -361,12 +370,11 @@ zfs_retire_recv(fmd_hdl_t *hdl, fmd_event_t *ep, nvlist_t *nvl,
return;
/*
* Note: on zfsonlinux statechange events are more than just
* Note: on Linux statechange events are more than just
* healthy ones so we need to confirm the actual state value.
*/
if (strcmp(class, "resource.fs.zfs.statechange") == 0 &&
nvlist_lookup_uint64(nvl, FM_EREPORT_PAYLOAD_ZFS_VDEV_STATE,
&state) == 0 && state == VDEV_STATE_HEALTHY) {
state == VDEV_STATE_HEALTHY) {
zfs_vdev_repair(hdl, nvl);
return;
}
@@ -497,6 +505,7 @@ zfs_retire_recv(fmd_hdl_t *hdl, fmd_event_t *ep, nvlist_t *nvl,
* Attempt to substitute a hot spare.
*/
(void) replace_with_spare(hdl, zhp, vdev);
zpool_close(zhp);
}
+50 -23
View File
@@ -1,9 +1,9 @@
/*
* This file is part of the ZFS Event Daemon (ZED)
* for ZFS on Linux (ZoL) <http://zfsonlinux.org/>.
* This file is part of the ZFS Event Daemon (ZED).
*
* Developed at Lawrence Livermore National Laboratory (LLNL-CODE-403049).
* Copyright (C) 2013-2014 Lawrence Livermore National Security, LLC.
* Refer to the ZoL git commit log for authoritative copyright attribution.
* Refer to the OpenZFS git commit log for authoritative copyright attribution.
*
* The contents of this file are subject to the terms of the
* Common Development and Distribution License Version 1.0 (CDDL-1.0).
@@ -60,8 +60,8 @@ _setup_sig_handlers(void)
zed_log_die("Failed to initialize sigset");
sa.sa_flags = SA_RESTART;
sa.sa_handler = SIG_IGN;
sa.sa_handler = SIG_IGN;
if (sigaction(SIGPIPE, &sa, NULL) < 0)
zed_log_die("Failed to ignore SIGPIPE");
@@ -75,6 +75,10 @@ _setup_sig_handlers(void)
sa.sa_handler = _hup_handler;
if (sigaction(SIGHUP, &sa, NULL) < 0)
zed_log_die("Failed to register SIGHUP handler");
(void) sigaddset(&sa.sa_mask, SIGCHLD);
if (pthread_sigmask(SIG_BLOCK, &sa.sa_mask, NULL) < 0)
zed_log_die("Failed to block SIGCHLD");
}
/*
@@ -212,22 +216,20 @@ _finish_daemonize(void)
int
main(int argc, char *argv[])
{
struct zed_conf *zcp;
struct zed_conf zcp;
uint64_t saved_eid;
int64_t saved_etime[2];
zed_log_init(argv[0]);
zed_log_stderr_open(LOG_NOTICE);
zcp = zed_conf_create();
zed_conf_parse_opts(zcp, argc, argv);
if (zcp->do_verbose)
zed_conf_init(&zcp);
zed_conf_parse_opts(&zcp, argc, argv);
if (zcp.do_verbose)
zed_log_stderr_open(LOG_INFO);
if (geteuid() != 0)
zed_log_die("Must be run as root");
zed_conf_parse_file(zcp);
zed_file_close_from(STDERR_FILENO + 1);
(void) umask(0);
@@ -235,47 +237,72 @@ main(int argc, char *argv[])
if (chdir("/") < 0)
zed_log_die("Failed to change to root directory");
if (zed_conf_scan_dir(zcp) < 0)
if (zed_conf_scan_dir(&zcp) < 0)
exit(EXIT_FAILURE);
if (!zcp->do_foreground) {
if (!zcp.do_foreground) {
_start_daemonize();
zed_log_syslog_open(LOG_DAEMON);
}
_setup_sig_handlers();
if (zcp->do_memlock)
if (zcp.do_memlock)
_lock_memory();
if ((zed_conf_write_pid(zcp) < 0) && (!zcp->do_force))
if ((zed_conf_write_pid(&zcp) < 0) && (!zcp.do_force))
exit(EXIT_FAILURE);
if (!zcp->do_foreground)
if (!zcp.do_foreground)
_finish_daemonize();
zed_log_msg(LOG_NOTICE,
"ZFS Event Daemon %s-%s (PID %d)",
ZFS_META_VERSION, ZFS_META_RELEASE, (int)getpid());
if (zed_conf_open_state(zcp) < 0)
if (zed_conf_open_state(&zcp) < 0)
exit(EXIT_FAILURE);
if (zed_conf_read_state(zcp, &saved_eid, saved_etime) < 0)
if (zed_conf_read_state(&zcp, &saved_eid, saved_etime) < 0)
exit(EXIT_FAILURE);
zed_event_init(zcp);
zed_event_seek(zcp, saved_eid, saved_etime);
idle:
/*
* If -I is specified, attempt to open /dev/zfs repeatedly until
* successful.
*/
do {
if (!zed_event_init(&zcp))
break;
/* Wait for some time and try again. tunable? */
sleep(30);
} while (!_got_exit && zcp.do_idle);
if (_got_exit)
goto out;
zed_event_seek(&zcp, saved_eid, saved_etime);
while (!_got_exit) {
int rv;
if (_got_hup) {
_got_hup = 0;
(void) zed_conf_scan_dir(zcp);
(void) zed_conf_scan_dir(&zcp);
}
zed_event_service(zcp);
rv = zed_event_service(&zcp);
/* ENODEV: When kernel module is unloaded (osx) */
if (rv == ENODEV)
break;
}
zed_log_msg(LOG_NOTICE, "Exiting");
zed_event_fini(zcp);
zed_conf_destroy(zcp);
zed_event_fini(&zcp);
if (zcp.do_idle && !_got_exit)
goto idle;
out:
zed_conf_destroy(&zcp);
zed_log_fini();
exit(EXIT_SUCCESS);
}
+57
View File
@@ -0,0 +1,57 @@
include $(top_srcdir)/config/Rules.am
include $(top_srcdir)/config/Substfiles.am
include $(top_srcdir)/config/Shellcheck.am
EXTRA_DIST += README
zedconfdir = $(sysconfdir)/zfs/zed.d
dist_zedconf_DATA = \
zed-functions.sh \
zed.rc
zedexecdir = $(zfsexecdir)/zed.d
dist_zedexec_SCRIPTS = \
all-debug.sh \
all-syslog.sh \
data-notify.sh \
generic-notify.sh \
resilver_finish-notify.sh \
scrub_finish-notify.sh \
statechange-led.sh \
statechange-notify.sh \
vdev_clear-led.sh \
vdev_attach-led.sh \
pool_import-led.sh \
resilver_finish-start-scrub.sh \
trim_finish-notify.sh
nodist_zedexec_SCRIPTS = history_event-zfs-list-cacher.sh
SUBSTFILES += $(nodist_zedexec_SCRIPTS)
zedconfdefaults = \
all-syslog.sh \
data-notify.sh \
history_event-zfs-list-cacher.sh \
resilver_finish-notify.sh \
scrub_finish-notify.sh \
statechange-led.sh \
statechange-notify.sh \
vdev_clear-led.sh \
vdev_attach-led.sh \
pool_import-led.sh \
resilver_finish-start-scrub.sh
install-data-hook:
$(MKDIR_P) "$(DESTDIR)$(zedconfdir)"
for f in $(zedconfdefaults); do \
test -f "$(DESTDIR)$(zedconfdir)/$${f}" -o \
-L "$(DESTDIR)$(zedconfdir)/$${f}" || \
ln -s "$(zedexecdir)/$${f}" "$(DESTDIR)$(zedconfdir)"; \
done
chmod 0600 "$(DESTDIR)$(zedconfdir)/zed.rc"
# False positive: 1>&"${ZED_FLOCK_FD}" looks suspiciously similar to a >&filename bash extension
CHECKBASHISMS_IGNORE = -e 'should be >word 2>&1' -e '&"$${ZED_FLOCK_FD}"'
+6 -10
View File
@@ -12,15 +12,11 @@
zed_exit_if_ignoring_this_event
lockfile="$(basename -- "${ZED_DEBUG_LOG}").lock"
zed_lock "${ZED_DEBUG_LOG}"
{
printenv | sort
echo
} 1>&"${ZED_FLOCK_FD}"
zed_unlock "${ZED_DEBUG_LOG}"
umask 077
zed_lock "${lockfile}"
exec >> "${ZED_DEBUG_LOG}"
printenv | sort
echo
exec >&-
zed_unlock "${lockfile}"
exit 0
+41 -4
View File
@@ -1,14 +1,51 @@
#!/bin/sh
#
# Copyright (C) 2013-2014 Lawrence Livermore National Security, LLC.
# Copyright (c) 2020 by Delphix. All rights reserved.
#
#
# Log the zevent via syslog.
#
[ -f "${ZED_ZEDLET_DIR}/zed.rc" ] && . "${ZED_ZEDLET_DIR}/zed.rc"
. "${ZED_ZEDLET_DIR}/zed-functions.sh"
zed_exit_if_ignoring_this_event
zed_log_msg "eid=${ZEVENT_EID}" "class=${ZEVENT_SUBCLASS}" \
"${ZEVENT_POOL_GUID:+"pool_guid=${ZEVENT_POOL_GUID}"}" \
"${ZEVENT_VDEV_PATH:+"vdev_path=${ZEVENT_VDEV_PATH}"}" \
"${ZEVENT_VDEV_STATE_STR:+"vdev_state=${ZEVENT_VDEV_STATE_STR}"}"
# build a string of name=value pairs for this event
msg="eid=${ZEVENT_EID} class=${ZEVENT_SUBCLASS}"
if [ "${ZED_SYSLOG_DISPLAY_GUIDS}" = "1" ]; then
[ -n "${ZEVENT_POOL_GUID}" ] && msg="${msg} pool_guid=${ZEVENT_POOL_GUID}"
[ -n "${ZEVENT_VDEV_GUID}" ] && msg="${msg} vdev_guid=${ZEVENT_VDEV_GUID}"
else
[ -n "${ZEVENT_POOL}" ] && msg="${msg} pool='${ZEVENT_POOL}'"
[ -n "${ZEVENT_VDEV_PATH}" ] && msg="${msg} vdev=$(basename "${ZEVENT_VDEV_PATH}")"
fi
# log pool state if state is anything other than 'ACTIVE'
[ -n "${ZEVENT_POOL_STATE_STR}" ] && [ "$ZEVENT_POOL_STATE" -ne 0 ] && \
msg="${msg} pool_state=${ZEVENT_POOL_STATE_STR}"
# Log the following payload nvpairs if they are present
[ -n "${ZEVENT_VDEV_STATE_STR}" ] && msg="${msg} vdev_state=${ZEVENT_VDEV_STATE_STR}"
[ -n "${ZEVENT_CKSUM_ALGORITHM}" ] && msg="${msg} algorithm=${ZEVENT_CKSUM_ALGORITHM}"
[ -n "${ZEVENT_ZIO_SIZE}" ] && msg="${msg} size=${ZEVENT_ZIO_SIZE}"
[ -n "${ZEVENT_ZIO_OFFSET}" ] && msg="${msg} offset=${ZEVENT_ZIO_OFFSET}"
[ -n "${ZEVENT_ZIO_PRIORITY}" ] && msg="${msg} priority=${ZEVENT_ZIO_PRIORITY}"
[ -n "${ZEVENT_ZIO_ERR}" ] && msg="${msg} err=${ZEVENT_ZIO_ERR}"
[ -n "${ZEVENT_ZIO_FLAGS}" ] && msg="${msg} flags=$(printf '0x%x' "${ZEVENT_ZIO_FLAGS}")"
# log delays that are >= 10 milisec
[ -n "${ZEVENT_ZIO_DELAY}" ] && [ "$ZEVENT_ZIO_DELAY" -gt 10000000 ] && \
msg="${msg} delay=$((ZEVENT_ZIO_DELAY / 1000000))ms"
# list the bookmark data together
# shellcheck disable=SC2153
[ -n "${ZEVENT_ZIO_OBJSET}" ] && \
msg="${msg} bookmark=${ZEVENT_ZIO_OBJSET}:${ZEVENT_ZIO_OBJECT}:${ZEVENT_ZIO_LEVEL}:${ZEVENT_ZIO_BLKID}"
zed_log_msg "${msg}"
exit 0
+1 -1
View File
@@ -25,7 +25,7 @@ zed_rate_limit "${rate_limit_tag}" || exit 3
umask 077
note_subject="ZFS ${ZEVENT_SUBCLASS} error for ${ZEVENT_POOL} on $(hostname)"
note_pathname="${TMPDIR:="/tmp"}/$(basename -- "$0").${ZEVENT_EID}.$$"
note_pathname="$(mktemp)"
{
echo "ZFS has detected a data error:"
echo
+1 -1
View File
@@ -31,7 +31,7 @@ umask 077
pool_str="${ZEVENT_POOL:+" for ${ZEVENT_POOL}"}"
host_str=" on $(hostname)"
note_subject="ZFS ${ZEVENT_SUBCLASS} event${pool_str}${host_str}"
note_pathname="${TMPDIR:="/tmp"}/$(basename -- "$0").${ZEVENT_EID}.$$"
note_pathname="$(mktemp)"
{
echo "ZFS has posted the following event:"
echo
@@ -3,9 +3,8 @@
# Track changes to enumerated pools for use in early-boot
set -ef
FSLIST_DIR="@sysconfdir@/zfs/zfs-list.cache"
FSLIST_TMP="@runstatedir@/zfs-list.cache.new"
FSLIST="${FSLIST_DIR}/${ZEVENT_POOL}"
FSLIST="@sysconfdir@/zfs/zfs-list.cache/${ZEVENT_POOL}"
FSLIST_TMP="@runstatedir@/zfs-list.cache@${ZEVENT_POOL}"
# If the pool specific cache file is not writeable, abort
[ -w "${FSLIST}" ] || exit 0
@@ -13,21 +12,21 @@ FSLIST="${FSLIST_DIR}/${ZEVENT_POOL}"
[ -f "${ZED_ZEDLET_DIR}/zed.rc" ] && . "${ZED_ZEDLET_DIR}/zed.rc"
. "${ZED_ZEDLET_DIR}/zed-functions.sh"
zed_exit_if_ignoring_this_event
zed_check_cmd "${ZFS}" sort diff grep
[ "$ZEVENT_SUBCLASS" != "history_event" ] && exit 0
zed_check_cmd "${ZFS}" sort diff
# If we are acting on a snapshot, we have nothing to do
printf '%s' "${ZEVENT_HISTORY_DSNAME}" | grep '@' && exit 0
[ "${ZEVENT_HISTORY_DSNAME%@*}" = "${ZEVENT_HISTORY_DSNAME}" ] || exit 0
# We obtain a lock on zfs-list to avoid any simultaneous writes.
# We lock the output file to avoid simultaneous writes.
# If we run into trouble, log and drop the lock
abort_alter() {
zed_log_msg "Error updating zfs-list.cache!"
zed_unlock zfs-list
zed_log_msg "Error updating zfs-list.cache for ${ZEVENT_POOL}!"
zed_unlock "${FSLIST}"
}
finished() {
zed_unlock zfs-list
zed_unlock "${FSLIST}"
trap - EXIT
exit 0
}
@@ -37,7 +36,7 @@ case "${ZEVENT_HISTORY_INTERNAL_NAME}" in
;;
export)
zed_lock zfs-list
zed_lock "${FSLIST}"
trap abort_alter EXIT
echo > "${FSLIST}"
finished
@@ -46,8 +45,13 @@ case "${ZEVENT_HISTORY_INTERNAL_NAME}" in
set|inherit)
# Only act if one of the tracked properties is altered.
case "${ZEVENT_HISTORY_INTERNAL_STR%%=*}" in
canmount|mountpoint|atime|relatime|devices|exec| \
readonly|setuid|nbmand) ;;
canmount|mountpoint|atime|relatime|devices|exec|readonly| \
setuid|nbmand|encroot|keylocation|org.openzfs.systemd:requires| \
org.openzfs.systemd:requires-mounts-for| \
org.openzfs.systemd:before|org.openzfs.systemd:after| \
org.openzfs.systemd:wanted-by|org.openzfs.systemd:required-by| \
org.openzfs.systemd:nofail|org.openzfs.systemd:ignore \
) ;;
*) exit 0 ;;
esac
;;
@@ -58,11 +62,15 @@ case "${ZEVENT_HISTORY_INTERNAL_NAME}" in
;;
esac
zed_lock zfs-list
zed_lock "${FSLIST}"
trap abort_alter EXIT
PROPS="name,mountpoint,canmount,atime,relatime,devices,exec,readonly"
PROPS="${PROPS},setuid,nbmand"
PROPS="name,mountpoint,canmount,atime,relatime,devices,exec\
,readonly,setuid,nbmand,encroot,keylocation\
,org.openzfs.systemd:requires,org.openzfs.systemd:requires-mounts-for\
,org.openzfs.systemd:before,org.openzfs.systemd:after\
,org.openzfs.systemd:wanted-by,org.openzfs.systemd:required-by\
,org.openzfs.systemd:nofail,org.openzfs.systemd:ignore"
"${ZFS}" list -H -t filesystem -o $PROPS -r "${ZEVENT_POOL}" > "${FSLIST_TMP}"
@@ -70,7 +78,7 @@ PROPS="${PROPS},setuid,nbmand"
sort "${FSLIST_TMP}" -o "${FSLIST_TMP}"
# Don't modify the file if it hasn't changed
diff -q "${FSLIST_TMP}" "${FSLIST}" || mv "${FSLIST_TMP}" "${FSLIST}"
diff -q "${FSLIST_TMP}" "${FSLIST}" || cat "${FSLIST_TMP}" > "${FSLIST}"
rm -f "${FSLIST_TMP}"
finished
@@ -5,10 +5,12 @@
# Exit codes:
# 1: Internal error
# 2: Script wasn't enabled in zed.rc
# 3: Scrubs are automatically started for sequential resilvers
[ -f "${ZED_ZEDLET_DIR}/zed.rc" ] && . "${ZED_ZEDLET_DIR}/zed.rc"
. "${ZED_ZEDLET_DIR}/zed-functions.sh"
[ "${ZED_SCRUB_AFTER_RESILVER}" = "1" ] || exit 2
[ "${ZEVENT_RESILVER_TYPE}" != "sequential" ] || exit 3
[ -n "${ZEVENT_POOL}" ] || exit 1
[ -n "${ZEVENT_SUBCLASS}" ] || exit 1
zed_check_cmd "${ZPOOL}" || exit 1
+1 -1
View File
@@ -41,7 +41,7 @@ fi
umask 077
note_subject="ZFS ${ZEVENT_SUBCLASS} event for ${ZEVENT_POOL} on $(hostname)"
note_pathname="${TMPDIR:="/tmp"}/$(basename -- "$0").${ZEVENT_EID}.$$"
note_pathname="$(mktemp)"
{
echo "ZFS has finished a ${action}:"
echo
+115 -52
View File
@@ -1,26 +1,26 @@
#!/bin/sh
#
# Turn off/on the VDEV's enclosure fault LEDs when the pool's state changes.
# Turn off/on vdevs' enclosure fault LEDs when their pool's state changes.
#
# Turn the VDEV's fault LED on if it becomes FAULTED, DEGRADED or UNAVAIL.
# Turn the LED off when it's back ONLINE again.
# Turn a vdev's fault LED on if it becomes FAULTED, DEGRADED or UNAVAIL.
# Turn its LED off when it's back ONLINE again.
#
# This script run in two basic modes:
#
# 1. If $ZEVENT_VDEV_ENC_SYSFS_PATH and $ZEVENT_VDEV_STATE_STR are set, then
# only set the LED for that particular VDEV. This is the case for statechange
# only set the LED for that particular vdev. This is the case for statechange
# events and some vdev_* events.
#
# 2. If those vars are not set, then check the state of all VDEVs in the pool
# 2. If those vars are not set, then check the state of all vdevs in the pool
# and set the LEDs accordingly. This is the case for pool_import events.
#
# Note that this script requires that your enclosure be supported by the
# Linux SCSI enclosure services (ses) driver. The script will do nothing
# Linux SCSI Enclosure services (SES) driver. The script will do nothing
# if you have no enclosure, or if your enclosure isn't supported.
#
# Exit codes:
# 0: enclosure led successfully set
# 1: enclosure leds not not available
# 1: enclosure leds not available
# 2: enclosure leds administratively disabled
# 3: The led sysfs path passed from ZFS does not exist
# 4: $ZPOOL not set
@@ -29,7 +29,8 @@
[ -f "${ZED_ZEDLET_DIR}/zed.rc" ] && . "${ZED_ZEDLET_DIR}/zed.rc"
. "${ZED_ZEDLET_DIR}/zed-functions.sh"
if [ ! -d /sys/class/enclosure ] ; then
if [ ! -d /sys/class/enclosure ] && [ ! -d /sys/bus/pci/slots ] ; then
# No JBOD enclosure or NVMe slots
exit 1
fi
@@ -59,6 +60,10 @@ check_and_set_led()
file="$1"
val="$2"
if [ -z "$val" ]; then
return 0
fi
if [ ! -e "$file" ] ; then
return 3
fi
@@ -66,11 +71,11 @@ check_and_set_led()
# If another process is accessing the LED when we attempt to update it,
# the update will be lost so retry until the LED actually changes or we
# timeout.
for _ in $(seq 1 5); do
for _ in 1 2 3 4 5; do
# We want to check the current state first, since writing to the
# 'fault' entry always always causes a SES command, even if the
# 'fault' entry always causes a SES command, even if the
# current state is already what you want.
current=$(cat "${file}")
read -r current < "${file}"
# On some enclosures if you write 1 to fault, and read it back,
# it will return 2. Treat all non-zero values as 1 for
@@ -85,27 +90,84 @@ check_and_set_led()
else
break
fi
done
done
}
# Fault LEDs for JBODs and NVMe drives are handled a little differently.
#
# On JBODs the fault LED is called 'fault' and on a path like this:
#
# /sys/class/enclosure/0:0:1:0/SLOT 10/fault
#
# On NVMe it's called 'attention' and on a path like this:
#
# /sys/bus/pci/slot/0/attention
#
# This function returns the full path to the fault LED file for a given
# enclosure/slot directory.
#
path_to_led()
{
dir=$1
if [ -f "$dir/fault" ] ; then
echo "$dir/fault"
elif [ -f "$dir/attention" ] ; then
echo "$dir/attention"
fi
}
state_to_val()
{
state="$1"
if [ "$state" = "FAULTED" ] || [ "$state" = "DEGRADED" ] || \
[ "$state" = "UNAVAIL" ] ; then
echo 1
elif [ "$state" = "ONLINE" ] ; then
echo 0
fi
case "$state" in
FAULTED|DEGRADED|UNAVAIL)
echo 1
;;
ONLINE)
echo 0
;;
esac
}
# process_pool ([pool])
#
# Iterate through a pool (or pools) and set the VDEV's enclosure slot LEDs to
# the VDEV's state.
# Given a nvme name like 'nvme0n1', pass back its slot directory
# like "/sys/bus/pci/slots/0"
#
nvme_dev_to_slot()
{
dev="$1"
# Get the address "0000:01:00.0"
address=$(cat "/sys/class/block/$dev/device/address")
# For each /sys/bus/pci/slots subdir that is an actual number
# (rather than weird directories like "1-3/").
# shellcheck disable=SC2010
for i in $(ls /sys/bus/pci/slots/ | grep -E "^[0-9]+$") ; do
this_address=$(cat "/sys/bus/pci/slots/$i/address")
# The format of address is a little different between
# /sys/class/block/$dev/device/address and
# /sys/bus/pci/slots/
#
# address= "0000:01:00.0"
# this_address = "0000:01:00"
#
if echo "$address" | grep -Eq ^"$this_address" ; then
echo "/sys/bus/pci/slots/$i"
break
fi
done
}
# process_pool (pool)
#
# Iterate through a pool and set the vdevs' enclosure slot LEDs to
# those vdevs' state.
#
# Arguments
# pool: Optional pool name. If not specified, iterate though all pools.
# pool: Pool name.
#
# Return
# 0 on success, 3 on missing sysfs path
@@ -113,19 +175,27 @@ state_to_val()
process_pool()
{
pool="$1"
# The output will be the vdevs only (from "grep '/dev/'"):
#
# U45 ONLINE 0 0 0 /dev/sdk 0
# U46 ONLINE 0 0 0 /dev/sdm 0
# U47 ONLINE 0 0 0 /dev/sdn 0
# U50 ONLINE 0 0 0 /dev/sdbn 0
#
ZPOOL_SCRIPTS_AS_ROOT=1 $ZPOOL status -c upath,fault_led "$pool" | grep '/dev/' | (
rc=0
# Lookup all the current LED values and paths in parallel
#shellcheck disable=SC2016
cmd='echo led_token=$(cat "$VDEV_ENC_SYSFS_PATH/fault"),"$VDEV_ENC_SYSFS_PATH",'
out=$($ZPOOL status -vc "$cmd" "$pool" | grep 'led_token=')
#shellcheck disable=SC2034
echo "$out" | while read -r vdev state read write chksum therest; do
while read -r vdev state _ _ _ therest; do
# Read out current LED value and path
tmp=$(echo "$therest" | sed 's/^.*led_token=//g')
vdev_enc_sysfs_path=$(echo "$tmp" | awk -F ',' '{print $2}')
current_val=$(echo "$tmp" | awk -F ',' '{print $1}')
# Get dev name (like 'sda')
dev=$(basename "$(echo "$therest" | awk '{print $(NF-1)}')")
vdev_enc_sysfs_path=$(realpath "/sys/class/block/$dev/device/enclosure_device"*)
if [ ! -d "$vdev_enc_sysfs_path" ] ; then
# This is not a JBOD disk, but it could be a PCI NVMe drive
vdev_enc_sysfs_path=$(nvme_dev_to_slot "$dev")
fi
current_val=$(echo "$therest" | awk '{print $NF}')
if [ "$current_val" != "0" ] ; then
current_val=1
@@ -136,40 +206,33 @@ process_pool()
continue
fi
if [ ! -e "$vdev_enc_sysfs_path/fault" ] ; then
#shellcheck disable=SC2030
rc=1
zed_log_msg "vdev $vdev '$file/fault' doesn't exist"
continue;
led_path=$(path_to_led "$vdev_enc_sysfs_path")
if [ ! -e "$led_path" ] ; then
rc=3
zed_log_msg "vdev $vdev '$led_path' doesn't exist"
continue
fi
val=$(state_to_val "$state")
if [ "$current_val" = "$val" ] ; then
# LED is already set correctly
continue;
continue
fi
if ! check_and_set_led "$vdev_enc_sysfs_path/fault" "$val"; then
rc=1
if ! check_and_set_led "$led_path" "$val"; then
rc=3
fi
done
#shellcheck disable=SC2031
if [ "$rc" = "0" ] ; then
return 0
else
# We didn't see a sysfs entry that we wanted to set
return 3
fi
exit "$rc"; )
}
if [ -n "$ZEVENT_VDEV_ENC_SYSFS_PATH" ] && [ -n "$ZEVENT_VDEV_STATE_STR" ] ; then
# Got a statechange for an individual VDEV
# Got a statechange for an individual vdev
val=$(state_to_val "$ZEVENT_VDEV_STATE_STR")
vdev=$(basename "$ZEVENT_VDEV_PATH")
check_and_set_led "$ZEVENT_VDEV_ENC_SYSFS_PATH/fault" "$val"
ledpath=$(path_to_led "$ZEVENT_VDEV_ENC_SYSFS_PATH")
check_and_set_led "$ledpath" "$val"
else
# Process the entire pool
poolname=$(zed_guid_to_pool "$ZEVENT_POOL_GUID")
+1 -1
View File
@@ -37,7 +37,7 @@ fi
umask 077
note_subject="ZFS device fault for pool ${ZEVENT_POOL_GUID} on $(hostname)"
note_pathname="${TMPDIR:="/tmp"}/$(basename -- "$0").${ZEVENT_EID}.$$"
note_pathname="$(mktemp)"
{
if [ "${ZEVENT_VDEV_STATE_STR}" = "FAULTED" ] ; then
echo "The number of I/O errors associated with a ZFS device exceeded"
+37
View File
@@ -0,0 +1,37 @@
#!/bin/sh
#
# Send notification in response to a TRIM_FINISH. The event
# will be received for each vdev in the pool which was trimmed.
#
# Exit codes:
# 0: notification sent
# 1: notification failed
# 2: notification not configured
# 9: internal error
[ -f "${ZED_ZEDLET_DIR}/zed.rc" ] && . "${ZED_ZEDLET_DIR}/zed.rc"
. "${ZED_ZEDLET_DIR}/zed-functions.sh"
[ -n "${ZEVENT_POOL}" ] || exit 9
[ -n "${ZEVENT_SUBCLASS}" ] || exit 9
zed_check_cmd "${ZPOOL}" || exit 9
umask 077
note_subject="ZFS ${ZEVENT_SUBCLASS} event for ${ZEVENT_POOL} on $(hostname)"
note_pathname="$(mktemp)"
{
echo "ZFS has finished a trim:"
echo
echo " eid: ${ZEVENT_EID}"
echo " class: ${ZEVENT_SUBCLASS}"
echo " host: $(hostname)"
echo " time: ${ZEVENT_TIME_STRING}"
"${ZPOOL}" status -t "${ZEVENT_POOL}"
} > "${note_pathname}"
zed_notify "${note_subject}" "${note_pathname}"; rv=$?
rm -f "${note_pathname}"
exit "${rv}"
+84 -12
View File
@@ -126,10 +126,8 @@ zed_lock()
# Obtain a lock on the file bound to the given file descriptor.
#
eval "exec ${fd}> '${lockfile}'"
err="$(flock --exclusive "${fd}" 2>&1)"
# shellcheck disable=SC2181
if [ $? -ne 0 ]; then
eval "exec ${fd}>> '${lockfile}'"
if ! err="$(flock --exclusive "${fd}" 2>&1)"; then
zed_log_err "failed to lock \"${lockfile}\": ${err}"
fi
@@ -165,9 +163,7 @@ zed_unlock()
fi
# Release the lock and close the file descriptor.
err="$(flock --unlock "${fd}" 2>&1)"
# shellcheck disable=SC2181
if [ $? -ne 0 ]; then
if ! err="$(flock --unlock "${fd}" 2>&1)"; then
zed_log_err "failed to unlock \"${lockfile}\": ${err}"
fi
eval "exec ${fd}>&-"
@@ -202,6 +198,10 @@ zed_notify()
[ "${rv}" -eq 0 ] && num_success=$((num_success + 1))
[ "${rv}" -eq 1 ] && num_failure=$((num_failure + 1))
zed_notify_slack_webhook "${subject}" "${pathname}"; rv=$?
[ "${rv}" -eq 0 ] && num_success=$((num_success + 1))
[ "${rv}" -eq 1 ] && num_failure=$((num_failure + 1))
[ "${num_success}" -gt 0 ] && return 0
[ "${num_failure}" -gt 0 ] && return 1
return 2
@@ -263,7 +263,7 @@ zed_notify_email()
-e "s/@SUBJECT@/${subject}/g")"
# shellcheck disable=SC2086
eval "${ZED_EMAIL_PROG}" ${ZED_EMAIL_OPTS} < "${pathname}" >/dev/null 2>&1
eval ${ZED_EMAIL_PROG} ${ZED_EMAIL_OPTS} < "${pathname}" >/dev/null 2>&1
rv=$?
if [ "${rv}" -ne 0 ]; then
zed_log_err "$(basename "${ZED_EMAIL_PROG}") exit=${rv}"
@@ -359,6 +359,80 @@ zed_notify_pushbullet()
}
# zed_notify_slack_webhook (subject, pathname)
#
# Notification via Slack Webhook <https://api.slack.com/incoming-webhooks>.
# The Webhook URL (ZED_SLACK_WEBHOOK_URL) identifies this client to the
# Slack channel.
#
# Requires awk, curl, and sed executables to be installed in the standard PATH.
#
# References
# https://api.slack.com/incoming-webhooks
#
# Arguments
# subject: notification subject
# pathname: pathname containing the notification message (OPTIONAL)
#
# Globals
# ZED_SLACK_WEBHOOK_URL
#
# Return
# 0: notification sent
# 1: notification failed
# 2: not configured
#
zed_notify_slack_webhook()
{
[ -n "${ZED_SLACK_WEBHOOK_URL}" ] || return 2
local subject="$1"
local pathname="${2:-"/dev/null"}"
local msg_body
local msg_tag
local msg_json
local msg_out
local msg_err
local url="${ZED_SLACK_WEBHOOK_URL}"
[ -n "${subject}" ] || return 1
if [ ! -r "${pathname}" ]; then
zed_log_err "slack webhook cannot read \"${pathname}\""
return 1
fi
zed_check_cmd "awk" "curl" "sed" || return 1
# Escape the following characters in the message body for JSON:
# newline, backslash, double quote, horizontal tab, vertical tab,
# and carriage return.
#
msg_body="$(awk '{ ORS="\\n" } { gsub(/\\/, "\\\\"); gsub(/"/, "\\\"");
gsub(/\t/, "\\t"); gsub(/\f/, "\\f"); gsub(/\r/, "\\r"); print }' \
"${pathname}")"
# Construct the JSON message for posting.
#
msg_json="$(printf '{"text": "*%s*\n%s"}' "${subject}" "${msg_body}" )"
# Send the POST request and check for errors.
#
msg_out="$(curl -X POST "${url}" \
--header "Content-Type: application/json" --data-binary "${msg_json}" \
2>/dev/null)"; rv=$?
if [ "${rv}" -ne 0 ]; then
zed_log_err "curl exit=${rv}"
return 1
fi
msg_err="$(echo "${msg_out}" \
| sed -n -e 's/.*"error" *:.*"message" *: *"\([^"]*\)".*/\1/p')"
if [ -n "${msg_err}" ]; then
zed_log_err "slack webhook \"${msg_err}"\"
return 1
fi
return 0
}
# zed_rate_limit (tag, [interval])
#
# Check whether an event of a given type [tag] has already occurred within the
@@ -433,10 +507,8 @@ zed_guid_to_pool()
return
fi
guid=$(printf "%llu" "$1")
if [ -n "$guid" ] ; then
$ZPOOL get -H -ovalue,name guid | awk '$1=='"$guid"' {print $2}'
fi
guid="$(printf "%u" "$1")"
$ZPOOL get -H -ovalue,name guid | awk '$1 == '"$guid"' {print $2; exit}'
}
# zed_exit_if_ignoring_this_event
+16 -3
View File
@@ -74,6 +74,14 @@
#
#ZED_PUSHBULLET_CHANNEL_TAG=""
##
# Slack Webhook URL.
# This allows posting to the given channel and includes an access token.
# <https://api.slack.com/incoming-webhooks>
# Disabled by default; uncomment to enable.
#
#ZED_SLACK_WEBHOOK_URL=""
##
# Default directory for zed state files.
#
@@ -81,8 +89,8 @@
##
# Turn on/off enclosure LEDs when drives get DEGRADED/FAULTED. This works for
# device mapper and multipath devices as well. Your enclosure must be
# supported by the Linux SES driver for this to work.
# device mapper and multipath devices as well. This works with JBOD enclosures
# and NVMe PCI drives (assuming they're supported by Linux in sysfs).
#
ZED_USE_ENCLOSURE_LEDS=1
@@ -110,5 +118,10 @@ ZED_USE_ENCLOSURE_LEDS=1
# Otherwise, if ZED_SYSLOG_SUBCLASS_EXCLUDE is set, the
# matching subclasses are excluded from logging.
#ZED_SYSLOG_SUBCLASS_INCLUDE="checksum|scrub_*|vdev.*"
#ZED_SYSLOG_SUBCLASS_EXCLUDE="statechange|config_*|history_event"
ZED_SYSLOG_SUBCLASS_EXCLUDE="history_event"
##
# Use GUIDs instead of names when logging pool and vdevs
# Disabled by default, 1 to enable and 0 to disable.
#ZED_SYSLOG_DISPLAY_GUIDS=1
+3 -18
View File
@@ -1,9 +1,9 @@
/*
* This file is part of the ZFS Event Daemon (ZED)
* for ZFS on Linux (ZoL) <http://zfsonlinux.org/>.
* This file is part of the ZFS Event Daemon (ZED).
*
* Developed at Lawrence Livermore National Laboratory (LLNL-CODE-403049).
* Copyright (C) 2013-2014 Lawrence Livermore National Security, LLC.
* Refer to the ZoL git commit log for authoritative copyright attribution.
* Refer to the OpenZFS git commit log for authoritative copyright attribution.
*
* The contents of this file are subject to the terms of the
* Common Development and Distribution License Version 1.0 (CDDL-1.0).
@@ -15,11 +15,6 @@
#ifndef ZED_H
#define ZED_H
/*
* Absolute path for the default zed configuration file.
*/
#define ZED_CONF_FILE SYSCONFDIR "/zfs/zed.conf"
/*
* Absolute path for the default zed pid file.
*/
@@ -35,16 +30,6 @@
*/
#define ZED_ZEDLET_DIR SYSCONFDIR "/zfs/zed.d"
/*
* Reserved for future use.
*/
#define ZED_MAX_EVENTS 0
/*
* Reserved for future use.
*/
#define ZED_MIN_EVENTS 0
/*
* String prefix for ZED variables passed via environment variables.
*/
+99 -124
View File
@@ -1,9 +1,9 @@
/*
* This file is part of the ZFS Event Daemon (ZED)
* for ZFS on Linux (ZoL) <http://zfsonlinux.org/>.
* This file is part of the ZFS Event Daemon (ZED).
*
* Developed at Lawrence Livermore National Laboratory (LLNL-CODE-403049).
* Copyright (C) 2013-2014 Lawrence Livermore National Security, LLC.
* Refer to the ZoL git commit log for authoritative copyright attribution.
* Refer to the OpenZFS git commit log for authoritative copyright attribution.
*
* The contents of this file are subject to the terms of the
* Common Development and Distribution License Version 1.0 (CDDL-1.0).
@@ -32,43 +32,26 @@
#include "zed_strings.h"
/*
* Return a new configuration with default values.
* Initialise the configuration with default values.
*/
struct zed_conf *
zed_conf_create(void)
void
zed_conf_init(struct zed_conf *zcp)
{
struct zed_conf *zcp;
memset(zcp, 0, sizeof (*zcp));
zcp = calloc(1, sizeof (*zcp));
if (!zcp)
goto nomem;
/* zcp->zfs_hdl opened in zed_event_init() */
/* zcp->zedlets created in zed_conf_scan_dir() */
zcp->syslog_facility = LOG_DAEMON;
zcp->min_events = ZED_MIN_EVENTS;
zcp->max_events = ZED_MAX_EVENTS;
zcp->pid_fd = -1;
zcp->zedlets = NULL; /* created via zed_conf_scan_dir() */
zcp->state_fd = -1; /* opened via zed_conf_open_state() */
zcp->zfs_hdl = NULL; /* opened via zed_event_init() */
zcp->zevent_fd = -1; /* opened via zed_event_init() */
zcp->pid_fd = -1; /* opened in zed_conf_write_pid() */
zcp->state_fd = -1; /* opened in zed_conf_open_state() */
zcp->zevent_fd = -1; /* opened in zed_event_init() */
if (!(zcp->conf_file = strdup(ZED_CONF_FILE)))
goto nomem;
zcp->max_jobs = 16;
if (!(zcp->pid_file = strdup(ZED_PID_FILE)))
goto nomem;
if (!(zcp->zedlet_dir = strdup(ZED_ZEDLET_DIR)))
goto nomem;
if (!(zcp->state_file = strdup(ZED_STATE_FILE)))
goto nomem;
return (zcp);
nomem:
zed_log_die("Failed to create conf: %s", strerror(errno));
return (NULL);
if (!(zcp->pid_file = strdup(ZED_PID_FILE)) ||
!(zcp->zedlet_dir = strdup(ZED_ZEDLET_DIR)) ||
!(zcp->state_file = strdup(ZED_STATE_FILE)))
zed_log_die("Failed to create conf: %s", strerror(errno));
}
/*
@@ -79,9 +62,6 @@ nomem:
void
zed_conf_destroy(struct zed_conf *zcp)
{
if (!zcp)
return;
if (zcp->state_fd >= 0) {
if (close(zcp->state_fd) < 0)
zed_log_msg(LOG_WARNING,
@@ -102,10 +82,6 @@ zed_conf_destroy(struct zed_conf *zcp)
zcp->pid_file, strerror(errno));
zcp->pid_fd = -1;
}
if (zcp->conf_file) {
free(zcp->conf_file);
zcp->conf_file = NULL;
}
if (zcp->pid_file) {
free(zcp->pid_file);
zcp->pid_file = NULL;
@@ -122,7 +98,6 @@ zed_conf_destroy(struct zed_conf *zcp)
zed_strings_destroy(zcp->zedlets);
zcp->zedlets = NULL;
}
free(zcp);
}
/*
@@ -132,44 +107,52 @@ zed_conf_destroy(struct zed_conf *zcp)
* otherwise, output to stderr and exit with a failure status.
*/
static void
_zed_conf_display_help(const char *prog, int got_err)
_zed_conf_display_help(const char *prog, boolean_t got_err)
{
struct opt { const char *o, *d, *v; };
FILE *fp = got_err ? stderr : stdout;
int w1 = 4; /* width of leading whitespace */
int w2 = 8; /* width of L-justified option field */
struct opt *oo;
struct opt iopts[] = {
{ .o = "-h", .d = "Display help" },
{ .o = "-L", .d = "Display license information" },
{ .o = "-V", .d = "Display version information" },
{},
};
struct opt nopts[] = {
{ .o = "-v", .d = "Be verbose" },
{ .o = "-f", .d = "Force daemon to run" },
{ .o = "-F", .d = "Run daemon in the foreground" },
{ .o = "-I",
.d = "Idle daemon until kernel module is (re)loaded" },
{ .o = "-M", .d = "Lock all pages in memory" },
{ .o = "-P", .d = "$PATH for ZED to use (only used by ZTS)" },
{ .o = "-Z", .d = "Zero state file" },
{},
};
struct opt vopts[] = {
{ .o = "-d DIR", .d = "Read enabled ZEDLETs from DIR.",
.v = ZED_ZEDLET_DIR },
{ .o = "-p FILE", .d = "Write daemon's PID to FILE.",
.v = ZED_PID_FILE },
{ .o = "-s FILE", .d = "Write daemon's state to FILE.",
.v = ZED_STATE_FILE },
{ .o = "-j JOBS", .d = "Start at most JOBS at once.",
.v = "16" },
{},
};
fprintf(fp, "Usage: %s [OPTION]...\n", (prog ? prog : "zed"));
fprintf(fp, "\n");
fprintf(fp, "%*c%*s %s\n", w1, 0x20, -w2, "-h",
"Display help.");
fprintf(fp, "%*c%*s %s\n", w1, 0x20, -w2, "-L",
"Display license information.");
fprintf(fp, "%*c%*s %s\n", w1, 0x20, -w2, "-V",
"Display version information.");
for (oo = iopts; oo->o; ++oo)
fprintf(fp, " %*s %s\n", -8, oo->o, oo->d);
fprintf(fp, "\n");
fprintf(fp, "%*c%*s %s\n", w1, 0x20, -w2, "-v",
"Be verbose.");
fprintf(fp, "%*c%*s %s\n", w1, 0x20, -w2, "-f",
"Force daemon to run.");
fprintf(fp, "%*c%*s %s\n", w1, 0x20, -w2, "-F",
"Run daemon in the foreground.");
fprintf(fp, "%*c%*s %s\n", w1, 0x20, -w2, "-M",
"Lock all pages in memory.");
fprintf(fp, "%*c%*s %s\n", w1, 0x20, -w2, "-P",
"$PATH for ZED to use (only used by ZTS).");
fprintf(fp, "%*c%*s %s\n", w1, 0x20, -w2, "-Z",
"Zero state file.");
for (oo = nopts; oo->o; ++oo)
fprintf(fp, " %*s %s\n", -8, oo->o, oo->d);
fprintf(fp, "\n");
#if 0
fprintf(fp, "%*c%*s %s [%s]\n", w1, 0x20, -w2, "-c FILE",
"Read configuration from FILE.", ZED_CONF_FILE);
#endif
fprintf(fp, "%*c%*s %s [%s]\n", w1, 0x20, -w2, "-d DIR",
"Read enabled ZEDLETs from DIR.", ZED_ZEDLET_DIR);
fprintf(fp, "%*c%*s %s [%s]\n", w1, 0x20, -w2, "-p FILE",
"Write daemon's PID to FILE.", ZED_PID_FILE);
fprintf(fp, "%*c%*s %s [%s]\n", w1, 0x20, -w2, "-s FILE",
"Write daemon's state to FILE.", ZED_STATE_FILE);
for (oo = vopts; oo->o; ++oo)
fprintf(fp, " %*s %s [%s]\n", -8, oo->o, oo->d, oo->v);
fprintf(fp, "\n");
exit(got_err ? EXIT_FAILURE : EXIT_SUCCESS);
@@ -181,20 +164,14 @@ _zed_conf_display_help(const char *prog, int got_err)
static void
_zed_conf_display_license(void)
{
const char **pp;
const char *text[] = {
"The ZFS Event Daemon (ZED) is distributed under the terms of the",
" Common Development and Distribution License (CDDL-1.0)",
" <http://opensource.org/licenses/CDDL-1.0>.",
"",
printf(
"The ZFS Event Daemon (ZED) is distributed under the terms of the\n"
" Common Development and Distribution License (CDDL-1.0)\n"
" <http://opensource.org/licenses/CDDL-1.0>.\n"
"\n"
"Developed at Lawrence Livermore National Laboratory"
" (LLNL-CODE-403049).",
"",
NULL
};
for (pp = text; *pp; pp++)
printf("%s\n", *pp);
" (LLNL-CODE-403049).\n"
"\n");
exit(EXIT_SUCCESS);
}
@@ -229,16 +206,19 @@ _zed_conf_parse_path(char **resultp, const char *path)
if (path[0] == '/') {
*resultp = strdup(path);
} else if (!getcwd(buf, sizeof (buf))) {
zed_log_die("Failed to get current working dir: %s",
strerror(errno));
} else if (strlcat(buf, "/", sizeof (buf)) >= sizeof (buf)) {
zed_log_die("Failed to copy path: %s", strerror(ENAMETOOLONG));
} else if (strlcat(buf, path, sizeof (buf)) >= sizeof (buf)) {
zed_log_die("Failed to copy path: %s", strerror(ENAMETOOLONG));
} else {
if (!getcwd(buf, sizeof (buf)))
zed_log_die("Failed to get current working dir: %s",
strerror(errno));
if (strlcat(buf, "/", sizeof (buf)) >= sizeof (buf) ||
strlcat(buf, path, sizeof (buf)) >= sizeof (buf))
zed_log_die("Failed to copy path: %s",
strerror(ENAMETOOLONG));
*resultp = strdup(buf);
}
if (!*resultp)
zed_log_die("Failed to copy path: %s", strerror(ENOMEM));
}
@@ -249,8 +229,9 @@ _zed_conf_parse_path(char **resultp, const char *path)
void
zed_conf_parse_opts(struct zed_conf *zcp, int argc, char **argv)
{
const char * const opts = ":hLVc:d:p:P:s:vfFMZ";
const char * const opts = ":hLVd:p:P:s:vfFMZIj:";
int opt;
unsigned long raw;
if (!zcp || !argv || !argv[0])
zed_log_die("Failed to parse options: Internal error");
@@ -260,7 +241,7 @@ zed_conf_parse_opts(struct zed_conf *zcp, int argc, char **argv)
while ((opt = getopt(argc, argv, opts)) != -1) {
switch (opt) {
case 'h':
_zed_conf_display_help(argv[0], EXIT_SUCCESS);
_zed_conf_display_help(argv[0], B_FALSE);
break;
case 'L':
_zed_conf_display_license();
@@ -268,12 +249,12 @@ zed_conf_parse_opts(struct zed_conf *zcp, int argc, char **argv)
case 'V':
_zed_conf_display_version();
break;
case 'c':
_zed_conf_parse_path(&zcp->conf_file, optarg);
break;
case 'd':
_zed_conf_parse_path(&zcp->zedlet_dir, optarg);
break;
case 'I':
zcp->do_idle = 1;
break;
case 'p':
_zed_conf_parse_path(&zcp->pid_file, optarg);
break;
@@ -298,31 +279,30 @@ zed_conf_parse_opts(struct zed_conf *zcp, int argc, char **argv)
case 'Z':
zcp->do_zero = 1;
break;
case 'j':
errno = 0;
raw = strtoul(optarg, NULL, 0);
if (errno == ERANGE || raw > INT16_MAX) {
zed_log_die("%lu is too many jobs", raw);
} if (raw == 0) {
zed_log_die("0 jobs makes no sense");
} else {
zcp->max_jobs = raw;
}
break;
case '?':
default:
if (optopt == '?')
_zed_conf_display_help(argv[0], EXIT_SUCCESS);
_zed_conf_display_help(argv[0], B_FALSE);
fprintf(stderr, "%s: %s '-%c'\n\n", argv[0],
"Invalid option", optopt);
_zed_conf_display_help(argv[0], EXIT_FAILURE);
fprintf(stderr, "%s: Invalid option '-%c'\n\n",
argv[0], optopt);
_zed_conf_display_help(argv[0], B_TRUE);
break;
}
}
}
/*
* Parse the configuration file into the configuration [zcp].
*
* FIXME: Not yet implemented.
*/
void
zed_conf_parse_file(struct zed_conf *zcp)
{
if (!zcp)
zed_log_die("Failed to parse config: %s", strerror(EINVAL));
}
/*
* Scan the [zcp] zedlet_dir for files to exec based on the event class.
* Files must be executable by user, but not writable by group or other.
@@ -330,8 +310,6 @@ zed_conf_parse_file(struct zed_conf *zcp)
*
* Return 0 on success with an updated set of zedlets,
* or -1 on error with errno set.
*
* FIXME: Check if zedlet_dir and all parent dirs are secure.
*/
int
zed_conf_scan_dir(struct zed_conf *zcp)
@@ -447,8 +425,6 @@ zed_conf_scan_dir(struct zed_conf *zcp)
int
zed_conf_write_pid(struct zed_conf *zcp)
{
const mode_t dirmode = S_IRWXU | S_IRGRP | S_IXGRP | S_IROTH | S_IXOTH;
const mode_t filemode = S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH;
char buf[PATH_MAX];
int n;
char *p;
@@ -476,7 +452,7 @@ zed_conf_write_pid(struct zed_conf *zcp)
if (p)
*p = '\0';
if ((mkdirp(buf, dirmode) < 0) && (errno != EEXIST)) {
if ((mkdirp(buf, 0755) < 0) && (errno != EEXIST)) {
zed_log_msg(LOG_ERR, "Failed to create directory \"%s\": %s",
buf, strerror(errno));
goto err;
@@ -486,7 +462,7 @@ zed_conf_write_pid(struct zed_conf *zcp)
*/
mask = umask(0);
umask(mask | 022);
zcp->pid_fd = open(zcp->pid_file, (O_RDWR | O_CREAT), filemode);
zcp->pid_fd = open(zcp->pid_file, O_RDWR | O_CREAT | O_CLOEXEC, 0644);
umask(mask);
if (zcp->pid_fd < 0) {
zed_log_msg(LOG_ERR, "Failed to open PID file \"%s\": %s",
@@ -523,7 +499,7 @@ zed_conf_write_pid(struct zed_conf *zcp)
errno = ERANGE;
zed_log_msg(LOG_ERR, "Failed to write PID file \"%s\": %s",
zcp->pid_file, strerror(errno));
} else if (zed_file_write_n(zcp->pid_fd, buf, n) != n) {
} else if (write(zcp->pid_fd, buf, n) != n) {
zed_log_msg(LOG_ERR, "Failed to write PID file \"%s\": %s",
zcp->pid_file, strerror(errno));
} else if (fdatasync(zcp->pid_fd) < 0) {
@@ -551,7 +527,6 @@ int
zed_conf_open_state(struct zed_conf *zcp)
{
char dirbuf[PATH_MAX];
mode_t dirmode = S_IRWXU | S_IRGRP | S_IXGRP | S_IROTH | S_IXOTH;
int n;
char *p;
int rv;
@@ -573,7 +548,7 @@ zed_conf_open_state(struct zed_conf *zcp)
if (p)
*p = '\0';
if ((mkdirp(dirbuf, dirmode) < 0) && (errno != EEXIST)) {
if ((mkdirp(dirbuf, 0755) < 0) && (errno != EEXIST)) {
zed_log_msg(LOG_WARNING,
"Failed to create directory \"%s\": %s",
dirbuf, strerror(errno));
@@ -591,7 +566,7 @@ zed_conf_open_state(struct zed_conf *zcp)
(void) unlink(zcp->state_file);
zcp->state_fd = open(zcp->state_file,
(O_RDWR | O_CREAT), (S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH));
O_RDWR | O_CREAT | O_CLOEXEC, 0644);
if (zcp->state_fd < 0) {
zed_log_msg(LOG_WARNING, "Failed to open state file \"%s\": %s",
zcp->state_file, strerror(errno));
+19 -22
View File
@@ -1,9 +1,9 @@
/*
* This file is part of the ZFS Event Daemon (ZED)
* for ZFS on Linux (ZoL) <http://zfsonlinux.org/>.
* This file is part of the ZFS Event Daemon (ZED).
*
* Developed at Lawrence Livermore National Laboratory (LLNL-CODE-403049).
* Copyright (C) 2013-2014 Lawrence Livermore National Security, LLC.
* Refer to the ZoL git commit log for authoritative copyright attribution.
* Refer to the OpenZFS git commit log for authoritative copyright attribution.
*
* The contents of this file are subject to the terms of the
* Common Development and Distribution License Version 1.0 (CDDL-1.0).
@@ -20,42 +20,39 @@
#include "zed_strings.h"
struct zed_conf {
unsigned do_force:1; /* true if force enabled */
unsigned do_foreground:1; /* true if run in foreground */
unsigned do_memlock:1; /* true if locking memory */
unsigned do_verbose:1; /* true if verbosity enabled */
unsigned do_zero:1; /* true if zeroing state */
int syslog_facility; /* syslog facility value */
int min_events; /* RESERVED FOR FUTURE USE */
int max_events; /* RESERVED FOR FUTURE USE */
char *conf_file; /* abs path to config file */
char *pid_file; /* abs path to pid file */
int pid_fd; /* fd to pid file for lock */
char *zedlet_dir; /* abs path to zedlet dir */
zed_strings_t *zedlets; /* names of enabled zedlets */
char *state_file; /* abs path to state file */
int state_fd; /* fd to state file */
libzfs_handle_t *zfs_hdl; /* handle to libzfs */
int zevent_fd; /* fd for access to zevents */
zed_strings_t *zedlets; /* names of enabled zedlets */
char *path; /* custom $PATH for zedlets to use */
int pid_fd; /* fd to pid file for lock */
int state_fd; /* fd to state file */
int zevent_fd; /* fd for access to zevents */
int16_t max_jobs; /* max zedlets to run at one time */
boolean_t do_force:1; /* true if force enabled */
boolean_t do_foreground:1; /* true if run in foreground */
boolean_t do_memlock:1; /* true if locking memory */
boolean_t do_verbose:1; /* true if verbosity enabled */
boolean_t do_zero:1; /* true if zeroing state */
boolean_t do_idle:1; /* true if idle enabled */
};
struct zed_conf *zed_conf_create(void);
void zed_conf_init(struct zed_conf *zcp);
void zed_conf_destroy(struct zed_conf *zcp);
void zed_conf_parse_opts(struct zed_conf *zcp, int argc, char **argv);
void zed_conf_parse_file(struct zed_conf *zcp);
int zed_conf_scan_dir(struct zed_conf *zcp);
int zed_conf_write_pid(struct zed_conf *zcp);
int zed_conf_open_state(struct zed_conf *zcp);
int zed_conf_read_state(struct zed_conf *zcp, uint64_t *eidp, int64_t etime[]);
int zed_conf_write_state(struct zed_conf *zcp, uint64_t eid, int64_t etime[]);
#endif /* !ZED_CONF_H */
+3
View File
@@ -72,6 +72,8 @@ zed_udev_event(const char *class, const char *subclass, nvlist_t *nvl)
zed_log_msg(LOG_INFO, "\t%s: %s", DEV_PATH, strval);
if (nvlist_lookup_string(nvl, DEV_IDENTIFIER, &strval) == 0)
zed_log_msg(LOG_INFO, "\t%s: %s", DEV_IDENTIFIER, strval);
if (nvlist_lookup_boolean(nvl, DEV_IS_PART) == B_TRUE)
zed_log_msg(LOG_INFO, "\t%s: B_TRUE", DEV_IS_PART);
if (nvlist_lookup_string(nvl, DEV_PHYS_PATH, &strval) == 0)
zed_log_msg(LOG_INFO, "\t%s: %s", DEV_PHYS_PATH, strval);
if (nvlist_lookup_uint64(nvl, DEV_SIZE, &numval) == 0)
@@ -379,6 +381,7 @@ zed_disk_event_init()
return (-1);
}
pthread_setname_np(g_mon_tid, "udev monitor");
zed_log_msg(LOG_INFO, "zed_disk_event_init");
return (0);
+74 -39
View File
@@ -1,9 +1,9 @@
/*
* This file is part of the ZFS Event Daemon (ZED)
* for ZFS on Linux (ZoL) <http://zfsonlinux.org/>.
* This file is part of the ZFS Event Daemon (ZED).
*
* Developed at Lawrence Livermore National Laboratory (LLNL-CODE-403049).
* Copyright (C) 2013-2014 Lawrence Livermore National Security, LLC.
* Refer to the ZoL git commit log for authoritative copyright attribution.
* Refer to the OpenZFS git commit log for authoritative copyright attribution.
*
* The contents of this file are subject to the terms of the
* Common Development and Distribution License Version 1.0 (CDDL-1.0).
@@ -15,7 +15,7 @@
#include <ctype.h>
#include <errno.h>
#include <fcntl.h>
#include <libzfs.h> /* FIXME: Replace with libzfs_core. */
#include <libzfs_core.h>
#include <paths.h>
#include <stdarg.h>
#include <stdio.h>
@@ -28,6 +28,7 @@
#include "zed.h"
#include "zed_conf.h"
#include "zed_disk_event.h"
#include "zed_event.h"
#include "zed_exec.h"
#include "zed_file.h"
#include "zed_log.h"
@@ -40,25 +41,36 @@
/*
* Open the libzfs interface.
*/
void
int
zed_event_init(struct zed_conf *zcp)
{
if (!zcp)
zed_log_die("Failed zed_event_init: %s", strerror(EINVAL));
zcp->zfs_hdl = libzfs_init();
if (!zcp->zfs_hdl)
if (!zcp->zfs_hdl) {
if (zcp->do_idle)
return (-1);
zed_log_die("Failed to initialize libzfs");
}
zcp->zevent_fd = open(ZFS_DEV, O_RDWR);
if (zcp->zevent_fd < 0)
zcp->zevent_fd = open(ZFS_DEV, O_RDWR | O_CLOEXEC);
if (zcp->zevent_fd < 0) {
if (zcp->do_idle)
return (-1);
zed_log_die("Failed to open \"%s\": %s",
ZFS_DEV, strerror(errno));
}
zfs_agent_init(zcp->zfs_hdl);
if (zed_disk_event_init() != 0)
if (zed_disk_event_init() != 0) {
if (zcp->do_idle)
return (-1);
zed_log_die("Failed to initialize disk events");
}
return (0);
}
/*
@@ -84,6 +96,47 @@ zed_event_fini(struct zed_conf *zcp)
libzfs_fini(zcp->zfs_hdl);
zcp->zfs_hdl = NULL;
}
zed_exec_fini();
}
static void
_bump_event_queue_length(void)
{
int zzlm = -1, wr;
char qlen_buf[12] = {0}; /* parameter is int => max "-2147483647\n" */
long int qlen;
zzlm = open("/sys/module/zfs/parameters/zfs_zevent_len_max", O_RDWR);
if (zzlm < 0)
goto done;
if (read(zzlm, qlen_buf, sizeof (qlen_buf)) < 0)
goto done;
qlen_buf[sizeof (qlen_buf) - 1] = '\0';
errno = 0;
qlen = strtol(qlen_buf, NULL, 10);
if (errno == ERANGE)
goto done;
if (qlen <= 0)
qlen = 512; /* default zfs_zevent_len_max value */
else
qlen *= 2;
if (qlen > INT_MAX)
qlen = INT_MAX;
wr = snprintf(qlen_buf, sizeof (qlen_buf), "%ld", qlen);
if (pwrite(zzlm, qlen_buf, wr, 0) < 0)
goto done;
zed_log_msg(LOG_WARNING, "Bumping queue length to %ld", qlen);
done:
if (zzlm > -1)
(void) close(zzlm);
}
/*
@@ -124,10 +177,7 @@ zed_event_seek(struct zed_conf *zcp, uint64_t saved_eid, int64_t saved_etime[])
if (n_dropped > 0) {
zed_log_msg(LOG_WARNING, "Missed %d events", n_dropped);
/*
* FIXME: Increase max size of event nvlist in
* /sys/module/zfs/parameters/zfs_zevent_len_max ?
*/
_bump_event_queue_length();
}
if (nvlist_lookup_uint64(nvl, "eid", &eid) != 0) {
zed_log_msg(LOG_WARNING, "Failed to lookup zevent eid");
@@ -199,7 +249,7 @@ _zed_event_value_is_hex(const char *name)
*
* All environment variables in [zsp] should be added through this function.
*/
static int
static __attribute__((format(printf, 5, 6))) int
_zed_event_add_var(uint64_t eid, zed_strings_t *zsp,
const char *prefix, const char *name, const char *fmt, ...)
{
@@ -574,8 +624,6 @@ _zed_event_add_string_array(uint64_t eid, zed_strings_t *zsp,
* Convert the nvpair [nvp] to a string which is added to the environment
* of the child process.
* Return 0 on success, -1 on error.
*
* FIXME: Refactor with cmd/zpool/zpool_main.c:zpool_do_events_nvprint()?
*/
static void
_zed_event_add_nvpair(uint64_t eid, zed_strings_t *zsp, nvpair_t *nvp)
@@ -674,23 +722,11 @@ _zed_event_add_nvpair(uint64_t eid, zed_strings_t *zsp, nvpair_t *nvp)
_zed_event_add_var(eid, zsp, prefix, name,
"%llu", (u_longlong_t)i64);
break;
case DATA_TYPE_NVLIST:
_zed_event_add_var(eid, zsp, prefix, name,
"%s", "_NOT_IMPLEMENTED_"); /* FIXME */
break;
case DATA_TYPE_STRING:
(void) nvpair_value_string(nvp, &str);
_zed_event_add_var(eid, zsp, prefix, name,
"%s", (str ? str : "<NULL>"));
break;
case DATA_TYPE_BOOLEAN_ARRAY:
_zed_event_add_var(eid, zsp, prefix, name,
"%s", "_NOT_IMPLEMENTED_"); /* FIXME */
break;
case DATA_TYPE_BYTE_ARRAY:
_zed_event_add_var(eid, zsp, prefix, name,
"%s", "_NOT_IMPLEMENTED_"); /* FIXME */
break;
case DATA_TYPE_INT8_ARRAY:
_zed_event_add_int8_array(eid, zsp, prefix, nvp);
break;
@@ -718,9 +754,11 @@ _zed_event_add_nvpair(uint64_t eid, zed_strings_t *zsp, nvpair_t *nvp)
case DATA_TYPE_STRING_ARRAY:
_zed_event_add_string_array(eid, zsp, prefix, nvp);
break;
case DATA_TYPE_NVLIST:
case DATA_TYPE_BOOLEAN_ARRAY:
case DATA_TYPE_BYTE_ARRAY:
case DATA_TYPE_NVLIST_ARRAY:
_zed_event_add_var(eid, zsp, prefix, name,
"%s", "_NOT_IMPLEMENTED_"); /* FIXME */
_zed_event_add_var(eid, zsp, prefix, name, "_NOT_IMPLEMENTED_");
break;
default:
errno = EINVAL;
@@ -872,7 +910,7 @@ _zed_event_add_time_strings(uint64_t eid, zed_strings_t *zsp, int64_t etime[])
/*
* Service the next zevent, blocking until one is available.
*/
void
int
zed_event_service(struct zed_conf *zcp)
{
nvlist_t *nvl;
@@ -890,20 +928,17 @@ zed_event_service(struct zed_conf *zcp)
errno = EINVAL;
zed_log_msg(LOG_ERR, "Failed to service zevent: %s",
strerror(errno));
return;
return (EINVAL);
}
rv = zpool_events_next(zcp->zfs_hdl, &nvl, &n_dropped, ZEVENT_NONE,
zcp->zevent_fd);
if ((rv != 0) || !nvl)
return;
return (errno);
if (n_dropped > 0) {
zed_log_msg(LOG_WARNING, "Missed %d events", n_dropped);
/*
* FIXME: Increase max size of event nvlist in
* /sys/module/zfs/parameters/zfs_zevent_len_max ?
*/
_bump_event_queue_length();
}
if (nvlist_lookup_uint64(nvl, "eid", &eid) != 0) {
zed_log_msg(LOG_WARNING, "Failed to lookup zevent eid");
@@ -941,12 +976,12 @@ zed_event_service(struct zed_conf *zcp)
_zed_event_add_time_strings(eid, zsp, etime);
zed_exec_process(eid, class, subclass,
zcp->zedlet_dir, zcp->zedlets, zsp, zcp->zevent_fd);
zed_exec_process(eid, class, subclass, zcp, zsp);
zed_conf_write_state(zcp, eid, etime);
zed_strings_destroy(zsp);
}
nvlist_free(nvl);
return (0);
}
+5 -5
View File
@@ -1,9 +1,9 @@
/*
* This file is part of the ZFS Event Daemon (ZED)
* for ZFS on Linux (ZoL) <http://zfsonlinux.org/>.
* This file is part of the ZFS Event Daemon (ZED).
*
* Developed at Lawrence Livermore National Laboratory (LLNL-CODE-403049).
* Copyright (C) 2013-2014 Lawrence Livermore National Security, LLC.
* Refer to the ZoL git commit log for authoritative copyright attribution.
* Refer to the OpenZFS git commit log for authoritative copyright attribution.
*
* The contents of this file are subject to the terms of the
* Common Development and Distribution License Version 1.0 (CDDL-1.0).
@@ -17,13 +17,13 @@
#include <stdint.h>
void zed_event_init(struct zed_conf *zcp);
int zed_event_init(struct zed_conf *zcp);
void zed_event_fini(struct zed_conf *zcp);
int zed_event_seek(struct zed_conf *zcp, uint64_t saved_eid,
int64_t saved_etime[]);
void zed_event_service(struct zed_conf *zcp);
int zed_event_service(struct zed_conf *zcp);
#endif /* !ZED_EVENT_H */
+196 -59
View File
@@ -1,9 +1,9 @@
/*
* This file is part of the ZFS Event Daemon (ZED)
* for ZFS on Linux (ZoL) <http://zfsonlinux.org/>.
* This file is part of the ZFS Event Daemon (ZED).
*
* Developed at Lawrence Livermore National Laboratory (LLNL-CODE-403049).
* Copyright (C) 2013-2014 Lawrence Livermore National Security, LLC.
* Refer to the ZoL git commit log for authoritative copyright attribution.
* Refer to the OpenZFS git commit log for authoritative copyright attribution.
*
* The contents of this file are subject to the terms of the
* Common Development and Distribution License Version 1.0 (CDDL-1.0).
@@ -18,16 +18,53 @@
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <stddef.h>
#include <sys/avl.h>
#include <sys/resource.h>
#include <sys/stat.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>
#include "zed_file.h"
#include <pthread.h>
#include "zed_exec.h"
#include "zed_log.h"
#include "zed_strings.h"
#define ZEVENT_FILENO 3
struct launched_process_node {
avl_node_t node;
pid_t pid;
uint64_t eid;
char *name;
};
static int
_launched_process_node_compare(const void *x1, const void *x2)
{
pid_t p1;
pid_t p2;
assert(x1 != NULL);
assert(x2 != NULL);
p1 = ((const struct launched_process_node *) x1)->pid;
p2 = ((const struct launched_process_node *) x2)->pid;
if (p1 < p2)
return (-1);
else if (p1 == p2)
return (0);
else
return (1);
}
static pthread_t _reap_children_tid = (pthread_t)-1;
static volatile boolean_t _reap_children_stop;
static avl_tree_t _launched_processes;
static pthread_mutex_t _launched_processes_lock = PTHREAD_MUTEX_INITIALIZER;
static int16_t _launched_processes_limit;
/*
* Create an environment string array for passing to execve() using the
* NAME=VALUE strings in container [zsp].
@@ -78,20 +115,26 @@ _zed_exec_create_env(zed_strings_t *zsp)
*/
static void
_zed_exec_fork_child(uint64_t eid, const char *dir, const char *prog,
char *env[], int zfd)
char *env[], int zfd, boolean_t in_foreground)
{
char path[PATH_MAX];
int n;
pid_t pid;
int fd;
pid_t wpid;
int status;
struct launched_process_node *node;
sigset_t mask;
struct timespec launch_timeout =
{ .tv_sec = 0, .tv_nsec = 200 * 1000 * 1000, };
assert(dir != NULL);
assert(prog != NULL);
assert(env != NULL);
assert(zfd >= 0);
while (__atomic_load_n(&_launched_processes_limit,
__ATOMIC_SEQ_CST) <= 0)
(void) nanosleep(&launch_timeout, NULL);
n = snprintf(path, sizeof (path), "%s/%s", dir, prog);
if ((n < 0) || (n >= sizeof (path))) {
zed_log_msg(LOG_WARNING,
@@ -99,100 +142,179 @@ _zed_exec_fork_child(uint64_t eid, const char *dir, const char *prog,
prog, eid, strerror(ENAMETOOLONG));
return;
}
(void) pthread_mutex_lock(&_launched_processes_lock);
pid = fork();
if (pid < 0) {
(void) pthread_mutex_unlock(&_launched_processes_lock);
zed_log_msg(LOG_WARNING,
"Failed to fork \"%s\" for eid=%llu: %s",
prog, eid, strerror(errno));
return;
} else if (pid == 0) {
(void) sigemptyset(&mask);
(void) sigprocmask(SIG_SETMASK, &mask, NULL);
(void) umask(022);
if ((fd = open("/dev/null", O_RDWR)) != -1) {
if (in_foreground && /* we're already devnulled if daemonised */
(fd = open("/dev/null", O_RDWR | O_CLOEXEC)) != -1) {
(void) dup2(fd, STDIN_FILENO);
(void) dup2(fd, STDOUT_FILENO);
(void) dup2(fd, STDERR_FILENO);
}
(void) dup2(zfd, ZEVENT_FILENO);
zed_file_close_from(ZEVENT_FILENO + 1);
execle(path, prog, NULL, env);
_exit(127);
}
/* parent process */
node = calloc(1, sizeof (*node));
if (node) {
node->pid = pid;
node->eid = eid;
node->name = strdup(prog);
avl_add(&_launched_processes, node);
}
(void) pthread_mutex_unlock(&_launched_processes_lock);
__atomic_sub_fetch(&_launched_processes_limit, 1, __ATOMIC_SEQ_CST);
zed_log_msg(LOG_INFO, "Invoking \"%s\" eid=%llu pid=%d",
prog, eid, pid);
}
/* FIXME: Timeout rogue child processes with sigalarm? */
static void
_nop(int sig)
{}
/*
* Wait for child process using WNOHANG to limit
* the time spent waiting to 10 seconds (10,000ms).
*/
for (n = 0; n < 1000; n++) {
wpid = waitpid(pid, &status, WNOHANG);
if (wpid == (pid_t)-1) {
if (errno == EINTR)
continue;
zed_log_msg(LOG_WARNING,
"Failed to wait for \"%s\" eid=%llu pid=%d",
prog, eid, pid);
break;
} else if (wpid == 0) {
struct timespec t;
static void *
_reap_children(void *arg)
{
struct launched_process_node node, *pnode;
pid_t pid;
int status;
struct rusage usage;
struct sigaction sa = {};
/* child still running */
t.tv_sec = 0;
t.tv_nsec = 10000000; /* 10ms */
(void) nanosleep(&t, NULL);
continue;
}
(void) sigfillset(&sa.sa_mask);
(void) sigdelset(&sa.sa_mask, SIGCHLD);
(void) pthread_sigmask(SIG_SETMASK, &sa.sa_mask, NULL);
if (WIFEXITED(status)) {
zed_log_msg(LOG_INFO,
"Finished \"%s\" eid=%llu pid=%d exit=%d",
prog, eid, pid, WEXITSTATUS(status));
} else if (WIFSIGNALED(status)) {
zed_log_msg(LOG_INFO,
"Finished \"%s\" eid=%llu pid=%d sig=%d/%s",
prog, eid, pid, WTERMSIG(status),
strsignal(WTERMSIG(status)));
(void) sigemptyset(&sa.sa_mask);
sa.sa_handler = _nop;
sa.sa_flags = SA_NOCLDSTOP;
(void) sigaction(SIGCHLD, &sa, NULL);
for (_reap_children_stop = B_FALSE; !_reap_children_stop; ) {
(void) pthread_mutex_lock(&_launched_processes_lock);
pid = wait4(0, &status, WNOHANG, &usage);
if (pid == 0 || pid == (pid_t)-1) {
(void) pthread_mutex_unlock(&_launched_processes_lock);
if (pid == 0 || errno == ECHILD)
pause();
else if (errno != EINTR)
zed_log_msg(LOG_WARNING,
"Failed to wait for children: %s",
strerror(errno));
} else {
zed_log_msg(LOG_INFO,
"Finished \"%s\" eid=%llu pid=%d status=0x%X",
prog, eid, (unsigned int) status);
memset(&node, 0, sizeof (node));
node.pid = pid;
pnode = avl_find(&_launched_processes, &node, NULL);
if (pnode) {
memcpy(&node, pnode, sizeof (node));
avl_remove(&_launched_processes, pnode);
free(pnode);
}
(void) pthread_mutex_unlock(&_launched_processes_lock);
__atomic_add_fetch(&_launched_processes_limit, 1,
__ATOMIC_SEQ_CST);
usage.ru_utime.tv_sec += usage.ru_stime.tv_sec;
usage.ru_utime.tv_usec += usage.ru_stime.tv_usec;
usage.ru_utime.tv_sec +=
usage.ru_utime.tv_usec / (1000 * 1000);
usage.ru_utime.tv_usec %= 1000 * 1000;
if (WIFEXITED(status)) {
zed_log_msg(LOG_INFO,
"Finished \"%s\" eid=%llu pid=%d "
"time=%llu.%06us exit=%d",
node.name, node.eid, pid,
(unsigned long long) usage.ru_utime.tv_sec,
(unsigned int) usage.ru_utime.tv_usec,
WEXITSTATUS(status));
} else if (WIFSIGNALED(status)) {
zed_log_msg(LOG_INFO,
"Finished \"%s\" eid=%llu pid=%d "
"time=%llu.%06us sig=%d/%s",
node.name, node.eid, pid,
(unsigned long long) usage.ru_utime.tv_sec,
(unsigned int) usage.ru_utime.tv_usec,
WTERMSIG(status),
strsignal(WTERMSIG(status)));
} else {
zed_log_msg(LOG_INFO,
"Finished \"%s\" eid=%llu pid=%d "
"time=%llu.%06us status=0x%X",
node.name, node.eid,
(unsigned long long) usage.ru_utime.tv_sec,
(unsigned int) usage.ru_utime.tv_usec,
(unsigned int) status);
}
free(node.name);
}
break;
}
/*
* kill child process after 10 seconds
*/
if (wpid == 0) {
zed_log_msg(LOG_WARNING, "Killing hung \"%s\" pid=%d",
prog, pid);
(void) kill(pid, SIGKILL);
return (NULL);
}
void
zed_exec_fini(void)
{
struct launched_process_node *node;
void *ck = NULL;
if (_reap_children_tid == (pthread_t)-1)
return;
_reap_children_stop = B_TRUE;
(void) pthread_kill(_reap_children_tid, SIGCHLD);
(void) pthread_join(_reap_children_tid, NULL);
while ((node = avl_destroy_nodes(&_launched_processes, &ck)) != NULL) {
free(node->name);
free(node);
}
avl_destroy(&_launched_processes);
(void) pthread_mutex_destroy(&_launched_processes_lock);
(void) pthread_mutex_init(&_launched_processes_lock, NULL);
_reap_children_tid = (pthread_t)-1;
}
/*
* Process the event [eid] by synchronously invoking all zedlets with a
* matching class prefix.
*
* Each executable in [zedlets] from the directory [dir] is matched against
* the event's [class], [subclass], and the "all" class (which matches
* all events). Every zedlet with a matching class prefix is invoked.
* Each executable in [zcp->zedlets] from the directory [zcp->zedlet_dir]
* is matched against the event's [class], [subclass], and the "all" class
* (which matches all events).
* Every zedlet with a matching class prefix is invoked.
* The NAME=VALUE strings in [envs] will be passed to the zedlet as
* environment variables.
*
* The file descriptor [zfd] is the zevent_fd used to track the
* The file descriptor [zcp->zevent_fd] is the zevent_fd used to track the
* current cursor location within the zevent nvlist.
*
* Return 0 on success, -1 on error.
*/
int
zed_exec_process(uint64_t eid, const char *class, const char *subclass,
const char *dir, zed_strings_t *zedlets, zed_strings_t *envs, int zfd)
struct zed_conf *zcp, zed_strings_t *envs)
{
const char *class_strings[4];
const char *allclass = "all";
@@ -201,9 +323,22 @@ zed_exec_process(uint64_t eid, const char *class, const char *subclass,
char **e;
int n;
if (!dir || !zedlets || !envs || zfd < 0)
if (!zcp->zedlet_dir || !zcp->zedlets || !envs || zcp->zevent_fd < 0)
return (-1);
if (_reap_children_tid == (pthread_t)-1) {
_launched_processes_limit = zcp->max_jobs;
if (pthread_create(&_reap_children_tid, NULL,
_reap_children, NULL) != 0)
return (-1);
pthread_setname_np(_reap_children_tid, "reap ZEDLETs");
avl_create(&_launched_processes, _launched_process_node_compare,
sizeof (struct launched_process_node),
offsetof(struct launched_process_node, node));
}
csp = class_strings;
if (class)
@@ -219,11 +354,13 @@ zed_exec_process(uint64_t eid, const char *class, const char *subclass,
e = _zed_exec_create_env(envs);
for (z = zed_strings_first(zedlets); z; z = zed_strings_next(zedlets)) {
for (z = zed_strings_first(zcp->zedlets); z;
z = zed_strings_next(zcp->zedlets)) {
for (csp = class_strings; *csp; csp++) {
n = strlen(*csp);
if ((strncmp(z, *csp, n) == 0) && !isalpha(z[n]))
_zed_exec_fork_child(eid, dir, z, e, zfd);
_zed_exec_fork_child(eid, zcp->zedlet_dir,
z, e, zcp->zevent_fd, zcp->do_foreground);
}
}
free(e);
+8 -5
View File
@@ -1,9 +1,9 @@
/*
* This file is part of the ZFS Event Daemon (ZED)
* for ZFS on Linux (ZoL) <http://zfsonlinux.org/>.
* This file is part of the ZFS Event Daemon (ZED).
*
* Developed at Lawrence Livermore National Laboratory (LLNL-CODE-403049).
* Copyright (C) 2013-2014 Lawrence Livermore National Security, LLC.
* Refer to the ZoL git commit log for authoritative copyright attribution.
* Refer to the OpenZFS git commit log for authoritative copyright attribution.
*
* The contents of this file are subject to the terms of the
* Common Development and Distribution License Version 1.0 (CDDL-1.0).
@@ -16,9 +16,12 @@
#define ZED_EXEC_H
#include <stdint.h>
#include "zed_strings.h"
#include "zed_conf.h"
void zed_exec_fini(void);
int zed_exec_process(uint64_t eid, const char *class, const char *subclass,
const char *dir, zed_strings_t *zedlets, zed_strings_t *envs,
int zevent_fd);
struct zed_conf *zcp, zed_strings_t *envs);
#endif /* !ZED_EXEC_H */
+24 -99
View File
@@ -1,9 +1,9 @@
/*
* This file is part of the ZFS Event Daemon (ZED)
* for ZFS on Linux (ZoL) <http://zfsonlinux.org/>.
* This file is part of the ZFS Event Daemon (ZED).
*
* Developed at Lawrence Livermore National Laboratory (LLNL-CODE-403049).
* Copyright (C) 2013-2014 Lawrence Livermore National Security, LLC.
* Refer to the ZoL git commit log for authoritative copyright attribution.
* Refer to the OpenZFS git commit log for authoritative copyright attribution.
*
* The contents of this file are subject to the terms of the
* Common Development and Distribution License Version 1.0 (CDDL-1.0).
@@ -12,72 +12,17 @@
* You may not use this file except in compliance with the license.
*/
#include <dirent.h>
#include <errno.h>
#include <fcntl.h>
#include <limits.h>
#include <string.h>
#include <sys/resource.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>
#include "zed_file.h"
#include "zed_log.h"
/*
* Read up to [n] bytes from [fd] into [buf].
* Return the number of bytes read, 0 on EOF, or -1 on error.
*/
ssize_t
zed_file_read_n(int fd, void *buf, size_t n)
{
unsigned char *p;
size_t n_left;
ssize_t n_read;
p = buf;
n_left = n;
while (n_left > 0) {
if ((n_read = read(fd, p, n_left)) < 0) {
if (errno == EINTR)
continue;
else
return (-1);
} else if (n_read == 0) {
break;
}
n_left -= n_read;
p += n_read;
}
return (n - n_left);
}
/*
* Write [n] bytes from [buf] out to [fd].
* Return the number of bytes written, or -1 on error.
*/
ssize_t
zed_file_write_n(int fd, void *buf, size_t n)
{
const unsigned char *p;
size_t n_left;
ssize_t n_written;
p = buf;
n_left = n;
while (n_left > 0) {
if ((n_written = write(fd, p, n_left)) < 0) {
if (errno == EINTR)
continue;
else
return (-1);
}
n_left -= n_written;
p += n_written;
}
return (n);
}
/*
* Set an exclusive advisory lock on the open file descriptor [fd].
* Return 0 on success, 1 if a conflicting lock is held by another process,
@@ -159,6 +104,13 @@ zed_file_is_locked(int fd)
return (lock.l_pid);
}
#if __APPLE__
#define PROC_SELF_FD "/dev/fd"
#else /* Linux-compatible layout */
#define PROC_SELF_FD "/proc/self/fd"
#endif
/*
* Close all open file descriptors greater than or equal to [lowfd].
* Any errors encountered while closing file descriptors are ignored.
@@ -166,51 +118,24 @@ zed_file_is_locked(int fd)
void
zed_file_close_from(int lowfd)
{
const int maxfd_def = 256;
int errno_bak;
struct rlimit rl;
int maxfd;
int errno_bak = errno;
int maxfd = 0;
int fd;
DIR *fddir;
struct dirent *fdent;
errno_bak = errno;
if (getrlimit(RLIMIT_NOFILE, &rl) < 0) {
maxfd = maxfd_def;
} else if (rl.rlim_max == RLIM_INFINITY) {
maxfd = maxfd_def;
if ((fddir = opendir(PROC_SELF_FD)) != NULL) {
while ((fdent = readdir(fddir)) != NULL) {
fd = atoi(fdent->d_name);
if (fd > maxfd && fd != dirfd(fddir))
maxfd = fd;
}
(void) closedir(fddir);
} else {
maxfd = rl.rlim_max;
maxfd = sysconf(_SC_OPEN_MAX);
}
for (fd = lowfd; fd < maxfd; fd++)
(void) close(fd);
errno = errno_bak;
}
/*
* Set the CLOEXEC flag on file descriptor [fd] so it will be automatically
* closed upon successful execution of one of the exec functions.
* Return 0 on success, or -1 on error.
*
* FIXME: No longer needed?
*/
int
zed_file_close_on_exec(int fd)
{
int flags;
if (fd < 0) {
errno = EBADF;
return (-1);
}
flags = fcntl(fd, F_GETFD);
if (flags == -1)
return (-1);
flags |= FD_CLOEXEC;
if (fcntl(fd, F_SETFD, flags) == -1)
return (-1);
return (0);
}
+3 -9
View File
@@ -1,9 +1,9 @@
/*
* This file is part of the ZFS Event Daemon (ZED)
* for ZFS on Linux (ZoL) <http://zfsonlinux.org/>.
* This file is part of the ZFS Event Daemon (ZED).
*
* Developed at Lawrence Livermore National Laboratory (LLNL-CODE-403049).
* Copyright (C) 2013-2014 Lawrence Livermore National Security, LLC.
* Refer to the ZoL git commit log for authoritative copyright attribution.
* Refer to the OpenZFS git commit log for authoritative copyright attribution.
*
* The contents of this file are subject to the terms of the
* Common Development and Distribution License Version 1.0 (CDDL-1.0).
@@ -18,10 +18,6 @@
#include <sys/types.h>
#include <unistd.h>
ssize_t zed_file_read_n(int fd, void *buf, size_t n);
ssize_t zed_file_write_n(int fd, void *buf, size_t n);
int zed_file_lock(int fd);
int zed_file_unlock(int fd);
@@ -30,6 +26,4 @@ pid_t zed_file_is_locked(int fd);
void zed_file_close_from(int fd);
int zed_file_close_on_exec(int fd);
#endif /* !ZED_FILE_H */
+3 -3
View File
@@ -1,9 +1,9 @@
/*
* This file is part of the ZFS Event Daemon (ZED)
* for ZFS on Linux (ZoL) <http://zfsonlinux.org/>.
* This file is part of the ZFS Event Daemon (ZED).
*
* Developed at Lawrence Livermore National Laboratory (LLNL-CODE-403049).
* Copyright (C) 2013-2014 Lawrence Livermore National Security, LLC.
* Refer to the ZoL git commit log for authoritative copyright attribution.
* Refer to the OpenZFS git commit log for authoritative copyright attribution.
*
* The contents of this file are subject to the terms of the
* Common Development and Distribution License Version 1.0 (CDDL-1.0).
+3 -3
View File
@@ -1,9 +1,9 @@
/*
* This file is part of the ZFS Event Daemon (ZED)
* for ZFS on Linux (ZoL) <http://zfsonlinux.org/>.
* This file is part of the ZFS Event Daemon (ZED).
*
* Developed at Lawrence Livermore National Laboratory (LLNL-CODE-403049).
* Copyright (C) 2013-2014 Lawrence Livermore National Security, LLC.
* Refer to the ZoL git commit log for authoritative copyright attribution.
* Refer to the OpenZFS git commit log for authoritative copyright attribution.
*
* The contents of this file are subject to the terms of the
* Common Development and Distribution License Version 1.0 (CDDL-1.0).
+4 -4
View File
@@ -1,9 +1,9 @@
/*
* This file is part of the ZFS Event Daemon (ZED)
* for ZFS on Linux (ZoL) <http://zfsonlinux.org/>.
* This file is part of the ZFS Event Daemon (ZED).
*
* Developed at Lawrence Livermore National Laboratory (LLNL-CODE-403049).
* Copyright (C) 2013-2014 Lawrence Livermore National Security, LLC.
* Refer to the ZoL git commit log for authoritative copyright attribution.
* Refer to the OpenZFS git commit log for authoritative copyright attribution.
*
* The contents of this file are subject to the terms of the
* Common Development and Distribution License Version 1.0 (CDDL-1.0).
@@ -108,7 +108,7 @@ _zed_strings_node_destroy(zed_strings_node_t *np)
* If [key] is specified, it will be used to index the node; otherwise,
* the string [val] will be used.
*/
zed_strings_node_t *
static zed_strings_node_t *
_zed_strings_node_create(const char *key, const char *val)
{
zed_strings_node_t *np;
+3 -3
View File
@@ -1,9 +1,9 @@
/*
* This file is part of the ZFS Event Daemon (ZED)
* for ZFS on Linux (ZoL) <http://zfsonlinux.org/>.
* This file is part of the ZFS Event Daemon (ZED).
*
* Developed at Lawrence Livermore National Laboratory (LLNL-CODE-403049).
* Copyright (C) 2013-2014 Lawrence Livermore National Security, LLC.
* Refer to the ZoL git commit log for authoritative copyright attribution.
* Refer to the OpenZFS git commit log for authoritative copyright attribution.
*
* The contents of this file are subject to the terms of the
* Common Development and Distribution License Version 1.0 (CDDL-1.0).
+12 -8
View File
@@ -1,9 +1,5 @@
include $(top_srcdir)/config/Rules.am
DEFAULT_INCLUDES += \
-I$(top_srcdir)/include \
-I$(top_srcdir)/lib/libspl/include
sbin_PROGRAMS = zfs
zfs_SOURCES = \
@@ -15,7 +11,15 @@ zfs_SOURCES = \
zfs_projectutil.h
zfs_LDADD = \
$(top_builddir)/lib/libnvpair/libnvpair.la \
$(top_builddir)/lib/libuutil/libuutil.la \
$(top_builddir)/lib/libzfs/libzfs.la \
$(top_builddir)/lib/libzfs_core/libzfs_core.la
$(abs_top_builddir)/lib/libzfs/libzfs.la \
$(abs_top_builddir)/lib/libzfs_core/libzfs_core.la \
$(abs_top_builddir)/lib/libnvpair/libnvpair.la \
$(abs_top_builddir)/lib/libuutil/libuutil.la
zfs_LDADD += $(LTLIBINTL)
if BUILD_FREEBSD
zfs_LDADD += -lgeom -ljail
endif
include $(top_srcdir)/config/CppCheck.am
+913 -318
View File
File diff suppressed because it is too large Load Diff
+1 -1
View File
@@ -33,7 +33,7 @@ extern "C" {
void * safe_malloc(size_t size);
void nomem(void);
libzfs_handle_t *g_zfs;
extern libzfs_handle_t *g_zfs;
#ifdef __cplusplus
}
+1
View File
@@ -0,0 +1 @@
zfs_ids_to_path
+11
View File
@@ -0,0 +1,11 @@
include $(top_srcdir)/config/Rules.am
sbin_PROGRAMS = zfs_ids_to_path
zfs_ids_to_path_SOURCES = \
zfs_ids_to_path.c
zfs_ids_to_path_LDADD = \
$(abs_top_builddir)/lib/libzfs/libzfs.la
include $(top_srcdir)/config/CppCheck.am
+96
View File
@@ -0,0 +1,96 @@
/*
* CDDL HEADER START
*
* The contents of this file are subject to the terms of the
* Common Development and Distribution License (the "License").
* You may not use this file except in compliance with the License.
*
* You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
* or http://www.opensolaris.org/os/licensing.
* See the License for the specific language governing permissions
* and limitations under the License.
*
* When distributing Covered Code, include this CDDL HEADER in each
* file and include the License file at usr/src/OPENSOLARIS.LICENSE.
* If applicable, add the following below this CDDL HEADER, with the
* fields enclosed by brackets "[]" replaced with your own identifying
* information: Portions Copyright [yyyy] [name of copyright owner]
*
* CDDL HEADER END
*/
/*
* Copyright (c) 2019 by Delphix. All rights reserved.
*/
#include <libintl.h>
#include <unistd.h>
#include <sys/types.h>
#include <stdint.h>
#include <libzfs.h>
#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
libzfs_handle_t *g_zfs;
static void
usage(int err)
{
fprintf(stderr, "Usage: zfs_ids_to_path [-v] <pool> <objset id> "
"<object id>\n");
exit(err);
}
int
main(int argc, char **argv)
{
boolean_t verbose = B_FALSE;
int c;
while ((c = getopt(argc, argv, "v")) != -1) {
switch (c) {
case 'v':
verbose = B_TRUE;
break;
}
}
argc -= optind;
argv += optind;
if (argc != 3) {
(void) fprintf(stderr, "Incorrect number of arguments: %d\n",
argc);
usage(1);
}
uint64_t objset, object;
if (sscanf(argv[1], "%llu", (u_longlong_t *)&objset) != 1) {
(void) fprintf(stderr, "Invalid objset id: %s\n", argv[1]);
usage(2);
}
if (sscanf(argv[2], "%llu", (u_longlong_t *)&object) != 1) {
(void) fprintf(stderr, "Invalid object id: %s\n", argv[2]);
usage(3);
}
if ((g_zfs = libzfs_init()) == NULL) {
(void) fprintf(stderr, "%s\n", libzfs_error_init(errno));
return (4);
}
zpool_handle_t *pool = zpool_open(g_zfs, argv[0]);
if (pool == NULL) {
fprintf(stderr, "Could not open pool %s\n", argv[0]);
libzfs_fini(g_zfs);
return (5);
}
char pathname[PATH_MAX * 2];
if (verbose) {
zpool_obj_to_path_ds(pool, objset, object, pathname,
sizeof (pathname));
} else {
zpool_obj_to_path(pool, objset, object, pathname,
sizeof (pathname));
}
printf("%s\n", pathname);
zpool_close(pool);
libzfs_fini(g_zfs);
return (0);
}
+1
View File
@@ -0,0 +1 @@
/zgenhostid
+7 -1
View File
@@ -1 +1,7 @@
dist_bin_SCRIPTS = zgenhostid
include $(top_srcdir)/config/Rules.am
sbin_PROGRAMS = zgenhostid
zgenhostid_SOURCES = zgenhostid.c
include $(top_srcdir)/config/CppCheck.am
-61
View File
@@ -1,61 +0,0 @@
#!/bin/bash
# Emulate genhostid(1) available on RHEL/CENTOS, for use on distros
# which do not provide that utility.
#
# Usage:
# zgenhostid
# zgenhostid <value>
#
# If /etc/hostid already exists and is size > 0, the script exits immediately
# and changes nothing. Unlike genhostid, this generates an error message.
#
# The first form generates a random hostid and stores it in /etc/hostid.
# The second form checks that the provided value is between 0x1 and 0xFFFFFFFF
# and if so, stores it in /etc/hostid. This form is not supported by
# genhostid(1).
hostid_file=/etc/hostid
function usage {
echo "$0 [value]"
echo "If $hostid_file is not present, store a hostid in it." >&2
echo "The optional value must be an 8-digit hex number between" >&2
echo "1 and 2^32-1. If no value is provided, a random one will" >&2
echo "be generated. The value must be unique among your systems." >&2
}
# hostid(1) ignores contents of /etc/hostid if size < 4 bytes. It would
# be better if this checked size >= 4 bytes but it the method must be
# widely portable.
if [ -s $hostid_file ]; then
echo "$hostid_file already exists. No change made." >&2
exit 1
fi
if [ -n "$1" ]; then
host_id=$1
else
# $RANDOM goes from 0..32k-1
number=$((((RANDOM % 4) * 32768 + RANDOM) * 32768 + RANDOM))
host_id=$(printf "%08x" $number)
fi
if egrep -o '^0{8}$' <<< $host_id >/dev/null 2>&1; then
usage
exit 2
fi
if ! egrep -o '^[a-fA-F0-9]{8}$' <<< $host_id >/dev/null 2>&1; then
usage
exit 3
fi
a=${host_id:6:2}
b=${host_id:4:2}
c=${host_id:2:2}
d=${host_id:0:2}
echo -ne \\x$a\\x$b\\x$c\\x$d > $hostid_file
exit 0
+142
View File
@@ -0,0 +1,142 @@
/*
* CDDL HEADER START
*
* The contents of this file are subject to the terms of the
* Common Development and Distribution License (the "License").
* You may not use this file except in compliance with the License.
*
* You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
* or http://www.opensolaris.org/os/licensing.
* See the License for the specific language governing permissions
* and limitations under the License.
*
* When distributing Covered Code, include this CDDL HEADER in each
* file and include the License file at usr/src/OPENSOLARIS.LICENSE.
* If applicable, add the following below this CDDL HEADER, with the
* fields enclosed by brackets "[]" replaced with your own identifying
* information: Portions Copyright [yyyy] [name of copyright owner]
*
* CDDL HEADER END
*/
/*
* Copyright (c) 2020, Georgy Yakovlev. All rights reserved.
*/
#include <errno.h>
#include <fcntl.h>
#include <getopt.h>
#include <inttypes.h>
#include <limits.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <time.h>
#include <unistd.h>
static void
usage(void)
{
(void) fprintf(stderr,
"usage: zgenhostid [-fh] [-o path] [value]\n\n"
" -f\t\t force hostid file write\n"
" -h\t\t print this usage and exit\n"
" -o <filename>\t write hostid to this file\n\n"
"If hostid file is not present, store a hostid in it.\n"
"The optional value should be an 8-digit hex number between"
" 1 and 2^32-1.\n"
"If the value is 0 or no value is provided, a random one"
" will be generated.\n"
"The value must be unique among your systems.\n");
exit(EXIT_FAILURE);
/* NOTREACHED */
}
int
main(int argc, char **argv)
{
/* default file path, can be optionally set by user */
const char *path = "/etc/hostid";
/* holds converted user input or lrand48() generated value */
unsigned long input_i = 0;
int opt;
int force_fwrite = 0;
while ((opt = getopt_long(argc, argv, "fo:h?", 0, 0)) != -1) {
switch (opt) {
case 'f':
force_fwrite = 1;
break;
case 'o':
path = optarg;
break;
case 'h':
case '?':
usage();
}
}
char *in_s = argv[optind];
if (in_s != NULL) {
/* increment pointer by 2 if string is 0x prefixed */
if (strncasecmp("0x", in_s, 2) == 0) {
in_s += 2;
}
/* need to be exactly 8 characters */
const char *hex = "0123456789abcdefABCDEF";
if (strlen(in_s) != 8 || strspn(in_s, hex) != 8) {
fprintf(stderr, "%s\n", strerror(ERANGE));
usage();
}
input_i = strtoul(in_s, NULL, 16);
if (errno != 0) {
perror("strtoul");
exit(EXIT_FAILURE);
}
if (input_i > UINT32_MAX) {
fprintf(stderr, "%s\n", strerror(ERANGE));
usage();
}
}
struct stat fstat;
if (force_fwrite == 0 && stat(path, &fstat) == 0 &&
S_ISREG(fstat.st_mode)) {
fprintf(stderr, "%s: %s\n", path, strerror(EEXIST));
exit(EXIT_FAILURE);
}
/*
* generate if not provided by user
* also handle unlikely zero return from lrand48()
*/
while (input_i == 0) {
srand48(getpid() ^ time(NULL));
input_i = lrand48();
}
FILE *fp = fopen(path, "wb");
if (!fp) {
perror("fopen");
exit(EXIT_FAILURE);
}
/*
* we need just 4 bytes in native endianness
* not using sethostid() because it may be missing or just a stub
*/
uint32_t hostid = input_i;
int written = fwrite(&hostid, 1, 4, fp);
if (written != 4) {
perror("fwrite");
exit(EXIT_FAILURE);
}
fclose(fp);
exit(EXIT_SUCCESS);
}
+7 -5
View File
@@ -1,8 +1,7 @@
include $(top_srcdir)/config/Rules.am
DEFAULT_INCLUDES += \
-I$(top_srcdir)/include \
-I$(top_srcdir)/lib/libspl/include
# Unconditionally enable debugging for zhack
AM_CPPFLAGS += -DDEBUG -UNDEBUG -DZFS_DEBUG
sbin_PROGRAMS = zhack
@@ -10,5 +9,8 @@ zhack_SOURCES = \
zhack.c
zhack_LDADD = \
$(top_builddir)/lib/libnvpair/libnvpair.la \
$(top_builddir)/lib/libzpool/libzpool.la
$(abs_top_builddir)/lib/libzpool/libzpool.la \
$(abs_top_builddir)/lib/libzfs_core/libzfs_core.la \
$(abs_top_builddir)/lib/libnvpair/libnvpair.la
include $(top_srcdir)/config/CppCheck.am
+5 -3
View File
@@ -103,8 +103,8 @@ fatal(spa_t *spa, void *tag, const char *fmt, ...)
/* ARGSUSED */
static int
space_delta_cb(dmu_object_type_t bonustype, void *data,
uint64_t *userp, uint64_t *groupp, uint64_t *projectp)
space_delta_cb(dmu_object_type_t bonustype, const void *data,
zfs_file_info_t *zoi)
{
/*
* Is it a valid type of object to track?
@@ -126,7 +126,8 @@ zhack_import(char *target, boolean_t readonly)
nvlist_t *props;
int error;
kernel_init(readonly ? FREAD : (FREAD | FWRITE));
kernel_init(readonly ? SPA_MODE_READ :
(SPA_MODE_READ | SPA_MODE_WRITE));
dmu_objset_register_type(DMU_OST_ZFS, space_delta_cb);
@@ -149,6 +150,7 @@ zhack_import(char *target, boolean_t readonly)
zfeature_checks_disable = B_TRUE;
error = spa_import(target, config, props,
(readonly ? ZFS_IMPORT_SKIP_MMP : ZFS_IMPORT_NORMAL));
fnvlist_free(config);
zfeature_checks_disable = B_FALSE;
if (error == EEXIST)
error = 0;
+5 -6
View File
@@ -1,9 +1,5 @@
include $(top_srcdir)/config/Rules.am
DEFAULT_INCLUDES += \
-I$(top_srcdir)/include \
-I$(top_srcdir)/lib/libspl/include
sbin_PROGRAMS = zinject
zinject_SOURCES = \
@@ -12,5 +8,8 @@ zinject_SOURCES = \
zinject.h
zinject_LDADD = \
$(top_builddir)/lib/libnvpair/libnvpair.la \
$(top_builddir)/lib/libzfs/libzfs.la
$(abs_top_builddir)/lib/libzfs/libzfs.la \
$(abs_top_builddir)/lib/libzfs_core/libzfs_core.la \
$(abs_top_builddir)/lib/libnvpair/libnvpair.la
include $(top_srcdir)/config/CppCheck.am

Some files were not shown because too many files have changed in this diff Show More