Compare commits

..

520 Commits

Author SHA1 Message Date
Tony Hutter ad81baab77 Tag zfs-2.0.7
META file and changelog updated.

Signed-off-by: Tony Hutter <hutter2@llnl.gov>
2021-12-22 09:47:58 -08:00
Brian Behlendorf a4831fa023 Fix zvol_open() lock inversion
When restructuring the zvol_open() logic for the Linux 5.13 kernel
a lock inversion was accidentally introduced.  In the updated code
the spa_namespace_lock is now taken before the zv_suspend_lock
allowing the following scenario to occur:

    down_read <=== waiting for zv_suspend_lock
    zvol_open <=== holds spa_namespace_lock
    __blkdev_get
    blkdev_get_by_dev
    blkdev_open
    ...

     mutex_lock <== waiting for spa_namespace_lock
     spa_open_common
     spa_open
     dsl_pool_hold
     dmu_objset_hold_flags
     dmu_objset_hold
     dsl_prop_get
     dsl_prop_get_integer
     zvol_create_minor
     dmu_recv_end
     zfs_ioc_recv_impl <=== holds zv_suspend_lock via zvol_suspend()
     zfs_ioc_recv
     ...

This commit resolves the issue by moving the acquisition of the
spa_namespace_lock back to after the zv_suspend_lock which restores
the original ordering.

Additionally, as part of this change the error exit paths were
simplified where possible.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Rich Ercolani <rincebrain@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12863
2021-12-22 09:31:58 -08:00
Ryan Moeller b327131a1f FreeBSD: Add vop_standard_writecount_nomsync
https://cgit.freebsd.org/src/commit?id=3ffcfa599e29686cf2b3c1a6087408c37acaed78

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #12828
2021-12-13 12:26:20 -08:00
Ryan Moeller 2ed7c54654 FreeBSD: Catch up with more VFS changes
Unused thread argument was removed from NDINIT*

https://cgit.freebsd.org/src/commit?id=7e1d3eefd410ca0fbae5a217422821244c3eeee4

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #12828
2021-12-13 12:25:53 -08:00
Pawel Jakub Dawidek 70fdb198ad Remove (now unused) td argument from zfs_lookup()
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Pawel Jakub Dawidek <pawel@dawidek.net>
Closes #12748
2021-12-13 12:25:36 -08:00
Ryan Moeller 7100db9793 FreeBSD: Implement xattr=sa
FreeBSD historically has not cared about the xattr property; it was
always treated as xattr=on.  With xattr=on, xattrs are stored as files
in a hidden xattr directory.  With xattr=sa, xattrs are stored as
system attributes and get cached in nvlists during xattr operations.
This makes SA xattrs simpler and more efficient to manipulate.  FreeBSD
needs to implement the SA xattr operations for feature parity with
Linux and to ensure that SA xattrs are accessible when migrated or
replicated from Linux.

Following the example set by Linux, refactor our existing extattr vnops
to split off the parts handling dir style xattrs, and add the
corresponding SA handling parts.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11997
2021-12-13 12:22:11 -08:00
Mark Johnston 57f43735ca Fix several bugs in the FreeBSD rename VOP implementation
- To avoid a use-after-free, zfsvfs->z_log needs to be loaded after the
  teardown lock is acquired with ZFS_ENTER().
- Avoid leaking vnode locks in zfs_rename_relock() and zfs_rename_()
  when the ZFS_ENTER() macros forces an early return.

Refactor the rename implementation so that ZFS_ENTER() can be used
safely.  As a bonus, this lets us use the ZFS_VERIFY_ZP() macro instead
of open-coding its implementation.

Reported-by: Peter Holm <pho@FreeBSD.org>
Tested-by: Peter Holm <pho@FreeBSD.org>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Sponsored-by: The FreeBSD Foundation
Closes #12717
2021-12-13 11:56:03 -08:00
Mark Johnston 47f098d2db Exit the teardown section later in rename on FreeBSD
We have to hold the teardown lock while dereferencing zfsvfs->z_os and,
I believe, when committing to the ZIL.

Note that jumping to the "out" label, "error" is always non-zero.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes #12704
2021-12-13 10:43:22 -08:00
Mark Johnston d94d1a589c Fix potential use-after-frees in FreeBSD getpages and setattr VOPs
The objset object is reallocated during certain dataset operations, such
as rollbacks, so the objset pointer must be loaded after acquiring the
teardown lock.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes #12704
2021-12-13 10:43:15 -08:00
Brian Behlendorf 4bbffa2489 ZTS: import_rewind_device_replaced reliably fails
The import_rewind_device_replaced.ksh test was never entirely reliable
because it depends on MOS data not being overwritten.  The MOS data is
not protected by the snapshot so occasional failures were always
expected.  However, this test is now failing reliably on all platforms
indicating something has changed in the code since the test was marked
"maybe".  Convert the test to a "known" failure until the root cause
is identified and resolved.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12821
2021-12-08 15:01:44 -08:00
Brian Behlendorf d1bf6c7251 Linux 5.15 compat: META (#12824)
The final 5.15 kernel is available and has been tested.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2021-12-07 17:03:37 -08:00
Paul Dagnelie 1fbfc022c9 ZFS send/recv with ashift 9->12 leads to data corruption
Improve the ability of zfs send to determine if a block is compressed
or not by using information contained in the blkptr.

Reviewed-by: Rich Ercolani <rincebrain@gmail.com>
Reviewed-by: Matthew Ahrens <matthew.ahrens@delphix.com>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #12770
2021-12-07 17:03:17 -08:00
Coleman Kane 2a21853b75 Linux 5.16: Resolve ZSTD_isError symbol collision in Linux kernel
Newer zstd code introduced in the main kernel tree now creates a symbol
collision with ZSTD_isError in our ZSTD code. This change relabels our
implementation with a ZFS-specific symbol name, and undoes some
macro-based micro-optimizations that conflict with the attempt to rename
our internal-use version.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #12819
2021-12-07 13:15:12 -08:00
Coleman Kane 6a3ba7538a Linux 5.16: The blk-cgroup.h header is where struct blkcg_gq is defined
The definition of struct blkcg_gq was moved into blk-cgroup.h, which is
a header that's been in Linux since 2015. This is used by
vdev_blkg_tryget() in module/os/linux/zfs/vdev_disk.c. Since the kernel
for CentOS 7 and similar-generation releases doesn't have this header,
its inclusion is guarded by a configure test.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #12819
2021-12-07 13:15:12 -08:00
Coleman Kane 94e49866e9 Linux 5.16: bio_set_dev is no longer a helper macro
This change adds a confiugre check to determine if bio_set_dev is a
helper macro or not. If not, then the attempt to override its internal
call to bio_associate_blkg(), with a macro definition to our own
version, is no longer possible, as the compiler won't use it when
compiling the new inline function replacement implemented in the header.
This change also creates a new vdev_bio_set_dev() function that performs
the same work, and also performs the work implemented in
vdev_bio_associate_blkg(), as it is the only thing calling that function
in our code. Our custom vdev_bio_associate_blkg() is now only compiled
if the bio_set_dev() is a macro in the Linux headers.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #12819
2021-12-07 13:15:12 -08:00
Coleman Kane eb9d35e641 Linux 5.16: type member of iov_iter renamed iter_type
The iov_iter->type member was renamed iov_iter->iter_type. However,
while looking into this, realized that in 2018 a iov_iter_type(*iov)
accessor function was introduced. So if that is present, use it,
otherwise fall back to trying the existing behavior of directly
accessing type from iov_iter.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #12819
2021-12-07 13:15:11 -08:00
Coleman Kane 649b3ce987 Linux 5.16: block_device_operations->submit_bio now returns void
The return type for the submit_bio member of struct
block_device_operations was changed to no longer return a value.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #12819
2021-12-07 13:15:11 -08:00
Coleman Kane 541fe3d598 Linux 5.16 compat: asm/fpu/xcr.h is new location for xgetbv/xsetbv
Linux 5.16 moved these functions into this new header in commit
1b4fb8545f2b00f2844c4b7619d64d98440a477c. This change adds code to look
for the presence of this header, and include it so that the code using
xgetbv & xsetbv will compile again.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #12800
2021-12-07 13:15:11 -08:00
Alexander Motin c84f9be5c7 FreeBSD: avoid memory allocation in arc_prune_async
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Closes #12049
2021-12-07 11:18:28 -08:00
наб 5ea770b0a2 tests/file_check: remove unused variable
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12187
2021-12-06 13:52:41 -08:00
John Wren Kennedy 3404e887b1 Strip colons from all test result filenames
The upload artifact functionality in github can't handle colons in
filenames. The current code handles this for files under the most
recent set of results. With the ability to rerun failed tests, now
there can be multiple sets of results, and they all need to be
processed in the same way.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: John Kennedy <john.kennedy@delphix.com>
Closes #12815
2021-12-06 13:41:04 -08:00
Brian Behlendorf 2915930f89 Linux 5.13 compat: retry zvol_open() when contended
Due to a possible lock inversion the zvol open call path on Linux
needs to be able to retry in the case where the spa_namespace_lock
cannot be acquired.

For Linux 5.12 an older kernel this was accomplished by returning
-ERESTARTSYS from zvol_open() to request that blkdev_get() drop
the bdev->bd_mutex lock, reaquire it, then call the open callback
again.  However, as of the 5.13 kernel this behavior was removed.

Therefore, for 5.12 and older kernels we preserved the existing
retry logic, but for 5.13 and newer kernels we retry internally in
zvol_open().  This should always succeed except in the case where
a pool's vdev are layed on zvols, in which case it may fail.  To
handle this case vdev_disk_open() has been updated to retry when
opening a device when -ERESTARTSYS is returned.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #12301
Closes #12759
2021-12-06 13:40:59 -08:00
John Wren Kennedy 3c97d7a84b Temporarily remove tests from sanity runfile
With the addition of functionality to rerun failing tests, some
tests that fail only sometimes still fail often enough to degrade
the reliability of the sanity runs. Remove them from the runfile
until they reliably pass.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: John Kennedy <john.kennedy@delphix.com>
Closes #12814
2021-12-06 13:40:54 -08:00
Paul Dagnelie 5a80c82609 Add zfs-test facility to automatically rerun failing tests
This was a project proposed as part of the Quality theme for the
hackthon for the 2021 OpenZFS Developer Summit. The idea is to improve
the usability of the automated tests that get run when a PR is created
by having failing tests automatically rerun in order to make flaky
tests less impactful.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #12740
2021-12-06 13:40:48 -08:00
Coleman Kane 679be593dd Linux 5.16: wait_on_page_bit() no longer available to modules
Instead, linux/pagemap.h offers a number of folio-specific functions to
be called instead. In this case, module/os/linux/zfs/zfs_vnops_os.c
wants to call wait_on_page_bit(pp, PG_writeback). This gets replaced
with folio_wait_bit(folio_page(pp), PG_writeback). This change modifies
the code to conditionally compile that if configure identifies th
presence of the folio_wait_bit() function.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #12800
2021-12-06 13:40:43 -08:00
Jorgen Lundman f43b315e17 Iterate encrypted clones at zvol_create_minor
Userland figures out which encryption-root keys are required to load,
and issues ZFS_IOC_LOAD_KEY.
The tail section of spa_keystore_load_wkey() will call
zvol_create_minors() on the encryption-root object.

Any clones of the encrypted zvol will not be plumbed. This commits
adds additional logic to detect if zvol has clones, and is encrypted,
then adds these to the list of zvols to call zvol_create_minors() on.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Jorgen Lundman <lundman@lundman.net>
Closes #12471
2021-12-06 13:40:37 -08:00
Tony Hutter 922fe416b2 Update ABIs for zfs-2.0.7
In the course of cherry-picking commits, the ABI files from master
and the code in zfs-2.0.7 can get out of sync, and the ABIs need
to be regenerated.  This patch just runs 'make storeabi' on the
previous commit and checks in the new ABI files.

Signed-off-by: Tony Hutter <hutter2@llnl.gov>
2021-11-12 16:38:16 -08:00
наб e2dcc523a4 libefi: remove efi_auto_sense()
It's present (but undocumented) in the illumos gate and used exclusively
by rmformat(1) (which I recommend as a nice blast from the past),
and also the math assumes 512B sectors and is therefore wrong

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12191
2021-11-12 16:31:55 -08:00
наб dec1ea4dbc libefi: efi_get_devname: don't allocate procfs path
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12048
2021-11-12 16:31:55 -08:00
Brian Behlendorf 1a4030b1b5 cppcheck: resolve double free
The double free reported for the realloc() failure branch is a
false positive.  It should be resolved in cppcheck 2.4 but for
the benefit of older versions we supress the warning.

    https://trac.cppcheck.net/ticket/9292

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11508
2021-11-12 16:31:55 -08:00
Brian Behlendorf 33fa3985a9 Restore dirty dnode detection logic
In addition to flushing memory mapped regions when checking holes,
commit de198f2d95 modified the dirty dnode detection logic to check
the dn->dn_dirty_records instead of the dn->dn_dirty_link.  Relying
on the dirty record has not be reliable, switch back to the previous
method.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #11900 
Closes #12745
2021-11-12 16:31:55 -08:00
Brian Behlendorf a524f8d6af Fix lseek(SEEK_DATA/SEEK_HOLE) mmap consistency
When using lseek(2) to report data/holes memory mapped regions of
the file were ignored.  This could result in incorrect results.
To handle this zfs_holey_common() was updated to asynchronously
writeback any dirty mmap(2) regions prior to reporting holes.

Additionally, while not strictly required, the dn_struct_rwlock is
now held over the dirty check to prevent the dnode structure from
changing.  This ensures that a clean dnode can't be dirtied before
the data/hole is located.  The range lock is now also taken to
ensure the call cannot race with zfs_write().

Furthermore, the code was refactored to provide a dnode_is_dirty()
helper function which checks the dnode for any dirty records to
determine its dirtiness.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Rich Ercolani <rincebrain@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #11900
Closes #12724
2021-11-12 16:31:55 -08:00
Tony Hutter 4ba1a6227a zed: Control NVMe fault LEDs
The ZED code currently can only turn on the fault LED for
a faulted disk in a JBOD enclosure.  This extends support
for faulted NVMe disks as well.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #12648
Closes #12695
2021-11-12 16:31:55 -08:00
Brian Behlendorf 6de5c440fa Linux 5.16 compat: submit_bio()
The submit_bio() prototype has changed again.  The version is 5.16
still only expects a single argument but the return type has changed
to void.  Since we never used the returned value before update the
configure check to detect both single arg versions.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Alexander Lobakin <alobakin@pm.me>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12725
2021-11-12 16:31:55 -08:00
Brian Behlendorf b8b3b93ebb Linux 5.16 compat: linux/elevator.h
Commit https://github.com/torvalds/linux/commit/2e9bc346 moved
the elevator.h header under the block/ directory as part of some
refactoring.  This turns out not to be a problem since there's
no longer anything we need from the header.  This has been the
case for some time, this change removes the elevator.h include
and replaces it with a major.h include.

Reviewed-by: Alexander Lobakin <alobakin@pm.me>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12725
2021-11-12 16:31:55 -08:00
наб 586358102e zed.d/pool_import-led.sh: fix for current zpool scripts
Also minor clean-up with folding state_to_val() into a case,
unrolling the lesser-available seq into numbers,
ignoring vdev states we don't care about,
and documentation comments

Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11934
Closes #11935
2021-11-12 16:31:55 -08:00
Rich Ercolani 110d0ba5ca Revert behavior of 59eab109 on not-Linux
It turns out that short-circuiting the EFAULT behavior on a short read
breaks things on FreeBSD. So until there's a nicer solution, let's
just revert the behavior for not-Linux.

Reference:
https://reviews.freebsd.org/R10:70f51f0e474ffe1fb74cb427423a2fba3637544d

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12698
2021-11-12 16:31:55 -08:00
Rich Ercolani 2ef1ce66f5 Handle partial reads in zfs_read
Currently, dmu_read_uio_dnode can read 64K of a requested 1M in one
loop, get EFAULT back from zfs_uiomove() (because the iovec only holds
64k), and return EFAULT, which turns into EAGAIN on the way out. EAGAIN
gets interpreted as "I didn't read anything", the caller tries again
without consuming the 64k we already read, and we're stuck.

This apparently works on newer kernels because the caller which breaks
on older Linux kernels by happily passing along a 1M read request and a
64k iovec just requests 64k at a time.

With this, we now won't return EFAULT if we got a partial read.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12370 
Closes #12509
Closes #12516
2021-11-12 16:31:55 -08:00
Brian Atkinson d10c35b640 Cleaning up uio headers
Making uio_impl.h the common header interface between Linux and FreeBSD
so both OS's can share a common header file. This also helps reduce code
duplication for zfs_uio_t for each OS.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Brian Atkinson <batkinson@lanl.gov>
Closes #11622
2021-11-12 16:31:55 -08:00
Brian Atkinson 0b1e6fcc3e Extending FreeBSD UIO Struct
In FreeBSD the struct uio was just a typedef to uio_t. In order to
extend this struct, outside of the definition for the struct uio, the
struct uio has been embedded inside of a uio_t struct.

Also renamed all the uio_* interfaces to be zfs_uio_* to make it clear
this is a ZFS interface.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Brian Atkinson <batkinson@lanl.gov>
Closes #11438
2021-11-12 16:31:55 -08:00
Ryan Moeller cd4f9572d0 FreeBSD: Move uio_prefaultpages def to uio.h
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11176
2021-11-12 16:31:55 -08:00
Matthew Macy 6f59f6402d Remove UIO_ZEROCOPY functions structures
The original xuio zero copy functionality has always been unused 
on Linux and FreeBSD.  Remove this disabled code to avoid any
confusion and improve readability.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #11124
2021-11-12 16:31:55 -08:00
Ryan Moeller 3fcf17e69d FreeBSD: Catch up with recent VFS changes
cn_thread is always curthread.

https://cgit.freebsd.org/src/commit?id=b4a58fbf640409a1e507d9f7b411c83a3f83a2f3
https://cgit.freebsd.org/src/commit?id=2b68eb8e1dbbdaf6a0df1c83b26f5403ca52d4c3

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Alan Somers <asomers@gmail.com>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #12668
2021-11-12 16:31:55 -08:00
Tony Hutter 92fcbe04ba vdev_id: Fix PHY sorting
One of our developers noticed a bug in vdev_id where we were incorrectly
sorting PHYs using alphabetical sorting (which usually works) instead
of natural sorting (-v).  For example:

	[port-0:0]# ls -d phy*
	phy-0:10  phy-0:11  phy-0:8  phy-0:9

	[port-0:0]# ls -vd phy*
	phy-0:8  phy-0:9  phy-0:10  phy-0:11

This fixes the issue.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #12699
2021-11-12 16:31:55 -08:00
Tony Hutter dfbc33a0e5 vdev_id: Fix enclosure_symlinks feature
The vdev_id.conf "enclosure_symlinks" option persistently creates
and maps /dev/by-enclosure symlinks to dynamic /dev/sg* devices.

This patch fixes two issues:

1. The enclosure_symlinks feature was accidentally broken in:

   vdev_id: Support daisy-chained JBODs in multipath mode

2. Even when working, the feature numbered the enclosure
   sequentially rather than by HBA port number.  That meant that
   if a port was down or didn't appear in sysfs, then the
   enclosure_sumlinks numbers would be numbered wrong.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #12660
2021-11-12 16:31:55 -08:00
Tony Hutter 91fe213823 Rescan enclosure sysfs path on import
When you create a pool, zfs writes vd->vdev_enc_sysfs_path with the
enclosure sysfs path to the fault LEDs, like:

    vdev_enc_sysfs_path = /sys/class/enclosure/0:0:1:0/SLOT8

However, this enclosure path doesn't get updated on successive imports
even if enclosure path to the disk changes.  This patch fixes the issue.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #11950
Closes #12095
2021-11-12 16:31:55 -08:00
Anton Gubarkov 728cf93b0c vdev_id: Return an error if config file is not found (#12508)
Signed-off-by: Anton Gubarkov <anton.gubarkov@gmail.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
2021-11-12 16:31:55 -08:00
наб 43706c7dee vdev_id.conf.5: modernise
Also yeet pci_slot since it doesn't seem to exist?

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12125
2021-11-12 16:31:55 -08:00
наб 5cc5996f4e vdev_id.8: modernise, note scsi topology
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12125
2021-11-12 16:31:55 -08:00
наб cc6ea631ce zfs_get_enclosure_sysfs_path(): don't free undefined pointer
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11993
2021-11-12 16:31:55 -08:00
наб b847e538ef zfs_get_enclosure_sysfs_path(): don't leak dev path
Also always free tmp2 at the end

Before:
nabijaczleweli@tarta:~/uwu$ valgrind --leak-check=full ./blergh
==8947== Memcheck, a memory error detector
==8947== Using Valgrind-3.14.0 and LibVEX
==8947== Command: ./blergh
==8947==
(null)
==8947==
==8947== HEAP SUMMARY:
==8947==     in use at exit: 23 bytes in 1 blocks
==8947==   total heap usage: 3 allocs, 2 frees, 1,147 bytes allocated
==8947==
==8947== 23 bytes in 1 blocks are definitely lost in loss record 1 of 1
==8947==    at 0x483577F: malloc (vg_replace_malloc.c:299)
==8947==    by 0x48D74B7: vasprintf (vasprintf.c:73)
==8947==    by 0x48B7833: asprintf (asprintf.c:35)
==8947==    by 0x401258: zfs_get_enclosure_sysfs_path
                         (zutil_device_path_os.c:191)
==8947==    by 0x401482: main (blergh.c:107)
==8947==
==8947== LEAK SUMMARY:
==8947==    definitely lost: 23 bytes in 1 blocks
==8947==    indirectly lost: 0 bytes in 0 blocks
==8947==      possibly lost: 0 bytes in 0 blocks
==8947==    still reachable: 0 bytes in 0 blocks
==8947==         suppressed: 0 bytes in 0 blocks
==8947==
==8947== For counts of detected and suppressed errors, rerun with: -v
==8947== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)

nabijaczleweli@tarta:~/uwu$ sed -n 191p zutil_device_path_os.c
        tmpsize = asprintf(&tmp1, "/sys/block/%s/device", dev_name);

After:
nabijaczleweli@tarta:~/uwu$ valgrind --leak-check=full ./blergh
==9512== Memcheck, a memory error detector
==9512== Using Valgrind-3.14.0 and LibVEX
==9512== Command: ./blergh
==9512==
(null)
==9512==
==9512== HEAP SUMMARY:
==9512==     in use at exit: 0 bytes in 0 blocks
==9512==   total heap usage: 3 allocs, 3 frees, 1,147 bytes allocated
==9512==
==9512== All heap blocks were freed -- no leaks are possible
==9512==
==9512== For counts of detected and suppressed errors, rerun with: -v
==9512== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11993
2021-11-12 16:31:55 -08:00
Arshad Hussain 238504fece vdev_id: variable not getting expanded under map_slot()
Under function map_slot() variable passed as args
were not getting properly substituted or expanded.
This patch fixes the substitution issue.

Reviewed-by: Niklas Edmundsson <nikke@acc.umu.se>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Closes #11951
Closes #11959
2021-11-12 16:31:55 -08:00
Tony Hutter b0b796a94d vdev_id: Create symlinks even if no /dev/mapper/
vdev_id uses the /dev/mapper/ symlinks to resolve a UUID to a dm name
(like dm-1).  However on some multipath setups, there is no /dev/mapper/
entry for the UUID at the time vdev_id is called by udev.  However,
this isn't necessarily needed, as we may be able to resolve the dm
name from the $DEVNAME that udev passes us (like DEVNAME="/dev/dm-1").

This patch tries to resolve the dm name from $DEVNAME first, before
falling back to looking in /dev/mapper/.  This fixed an issue where the
by-vdev names weren't reliably showing up on one of our nodes.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #11698
2021-11-12 16:31:55 -08:00
Tony Hutter 9d14963de6 vdev_id: Fix partition regular expression
Given a DM device name, the old vdev_id script would extract any text
after a 'p' as the partition number.  It then appends "-part" + the
partition number to the name, giving a by-vdev name like "L0-part5".

This works fine if the DM name is like 'dm-2p5', but doesn't work if
the DM name is a multipath name like "mpatha".  In those cases it
incorrectly matches the 'p' in "mpatha", giving by-vdev names like
"L0-partatha".

This patch fixes the issue by making the partition regex match stricter.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #11637
2021-11-12 16:31:55 -08:00
Tony Hutter 88a3751e67 Better zfs_get_enclosure_sysfs_path() enclosure support
A multpathed disk will have several 'underlying' paths to the disk.  For
example, multipath disk 'dm-0' may be made up of paths:
/dev/{sda,sdb,sdc,sdd}.  On many enclosures those underlying sysfs
paths will have a symlink back to their enclosure device entry
(like 'enclosure_device0/slot1').  This is used by the
statechange-led.sh script to set/clear the fault LED for a disk, and
by 'zpool status -c'.

However, on some enclosures, those underlying paths may not all have
symlinks back to the enclosure device.  Maybe only two out of four
of them might.

This patch updates zfs_get_enclosure_sysfs_path() to favor returning
paths that have symlinks back to their enclosure devices, rather
than just returning the first path.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #11617
2021-11-12 16:31:55 -08:00
Arshad Hussain bdd43a2396 vdev_id: Support daisy-chained JBODs in multipath mode
Within function sas_handler() userspace commands like
'/usr/sbin/multipath' have been replaced with sourcing
device details from within sysfs which reduced a
significant amount of overhead and processing time.
Multiple JBOD enclosures and their order are sourced
from the bsg driver (/sys/class/enclosure) to isolate
chassis top-level expanders, which are then dynamically
indexed based on host channel of the multipath subordinate
disk member device being processed. Additionally added a
"mixed" mode for slot identification for environments where
a ZFS server system may contain SAS disk slots where there
is no expander (direct connect to HBA) while an attached
external JBOD with an expander have different slot identifier
methods.

How Has This Been Tested?
~~~~~~~~~~~~~~~~~~~~~~~~~

Testing was performed on a AMD EPYC based dual-server
high-availability multipath environment with multiple
HBAs per ZFS server and four SAS JBODs. The two primary
JBODs were multipath/cross-connected between the two
ZFS-HA servers. The secondary JBODs were daisy-chained
off of the primary JBODs using aligned SAS expander
channels (JBOD-0 expanderA--->JBOD-1 expanderA,
          JBOD-0 expanderB--->JBOD-1 expanderB, etc).
Pools were created, exported and re-imported, imported
globally with 'zpool import -a -d /dev/disk/by-vdev'.
Low level udev debug outputs were traced to isolate
and resolve errors.

Result:
~~~~~~~

Initial testing of a previous version of this change
showed how reliance on userspace utilities like
'/usr/sbin/multipath' and '/usr/bin/lsscsi' were
exacerbated by increasing numbers of disks and JBODs.
With four 60-disk SAS JBODs and 240 disks the time to
process a udevadm trigger was 3 minutes 30 seconds
during which nearly all CPU cores were above 80%
utilization. By switching reliance on userspace
utilities to sysfs in this version, the udevadm
trigger processing time was reduced to 12.2 seconds
and negligible CPU load.

This patch also fixes few shellcheck complains.

Reviewed-by: Gabriel A. Devenyi <gdevenyi@gmail.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Jeff Johnson <jeff.johnson@aeoncomputing.com>
Signed-off-by: Jeff Johnson <jeff.johnson@aeoncomputing.com>
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Closes #11526
2021-11-12 16:31:55 -08:00
Rich Ercolani 47ada9e5ed Added error for writing to /dev/ on Linux
Starting in Linux 5.10, trying to write to /dev/{null,zero} errors out.
Prefer to inform people when this happens rather than hoping they guess
what's wrong.

Reviewed-by: Antonio Russo <aerusso@aerusso.net>
Reviewed-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes:  #11991
2021-11-12 16:31:55 -08:00
Brian Behlendorf c4a5e56abc ZTS: Add known exceptions
The receive-o-x_props_override test case reliably fails on the
FreeBSD main builders (but not on Linux), until the root cause is
understood add this test to the FreeBSD exception list.

On Linux the alloc_class_012_pos test case may occasionally fail.
This is a known false positive which has also been added to the
Linux exception list until the test can be made entirely reliable.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12272
2021-11-12 16:31:55 -08:00
Brian Behlendorf ec1b033413 ZTS: Standardize use of destroy_dataset in cleanup
When cleaning up a test case standardize on using the convention:

    datasetexists $ds && destroy_dataset $ds <flags>

By using 'destroy_dataset' instead of 'log_must zfs destroy' we ensure
that the destroy is retried in the event that a ZFS volume is busy.
This helps ensures ensure tests are fully cleaned up and prevents false
positive test failures on Linux.

Note that all of the tests which used 'zfs destroy' in cleanup have
been updated even if they don't use volumes.  This was done to
clearly establish the expected convention.

Reviewed-by: Rich Ercolani <rincebrain@gmail.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12663
2021-11-12 16:31:55 -08:00
Damian Szuberski 8464de1315 Update checkstyle workflow env to ubuntu-20.04
- `checkstyle` workflow uses ubuntu-20.04 environment
- improved `mancheck.sh` readability

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: szubersk <szuberskidamian@gmail.com>
Closes #12713
2021-11-12 16:31:55 -08:00
Rich Ercolani cc40a67cf8 Workaround cloud-init hotplug issue
cloud-init added a hook which triggers on every device add/rm
event, which results in holding open devices for a while after
they're created/destroyed.

So let's shove an exclusion rule for that into the GH workflows
until it gets fixed.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12644
Closes #12669
2021-11-12 16:31:54 -08:00
George Melikov c2a69a21ef CI: don't install abigail-tools
We use docker image instead.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #12529
2021-11-12 16:31:54 -08:00
George Melikov 866ac70904 CI: use fresh libabigail via docker image
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #12529
2021-11-12 16:31:48 -08:00
Jonathon f3c85e3ebd Update libera webchat client URL
Libera have made a webchat client available. This change builds on #12127.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Jonathon Fernyhough <jonathon@m2x.dev>
Closes #12251
2021-11-12 15:40:55 -08:00
Paul Dagnelie 14770e1030 Don't direct to freenode in issue template
While Libera doesn't yet have a webchat client, we should at least
direct them to the right network. Once a webchat client is available,
we can direct them to it.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #12127
2021-11-12 15:40:35 -08:00
Attila Fülöp fb9eee4cc2 gcc 11 cleanup
Compiling with gcc 11.1.0 produces three new warnings.
Change the code slightly to avoid them.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Attila Fülöp <attila@fueloep.org>
Closes #12130
Closes #12188
Closes #12237
2021-11-12 15:24:36 -08:00
Brian Behlendorf 820c95750b Use fallthrough macro
As of the Linux 5.9 kernel a fallthrough macro has been added which
should be used to anotate all intentional fallthrough paths.  Once
all of the kernel code paths have been updated to use fallthrough
the -Wimplicit-fallthrough option will because the default.  To
avoid warnings in the OpenZFS code base when this happens apply
the fallthrough macro.

Additional reading: https://lwn.net/Articles/794944/

Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12441
2021-11-12 15:24:36 -08:00
Rich Ercolani 66c9e15686 Correct a flaw in the Python 3 version checking
It turns out the ax_python_devel.m4 version check assumes that
("3.X+1.0" >= "3.X.0") is True in Python, which is not when X+1
is 10 or above and X is not. (Also presumably X+1=100 and ...)

So let's remake the check to behave consistently, using the
"packaging" or (if absent) the "distlib" modules.

(Also, update the Github workflows to use the new packages.)

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes: #12073
2021-11-12 15:24:36 -08:00
Rich Ercolani c64c17328b Let zfs diff be more permissive
In the current world, `zfs diff` will die on certain kinds of errors
that come up on ordinary, not-mangled filesystems - like EINVAL,
which can come from a file with multiple hardlinks having the one
whose name is referenced deleted.

Since it should always be safe to continue, let's relax about all
error codes - still print something for most, but don't immediately
abort when we encounter them.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12072
2021-11-12 15:24:36 -08:00
Rich Ercolani 439b4b134d Added test for being able to read various variants of zstd
As detailed in #12022 and #12008, it turns out the current zstd
implementation is quite nonportable, and results in various
configurations of ondisk header that only each platform can read.

So I've added a test which contains a dataset with a file written by
Linux/x86_64 and one written by FBSD/ppc64.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12030
2021-11-12 15:24:36 -08:00
наб 5957574694 zed: only go up to current limit in close_from() fallback
Consider the following strace log:
  prlimit64(0, RLIMIT_NOFILE,
            NULL, {rlim_cur=1024, rlim_max=1024*1024}) = 0
  dup2(0, 30)                         = 30
  dup2(0, 300)                        = 300
  dup2(0, 3000)                       = -1 EBADF (Bad file descriptor)
  dup2(0, 30000)                      = -1 EBADF (Bad file descriptor)
  dup2(0, 300000)                     = -1 EBADF (Bad file descriptor)
  prlimit64(0, RLIMIT_NOFILE,
            {rlim_cur=1024*1024, rlim_max=1024*1024}, NULL) = 0
  dup2(0, 30)                         = 30
  dup2(0, 300)                        = 300
  dup2(0, 3000)                       = 3000
  dup2(0, 30000)                      = 30000
  dup2(0, 300000)                     = 300000

Even a privileged process needs to bump its rlimit before being able
to use fds higher than rlim_cur.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11834
2021-11-12 15:24:36 -08:00
наб 3e04897edc zed: implement close_from() in terms of /proc/self/fd, if available
/dev/fd on Darwin

Consider the following strace output:
  prlimit64(0, RLIMIT_NOFILE, NULL, {rlim_cur=1024, rlim_max=1024*1024}) = 0

Yes, that is well over a million file descriptors!

This reduces the ZED start-up time from "at least a second" to
"instantaneous", and, under strace, from "don't even try" to "usable"
by simple virtue of doing five syscalls instead of over a million;
in most cases the main loop does nothing

Recent Linuxes (5.8+) have close_range(2) for this, but that's an
overoptimisation (and libcs don't have wrappers for it yet)

This is also run by the ZEDLET pre-exec. Compare:
  Finished "all-syslog.sh" eid=13 pid=6717 time=1.027100s exit=0
  Finished "history_event-zfs-list-cacher.sh" eid=13 pid=6718 time=1.046923s exit=0
to
  Finished "all-syslog.sh" eid=12 pid=4834 time=0.001836s exit=0
  Finished "history_event-zfs-list-cacher.sh" eid=12 pid=4835 time=0.001346s exit=0
lol

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11834
2021-11-12 15:24:36 -08:00
Rich Ercolani fb823061b0 Fix cross-endian interoperability of zstd
It turns out that layouts of union bitfields are a pain, and the
current code results in an inconsistent layout between BE and LE
systems, leading to zstd-active datasets on one erroring out on
the other.

Switch everyone over to the LE layout, and add compatibility code
to read both.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12008
Closes #12022
2021-11-12 15:24:36 -08:00
George Melikov 4e8a639d5f CI: generate ABI files if changed
So commit author can just download them as
artifacts and commit.

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #12379
2021-11-12 15:22:08 -08:00
Brian Behlendorf 8009f02748 Update bug report template
- Remove the "SPL Version" line, the repositories have been merged
  since the 0.8 release and we no longer need to ask about this.

- Simply ask for the kernel version / patch level and add a hint
  about how to get this information on Linux and FreeBSD.

- Remove "Status: Triage Needed" from the template, in practice
  we really haven't been using this label so let's step setting it.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes: #12340
2021-11-12 15:22:00 -08:00
Jonathon 4a5316eef4 Update libera webchat client URL
Libera have made a webchat client available. This change builds on #12127.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Jonathon Fernyhough <jonathon@m2x.dev>
Closes #12251
2021-11-12 15:21:48 -08:00
Paul Dagnelie e00f3da136 Don't direct to freenode in issue template
While Libera doesn't yet have a webchat client, we should at least
direct them to the right network. Once a webchat client is available,
we can direct them to it.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #12127
2021-11-12 15:21:31 -08:00
Tony Hutter ef686e96ec Tag zfs-2.0.6
META file and changelog updated.

Signed-off-by: Tony Hutter <hutter2@llnl.gov>
2021-09-22 15:19:08 -07:00
Brian Behlendorf 7a41ef240a Linux 5.15 compat: get_acl()
Kernel commits

332f606b32b6 ovl: enable RCU'd ->get_acl()
0cad6246621b vfs: add rcu argument to ->get_acl() callback

Added compatibility code to detect the new ->get_acl() interface
and correctly handle the case where the new rcu argument is set.

Reviewed-by: Coleman Kane <ckane@colemankane.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12548
2021-09-22 15:19:08 -07:00
Alexander 72d16a9b49 Linux 5.15 compat: standalone <linux/stdarg.h>
Kernel commits

39f75da7bcc8 ("isystem: trim/fixup stdarg.h and other headers")
c0891ac15f04 ("isystem: ship and use stdarg.h")
564f963eabd1 ("isystem: delete global -isystem compile option")

(for now can be found in linux-next.git tree, will land into the
 Linus' tree during the ongoing 5.15 cycle with one of akpm merges)

removed the -isystem flag and disallowed the inclusion of any
compiler header files. They also introduced a minimal
<linux/stdarg.h> as a replacement for <stdarg.h>.
include/os/linux/spl/sys/cmn_err.h in the ZFS source tree includes
<stdarg.h> unconditionally. Introduce a test for <linux/stdarg.h>
and include it instead of the compiler's one to prevent module
build breakage.

Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Lobakin <alobakin@pm.me>
Closes #12531
2021-09-22 15:19:08 -07:00
Brian Behlendorf 54c358c3f2 Linux 5.15 compat: block device readahead
The 5.15 kernel moved the backing_dev_info structure out of
the request queue structure which causes a build failure.

Rather than look in the new location for the BDI we instead
detect this upstream refactoring by the existance of either
the blk_queue_update_readahead() or disk_update_readahead()
functions.  In either case, there's no longer any reason to
manually set the ra_pages value since it will be overridden
with a reasonable default (2x the block size) when
blk_queue_io_opt() is called.

Therefore, we update the compatibility wrapper to do nothing
for 5.9 and newer kernels.  While it's tempting to do the
same for older kernels we want to keep the compatibility
code to preserve the existing behavior.  Removing it would
effectively increase the default readahead to 128k.

Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12532
2021-09-22 15:19:08 -07:00
Brian Behlendorf 560e9fc817 Linux 5.14 compat: META
Increase the Linux-Maximum version in the META file to 5.14.
All of the required compatibility patches have been merged
and the 5.14 kernel has been officially released.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12565
2021-09-22 15:19:08 -07:00
Brian Behlendorf d642efe83c Linux 5.13 compat: META
Increase the Linux-Maximum version in the META file to 5.13.
All of the required compatibility patches have been merged
and the 5.13 kernel has been officially released.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2021-09-22 15:19:08 -07:00
Alexander Motin 51e313f610 FreeBSD: Ignore make_dev_s() errors
Since errors returned by zvol_create_minor_impl() are ignored by the
common code, it is more convenient to ignore make_dev_s() errors there.
It allows, for example, to get device created for the zvol after later
rename instead of having it further stuck in half-created state.
zvol_rename_minor() already ignores those errors.

While there, switch from MAXPHYS to maxphys in FreeBSD 13+.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #12375
2021-09-22 15:19:08 -07:00
Alexander Motin 327f12c291 FreeBSD: Switch from MAXPHYS to maxphys on FreeBSD 13+
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #12378
2021-09-22 15:19:08 -07:00
Alexander Motin 5d4f7e7566 FreeBSD: Retry OCF ENOMEM errors.
ZFS does not expect transient errors from crypto.  For read they are
counted as checksum errors, while for write end up in panic.  To not
panic on random low memory conditions retry ENOMEM errors in the OCF
wrapper function.

While there remove unneeded timeout and priority from msleep().

External-issue: https://reviews.freebsd.org/D30339
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #12077
2021-09-22 15:19:08 -07:00
Serapheim Dimitropoulos abd0b59e48 Livelist logic should handle dedup blkptrs
Update the logic to handle the dedup-case of consecutive
FREEs in the livelist code. The logic still ensures that
all the FREE entries are matched up with a respective
ALLOC by keeping a refcount for each FREE blkptr that we
encounter and ensuring that this refcount gets to zero
by the time we are done processing the livelist.

zdb -y no longer panics when encountering double frees

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Don Brady <don.brady@delphix.com>
Signed-off-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Closes #11480
Closes #12177
2021-09-22 15:19:08 -07:00
Coleman Kane 6b9d0eda75 Linux 5.14 compat: explicity assign set_page_dirty
Kernel 5.14 introduced a change where set_page_dirty of
struct address_space_operations is no longer implicitly set to
__set_page_dirty_buffers(), which ended up resulting in a NULL
pointer deref in the kernel when it is attempted to be called.
This change sets .set_page_dirty in the structure to
__set_page_dirty_nobuffers(), which was introduced with the
related patch set. The breaking change was introduce in commit
0af573780b0b13fceb7fabd49dc1b073cee9a507 to torvalds/linux.git.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #12427
2021-09-22 15:19:08 -07:00
Paul Dagnelie 744cdfd93b Add SIGSTOP and SIGTSTP handling to issig
This change adds SIGSTOP and SIGTSTP handling to the issig function;
this mirrors its behavior on Solaris. This way, long running kernel
tasks can be stopped with the appropriate signals. Note that doing
so with ctrl-z on the command line doesn't return control of the tty
to the shell, because tty handling is done separately from stopping
the process. That can be future work, if people feel that it is a
necessary addition.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Issue #810
Issue #10843
Closes #11801
2021-09-22 15:19:08 -07:00
Brian Behlendorf 36d50b60d8 Linux 5.14 compat: blk_alloc_disk()
In Linux 5.14, blk_alloc_queue is no longer exported, and its usage
has been superseded by blk_alloc_disk, which returns a gendisk struct
from which we can still retrieve the struct request_queue* that is
needed in the one place where it is used. This also replaces the call
to alloc_disk(minors), and minors is now set via struct member
assignment.

Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Reviewed-by: Olaf Faaland <faaland1@llnl.gov>
Reviewed-by: Coleman Kane <ckane@colemankane.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12362
Closes #12409
2021-09-22 15:19:08 -07:00
Mark Johnston 0c4f86be74 Initialize dn_next_type[] in the dnode constructor
It seems nothing ensures that this array is zeroed when a dnode is
freshly allocated, so in principle it retains the values from the
previous allocation.  In practice it seems to be the case that the
fields should end up zeroed, but we can zero the field anyway for
consistency.

This was found using KMSAN.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes #12383
2021-09-22 15:19:08 -07:00
Mark Johnston 88be308b2f Zero pad bytes following TX_WRITE log data
When logging a TX_WRITE record in the case where file data has to be
copied from the DMU, we pad the log record size to a multiple of 8
bytes.  In this case, any padding bytes should be zeroed, otherwise the
contents of uninitialized memory are written to the ZIL.

This was found using KMSAN.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes #12383
2021-09-22 15:19:08 -07:00
Mark Johnston d6dc79eabc Zero pad bytes when allocating a ZIL record
When allocating a record, we round up the allocation size to a multiple
of 8.  In this case, any padding bytes should be zeroed, otherwise the
contents of uninitialized memory are written to the ZIL.

This was found using KMSAN.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes #12383
2021-09-22 15:19:08 -07:00
Mark Johnston 900a444107 Initialize all fields in zfs_log_xvattr()
When logging TX_SETATTR, we could otherwise fail to initialize part of
the corresponding ZIL record depending on which fields are present in
the xvattr.  Initialize the creation time and the AV scan timestamp to
zero so that uninitialized bytes are not written to the ZIL.

This was found using KMSAN.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes #12383
2021-09-22 15:19:08 -07:00
George Wilson 1885e5ebab file reference counts can get corrupted
Callers of zfs_file_get and zfs_file_put can corrupt the reference
counts for the file structure resulting in a panic or a soft lockup.
When zfs send/recv runs, it will add a reference count to the
open file, and begin to send or recv the stream. If the file descriptor
is closed, then when dmu_recv_stream() or dmu_send() return we will
call zfs_file_put to remove the reference we placed on the file
structure. Unfortunately, because zfs_file_put() uses the file
descriptor to lookup the file structure, it may end up finding that
the file descriptor table no longer contains the file struct, thus
leaking the file structure. Or it might end up finding a file
descriptor for a different file and blindly updating its reference
counts. Other failure modes probably exists.

This change reworks the zfs_file_[get|put] interface to not rely
on the file descriptor but instead pass the zfs_file_t pointer around.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Co-authored-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: George Wilson <gwilson@delphix.com>
External-issue: DLPX-76119
Closes #12299
2021-09-22 15:19:08 -07:00
Antonio Russo 1ad8fcc054 Revert Consolidate arc_buf allocation checks
This reverts commit 13fac09868.

Per the discussion in #11531, the reverted commit---which intended only
to be a cleanup commit---introduced a subtle, unintended change in
behavior.

Care was taken to partially revert and then reapply 10b3c7f5e4
which would otherwise have caused a conflict.  These changes were
squashed in to this commit.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Suggested-by: @chrisrd
Suggested-by: robn@despairlabs.com
Signed-off-by: Antonio Russo <aerusso@aerusso.net>
Closes #11531
Closes #12227
2021-09-22 15:19:08 -07:00
Rich Ercolani 05c96b438a Fix unfortunate NULL in spa_update_dspace
After 1325434b, we can in certain circumstances end up calling
spa_update_dspace with vd->vdev_mg NULL, which ends poorly during
vdev removal.

So let's not do that further space adjustment when we can't.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12380 
Closes #12428
2021-09-22 15:19:08 -07:00
Rich Ercolani ccf6d0a59b Tinker with slop space accounting with dedup
* Tinker with slop space accounting with dedup

Do not include the deduplicated space usage in the slop space
reservation, it leads to surprising outcomes.

* Update spa_dedup_dspace sometimes

Sometimes, we get into spa_get_slop_space() with
spa_dedup_dspace=~0ULL, AKA "unset", while spa_dspace is correctly set.

So call the code to update it before we use it if we hit that case.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12271
2021-09-22 15:19:08 -07:00
Prakash Surya 61ae6c99f7 Add upper bound for slop space calculation
This change modifies the behavior of how we determine how much slop
space to use in the pool, such that now it has an upper limit. The
default upper limit is 128G, but is configurable via a tunable.

(Backporting note: Snipped out the embedded_log portion of the changes.)

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Prakash Surya <prakash.surya@delphix.com>
Closes #11023
2021-09-22 15:19:08 -07:00
Tony Hutter e9353bc2ef Tag zfs-2.0.5
META file and changelog updated.

Signed-off-by: Tony Hutter <hutter2@llnl.gov>
2021-06-23 13:28:36 -07:00
George Amanakis 87d93731e7 Avoid deadlock when removing L2ARC devices under I/O
In case we have I/O and try to remove an L2ARC device a deadlock might
occur. arc_read()->zio_read()->zfs_blkptr_verify() waits for SCL_VDEV
to be dropped while holding the hash_lock. However, spa_l2cache_load()
holds SCL_ALL and waits for the hash_lock in l2arc_evict().

Fix this by moving zfs_blkptr_verify() to the top top arc_read() before
the hash_lock is taken. Verify the block pointer and return a checksum
error if damaged rather than halting the system, by using
BLK_VERIFY_LOG instead of BLK_VERIFY_HALT.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #12054
2021-06-23 13:22:15 -07:00
Paul Zuchowski bd197378e7 Do not hash unlinked inodes
In zfs_znode_alloc we always hash inodes.  If the
znode is unlinked, we do not need to hash it.  This
fixes the problem where zfs_suspend_fs is doing zrele
(iput) in an async fashion, and zfs_resume_fs unlinked
drain processing will try to hash an inode that could
still be hashed, resulting in a panic.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alan Somers <asomers@gmail.com>
Signed-off-by: Paul Zuchowski <pzuchowski@datto.com>
Closes #9741
Closes #11223
Closes #11648
Closes #12210
2021-06-23 13:22:15 -07:00
jharmening cd2bb9ca44 FreeBSD: incorporate changes to the VFS_QUOTACTL(9) KPI
VFS_QUOTACTL(9) has been updated to allow each filesystem to indicate
whether it has changed the busy state of the mount.  The filesystem
may still assume that its .vfs_quotactl entrypoint is always called
with the mount busied, but only needs to unbusy the mount (and clear
*mp_busy) if it does something that actually requires the mount to be
unbusied.  It no longer needs to blindly copy-paste the UFS protocol
for calling vfs_unbusy(9) for the Q_QUOTAOFF and Q_QUOTAON commands.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Jason Harmening <jason.harmening@gmail.com>
Closes #12052
2021-06-23 13:22:15 -07:00
Mateusz Guzik 52d9bc7174 FreeBSD: use vnlru_free_vfsops if available
Fixes issues when zfs is used along with other filesystems.

External-issue: https://cgit.freebsd.org/src/commit/?id=e9272225e6bed840b00eef1c817b188c172338ee
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #11881
2021-06-23 13:22:15 -07:00
Brian Behlendorf 6905f6b2c1 cppcheck: integrete cppcheck
In order for cppcheck to perform a proper analysis it needs to be
aware of how the sources are compiled (source files, include
paths/files, extra defines, etc).  All the needed information is
available from the Makefiles and can be leveraged with a generic
cppcheck Makefile target.  So let's add one.

Additional minor changes:

* Removing the cppcheck-suppressions.txt file.  With cppcheck 2.3
  and these changes it appears to no longer be needed.  Some inline
  suppressions were also removed since they appear not to be
  needed.  We can add them back if it turns out they're needed
  for older versions of cppcheck.

* Added the ax_count_cpus m4 macro to detect at configure time how
  many processors are available in order to run multiple cppcheck
  jobs.  This value is also now used as a replacement for nproc
  when executing the kernel interface checks.

* "PHONY =" line moved in to the Rules.am file which is included
  at the top of all Makefile.am's.  This is just convenient becase
  it allows us to use the += syntax to add phony targets.

* One upside of this integration worth mentioning is it now allows
  `make cppcheck` to be run in any directory to check that subtree.

* For the moment, cppcheck is not run against the FreeBSD specific
  kernel sources.  The cppcheck-FreeBSD target will need to be
  implemented and testing on FreeBSD to support this.

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11508
2021-06-23 13:22:15 -07:00
Rich Ercolani cfc82609a2 Simple change to fix building in recent environments
Renamed _fini too for symmetry.

Suggested-by: @ensch
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12059
Closes: #11987
Closes: #12056
2021-06-23 13:22:15 -07:00
George Melikov e26776f14c ZTS: pool_state test check for pool existence in cleanup
If there is no scsi_debug module, then this test
must be skipped, in this case cleanup routine should
be prepared for absent pool.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #11534
2021-06-23 13:22:15 -07:00
Chunwei Chen e68e938f54 Fix zfs_get_data access to files with wrong generation
If TX_WRITE is create on a file, and the file is later deleted and a new
directory is created on the same object id, it is possible that when
zil_commit happens, zfs_get_data will be called on the new directory.
This may result in panic as it tries to do range lock.

This patch fixes this issue by record the generation number during
zfs_log_write, so zfs_get_data can check if the object is valid.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chunwei Chen <david.chen@nutanix.com>
Closes #10593
Closes #11682
2021-06-23 13:22:15 -07:00
Christian Schwarz ab2110e3e2 zfs_vnops: make zfs_get_data OS-independent
Move zfs_get_data() in to platform-independent code. The only
platform-specific aspect of it is the way we release an inode
(Linux) / vnode_t (FreeBSD). I am not aware of a platform that
could be supported by ZFS that couldn't implement zfs_rele_async
itself. It's sibling zvol_get_data already is platform-independent.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Christian Schwarz <me@cschwarz.com>
Closes #10979
2021-06-23 13:22:15 -07:00
Matthew Macy a909190683 Consolidate zfs_holey and zfs_access
The zfs_holey() and zfs_access() functions can be made common
to both FreeBSD and Linux.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #11125
2021-06-23 13:22:15 -07:00
наб 073f3c7826 zed: reap child after killing on time-out
When a child process is killed waitpid() must be called on the
pid the reap the zombie process.

Update BUGS section to reflect reality by replacing "zedlets
aren't time limited with "zedlets can be interrupted".

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11769 
Closes #11798
2021-06-23 13:22:15 -07:00
Luis Henriques 5072f37419 Fix error code on __zpl_ioctl_setflags()
Other (all?) Linux filesystems seem to return -EPERM instead of -EACCESS
when trying to set FS_APPEND_FL or FS_IMMUTABLE_FL without the
CAP_LINUX_IMMUTABLE capability.  This was detected by generic/545 test
in the fstest suite.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Luis Henriques <henrix@camandro.org>
Closes #11791
2021-06-23 13:22:15 -07:00
Ryan Moeller 332cd2e313 Fix typo in zgenhostid.8
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11770
2021-06-23 13:22:15 -07:00
Adam D. Moss ac6f332154 Linux: always check or verify return of igrab()
zhold() wraps igrab() on Linux, and igrab() may fail when the inode 
is in the process of being deleted.  This means zhold() must only be
called when a reference exists and therefore it cannot be deleted. 
This is the case for all existing consumers so add a VERIFY and a
comment explaining this requirement.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Adam Moss <c@yotes.com>
Closes #11704
2021-06-23 13:22:15 -07:00
Brian Behlendorf f14b6f2302 Linux: Set spl_kmem_cache_slab_limit when page size !4K
For small objects the kernel's slab implementation is very fast and
space efficient. However, as the allocation size increases to
require multiple pages performance suffers. The SPL kmem cache
allocator was designed to better handle these large allocation
sizes. Therefore, on Linux the kmem_cache_* compatibility wrappers
prefer to use the kernel's slab allocator for small objects and
the custom SPL kmem cache allocator for larger objects.

This logic was effectively disabled for all architectures using
a non-4K page size which caused all kmem caches to only use the
SPL implementation. Functionally this is fine, but the SPL code
which calculates the target number of objects per-slab does not
take in to account that __vmalloc() always returns page-aligned
memory. This can result in a massive amount of wasted space when
allocating tiny objects on a platform using large pages (64k).

To resolve this issue we set the spl_kmem_cache_slab_limit cutoff
to 16K for all architectures. 

This particular change does not attempt to update the logic used
to calculate the optimal number of pages per slab. This remains
an issue which should be addressed in a future change.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12152
Closes #11429
Closes #11574
Closes #12150
2021-06-23 13:22:15 -07:00
Chunwei Chen bfb2928490 Fix zfs_get_data access to files with wrong generation
If TX_WRITE is create on a file, and the file is later deleted and a new
directory is created on the same object id, it is possible that when
zil_commit happens, zfs_get_data will be called on the new directory.
This may result in panic as it tries to do range lock.

This patch fixes this issue by record the generation number during
zfs_log_write, so zfs_get_data can check if the object is valid.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chunwei Chen <david.chen@nutanix.com>
Closes #10593
Closes #11682
2021-06-23 13:22:15 -07:00
Paul Zuchowski ecb1b1a31d Fix dmu_recv_stream test for resumable
Use dsl_dataset_has_resume_receive_state()
not dsl_dataset_is_zapified() to check if
stream is resumable.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Alek Pinchuk <apinchuk@axcient.com>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Paul Zuchowski <pzuchowski@datto.com>
Closes #12034
2021-06-23 13:22:15 -07:00
Rich Ercolani 412b69dfab Remove iov_iter_advance() for iter_write
The additional iter advance is incorrect, as copy_from_iter() has
already done the right thing.  This will result in the following
warning being printed to the console as of the 5.12 kernel.

    Attempted to advance past end of bvec iter

This change should have been included with #11378 when a
similar change was made on the read side.

Suggested-by: @siebenmann
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Issue #11378
Closes #12041
Closes #12155
(cherry picked from commit 3f81aba766)
Signed-off-by: Jonathon Fernyhough <jonathon@m2x.dev>
2021-06-23 13:22:15 -07:00
Jonathon Fernyhough d014da0032 linux 5.13 compat: bdevops->revalidate_disk() removed (#12122)
Linux kernel commit 0f00b82e5413571ed225ddbccad6882d7ea60bc7 removes the
revalidate_disk() handler from struct block_device_operations. This
caused a regression, and this commit eliminates the call to it and the
assignment in the block_device_operations static handler assignment
code, when configure identifies that the kernel doesn't support that
API handler.

Reviewed-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #11967
Closes #11977
(cherry picked from commit 48c7b0e444)
Signed-off-by: Jonathon Fernyhough <jonathon@m2x.dev>

Co-authored-by: Coleman Kane <ckane@colemankane.org>
2021-06-23 13:22:15 -07:00
Rich Ercolani 8f93ab53d4 Bend zpl_set_acl to permit the new userns* parameter
Just like #12087, the set_acl signature changed with all the bolted-on
*userns parameters, which disabled set_acl usage, and caused #12076.

Turn zpl_set_acl into zpl_set_acl and zpl_set_acl_impl, and add a
new configure test for the new version.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12076
Closes #12093
2021-06-23 13:22:15 -07:00
Rich Ercolani 88891b804b Update tmpfile() existence detection
Linux changed the tmpfile() signature again in torvalds/linux@6521f89,
which in turn broke our HAVE_TMPFILE detection in configure.

Update that macro to include the new case, and change the signature of
zpl_tmpfile as appropriate.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes: #12060
Closes: #12087
2021-06-23 13:22:15 -07:00
Armin Wehrfritz 87b5e7fb1d RPM: Explicitly set the required min/max kernel version for the DKMS package
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Armin Wehrfritz <dkxls23@gmail.com>
Closes openzfs#12124
2021-06-23 13:22:15 -07:00
Coleman Kane 5722dce473 Linux 5.12 update: bio_max_segs() replaces BIO_MAX_PAGES
The BIO_MAX_PAGES macro is being retired in favor of a bio_max_segs()
function that implements the typical MIN(x,y) logic used throughout the
kernel for bounding the allocation, and also the new implementation is
intended to be signed-safe (which the former was not).

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #11765
(cherry picked from commit ffd6978ef5)
Signed-off-by: Jonathon Fernyhough <jonathon@m2x.dev>
2021-06-23 13:22:15 -07:00
Coleman Kane 91d5ac85c0 Linux 5.12 compat: idmapped mounts
In Linux 5.12, the filesystem API was modified to support ipmapped
mounts by adding a "struct user_namespace *" parameter to a number
functions and VFS handlers. This change adds the needed autoconf
macros to detect the new interfaces and updates the code appropriately.
This change does not add support for idmapped mounts, instead it
preserves the existing behavior by passing the initial user namespace
where needed.  A subsequent commit will be required to add support
for idmapped mounted.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #11712
(cherry picked from commit e2a8296131)
Signed-off-by: Jonathon Fernyhough <jonathon@m2x.dev>
2021-06-23 13:22:15 -07:00
Ryan Moeller fd8ccce07b FreeBSD: Initialize/destroy zp->z_lock
zp->z_lock is used in shared code for protecting projid and scantime.
We don't exercise these paths much if at all on FreeBSD, so have been
lucky enough not to have issues with the uninitialized locks so far.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
Closes #12003
2021-06-23 13:22:15 -07:00
Ryan Moeller c2a3bcc91d ZTS: Fix xattr_002_neg passing too soon
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11970
2021-06-23 13:22:15 -07:00
Toomas Soome d1277e2a13 zdb: ASSERT issues when DEBUG is not defined
If zdb is not built with DEBUG mode, the ASSERT macros will be
eliminated.

This will leave vim defined, but not used (gcc warning) and
checkpoint spacemap validation loop will do nothing.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Toomas Soome <tsoome@me.com>
Closes #11932
2021-06-23 13:22:14 -07:00
Brian Behlendorf a59bf29606 ZTS: Add known exceptions
Both the zpool_initialize_import_export and checkpoint_discard_busy
test cases a known to occasionally fail.  Add them to the list of
known possible failures and reference the appropriate issue on the
tracker.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11949
2021-06-23 13:22:14 -07:00
Prawn 28fdc5f811 receive: don't fail inheriting (-x) properties on wrong dataset type
Receiving datasets while blanket inheriting properties like zfs 
receive -x mountpoint can generally be desirable, e.g. to avoid 
unexpected mounts on backup hosts.

Currently this will fail to receive zvols due to the mountpoint 
property being applicable to filesystems only.  This limitation 
currently requires operators to special-case their minds and tools 
for zvols.

This change gets rid of this limitation for inherit (-x) by
Spiting up the dataset type handling: Warnings for inheriting (-x), 
errors for overriding (-o).

Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: InsanePrawn <insane.prawny@gmail.com>
Closes #11416
Closes #11840
Closes #11864
2021-06-23 13:22:14 -07:00
Mateusz Guzik a78a93e67e FreeBSD: damage control racing .. lookups in face of mkdir/rmdir
External-issue: https://reviews.freebsd.org/D29769
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #11926
2021-06-23 13:22:14 -07:00
Romain Dolbeau 9a68ceba10 Fix AVX512BW Fletcher code on AVX512-but-not-BW machines
Introduce a specific valid function for avx512f+avx512bw (instead 
of checking only for avx512f).

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Adam Moss <c@yotes.com>
Signed-off-by: Romain Dolbeau <romain@dolbeau.org>
Closes #11937
Closes #11938
2021-06-23 13:22:14 -07:00
Daniel Stevenson 3c7ad493a6 Fixed incorrect man page reference in zfsprops(8)
The special_small_blocks section directed readers to zpool(8) for
documentation on special allocation classes, while they are actually
documented in zpoolconcepts(8).

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Daniel Stevenson <daniel@dstev.net>
Closes #11918
2021-06-23 13:22:14 -07:00
наб faad85637b freebsd/libshare: nfs: make nfs_is_shared() thread-safe
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Wilson <gwilson@delphix.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11886
2021-06-23 13:22:14 -07:00
наб 501da8d433 libshare: nfs: don't leak nfs_lock_fd when lock fails
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Wilson <gwilson@delphix.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11886
2021-06-23 13:22:14 -07:00
наб 30620ad8e1 libzfs: refresh property cache after inheriting userprop
This matches what happens when inheriting a system property

Consider the following program:
	int main() {
		void *zhp = libzfs_init();
		void *dataset = zfs_open(zhp, "zest/__test", 1);

		printf("before:");
		dump_nvlist(zfs_get_user_props(dataset), 2);
		printf("\n");

		zfs_prop_inherit(dataset, "xyz.nabijaczleweli:test", 0);
		printf("after:");
		dump_nvlist(zfs_get_user_props(dataset), 2);
		printf("\n");

		zfs_refresh_properties(dataset);
		printf("refreshed:");
		dump_nvlist(zfs_get_user_props(dataset), 2);
		printf("\n");
	}

And the output before:
	# zfs set xyz.nabijaczleweli:test=hehe zest/__test
	# ./a.out
	before:  xyz.nabijaczleweli:test:
	      value: 'hehe'
	      source: 'zest/__test'

	after:  xyz.nabijaczleweli:test:
	      value: 'hehe'
	      source: 'zest/__test'

	refreshed:

As compared to the output after:
	# zfs set xyz.nabijaczleweli:test=hehe zest/__test
	# ./a.out
	before:  xyz.nabijaczleweli:test:
	      value: 'hehe'
	      source: 'zest/__test'

	after:
	refreshed:

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11064
Closes #11911
2021-06-23 13:22:14 -07:00
наб 1d65d1a597 libzfs: don't mark prompt+raw as retriable
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11911
Closes #11031
2021-06-23 13:22:14 -07:00
Mateusz Guzik edf3d7aa43 Combine zio caches if possible
This deduplicates 2 sets of caches which use the same allocation size.

Memory savings fluctuate a lot, one sample result is FreeBSD running
"make buildworld" saving ~180MB RAM in reduced page count associated
with zio caches.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #11877
2021-06-23 13:22:14 -07:00
Paul Zuchowski 9755cdfd89 Fix crash in zio_done error reporting
Fix NULL pointer dereference when reporting
checksum error for gang block in zio_done.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Zuchowski <pzuchowski@datto.com>
Closes #11872
Closes #11896
2021-06-23 13:22:14 -07:00
Brian Behlendorf f36e8118fd Fix 'make checkbashisms` warnings
The awk command used by the checkbashisms target incorrectly
adds the escape character before the ! and # characters.  This
results in the following warnings because these characters do not
need to be escaped.

    awk: cmd. line:1: warning: regexp escape sequence
        `\!' is not a known regexp operator
    awk: cmd. line:1: warning: regexp escape sequence
        `\#' is not a known regexp operator

Remove the unneeded escape character before ! and #.

Valid escape sequences are:

    https://www.gnu.org/software/gawk/manual/html_node/Escape-Sequences.html

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11902
2021-06-23 13:22:14 -07:00
Yuri Pankov e69f73c5cf Fix vdev health padding in zpool list -v
Do not (incorrectly, right instead left) pad health string itself,
it will be taken care of when printing property value below.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Yuri Pankov <yuripv@FreeBSD.org>
Closes #11899
2021-06-23 13:22:14 -07:00
наб 3430eced80 libzfs: zfs_mount_at(): load key for encryption root if MS_CRYPT
zfs_crypto_load_key() only works on encryption roots,
and zfs mount -la would fail if it encounters a datasets that
is sorted before their encroots.

To trigger:
  truncate -s 40G /tmp/test
  dd if=/dev/urandom of=/tmp/k bs=128 count=1 status=none
  zpool create -O encryption=on -O keylocation=file:///tmp/k \
               -O keyformat=passphrase test /tmp/test
  zfs create -o mountpoint=/a test/a
  zfs create -o mountpoint=/b test/b
  zfs umount test
  zfs unload-key test
  zfs mount -la

The final mount errored out with:
  Key load error: Keys must be loaded for
    encryption root of 'test/a' (test).
  Key load error: Keys must be loaded for
    encryption root of 'test/b' (test).

And only /test was mounted

This technically breaks the libzfs API, but the previous behavior was
decidedly a bug.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11870 
Closes #11875
2021-06-23 13:22:14 -07:00
Brian Behlendorf 0839934d84 ZTS: fix removal_condense_export test case
It's been observed in the CI that the required 25% of obsolete bytes
in the mapping can be to high a threshold for this test resulting in
condensing never being triggered and a test failure.  To prevent these
failures make the existing zfs_condense_indirect_obsolete_pct tuning
available so the obsolete percentage can be reduced from 25% to 5%
during this test.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11869
2021-06-23 13:22:14 -07:00
наб f831d1377f libzfs{,_core}: set O_CLOEXEC on persistent (ZFS_DEV and MNTTAB) fds
These were fd 3, 4, and 5 by the time zfs change-key hit
execute_key_fob()

glibc appends "e" to setmntent() mode, but musl's just returns fopen()

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11866
2021-06-23 13:22:14 -07:00
наб a559c3d0f4 libzfs: zfs_crypto_create() requires a new key by definition: set newkey
This changes the password prompt for new encryption roots from
  Enter passphrase:
  Re-enter passphrase:
to
  Enter new passphrase:
  Re-enter new passphrase:
which makes more sense and is more consistent with "new passphrase"
now always meaning "come up with something" and plain "passphrase"
"remember that thing"

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11866
2021-06-23 13:22:14 -07:00
наб 95e29a7275 zfprops(8): fix spacing in jailed= arguments
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11866
2021-06-23 13:22:14 -07:00
наб 85890d54d9 zfs-[un]jail(8): fix "zfs-jail [un]jail" leftovers
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11866
2021-06-23 13:22:14 -07:00
Ryan Moeller 12e25a6ec3 ZTS: Improve cleanup in removal_with_export
Kill the removal operation on every platform, not just Linux.
The test has been fixed and is now stable on FreeBSD.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11856
2021-06-23 13:22:14 -07:00
Ryan Moeller 57490d5e13 ZTS: Tests using zhack may fail on FreeBSD
As described in #11854, zhack is occasionally segfaulting on FreeBSD. 
Debugging this is proving to be tricky. To avoid false positives in
the CI add entries for the tests that use zhack in zts-report to 
accept that they may occasionally fail on FreeBSD.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Issue #11854
Closes #11855
2021-06-23 13:22:14 -07:00
Ryan Moeller 214196e9f5 Ratelimit deadman zevents as with delay zevents
Just as delay zevents can flood the zevent pipe when a vdev becomes
unresponsive, so do the deadman zevents.

Ratelimit deadman zevents according to the same tunable as for delay
zevents.

Enable deadman tests on FreeBSD and add a test for deadman event
ratelimiting. 

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Don Brady <don.brady@delphix.com>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11786
2021-06-23 13:22:14 -07:00
matt-fidd 932d471bec zfs get -p only outputs 3 columns if "clones" property is empty
get_clones_string currently returns an empty string for filesystem
snapshots which have no clones. This breaks parsable `zfs get` output as
only three columns are output, instead of 4.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Fiddaman <github@m.fiddaman.uk>
Co-authored-by: matt <matt@fiddaman.net>
Closes #11837
2021-06-23 13:22:14 -07:00
наб 77cba40b85 zpool-features.5: remove "booting not possible with this feature"s
The exact limitations on what features are supported when booting
vary considerably depending on the environment.  In order to minimize
confusion avoid categorical statements which assume GRUB2 is being 
used.  The supported GRUB2 features are covered earlier in this man 
page for easy reference.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11842
2021-06-23 13:22:14 -07:00
George Melikov 18d069a20f man: fix wrong .Xr macros usages
In addition, html doc will have working hyperlinks.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #11845
2021-06-23 13:22:14 -07:00
наб dfc25ea91c libzutil: zfs_isnumber(): return false if input empty
zpool list, which is the only user, would mistakenly try to parse the
empty string as the interval in this case:

  $ zpool list "a"
  cannot open 'a': no such pool
  $ zpool list ""
  interval cannot be zero
  usage: <usage string follows>
which is now symmetric with zpool get:
  $ zpool list ""
  cannot open '': name must begin with a letter

Avoid breaking the  "interval cannot be zero" string.
There simply isn't a need for this, and it's user-facing.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11841 
Closes #11843
2021-06-23 13:22:14 -07:00
Brian Behlendorf 7428adc6b7 ZTS: pool_checkpoint improvements
The pool_checkpoint tests may incorrectly fail because several of
them invoke zdb for an imported pool.  In this scenario it's not
unexpected for zdb to fail if the pool is modified.  To resolve
this these zdb checks are now done after the pool has been exported.

Additionally, the default cleanup functions assumed the pool would
be imported when they were run.  If this was not the case they're
exit early and fail to cleanup all of the test state causing
subsequent tests to fail.  Add a check to only destroy the pool
when it is imported.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11832
2021-06-23 13:22:14 -07:00
Ryan Moeller 30782f2ff5 ZTS: inheritance/inherit_001_pos is flaky
Add inheritance/inherit_001_pos to the maybe fails on FreeBSD list.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11830
2021-06-23 13:22:14 -07:00
Ryan Moeller 43efd5625b Avoid taking global lock to destroy zfsdev state
We have exclusive access to our zfsdev state object in this section
until it is invalidated by setting zs_minor to -1, so we can destroy
the state without taking a lock if we do the invalidation last, after
a member to ensure correct ordering.

While here, strengthen the assertions that zs_minor is valid when we
enter.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #11751
2021-06-23 13:22:14 -07:00
Ryan Moeller 87a520c4e7 FreeBSD: Fix stable/12 after AT_BENEATH removal
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11827
2021-06-23 13:22:14 -07:00
Ryan Moeller d02bcae0c4 Allow pool names that look like Solaris disk names
Nothing bad happens if a prefix of your pool name matches a disk name.
This is a bit of a silly restriction at this point.

Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #11781 
Closes #11813
2021-06-23 13:22:14 -07:00
Ryan Moeller d9bcadc6ba Don't scale zfs_zevent_len_max by CPU count
The lower bound for this scaling to too low and the upper bound is too
high.  Use a fixed default length of 512 instead, which is a reasonable
value on any system.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11822
2021-06-23 13:22:14 -07:00
Ryan Moeller 67819ea28d Atomically check and set dropped zevent count
ratelimit_dropped isn't protected by a lock and is expected to
be updated atomically.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11822
2021-06-23 13:22:14 -07:00
Brian Behlendorf 79430bf34b CI: Increase free space in workflow
Recently we've been running out of free space in the ubuntu 20.04
environment resulting in test failures.  This appears to be caused
by a change in the default available free space and not because of
any change in OpenZFS. Try and avoid this failure by applying a
suggested workaround which removes some unnecessary files.

https://github.com/actions/virtual-environments/issues/2840

Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11826
2021-06-23 13:22:14 -07:00
Andrew cabc605aa7 Fix regression in POSIX mode behavior
Commit 235a85657 introduced a regression in evaluation of POSIX modes
that require group DENY entries in the internal ZFS ACL. An example
of such a POSX mode is 007. When write_implies_delete_child is set,
then ACE_WRITE_DATA is added to `wanted_dirperms` in prior to calling
zfs_zaccess_common(). This occurs is zfs_zaccess_delete().

Unfortunately, when zfs_zaccess_aces_check hits this particular DENY
ACE, zfs_groupmember() is checked to determine whether access should be
denied, and since zfs_groupmember() always returns B_TRUE on Linux and
so this check is failed, resulting ultimately in EPERM being returned.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Andrew Walker <awalker@ixsystems.com>
Closes #11760
2021-06-23 13:22:14 -07:00
Palash Gandhi 0740d41e9b ZTS: New test for kernel panic induced by redacted send
This change adds a new test that covers a bug fix in the binary search
in the redacted send resume logic that causes a kernel panic.
The bug was fixed in https://github.com/openzfs/zfs/pull/11297.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Palash Gandhi <palash.gandhi@delphix.com>
Closes #11764
2021-06-23 13:22:14 -07:00
Martin Matuška 16e8e24f9c Allow setting bootfs property on pools with indirect vdevs
The FreeBSD boot loader relies on the bootfs property and is capable
of booting from removed (indirect) vdevs.

Reviewed-by Eric van Gyzen
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Martin Matuska <mm@FreeBSD.org>
Closes #11763
2021-06-23 13:22:14 -07:00
Mateusz Guzik bc5afff351 FreeBSD: make seqc asserts conditional on replay
Avoids tripping on asserts when doing pool recovery.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #11739
2021-06-23 13:22:14 -07:00
Ryan Moeller 825d8e1b0f FreeBSD: Fix memory leaks in kstats
Don't handle (incorrectly) kmem_zalloc() failure.  With KM_SLEEP,
will never return NULL.

Free the data allocated for non-virtual kstats when deleting the object.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11767
2021-06-23 13:22:14 -07:00
gldisater f7797f3f5e Hold and release permissions exist
The man page was missing these two permissions.
Add the missing permissions to the man page. 

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Jeremy Faulkner <gldisater@gldis.ca>
Closes #11727
2021-06-23 13:22:14 -07:00
Ryan Moeller 2439e95f36 ZTS: Add tests for DOS mode attributes
Create a new section of tests to run with acltype=off.

For now the only test we have is for the DOS mode READONLY attribute on
FreeBSD.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11734
2021-06-23 13:22:14 -07:00
Ryan Moeller de722e55ca ZTS: Fix incorrect use of libtest in user_run by xattr_003_neg
You can't use user_run to eval ksh functions defined in libtest unless
you include libtest in the user shell.

Fix xattr_003_neg by:
* include libtest in the user shell
* *then* run get_xattr
* assert this fails
* use variables for filenames so they don't change in the user's shell
* don't log the contents of /etc/passwd
* cleanup all byproducts

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11185
2021-06-23 13:22:14 -07:00
Ryan Moeller acb4e7b0e1 ZTS: Use ksh and current environment for user_run
The current user_run often does not work as expected.  Commands are run
in a different shell, with a different environment, and all output is
discarded.

Simplify user_run to retain the current environment, eliminate eval,
and feed the command string into ksh.  Enhance the logging for
user_run so we can see out and err.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11185
2021-06-23 13:22:14 -07:00
Mariusz Zaborski 5a397fda09 FreeBSD: bring back possibility to rewind the checkpoint from bootloader
Add parsing of the rewind options.

When I was upstreaming the change [1], I omitted the part where we
detect that the pool should be rewind. When the FreeBSD repo has
synced with the OpenZFS, this part of the code was removed.

[1] FreeBSD repo: 277f38abffc6a8160b5044128b5b2c620fbb970c
[2] OpenZFS repo: f2c027bd6a

External-issue: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=254152
Originally reviewed by: tsoome, allanjude
Originally reviewed by: kevans (ok from high-level overview)
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mariusz Zaborski <oshogbo@vexillium.org>
Closes #11730
2021-06-23 13:22:14 -07:00
Ryan Moeller 525a7037c7 FreeBSD: Clean up zfsdev_close to match Linux
Resolve some oddities in zfsdev_close() which could result in a
panic and were not present in the equivalent function for Linux.

- Remove unused definition ZFS_MIN_MINOR
- FreeBSD: Simplify zfsdev state destruction
- Assert zs_minor is valid in zfsdev_close
- Make locking around zfsdev state match Linux

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11720
2021-06-23 13:22:14 -07:00
Mateusz Guzik 789f3d0e56 FreeBSD: switch teardown lock to rms
This deserializes otherwise non-contending operations.

The previous scheme of using 17 locks hashed by curthread runs into
conflicts very quickly. Check the pull request for sample results.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #11153
2021-06-23 13:22:14 -07:00
Mateusz Guzik 5d946dfa1c Macroify teardown lock handling
This will allow platforms to implement it as they see fit, in particular
in a different manner than rrm locks.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #11153
2021-06-23 13:22:14 -07:00
Mateusz Guzik cd80d6d355 FreeBSD: rename teardown inactive macros to mimick rrm convention
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #11153
2021-06-23 13:22:14 -07:00
Mateusz Guzik 140584bfff FreeBSD: remove 2 assertions that teardown lock is not held
They are not very useful and hard to implement in the rms routine
the code is about to start using.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #11153
2021-06-23 13:22:14 -07:00
Mateusz Guzik 8773e29c23 FreeBSD: rework asserts in zfs_dd_lookup
1. even up ifdefs
2. drop the arguably useless teardown lock asserts -- nothing else
   checks for it

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #11153
2021-06-23 13:22:14 -07:00
Mateusz Guzik bb21c0aa3b Add branch prediction to ZFS_ENTER and ZFS_VERIFY_ZP macros
They are expected to fail only in corner cases.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #11153
2021-06-23 13:22:14 -07:00
George Wilson 455ea5bb1f zpool import cachefile improvements
Importing a pool using the cachefile is ideal to reduce the time
required to import a pool. However, if the devices associated with
a pool in the cachefile have changed, then the import would fail.
This can easily be corrected by doing a normal import which would
then read the pool configuration from the labels.

The goal of this change is make importing using a cachefile more
resilient and auto-correcting. This is accomplished by having
the cachefile import logic automatically fallback to reading the
labels of the devices similar to a normal import. The main difference
between the fallback logic and a normal import is that the cachefile
import logic will only look at the device directories that were
originally used when the cachefile was populated. Additionally,
the fallback logic will always import by guid to ensure that only
the pools in the cachefile would be imported.

External-issue: DLPX-71980
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Wilson <gwilson@delphix.com>
Closes #11716
2021-06-23 13:22:14 -07:00
Martin Matuška 7126b0e5ed Fix whitespace introduced in ecc277cff
The manual page change in ecc277c has introduced whitespace on
line ends.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Martin Matuska <mm@FreeBSD.org>
Closes #11722
2021-06-23 13:22:14 -07:00
Ryan Moeller 7ca0ef36e1 FreeBSD: Fix scope of deadman tunables
A few deadman tunables ended up in the wrong sysctl node.

Move them to vfs.zfs.deadman.*

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11715
2021-06-23 13:22:14 -07:00
Adam D. Moss 0d6186cff3 Microoptimizations for VERIFY() and friends
Add branch hints and constify the intermediate evaluations of 
left/right params in VERIFY3*().

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Adam Moss <c@yotes.com>
Closes #11708
2021-06-23 13:22:14 -07:00
Allan Jude 6f349d33a5 Add missing files to Makefile
Some .h files that were added were missed in this Makefile. Since 
they are .h files, their being missing only resulted in them 
disappeared from the dist archive.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Closes #11705
2021-06-23 13:22:14 -07:00
George Melikov 226d362b12 CI checkstyle: pin ubuntu version
Our checkstyle doesn't work well on Ubuntu 20.04,
temporary pin it to 18.04.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #11713
2021-06-23 13:22:14 -07:00
Antonio Russo 497fc5fb06 ZTS events_002: Improve speed and reliability
events_002 exercises the ZED, ensuring that it neither misses events,
nor reporting events twice.

On slow test hardware, some of the timeouts are insufficient to allow
the ZED to properly settle.  Conversely, on fast hardware these same
timeouts are too long, unnecessarily slowing the test run.

Instead of using a fixed timeout, wait for the expected final event
before returning.  Additionally, wait with a timeout for unexpected
events to avoid missing them if they show up late.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Antonio Russo <aerusso@aerusso.net>
Closes #11703
2021-06-23 13:22:14 -07:00
Christian Schwarz abb485a34a zvol: call zil_replaying() during replay
zil_replaying(zil, tx) has the side-effect of informing the ZIL that an
entry has been replayed in the (still open) tx.  The ZIL uses that
information to record the replay progress in the ZIL header when that
tx's txg syncs.

ZPL log entries are not idempotent and logically dependent and thus
calling zil_replaying() is necessary for correctness.

For ZVOLs the question of correctness is more nuanced: ZVOL logs only
TX_WRITE and TX_TRUNCATE, both of which are idempotent. Logical
dependencies between two records exist only if the write or discard
request had sync semantics or if the ranges affected by the records
overlap.

Thus, at a first glance, it would be correct to restart replay from
the beginning if we crash before replay completes. But this does not
address the following scenario:
Assume one log record per LWB.
The chain on disk is

    HDR -> 1:W(1, "A") -> 2:W(1, "B") -> 3:W(2, "X") -> 4:W(3, "Z")

where N:W(O, C) represents log entry number N which is a TX_WRITE of C
to offset A.
We replay 1, 2 and 3 in one txg, sync that txg, then crash.
Bit flips corrupt 2, 3, and 4.
We come up again and restart replay from the beginning because
we did not call zil_replaying() during replay.
We replay 1 again, then interpret 2's invalid checksum as the end
of the ZIL chain and call replay done.
The replayed zvol content is "AX".

If we had called zil_replaying() the HDR would have pointed to 3
and our resumed replay would not have replayed anything because
3 was corrupted, resulting in zvol content "BX".

If 3 logically depends on 2 then the replay corrupted the ZVOL_OBJ's
contents.

This patch adds the zil_replaying() calls to the replay functions.
Since the callbacks in the replay function need the zilog_t* pointer
so that they can call zil_replaying() we open the ZIL while
replaying in zvol_create_minor(). We also verify that replay has
been done when on-demand-opening the ZIL on the first modifying
bio.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Christian Schwarz <me@cschwarz.com>
Closes #11667
2021-06-23 13:22:14 -07:00
Ryan Moeller 0ccffb2634 ZTS: Improve cleanup in zpool tests
* Restore original kern.corefile value after the test.
* Don't leave behind a frozen pool.
* Clean up leftover vdev files.
* Make zpool_002_pos and zpool_003_pos consistent in their handling of
core files while here.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11694
2021-06-23 13:22:14 -07:00
nssrikanth 26cb87d22d Cancel TRIM / initialize on FAULTED non-writeable vdevs
When a device which is actively trimming or initializing becomes
FAULTED, and therefore no longer writable, cancel the active
TRIM or initialization.  When the device is merely taken offline
with `zpool offline` then stop the operation but do not cancel it.
When the device is brought back online the operation will be
resumed if possible.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Vipin Kumar Verma <vipin.verma@hpe.com>
Signed-off-by: Srikanth N S <srikanth.nagasubbaraoseetharaman@hpe.com>
Closes #11588
2021-06-23 13:22:14 -07:00
Brian Behlendorf 43dbfa3921 ZTS: zpool_trim_start_and_cancel_pos.ksh
Several of the TRIM tests were based of the initialize tests and
then adapted for TRIM.  The zpool_trim_start_and_cancel_pos.ksh
test was intended to be one such test but it was overlooked and
actually never adapted.  Update it accordingly.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11649
2021-06-23 13:22:14 -07:00
Brian Behlendorf 395583e38e Fix overly broad locking in spa_vdev_config_exit()
Calling vdev_free() only requires the we acquire the spa config
SCL_STATE_ALL locks, not the SCL_ALL locks.  In particular, we need
need to avoid taking the SCL_CONFIG lock (included in SCL_ALL) as a
writer since this can lead to a deadlock.  The txg_sync_thread() may
block in spa_txg_history_init_io() when taking the SCL_CONFIG lock
as a reading when it detects there's a pending writer.

Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11585
2021-06-23 13:22:14 -07:00
Ryan Moeller 818bc70c32 Wrap bare EINVAL returns with SET_ERROR
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11636
2021-06-23 13:22:14 -07:00
Tony Hutter 6150fbe67f Tag zfs-2.0.4
META file and changelog updated.

Signed-off-by: Tony Hutter <hutter2@llnl.gov>
2021-03-08 13:04:37 -08:00
manfromafar 458bb8c8a1 Clarify compressed zfs send/recv behavior
Docs for send and receive do not explain behavior when sending a 
compressed stream then receiving on a host that overrides compression 
with -o compress=value.

The data from the send stream is written as it was from the send is 
the compressed form but the compression algorithm set on the receiver 
is the overridden version which causes some confusion as to what 
algorithm was actually used.

Updated man docs to clarify behavior

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed By: Allan Jude <allanjude@freebsd.org>
Signed-off-by: manfromafar <manfromafar@outlook.com>
Closes #11690
2021-03-08 09:07:32 -08:00
Ryan Moeller cda6fdd500 Intentionally allow ZFS_READONLY in zfs_write
ZFS_READONLY represents the "DOS R/O" attribute.
When that flag is set, we should behave as if write access
were not granted by anything in the ACL.  In particular:
We _must_ allow writes after opening the file r/w, then
setting the DOS R/O attribute, and writing some more.
(Similar to how you can write after fchmod(fd, 0444).)

Restore these semantics which were lost on FreeBSD when refactoring
zfs_write.  To my knowledge Linux does not actually expose this flag,
but we'll need it to eventually so I've added the supporting checks.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11693
2021-03-08 09:07:29 -08:00
Brian Behlendorf df8271301e Suppress cppcheck invalidSyntax warninigs
For some reason cppcheck 1.90 is generating an invalidSyntax warning
when the BF64_SET macro is used in the zstream source.  The same
warning is not reported by cppcheck 2.3, nor is their any evident
problem with the expanded macro.  This appears to be an issue with
this version of cppcheck.  This commit annotates the source to suppress
the warning.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11700
2021-03-08 09:07:25 -08:00
Brian Behlendorf e219935f10 Initialize ZIL buffers
When populating a ZIL destination buffer ensure it is always
zeroed before its contents are constructed.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Tom Caputi <caputit1@tcnj.edu>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11687
2021-03-08 09:07:21 -08:00
Thomas Lamprecht bb9104f1e8 zpool: use tab to intend continuation from removal status
Bring the output of the removal status in line with the other
"fields" that zpool status outputs, and thus allows an parser to
easier detect this as continuation of the 'remove:' output.

Before:
remove: Removal of vdev 0 copied 282G in 0h9m, completed on [...]
    776K memory used for removed device mappings

Now:
remove: Removal of vdev 0 copied 282G in 0h9m, completed on [...]
	776K memory used for removed device mappings

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Closes #11674
2021-03-08 09:07:18 -08:00
James Wah 94b240bae0 Don't bomb out when using keylocation=file://
Avoid following the error path when the operation in fact succeeded.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: James Wah <james@laird-wah.net>
Closes #11651
2021-03-05 12:59:07 -08:00
Jake Howard e93203e004 Add "zstd-fast" to help options for "compression" property
This value does work as expected, and is documented in the manpage.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Jake Howard <git@theorangeone.net>
Closes #11670
2021-03-05 12:58:47 -08:00
Andriy Gapon ccb453acd0 Fix assert in FreeBSD-specific dmu_read_pages
The function has three similar pieces of code: for read-behind pages,
requested pages and read-ahead pages.  All three pieces had an
assert to ensure that the page is not mapped.  Later the assert was
relaxed to require that the page is not mapped for writing.  But that
was done in two places out of three.  This change fixes the third piece,
read-ahead.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Andriy Gapon <avg@FreeBSD.org>
Closes #11654
2021-03-05 12:58:08 -08:00
Coleman Kane 6c1989923e Linux 5.12 compat: replace bio_*_io_acct with disk_*_io_acct
The bio_*_acct functions became GPL exports, which causes the
kernel modules to refuse to compile. This replaces code with
alternate function calls to the disk_*_io_acct interfaces, which
are not GPL exports. This change was added in kernel commit
99dfc43ecbf67f12a06512918aaba61d55863efc.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #11639
2021-03-05 12:57:54 -08:00
Coleman Kane a0eb5a77a0 Linux 5.12 compat: bio->bi_disk member moved
The struct bio member bi_disk was moved underneath a new member named
bi_bdev. So all attempts to reference bio->bi_disk need to now become
bio->bi_bdev->bd_disk.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #11639
2021-03-05 12:57:46 -08:00
Brian Behlendorf fe77c48320 Linux: increase max nvlist_src size
On Linux increase the maximum allowed size of the src nvlist which
can be passed to the /dev/zfs ioctl.  Originally, this was set
to a maximum of KMALLOC_MAX_SIZE (4M) because it was kmalloc'd.
Since that time it's been converted to a vmalloc so that's no
longer a hard limit, and it's desirable for `zfs send/recv` to
allow larger nvlists so more snapshots can be sent at once.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #6572
Closes #11638
2021-03-05 12:54:57 -08:00
Cedric Maunoury db9b29895e send_iterate_snap : doall send without fromsnap
The behavior of a NULL fromsnap was inadvertently changed for a doall
send when the send/recv logic in libzfs was updated.  Restore the
previous behavior by correcting send_iterate_snap() to include all
the snapshots in the nvlist for this case. 

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Cedric Maunoury <cedric.maunoury@gmail.com>
Closes #11608
2021-03-05 12:54:39 -08:00
fbynite 4d4dd76f0f vdev_ops: don't try to call vdev_op_hold or vdev_op_rele when NULL
This prevents a panic after a SLOG add/removal on the root pool followed
by a zpool scrub.

When a SLOG is removed, a hole takes its place - the vdev_ops for a hole
is vdev_hole_ops, which defines the handler functions of vdev_op_hold
and vdev_op_rele as NULL.

This bug has been reported in illumos and FreeBSD, a different trigger
in the FreeBSD report though.

Credit for this patch goes to Patrick Mooney <pmooney@pfmooney.com>

Obtained from: illumos-gate commit: c65bd18728f34725
External-issue: https://www.illumos.org/issues/12981
External-issue: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=252396
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Wing <rob.fx907@gmail.com>
Closes #11623
2021-03-05 12:53:50 -08:00
Christian Schwarz 286c7f75bb libzpool: set_global_var: fix endianness handling (fixes zdb -o )
Without this patch I get the error

  Setting global variables is only supported on little-endian systems

when using `zdb -o` on my amd64 machine.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Signed-off-by: Christian Schwarz <me@cschwarz.com>
Closes #11602
2021-03-05 12:51:48 -08:00
Ryan Moeller 403703d57a Restore FreeBSD resource usage accounting
Add zfs_racct_* interfaces for platform-dependent read/write accounting.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11613
2021-03-05 12:50:32 -08:00
Mark Johnston f17c843eff FreeBSD: disable the use of hardware crypto offload drivers for now
First, the crypto request completion handler contains a bug in that it
fails to reset fs_done correctly after the request is completed.  This
is only a problem for asynchronous drivers.  Second, some hardware
drivers have input constraints which ZFS does not satisfy.  For
instance, ccp(4) apparently requires the AAD length for AES-GCM to be a
multiple of the cipher block size, and with qat(4) the AES-GCM AAD
length may not be longer than 240 bytes.  FreeBSD's generic crypto
framework doesn't have a mechanism to automatically fall back to a
software implementation if a hardware driver cannot process a request,
and ZFS does not tolerate such errors.

The plan is to implement such a fallback mechanism, but with FreeBSD
13.0 approaching we should simply disable the use hardware drivers for
now.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes #11612
2021-03-05 12:49:41 -08:00
Andriy Gapon f6440fa094 Fix report_mount_progress never calling set_progress_header
That happens because of an off-by-one mistake.
share_mount_one_cb() calls report_mount_progress(current=sm_done) after
having incremented sm_done by one.  Then report_mount_progress()
increments the parameter again.  It appears that that logic became
obsolete after commit a10d50f999, parallel zfs mount.

On FreeBSD I observe that zfs mount -a -v prints, for example,
    (null): (201/248)
That happens because set_progress_header() is never called.

With this change the output becomes correct:
    Mounting ZFS filesystems: (209/248)

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Andriy Gapon <avg@FreeBSD.org>
Closes #11607
2021-03-05 12:49:22 -08:00
José Luis Salvador Rufo 62f9691e10 Support uClibc for the tests compilations
There are two issues that don't allow ZFS to be compiled using uClibc.
`backtrace()`, and `program_invocation_short_name` as a `const`.
This patch adds uClibc to the conditionals in the same way there are
already for Glibc for `backtrace()`; and removes the external param
`program_invocation_short_name` because its only used here for the
whole project.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: José Luis Salvador Rufo <salvador.joseluis@gmail.com>
Closes #11600
2021-03-05 12:48:50 -08:00
Brian Behlendorf 73e26fdc09 Linux 5.11 compat: META
Increase the Linux-Maximum version in the META file to 5.11.
All of the required compatibility patches have been merged.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11586
2021-03-05 12:48:08 -08:00
Tony Hutter 65a89d9f49 Tag zfs-2.0.3
META file and changelog updated.

Signed-off-by: Tony Hutter <hutter2@llnl.gov>
2021-02-10 13:03:21 -08:00
Antonio Russo 8829ba19b7 Set file mode during zfs_write
3d40b65 refactored zfs_vnops.c, which shared much code verbatim between
Linux and BSD.  After a successful write, the suid/sgid bits are reset,
and the mode to be written is stored in newmode.  On Linux, this was
propagated to both the in-memory inode and znode, which is then updated
with sa_update.

3d40b65 accidentally removed the initialization of newmode, which
happened to occur on the same line as the inode update (which has been
moved out of the function).

The uninitialized newmode can be saved to disk, leading to a crash on
stat() of that file, in addition to a merely incorrect file mode.

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Antonio Russo <aerusso@aerusso.net>
Closes #11474 
Closes #11576
2021-02-08 09:20:38 -08:00
наб 642d86af0d zfs-import-{cache,scan}: change condition to FileNotEmpty
When all pools are exported ZFS will generate an empty cache file.
This will cause the import service to fail, which is sub-optimal, 
since this means that dracut fails, and it necessary to run 
`zpool import -a` to boot, delete the file, and regenerate+reinstall 
the initrd.

This resolves the issue by treating an zero-length cache files the
same as a missing cache file.  This aligns the behavior with that
of the `zpool` command itself.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11568
2021-02-05 11:42:40 -08:00
Lorenz Hüdepohl 07ca7592ad dracut: Fix race condition between load-key and import
zfs-load-key.sh is called by the dracut-pre-mount.service unit which has
no explicit 'After' dependency on zfs-import.target. That way it can be
that the pool has not yet been imported and the zfs-load-key.sh finishes
without ever seeing the relevant pool.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Lorenz Hüdepohl <dev@stellardeath.org>
Closes #11500
2021-02-05 11:40:33 -08:00
Brian Behlendorf fb0d807477 zfs-list.8: clarify listing snapshots
Clarify how to include snapshots in the `zpool list` output by
referencing the full name of the `listsnapshots` pool property,
and the `zpool list -t snapshot` option.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11562
Closes #11565
2021-02-05 11:32:22 -08:00
George Melikov 858ea8861c zts-report.py: ignore some skipped tests in Github CI
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #11554
2021-02-05 11:31:56 -08:00
George Melikov 549841ef9a CI: add ubuntu-* functional tests runner
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #11554
2021-02-05 11:31:38 -08:00
George Melikov 96f322e7e2 CI: rename zfs-tests workflow
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #11554
2021-02-05 11:31:34 -08:00
Brian Behlendorf d022406a14 Tag 2.0.2
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2021-02-01 10:08:22 -08:00
George Amanakis a1a1386965 Avoid updating the L2ARC device header unnecessarily
If we do not write any buffers to the cache device and the evict hand
has not advanced do not update the cache device header.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #11522 
Closes #11537
2021-01-28 11:39:24 -08:00
Paul Dagnelie 43eaef6de8 Fix zrele race in zrele_async that can cause hang
There is a race condition in zfs_zrele_async when we are checking if 
we would be the one to evict an inode. This can lead to a txg sync 
deadlock.

Instead of calling into iput directly, we attempt to perform the atomic 
decrement ourselves, unless that would set the i_count value to zero. 
In that case, we dispatch a call to iput to run later, to prevent a 
deadlock from occurring.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #11527 
Closes #11530
2021-01-28 11:39:13 -08:00
Alan Somers 5bc4c39d70 Fix a resource leak in uu_avl_pool_destroy
Need to destroy the pthread mutex created in uu_avl_pool_create.

https://svnweb.freebsd.org/base?view=revision&revision=262912

Obtained from: FreeBSD
Sponsored by:	Spectra Logic Corporation
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alan Somers <asomers@gmail.com>
Closes #11528
2021-01-28 11:39:00 -08:00
Alan Somers 756e28be51 Fix a man page link in zfs-program.8
zfs-program.8 has an orphan link, fix it.

https://svnweb.freebsd.org/base?view=revision&revision=360080

Obtained from: FreeBSD
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alan Somers <asomers@gmail.com>
Closes #11529
2021-01-28 11:38:48 -08:00
наб 039a810491 zfsprops.8: fix mispluralisation in "Default values is"
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11509
2021-01-24 16:07:10 -08:00
Ryan Moeller bfb7b9613a ZTS: Use swapctl to list swap devices on FreeBSD
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11503
2021-01-24 16:07:05 -08:00
Arshad Hussain 3e33897bec vdev_id: Add error message when $CONFIG is missing
It was observed that vdev_id exists silently when
the $CONFIG file is missing.

This patch adds error message in case vdev_id is
called without default $CONFIG or '-c'. This makes
end user observe the exit message more easily.

Before Patch:
~~~~~~~~~~~~~
$ ./cmd/vdev_id/vdev_id
$

After Patch:
~~~~~~~~~~~~
$ ./cmd/vdev_id/vdev_id
Error: Config file "/etc/zfs/vdev_id.conf" not found
$

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Closes #11498
2021-01-24 16:06:58 -08:00
Colm 0f3b928e85 Fix two minor lint errors (cppcheck)
Fix two minor errors reported by cppcheck:

In module/zfs/abd.c (abd_get_offset_impl), add non-NULL
assertion to prevent NULL dereference warning.

In module/zfs/arc.c (l2arc_write_buffers), change 'try'
variable to 'pass' to avoid C++ reserved word.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Colm Buckley <colm@tuatha.org>
Closes #11507
2021-01-24 16:06:33 -08:00
Alexander Motin 12fec4a147 Relax special_small_blocks assertion.
Follow up for commit 624222a, value asserted <= SPA_OLD_MAXBLOCKSIZE
instead of SPA_MAXBLOCKSIZE as it should be after the previous change.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Closes #11501
2021-01-24 16:06:16 -08:00
Ryan Moeller 7930a5ee65 FreeBSD: upstream changes to VFS interface
Set VIRF_MOUNTPOINT flag on snapshot mountpoint.

Authored-by: Mateusz Guzik <mjg@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11458
2021-01-24 16:06:02 -08:00
Matt Macy be94a3de0f FreeBSD: fix HEAD build, conditionally remove FDSYNC defines
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #11458
2021-01-24 16:05:56 -08:00
Lorenz Hüdepohl db83b3abe9 dracut: Support /usr/bin as 'systemctl' path
On openSUSE the initrd has systemctl in /usr/bin, check this path as
well.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Signed-off-by: Lorenz Hüdepohl <dev@stellardeath.org>
Closes #11487
2021-01-23 15:47:06 -08:00
Antonio Russo 55fd8ceab6 Install zgenhostid to sbindir
zgenhostid(8) is used to modify or create /etc/hostid.  This
administrative tool is currently installed to bindir.  System utilities
are typically placed in sbin.

Modify the installation directory for zgenhostid.  Additionally, track
this change in its use in dracut and the rpm installation.

Authored-by: наб <nabijaczleweli@nabijaczleweli.xyz>
Authored-by: Antonio Russo <aerusso@aerusso.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Antonio Russo <aerusso@aerusso.net>
Closes #11485
2021-01-23 15:47:06 -08:00
sterlingjensen e269e1c07a Re-apply path sanitizer, as mount(8) still mangles it
Prior to util-linux 2.36.2, if a file or directory in the
current working directory was named 'dataset' then mount(8)
would prepend the current working directory to the dataset.

Eventually, we should be able to drop this workaround.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Sterling Jensen <sterlingjensen@users.noreply.github.com>
Closes #11295 
Closes #11462
2021-01-23 15:47:06 -08:00
Antonio Russo 81f981cd82 ZTS: avoid piping to special devices
As described in #11445, the kernel interface kernel_{read,write} no
longer act on special devices.  In the ZTS, zfs send and receive are
tested by piping to these devices, leading to spurious failures (for
positive tests) and may mask errors (for negative tests).

Until a more permanent mechanism to address this deficiency is
developed, clean up the output from the ZTS by avoiding directly piping
to or from /dev/null and /dev/zero.

For /dev/zero input, simply use a pipe: `cat </dev/zero |` .

However, for /dev/null output, the shell semantics for pipe failures
means that zfs send error codes will be masked by the successful
`| cat >/dev/null` command execution.  In that case, use a temporary
file under $TEST_BASE_DIR for output in favor.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Attila Fülöp <attila@fueloep.org>
Signed-off-by: Antonio Russo <aerusso@aerusso.net>
Closes #11478
2021-01-23 15:47:06 -08:00
Antonio Russo 3790aa8176 ZTS: avoid race to unmount in zfs_rollback_001
The zfs_rollback_001 test modifies files in a temporary, test dataset
repeatedly.  Before each iteration, any preexisting dataset is removed,
after unmounted with umount -f, if necessary.

Add a short delay after the forced unmount, avoiding a race that can
prevent zfs destroy from succeeding, leading to a test failure.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Antonio Russo <aerusso@aerusso.net>
Closes #11451
2021-01-23 15:47:06 -08:00
Matthew Ahrens 4e1d1b4b92 assertion failed in arc_wait_for_eviction()
If the system is very low on memory (specifically,
`arc_free_memory() < arc_sys_free/2`, i.e. less than 1/16th of RAM
free), `arc_evict_state_impl()` will defer wakups.  In this case, the
arc_evict_waiter_t's remain on the list, even though `arc_evict_count`
has been incremented past their `aew_count`.

The problem is that `arc_wait_for_eviction()` assumes that if there are
waiters on the list, the count they are waiting for has not yet been
reached.  However, the deferred wakeups may violate this, causing
`ASSERT(last->aew_count > arc_evict_count)` to fail.

This commit resolves the issue by having new waiters use the greater of
`arc_evict_count` and the last `aew_count`.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Wilson <gwilson@delphix.com>
Reviewed-by: George Amanakis <gamanakis@gmail.com>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #11285
Closes #11397
2021-01-23 15:47:06 -08:00
Matthew Macy c32c475363 FreeBSD: minor_t needs to be signed so that -1 is recognized as such
zfsdev_close sets zs_minor to -1 to avoid duplicate calls to
destroy. This doesn't mix well with the current u_int used.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #11437
2021-01-23 15:47:06 -08:00
Brian Behlendorf dd487640b2 Linux 5.10 compat: restore custom uio_prefaultpages()
As part of commit 1c2358c1 the custom uio_prefaultpages() code
was removed in favor of using the generic kernel provided
iov_iter_fault_in_readable() interface.  Unfortunately, it
turns out that up until the Linux 4.7 kernel the function would
only ever fault in the first iovec of the iov_iter.  The result
being uiomove_iov() may hang waiting for the page.

This commit effectively restores the custom uio_prefaultpages()
pages code for Linux 4.9 and earlier kernels which contain the
troublesome version of iov_iter_fault_in_readable().

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11463 
Closes #11484
2021-01-22 09:58:49 -08:00
Attila Fülöp acb39fc9a4 ZTS: three small follow up fixes for #11167
Follow up fix for 0cb40fa3. Remove unused variables, don't source
unused libs and add missed cleanup.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Attila Fülöp <attila@fueloep.org>
Closes #11311
2021-01-08 15:37:56 -08:00
Attila Fülöp 6539bf71fe zpool: Dryrun fails to list some devices
`zpool create -n` fails to list cache and spare vdevs.
`zpool add -n` fails to list spare devices.
`zpool split -n` fails to list `special` and `dedup` labels.
`zpool add -n` and `zpool split -n` shouldn't list hole devices.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Attila Fülöp <attila@fueloep.org>
Closes #11122
Closes #11167
2021-01-08 15:37:56 -08:00
Brian Behlendorf 32a78e579d Tag 2.0.1
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2021-01-05 14:17:09 -08:00
Brian Behlendorf fb7aa3c9b2 Autoconf 2.70 compatibility
Several m4 macros have been retired in autoconf 2.70.  Update the
the build system to use the new macros provided to replace them.

* Replaced AC_HELP_STRING with AS_HELP_STRING.

* Replaced AC_TRY_COMPILE with AC_COMPILE_IFELSE/AC_LANG_PROGRAM.

* Replaced AC_CANONICAL_SYSTEM with AC_CANONICAL_TARGET

* Replaced AC_PROG_LIBTOOL with LT_INIT

* $CPP is not defined in ZFS_AC_KERNEL and really shouldn't be
  directly used like this.  Replace it with an $AWK command
  to extract the kernel source version.

Reviewed-by: Eli Schwartz <eschwartz@archlinux.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #11413 
Closes #11419
2021-01-05 10:33:55 -08:00
Toomas Soome 4016ad705b zfs_mount_all_mountpoints: cleanup_all should leave pool root mounted
if pool root is not mounted, then zpool umount in next test will leave
dataset mountpoint directory around and next zfs mount -a will fail
with error: cannot mount '/testpool': directory is not empty

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Toomas Soome <tsoome@me.com>
Closes #11417
2021-01-05 10:33:48 -08:00
Konstantin Khorenko 59570a05d8 VZ 7 kernel compat: introduce ITER-enabled .direct_IO() via IOVECs
Virtuozzo 7 kernels starting 3.10.0-1127.18.2.vz7.163.46
have the following configuration:

  * no HAVE_VFS_RW_ITERATE
  * HAVE_VFS_DIRECT_IO_ITER_RW_OFFSET

=> let's add implementation of zpl_direct_IO() via
zpl_aio_{read,write}() in this case.

https://bugs.openvz.org/browse/OVZ-7243

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Konstantin Khorenko <khorenko@virtuozzo.com>
Closes #11410 
Closes #11411
2021-01-05 10:33:41 -08:00
Matthew Ahrens 88ae76a4e0 Memory leak in zdb:import_checkpointed_state()
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #11396
2021-01-05 10:32:46 -08:00
Matthew Ahrens 88d7da62b5 Memory leak in ztest_dmu_objset_own()
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #11396
2021-01-05 10:31:58 -08:00
Matthew Ahrens 6410ee4a94 Memory leak in ztest_vdev_attach_detach()
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #11396
2021-01-05 10:31:53 -08:00
Matthew Ahrens a296875da0 nvlist leaked in zpool_find_config()
In `zpool_find_config()`, the `pools` nvlist is leaked.  Part of it (a
sub-nvlist) is returned in `*configp`, but the callers also leak that.

Additionally, in `zdb.c:main()`, the `searchdirs` is leaked.

The leaks were detected by ASAN (`configure --enable-asan`).

This commit resolves the leaks.

Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #11396
2021-01-05 10:31:47 -08:00
Toomas Soome 921ec61b77 implicit conversion from 'boolean_t' to 'ds_hold_flags_t'
Build error on illumos with gcc 10 did reveal:

In function 'dmu_objset_refresh_ownership':
../../common/fs/zfs/dmu_objset.c:857:25: error: implicit conversion
from 'boolean_t' to 'ds_hold_flags_t' {aka 'enum ds_hold_flags'}
[-Werror=enum-conversion]
      857 |  dsl_dataset_disown(ds, decrypt, tag);
          |                         ^~~~~~~
cc1: all warnings being treated as errors

libzfs_input_check.c: In function 'zfs_ioc_input_tests':
libzfs_input_check.c:754:28: error: implicit conversion from
'enum dmu_objset_type' to 'enum lzc_dataset_type'
[-Werror=enum-conversion]
  754 |  err = lzc_create(dataset, DMU_OST_ZFS, NULL, NULL, 0);
      |                            ^~~~~~~~~~~
cc1: all warnings being treated as errors

The same issue is present in openzfs, and also the same issue about
ds_hold_flags_t, which currently defines exactly one valid value.

Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Toomas Soome <tsoome@me.com>
Closes #11406
2021-01-05 10:30:19 -08:00
Brian Behlendorf fcd9966ed9 Linux 5.11 compat: blk_{un}register_region()
As of 5.11 the blk_register_region() and blk_unregister_region()
functions have been retired. This isn't a problem since add_disk()
has implicitly allocated minor numbers for a very long time.

Reviewed-by: Rafael Kitover <rkitover@gmail.com>
Reviewed-by: Coleman Kane <ckane@colemankane.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11387
Closes #11390
2021-01-05 10:26:39 -08:00
Brian Behlendorf a2621753b2 Linux 5.11 compat: revalidate_disk_size()
Both revalidate_disk_size() and revalidate_disk() have been removed.
Functionally this isn't a problem because we only relied on these
functions to call zvol_revalidate_disk() for us and to perform any
additional handling which might be needed for that kernel version.
When neither are available we know there's no additional handling
needed and we can directly call zvol_revalidate_disk().

Reviewed-by: Rafael Kitover <rkitover@gmail.com>
Reviewed-by: Coleman Kane <ckane@colemankane.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11387
Closes #11390
2021-01-05 10:26:32 -08:00
Brian Behlendorf e888f28988 Linux 5.11 compat: bdev_whole()
The bd_contains member was removed from the block_device structure.
Callers needing to determine if a vdev is a whole block device should
use the new bdev_whole() wrapper.  For older kernels we provide our
own bdev_whole() wrapper which relies on bd_contains for compatibility.

Reviewed-by: Rafael Kitover <rkitover@gmail.com>
Reviewed-by: Coleman Kane <ckane@colemankane.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11387
Closes #11390
2021-01-05 10:26:25 -08:00
Brian Behlendorf 305510fd33 Linux 5.11 compat: bio_start_io_acct() / bio_end_io_acct()
The generic IO accounting functions have been removed in favor of the
bio_start_io_acct() and bio_end_io_acct() functions which provide a
better interface.  These new functions were introduced in the 5.8
kernels but it wasn't until the 5.11 kernel that the previous generic
IO accounting interfaces were removed.

This commit updates the blk_generic_*_io_acct() wrappers to provide
and interface similar to the updated kernel interface.  It's slightly
different because for older kernels we need to pass the request queue
as well as the bio.

Reviewed-by: Rafael Kitover <rkitover@gmail.com>
Reviewed-by: Coleman Kane <ckane@colemankane.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11387
Closes #11390
2021-01-05 10:26:18 -08:00
Brian Behlendorf 67cff6e4c1 Linux 5.11 compat: lookup_bdev()
The lookup_bdev() function has been updated to require a dev_t
be passed as the second argument. This is actually pretty nice
since the major number stored in the dev_t was the only part we
were interested in. This allows to us avoid handling the bdev
entirely.  The vdev_lookup_bdev() wrapper was updated to emulate
the behavior of the new lookup_bdev() for all supported kernels.

Reviewed-by: Rafael Kitover <rkitover@gmail.com>
Reviewed-by: Coleman Kane <ckane@colemankane.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11387
Closes #11390
2021-01-05 10:26:09 -08:00
Brian Behlendorf 944180ab50 Linux 5.11 compat: conftest
Update the ZFS_LINUX_TEST_PROGRAM macro to always set the module
license.  As of the 5.11 kernel not setting a license has been
converted from a warning to an error.

Reviewed-by: Rafael Kitover <rkitover@gmail.com>
Reviewed-by: Coleman Kane <ckane@colemankane.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11387
Closes #11390
2021-01-05 10:25:59 -08:00
Ryan Moeller 3075b29ca8 dbufstat: Fix warnings with Python 3.8
Replace "is" with "==" and "is not" with "!=".

Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11394
2020-12-23 15:11:01 -08:00
Brian Behlendorf e1d9228b08 Linux 5.10 compat: META
Increase the Linux-Maximum version in the META file to 5.10.
All of the required compatibility patches have been merged.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11391
2020-12-23 15:00:57 -08:00
Christian Schwarz afaf9414ff zfs-kmods: install to /lib/modules instead of /usr/lib/modules
Before this patch, dracut wouldn't find zfs.ko for inclusion in
initramfs. This was caused by the packages installing in to
/lib/modules instead of /usr/lib/modules.  Correcting this allows
dracut to do the right thing, even without

    # /etc/dracut.conf
    add_drivers+=" zfs "

Notably, rpm/redhat/zfs-kmod.spec.in does not contain the definition of
the `prefix` macro that this commit removes in the generic kmod spec.
And https://rpmfusion.org/Packaging/KernelModules/Kmods2 does not
mention `prefix` at all.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Christian Schwarz <me@cschwarz.com>
Closes #11381
2020-12-23 14:35:47 -08:00
Andy Fiddaman cee725c9bd Dangling reference from dmu_objset_upgrade
After porting the fix for https://github.com/openzfs/zfs/issues/5295
over to illumos, we started hitting an assertion failure when running
the testsuite:

	assertion failed: rc->rc_count == number, file: .../refcount.c

and the unexpected hold has this stack:

	dsl_dataset_long_hold+0x59 dmu_objset_upgrade+0x73
dmu_objset_id_quota_upgrade+0x15 dmu_objset_own+0x14f

The simplest reproducer for this in illumos is

    zpool create -f -O version=1 testpool c3t0d0; zpool destroy testpool

which is run as part of the zpool_create_tempname test, but I can't get
this to trigger on FreeBSD. This appears to be because of the call to
txg_wait_synced() in dmu_objset_upgrade_stop() (which was missing in
illumos), slows down dmu_objset_disown() enough to avoid the condition.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Andy Fiddaman <andy@omnios.org>
Closes #11368
2020-12-23 14:35:47 -08:00
Brian Behlendorf 03ad94a3b2 Linux 4.18.0-257.el8 compat: blk_alloc_queue()
The CentOS stream 4.18.0-257 kernel appears to have backported
the Linux 5.9 change to make_request_fn and the associated API.
To maintain weak modules compatibility the original symbol was
retained and the new interface blk_alloc_queue_rh() was added.

Unfortunately, blk_alloc_queue() was replaced in the blkdev.h
header by blk_alloc_queue_bh() so there doesn't seem to be a way
to build new kmods against the old interfces.  Even though they
appear to still be available for weak module binding.

To accommodate this a configure check is added for the new _rh()
variant of the function and used if available.  If compatibility
code gets added to the kernel for the original blk_alloc_queue()
interface this should be fine.  OpenZFS will simply continue to
prefer the new interface and only fallback to blk_alloc_queue()
when blk_alloc_queue_rh() isn't available.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11374
2020-12-23 14:35:47 -08:00
Michael D Labriola 7a7e101437 Linux 5.10 compat: also zvol_revalidate_disk()
Commit 59b68723 added a configure check for 5.10, which removed
revalidate_disk(), and conditionally replaced it's usage with a call to
the new revalidate_disk_size() function.  However, the old function also
invoked the device's registered callback, in our case
zvol_revalidate_disk().  This commit adds a call to zvol_revalidate_disk()
in zvol_update_volsize() to make sure the code path stays the same.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Michael D Labriola <michael.d.labriola@gmail.com>
Closes #11358
2020-12-23 14:35:47 -08:00
Brian Behlendorf 401ba57ccd Fix maybe uninitialized variable warning
Commit 1c2358c12 restructured this code and introduced a warning
about the variable maybe not being initialized.  This cannot happen
with the updated code but we should initialize the variable anyway
to silence the warning.

    zpl_file.c: In function ‘zpl_iter_write’:
    zpl_file.c:324:9: warning: ‘count’ may be used uninitialized
        in this function [-Wmaybe-uninitialized]

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11373
2020-12-23 14:35:47 -08:00
Brian Behlendorf 188950df9e Remove iov_iter_advance() from iter_read
There's no need to call iov_iter_advance() in zpl_iter_read().
This was preserved from the previous code where it wasn't needed
but also didn't cause any problems.  Now that the iter functions
also handle pipes that's no longer the case.  When fully reading a
pipe buffer iov_iter_advance() may results in the pipe buf release
function being called which will not be registered resulting in
a NULL dereference.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11375 
Closes #11378
2020-12-23 14:35:47 -08:00
Brian Behlendorf 58bc86c5cb Linux 5.10 compat: use iov_iter in uio structure
As of the 5.10 kernel the generic splice compatibility code has been
removed.  All filesystems are now responsible for registering a
->splice_read and ->splice_write callback to support this operation.

The good news is the VFS provided generic_file_splice_read() and
iter_file_splice_write() callbacks can be used provided the ->iter_read
and ->iter_write callback support pipes.  However, this is currently
not the case and only iovecs and bvecs (not pipes) are ever attached
to the uio structure.

This commit changes that by allowing full iov_iter structures to be
attached to uios.  Ever since the 4.9 kernel the iov_iter structure
has supported iovecs, kvecs, bvevs, and pipes so it's desirable to
pass the entire thing when possible.  In conjunction with this the
uio helper functions (i.e uiomove(), uiocopy(), etc) have been
updated to understand the new UIO_ITER type.

Note that using the kernel provided uio_iter interfaces allowed the
existing Linux specific uio handling code to be simplified.  When
there's no longer a need to support kernel's older than 4.9, then
it will be possible to remove the iovec and bvec members from the
uio structure and always use a uio_iter.  Until then we need to
maintain all of the existing types for older kernels.

Some additional refactoring and cleanup was included in this change:

- Added checks to configure to detect available iov_iter interfaces.
  Some are available all the way back to the 3.10 kernel and are used
  when available.  In particular, uio_prefaultpages() now always uses
  iov_iter_fault_in_readable() which is available for all supported
  kernels.

- The unused UIO_USERISPACE type has been removed.  It is no longer
  needed now that the uio_seg enum is platform specific.

- Moved zfs_uio.c from the zcommon.ko module to the Linux specific
  platform code for the zfs.ko module.  This gets it out of libzfs
  where it was never needed and keeps this Linux specific code out
  of the common sources.

- Removed unnecessary O_APPEND handling from zfs_iter_write(), this
  is redundant and O_APPEND is already handled in zfs_write();

NOTE: Cleanly applying this kernel compatibility change required
applying the following commits.  This makes the change larger than
it absolutely needs to be, but the resulting code matches what's in
the branch branch.  This is both more tested and makes it easier to
apply any future backports in this area.

7cf4cd824 Remove incorrect assertion
783be694f Reduce confusion in zfs_write
af5626ac2 Return EFAULT at the end of zfs_write() when set
cc1f85be8 Simplify offset and length limit in zfs_write
9585538d0 Const some unchanging variables in zfs_write
86e74dc16 Remove redundant oid parameter to update_pages
b3d723fb0 Factor uid, gid, and projid out of loop in zfs_write
3d40b6554 Share zfs_fsync, zfs_read, zfs_write, et al between Linux and FreeBSD

Reviewed-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11351
2020-12-23 14:35:39 -08:00
Brian Behlendorf 7cf4cd8246 Remove incorrect assertion
Commit 85703f6 added a new ASSERT to zfs_write() as part of the
cleanup which isn't correct in the case where multiple processes
are concurrently extending a file.  The `zp->z_size` is updated
atomically while holding a range lock on only a portion of the
file.  Therefore, it's possible for the file size to increase
after a same check is performed earlier in the loop causing this
ASSERT to fail.  The code itself handles this case correctly so
only the invalid ASSERT needs to be removed.

Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11235
2020-12-23 14:35:00 -08:00
Ryan Moeller 783be694f1 Reduce confusion in zfs_write
Is this block when abuf != NULL ever reached? Yes, it is.

Add asserts and comments to prove that when we get here, we have a full
block write at an aligned offset extending past EOF.

Simplify by removing the check that tx_bytes == max_blksz, since we can
assert that it is always true.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11191
2020-12-23 14:35:00 -08:00
Ryan Moeller af5626ac27 Return EFAULT at the end of zfs_write() when set
FreeBSD's VFS expects EFAULT from zfs_write() if we didn't complete
the full write so it can retry the operation.  Add some missing
SET_ERRORs in zfs_write().

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11193
2020-12-23 14:35:00 -08:00
Ryan Moeller cc1f85be8b Simplify offset and length limit in zfs_write
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11176
2020-12-23 14:34:59 -08:00
Ryan Moeller 9585538d0e Const some unchanging variables in zfs_write
Show that these values will not be changing later.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11176
2020-12-23 14:34:59 -08:00
Ryan Moeller 86e74dc162 Remove redundant oid parameter to update_pages
The oid comes from the znode we are already passing.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11176
2020-12-23 14:34:59 -08:00
Ryan Moeller b3d723fb0e Factor uid, gid, and projid out of loop in zfs_write
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11176
2020-12-23 14:34:59 -08:00
Matthew Macy 3d40b65540 Share zfs_fsync, zfs_read, zfs_write, et al between Linux and FreeBSD
The zfs_fsync, zfs_read, and zfs_write function are almost identical
between Linux and FreeBSD.  With a little refactoring they can be
moved to the common code which is what is done by this commit.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #11078
2020-12-23 14:34:59 -08:00
Brian Behlendorf fa7b558bef ZTS: Simplify zpool_initialize_verify_initialized
Consider the test to be a success as long as the initializing pattern
is found at least once per metaslab.  This indicates that at least
part of the free space was initialized.  Ideally we'd check that the
pattern was written to all free space but that's much trickier so this
check is a reasonable compromise.

Using a here-string to feed the loop in this test causes an empty
string to still trigger the loop so we miss the `spacemaps=0` case.
Pipe into the loop instead.

While here, we can use `zpool wait -t initialize $TESTPOOL` to wait for
the pool to initialize.

Co-authored-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11365
2020-12-23 14:34:59 -08:00
Matthew Ahrens a103ae446e special device removal space accounting fixes
The space in special devices is not included in spa_dspace (or
dsl_pool_adjustedsize(), or the zfs `available` property).  Therefore
there is always at least as much free space in the normal class, as
there is allocated in the special class(es).  And therefore, there is
always enough free space to remove a special device.

However, the checks for free space when removing special devices did not
take this into account.  This commit corrects that.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Don Brady <don.brady@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #11329
2020-12-23 14:34:59 -08:00
sterlingjensen 2ab24dfded Use the correct return type for getopt
Use the correct return type for getopt otherwise clang complains
about tautological-constant-out-of-range-compare.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Sterling Jensen <sterlingjensen@users.noreply.github.com>
Closes #11359
2020-12-23 14:34:59 -08:00
gregory-lee-bartholomew 489633d99a DKMS: Disable weak modules
Fedora does not guarantee a stable kABI, so weak modules should be dis-
abled. See the dkms man page for a more detailed explanation of the weak
module feature.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Gregory Bartholomew <gregory.lee.bartholomew@gmail.com>
Closes #9891
Closes #11128
Closes #11242
Closes #11335
2020-12-23 14:34:59 -08:00
Ryan Libby ee49d9e02b lua: avoid gcc -Wreturn-local-addr bug
Avoid a bug with gcc's -Wreturn-local-addr warning with some
obfuscation.  In buggy versions of gcc, if a return value is an
expression that involves the address of a local variable, and even if
that address is legally converted to a non-pointer type, a warning may
be emitted and the value of the address may be replaced with zero.
Howerver, buggy versions don't emit the warning or replace the value
when simply returning a local variable of non-pointer type.

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90737

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Libby <rlibby@FreeBSD.org>
Closes #11337
2020-12-23 14:34:59 -08:00
Ryan Libby 1fbda9caee spa: avoid type narrowing warning
Building the spa module for i386 caused gcc to emit
-Wint-to-pointer-cast "cast to pointer from integer of different size"
because spa.spa_did was uint64_t but pthread_join (via thread_join in
spa_deactivate) takes a pointer (32-bit on i386).  Define spa_did to be
pointer-size instead.  For now spa_did is in fact never non-zero and the
thread_join could instead be ifdef'd out, but changing the size of
spa_did may be more useful for the future.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Libby <rlibby@FreeBSD.org>
Closes #11336
2020-12-23 14:34:59 -08:00
Ryan Libby 42bdfd3b36 FreeBSD libzfs: gcc requires __thread after static
Building libzfs with gcc on FreeBSD failed because gcc is picky about
the order of keywords in declarations with __thread, whereas clang is
more relaxed.

https://gcc.gnu.org/onlinedocs/gcc/Thread-Local.html

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ryan Libby <rlibby@FreeBSD.org>
Closes #11331
2020-12-23 14:34:59 -08:00
George Amanakis 900480bd96 Fix reporting of CKSUM errors in indirect vdevs
When removing and subsequently reattaching a vdev, CKSUM errors may
occur as vdev_indirect_read_all() reads from all children of a mirror
in case of a resilver.

Fix this by checking whether a child is missing the data and setting a
flag (ic_error) which is then checked in vdev_indirect_repair() and
suppresses incrementing the checksum counter.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #11277
2020-12-23 14:34:59 -08:00
Ryan Moeller 058b6fd069 arc_summary3: Handle overflowing value width
Some tunables shown by arc_summary3 have string values that may exceed
the normal line length, leaving a negative offset between the name and
value fields.  The negative space is of course not valid and Python
rightly barfs up an exception traceback.

Handle an overflowing value field width by ignoring the line length
and separating the name from the value by a single space instead.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11270
2020-12-23 14:34:59 -08:00
Ryan Moeller 8847b06bf6 FreeBSD: Implement sysctl for fletcher4 impl
There is a tunable to select the fletcher 4 checksum implementation on
Linux but it was not present in FreeBSD.

Implement the sysctl handler for FreeBSD and use ZFS_MODULE_PARAM_CALL
to provide the tunable on both platforms.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11270
2020-12-23 14:34:59 -08:00
Paul Dagnelie 21adfb031c Fix kernel panic induced by redacted send
In the redaction list traversal code, there is a bug in the binary search
logic when looking for the resume point. Maxbufid can be decremented to -1,
causing us to read the last possible block of the object instead of the one we
wanted. This can cause incorrect resume behavior, or possibly even a hang in
some cases. In addition, when examining non-last blocks, we can treat the
block as being the same size as the last block, causing us to miss entries in
the redaction list when determining where to resume. Finally, we were ignoring
the case where the resume point was found in the buffer being searched, and
resuming from minbufid. All these issues have been corrected, and the code has
been significantly simplified to make future issues less likely.

Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #11297
2020-12-23 14:34:59 -08:00
Ryan Moeller ae2cfdf8a7 FreeBSD: Fix format of vfs.zfs.arc_no_grow_shift
vfs.zfs.arc_no_grow_shift has an invalid type (15) and this causes
py-sysctl to format it as a bytearray when it should be an integer.

"U" is not a valid format, it should be "I" and the type should match
the variable type, int.  We can return EINVAL if the value is set below
zero.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11318
2020-12-23 14:34:59 -08:00
Ryan Moeller 20e4513c56 FreeBSD: Update usage of py-sysctl
py-sysctl now includes the CTLTYPE_NODE type nodes in the list returned
by sysctl.filter() on FreeBSD head.  It also provides descriptions now.

Eliminate the subprocess call to get descriptions, and filter out the
nodes so we only deal with values.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11318
2020-12-23 14:34:59 -08:00
Brian Behlendorf f217a2b902 Fix possibly uninitialized 'root_inode' variable warning
Resolve an uninitialized variable warning when compiling.

    In function ‘zfs_domount’:
    warning: ‘root_inode’ may be used uninitialized in this
        function [-Wmaybe-uninitialized]
    sb->s_root = d_make_root(root_inode);

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11306
2020-12-23 14:34:59 -08:00
George Melikov aa5b9e1d7c CI: add zloop workflow
Run ztest via zloop for 20 minutes, total run time is ~30 minutes.

Signed-off-by: George Melikov <mail@gmelikov.ru>
2020-12-23 14:34:59 -08:00
Ryan Moeller fb3ad5d24e FreeBSD: Do zcommon_init sooner to avoid FPU panic
There has been a panic affecting some system configurations where the
thread FPU context is disturbed during the fletcher 4 benchmarks,
leading to a panic at boot.

module_init() registers zcommon_init to run in the last subsystem
(SI_SUB_LAST).  Running it as soon as interrupts have been configured
(SI_SUB_INT_CONFIG_HOOKS) makes sure we have finished the benchmarks
before we start doing other things.

While it's not clear *how* the FPU context was being disturbed, this
does seem to avoid it.

Add a module_init_early() macro to run zcommon_init() at this earlier
point on FreeBSD.  On Linux this is defined as module_init().

Authored by: Konstantin Belousov <kib@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11302
2020-12-23 14:34:59 -08:00
Érico Nogueira Rolim de2ac3f700 mount_zfs: print strerror instead of errno for error reporting
Tracking down an error message with the errno value can be difficult,
using strerror makes the error message clearer.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Érico Rolim <erico.erc@gmail.com>
Closes #11303
2020-12-23 14:34:59 -08:00
sterlingjensen f9688b21d7 Drop path prefix workaround
Canonicalization, the source of the trouble, was disabled in 9000a9f.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Sterling Jensen <sterlingjensen@users.noreply.github.com>
Closes #11295
2020-12-23 14:34:59 -08:00
Orivej Desh fad85e52e5 Delete rw_semaphore.wait_lock configure check
Last use of wait_lock was removed in "Linux 5.3 compat: retire
rw_tryupgrade()" (e7a99dab2b).

Fixes the issue reported in
https://github.com/openzfs/zfs/issues/11097#issuecomment-714532367

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Orivej Desh <orivej@gmx.fr>
Closes #11309
2020-12-23 14:34:59 -08:00
Brian Behlendorf 038aaec1cd Fix optional "force" arg handing in zfs_ioc_pool_sync()
The fnvlist_lookup_boolean_value() function should not be used
to check the force argument since it's optional.  It may not be
provided or may have been created with the wrong flags.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11281
Closes #11284
2020-12-23 14:34:59 -08:00
George Melikov 94ca328fb3 CI: add new zfs-tests-sanity workflow
Run zfs-tests with sanity.run for brief results.  Timeouts
are rare, so minimize false positives by increasing the
default from 60 to 180 seconds.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #11304
2020-12-23 14:34:59 -08:00
George Melikov 65c4c9a233 ZTS: zpool_trim tests throttle trim process
Otherwise trim may finish before progress checks.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #11296
2020-12-23 14:34:59 -08:00
Brian Behlendorf 07ca433973 Reduce fletcher4 and raidz benchmark times
During module load time all of the available fetcher4 and raidz
implementations are benchmarked for a fixed amount of time to
determine the fastest available.  Manual testing has shown that this
time can be significantly reduced with negligible effect on the final
results.

This commit changes the benchmark time to 1ms which can reduce the
module load time by over a second on x86_64.  On an x86_64 system
with sse3, ssse3, and avx2 instructions the benchmark times are:

    Fletcher4    603ms   -> 15ms
    RAIDZ        1,322ms -> 64ms

Reviewed-by: Matthew Macy <mmacy@freebsd.org>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11282
2020-12-23 14:34:59 -08:00
Brian Behlendorf f21d1f8fad ZTS: adjust zpool_import_012_pos timeout
When running in the CI the zpool_import_012_pos test case occasionally
takes longer than the maximum 600 seconds.  When this happens the test
case is considered to have failed but always completes a few minutes
latter.  Since the logs suggest nothing has actually failed this commit
increases timeout and removes the exception.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11286
2020-12-23 14:34:59 -08:00
Brian Behlendorf ee8794195b ZTS: Update zfs_share_concurrent_shares.ksh
Occasionally an out of memory error is hit by this test case
when mounting the filesystems.  Try and reduce the likelihood
of this occurring by reducing the thread count from 100 to 50.
It also has the advantage of slightly speeding up the test.

    cannot mount 'testpool/testfs3/79': Cannot allocate memory
        filesystem successfully created, but not mounted

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11283
2020-12-23 14:34:59 -08:00
Brian Behlendorf 0ef4def852 Add sanity.run file
This run file contains a subset of functional tests which exercise
as much functionality as possible while still executing relatively
quickly.  The included tests should take no more than a few seconds
each to run at most.  This provides a convenient way to sanity test a
change before committing to a full test run which takes several hours.

    $ ./scripts/zfs-tests.sh -r sanity
    ...
    Results Summary
    PASS	 813

    Running Time:	00:14:42
    Percent passed:	100.0%

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11271
2020-12-23 14:34:51 -08:00
melak aa51adf0a2 Fix trivial typo in zfs-diff.8
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tamas TEVESZ <ice@extreme.hu>
Closes #11268 
Closes #11272
2020-12-23 13:09:32 -08:00
Alexander Motin 1d02bdee6c Fix for "Reduce latency effects of non-interactive I/O"
It was found that setting min_active tunables for non-interactive I/Os
makes them stuck.  It is caused by zfs_vdev_nia_delay, that can never
be reached if we never issue any I/Os due to min_active set to zero.

Fix this by issuing at least one non-interactive I/O at a time when
there are no interactive I/Os.  When there are interactive I/Os, zero
min_active allows to completely block any non-interactive I/O.  It may
min_active starvation in some scenarios, but who we are to deny foot
shooting?

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Closes #11261
2020-12-23 13:09:17 -08:00
Alexander Motin 2080c4f27e Reduce latency effects of non-interactive I/O
Investigating influence of scrub (especially sequential) on random read
latency I've noticed that on some HDDs single 4KB read may take up to 4
seconds!  Deeper investigation shown that many HDDs heavily prioritize
sequential reads even when those are submitted with queue depth of 1.

This patch addresses the latency from two sides:
 - by using _min_active queue depths for non-interactive requests while
   the interactive request(s) are active and few requests after;
 - by throttling it further if no interactive requests has completed
   while configured amount of non-interactive did.

While there, I've also modified vdev_queue_class_to_issue() to give
more chances to schedule at least _min_active requests to the lowest
priorities.  It should reduce starvation if several non-interactive
processes are running same time with some interactive and I think should
make possible setting of zfs_vdev_max_active to as low as 1.

I've benchmarked this change with 4KB random reads from ZVOL with 16KB
block size on newly written non-fragmented pool.  On fragmented pool I
also saw improvements, but not so dramatic.  Below are log2 histograms
of the random read latency in milliseconds for different devices:

4 2x mirror vdevs of SATA HDD WDC WD20EFRX-68EUZN0 before:
0, 0, 2,  1,  12,  21,  19,  18, 10, 15, 17, 21
after:
0, 0, 0, 24, 101, 195, 419, 250, 47,  4,  0,  0
, that means maximum latency reduction from 2s to 500ms.

4 2x mirror vdevs of SATA HDD WDC WD80EFZX-68UW8N0 before:
0, 0,  2,  31,  38,  28,  18,  12, 17, 20, 24, 10, 3
after:
0, 0, 55, 247, 455, 470, 412, 181, 36,  0,  0,  0, 0
, i.e. from 4s to 250ms.

1 SAS HDD SEAGATE ST14000NM0048 before:
0,  0,  29,   70, 107,   45,  27, 1, 0, 0, 1, 4, 19
after:
1, 29, 681, 1261, 676, 1633,  67, 1, 0, 0, 0, 0,  0
, i.e. from 4s to 125ms.

1 SAS SSD SEAGATE XS3840TE70014 before (microseconds):
0, 0, 0, 0, 0, 0, 0, 0,  70, 18343, 82548, 618
after:
0, 0, 0, 0, 0, 0, 0, 0, 283, 92351, 34844,  90

I've also measured scrub time during the test and on idle pools.  On
idle fragmented pool I've measured scrub getting few percent faster
due to use of QD3 instead of QD2 before.  On idle non-fragmented pool
I've measured no difference.  On busy non-fragmented pool I've measured
scrub time increase about 1.5-1.7x, while IOPS increase reached 5-9x.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #11166
2020-12-23 13:09:03 -08:00
qzdanis 49ba502f99 Add compatibility for busybox mktemp
Busybox's mktemp requires at least six X's in the template, causing
the current sed --in-place check to fail because the file does not
exist. This change adds additional X's to mktemp templates that do
not already have at least six X's in them.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Quentin Zdanis <zdanisq@gmail.com>
Closes #11269
2020-12-23 13:08:30 -08:00
Ryan Moeller 7735c9addf FreeBSD: notify userspace when a vdev is removed
This is needed for zfsd to autoreplace vdevs.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11260
2020-12-23 13:08:12 -08:00
Andrew Sun ed02d603a1 Make zpool status "remove:" label print in bold
When ZFS_COLOR is set, zpool status shows row headings in bold,
except for the "remove:" heading. This is a quick fix that makes
it print in bold too.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Andrew Sun <me@andrewsun.com>
Closes #11255
2020-12-23 13:07:27 -08:00
George Melikov 529469769f CI: simplify checkstyle runner
Remove excess steps.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #11262
2020-12-23 13:07:14 -08:00
Prawn 3b854534f0 ZED/zfs-list-cacher.sh: don't exit on ignored event type
Check for the history_event type instead.

The zfs-list-cacher.sh script currently respects the event types
excluded from syslog(!) in ZED_SYSLOG_SUBCLASS_EXCLUDE.
This makes little sense in this single-purpose script and
silently breaks when history_events are excluded from syslog,
which is the default since 13d65987a9.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: InsanePrawn <insane.prawny@gmail.com>
Closes #11164
Closes #11347
2020-12-19 18:01:21 -08:00
Brian Behlendorf dcbf847493 Tag 2.0.0
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2020-11-30 10:13:14 -08:00
Brian Behlendorf 2757204434 Verify zfs module loaded before starting services
Extend the change made in ae12b02 to verify the zfs kernel
modules are loaded to the rest of the OpenZFS services.  If
the modules aren't loaded the neither the share, volume, or
and zed services can be started.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11243
2020-11-30 09:44:08 -08:00
Đoàn Trần Công Danh 24a6f83847 dracut: use /bin/sh instead of bash as the intepreter
Despite that dracut has a hard dependency on bash,
its modules doesn't, dracut only has a hard dependency on bash for
module-setup (on a fully usable machine). Inside initramfs, dracut
allows users choose from a list of handful other shells, e.g. bash,
busybox, dash, mkfsh.

In fact, my local machine's initramfs is being built with dash,
and it's functional for a very long time.

Before 64025fa3a (Silence 'make checkbashisms', 2020-08-20), we also
allows our users to have that right, too.

Let's fix the problem 'make checkbashisms' reported and allows our users
to have that right, again.

For 'plymouth' case, let's simply run the command inside the if instead
of checking for the existence of command before running it, because the
status is also failture if plymouth is unavailable.

While we're at it, let's remove an unnecessary fork for grep in
zfs-generator.sh.in and its following complicated 'if elif fi' with
a simple 'case ... esac'.

To support this change, also exclude 90zfs from "make checkbashisms"
because the current CI infrastructure ships an old version of
"checkbashisms", which complains about "command -v", while the current
latest "checkbashisms" thinks it's fine. In the near future, we can
revert that change to "Makefile.am" when CI infrastructure is updated.

Reviewed-by: Gabriel A. Devenyi <gdevenyi@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Đoàn Trần Công Danh <congdanhqx@gmail.com>
Closes #11244
2020-11-30 09:44:02 -08:00
Brian Behlendorf 2c36eb763f Revert "Reduce latency effects of non-interactive I/O"
Under certain conditions commit a3a4b8def appears to result in a
hang, or poor performance, when importing a pool.  Until the root
cause can be identified it has been reverted from the release branch.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #11245
2020-11-30 09:43:09 -08:00
Brian Behlendorf a4ab0c607e Tag 2.0.0-rc7
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2020-11-25 09:18:29 -08:00
Alexander Motin a3a4b8def7 Reduce latency effects of non-interactive I/O
Investigating influence of scrub (especially sequential) on random read
latency I've noticed that on some HDDs single 4KB read may take up to 4
seconds!  Deeper investigation shown that many HDDs heavily prioritize
sequential reads even when those are submitted with queue depth of 1.

This patch addresses the latency from two sides:
 - by using _min_active queue depths for non-interactive requests while
   the interactive request(s) are active and few requests after;
 - by throttling it further if no interactive requests has completed
   while configured amount of non-interactive did.

While there, I've also modified vdev_queue_class_to_issue() to give
more chances to schedule at least _min_active requests to the lowest
priorities.  It should reduce starvation if several non-interactive
processes are running same time with some interactive and I think should
make possible setting of zfs_vdev_max_active to as low as 1.

I've benchmarked this change with 4KB random reads from ZVOL with 16KB
block size on newly written non-fragmented pool.  On fragmented pool I
also saw improvements, but not so dramatic.  Below are log2 histograms
of the random read latency in milliseconds for different devices:

4 2x mirror vdevs of SATA HDD WDC WD20EFRX-68EUZN0 before:
0, 0, 2,  1,  12,  21,  19,  18, 10, 15, 17, 21
after:
0, 0, 0, 24, 101, 195, 419, 250, 47,  4,  0,  0
, that means maximum latency reduction from 2s to 500ms.

4 2x mirror vdevs of SATA HDD WDC WD80EFZX-68UW8N0 before:
0, 0,  2,  31,  38,  28,  18,  12, 17, 20, 24, 10, 3
after:
0, 0, 55, 247, 455, 470, 412, 181, 36,  0,  0,  0, 0
, i.e. from 4s to 250ms.

1 SAS HDD SEAGATE ST14000NM0048 before:
0,  0,  29,   70, 107,   45,  27, 1, 0, 0, 1, 4, 19
after:
1, 29, 681, 1261, 676, 1633,  67, 1, 0, 0, 0, 0,  0
, i.e. from 4s to 125ms.

1 SAS SSD SEAGATE XS3840TE70014 before (microseconds):
0, 0, 0, 0, 0, 0, 0, 0,  70, 18343, 82548, 618
after:
0, 0, 0, 0, 0, 0, 0, 0, 283, 92351, 34844,  90

I've also measured scrub time during the test and on idle pools.  On
idle fragmented pool I've measured scrub getting few percent faster
due to use of QD3 instead of QD2 before.  On idle non-fragmented pool
I've measured no difference.  On busy non-fragmented pool I've measured
scrub time increase about 1.5-1.7x, while IOPS increase reached 5-9x.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #11166
2020-11-25 08:45:38 -08:00
cragw 14bdf57a99 pam_zfs_key: accommodate different dataset naming scheme
Name of dataset for user home directory may vary from the expected
$homes_prefix/$username, if different naming scheme is being used.

We can use property mountpoint to specify the dataset for $username
as long as its value is identical to passwd's pw_dir.

For example:
    NAME                       PROPERTY     VALUE
    rpool/home/myuser_123456   mountpoint   /home/myuser

Reviewed-by: Felix Dörre <felix@dogcraft.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Crag Wang <crag0715@gmail.com>
Closes #11165
2020-11-25 08:42:37 -08:00
Matthew Macy 45061cc797 FreeBSD: decouple ZFS_DEBUG from kernel debug settings
Reviewed-by: Martelli Nikola @martellini
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #11213
2020-11-25 08:42:26 -08:00
Brian Behlendorf cb01817f22 Obsolete earlier packages due to version bump
In order for package managers such as dnf to upgrade cleanly after
the package SONAME bump the obsolete package names must be known.
Update the new packages to correctly obsolete the old ones.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11230 
Closes #11233
2020-11-24 10:27:14 -08:00
Antonio Russo 1c4ccfb34e libzfsbootenv: do not depend on libnvpair
We do not build libnvpair.pc.  Moreover, it is automatically pulled in
by libzfs.pc, so no additional specific dependency is required.

Reviewed by: Toomas Soome <tsoome@me.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Antonio Russo <aerusso@aerusso.net>
Closes #11227
2020-11-24 10:27:14 -08:00
Brian Behlendorf 056287e3f7 Include the ABI with dist tarball
The ABI should be included when generating the `make dist` tarball
since it's required by the `make checkabi` target.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11225
2020-11-22 10:01:47 -08:00
Brian Behlendorf ccfa35c6f9 Correct missing zil_claim() DTL updates
Commit a1d477c2 accidentally disabled DTL updates for the zil_claim()
case described at the end of vdev_stat_update() by unconditionally
disabling all DTL updates when loading.  This was done to avoid
a deadlock on the vd_dtl_lock when loading the DTLs from disk.

    vdev_dtl_contains <--- Takes vd->vd_dtl_lock
    vdev_mirror_child_missing
    vdev_mirror_io_start
    zio_vdev_io_start
    __zio_execute
    arc_read
    dbuf_issue_final_prefetch
    dbuf_prefetch_impl
    dbuf_prefetch
    dmu_prefetch
    space_map_iterate
    space_map_load_length
    space_map_load
    vdev_dtl_load <--- Takes vd->vd_dtl_lock
    vdev_load
    spa_ld_load_vdev_metadata
    spa_tryimport

The missing DTL updates can be restored by moving the space_map_load()
call outside the vd_dtl_lock.  A private range tree is populated by
reading the space map and then merged in to the DTL_MISSING tree
under the lock.

Furthermore, the SPA_LOAD_NONE check in vdev_dtl_contains() leads to an
additional problem.  Any resilvering which occurs before SPA_LOAD_NONE
is set will incorrectly determine that there's nothing to repair.  This
can result in full redundancy not being restored for some blocks.

Reviewed-by: Matt Ahrens <matt@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11218
2020-11-22 10:01:43 -08:00
Antonio Russo 8471f71132 Track SONAME version bump in packaging
RPM and DEB packages are named after the SONAME version of the library
they contain.  After bumping this version, the packaging should be
renamed.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Antonio Russo <aerusso@aerusso.net>
Closes #11219
2020-11-20 08:58:27 -08:00
Brian Behlendorf 813185d141 Enable ABI checks for the checkstyle workflow
For the OpenZFS 2.0 release branch extend the CI checkstyle
workflow to perform the library ABI checks.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11215
2020-11-18 09:32:49 -08:00
Brian Behlendorf be12087783 Add ABI snapshot
Add a snapshot of the OpenZFS 2.0 ABI using libabigail-1.7-2.
The included ABI passes `make checkabi` for CentOS 7, Fedora 33,
Debian 10, and Ubuntu 20.04.  This covers a fairly wide range
of glibc, gcc, and libabigail versions plus other changes which
are platform specific.

Reviewed-by: Antonio Russo <aerusso@aerusso.net>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11144
2020-11-17 20:48:53 +00:00
Antonio Russo 4f9014b70b Library ABI tracking with abigail
Provide two make targets: checkabi and storeabi.

storeabi uses libabigail to generate a reference copy of the ABI for the
public libraries.

checkabi compares such a reference to the compiled version, failing if
they are not compatible.  No ABI is generated for libzpool.so, it is
only used by ztest and zdb and not external consumers.

Co-authored-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Antonio Russo <aerusso@aerusso.net>
Closes #11144
2020-11-17 20:29:02 +00:00
Matthew Macy 043ef5c25e Fix problems in zvol_set_volmode_impl
- Don't leave fstrans set when passed a snapshot
- Don't remove minor if volmode already matches new value
- (FreeBSD) Wait for GEOM ops to complete before trying
  remove (at create time GEOM will be "tasting" in parallel)
- (FreeBSD) Don't leak zvol_state_lock on open if zv == NULL
- (FreeBSD) Don't try to unlock zv->zv_state lock if zv == NULL

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #11199
2020-11-17 12:20:09 -08:00
наб c06118e0b1 zpool: correctly align columns with -p
zpool_expand_proplist() now ignores pl_fixed if its new literal
argument is true.  The rest is a consequence of needing to pass
that down.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiao?=~Dska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11202
2020-11-17 12:20:01 -08:00
наб 5f24bd11ee zpool(8): fix pool-wi[sd]e typo
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11202
2020-11-17 12:19:56 -08:00
loli10K f6f3089cf6 Fix 'zfs userspace' for received datasets in encrypted root
For encrypted receives, where user accounting is initially disabled on
creation, both 'zfs userspace' and 'zfs groupspace' fails with
EOPNOTSUPP: this is because dmu_objset_id_quota_upgrade_cb() forgets to
set OBJSET_FLAG_USERACCOUNTING_COMPLETE on the objset flags after a
successful dmu_objset_space_upgrade().

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #9501
Closes #9596
2020-11-17 12:19:51 -08:00
George Amanakis a09aeb9fc4 Fix ASSERT logic in l2arc_evict()
In case of cache device removal it is possible that at the end of
l2arc_evict() we have l2ad_hand = l2ad_evict. This can lead to the
following panic in case of a debug build:

VERIFY3(dev->l2ad_hand < dev->l2ad_evict) failed (321920512 < 321920512)
Call Trace:
 dump_stack+0x66/0x90
 spl_panic+0xef/0x117 [spl]
 l2arc_remove_vdev+0x11d/0x290 [zfs]
 spa_load_l2cache+0x275/0x5b0 [zfs]
 spa_vdev_remove+0x4a5/0x6e0 [zfs]
 zfs_ioc_vdev_remove+0x59/0xa0 [zfs]
 zfsdev_ioctl_common+0x5b3/0x630 [zfs]
 zfsdev_ioctl+0x53/0xe0 [zfs]
 do_vfs_ioctl+0x42e/0x6b0
 ksys_ioctl+0x5e/0x90
 do_syscall_64+0x5b/0x1a0
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

In case of cache device removal it also possible that l2ad_hand +
distance > l2ad_end since we do not iterate l2arc_evict() and l2ad_hand
is not reset. This has no functional consequence however as the cache
device is about to be removed.

Fix this by omitting the ASSERT in case of device removal.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #11205
2020-11-17 12:19:46 -08:00
Érico Rolim e4257ed76d config/dracut/90zfs: handle cases where hostid(1) returns all zeros
On systems with musl libc, hostid(1) always prints "00000000", which
will cause improper behavior when the 90zfs module is configured in a
dracut initramfs. Work around this by copying the host /etc/hostid if
the file exists, and otherwise only write /etc/hostid if hostid(1)
returns something meaningful. This avoids zgenhostid creating a random
/etc/hostid for the initramfs, which could lead to errors when trying to
import the pool if spl_hostid isn't defined in the kernel command line.

Furthermore, tag the /etc/hostid file as hostonly, since it is system
specific and shouldn't be taken into account when trying to use an
initramfs generated in one system to boot into a different system.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Georgy Yakovlev <gyakovlev@gentoo.org>
Co-authored-by: Andrew J. Hesford <ajh@sideband.org>
Signed-off-by: Érico Rolim <erico.erc@gmail.com>
Closes #11174
Closes #11189
2020-11-17 12:19:42 -08:00
Érico Rolim c4a5e3b90f zgenhostid: accept hostid arguments equal to zero.
A common usage pattern for zgenhostid, including in the ZFS dracut
module, is running it as:

  zgenhostid $(hostid)

However, zgenhostid only accepted hostid arguments greater than 0, which
meant that, when the output of hostid(1) was "00000000", zgenhostid
would error out, even though 0 is a possible return value for the
gethostid(3) function used by hostid(1):

- On current musl libc, gethostid(3) is a stub that always returns 0.
- On glibc, gethostid(3) will return 0 if /etc/hostid exists but is
  smaller than 4 bytes.

In these cases, it makes more sense for zgenhostid to treat a value of 0
as other parts of the zfs codebase do, meaning that a hostid value
couldn't be determined; therefore, it should attempt to generate a
random value to write into /etc/hostid.

The manpage and usage output have been updated to reflect this.

Whitespace has also been fixed in the usage output.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Georgy Yakovlev <gyakovlev@gentoo.org>
Reviewed-by: Andrew J. Hesford <ajh@sideband.org>
Signed-off-by: Érico Rolim <erico.erc@gmail.com>
Closes #11174
Closes #11189
2020-11-17 12:19:33 -08:00
Brian Behlendorf d02fc15ba1 Linux: Fix ZFS_ENTER/ZFS_EXIT/ZFS_VERFY_ZP usage
The ZFS_ENTER/ZFS_EXIT/ZFS_VERFY_ZP macros should not be used
in the Linux zpl_*.c source files.  They return a positive error
value which is correct for the common code, but not for the Linux
specific kernel code which expects a negative return value.  The
ZPL_ENTER/ZPL_EXIT/ZPL_VERFY_ZP macros should be used instead.

Furthermore, the ZPL_EXIT macro has been updated to not call the
zfs_exit_fs() function.  This prevents a possible deadlock which
can occur when a snapshot is automatically unmounted because the
zpl_show_devname() must never wait on in progress automatic
snapshot unmounts.

Reviewed-by: Adam Moss <c@yotes.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11169 
Closes #11201
2020-11-14 10:51:27 -08:00
Matthew Ahrens 435dc4baab Assertion failure when logging large output of channel program
The output of ZFS channel programs is logged on-disk in the zpool
history, and printed by `zpool history -i`.  Channel programs can use
10MB of memory by default, and up to 100MB by using the `zfs program -m`
flag.  Therefore their output can be up to some fraction of 100MB.

In addition to being somewhat wasteful of the limited space reserved for
the pool history (which for large pools is 1GB), in extreme cases this
can result in a failure of `ASSERT(length <= DMU_MAX_ACCESS);` in
`dmu_buf_hold_array_by_dnode()`.

This commit limits the output size that will be logged to 1MB.  Larger
outputs will not be logged, instead a entry will be logged indicating
the size of the omitted output.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #11194
2020-11-14 10:51:21 -08:00
Brian Behlendorf 04177b9c3f Tag 2.0.0-rc6
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2020-11-12 11:02:58 -08:00
Matthew Ahrens 4a87c280dc Channel program may spuriously fail with "memory limit exhausted"
ZFS channel programs (invoked by `zfs program`) are executed in a LUA
sandbox with a limit on the amount of memory they can consume.  The
limit is 10MB by default, and can be raised to 100MB with the `-m` flag.
If the memory limit is exceeded, the LUA program exits and the command
fails with a message like `Channel program execution failed: Memory
limit exhausted.`

The LUA sandbox allocates memory with `vmem_alloc(KM_NOSLEEP)`, which
will fail if the requested memory is not immediately available.  In this
case, the program fails with the same message, `Memory limit exhausted`.
However, in this case the specified memory limit has not been reached,
and the memory may only be temporarily unavailable.

This commit changes the LUA memory allocator `zcp_lua_alloc()` to use
`vmem_alloc(KM_SLEEP)`, so that we won't spuriously fail when memory is
temporarily low.  Instead, we rely on the system to be able to free up
memory (e.g. by evicting from the ARC), and we assume that even at the
highest memory limit of 100MB, the channel program will not truly
exhaust the system's memory.

External-issue: DLPX-71924
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #11190
2020-11-12 09:02:00 -08:00
Brian Behlendorf d237d9a918 Linux: Fix mount/unmount when dataset name has a space
The custom zpl_show_devname() helper should translate spaces in
to the octal escape sequence \040.  The getmntent(2) function
is aware of this convention and properly translates the escape
character back to a space when reading the fsname.

Without this change the `zfs mount` and `zfs unmount` commands
incorrectly detect when a dataset with a name containing spaces
is mounted.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11182
Closes #11187
2020-11-12 09:01:55 -08:00
Tony Perkins 2132ae465d Start snapdir_iterate traversals to begin wtih the value of zero.
The microzap hash can sometimes be zero for single digit snapnames.
The zap cursor can then have a serialized value of two (for . and ..),
and skip the first entry in the avl tree for the .zfs/snapshot directory
listing, and therefore does not return all snapshots.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Cedric Berger <cedric@precidata.com>
Signed-off-by: Tony Perkins <tperkins@datto.com>
Closes #11039
2020-11-12 09:01:27 -08:00
Mateusz Guzik 87f01fc158 G/C data_alloc_arena
It is a leftover from illumos always set to NULL and introducing a
spurious difference between zio_buf and zio_data_buf.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #11188
2020-11-11 18:46:22 -08:00
Mateusz Guzik 995b80fa3a G/C struct znode -> z_moved
The field is yet another leftover from unsupported zfs_znode_move.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #11186
2020-11-11 11:40:15 -08:00
Adrian Chadd d842f99c6b Fix compiling on FreeBSD + gcc - don't assume illmnos bits
This looks like it was once from the illumnos compat code.
FreeBSD doesn't have cmn_err as a compiler format attribute, so
it definitely errors out.

It doesn't show up on LLVM because it doesn't trigger at all.

Add in the format flags but keep them behind #if 0 for now;
there are too many format issues that trigger when one does
format checking in the shared code.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: adrian chadd <adrian@freebsd.org>
Closes #11068
Closes #11069
2020-11-11 11:09:50 -08:00
Adrian Chadd 8cad25a39c Fix pointer-is-uint64_t-sized assumption in the ioctl path
This shows up when compiling freebsd-head on amd64 using gcc-6.4.
The lib32 compat build ends up tripping over this assumption.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: adrian chadd <adrian@freebsd.org>
Closes #11068
Closes #11069
2020-11-11 11:09:41 -08:00
sterlingjensen c00bb5f4ea Fix memleak in cmd/mount_zfs.c
Convert dynamic allocation to static buffer, simplify parse_dataset
function return path. Add tests specific to the mount helper.

Reviewed-by: Mateusz Guzik <mjguzik@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Sterling Jensen <sterlingjensen@users.noreply.github.com>
Closes #11098
2020-11-11 11:09:31 -08:00
наб d33cbbbf93 zpoolprops.8: clarify vdev expansion rules
Remove reference to EFI(?), explain that the new space
is beyond the GPT for whole-disk vdevs, and add section noting how it
behaves with partition vdevs in terms of how the user is most likely to
encounter it ‒ the previous phrasing was confusing
and seemed to indicate that "zpool online -e" will be able to claim

  GPT[whatever, ZFS, free space, whatever]

into

  GPT[whatever, ZFS, whatever]
but that's not the case, as it'll only be able to do so after manually
resizing the ZFS partition to include the free space beforehand, i.e.:
  GPT[whatever, ZFS, free space, whatever]
  GPT[whatever, [ZFS + free space], potentially left-overs, whatever]
  # zpool online -e
  GPT[whatever, ZFS, whatever]

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11158
2020-11-11 11:08:57 -08:00
Pavel Zakharov a4efa59a94 initramfs: zfsunlock hook breaks /usr/bin
The copy_exec() function expects that the full path of the target
file is passed rather than just the directory, and will take care
of creating the underlying directories if they don't exist.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Closes #11162
2020-11-11 11:07:40 -08:00
Ryan Moeller cb4d3fb737 FreeBSD: Simplify zvol_geom_open and zvol_cdev_open
We can consolidate the unlocking procedure into one place by starting
with drop_suspend set to B_FALSE and moving the open count check up.

While here, a little code cleanup. Match the out labels between
zvol_geom_open and zvol_cdev_open, and add a missing period in some
comments.

Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11175
2020-11-11 11:07:40 -08:00
Ryan Moeller 4a2e9811e9 FreeBSD: Avoid spurious EINTR in zvol_cdev_open
zvol_first_open can fail with EINTR if spa_namespace_lock is not held
and cannot be taken without waiting.

Apply the same logic that was done for zvol_geom_open to take
spa_namespace_lock if not already held on first open in zvol_cdev_open.

Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11175
2020-11-11 11:07:40 -08:00
Alexander Motin 050dfc5045 Fix dmu_tx_dirty_throttle after arc_c reduction
After initial arc_c was reduced to arc_c_min it became possible that
on datasets with primarycache=metadata or none dirty data make up most
of ARC capacity and easily more than configured 50% of initial arc_c,
that causes forced txg commits by arc_tempreserve_space() and periodic
very long write delays.

This patch makes arc_tempreserve_space() to use arc_c only after ARC
warmed up once and arc_c really means something, but use arc_c_max
before that.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #11178
2020-11-11 11:03:43 -08:00
Matthew Macy b49118220c Fix dnode refcount tracking
Fix a couple of places where the wrong tag is passed
to dnode_{hold, rele}

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #11184
2020-11-11 11:03:24 -08:00
Ryan Moeller 806dda56ce ZTS: Add L1 corruption test
Add a new test case which corrupts all level 1 block in a file.
Then verifies that corruption is detected and repaired.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11141
2020-11-11 11:03:02 -08:00
Ryan Moeller 97f5cfea77 ZTS: Output all block copies in list_file_blocks
The second part of list_file_blocks transforms the object description
output by zdb -ddddd $ds $objnum into a stream of lines of the form
"level path offset length" for the indirect blocks in the given file.
The current code only works for the first copy of L0 blocks.  L1 and
L2 indirect blocks have more than one copy on disk.

Add one more -d to the zdb command so we get all block copies and
rewrite the transformation to match more than L0 and output all DVAs.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11141
2020-11-11 11:01:42 -08:00
Ryan Moeller 5ecbea67eb ZTS: Fix list_file_blocks for mirror vdevs, level > 0
The first part of list_file_blocks transforms the pool configuration
output by zdb -C $pool into shell code to set up a shell variable,
VDEV_MAP, that maps from vdev id to the underlying vdev path. This
variable is a simple indexed array. However, the vdev id in a DVA is
only the id of the top level vdev.

When the pool is mirrored, the top level vdev is a mirror and its
children are the mirrored devices. So, what we need is to map from
the top level vdev id to a list of the underlying vdev paths.
ist_file_blocks does not need to work for raidz vdevs, so we can
disregard that case.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11141
2020-11-11 11:01:11 -08:00
Mariusz Zaborski 957b4e9fbd FreeBSD: Prevent a NULL reference in zvol_cdev_open
Check if the ZVOL has been written before calling zil_async_to_sync.
The ZIL will be opened on the first write, not earlier.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mariusz Zaborski <oshogbo@vexillium.org>
Closes #11152
2020-11-11 11:00:31 -08:00
khng300 ef648fec0e FreeBSD: Prevent NULL pointer dereference of resid
spa_config_load() passes NULL into resid when doing zfs_file_read().
This would trip over when vfs.zfs.autoimport_disable=0.

Sponsored by: The FreeBSD Foundation
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Ka Ho Ng <khng@freebsdfoundation.org>
Closes #11149
2020-11-11 11:00:19 -08:00
Brian Behlendorf e518548e17 Tag 2.0.0-rc5
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2020-11-03 10:25:31 -08:00
Antonio Russo 1f442afa41 Synchronize library ABI levels
Bump library SOVERSION under Linux to match FreeBSD's.

Additionally, this bump properly accounts for the ABI changes relative
to ZoL 0.8.5 for the Linux build.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Antonio Russo <aerusso@aerusso.net>
Issue #11144
2020-11-03 09:51:53 -08:00
Ryan Moeller 62d549d757 FreeBSD: zvol_os: Use SET_ERROR more judiciously
SET_ERROR is useful to trace errors, so use it where the errors occur
rather than factored out to the end of a function.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11146
2020-11-03 09:51:49 -08:00
Brian Behlendorf ab9011e79b ZTS: zdb_block_size_histogram increase variance
The expected variance for this test case was originally set at 10%
based on local testing.  Additional testing via the CI has show it
can be as large as 11%.  Increase the expected maximum to 12% to
prevent this test from incorrectly failing.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11148
2020-11-03 09:51:42 -08:00
Brian Behlendorf b42f36f0b0 ZTS: Wait on all events in events_001_pos.ksh
The events_001_pos.ksh test case can fail because it's possible,
and correct, for the config_sync event to be posted after the last
"expected" event.  To accommodate this the run_and_verify() function
has been updated to wait for all non-history events, not just the
last event.  This does not increase the run time of the test as
long as all the events do get generated.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11147
2020-11-03 09:51:37 -08:00
Coleman Kane a30fed54f4 Linux 5.10 compat: revalidate_disk_size() added
A new function was added named revalidate_disk_size() and the old
revalidate_disk() appears to have been deprecated. As the only ZFS
code that calls this function is zvol_update_volsize, swapping the
old function call out for the new one should be all that is required.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #11085
2020-11-03 09:51:31 -08:00
Coleman Kane e767b1cacc Linux 5.10 compat: check_disk_change() removed
Kernel 5.10 removed check_disk_change() in favor of callers using
the faster bdev_check_media_change() instead, and explicitly forcing
bdev revalidation when they desire that behavior. To preserve prior
behavior, I have wrapped this into a zfs_check_media_change() macro
that calls an inline function for the new API that mimics the old
behavior when check_disk_change() doesn't exist, and just calls
check_disk_change() if it exists.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #11085
2020-11-03 09:51:26 -08:00
Coleman Kane d2090becab Linux 5.10 compat: percpu_ref added data member
Kernel commit 2b0d3d3e4fcfb brought in some changes to the struct
percpu_ref structure that moves most of its fields into a member
struct named "data" of type struct percpu_ref_data. This includes
the "count" member which is updated by vdev_blkg_tryget(), so update
this function to chase the API change, and detect it via configure.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #11085
2020-11-03 09:51:20 -08:00
Brian Behlendorf 54f10674f3 Linux 5.10 compat: frame.h renamed objtool.h
In Linux 5.10 the linux/frame.h header was renamed linux/objtool.h.
Add a configure check to detect and use the correctly named header.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11085
2020-11-03 09:51:15 -08:00
Sebastian Gottschall f8460e7e62 Optimize locking checks in mempool allocator
Avoid checking the whole array of objects each time by removing the self
organized memory reaping. this can be managed by the global memory reap
callback which is called every 60 seconds. this will reduce the use if
locking operations significant.

Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Reviewed-by: Mateusz Guzik <mjguzik@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Sebastian Gottschall <s.gottschall@dd-wrt.com>
Closes #11126
2020-11-03 09:51:10 -08:00
Ryan Moeller 45479eb1de Remove duplicate cond_resched() definition
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11131
2020-11-03 09:50:52 -08:00
Ryan Moeller aaeffd09bf zvol_os: Fix handling of zvol private data
zvol private data is supposed to be nulled by zvol_clear_private before
zvol_free is called as an indicator that the zvol is going away.

Implement zvol_clear_private for volmode=dev.

Assert that zvol_clear_private has been called before zvol_free.

Check that zvol_clear_private has not been called when updating
volsize.  If it has, fail with ENXIO.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11117
2020-10-30 16:06:54 -07:00
Ryan Moeller 0c270bb6c4 zvol_os: Don't leak doi in cdev error path
Make sure to free doi in zvol_create_minor impl when make_dev_s fails.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11117
2020-10-30 16:06:49 -07:00
Ryan Moeller c2c643256c zvol_os: Properly ignore error in volmode lookup
We fall back to a default volmode and continue when looking up a zvol's
volmode property fails.  After this we should set the error to 0 to
ensure we take the success paths in the out section.

While here, make sure we only log that the zvol was created on success.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11117
2020-10-30 16:06:42 -07:00
Ryan Moeller 896d0f0906 zvol_os: Code cleanup in zvol_create_minor_impl
Nonfunctional changes for readability and consistency.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11117
2020-10-30 16:06:36 -07:00
Ryan Moeller ef525e0841 zvol_os: Keep better track of open count in close
zvol_geom_close gets a count of the number of close operations to do.

Make sure we're always using this count to check if this will be the
last close operation performed on the zvol.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11117
2020-10-30 16:06:30 -07:00
Ryan Moeller 00a27515f0 zvol_os: Tidy up asserts
Using more specific assert variants gives better messages on failure.

No functional change.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11117
2020-10-30 16:06:24 -07:00
Mateusz Guzik 52f1ef3b2d zstd: track allocator statistics
Note that this only tracks sizes as requested by the caller.
Actual allocated space will almost always be bigger (e.g., rounded up to
the next power of 2 or page size). Additionally the allocated buffer may
be holding other areas hostage. Nonetheless, this is a starting point
for tracking memory usage in zstd.

Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #11129
2020-10-30 16:06:15 -07:00
Attila Fülöp c6b0680d9b ICP: gcm: Allocate hash subkey table separately
While evaluating other assembler implementations it turns out that
the precomputed hash subkey tables vary in size, from 8*16 bytes
(avx2/avx512) up to 48*16 bytes (avx512-vaes), depending on the
implementation.

To be able to handle the size differences later, allocate
`gcm_Htable` dynamically rather then having a fixed size array, and
adapt consumers.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Attila Fülöp <attila@fueloep.org>
Closes #11102
2020-10-30 16:06:09 -07:00
Attila Fülöp 2c37e1416b Add some missing cfi frame info in aesni-gcm-x86_64.S
While preparing #9749 some .cfi_{start,end}proc directives
were missed. Add the missing ones.

See upstream https://github.com/openssl/openssl/commit/275a048f

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Attila Fülöp <attila@fueloep.org>
Closes #11101
2020-10-30 16:06:00 -07:00
Mateusz Guzik 6e4845aee3 FreeBSD: catch up with 1300124 version bump
- use cache_vop_mkdir
- cache_rename -> cache_vop_rename

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #11136
2020-10-30 16:05:18 -07:00
Ryan Moeller 48cf7d674a FreeBSD: Fix 12.2-STABLE after AT_BENEATH MFC
AT_BENEATH was merged to stable/12, where kern_unlinkat takes a
non-const path.  DECONST the path passed to kern_unlinkat in the
case where AT_BENEATH is defined.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11139
2020-10-30 16:05:10 -07:00
Alexander Motin ca54e52122 Yield periodically when rebuilding L2ARC
L2ARC devices of several terabytes filled with 4KB blocks may take 15
minutes to rebuild.  Due to the way L2ARC log reading is implemented
it is quite likely that for all that time rebuild thread will never
sleep.  At least on FreeBSD kernel threads have absolute priority and
can not be preempted by threads with lower priorities.  If some thread
is also bound to that specific CPU it may not get any CPU time for all
the 15 minutes.

Reviewed-by: Cedric Berger <cedric@precidata.com>
Reviewed-by: Ryan Moeller <freqlabs@FreeBSD.org>
Reviewed-by: George Amanakis <gamanakis@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Closes #11116
2020-10-30 16:04:53 -07:00
Ryan Moeller c3ae9321bf Update references to nonexistent man pages in code
Refer to the correct section or alternative for FreeBSD and Linux.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11132
2020-10-30 16:04:41 -07:00
Alexander Motin 2dd2e49cc7 FreeBSD: Remove BIO_ORDERED flag from BIO_FLUSH
ZFS always waits for the write completion before flushing the cache.
That is why it does not require explicit ordering fences around it,
which are pretty difficult to implement for NVMe, since one has no
internal concept of strict request ordering.

This was already removed from FreeBSD once, but got resurrected
by mistake during OpenZFS merge.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Closes #11130
2020-10-30 16:04:32 -07:00
Tony Hutter 8a1b26eb54 ZTS: Fix xattr_004_pos failure, don't use tmpfs
Previously, xattr_004_pos would create files with xattrs on both
tmpfs and ext2, and then copy them to zfs to verify that their
xattrs were preserved.  However tmpfs doesn't support xattrs.

This was never noticed until Fedora 33.  In Fedora 32 and older,
/tmp was on the root partition (like ext4), whereas on Fedora 33
/tmp is actually tmpfs.  That caused this test to fail on Fedora 33.

This fix updates the test to only create the file on ext2, not tmpfs.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #11133
2020-10-30 16:04:16 -07:00
Mateusz Guzik 7e76d21bc8 Linux: g/c leftover fence in zfs_znode_alloc
The port removed provisions for zfs_znode_move but the cleanup missed
this bit. To quote the original:

[snip]
    list_insert_tail(&zfsvfs->z_all_znodes, zp);
    membar_producer();
    /*
     * Everything else must be valid before assigning z_zfsvfs makes the
     * znode eligible for zfs_znode_move().
     */
    zp->z_zfsvfs = zfsvfs;
[/snip]

In the current code it is immediately followed by unlock which issues
the same fence, thus plays no role in correctness.

Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #11115
2020-10-30 16:04:05 -07:00
Mateusz Guzik e579a4ed0f FreeBSD: g/c unused zfs_znode_move support
The allocator does not provide the functionality to begin with.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #11114
2020-10-30 16:03:58 -07:00
Brian Behlendorf 6867d00403 Use known license string for zlua
The Linux kernel MODULE_LICENSE macro only recognizes a handful of
license strings and "MIT" is not one of the them.  Update the macro
to use "Dual MIT/GPL" which is recognized and what the kernel expects
MIT licensed modules to use.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11112
Closes #11113
2020-10-30 16:03:37 -07:00
Ryan Moeller 0dc6fb730f FreeBSD: Skip RAW kstat sysctls by default
These kstats are often expensive to compute so we want to avoid them
unless specifically requested.

The following kstats are affected by this change:

kstat.zfs.${pool}.multihost
kstat.zfs.${pool}.misc.state
kstat.zfs.${pool}.txgs
kstat.zfs.misc.fletcher_4_bench
kstat.zfs.misc.vdev_raidz_bench
kstat.zfs.misc.dbufs
kstat.zfs.misc.dbgmsg

In FreeBSD 13, sysctl(8) has been updated to still list the
names/description/type of skipped sysctls so they are still
discoverable.

Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Mateusz Guzik <mjguzik@gmail.com>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11099
2020-10-30 16:03:22 -07:00
Mateusz Guzik f5bffd3748 FreeBSD: catch up with 1300123 version bump
- removed thread argument from VOP_INACTIVE
- removed cred argument from VOP_VPTOCNP

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #11104
2020-10-30 16:03:13 -07:00
Cy Schubert 07c7899a37 Restore identification of VDEVs using non-native block size
NAME         STATE     READ WRITE CKSUM
dsk02        ONLINE       0     0     0
  mirror-0   ONLINE       0     0     0
    ada1s4a  ONLINE       0     0     0
    ada2s4a  ONLINE       0     0     0  block size: 512B configured, 4096B native

Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Reviewed-by: Toomas Soome <tsoome@me.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed off by: Cy Schubert <cy@FreeBSD.org>
Closes #11088
2020-10-30 16:02:58 -07:00
xtouqh 79bfba2fa8 Properly format NAME subsection of zfs/zpool subcommands
Use proper names (i.e. zfs-allow and zpool-add) in NAME subsections
of zfs/zpool subcommands instead of current "pretty-printed" ones as
makewhatis utilities (or some implementations of it, namely the one
from mandoc suite used in FreeBSD) look not only at the document title
but also in NAME subsection, adding zfs(8)/zpool(8) to search results
which is not correct. (Common sense and other utilities splitting
subcommands in multiple man pages, e.g. git, do the same.)

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: xtouqh <xtouqh@hotmail.com>
Closes #11086
2020-10-30 16:02:38 -07:00
Ryan Moeller 73511e3dde Add missing zfs_arc_evict_batch_limit tunable
It's even documented already.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11094
2020-10-30 16:02:24 -07:00
Ryan Moeller 3b79394bc9 arcstat: Add -a and -p options from FreeNAS
Added -a option to automatically print all valid statistics.
Added -p option to suppress scaling of printed data.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Authored by: Nick Principe <32284693+powernap@users.noreply.github.com>
Ported-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11090
2020-10-30 16:02:11 -07:00
Kyle Evans 4df31aa98c Makefile.bsd: remove directory that no longer exists
This was removed in a reorganization of directories preparing for the
merge of FreeBSD support, 006e9a4088 by mmacy. While llvm is perfectly
happy with the nonexistent -I directory, the gcc6 and gcc9 we can elect
to use as cross-toolchains both trip over it.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Kyle Evans <kevans@FreeBSD.org>
Closes #11077
2020-10-30 15:57:46 -07:00
Matthew Macy aeeada355c FreeBSD: delete unreferenced file
zfs_onexit_os.c was not deleted when it was removed from the build

Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #11079
2020-10-30 15:57:15 -07:00
Ryan Moeller 0905a4fe9b Fix commitcheck on FreeBSD
Convert from bash to sh, avoid Perl regexes and \s, prune unused
functions.

Reviewed-by: Mateusz Piotrowski <0mp@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #11070
2020-10-30 15:57:03 -07:00
Kjeld Schouten-Lebbing 3ba6774e58 Update issue templates, commitcheck and Contributing.md
- Removes OpenZFS ports from commit check
- Removes OpenZFS ports from CONTRIBUTING.md
- Adds mailings lists and IRC to issue template selector
- Remove blank issue option from issue creator

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl>
Closes #10965
2020-10-30 15:56:58 -07:00
Brian Behlendorf 26e9c479b5 Tag 2.0.0-rc4
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2020-10-19 11:28:28 -07:00
Don Brady db75854cbb zed syslog entries drop important info
ZED will log zevents summaries to the syslog, however the log entries 
tend to drop event details that can be useful for diagnosis. This is 
especially true for ereport events, like io, checksum, and delay.

Update the all-syslog.sh script to log additional event information.

Add an optional config option, ZED_SYSLOG_DISPLAY_GUIDS, to zed.rc
for choosing GUIDs over names for pool and vdev.

Change the default ZED_SYSLOG_SUBCLASS_EXCLUDE to exclude history_event 
events. These events tend to be frequent, convey no meaningful info, 
and are already logged in the zpool history.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Don Brady <don.brady@delphix.com>
Closes #10967
2020-10-19 11:24:52 -07:00
Mateusz Guzik bd565f3e24 FreeBSD: add missing fplookup_vexec handler to special vop vectors
Otherwise lookup can fail with EOPNOTSUPP or panic.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #11066
2020-10-16 13:05:43 -07:00
Mateusz Guzik 3c4e580e9a FreeBSD: g/c unused vop vector zfsctl_ops_shares_dir
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #11066
2020-10-16 13:05:39 -07:00
Don Brady b3f4436d37 Ignore special vdev ashift for spa ashift min/max
The removal of a vdev in the normal class would fail if there was a
special or deup vdev that had a different ashift than the vdevs in
the normal class.

Moved the initialization of spa_min_ashift / spa_max_ashift from
vdev_open so that it occurs after the vdev allocation bias was
initialized (i.e. after vdev_load).

Caveat -- In order to remove a special/dedup vdev it must have the
same ashift as the normal pool vdevs.  This could perhaps be lifted
in the future (i.e. for the case where there is ample space in any
surviving special class vdevs)

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Don Brady <don.brady@delphix.com>
Closes #9363
Closes #9364
Closes #11053
2020-10-16 13:05:34 -07:00
Christian Schwarz 05f8be3b49 Fix crash caused by invalid snapshot names in redactnvl
This is a follow up fix for commit 0fdd6106bb.  The VERIFY is
only true when we haven't hit an error code path.  See added
test case for a reproducer.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Christian Schwarz <me@cschwarz.com>
Closes #11048
2020-10-16 13:05:28 -07:00
Paul Dagnelie d8091c9294 Fix incorrect deletion order in range_tree_add_impl gap case
After a side-effectful call like add or remove, references to range
segs stored in btrees can no longer be used safely.  We move the
remove call to just before the reinsertion call so that the seg
remains valid for as long as we need it.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #11044
Closes #11056
2020-10-16 13:05:23 -07:00
Mateusz Guzik 05613fa7a3 FreeBSD: fix panic due to tqid overflow
The 32-bit counter eventually wraps to 0 which is a sentinel for invalid
id.

Make it 64-bit on LP64 platforms and 0-check otherwise.

Note: Linux counterpart uses id stored per queue instead of a global.
I did not check going that way is feasible with the goal being the
minimal fix doing the job.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #11059
2020-10-16 13:05:18 -07:00
Ryan Moeller 725c9e22ca Cross-platform acltype
The acltype property is currently hidden on FreeBSD and does not
reflect the NFSv4 style ZFS ACLs used on the platform.  This makes it
difficult to observe that a pool imported from FreeBSD on Linux has a
different type of ACL that is being ignored, and vice versa.

Add an nfsv4 acltype and expose the property on FreeBSD.

Make the default acltype nfsv4 on FreeBSD.

Setting acltype to an unhanded style is treated the same as setting
it to off.  The ACLs will not be removed, but they will be ignored.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10520
2020-10-16 13:05:00 -07:00
Warner Losh fbfc7e843a FreeBSD: make adjustments for the standalone environment
In FreeBSD, there are three compile environments that are supported:
user land, the kernel and the bootloader / standalone. Adjust the
headers to compile in the standalone environment. Limit kernel-only
items from view when _STANDALONE is defined.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Warner Losh <imp@FreeBSD.org>
Closes #10998
2020-10-16 13:04:41 -07:00
Warner Losh faa62966b1 aarch64: Use proper guards for NEON instructions
The zstd code assumes that if you are on aarch64, you have NEON
instructions. This is not necessarily true. In a boot loader, where
you might not have the VFP properly initialized, these instructions
may not be available. It's also an error to include arm_neon.h when
the NEON insturctions aren't enabled. Change the guards for using the
NEON instructions from __aarch64__ to __ARM_NEON which is the standard
symbol for knowing if they are available.

__ARM_NEON is the proper symbol, defined in ARM C Language Extensions
Release 2.1 (https://developer.arm.com/documentation/ihi0053/d/). Some
sources suggest __ARM_NEON__, but that's the obsolete spelling from
prior versions of the standard.

Updated based on zstd pull request https://github.com/facebook/zstd/pull/2356

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Warner Losh <imp@bsdimp.com>
Closes #11055
2020-10-16 13:03:13 -07:00
Christian Schwarz be28cdd1c3 dmu.h: remove stale declaration dmu_objset_snapshot_tmp
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Adam Moss <c@yotes.com>
Signed-off-by: Christian Schwarz <me@cschwarz.com>
Closes #11047
2020-10-16 13:03:05 -07:00
Mateusz Guzik 7f0b3fa042 FreeBSD: use cache_rename if available
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #11045
2020-10-16 13:03:00 -07:00
Mathieu Velten f40a1ad9e0 blkg_tryget config test: initialize struct
Missing struct initialization in a config test results in the
interface being incorrectly detected.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Adam Moss <c@yotes.com>
Signed-off-by: Mathieu Velten <matmaul@gmail.com>
Closes #10713
Closes #11049
2020-10-16 13:02:55 -07:00
Kjeld Schouten-Lebbing 9cf33c99fc Increase Supported Linux Kernel to 5.9
This increases the Linux kernel version to 5.9 from 5.8
as most compatibility fixes should already be included.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Pavel Snajdr <snajpa@snajpa.net>
Signed-off-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl>
Closes #11050
2020-10-16 13:02:50 -07:00
Ryan Moeller a51019f4ec FreeBSD: Improve libzfs_error_init messages
It is a common mistake to have failed to autoload the module due to
permission issues when running a ZFS command as a user.  "Operation
not permitted" is an unhelpfully vague error message.

Use a thread-local message buffer to format a nicer error message.
We can infer that loading the kernel module failed if the module is
not loaded.  This can be extended with heuristics for other errors
in the future.

While looking at this stuff, remove an unused thread-local message
buffer found in libspl and remove some inaccurate verbiage from the
comment on libzfs_load_module.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11033
2020-10-16 13:02:46 -07:00
Ryan Moeller c71847b77b Expose zfetch_max_idistance tunable
FreeBSD had this value tunable before the switch to the new OpenZFS.
The tunable name has changed, breaking legacy compat.

Restore legacy compat for this tunable, properly expose the tunable
with the new name on all platforms, and document it in
zfs-module-parameters(5).

While here, clean up the documentation for zfetch_max_distance a bit.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11038
2020-10-16 13:02:39 -07:00
Christian Schwarz 5c6d3c21b1 zil_parse: make callback parameters const
Code cleanup, a follow up commit to 4d55ea81.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Co-authored-by: Ryan Moeller <ryan@freqlabs.com>
Signed-off-by: Christian Schwarz <me@cschwarz.com>
Closes #11020
2020-10-16 13:01:53 -07:00
Ryan Moeller 5e7198b873 Linux: Initialize zp in zfs_setattr_dir
The value of zp is used without having been initialized under some
conditions.  Initialize the pointer to NULL.

Add a regression test case using chown in acl/posix.  However, this is
not enough because the setup sets xattr=sa, which means zfs_setattr_dir
will not be called.  Create a second group of acl tests in acl/posix-sa
duplicating the acl/posix tests with symlinks, and remove xattr=sa from
the original acl/posix tests.  This provides more coverage for the
default xattr=on code.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10043
Closes #11025
2020-10-16 13:01:29 -07:00
Brian Behlendorf 46c71074ca Replace ZFS on Linux references with OpenZFS
This change updates the documentation to refer to the project
as OpenZFS instead ZFS on Linux.  Web links have been updated
to refer to https://github.com/openzfs/zfs.  The extraneous
zfsonlinux.org web links in the ZED and SPL sources have been
dropped.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11007
2020-10-16 13:01:24 -07:00
Jacob Adams 35ba2ca5b7 Fix Linux modules uninstall
A missing semicolon between kmoddir variable declaration and the
uninstall for loop caused modules_uninstall-Linux to fail with:

    Syntax error: "do" unexpected

Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Jacob Adams <jacob@tookmund.com>
Closes #11032
2020-10-16 13:01:14 -07:00
Ryan Moeller cbcb88dff8 ZTS: Fix path to /dev/null in nopwrite_recsize
Don't direct stdout and stderr of dd to $TEST_BASE_DIR/null,
direct it to /dev/null.

Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11026
2020-10-16 13:01:02 -07:00
Chuck Tuffli 0df5b5737c Fix ubsan: shift exponent is too large
When running libzpool with the Undefined Behavior Sanitizer (ubsan)
enabled, a zpool create causes a run-time error:

    module/zfs/vdev_label.c:600:14: runtime error: shift exponent 64 is
    too large for 64-bit type 'long long unsigned int'`

in vdev_config_generate()

Fix is to convert vdev_removal_max_span to its base-2 logarithm, using
highbit64(), and then compare the "shifts".

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Chuck Tuffli <ctuffli@gmail.com>
Closes #9744
Closes #11024
2020-10-16 13:00:44 -07:00
Christian Schwarz 102a1db6b2 libzfs_sendrecv: zfs_send: remove unused pipefd and tid variables
fixup of 196bee4

On gcc (GCC) 9.2.1 20190827 (Red Hat 9.2.1-1), the code removed
caused `-Wmaybe-uninitialized` errors.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Christian Schwarz <me@cschwarz.com>
Closes #11021
2020-10-16 13:00:37 -07:00
Ryan Moeller 25e44a17ff Make dbufstat work on FreeBSD
With procfs_list kstats implemented for FreeBSD, dbufs are now exposed
as kstat.zfs.misc.dbufs.

On FreeBSD, dbufstats can use the sysctl instead of procfs when no
input file has been given.

Enable the dbufstats tests on FreeBSD.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #11008
2020-10-16 13:00:28 -07:00
Ryan Moeller 718d20ed93 FreeBSD: Sort and dedup includes in kmod_core
Code cleanup. Sort includes, remove duplicates, and drop
some extra blank lines in kmod_core.c.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #11000
2020-10-16 13:00:23 -07:00
George Melikov 18fea82b89 docs: update README's installation link
OpenZFS is a cross-OS project now.

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #11022
2020-10-16 13:00:13 -07:00
Toomas Soome cfb602125e zdb should not output binary data on terminal
The zdb is interpreting byte array as textual string in dump_zap,
but there are also binary arrays and we should not output binary
data on terminal.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Signed-off-by: Toomas Soome <tsoome@me.com>
External-issue: https://www.illumos.org/issues/12012
External-issue: https://www.illumos.org/issues/11713
Closes #11006
2020-10-16 12:59:15 -07:00
Ryan Moeller 106627caa7 FreeBSD: Sort out kernel FPU headers for 12.1-REL
We were missing an include for kernel FPU functions, breaking the build
on FreeBSD 12.1-RELEASE.  This was apparently being pulled in from
elsewhere on stable/12 and head.

Sorted the other includes in these files while here.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11005
2020-10-16 12:59:09 -07:00
Alan Somers d6bee967ed Fix EIO after resuming receive of new dataset over an existing one
When resuming an interrupted ZFS send stream that creates a new dataset
with the same name as an existing dataset, if the existing dataset is
accessed after the failed receive, then after the subsequent successful
receive it will return EIO. This happens because nothing mounts the new
dataset, leaving the old, no longer valid dataset still mounted.

This commit fixes zfs receive to always unmount and remount the
destination, regardless of whether the stream is a new stream or a
resumed stream.

Sponsored by: Axcient
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Alan Somers <asomers@gmail.com>
External-issue: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=249579
Closes #10995
Closes #10999
2020-10-16 12:56:22 -07:00
Ryan Moeller 47e3dba972 Throw const on some strings
In C, const indicates to the reader that mutation will not occur.
It can also serve as a hint about ownership.

Add const in a few places where it makes sense.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #10997
2020-10-16 12:55:56 -07:00
John Poduska a09e3a8594 Mismatched nvlist names in zfs_keys_send_space
This causes "zfs send -vt ..." to fail with:

    cannot resume send: Unknown error 1030

It turns out that some of the name/value pairs in the verification
list for zfs_ioc_send_space(), zfs_keys_send_space, had the wrong
name, so the ioctl got kicked out in zfs_check_input_nvpairs().
Update the names accordingly.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: John Poduska <jpoduska@datto.com>
Closes #10978
2020-10-16 12:55:19 -07:00
Brian Behlendorf 5573cbea9a Tag 2.0.0-rc3
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2020-10-01 12:35:40 -07:00
Brian Behlendorf fc5966589b Fix buggy procfs_list_seq_next warning
The kernel seq_read() helper function expects ->next() to update
the passed position even there are no more entries.  Failure to
do so results in the following warning being logged.

    seq_file: buggy .next function procfs_list_seq_next [spl]
    did not update position index

Functionally there is no issue with the way procfs_list_seq_next()
is implemented and the warning is harmless.  However, we want to
silence this some what scary incorrect warning.  This commit
updates the Linux procfs code to advance the position even for
the last entry.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #10984
Closes #10996
2020-10-01 12:30:28 -07:00
Ryan Moeller 5d61d6e8dd FreeBSD: Fix legacy compat for platform IOCs
The request number is out of bounds of the platform table.

Subtract the starting offset to get the correct subscript.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10994
2020-10-01 12:23:00 -07:00
Matthew Macy 775afc4dcd Eliminate gratuitous bzeroing in dbuf_stats_hash_table_data
`dbuf_stats_hash_table_data` can take much longer than it needs to
by repeatedly bzeroing its buffer when in fact the buffer only needs
to be NULL terminated.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #10993
2020-10-01 12:22:54 -07:00
Sebastian Gottschall 7ce9da0bea do a cyclic seek for unused memory objects in pool
In non regular use cases allocated memory might stay persistent in memory
pool. This small patch checks every minute if there are old objects which
can be released from memory pool.

Right now with regular use, the pool is checked for old objects on each
allocation attempt from this pool. so basically polling by its use. Now
consider what happens if someone writes a lot of files and stops use of
the volume or even unmounts it. So the code will no longer check if
objects can be released from the pool. Already allocated objects will
still stay in pool cache. this is no big issue for common use. But
someone discovered this issue while doing tests. personally i know this
behavior and I'm aware of it. Its no big issue. just a enhancement

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl>
Signed-off-by: Sebastian Gottschall <s.gottschall@dd-wrt.com>
Closes #10938
Closes #10969
2020-10-01 12:22:48 -07:00
Ryan Moeller e58dee8cae Drop references when skipping dmu_send due to EXDEV
When an invalid incremental send is requested where the "to" ds is
before the "from" ds, make sure to drop the reference to the pool
and the dataset before returning the error.

Add an assert on FreeBSD to make sure we don't hold any locks after
returning from an ioctl.

Add some test coverage.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10919
2020-10-01 12:22:36 -07:00
Kjeld Schouten-Lebbing 83b5a22d86 Add intel_QAT patches
Add community compatibility patches for Intel QAT
Due to incompatibility with higher kernel versions.

Also includes basic instructions.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl>
Closes #10961
Closes #10962
2020-10-01 12:22:28 -07:00
Brian Behlendorf 13c38c4c45 Use known license string for zzstd
The Linux kernel MODULE_LICENSE macro only recognizes a handful of
license strings and "BSD" is not one of the them.  Update the macro
to use "Dual BSD/GPL" which is recognized and what the kernel expects
BSD licensed module to use.

Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #10982
Closes #10992
2020-10-01 12:22:23 -07:00
Brian Behlendorf fdbec0423b Fix CONFIG_DEBUG_LOCK_ALLOC configure check
This check was accidentally broken when the kABI checks were updated
to run in parallel, commit 608f874.  The check must be for the
config_debug_lock_alloc_license name to determine if the symbol
is license compatible.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #10991
2020-10-01 12:22:19 -07:00
Brian Behlendorf 9f29a4d972 Fix objtool configure check
The m4 objtool configure check can incorrectly fail because of a
missing header in the test.  This appears to be the result of a
recent kernel change and was observed on the Fedora 5.8.11-200
kernel.

  In file included from /home/fedora/zfs/build/objtool/objtool.c:75:
  ./arch/x86/include/asm/frame.h:100:57: error: 'struct pt_regs'
      declared inside parameter list will not be visible outside
      of this definition or declaration [-Werror]

The consequence of this is that the "stack_frame_non_standard"
check is never run and HAVE_STACK_FRAME_NON_STANDARD is set
incorrectly which results in a build failure.  This change adds
the appropriate header to the "objtool" check so it now behaves
as intended.

Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #10990
2020-10-01 12:22:13 -07:00
grodik ced5f71eec Note that keys must be loaded for 'zpool remove'
The error returned by `zpool remove` when the encryption keys aren't
loaded isn't very helpful.  Furthermore, the man pages make no
mention that the keys need to be loaded. This change doesn't resolve
the error message but it does update the man page to mention this
requirement.

Authored-by: grodik <pat@litke.dev>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #10939
Closes #10948
2020-10-01 12:22:08 -07:00
Kjeld Schouten-Lebbing e5a4f9cfc4 Document branching structure
This change documents the currently used branching structure.
It has been cut down to not include any controversial changes.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl>
Closes #10976
2020-10-01 12:22:03 -07:00
Allan Jude 1579483a86 zfs userspace: use zfs_path_to_zhandle so argument can be a path
Change zfs userspace subcommand to use zfs_path_to_zhandle() so that
the provided dataset can be a path (/usr) or a dataset (rpool/usr).

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Closes #8915
2020-10-01 12:21:37 -07:00
Adam D. Moss edd23dba81 Add DB_RF_NOPREFETCH to dbuf_read()s in dnode.c
Prefetching of dnodes in dbuf_read() can cause significant mutex
contention for some workloads and isn't very helpful.  This is
because we already get 32 dnodes for each block read, and when
iterating over a directory we prefetch the dnodes in the directory.
Disable this prefetching to prevent the lock contention.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Submitted-by: Adam Moss <c@yotes.com>
Submitted-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Adam Moss <c@yotes.com>
Closes #10877
Closes #10953
2020-10-01 12:21:09 -07:00
Brian Behlendorf 7b353d2c8c Fix PREEMPTION=y and BLK_CGROUP=y config on arm64
With PREEMPTION=y and BLK_CGROUP=y preempt_schedule_notrace() is being
used on arm64 which is a GPL-only function and hence the build of the
DKMS kernel module fails.

Fix that by redefining preempt_schedule_notrace() to preempt_schedule()
which should be safe as long as tracing is not used.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Juerg Haefliger <juergh@canonical.com>
Closes #8545
Closes #9948
Closes #10416
Closes #10973
2020-10-01 12:20:59 -07:00
Mateusz Guzik b37efb872b FreeBSD: update cache_purgevfs usage after 1300117 version bump
Reviewed-by: Ryan Moeller <freqlabs@FreeBSD.org>
Reviewed-by: Nick Wolff <darkfiberiru@gmail.com>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #10970
2020-10-01 12:20:45 -07:00
Ryan Moeller d8a81c3d3c FreeBSD: Code cleanup in zio_crypt
Address some unused value and control flow issues flagged by Coverity.

Unreachable code is pruned and unused values are avoided.
Some scattered sections are reordered for coherence.

We can assume kmem_alloc(n, KM_SLEEP) doesn't fail, so there is no need
to check if it returned NULL.  The allocated memory doesn't need to be
zeroed, other than the last iovec (the MAC).

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10884
2020-10-01 12:20:39 -07:00
Ryan Moeller 875307b6a1 Prune dead branch reported by Coverity
wkey is NULL at every `goto error;`.
dcp is never NULL.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10884
2020-10-01 12:20:16 -07:00
George Wilson 626abe164d zpool command complains about /etc/exports.d
If the /etc/exports.d directory does not exist, then we should only
create it when we're performing an action which already requires root
privileges.

This commit moves the directory creation to the enable/disable code
path which ensures that we have the appropriate privileges.

Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Wilson <gwilson@delphix.com>
Closes #10785
Closes #10934
2020-10-01 12:20:06 -07:00
Christian Schwarz ba28919168 zfs_log_write: simplify data copying code for WR_COPIED records
lr_write_t records that are WR_COPIED have the record data directly
appended to them (see lr_write_t type definition).

The data is copied from the debuf using dmu_read_by_dnode.

This function was called, only for WR_COPIED records, as part of a
short-circuiting if-statement's if-expression.

I found this side-effectful call to dmu_read_by_dnode pretty
hard to spot.
This patch improves readability by moving the call to its own line.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Wilson <gwilson@delphix.com>
Signed-off-by: Christian Schwarz <me@cschwarz.com>
Closes #10956
2020-10-01 12:20:00 -07:00
Matthew Macy c70c6e004e FreeBSD: Add support for procfs_list
The procfs_list interface is required by several kstats. Implement
this functionality for FreeBSD to provide access to these kstats.

Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #10890
2020-10-01 12:18:56 -07:00
Matthew Macy 227273efa4 FreeBSD: Don't save user FPU context in kernel threads
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Ryan Moeller <freqlabs@FreeBSD.org>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #10899
2020-10-01 12:18:51 -07:00
Paul Dagnelie b199e62d17 Don't set numobjs to UINT64_MAX or near it
Resolves an issue with `zfs send` streams from 0.8.4 which
prevents them from being received by versions < 0.7.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Paul Zuchowski <pzuchowski@datto.com>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #10911
Closes #10916
2020-10-01 12:18:38 -07:00
наб b9d18bdbdc contrib/initramfs: fix shellcheck and checkbashisms errors with shebang
Reviewed-by: Gabriel A. Devenyi <gdevenyi@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #10908
Closes #10917
2020-10-01 12:17:45 -07:00
Mark Johnston e651a5b233 Fix a logic bug in the FreeBSD getpages VOP
In commit cd32b4f5b7 ("Fix a deadlock in the FreeBSD getpages VOP") I
introduced a bug while porting the patch originally committed to
FreeBSD: the rangelock pointer may be NULL if the try operation failed,
so we must avoid calling zfs_rangelock_unlock() in that case.

Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Reported-by: Steve Wills <swills@FreeBSD.org>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes #10519
Closes #10960
2020-10-01 12:16:33 -07:00
Ryan Moeller 723726ae7d FreeBSD: Reduce stack usage of Lua
Use the same reduced buffer size for lauxlib that is used on Linux.

Fixes panic on HEAD in lua gsub test designed to exhaust stack space.

With this we can remove the special case to reserve more stack space
on FreeBSD.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Kyle Evans <kevans@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10959
2020-10-01 12:16:28 -07:00
Mark Johnston aba5b019cb Annontate FreeBSD sysctls with CTLFLAG_MPSAFE
Without this, the sysctl system calls will acquire a global lock before
invoking the handler.  This is noticeable in some situations when
running top(1).  The global lock is mostly vestigal but continues to see
some use and so contention is still a problem; until the default sense
of the MPSAFE flag changes, we have to annotate each and every handler.

Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes #10836
2020-10-01 12:16:21 -07:00
Mark Johnston f664153078 Fix switch statement indentation in the FreeBSD kstat code
This is in preparation for some functional changes.

Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes #10950
2020-10-01 12:16:00 -07:00
Brian Behlendorf 4ce06f940e Tag 2.0.0-rc2
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2020-09-18 12:50:49 -07:00
George Wilson 5899ea5a77 vdev_ashift should only be set once
== Motivation and Context

The new vdev ashift optimization prevents the removal of devices when
a zfs configuration is comprised of disks which have different logical
and physical block sizes. This is caused because we set 'spa_min_ashift'
in vdev_open and then later call 'vdev_ashift_optimize'. This would
result in an inconsistency between spa's ashift calculations and that
of the top-level vdev.

In addition, the optimization logical ignores the overridden ashift
value that would be provided by '-o ashift=<val>'.

== Description

This change reworks the vdev ashift optimization so that it's only
set the first time the device is configured. It still allows the
physical and logical ahsift values to be set every time the device
is opened but those values are only consulted on first open.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Cedric Berger <cedric@precidata.com>
Signed-off-by: George Wilson <gwilson@delphix.com>
External-Issue: DLPX-71831
Closes #10932
2020-09-18 12:40:20 -07:00
Allan Jude 56e69c1e9c libzfs: Don't leak buf if nvlist is too large
Resolves FreeBSD Coverity defect:
CID 1432398:  Resource leaks  (RESOURCE_LEAK)

libzfs: don't leak hdl if there is an error reading env var

Resolves FreeBSD Coverity defect:
CID 1432395:  Resource leaks  (RESOURCE_LEAK)

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Allan Jude <allanjude@freebsd.org>
Closes #10882
2020-09-18 12:38:40 -07:00
George Wilson dacb4f6a61 pool may become suspended during device expansion
When expanding a device zfs needs to rescan the partition table to
get the correct size. This can only happen when we're in the kernel
and requires the device to be closed. As part of the rescan, udev is
notified and the device links are removed and recreated. This leave a
window where the vdev code may try to reopen the device before udev
has recreated the link. If that happens, then the pool may end up in
a suspended state.

To correct this, we leverage the BLKPG_RESIZE_PARTITION ioctl which
allows the partition information to be modified even while it's in use.
This ioctl also does not remove the device link associated with the zfs
data partition so it eliminates the race condition that can occur in
the kernel.

Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Wilson <gwilson@delphix.com>
Closes #10897
2020-09-18 12:38:30 -07:00
Matthew Ahrens 66ccc9b75f zdb leak detection fails with in-progress device removal
When a device removal is in progress, there are 2 locations for the data
that's already been moved: the original location, on the device that's
being removed; and the new location, which is pointed to by the indirect
mapping.  When doing leak detection, zdb needs to know about both
locations.  To determine what's already been copied, we load the
spacemaps of the removing vdev, omit the blocks that are yet to be
copied, and then use the vdev's remap op to find the new location.

The problem is with an optimization to the spacemap-loading code in zdb.
When processing the log spacemaps, we ignore entries that are not
relevant because they are past the point that's been copied.  However,
entries which span the point that's been copied (i.e. they are partly
relevant and partly irrelevant) are processed normally.  This can lead
to an illegal spacemap operation, for example if offsets up to 100KB
have been copied, and the spacemap log has the following entries:

	ALLOC 50KB-150KB (partly relevant)
	FREE 50KB-100KB (entirely relevant)
	FREE 100KB-150KB (entirely irrlevant - ignored)
	ALLOC 50KB-150KB (partly relevant)

Because the entirely irrelevant entry was ignored, its space remains in
the spacemap.  When the last entry is processed, we attempt to add it to
the spacemap, but it partially overlaps with the 100-150KB entry that
was left over.

This problem was discovered by ztest/zloop.

One solution would be to also ignore the irrelevant parts of
partially-irrelevant entries (i.e. when processing the ALLOC 50-150, to
only add 50-100 to the spacemap).  However, this commit implements a
simpler solution, which is to remove this optimization entirely.  I.e.
to process the entire spacemap log, without regard for the point that's
been copied.  After reconstructing the entire allocatable range tree,
there's already code to remove the parts that have not yet been copied.

Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
External-issue: DLPX-71820
Closes #10920
2020-09-18 12:38:24 -07:00
Ryan Moeller 7b86ad215e FreeBSD: Do not copy vp into f_data for DTYPE_VNODE files
https://reviews.freebsd.org/D26346

Do not copy vp into f_data for DTYPE_VNODE files.  The vnode pointer is
already stored in f_vnode.  Use that so f_data can be reused.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10929
2020-09-18 12:38:14 -07:00
John Poduska aa7817c151 Need a long hold in zpl_mount_impl
In zpl_mount_impl, there is:
    dmu_objset_hold	; returns with pool & ds held
    dsl_pool_rele

    sget

    dsl_dataset_rele

As spelled out in the "DSL Pool Configuration Lock" in dsl_pool.c,
this requires a long hold.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Paul Zuchowski <pzuchowski@datto.com>
Signed-off-by: John Poduska <jpoduska@datto.com>
Closes #10936
2020-09-18 12:38:09 -07:00
Toomas Soome 3902eaf9ed libzfsbootenv: lzbe_nvlist_set needs to store bootenv version VB_NVLIST
A small bug did slip into initial libzfsbootenv; while storing nvlist
in nvlist, we should make sure the bootenv is using VB_NVLIST format.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Toomas Soome <tsoome@me.com>
Closes #10937
2020-09-18 12:38:04 -07:00
Ryan Moeller 2cec08a1f0 Rename acltype=posixacl to acltype=posix
Prefer acltype=off|posix, retaining the old names as aliases.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10918
2020-09-18 12:38:00 -07:00
Georgy Yakovlev 0968d689a2 cmd/zgenhostid: replace with simple c implementation
It was discovered that dracut scripts and zgenhostid
always generate little-endian /etc/hostid.

This commit provides simple endianess-aware binary
and updates the scripts to use it.

New features include:
 -f flag to force overwrite.
 -o flag to write to different file (for dracut)
 accepting both 0x01234567 and 01234567 values as input

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Olaf Faaland <faaland1@llnl.gov>
Signed-off-by: Georgy Yakovlev <gyakovlev@gentoo.org>
Closes #10887
Closes #10925
2020-09-18 12:37:54 -07:00
Pavel Snajdr 1ce90aa441 Fix stack frame size: dnode_dirty_l1range()
Reviewed-by: Ryan Moeller <freqlabs@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Pavel Snajdr <snajpa@snajpa.net>
Closes #10879
2020-09-18 12:37:44 -07:00
Pavel Snajdr c9eab8257d dmu_redact_snap: fix possible memleak
Reviewed-by: Ryan Moeller <freqlabs@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Pavel Snajdr <snajpa@snajpa.net>
Closes #10879
2020-09-18 12:37:39 -07:00
Pavel Snajdr df39626fdd Fix stack frame size: dmu_redact_snap()
Reviewed-by: Ryan Moeller <freqlabs@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Pavel Snajdr <snajpa@snajpa.net>
Closes #10879
2020-09-18 12:37:34 -07:00
Pavel Snajdr 083ddb7714 Fix stack frame size: spa_livelist_delete_cb()
Reviewed-by: Ryan Moeller <freqlabs@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Pavel Snajdr <snajpa@snajpa.net>
Closes #10879
2020-09-18 12:37:29 -07:00
наб 7bc2d04398 zpoolprops.8: fix raidz par[i]ty typo
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #10923
2020-09-15 18:36:17 -07:00
Toomas Soome 84d9492e52 zfs label bootenv should store data as nvlist
nvlist does allow us to support different data types and systems.

To encapsulate user data to/from nvlist, the libzfsbootenv library is
provided.

Reviewed-by: Arvind Sankar <nivedita@alum.mit.edu>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Signed-off-by: Toomas Soome <tsoome@me.com>
Closes #10774
2020-09-15 18:36:12 -07:00
Ryan Moeller c8bbb0c93d Linux: Prevent destruction while showing mount devname
Use ZFS_ENTER and ZFS_EXIT to protect datasets while their mount
devname is being retrieved.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10892
Closes #10927
2020-09-15 18:36:03 -07:00
Harald van Dijk ae93e46716 config/zfs-build.m4: never define _initramfs in RPM_DEFINE_UTIL
The zfs-initramfs package has never worked as no RPM-based distribution
uses initramfs-tools, which is listed as a dependency of zfs-initramfs.

This would not ordinarily be a problem, as it is only enabled when
/usr/share/initramfs-tools is present, which should not normally be the
case on RPM-based distributions. However, other packages may install
unused files there even if initramfs-tools is not used, so remove this
auto-detection for the rpm-utils target.

This does not fully remove the logic for the zfs-initramfs package. This
splits it out into a separate rpm-utils-initramfs target so that the
Debian builds can still use it.

Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Harald van Dijk <harald@gigawatt.nl>
Closes #10898
2020-09-15 18:36:03 -07:00
Matthew Ahrens 645ca45a13 libzutil depends on libnvpair
libzutil depends on libnvpair, but this dependency is undeclared in the
build system.  Therefore it isn't possible to make a new command that
depends on libzutil, but does not (directly) depend on libnvpair.

This commit makes this dependency explicit.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reivewed-by: Ryan Moeller <freqlabs@FreeBSD.org>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #10915
2020-09-15 18:36:03 -07:00
Mateusz Guzik 29bc31f62f FreeBSD: convert teardown inactive lock to a read-mostly sleepable lock
The lock is taken all the time and as a regular read-write lock
avoidably serves as a mount point-wide contention point.

This forward ports FreeBSD revision r357322.

To quote aforementioned commit:

Sample result doing an incremental -j 40 build:
before: 173.30s user 458.97s system 2595% cpu 24.358 total
after:  168.58s user 254.92s system 2211% cpu 19.147 total

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Ryan Moeller <freqlabs@FreeBSD.org>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #10896
2020-09-09 10:26:05 -07:00
xdch47 17e2fd3bfd Force the use of '.' as decimal separator.
This solves issues occurring with a different decimal operator and
keeps the command line interface consistent for all locales .
E.g. `zfs set quota=0.5T`

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Felix Neumärker <xdch47@posteo.de>
Closes #10878
2020-09-09 10:26:04 -07:00
Olaf Faaland 55de40fe47 Initialize mmp_last_write when the mmp thread starts
A great deal of time may go by between when mmp_init() is called and
the MMP thread starts, particularly if there are bad devices, because
there is I/O checking configs etc.  If this time is too long,

    (gethrtime() - mmp_last_write) > mmp_fail_ns

at the time the MMP thread starts.  If MMP is configured to suspend
the pool, the pool will be suspended immediately.

This can be seen in issue #10838

The value of mmp_last_write doesn't matter before the mmp thread
starts.  To give the MMP thread time to issue and land MMP writes,
initialize mmp_last_write when the MMP thread starts.

Reviewed-by: Giuseppe Di Natale <guss80@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Closes #10873
2020-09-09 10:26:04 -07:00
Ryan Moeller 9cea5f0d69 FreeBSD: drop dependency on cryptodev module
We only need the kernel interfaces in crypto, not the device node in
cryptodev.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10901
2020-09-09 10:26:04 -07:00
George Amanakis 78d84f56d1 Introduce ZFS module parameter l2arc_mfuonly
In certain workloads it may be beneficial to reduce wear of L2ARC
devices by not caching MRU metadata and data into L2ARC. This commit
introduces a new tunable l2arc_mfuonly for this purpose.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #10710
2020-09-09 10:26:03 -07:00
Ryan Moeller 127daad223 Avoid possibility of division by zero
When hz > 1000, msec / (1000 / hz) results in division by zero.

I found somewhere in FreeBSD using howmany(msec * hz, 1000) to convert
ms to ticks, avoiding the potential for a zero in the divisor.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #10894
2020-09-09 10:26:03 -07:00
Toomas Soome b155a243a6 dnode_special_open() error: unchecked function return 'zrl_tryenter'
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Toomas Soome <tsoome@me.com>
Closes #10876
2020-09-09 10:26:03 -07:00
Peter Dave Hello ac71835706 Add a missing option prefix - in zfs-tests.sh usage()
Reviewed-by: Giuseppe Di Natale <guss80@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Peter Dave Hello <hsu@peterdavehello.org>
Closes #10893
2020-09-09 10:26:02 -07:00
Fabio Buso 3625a0131a Display pbkdf2iters property as plain number
The pbkdf2iters property is an iteration counter
and should be displayed as plain number rather
than in binary unit.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Fabio Buso <buso.fabio@gmail.com>
Closes #10871
2020-09-09 10:26:02 -07:00
alaviss 0b5a4c4d6b libshare: Add missing headers for nfs.c
On musl libc, zfs failed to compile due to the missing <fcntl.h>
include, which is required for `open()` per POSIX.

This commit add the missing <fcntl.h> include.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Hiếu Lê <leorize+oss@disroot.org>
Closes #10880
2020-09-09 10:26:02 -07:00
Matthew Macy ee73a8ff3d FreeBSD: reduce priority of ZIO_TASKQ_ISSUE writes by a larger value
On FreeBSD, if priorities divided by four (RQ_PPQ) are equal then
a difference between them is insignificant. In other words,
incrementing pri by only one as on Linux is insufficient.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #10872
2020-09-09 10:26:02 -07:00
Ryan Moeller c0234eab65 Spruce up pkg-config files for libzfs/libzfs_core
Several of the listed library dependencies are not relevant on FreeBSD.
Have ./configure save libraries that are found via pkg-config as
${LIB}_PC and use the configured automake variables instead of hard
coded names so we only get what was actually needed.

While here, update the URL to point at the OpenZFS Github repo.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10869
2020-09-09 10:26:01 -07:00
Ryan Moeller dd34e6cdd9 man: Cross-reference zfs-load-key(8) for ENCRYPTION mention
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Submitted-by: Harry Schmalzbauer
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10866
2020-09-09 10:26:01 -07:00
Ryan Moeller e9c1fa0cc1 man: Add zfs rename -r to zfs-rename(8) SYNOPSIS
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10866
2020-09-09 10:26:01 -07:00
Brian Behlendorf 18524b936d Sequential scrub and resilver updated comments
Commit d4a72f2 which introduced multi-phase scrubs and resilvers
continued the work presented by Nexenta at the 2016 ZFS developer
summit.  Update the source to reflect their contribution.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2020-09-09 10:26:00 -07:00
Don Brady 8afac5dc55 Avoid posting duplicate zpool events
Duplicate io and checksum ereport events can misrepresent that
things are worse than they seem. Ideally the zpool events and the
corresponding vdev stat error counts in a zpool status should be
for unique errors -- not the same error being counted over and over.
This can be demonstrated in a simple example. With a single bad
block in a datafile and just 5 reads of the file we end up with a
degraded vdev, even though there is only one unique error in the pool.

The proposed solution to the above issue, is to eliminate duplicates
when posting events and when updating vdev error stats. We now save
recent error events of interest when posting events so that we can
easily check for duplicates when posting an error.

Reviewed by: Brad Lewis <brad.lewis@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Don Brady <don.brady@delphix.com>
Closes #10861
2020-09-09 10:26:00 -07:00
Matthew Ahrens bd724261d2 nowait synctask must succeed
If a `zfs_space_check_t` other than `ZFS_SPACE_CHECK_NONE` is used with
`dsl_sync_task_nowait()`, the sync task may fail due to ENOSPC.
However, there is no way to notice or communicate this failure, so it's
extremely difficult to use this functionality correctly, and in fact
almost all callers use `ZFS_SPACE_CHECK_NONE`.

This commit removes the `zfs_space_check_t` argument from
`dsl_sync_task_nowait()`, and always uses `ZFS_SPACE_CHECK_NONE`.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #10855
2020-09-09 10:25:59 -07:00
Ryan Moeller a1e03186fd Retain thread name when resuming a zthr
When created, a zthr is given a name to identify it by.  This name is
lost when a cancelled zthr is resumed.

Retain the name of a zthr so it can be used when resuming.

Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10881
2020-09-09 10:21:16 -07:00
Alexander Richardson e28635396a Fixes for running FreeBSD buildworld on Linux/macOS hosts
Adding an #ifdef __FreeBSD__ to a FreeBSD-specific header may seem odd,
but these headers are used on non-FreeBSD systems during the bootstrap
tools phase.
Originally submitted downstream as https://reviews.freebsd.org/D26193

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alex Richardson <Alexander.Richardson@cl.cam.ac.uk>
Closes #10863
2020-09-09 10:21:11 -07:00
Matthew Macy 36f36610c3 Replace cv_{timed}wait_sig with cv_{timed}wait_idle where appropriate
There are a number of places where cv_?_sig is used simply for
accounting purposes but the surrounding code has no ability to
cope with actually receiving a signal. On FreeBSD it is possible
to send signals to individual kernel threads so this could
enable undesirable behavior.

This patch adds routines on Linux that will do the same idle
accounting as _sig without making the task interruptible. On
FreeBSD cv_*_idle  are all aliases for cv_*

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #10843
2020-09-09 10:21:01 -07:00
Spencer Kinny fd20a81b9a Links in Source Files
Added comments in following files
with links to Illumos manual pages:

./module/avl/avl.c
./module/nvpair/nvpair.c
./module/os/linux/spl/spl-kstat.c
./module/os/freebsd/spl/spl_kstat.c

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Spencer Kinny <spencerkinny1995@gmail.com>
Closes #5113
Closes #10859
2020-09-03 16:17:18 -07:00
Toomas Soome ef8a6fe9fe zvol: unsigned off can not be less than zero
Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Toomas Soome <tsoome@me.com>
Closes #10867
2020-09-03 16:16:52 -07:00
Alexander Richardson 7c36a9e24a Fix -Werror,-Wmacro-redefined in limits.h
Those macros are also defined by the compiler-provided float.h which
will be included later on (at least in the FreeBSD buildworld case) and
triggers these -Werror warnings. Including <float.h> first and only
defining the macros when DBL_DIG/FLT_DIG is missing fixes this problem.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alex Richardson <Alexander.Richardson@cl.cam.ac.uk>
Closes #10864
2020-09-03 16:16:42 -07:00
Ryan Moeller da81d91d48 Make spa_stats.c tunables visible on FreeBSD
Use ZFS_MODULE_PARAM for cross-platform tunables in spa_stats.c, and
add update tunables.cfg in tests for the newly supported ones.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10858
2020-09-03 16:16:34 -07:00
Matthew Macy ecd3976f5b FreeBSD: Fix up after spa_stats.c move
Moving spa_stats added the additional burden of supporting
KSTAT_TYPE_IO.

spa_state_addr will always return a valid value regardless of
the value of 'n'. On FreeBSD this will cause an infinite loop
as it relies on the raw ops addr routine to indicate that there
is no more data.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <freqlabs@FreeBSD.org>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #10860
2020-09-03 16:16:22 -07:00
Ryan Moeller 76a157f004 Add 'zfs rename -u' to rename without remounting
Allow to rename file systems without remounting if it is possible.
It is possible for file systems with 'mountpoint' property set to
'legacy' or 'none' - we don't have to change mount directory for them.
Currently such file systems are unmounted on rename and not even
mounted back.

This introduces layering violation, as we need to update
'f_mntfromname' field in statfs structure related to mountpoint (for
the dataset we are renaming and all its children).

In my opinion it is worth it, as it allow to update FreeBSD in even
cleaner way - in ZFS-only configuration root file system is ZFS file
system with 'mountpoint' property set to 'legacy'. If root dataset is
named system/rootfs, we can snapshot it (system/rootfs@upgrade), clone
it (system/oldrootfs), update FreeBSD and if it doesn't boot we can
boot back from system/oldrootfs and rename it back to system/rootfs
while it is mounted as /. Before it was not possible, because
unmounting / was not possible.

Authored by: Pawel Jakub Dawidek <pjd@FreeBSD.org>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Ported by: Matt Macy <mmacy@freebsd.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10839
2020-09-03 16:16:15 -07:00
Ryan Moeller 6512c18fe1 FreeBSD: Remove unused SECLABEL code
SECLABEL is undefined on FreeBSD and should be pruned.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #10847
2020-09-03 16:16:10 -07:00
Ryan Moeller f5ddb3b481 libspl: Provide platform-specific zone implementations
FreeBSD has the concept of jails, a precursor to Solaris's zones, which
can be mapped to the required zones interface with relative ease.  The
previous ZFS implementation in FreeBSD did so, and we should continue
to provide an appropriate implementation in OpenZFS as well.

Move lib/libspl/zone.c into platform code and adopt the correct
implementation for FreeBSD.

While here, prune unused code.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #10851
2020-09-03 16:16:04 -07:00
Ryan Moeller d6a779a278 FreeBSD: Simplify INGLOBALZONE
FreeBSD's previous ZFS implemented INGLOBALZONE(thread) as
(!jailed((thread)->td_ucred)) and passed curthread to INGLOBALZONE.

We pass curproc instead of curthread, so we can achieve the same effect
with (!jailed((proc)->p_ucred)).  The implementation is trivial enough
to fit on a single line in a define.  We don't really need a whole
separate function for something that's already macros all the way down.

Eliminate in_globalzone.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #10851
2020-09-03 16:15:59 -07:00
Ryan Moeller bbba0b7f93 FreeBSD: Define crgetzoneid appropriately
The previous ZFS implementation on FreeBSD had ifdefs to use jailed()
instead of crgetzoneid() in dsl_dir.c, however we can simply provide an
appropriate definition of crgetzoneid for the same effect.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #10851
2020-09-03 16:15:53 -07:00
Toomas Soome 8a06356e24 zio_ereport_post() and zio_ereport_start() return values are ignored
use (void) to silence analyzers.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Toomas Soome <tsoome@me.com>
Closes #10857
2020-09-03 16:15:47 -07:00
Spencer Kinny b73a8b1dc2 Typo Correction
Corrected the typo in zfs/cmd/zfs/zfs_main.c
line number 404 pbkfd2iters to pbkdf2iters

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Spencer Kinny <spencerkinny1995@gmail.com>
Closes #10850
2020-08-30 14:19:12 -07:00
Matthew Macy baed4fbacb Move spa_stats.c to common code
Initially it was considered simplest to stub out all
of the functions on FreeBSD. Now that FreeBSD supports
KSTAT_TYPE_RAW at least some of the functionality should
be made available.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #10842
2020-08-30 14:19:08 -07:00
Matthew Macy f4c8e9c69b FreeBSD: Fix spurious failure in zvol_geom_open
In zvol_geom_open on first open we need to guarantee
that the namespace lock is held to avoid spurious
failures in zvol_first_open.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <freqlabs@FreeBSD.org>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #10841
2020-08-30 14:19:03 -07:00
Matthew Macy 8639ca86da FreeBSD: add support for KSTAT_TYPE_RAW
A few kstats use KSTAT_TYPE_RAW to provide a string generated on
demand.  Implementing these as sysctls was punted until now.

Reviewed by: Toomas Soome <tsoome@me.com>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #10836
2020-08-30 14:18:54 -07:00
Brian Behlendorf c6ee83893e Linux 5.9 compat: NR_SLAB_RECLAIMABLE
Commit dcdc12e added compatibility code to treat NR_SLAB_RECLAIMABLE_B
as if it were the same as NR_SLAB_RECLAIMABLE.  However, the new value
is in bytes while the old value was in pages which means they are not
interchangeable.

The only place the reclaimable slab size is used is as a component of
the calculation done by arc_free_memory().  This function returns the
amount of memory the ARC considers to be free or reclaimable at little
cost.  Rather than switch to a new interface to get this value it has
been removed it from the calculation.  It is normally a minor component
compared to the number of inactive or free pages, and removing it
aligns the behavior with the FreeBSD version of arc_free_memory().

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Coleman Kane <ckane@colemankane.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #10834
2020-08-30 14:18:50 -07:00
Richard Laager 0fba4d138c Fix another dependency loop
zfs-load-key-DATASET.service was gaining an
After=systemd-journald.socket due to its stdout/stderr going to the
journal (which is the default).  systemd-journald.socket has an After
(via RequiresMountsFor=/run/systemd/journal) on -.mount.  If the root
filesystem is encrypted, -.mount gets an After
zfs-load-key-DATASET.service.

By setting stdout and stderr to null on the key load services, we avoid
this loop.

Reviewed-by: Antonio Russo <antonio.e.russo@gmail.com>
Reviewed-by: InsanePrawn <insane.prawny@gmail.com>
Signed-off-by: Richard Laager <rlaager@wiktel.com>
Closes #10356
Closes #10388
2020-08-30 14:18:45 -07:00
Richard Laager 6bf3f4dfe5 Fix a dependency loop
When generating units with zfs-mount-generator, if the pool is already
imported, zfs-import.target is not needed.  This avoids a dependency
loop on root-on-ZFS systems:
  systemd-random-seed.service After (via RequiresMountsFor)
  var-lib.mount After
  zfs-import.target After
  zfs-import-{cache,scan}.service After
  cryptsetup.service After
  systemd-random-seed.service

Reviewed-by: Antonio Russo <antonio.e.russo@gmail.com>
Reviewed-by: InsanePrawn <insane.prawny@gmail.com>
Signed-off-by: Richard Laager <rlaager@wiktel.com>
Closes #10388
2020-08-30 14:18:41 -07:00
Georgy Yakovlev fa0cd2d16f config/zfs-build.m4: add --with-vendor flag
This will allow an override of auto-detection of distribution, which
is based on checking presence of /etc/*-release files.

Build systems makes a lot of file location assumptions based on
detected distribution.

Some distributions (like gentoo) may prefer explicitly
setting --with-vendor=gentoo to avoid auto-detection.

Since auto-detection checks all files in order, current script may
misdetect even on gentoo system if /etc/redhat-release file is present

Default behavior is unchanged and default is --with-vendor=check

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Georgy Yakovlev <gyakovlev@gentoo.org>
Closes #10835
2020-08-30 14:18:36 -07:00
Alexander Richardson a00c61db44 Fix definition of BLKGETSIZE64 on FreeBSD
The matching ioctl is DIOCGMEDIASIZE.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <freqlabs@FreeBSD.org>
Signed-off-by: Alex Richardson <Alexander.Richardson@cl.cam.ac.uk>
Closes #10818
2020-08-27 16:09:51 -07:00
Georgy Yakovlev c2068750d7 module/zstd: pass -U__BMI__
If kernel is compiled with -march=znver1 or -march=znver2 zstd module
compilation will fail due to SSE register return with SSE disabled.
What's interesting, is that -march=skylake also implies -mbmi which
defines __BMI__ but compilation succeeds.  It is probably due to
different BMI implementations on AMD and INTEL processors and the
way compiler uses instructions.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Georgy Yakovlev <gyakovlev@gentoo.org>
Closes #10758
Closes #10829
2020-08-27 16:07:13 -07:00
John-Mark Gurney af424d8a1a Add the Xr's to the SEE ALSO as well
There are a ton of zfs-* and zpool-* man pages. This adds them to
the SEE ALSO section so that people can more quickly look through
what all the options are, now that the pages have been split.

Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: John-Mark Gurney <jmg@funkthat.com>
Closes #10589
2020-08-27 16:07:10 -07:00
Patrick Mooney 1ac6248312 dnode_sync is careless with range tree
Because dnode_sync_free_range() must drop dn_mtx during its processing,
using it as a callback to range_tree_vacate() is not safe.  No other
operations (besides destroy) are allowed once range_tree_vacate() has
begun, and dropping dn_mtx would leave a window open for another thread
to observe that invalid (and unsafe) state via dnode_block_freed().

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Signed-off-by: Patrick Mooney <pmooney@oxide.computer>
Closes #10708
Closes #10823
2020-08-27 16:07:05 -07:00
Cédric Berger 77b01f53e7 Fix NEWS file
Points to https://github.com/openzfs/zfs/releases

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Cédric Berger <cedric@precidata.com>
Closes #10824
2020-08-27 16:07:01 -07:00
Ryan Moeller 57fc3987a0 zpool: Change base URL for ZFS messages to openzfs-docs
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10820
2020-08-27 16:06:57 -07:00
Brian Behlendorf 4f6167deb5 Remove duplicate dnode.h include
The zfs/sa.c source file accidentally includes sys/dnode.h twice.
Remove the second occurrence.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #10816
Closes #10819
2020-08-27 16:06:52 -07:00
Paul Dagnelie 79d6a1b1da Always track temporary fses and snapshots for accounting
The root cause of the issue is that we only occasionally do as the
comments in the code suggest and actually ignore the %recv dataset when
it comes to filesystem limit tracking. Specifically, the only time we
ignore it is when initializing the filesystem and snapshot limit values;
when creating a new %recv dataset or deleting one, we always update
the bookkeeping. This causes a problem if you init the fs count on a
filesystem that already has a %recv dataset, since the bookmarking
will be decremented but not incremented. This is resolved in this
patch by simply always tracking the %recv dataset as a child.

Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed by: Jerry Jelinek <jerry.jelinek@joyent.com>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #10791
2020-08-27 16:06:47 -07:00
Toomas Soome 510179f086 Remove pragma ident lines
The #pragma ident is a historical relic and not needed any more, this
pragma is actually unknown for common compilers and is only causing
trouble.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Signed-off-by: Toomas Soome <tsoome@me.com>
Closes #10810
2020-08-27 16:06:39 -07:00
Matthew Macy cb16a5e043 FreeBSD: disable neon usage
The neon support code does not build on FreeBSD,
ifdef out references to fix linker issues on arm64.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #10809
2020-08-27 16:06:35 -07:00
Alexander Motin 3ca31bd0c6 Introduce limit on size of L2ARC headers
Since L2ARC buffers are not evicted on memory pressure, too large
amount of headers on system with irrationally large L2ARC can render
it slow or even unusable.  This change limits L2ARC writes and
rebuild if unevictable L2ARC-only headers reach dangerous level.

While there, call arc_adapt() on L2ARC rebuild, so that it could
properly grow arc_c, reflecting potentially significant ARC size
increase and avoiding slow growth with hopeless eviction attempts
later when "overflow" is detected.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reported-by: Richard Elling <Richard.Elling@RichardElling.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Closes #10765
2020-08-27 16:06:28 -07:00
1012 changed files with 38305 additions and 53050 deletions
-3
View File
@@ -1,8 +1,5 @@
blank_issues_enabled: false
contact_links:
- name: OpenZFS Questions
url: https://github.com/openzfs/zfs/discussions/new
about: Ask the community for help
- name: OpenZFS Community Support Mailing list (Linux)
url: https://zfsonlinux.topicbox.com/groups/zfs-discuss
about: Get community support for OpenZFS on Linux
+37
View File
@@ -0,0 +1,37 @@
---
name: Code Question
about: Ask a question about the code
title: ''
labels: 'Type: Question'
assignees: ''
---
<!--
Thank you for taking an interest in the OpenZFS codebase.
Please be aware that most questions are preferably asked in the mailing list first.
This form is primarily meant for asking questions about the code itself.
Please also check our issue tracker before opening a new question.
Filling out the following template will help other contributors better understand your question.
-->
### Ask your question!
<!--
Please provide a clear and concise question.
-->
### Which portion of the codebase does your question involve?
<!--
Optional: Please describe what portion of the codebase your issue involved.
Example: "Testsuite", "Buildbots", "CLI", a code snippet etc.
-->
### Additional context
<!--
Any additional information you want to add?
-->
+4 -5
View File
@@ -28,15 +28,14 @@ https://openzfs.github.io/openzfs-docs/Developer%20Resources/Buildbot%20Options.
- [ ] Performance enhancement (non-breaking change which improves efficiency)
- [ ] Code cleanup (non-breaking change which makes code smaller or more readable)
- [ ] Breaking change (fix or feature that would cause existing functionality to change)
- [ ] Library ABI change (libzfs, libzfs\_core, libnvpair, libuutil and libzfsbootenv)
- [ ] Documentation (a change to man pages or other documentation)
### Checklist:
<!--- Go over all the following points, and put an `x` in all the boxes that apply. -->
<!--- If you're unsure about any of these, don't hesitate to ask. We're here to help! -->
- [ ] My code follows the OpenZFS [code style requirements](https://github.com/openzfs/zfs/blob/master/.github/CONTRIBUTING.md#coding-conventions).
- [ ] My code follows the ZFS on Linux [code style requirements](https://github.com/zfsonlinux/zfs/blob/master/.github/CONTRIBUTING.md#coding-conventions).
- [ ] I have updated the documentation accordingly.
- [ ] I have read the [**contributing** document](https://github.com/openzfs/zfs/blob/master/.github/CONTRIBUTING.md).
- [ ] I have added [tests](https://github.com/openzfs/zfs/tree/master/tests) to cover my changes.
- [ ] I have read the [**contributing** document](https://github.com/zfsonlinux/zfs/blob/master/.github/CONTRIBUTING.md).
- [ ] I have added [tests](https://github.com/zfsonlinux/zfs/tree/master/tests) to cover my changes.
- [ ] I have run the ZFS Test Suite with this change applied.
- [ ] All commit messages are properly formatted and contain [`Signed-off-by`](https://github.com/openzfs/zfs/blob/master/.github/CONTRIBUTING.md#signed-off-by).
- [ ] All commit messages are properly formatted and contain [`Signed-off-by`](https://github.com/zfsonlinux/zfs/blob/master/.github/CONTRIBUTING.md#signed-off-by).
-13
View File
@@ -1,13 +0,0 @@
# Configuration for probot-no-response - https://github.com/probot/no-response
# Number of days of inactivity before an Issue is closed for lack of response
daysUntilClose: 31
# Label requiring a response
responseRequiredLabel: "Status: Feedback requested"
# Comment to post when closing an Issue for lack of response. Set to `false` to disable
closeComment: >
This issue has been automatically closed because there has been no response
to our request for more information from the original author. With only the
information that is currently in the issue, we don't have enough information
to take action. Please reach out if you have or find the answers we need so
that we can investigate further.
+1 -10
View File
@@ -7,14 +7,7 @@ only: issues
# Issues with these labels will never be considered stale
exemptLabels:
- "Type: Feature"
- "Bot: Not Stale"
- "Status: Work in Progress"
# Set to true to ignore issues in a project (defaults to false)
exemptProjects: true
# Set to true to ignore issues in a milestone (defaults to false)
exemptMilestones: true
# Set to true to ignore issues with an assignee (defaults to false)
exemptAssignees: true
- "Type: Understood"
# Label to use when marking an issue as stale
staleLabel: "Status: Stale"
# Comment to post when marking an issue as stale. Set to `false` to disable
@@ -22,5 +15,3 @@ markComment: >
This issue has been automatically marked as "stale" because it has not had
any activity for a while. It will be closed in 90 days if no further activity occurs.
Thank you for your contributions.
# Limit the number of actions per hour, from 1-30. Default is 30
limitPerRun: 6
+1 -1
View File
@@ -6,7 +6,7 @@ on:
jobs:
checkstyle:
runs-on: ubuntu-18.04
runs-on: ubuntu-20.04
steps:
- uses: actions/checkout@v2
with:
+14 -3
View File
@@ -26,7 +26,7 @@ jobs:
xfslibs-dev libattr1-dev libacl1-dev libudev-dev libdevmapper-dev \
libssl-dev libffi-dev libaio-dev libelf-dev libmount-dev \
libpam0g-dev pamtester python-dev python-setuptools python-cffi \
python3 python3-dev python3-setuptools python3-cffi
python3 python3-dev python3-setuptools python3-cffi python3-packaging
- name: Autogen.sh
run: |
sh autogen.sh
@@ -44,6 +44,17 @@ jobs:
sudo sed -i.bak 's/updates/extra updates/' /etc/depmod.d/ubuntu.conf
sudo depmod
sudo modprobe zfs
# Workaround for cloud-init bug
# see https://github.com/openzfs/zfs/issues/12644
FILE=/lib/udev/rules.d/10-cloud-init-hook-hotplug.rules
if [ -r "${FILE}" ]; then
HASH=$(md5sum "${FILE}" | awk '{ print $1 }')
if [ "${HASH}" = "121ff0ef1936cd2ef65aec0458a35772" ]; then
# Just shove a zd* exclusion right above the hotplug hook...
sudo sed -i -e s/'LABEL="cloudinit_hook"'/'KERNEL=="zd*", GOTO="cloudinit_end"\n&'/ "${FILE}"
sudo udevadm control --reload-rules
fi
fi
# Workaround to provide additional free space for testing.
# https://github.com/actions/virtual-environments/issues/2840
sudo rm -rf /usr/share/dotnet
@@ -52,7 +63,7 @@ jobs:
sudo rm -rf "$AGENT_TOOLSDIRECTORY"
- name: Tests
run: |
/usr/share/zfs/zfs-tests.sh -v -s 3G
/usr/share/zfs/zfs-tests.sh -vR -s 3G
- name: Prepare artifacts
if: failure()
run: |
@@ -61,7 +72,7 @@ jobs:
sudo cp /var/log/syslog $RESULTS_PATH/
sudo chmod +r $RESULTS_PATH/*
# Replace ':' in dir names, actions/upload-artifact doesn't support it
for f in $(find $RESULTS_PATH -name '*:*'); do mv "$f" "${f//:/__}"; done
for f in $(find /var/tmp/test_results -name '*:*'); do mv "$f" "${f//:/__}"; done
- uses: actions/upload-artifact@v2
if: failure()
with:
+14 -3
View File
@@ -22,7 +22,7 @@ jobs:
xfslibs-dev libattr1-dev libacl1-dev libudev-dev libdevmapper-dev \
libssl-dev libffi-dev libaio-dev libelf-dev libmount-dev \
libpam0g-dev pamtester python-dev python-setuptools python-cffi \
python3 python3-dev python3-setuptools python3-cffi
python3 python3-dev python3-setuptools python3-cffi python3-packaging
- name: Autogen.sh
run: |
sh autogen.sh
@@ -40,6 +40,17 @@ jobs:
sudo sed -i.bak 's/updates/extra updates/' /etc/depmod.d/ubuntu.conf
sudo depmod
sudo modprobe zfs
# Workaround for cloud-init bug
# see https://github.com/openzfs/zfs/issues/12644
FILE=/lib/udev/rules.d/10-cloud-init-hook-hotplug.rules
if [ -r "${FILE}" ]; then
HASH=$(md5sum "${FILE}" | awk '{ print $1 }')
if [ "${HASH}" = "121ff0ef1936cd2ef65aec0458a35772" ]; then
# Just shove a zd* exclusion right above the hotplug hook...
sudo sed -i -e s/'LABEL="cloudinit_hook"'/'KERNEL=="zd*", GOTO="cloudinit_end"\n&'/ "${FILE}"
sudo udevadm control --reload-rules
fi
fi
# Workaround to provide additional free space for testing.
# https://github.com/actions/virtual-environments/issues/2840
sudo rm -rf /usr/share/dotnet
@@ -48,7 +59,7 @@ jobs:
sudo rm -rf "$AGENT_TOOLSDIRECTORY"
- name: Tests
run: |
/usr/share/zfs/zfs-tests.sh -v -s 3G -r sanity
/usr/share/zfs/zfs-tests.sh -vR -s 3G -r sanity
- name: Prepare artifacts
if: failure()
run: |
@@ -57,7 +68,7 @@ jobs:
sudo cp /var/log/syslog $RESULTS_PATH/
sudo chmod +r $RESULTS_PATH/*
# Replace ':' in dir names, actions/upload-artifact doesn't support it
for f in $(find $RESULTS_PATH -name '*:*'); do mv "$f" "${f//:/__}"; done
for f in $(find /var/tmp/test_results -name '*:*'); do mv "$f" "${f//:/__}"; done
- uses: actions/upload-artifact@v2
if: failure()
with:
+3 -3
View File
@@ -22,8 +22,8 @@ jobs:
xfslibs-dev libattr1-dev libacl1-dev libudev-dev libdevmapper-dev \
libssl-dev libffi-dev libaio-dev libelf-dev libmount-dev \
libpam0g-dev \
python-dev python-setuptools python-cffi \
python3 python3-dev python3-setuptools python3-cffi
python-dev python-setuptools python-cffi python-packaging \
python3 python3-dev python3-setuptools python3-cffi python3-packaging
- name: Autogen.sh
run: |
sh autogen.sh
@@ -45,7 +45,7 @@ jobs:
run: |
sudo mkdir -p $TEST_DIR
# run for 20 minutes to have a total runner time of 30 minutes
sudo /usr/share/zfs/zloop.sh -t 1200 -l -m1 -- -T 120 -P 60
sudo /usr/share/zfs/zloop.sh -t 1200 -l -m1
- name: Prepare artifacts
if: failure()
run: |
+1 -1
View File
@@ -1,3 +1,3 @@
[submodule "scripts/zfs-images"]
path = scripts/zfs-images
url = https://github.com/openzfs/zfs-images
url = https://github.com/zfsonlinux/zfs-images
+2 -2
View File
@@ -1,10 +1,10 @@
Meta: 1
Name: zfs
Branch: 1.0
Version: 2.1.1
Version: 2.0.7
Release: 1
Release-Tags: relext
License: CDDL
Author: OpenZFS
Linux-Maximum: 5.14
Linux-Maximum: 5.15
Linux-Minimum: 3.10
+48 -20
View File
@@ -1,5 +1,3 @@
include $(top_srcdir)/config/Shellcheck.am
ACLOCAL_AMFLAGS = -I config
SUBDIRS = include
@@ -8,7 +6,7 @@ SUBDIRS += rpm
endif
if CONFIG_USER
SUBDIRS += man scripts lib tests cmd etc contrib
SUBDIRS += etc man scripts lib tests cmd contrib
if BUILD_LINUX
SUBDIRS += udev
endif
@@ -28,8 +26,8 @@ endif
AUTOMAKE_OPTIONS = foreign
EXTRA_DIST = autogen.sh copy-builtin
EXTRA_DIST += config/config.awk config/rpm.am config/deb.am config/tgz.am
EXTRA_DIST += AUTHORS CODE_OF_CONDUCT.md COPYRIGHT LICENSE META NEWS NOTICE
EXTRA_DIST += README.md RELEASES.md
EXTRA_DIST += META AUTHORS COPYRIGHT LICENSE NEWS NOTICE README.md
EXTRA_DIST += CODE_OF_CONDUCT.md
EXTRA_DIST += module/lua/README.zfs module/os/linux/spl/README.md
# Include all the extra licensing information for modules
@@ -125,29 +123,59 @@ cstyle:
filter_executable = -exec test -x '{}' \; -print
SHELLCHECKDIRS = cmd contrib etc scripts tests
SHELLCHECKSCRIPTS = autogen.sh
PHONY += shellcheck
shellcheck:
@if type shellcheck > /dev/null 2>&1; then \
shellcheck --exclude=SC1090 --exclude=SC1117 --format=gcc \
$$(find ${top_srcdir}/scripts/*.sh -type f) \
$$(find ${top_srcdir}/cmd/zed/zed.d/*.sh -type f) \
$$(find ${top_srcdir}/cmd/zpool/zpool.d/* \
-type f ${filter_executable}); \
else \
echo "skipping shellcheck because shellcheck is not installed"; \
fi
PHONY += checkabi storeabi
checklibabiversion:
libabiversion=`abidw -v | $(SED) 's/[^0-9]//g'`; \
if test $$libabiversion -lt "180"; then \
/bin/echo -e "\n" \
"*** Please use libabigail 1.8.0 version or newer;\n" \
"*** otherwise results are not consistent!\n"; \
exit 1; \
fi;
checkabi: checklibabiversion lib
checkabi: lib
$(MAKE) -C lib checkabi
storeabi: checklibabiversion lib
storeabi: lib
$(MAKE) -C lib storeabi
PHONY += checkbashisms
checkbashisms:
@if type checkbashisms > /dev/null 2>&1; then \
checkbashisms -n -p -x \
$$(find ${top_srcdir} \
-name '.git' -prune \
-o -name 'build' -prune \
-o -name 'tests' -prune \
-o -name 'config' -prune \
-o -name 'zed-functions.sh*' -prune \
-o -name 'zfs-import*' -prune \
-o -name 'zfs-mount*' -prune \
-o -name 'zfs-zed*' -prune \
-o -name 'smart' -prune \
-o -name 'paxcheck.sh' -prune \
-o -name 'make_gitrev.sh' -prune \
-o -name '90zfs' -prune \
-o -type f ! -name 'config*' \
! -name 'libtool' \
-exec sh -c 'awk "NR==1 && /#!.*bin\/sh.*/ {print FILENAME;}" "{}"' \;); \
else \
echo "skipping checkbashisms because checkbashisms is not installed"; \
fi
PHONY += mancheck
mancheck:
${top_srcdir}/scripts/mancheck.sh ${top_srcdir}/man ${top_srcdir}/tests/test-runner/man
@if type mandoc > /dev/null 2>&1; then \
find ${top_srcdir}/man/man8 -type f -name 'zfs.8' \
-o -name 'zpool.8' -o -name 'zdb.8' \
-o -name 'zgenhostid.8' | \
xargs mandoc -Tlint -Werror; \
else \
echo "skipping mancheck because mandoc is not installed"; \
fi
if BUILD_LINUX
stat_fmt = -c '%A %n'
+1 -1
View File
@@ -32,4 +32,4 @@ For more details see the NOTICE, LICENSE and COPYRIGHT files; `UCRL-CODE-235197`
# Supported Kernels
* The `META` file contains the officially recognized supported Linux kernel versions.
* Supported FreeBSD versions are any supported branches and releases starting from 12.2-RELEASE.
* Supported FreeBSD versions are 12-STABLE and 13-CURRENT.
-37
View File
@@ -1,37 +0,0 @@
OpenZFS uses the MAJOR.MINOR.PATCH versioning scheme described here:
* MAJOR - Incremented at the discretion of the OpenZFS developers to indicate
a particularly noteworthy feature or change. An increase in MAJOR number
does not indicate any incompatible on-disk format change. The ability
to import a ZFS pool is controlled by the feature flags enabled on the
pool and the feature flags supported by the installed OpenZFS version.
Increasing the MAJOR version is expected to be an infrequent occurrence.
* MINOR - Incremented to indicate new functionality such as a new feature
flag, pool/dataset property, zfs/zpool sub-command, new user/kernel
interface, etc. MINOR releases may introduce incompatible changes to the
user space library APIs (libzfs.so). Existing user/kernel interfaces are
considered to be stable to maximize compatibility between OpenZFS releases.
Additions to the user/kernel interface are backwards compatible.
* PATCH - Incremented when applying documentation updates, important bug
fixes, minor performance improvements, and kernel compatibility patches.
The user space library APIs and user/kernel interface are considered to
be stable. PATCH releases for a MAJOR.MINOR are published as needed.
Two release branches are maintained for OpenZFS, they are:
* OpenZFS LTS - A designated MAJOR.MINOR release with periodic PATCH
releases that incorporate important changes backported from newer OpenZFS
releases. This branch is intended for use in environments using an
LTS, enterprise, or similarly managed kernel (RHEL, Ubuntu LTS, Debian).
Minor changes to support these distribution kernels will be applied as
needed. New kernel versions released after the OpenZFS LTS release are
not supported. LTS releases will receive patches for at least 2 years.
The current LTS release is OpenZFS 2.1.
* OpenZFS current - Tracks the newest MAJOR.MINOR release. This branch
includes support for the latest OpenZFS features and recently releases
kernels. When a new MINOR release is tagged the previous MINOR release
will no longer be maintained (unless it is an LTS release). New MINOR
releases are planned to occur roughly annually.
+2 -9
View File
@@ -1,14 +1,8 @@
include $(top_srcdir)/config/Shellcheck.am
SUBDIRS = zfs zpool zdb zhack zinject zstream ztest
SUBDIRS = zfs zpool zdb zhack zinject zstream zstreamdump ztest
SUBDIRS += fsck_zfs vdev_id raidz_test zfs_ids_to_path
SUBDIRS += zpool_influxdb
CPPCHECKDIRS = zfs zpool zdb zhack zinject zstream ztest
CPPCHECKDIRS += raidz_test zfs_ids_to_path zpool_influxdb
# TODO: #12084: SHELLCHECKDIRS = fsck_zfs vdev_id zpool
SHELLCHECKDIRS = fsck_zfs zpool
CPPCHECKDIRS += raidz_test zfs_ids_to_path
if USING_PYTHON
SUBDIRS += arcstat arc_summary dbufstat
@@ -17,7 +11,6 @@ endif
if BUILD_LINUX
SUBDIRS += mount_zfs zed zgenhostid zvol_id zvol_wait
CPPCHECKDIRS += mount_zfs zed zgenhostid zvol_id
SHELLCHECKDIRS += zed
endif
PHONY = cppcheck
+58 -139
View File
@@ -102,6 +102,18 @@ show_tunable_descriptions = False
alternate_tunable_layout = False
def handle_Exception(ex_cls, ex, tb):
if ex is IOError:
if ex.errno == errno.EPIPE:
sys.exit()
if ex is KeyboardInterrupt:
sys.exit()
sys.excepthook = handle_Exception
def get_Kstat():
"""Collect information on the ZFS subsystem from the /proc virtual
file system. The name "kstat" is a holdover from the Solaris utility
@@ -213,30 +225,12 @@ def get_arc_summary(Kstat):
deleted = Kstat["kstat.zfs.misc.arcstats.deleted"]
mutex_miss = Kstat["kstat.zfs.misc.arcstats.mutex_miss"]
evict_skip = Kstat["kstat.zfs.misc.arcstats.evict_skip"]
evict_l2_cached = Kstat["kstat.zfs.misc.arcstats.evict_l2_cached"]
evict_l2_eligible = Kstat["kstat.zfs.misc.arcstats.evict_l2_eligible"]
evict_l2_eligible_mfu = Kstat["kstat.zfs.misc.arcstats.evict_l2_eligible_mfu"]
evict_l2_eligible_mru = Kstat["kstat.zfs.misc.arcstats.evict_l2_eligible_mru"]
evict_l2_ineligible = Kstat["kstat.zfs.misc.arcstats.evict_l2_ineligible"]
evict_l2_skip = Kstat["kstat.zfs.misc.arcstats.evict_l2_skip"]
# ARC Misc.
output["arc_misc"] = {}
output["arc_misc"]["deleted"] = fHits(deleted)
output["arc_misc"]["mutex_miss"] = fHits(mutex_miss)
output["arc_misc"]["evict_skips"] = fHits(evict_skip)
output["arc_misc"]["evict_l2_skip"] = fHits(evict_l2_skip)
output["arc_misc"]["evict_l2_cached"] = fBytes(evict_l2_cached)
output["arc_misc"]["evict_l2_eligible"] = fBytes(evict_l2_eligible)
output["arc_misc"]["evict_l2_eligible_mfu"] = {
'per': fPerc(evict_l2_eligible_mfu, evict_l2_eligible),
'num': fBytes(evict_l2_eligible_mfu),
}
output["arc_misc"]["evict_l2_eligible_mru"] = {
'per': fPerc(evict_l2_eligible_mru, evict_l2_eligible),
'num': fBytes(evict_l2_eligible_mru),
}
output["arc_misc"]["evict_l2_ineligible"] = fBytes(evict_l2_ineligible)
output["arc_misc"]['mutex_miss'] = fHits(mutex_miss)
output["arc_misc"]['evict_skips'] = fHits(evict_skip)
# ARC Sizing
arc_size = Kstat["kstat.zfs.misc.arcstats.size"]
@@ -352,26 +346,8 @@ def _arc_summary(Kstat):
sys.stdout.write("\tDeleted:\t\t\t\t%s\n" % arc['arc_misc']['deleted'])
sys.stdout.write("\tMutex Misses:\t\t\t\t%s\n" %
arc['arc_misc']['mutex_miss'])
sys.stdout.write("\tEviction Skips:\t\t\t\t%s\n" %
sys.stdout.write("\tEvict Skips:\t\t\t\t%s\n" %
arc['arc_misc']['evict_skips'])
sys.stdout.write("\tEviction Skips Due to L2 Writes:\t%s\n" %
arc['arc_misc']['evict_l2_skip'])
sys.stdout.write("\tL2 Cached Evictions:\t\t\t%s\n" %
arc['arc_misc']['evict_l2_cached'])
sys.stdout.write("\tL2 Eligible Evictions:\t\t\t%s\n" %
arc['arc_misc']['evict_l2_eligible'])
sys.stdout.write("\tL2 Eligible MFU Evictions:\t%s\t%s\n" % (
arc['arc_misc']['evict_l2_eligible_mfu']['per'],
arc['arc_misc']['evict_l2_eligible_mfu']['num'],
)
)
sys.stdout.write("\tL2 Eligible MRU Evictions:\t%s\t%s\n" % (
arc['arc_misc']['evict_l2_eligible_mru']['per'],
arc['arc_misc']['evict_l2_eligible_mru']['num'],
)
)
sys.stdout.write("\tL2 Ineligible Evictions:\t\t%s\n" %
arc['arc_misc']['evict_l2_ineligible'])
sys.stdout.write("\n")
# ARC Sizing
@@ -707,11 +683,6 @@ def get_l2arc_summary(Kstat):
l2_writes_done = Kstat["kstat.zfs.misc.arcstats.l2_writes_done"]
l2_writes_error = Kstat["kstat.zfs.misc.arcstats.l2_writes_error"]
l2_writes_sent = Kstat["kstat.zfs.misc.arcstats.l2_writes_sent"]
l2_mfu_asize = Kstat["kstat.zfs.misc.arcstats.l2_mfu_asize"]
l2_mru_asize = Kstat["kstat.zfs.misc.arcstats.l2_mru_asize"]
l2_prefetch_asize = Kstat["kstat.zfs.misc.arcstats.l2_prefetch_asize"]
l2_bufc_data_asize = Kstat["kstat.zfs.misc.arcstats.l2_bufc_data_asize"]
l2_bufc_metadata_asize = Kstat["kstat.zfs.misc.arcstats.l2_bufc_metadata_asize"]
l2_access_total = (l2_hits + l2_misses)
output['l2_health_count'] = (l2_writes_error + l2_cksum_bad + l2_io_error)
@@ -734,7 +705,7 @@ def get_l2arc_summary(Kstat):
output["io_errors"] = fHits(l2_io_error)
output["l2_arc_size"] = {}
output["l2_arc_size"]["adaptive"] = fBytes(l2_size)
output["l2_arc_size"]["adative"] = fBytes(l2_size)
output["l2_arc_size"]["actual"] = {
'per': fPerc(l2_asize, l2_size),
'num': fBytes(l2_asize)
@@ -743,26 +714,6 @@ def get_l2arc_summary(Kstat):
'per': fPerc(l2_hdr_size, l2_size),
'num': fBytes(l2_hdr_size),
}
output["l2_arc_size"]["mfu_asize"] = {
'per': fPerc(l2_mfu_asize, l2_asize),
'num': fBytes(l2_mfu_asize),
}
output["l2_arc_size"]["mru_asize"] = {
'per': fPerc(l2_mru_asize, l2_asize),
'num': fBytes(l2_mru_asize),
}
output["l2_arc_size"]["prefetch_asize"] = {
'per': fPerc(l2_prefetch_asize, l2_asize),
'num': fBytes(l2_prefetch_asize),
}
output["l2_arc_size"]["bufc_data_asize"] = {
'per': fPerc(l2_bufc_data_asize, l2_asize),
'num': fBytes(l2_bufc_data_asize),
}
output["l2_arc_size"]["bufc_metadata_asize"] = {
'per': fPerc(l2_bufc_metadata_asize, l2_asize),
'num': fBytes(l2_bufc_metadata_asize),
}
output["l2_arc_evicts"] = {}
output["l2_arc_evicts"]['lock_retries'] = fHits(l2_evict_lock_retry)
@@ -827,7 +778,7 @@ def _l2arc_summary(Kstat):
sys.stdout.write("\n")
sys.stdout.write("L2 ARC Size: (Adaptive)\t\t\t\t%s\n" %
arc["l2_arc_size"]["adaptive"])
arc["l2_arc_size"]["adative"])
sys.stdout.write("\tCompressed:\t\t\t%s\t%s\n" % (
arc["l2_arc_size"]["actual"]["per"],
arc["l2_arc_size"]["actual"]["num"],
@@ -838,36 +789,11 @@ def _l2arc_summary(Kstat):
arc["l2_arc_size"]["head_size"]["num"],
)
)
sys.stdout.write("\tMFU Alloc. Size:\t\t%s\t%s\n" % (
arc["l2_arc_size"]["mfu_asize"]["per"],
arc["l2_arc_size"]["mfu_asize"]["num"],
)
)
sys.stdout.write("\tMRU Alloc. Size:\t\t%s\t%s\n" % (
arc["l2_arc_size"]["mru_asize"]["per"],
arc["l2_arc_size"]["mru_asize"]["num"],
)
)
sys.stdout.write("\tPrefetch Alloc. Size:\t\t%s\t%s\n" % (
arc["l2_arc_size"]["prefetch_asize"]["per"],
arc["l2_arc_size"]["prefetch_asize"]["num"],
)
)
sys.stdout.write("\tData (buf content) Alloc. Size:\t%s\t%s\n" % (
arc["l2_arc_size"]["bufc_data_asize"]["per"],
arc["l2_arc_size"]["bufc_data_asize"]["num"],
)
)
sys.stdout.write("\tMetadata (buf content) Size:\t%s\t%s\n" % (
arc["l2_arc_size"]["bufc_metadata_asize"]["per"],
arc["l2_arc_size"]["bufc_metadata_asize"]["num"],
)
)
sys.stdout.write("\n")
if arc["l2_arc_evicts"]['lock_retries'] != '0' or \
arc["l2_arc_evicts"]["reading"] != '0':
sys.stdout.write("L2 ARC Evictions:\n")
sys.stdout.write("L2 ARC Evicts:\n")
sys.stdout.write("\tLock Retries:\t\t\t\t%s\n" %
arc["l2_arc_evicts"]['lock_retries'])
sys.stdout.write("\tUpon Reading:\t\t\t\t%s\n" %
@@ -1125,55 +1051,48 @@ def main():
global alternate_tunable_layout
try:
try:
opts, args = getopt.getopt(
sys.argv[1:],
"adp:h", ["alternate", "description", "page=", "help"]
)
except getopt.error as e:
sys.stderr.write("Error: %s\n" % e.msg)
opts, args = getopt.getopt(
sys.argv[1:],
"adp:h", ["alternate", "description", "page=", "help"]
)
except getopt.error as e:
sys.stderr.write("Error: %s\n" % e.msg)
usage()
sys.exit(1)
args = {}
for opt, arg in opts:
if opt in ('-a', '--alternate'):
args['a'] = True
if opt in ('-d', '--description'):
args['d'] = True
if opt in ('-p', '--page'):
args['p'] = arg
if opt in ('-h', '--help'):
usage()
sys.exit(1)
args = {}
for opt, arg in opts:
if opt in ('-a', '--alternate'):
args['a'] = True
if opt in ('-d', '--description'):
args['d'] = True
if opt in ('-p', '--page'):
args['p'] = arg
if opt in ('-h', '--help'):
usage()
sys.exit(0)
Kstat = get_Kstat()
alternate_tunable_layout = 'a' in args
show_tunable_descriptions = 'd' in args
pages = []
if 'p' in args:
try:
pages.append(unSub[int(args['p']) - 1])
except IndexError:
sys.stderr.write('the argument to -p must be between 1 and ' +
str(len(unSub)) + '\n')
sys.exit(1)
else:
pages = unSub
zfs_header()
for page in pages:
page(Kstat)
sys.stdout.write("\n")
except IOError as ex:
if (ex.errno == errno.EPIPE):
sys.exit(0)
raise
except KeyboardInterrupt:
sys.exit(0)
Kstat = get_Kstat()
alternate_tunable_layout = 'a' in args
show_tunable_descriptions = 'd' in args
pages = []
if 'p' in args:
try:
pages.append(unSub[int(args['p']) - 1])
except IndexError:
sys.stderr.write('the argument to -p must be between 1 and ' +
str(len(unSub)) + '\n')
sys.exit(1)
else:
pages = unSub
zfs_header()
for page in pages:
page(Kstat)
sys.stdout.write("\n")
if __name__ == '__main__':
+15 -64
View File
@@ -42,13 +42,6 @@ import os
import subprocess
import sys
import time
import errno
# We can't use env -S portably, and we need python3 -u to handle pipes in
# the shell abruptly closing the way we want to, so...
import io
if isinstance(sys.__stderr__.buffer, io.BufferedWriter):
os.execv(sys.executable, [sys.executable, "-u"] + sys.argv)
DESCRIPTION = 'Print ARC and other statistics for OpenZFS'
INDENT = ' '*8
@@ -168,11 +161,21 @@ elif sys.platform.startswith('linux'):
# The original arc_summary called /sbin/modinfo/{spl,zfs} to get
# the version information. We switch to /sys/module/{spl,zfs}/version
# to make sure we get what is really loaded in the kernel
try:
with open("/sys/module/{}/version".format(request)) as f:
return f.read().strip()
except:
return "(unknown)"
command = ["cat", "/sys/module/{0}/version".format(request)]
req = request.upper()
# The recommended way to do this is with subprocess.run(). However,
# some installed versions of Python are < 3.5, so we offer them
# the option of doing it the old way (for now)
if 'run' in dir(subprocess):
info = subprocess.run(command, stdout=subprocess.PIPE,
universal_newlines=True)
version = info.stdout.strip()
else:
info = subprocess.check_output(command, universal_newlines=True)
version = info.strip()
return version
def get_descriptions(request):
"""Get the descriptions of the Solaris Porting Layer (SPL) or the
@@ -228,29 +231,6 @@ elif sys.platform.startswith('linux'):
return descs
def handle_unraisableException(exc_type, exc_value=None, exc_traceback=None,
err_msg=None, object=None):
handle_Exception(exc_type, object, exc_traceback)
def handle_Exception(ex_cls, ex, tb):
if ex_cls is KeyboardInterrupt:
sys.exit()
if ex_cls is BrokenPipeError:
# It turns out that while sys.exit() triggers an exception
# not handled message on Python 3.8+, os._exit() does not.
os._exit(0)
if ex_cls is OSError:
if ex.errno == errno.ENOTCONN:
sys.exit()
raise ex
if hasattr(sys,'unraisablehook'): # Python 3.8+
sys.unraisablehook = handle_unraisableException
sys.excepthook = handle_Exception
def cleanup_line(single_line):
"""Format a raw line of data from /proc and isolate the name value
@@ -612,20 +592,6 @@ def section_arc(kstats_dict):
prt_i1('Deleted:', f_hits(arc_stats['deleted']))
prt_i1('Mutex misses:', f_hits(arc_stats['mutex_miss']))
prt_i1('Eviction skips:', f_hits(arc_stats['evict_skip']))
prt_i1('Eviction skips due to L2 writes:',
f_hits(arc_stats['evict_l2_skip']))
prt_i1('L2 cached evictions:', f_bytes(arc_stats['evict_l2_cached']))
prt_i1('L2 eligible evictions:', f_bytes(arc_stats['evict_l2_eligible']))
prt_i2('L2 eligible MFU evictions:',
f_perc(arc_stats['evict_l2_eligible_mfu'],
arc_stats['evict_l2_eligible']),
f_bytes(arc_stats['evict_l2_eligible_mfu']))
prt_i2('L2 eligible MRU evictions:',
f_perc(arc_stats['evict_l2_eligible_mru'],
arc_stats['evict_l2_eligible']),
f_bytes(arc_stats['evict_l2_eligible_mru']))
prt_i1('L2 ineligible evictions:',
f_bytes(arc_stats['evict_l2_ineligible']))
print()
@@ -764,21 +730,6 @@ def section_l2arc(kstats_dict):
prt_i2('Header size:',
f_perc(arc_stats['l2_hdr_size'], arc_stats['l2_size']),
f_bytes(arc_stats['l2_hdr_size']))
prt_i2('MFU allocated size:',
f_perc(arc_stats['l2_mfu_asize'], arc_stats['l2_asize']),
f_bytes(arc_stats['l2_mfu_asize']))
prt_i2('MRU allocated size:',
f_perc(arc_stats['l2_mru_asize'], arc_stats['l2_asize']),
f_bytes(arc_stats['l2_mru_asize']))
prt_i2('Prefetch allocated size:',
f_perc(arc_stats['l2_prefetch_asize'], arc_stats['l2_asize']),
f_bytes(arc_stats['l2_prefetch_asize']))
prt_i2('Data (buffer content) allocated size:',
f_perc(arc_stats['l2_bufc_data_asize'], arc_stats['l2_asize']),
f_bytes(arc_stats['l2_bufc_data_asize']))
prt_i2('Metadata (buffer content) allocated size:',
f_perc(arc_stats['l2_bufc_metadata_asize'], arc_stats['l2_asize']),
f_bytes(arc_stats['l2_bufc_metadata_asize']))
print()
prt_1('L2ARC breakdown:', f_hits(l2_access_total))
-33
View File
@@ -88,12 +88,6 @@ cols = {
"mfug": [4, 1000, "MFU ghost list hits per second"],
"mrug": [4, 1000, "MRU ghost list hits per second"],
"eskip": [5, 1000, "evict_skip per second"],
"el2skip": [7, 1000, "evict skip, due to l2 writes, per second"],
"el2cach": [7, 1024, "Size of L2 cached evictions per second"],
"el2el": [5, 1024, "Size of L2 eligible evictions per second"],
"el2mfu": [6, 1024, "Size of L2 eligible MFU evictions per second"],
"el2mru": [6, 1024, "Size of L2 eligible MRU evictions per second"],
"el2inel": [7, 1024, "Size of L2 ineligible evictions per second"],
"mtxmis": [6, 1000, "mutex_miss per second"],
"dread": [5, 1000, "Demand accesses per second"],
"pread": [5, 1000, "Prefetch accesses per second"],
@@ -102,16 +96,6 @@ cols = {
"l2read": [6, 1000, "Total L2ARC accesses per second"],
"l2hit%": [6, 100, "L2ARC access hit percentage"],
"l2miss%": [7, 100, "L2ARC access miss percentage"],
"l2pref": [6, 1024, "L2ARC prefetch allocated size"],
"l2mfu": [5, 1024, "L2ARC MFU allocated size"],
"l2mru": [5, 1024, "L2ARC MRU allocated size"],
"l2data": [6, 1024, "L2ARC data allocated size"],
"l2meta": [6, 1024, "L2ARC metadata allocated size"],
"l2pref%": [7, 100, "L2ARC prefetch percentage"],
"l2mfu%": [6, 100, "L2ARC MFU percentage"],
"l2mru%": [6, 100, "L2ARC MRU percentage"],
"l2data%": [7, 100, "L2ARC data percentage"],
"l2meta%": [7, 100, "L2ARC metadata percentage"],
"l2asize": [7, 1024, "Actual (compressed) size of the L2ARC"],
"l2size": [6, 1024, "Size of the L2ARC"],
"l2bytes": [7, 1024, "Bytes read per second from the L2ARC"],
@@ -479,12 +463,6 @@ def calculate():
v["mrug"] = d["mru_ghost_hits"] / sint
v["mfug"] = d["mfu_ghost_hits"] / sint
v["eskip"] = d["evict_skip"] / sint
v["el2skip"] = d["evict_l2_skip"] / sint
v["el2cach"] = d["evict_l2_cached"] / sint
v["el2el"] = d["evict_l2_eligible"] / sint
v["el2mfu"] = d["evict_l2_eligible_mfu"] / sint
v["el2mru"] = d["evict_l2_eligible_mru"] / sint
v["el2inel"] = d["evict_l2_ineligible"] / sint
v["mtxmis"] = d["mutex_miss"] / sint
if l2exist:
@@ -498,17 +476,6 @@ def calculate():
v["l2size"] = cur["l2_size"]
v["l2bytes"] = d["l2_read_bytes"] / sint
v["l2pref"] = cur["l2_prefetch_asize"]
v["l2mfu"] = cur["l2_mfu_asize"]
v["l2mru"] = cur["l2_mru_asize"]
v["l2data"] = cur["l2_bufc_data_asize"]
v["l2meta"] = cur["l2_bufc_metadata_asize"]
v["l2pref%"] = 100 * v["l2pref"] / v["l2asize"]
v["l2mfu%"] = 100 * v["l2mfu"] / v["l2asize"]
v["l2mru%"] = 100 * v["l2mru"] / v["l2asize"]
v["l2data%"] = 100 * v["l2data"] / v["l2asize"]
v["l2meta%"] = 100 * v["l2meta"] / v["l2asize"]
v["grow"] = 0 if cur["arc_no_grow"] else 1
v["need"] = cur["arc_need_free"]
v["free"] = cur["memory_free_bytes"]
-1
View File
@@ -1 +0,0 @@
/fsck.zfs
-5
View File
@@ -1,6 +1 @@
include $(top_srcdir)/config/Substfiles.am
include $(top_srcdir)/config/Shellcheck.am
dist_sbin_SCRIPTS = fsck.zfs
SUBSTFILES += $(dist_sbin_SCRIPTS)
+9
View File
@@ -0,0 +1,9 @@
#!/bin/sh
#
# fsck.zfs: A fsck helper to accommodate distributions that expect
# to be able to execute a fsck on all filesystem types. Currently
# this script does nothing but it could be extended to act as a
# compatibility wrapper for 'zpool scrub'.
#
exit 0
-44
View File
@@ -1,44 +0,0 @@
#!/bin/sh
#
# fsck.zfs: A fsck helper to accommodate distributions that expect
# to be able to execute a fsck on all filesystem types.
#
# This script simply bubbles up some already-known-about errors,
# see fsck.zfs(8)
#
if [ "$#" = "0" ]; then
echo "Usage: $0 [options] dataset…" >&2
exit 16
fi
ret=0
for dataset in "$@"; do
case "$dataset" in
-*)
continue
;;
*)
;;
esac
pool="${dataset%%/*}"
case "$(@sbindir@/zpool list -Ho health "$pool")" in
DEGRADED)
ret=$(( ret | 4 ))
;;
FAULTED)
awk '!/^([[:space:]]*#.*)?$/ && $1 == "'"$dataset"'" && $3 == "zfs" {exit 1}' /etc/fstab || \
ret=$(( ret | 8 ))
;;
"")
# Pool not found, error printed by zpool(8)
ret=$(( ret | 8 ))
;;
*)
;;
esac
done
exit "$ret"
+4 -5
View File
@@ -185,11 +185,10 @@ main(int argc, char **argv)
break;
case 'h':
case '?':
if (optopt)
(void) fprintf(stderr,
gettext("Invalid option '%c'\n"), optopt);
(void) fprintf(stderr, gettext("Invalid option '%c'\n"),
optopt);
(void) fprintf(stderr, gettext("Usage: mount.zfs "
"[-sfnvh] [-o options] <dataset> <mountpoint>\n"));
"[-sfnv] [-o options] <dataset> <mountpoint>\n"));
return (MOUNT_USAGE);
}
}
@@ -368,7 +367,7 @@ main(int argc, char **argv)
"mount the filesystem again.\n"), dataset);
return (MOUNT_SYSERR);
}
/* fallthru */
fallthrough;
#endif
default:
(void) fprintf(stderr, gettext("filesystem "
+6 -21
View File
@@ -31,6 +31,8 @@
#include <sys/vdev_raidz_impl.h>
#include <stdio.h>
#include <sys/time.h>
#include "raidz_test.h"
#define GEN_BENCH_MEMORY (((uint64_t)1ULL)<<32)
@@ -81,17 +83,8 @@ run_gen_bench_impl(const char *impl)
/* create suitable raidz_map */
ncols = rto_opts.rto_dcols + fn + 1;
zio_bench.io_size = 1ULL << ds;
if (rto_opts.rto_expand) {
rm_bench = vdev_raidz_map_alloc_expanded(
zio_bench.io_abd,
zio_bench.io_size, zio_bench.io_offset,
rto_opts.rto_ashift, ncols+1, ncols,
fn+1, rto_opts.rto_expand_offset);
} else {
rm_bench = vdev_raidz_map_alloc(&zio_bench,
BENCH_ASHIFT, ncols, fn+1);
}
rm_bench = vdev_raidz_map_alloc(&zio_bench,
BENCH_ASHIFT, ncols, fn+1);
/* estimate iteration count */
iter_cnt = GEN_BENCH_MEMORY;
@@ -170,16 +163,8 @@ run_rec_bench_impl(const char *impl)
(1ULL << BENCH_ASHIFT))
continue;
if (rto_opts.rto_expand) {
rm_bench = vdev_raidz_map_alloc_expanded(
zio_bench.io_abd,
zio_bench.io_size, zio_bench.io_offset,
BENCH_ASHIFT, ncols+1, ncols,
PARITY_PQR, rto_opts.rto_expand_offset);
} else {
rm_bench = vdev_raidz_map_alloc(&zio_bench,
BENCH_ASHIFT, ncols, PARITY_PQR);
}
rm_bench = vdev_raidz_map_alloc(&zio_bench,
BENCH_ASHIFT, ncols, PARITY_PQR);
/* estimate iteration count */
iter_cnt = (REC_BENCH_MEMORY);
+45 -284
View File
@@ -77,20 +77,16 @@ static void print_opts(raidz_test_opts_t *opts, boolean_t force)
(void) fprintf(stdout, DBLSEP "Running with options:\n"
" (-a) zio ashift : %zu\n"
" (-o) zio offset : 1 << %zu\n"
" (-e) expanded map : %s\n"
" (-r) reflow offset : %llx\n"
" (-d) number of raidz data columns : %zu\n"
" (-s) size of DATA : 1 << %zu\n"
" (-S) sweep parameters : %s \n"
" (-v) verbose : %s \n\n",
opts->rto_ashift, /* -a */
ilog2(opts->rto_offset), /* -o */
opts->rto_expand ? "yes" : "no", /* -e */
(u_longlong_t)opts->rto_expand_offset, /* -r */
opts->rto_dcols, /* -d */
ilog2(opts->rto_dsize), /* -s */
opts->rto_sweep ? "yes" : "no", /* -S */
verbose); /* -v */
opts->rto_ashift, /* -a */
ilog2(opts->rto_offset), /* -o */
opts->rto_dcols, /* -d */
ilog2(opts->rto_dsize), /* -s */
opts->rto_sweep ? "yes" : "no", /* -S */
verbose); /* -v */
}
}
@@ -108,8 +104,6 @@ static void usage(boolean_t requested)
"\t[-S parameter sweep (default: %s)]\n"
"\t[-t timeout for parameter sweep test]\n"
"\t[-B benchmark all raidz implementations]\n"
"\t[-e use expanded raidz map (default: %s)]\n"
"\t[-r expanded raidz map reflow offset (default: %llx)]\n"
"\t[-v increase verbosity (default: %zu)]\n"
"\t[-h (print help)]\n"
"\t[-T test the test, see if failure would be detected]\n"
@@ -120,8 +114,6 @@ static void usage(boolean_t requested)
o->rto_dcols, /* -d */
ilog2(o->rto_dsize), /* -s */
rto_opts.rto_sweep ? "yes" : "no", /* -S */
rto_opts.rto_expand ? "yes" : "no", /* -e */
(u_longlong_t)o->rto_expand_offset, /* -r */
o->rto_v); /* -d */
exit(requested ? 0 : 1);
@@ -136,7 +128,7 @@ static void process_options(int argc, char **argv)
bcopy(&rto_opts_defaults, o, sizeof (*o));
while ((opt = getopt(argc, argv, "TDBSvha:er:o:d:s:t:")) != -1) {
while ((opt = getopt(argc, argv, "TDBSvha:o:d:s:t:")) != -1) {
value = 0;
switch (opt) {
@@ -144,12 +136,6 @@ static void process_options(int argc, char **argv)
value = strtoull(optarg, NULL, 0);
o->rto_ashift = MIN(13, MAX(9, value));
break;
case 'e':
o->rto_expand = 1;
break;
case 'r':
o->rto_expand_offset = strtoull(optarg, NULL, 0);
break;
case 'o':
value = strtoull(optarg, NULL, 0);
o->rto_offset = ((1ULL << MIN(12, value)) >> 9) << 9;
@@ -193,34 +179,25 @@ static void process_options(int argc, char **argv)
}
}
#define DATA_COL(rr, i) ((rr)->rr_col[rr->rr_firstdatacol + (i)].rc_abd)
#define DATA_COL_SIZE(rr, i) ((rr)->rr_col[rr->rr_firstdatacol + (i)].rc_size)
#define DATA_COL(rm, i) ((rm)->rm_col[raidz_parity(rm) + (i)].rc_abd)
#define DATA_COL_SIZE(rm, i) ((rm)->rm_col[raidz_parity(rm) + (i)].rc_size)
#define CODE_COL(rr, i) ((rr)->rr_col[(i)].rc_abd)
#define CODE_COL_SIZE(rr, i) ((rr)->rr_col[(i)].rc_size)
#define CODE_COL(rm, i) ((rm)->rm_col[(i)].rc_abd)
#define CODE_COL_SIZE(rm, i) ((rm)->rm_col[(i)].rc_size)
static int
cmp_code(raidz_test_opts_t *opts, const raidz_map_t *rm, const int parity)
{
int r, i, ret = 0;
int i, ret = 0;
VERIFY(parity >= 1 && parity <= 3);
for (r = 0; r < rm->rm_nrows; r++) {
raidz_row_t * const rr = rm->rm_row[r];
raidz_row_t * const rrg = opts->rm_golden->rm_row[r];
for (i = 0; i < parity; i++) {
if (CODE_COL_SIZE(rrg, i) == 0) {
VERIFY0(CODE_COL_SIZE(rr, i));
continue;
}
if (abd_cmp(CODE_COL(rr, i),
CODE_COL(rrg, i)) != 0) {
ret++;
LOG_OPT(D_DEBUG, opts,
"\nParity block [%d] different!\n", i);
}
for (i = 0; i < parity; i++) {
if (abd_cmp(CODE_COL(rm, i), CODE_COL(opts->rm_golden, i))
!= 0) {
ret++;
LOG_OPT(D_DEBUG, opts,
"\nParity block [%d] different!\n", i);
}
}
return (ret);
@@ -229,26 +206,16 @@ cmp_code(raidz_test_opts_t *opts, const raidz_map_t *rm, const int parity)
static int
cmp_data(raidz_test_opts_t *opts, raidz_map_t *rm)
{
int r, i, dcols, ret = 0;
int i, ret = 0;
int dcols = opts->rm_golden->rm_cols - raidz_parity(opts->rm_golden);
for (r = 0; r < rm->rm_nrows; r++) {
raidz_row_t *rr = rm->rm_row[r];
raidz_row_t *rrg = opts->rm_golden->rm_row[r];
dcols = opts->rm_golden->rm_row[0]->rr_cols -
raidz_parity(opts->rm_golden);
for (i = 0; i < dcols; i++) {
if (DATA_COL_SIZE(rrg, i) == 0) {
VERIFY0(DATA_COL_SIZE(rr, i));
continue;
}
for (i = 0; i < dcols; i++) {
if (abd_cmp(DATA_COL(opts->rm_golden, i), DATA_COL(rm, i))
!= 0) {
ret++;
if (abd_cmp(DATA_COL(rrg, i),
DATA_COL(rr, i)) != 0) {
ret++;
LOG_OPT(D_DEBUG, opts,
"\nData block [%d] different!\n", i);
}
LOG_OPT(D_DEBUG, opts,
"\nData block [%d] different!\n", i);
}
}
return (ret);
@@ -269,13 +236,12 @@ init_rand(void *data, size_t size, void *private)
static void
corrupt_colums(raidz_map_t *rm, const int *tgts, const int cnt)
{
for (int r = 0; r < rm->rm_nrows; r++) {
raidz_row_t *rr = rm->rm_row[r];
for (int i = 0; i < cnt; i++) {
raidz_col_t *col = &rr->rr_col[tgts[i]];
abd_iterate_func(col->rc_abd, 0, col->rc_size,
init_rand, NULL);
}
int i;
raidz_col_t *col;
for (i = 0; i < cnt; i++) {
col = &rm->rm_col[tgts[i]];
abd_iterate_func(col->rc_abd, 0, col->rc_size, init_rand, NULL);
}
}
@@ -322,22 +288,10 @@ init_raidz_golden_map(raidz_test_opts_t *opts, const int parity)
VERIFY0(vdev_raidz_impl_set("original"));
if (opts->rto_expand) {
opts->rm_golden =
vdev_raidz_map_alloc_expanded(opts->zio_golden->io_abd,
opts->zio_golden->io_size, opts->zio_golden->io_offset,
opts->rto_ashift, total_ncols+1, total_ncols,
parity, opts->rto_expand_offset);
rm_test = vdev_raidz_map_alloc_expanded(zio_test->io_abd,
zio_test->io_size, zio_test->io_offset,
opts->rto_ashift, total_ncols+1, total_ncols,
parity, opts->rto_expand_offset);
} else {
opts->rm_golden = vdev_raidz_map_alloc(opts->zio_golden,
opts->rto_ashift, total_ncols, parity);
rm_test = vdev_raidz_map_alloc(zio_test,
opts->rto_ashift, total_ncols, parity);
}
opts->rm_golden = vdev_raidz_map_alloc(opts->zio_golden,
opts->rto_ashift, total_ncols, parity);
rm_test = vdev_raidz_map_alloc(zio_test,
opts->rto_ashift, total_ncols, parity);
VERIFY(opts->zio_golden);
VERIFY(opts->rm_golden);
@@ -358,187 +312,6 @@ init_raidz_golden_map(raidz_test_opts_t *opts, const int parity)
return (err);
}
/*
* If reflow is not in progress, reflow_offset should be UINT64_MAX.
* For each row, if the row is entirely before reflow_offset, it will
* come from the new location. Otherwise this row will come from the
* old location. Therefore, rows that straddle the reflow_offset will
* come from the old location.
*
* NOTE: Until raidz expansion is implemented this function is only
* needed by raidz_test.c to the multi-row raid_map_t functionality.
*/
raidz_map_t *
vdev_raidz_map_alloc_expanded(abd_t *abd, uint64_t size, uint64_t offset,
uint64_t ashift, uint64_t physical_cols, uint64_t logical_cols,
uint64_t nparity, uint64_t reflow_offset)
{
/* The zio's size in units of the vdev's minimum sector size. */
uint64_t s = size >> ashift;
uint64_t q, r, bc, devidx, asize = 0, tot;
/*
* "Quotient": The number of data sectors for this stripe on all but
* the "big column" child vdevs that also contain "remainder" data.
* AKA "full rows"
*/
q = s / (logical_cols - nparity);
/*
* "Remainder": The number of partial stripe data sectors in this I/O.
* This will add a sector to some, but not all, child vdevs.
*/
r = s - q * (logical_cols - nparity);
/* The number of "big columns" - those which contain remainder data. */
bc = (r == 0 ? 0 : r + nparity);
/*
* The total number of data and parity sectors associated with
* this I/O.
*/
tot = s + nparity * (q + (r == 0 ? 0 : 1));
/* How many rows contain data (not skip) */
uint64_t rows = howmany(tot, logical_cols);
int cols = MIN(tot, logical_cols);
raidz_map_t *rm = kmem_zalloc(offsetof(raidz_map_t, rm_row[rows]),
KM_SLEEP);
rm->rm_nrows = rows;
for (uint64_t row = 0; row < rows; row++) {
raidz_row_t *rr = kmem_alloc(offsetof(raidz_row_t,
rr_col[cols]), KM_SLEEP);
rm->rm_row[row] = rr;
/* The starting RAIDZ (parent) vdev sector of the row. */
uint64_t b = (offset >> ashift) + row * logical_cols;
/*
* If we are in the middle of a reflow, and any part of this
* row has not been copied, then use the old location of
* this row.
*/
int row_phys_cols = physical_cols;
if (b + (logical_cols - nparity) > reflow_offset >> ashift)
row_phys_cols--;
/* starting child of this row */
uint64_t child_id = b % row_phys_cols;
/* The starting byte offset on each child vdev. */
uint64_t child_offset = (b / row_phys_cols) << ashift;
/*
* We set cols to the entire width of the block, even
* if this row is shorter. This is needed because parity
* generation (for Q and R) needs to know the entire width,
* because it treats the short row as though it was
* full-width (and the "phantom" sectors were zero-filled).
*
* Another approach to this would be to set cols shorter
* (to just the number of columns that we might do i/o to)
* and have another mechanism to tell the parity generation
* about the "entire width". Reconstruction (at least
* vdev_raidz_reconstruct_general()) would also need to
* know about the "entire width".
*/
rr->rr_cols = cols;
rr->rr_bigcols = bc;
rr->rr_missingdata = 0;
rr->rr_missingparity = 0;
rr->rr_firstdatacol = nparity;
rr->rr_abd_empty = NULL;
rr->rr_nempty = 0;
for (int c = 0; c < rr->rr_cols; c++, child_id++) {
if (child_id >= row_phys_cols) {
child_id -= row_phys_cols;
child_offset += 1ULL << ashift;
}
rr->rr_col[c].rc_devidx = child_id;
rr->rr_col[c].rc_offset = child_offset;
rr->rr_col[c].rc_orig_data = NULL;
rr->rr_col[c].rc_error = 0;
rr->rr_col[c].rc_tried = 0;
rr->rr_col[c].rc_skipped = 0;
rr->rr_col[c].rc_need_orig_restore = B_FALSE;
uint64_t dc = c - rr->rr_firstdatacol;
if (c < rr->rr_firstdatacol) {
rr->rr_col[c].rc_size = 1ULL << ashift;
rr->rr_col[c].rc_abd =
abd_alloc_linear(rr->rr_col[c].rc_size,
B_TRUE);
} else if (row == rows - 1 && bc != 0 && c >= bc) {
/*
* Past the end, this for parity generation.
*/
rr->rr_col[c].rc_size = 0;
rr->rr_col[c].rc_abd = NULL;
} else {
/*
* "data column" (col excluding parity)
* Add an ASCII art diagram here
*/
uint64_t off;
if (c < bc || r == 0) {
off = dc * rows + row;
} else {
off = r * rows +
(dc - r) * (rows - 1) + row;
}
rr->rr_col[c].rc_size = 1ULL << ashift;
rr->rr_col[c].rc_abd = abd_get_offset_struct(
&rr->rr_col[c].rc_abdstruct,
abd, off << ashift, 1 << ashift);
}
asize += rr->rr_col[c].rc_size;
}
/*
* If all data stored spans all columns, there's a danger that
* parity will always be on the same device and, since parity
* isn't read during normal operation, that that device's I/O
* bandwidth won't be used effectively. We therefore switch
* the parity every 1MB.
*
* ...at least that was, ostensibly, the theory. As a practical
* matter unless we juggle the parity between all devices
* evenly, we won't see any benefit. Further, occasional writes
* that aren't a multiple of the LCM of the number of children
* and the minimum stripe width are sufficient to avoid pessimal
* behavior. Unfortunately, this decision created an implicit
* on-disk format requirement that we need to support for all
* eternity, but only for single-parity RAID-Z.
*
* If we intend to skip a sector in the zeroth column for
* padding we must make sure to note this swap. We will never
* intend to skip the first column since at least one data and
* one parity column must appear in each row.
*/
if (rr->rr_firstdatacol == 1 && rr->rr_cols > 1 &&
(offset & (1ULL << 20))) {
ASSERT(rr->rr_cols >= 2);
ASSERT(rr->rr_col[0].rc_size == rr->rr_col[1].rc_size);
devidx = rr->rr_col[0].rc_devidx;
uint64_t o = rr->rr_col[0].rc_offset;
rr->rr_col[0].rc_devidx = rr->rr_col[1].rc_devidx;
rr->rr_col[0].rc_offset = rr->rr_col[1].rc_offset;
rr->rr_col[1].rc_devidx = devidx;
rr->rr_col[1].rc_offset = o;
}
}
ASSERT3U(asize, ==, tot << ashift);
/* init RAIDZ parity ops */
rm->rm_ops = vdev_raidz_math_get_ops();
return (rm);
}
static raidz_map_t *
init_raidz_map(raidz_test_opts_t *opts, zio_t **zio, const int parity)
{
@@ -557,15 +330,8 @@ init_raidz_map(raidz_test_opts_t *opts, zio_t **zio, const int parity)
(*zio)->io_abd = raidz_alloc(alloc_dsize);
init_zio_abd(*zio);
if (opts->rto_expand) {
rm = vdev_raidz_map_alloc_expanded((*zio)->io_abd,
(*zio)->io_size, (*zio)->io_offset,
opts->rto_ashift, total_ncols+1, total_ncols,
parity, opts->rto_expand_offset);
} else {
rm = vdev_raidz_map_alloc(*zio, opts->rto_ashift,
total_ncols, parity);
}
rm = vdev_raidz_map_alloc(*zio, opts->rto_ashift,
total_ncols, parity);
VERIFY(rm);
/* Make sure code columns are destroyed */
@@ -654,7 +420,7 @@ run_rec_check_impl(raidz_test_opts_t *opts, raidz_map_t *rm, const int fn)
if (fn < RAIDZ_REC_PQ) {
/* can reconstruct 1 failed data disk */
for (x0 = 0; x0 < opts->rto_dcols; x0++) {
if (x0 >= rm->rm_row[0]->rr_cols - raidz_parity(rm))
if (x0 >= rm->rm_cols - raidz_parity(rm))
continue;
/* Check if should stop */
@@ -679,11 +445,10 @@ run_rec_check_impl(raidz_test_opts_t *opts, raidz_map_t *rm, const int fn)
} else if (fn < RAIDZ_REC_PQR) {
/* can reconstruct 2 failed data disk */
for (x0 = 0; x0 < opts->rto_dcols; x0++) {
if (x0 >= rm->rm_row[0]->rr_cols - raidz_parity(rm))
if (x0 >= rm->rm_cols - raidz_parity(rm))
continue;
for (x1 = x0 + 1; x1 < opts->rto_dcols; x1++) {
if (x1 >= rm->rm_row[0]->rr_cols -
raidz_parity(rm))
if (x1 >= rm->rm_cols - raidz_parity(rm))
continue;
/* Check if should stop */
@@ -710,15 +475,14 @@ run_rec_check_impl(raidz_test_opts_t *opts, raidz_map_t *rm, const int fn)
} else {
/* can reconstruct 3 failed data disk */
for (x0 = 0; x0 < opts->rto_dcols; x0++) {
if (x0 >= rm->rm_row[0]->rr_cols - raidz_parity(rm))
if (x0 >= rm->rm_cols - raidz_parity(rm))
continue;
for (x1 = x0 + 1; x1 < opts->rto_dcols; x1++) {
if (x1 >= rm->rm_row[0]->rr_cols -
raidz_parity(rm))
if (x1 >= rm->rm_cols - raidz_parity(rm))
continue;
for (x2 = x1 + 1; x2 < opts->rto_dcols; x2++) {
if (x2 >= rm->rm_row[0]->rr_cols -
raidz_parity(rm))
if (x2 >=
rm->rm_cols - raidz_parity(rm))
continue;
/* Check if should stop */
@@ -936,8 +700,6 @@ run_sweep(void)
opts->rto_dcols = dcols_v[d];
opts->rto_offset = (1 << ashift_v[a]) * rand();
opts->rto_dsize = size_v[s];
opts->rto_expand = rto_opts.rto_expand;
opts->rto_expand_offset = rto_opts.rto_expand_offset;
opts->rto_v = 0; /* be quiet */
VERIFY3P(thread_create(NULL, 0, sweep_thread, (void *) opts,
@@ -970,7 +732,6 @@ exit:
return (sweep_state == SWEEP_ERROR ? SWEEP_ERROR : 0);
}
int
main(int argc, char **argv)
{
+1 -8
View File
@@ -44,15 +44,13 @@ static const char *raidz_impl_names[] = {
typedef struct raidz_test_opts {
size_t rto_ashift;
uint64_t rto_offset;
size_t rto_offset;
size_t rto_dcols;
size_t rto_dsize;
size_t rto_v;
size_t rto_sweep;
size_t rto_sweep_timeout;
size_t rto_benchmark;
size_t rto_expand;
uint64_t rto_expand_offset;
size_t rto_sanity;
size_t rto_gdb;
@@ -71,8 +69,6 @@ static const raidz_test_opts_t rto_opts_defaults = {
.rto_v = 0,
.rto_sweep = 0,
.rto_benchmark = 0,
.rto_expand = 0,
.rto_expand_offset = -1ULL,
.rto_sanity = 0,
.rto_gdb = 0,
.rto_should_stop = B_FALSE
@@ -117,7 +113,4 @@ void init_zio_abd(zio_t *zio);
void run_raidz_benchmark(void);
struct raidz_map *vdev_raidz_map_alloc_expanded(abd_t *, uint64_t, uint64_t,
uint64_t, uint64_t, uint64_t, uint64_t, uint64_t);
#endif /* RAIDZ_TEST_H */
-2
View File
@@ -1,3 +1 @@
include $(top_srcdir)/config/Shellcheck.am
dist_udev_SCRIPTS = vdev_id
+3 -3
View File
@@ -375,7 +375,7 @@ sas_handler() {
i=$((i + 1))
done
PHY=$(ls -d "$port_dir"/phy* 2>/dev/null | head -1 | awk -F: '{print $NF}')
PHY=$(ls -vd "$port_dir"/phy* 2>/dev/null | head -1 | awk -F: '{print $NF}')
if [ -z "$PHY" ] ; then
PHY=0
fi
@@ -622,8 +622,8 @@ enclosure_handler () {
PCI_ID=$(echo "$PCI_ID_LONG" | sed -r 's/^[0-9]+://g')
# Name our device according to vdev_id.conf (like "L0" or "U1").
NAME=$(awk '/channel/{if ($1 == "channel" && $2 == "$PCI_ID" && \
$3 == "$PORT_ID") {print ${4}int(count[$4])}; count[$4]++}' $CONFIG)
NAME=$(awk "/channel/{if (\$1 == \"channel\" && \$2 == \"$PCI_ID\" && \
\$3 == \"$PORT_ID\") {print \$4\$3}}" $CONFIG)
echo "${NAME}"
}
+20 -138
View File
@@ -31,7 +31,6 @@
*
* [1] Portions of this software were developed by Allan Jude
* under sponsorship from the FreeBSD Foundation.
* Copyright (c) 2021 Allan Jude
*/
#include <stdio.h>
@@ -783,14 +782,13 @@ usage(void)
"\t%s -m [-AFLPX] [-e [-V] [-p <path> ...]] [-t <txg>] "
"[-U <cache>]\n\t\t<poolname> [<vdev> [<metaslab> ...]]\n"
"\t%s -O <dataset> <path>\n"
"\t%s -r <dataset> <path> <destination>\n"
"\t%s -R [-A] [-e [-V] [-p <path> ...]] [-U <cache>]\n"
"\t\t<poolname> <vdev>:<offset>:<size>[:<flags>]\n"
"\t%s -E [-A] word0:word1:...:word15\n"
"\t%s -S [-AP] [-e [-V] [-p <path> ...]] [-U <cache>] "
"<poolname>\n\n",
cmdname, cmdname, cmdname, cmdname, cmdname, cmdname, cmdname,
cmdname, cmdname, cmdname, cmdname);
cmdname, cmdname, cmdname);
(void) fprintf(stderr, " Dataset name must include at least one "
"separator character '/' or '@'\n");
@@ -829,7 +827,6 @@ usage(void)
(void) fprintf(stderr, " -m metaslabs\n");
(void) fprintf(stderr, " -M metaslab groups\n");
(void) fprintf(stderr, " -O perform object lookups by path\n");
(void) fprintf(stderr, " -r copy an object by path to file\n");
(void) fprintf(stderr, " -R read and display block from a "
"device\n");
(void) fprintf(stderr, " -s report stats on zdb's I/O\n");
@@ -1672,11 +1669,7 @@ dump_metaslab(metaslab_t *msp)
SPACE_MAP_HISTOGRAM_SIZE, sm->sm_shift);
}
if (vd->vdev_ops == &vdev_draid_ops)
ASSERT3U(msp->ms_size, <=, 1ULL << vd->vdev_ms_shift);
else
ASSERT3U(msp->ms_size, ==, 1ULL << vd->vdev_ms_shift);
ASSERT(msp->ms_size == (1ULL << vd->vdev_ms_shift));
dump_spacemap(spa->spa_meta_objset, msp->ms_sm);
if (spa_feature_is_active(spa, SPA_FEATURE_LOG_SPACEMAP)) {
@@ -4096,7 +4089,7 @@ cksum_record_compare(const void *x1, const void *x2)
const cksum_record_t *l = (cksum_record_t *)x1;
const cksum_record_t *r = (cksum_record_t *)x2;
int arraysize = ARRAY_SIZE(l->cksum.zc_word);
int difference;
int difference = 0;
for (int i = 0; i < arraysize; i++) {
difference = TREE_CMP(l->cksum.zc_word[i], r->cksum.zc_word[i]);
@@ -4238,8 +4231,6 @@ dump_l2arc_log_entries(uint64_t log_entries,
(u_longlong_t)L2BLK_GET_PREFETCH((&le[j])->le_prop));
(void) printf("|\t\t\t\taddress: %llu\n",
(u_longlong_t)le[j].le_daddr);
(void) printf("|\t\t\t\tARC state: %llu\n",
(u_longlong_t)L2BLK_GET_STATE((&le[j])->le_prop));
(void) printf("|\n");
}
(void) printf("\n");
@@ -4522,7 +4513,7 @@ static char curpath[PATH_MAX];
* for the last one.
*/
static int
dump_path_impl(objset_t *os, uint64_t obj, char *name, uint64_t *retobj)
dump_path_impl(objset_t *os, uint64_t obj, char *name)
{
int err;
boolean_t header = B_TRUE;
@@ -4572,15 +4563,10 @@ dump_path_impl(objset_t *os, uint64_t obj, char *name, uint64_t *retobj)
switch (doi.doi_type) {
case DMU_OT_DIRECTORY_CONTENTS:
if (s != NULL && *(s + 1) != '\0')
return (dump_path_impl(os, child_obj, s + 1, retobj));
/*FALLTHROUGH*/
return (dump_path_impl(os, child_obj, s + 1));
fallthrough;
case DMU_OT_PLAIN_FILE_CONTENTS:
if (retobj != NULL) {
*retobj = child_obj;
} else {
dump_object(os, child_obj, dump_opt['v'], &header,
NULL, 0);
}
dump_object(os, child_obj, dump_opt['v'], &header, NULL, 0);
return (0);
default:
(void) fprintf(stderr, "object %llu has non-file/directory "
@@ -4595,7 +4581,7 @@ dump_path_impl(objset_t *os, uint64_t obj, char *name, uint64_t *retobj)
* Dump the blocks for the object specified by path inside the dataset.
*/
static int
dump_path(char *ds, char *path, uint64_t *retobj)
dump_path(char *ds, char *path)
{
int err;
objset_t *os;
@@ -4615,89 +4601,12 @@ dump_path(char *ds, char *path, uint64_t *retobj)
(void) snprintf(curpath, sizeof (curpath), "dataset=%s path=/", ds);
err = dump_path_impl(os, root_obj, path, retobj);
err = dump_path_impl(os, root_obj, path);
close_objset(os, FTAG);
return (err);
}
static int
zdb_copy_object(objset_t *os, uint64_t srcobj, char *destfile)
{
int err = 0;
uint64_t size, readsize, oursize, offset;
ssize_t writesize;
sa_handle_t *hdl;
(void) printf("Copying object %" PRIu64 " to file %s\n", srcobj,
destfile);
VERIFY3P(os, ==, sa_os);
if ((err = sa_handle_get(os, srcobj, NULL, SA_HDL_PRIVATE, &hdl))) {
(void) printf("Failed to get handle for SA znode\n");
return (err);
}
if ((err = sa_lookup(hdl, sa_attr_table[ZPL_SIZE], &size, 8))) {
(void) sa_handle_destroy(hdl);
return (err);
}
(void) sa_handle_destroy(hdl);
(void) printf("Object %" PRIu64 " is %" PRIu64 " bytes\n", srcobj,
size);
if (size == 0) {
return (EINVAL);
}
int fd = open(destfile, O_WRONLY | O_CREAT | O_TRUNC, 0644);
/*
* We cap the size at 1 mebibyte here to prevent
* allocation failures and nigh-infinite printing if the
* object is extremely large.
*/
oursize = MIN(size, 1 << 20);
offset = 0;
char *buf = kmem_alloc(oursize, KM_NOSLEEP);
if (buf == NULL) {
return (ENOMEM);
}
while (offset < size) {
readsize = MIN(size - offset, 1 << 20);
err = dmu_read(os, srcobj, offset, readsize, buf, 0);
if (err != 0) {
(void) printf("got error %u from dmu_read\n", err);
kmem_free(buf, oursize);
return (err);
}
if (dump_opt['v'] > 3) {
(void) printf("Read offset=%" PRIu64 " size=%" PRIu64
" error=%d\n", offset, readsize, err);
}
writesize = write(fd, buf, readsize);
if (writesize < 0) {
err = errno;
break;
} else if (writesize != readsize) {
/* Incomplete write */
(void) fprintf(stderr, "Short write, only wrote %llu of"
" %" PRIu64 " bytes, exiting...\n",
(u_longlong_t)writesize, readsize);
break;
}
offset += readsize;
}
(void) close(fd);
if (buf != NULL)
kmem_free(buf, oursize);
return (err);
}
static int
dump_label(const char *dev)
{
@@ -5321,6 +5230,8 @@ zdb_blkptr_done(zio_t *zio)
zdb_cb_t *zcb = zio->io_private;
zbookmark_phys_t *zb = &zio->io_bookmark;
abd_free(zio->io_abd);
mutex_enter(&spa->spa_scrub_lock);
spa->spa_load_verify_bytes -= BP_GET_PSIZE(bp);
cv_broadcast(&spa->spa_scrub_io_cv);
@@ -5347,8 +5258,6 @@ zdb_blkptr_done(zio_t *zio)
blkbuf);
}
mutex_exit(&spa->spa_scrub_lock);
abd_free(zio->io_abd);
}
static int
@@ -5958,7 +5867,6 @@ zdb_leak_init_prepare_indirect_vdevs(spa_t *spa, zdb_cb_t *zcb)
* metaslabs. We want to set them up for
* zio_claim().
*/
vdev_metaslab_group_create(vd);
VERIFY0(vdev_metaslab_init(vd, 0));
vdev_indirect_mapping_t *vim __maybe_unused =
@@ -5998,7 +5906,6 @@ zdb_leak_init(spa_t *spa, zdb_cb_t *zcb)
*/
spa->spa_normal_class->mc_ops = &zdb_metaslab_ops;
spa->spa_log_class->mc_ops = &zdb_metaslab_ops;
spa->spa_embedded_log_class->mc_ops = &zdb_metaslab_ops;
zcb->zcb_vd_obsolete_counts =
umem_zalloc(rvd->vdev_children * sizeof (uint32_t *),
@@ -6132,6 +6039,7 @@ zdb_leak_fini(spa_t *spa, zdb_cb_t *zcb)
vdev_t *rvd = spa->spa_root_vdev;
for (unsigned c = 0; c < rvd->vdev_children; c++) {
vdev_t *vd = rvd->vdev_child[c];
metaslab_group_t *mg __maybe_unused = vd->vdev_mg;
if (zcb->zcb_vd_obsolete_counts[c] != NULL) {
leaks |= zdb_check_for_obsolete_leaks(vd, zcb);
@@ -6139,9 +6047,7 @@ zdb_leak_fini(spa_t *spa, zdb_cb_t *zcb)
for (uint64_t m = 0; m < vd->vdev_ms_count; m++) {
metaslab_t *msp = vd->vdev_ms[m];
ASSERT3P(msp->ms_group, ==, (msp->ms_group->mg_class ==
spa_embedded_log_class(spa)) ?
vd->vdev_log_mg : vd->vdev_mg);
ASSERT3P(mg, ==, msp->ms_group);
/*
* ms_allocatable has been overloaded
@@ -6348,8 +6254,6 @@ dump_block_stats(spa_t *spa)
zcb.zcb_totalasize = metaslab_class_get_alloc(spa_normal_class(spa));
zcb.zcb_totalasize += metaslab_class_get_alloc(spa_special_class(spa));
zcb.zcb_totalasize += metaslab_class_get_alloc(spa_dedup_class(spa));
zcb.zcb_totalasize +=
metaslab_class_get_alloc(spa_embedded_log_class(spa));
zcb.zcb_start = zcb.zcb_lastprint = gethrtime();
err = traverse_pool(spa, 0, flags, zdb_blkptr_cb, &zcb);
@@ -6397,7 +6301,6 @@ dump_block_stats(spa_t *spa)
total_alloc = norm_alloc +
metaslab_class_get_alloc(spa_log_class(spa)) +
metaslab_class_get_alloc(spa_embedded_log_class(spa)) +
metaslab_class_get_alloc(spa_special_class(spa)) +
metaslab_class_get_alloc(spa_dedup_class(spa)) +
get_unflushed_alloc_space(spa);
@@ -6443,7 +6346,7 @@ dump_block_stats(spa_t *spa)
(void) printf("\t%-16s %14llu used: %5.2f%%\n", "Normal class:",
(u_longlong_t)norm_alloc, 100.0 * norm_alloc / norm_space);
if (spa_special_class(spa)->mc_allocator[0].mca_rotor != NULL) {
if (spa_special_class(spa)->mc_rotor != NULL) {
uint64_t alloc = metaslab_class_get_alloc(
spa_special_class(spa));
uint64_t space = metaslab_class_get_space(
@@ -6454,7 +6357,7 @@ dump_block_stats(spa_t *spa)
100.0 * alloc / space);
}
if (spa_dedup_class(spa)->mc_allocator[0].mca_rotor != NULL) {
if (spa_dedup_class(spa)->mc_rotor != NULL) {
uint64_t alloc = metaslab_class_get_alloc(
spa_dedup_class(spa));
uint64_t space = metaslab_class_get_space(
@@ -6465,17 +6368,6 @@ dump_block_stats(spa_t *spa)
100.0 * alloc / space);
}
if (spa_embedded_log_class(spa)->mc_allocator[0].mca_rotor != NULL) {
uint64_t alloc = metaslab_class_get_alloc(
spa_embedded_log_class(spa));
uint64_t space = metaslab_class_get_space(
spa_embedded_log_class(spa));
(void) printf("\t%-16s %14llu used: %5.2f%%\n",
"Embedded log class", (u_longlong_t)alloc,
100.0 * alloc / space);
}
for (i = 0; i < NUM_BP_EMBEDDED_TYPES; i++) {
if (zcb.zcb_embedded_blocks[i] == 0)
continue;
@@ -8282,7 +8174,6 @@ main(int argc, char **argv)
nvlist_t *policy = NULL;
uint64_t max_txg = UINT64_MAX;
int64_t objset_id = -1;
uint64_t object;
int flags = ZFS_IMPORT_MISSING_LOG;
int rewind = ZPOOL_NEVER_REWIND;
char *spa_config_path_env, *objset_str;
@@ -8311,7 +8202,7 @@ main(int argc, char **argv)
zfs_btree_verify_intensity = 3;
while ((c = getopt(argc, argv,
"AbcCdDeEFGhiI:klLmMo:Op:PqrRsSt:uU:vVx:XYyZ")) != -1) {
"AbcCdDeEFGhiI:klLmMo:Op:PqRsSt:uU:vVx:XYyZ")) != -1) {
switch (c) {
case 'b':
case 'c':
@@ -8326,7 +8217,6 @@ main(int argc, char **argv)
case 'm':
case 'M':
case 'O':
case 'r':
case 'R':
case 's':
case 'S':
@@ -8416,7 +8306,7 @@ main(int argc, char **argv)
(void) fprintf(stderr, "-p option requires use of -e\n");
usage();
}
if (dump_opt['d'] || dump_opt['r']) {
if (dump_opt['d']) {
/* <pool>[/<dataset | objset id> is accepted */
if (argv[2] && (objset_str = strchr(argv[2], '/')) != NULL &&
objset_str++ != NULL) {
@@ -8475,7 +8365,7 @@ main(int argc, char **argv)
verbose = MAX(verbose, 1);
for (c = 0; c < 256; c++) {
if (dump_all && strchr("AeEFklLOPrRSXy", c) == NULL)
if (dump_all && strchr("AeEFklLOPRSXy", c) == NULL)
dump_opt[c] = 1;
if (dump_opt[c])
dump_opt[c] += verbose;
@@ -8511,13 +8401,7 @@ main(int argc, char **argv)
if (argc != 2)
usage();
dump_opt['v'] = verbose + 3;
return (dump_path(argv[0], argv[1], NULL));
}
if (dump_opt['r']) {
if (argc != 3)
usage();
dump_opt['v'] = verbose;
error = dump_path(argv[0], argv[1], &object);
return (dump_path(argv[0], argv[1]));
}
if (dump_opt['X'] || dump_opt['F'])
@@ -8695,9 +8579,7 @@ main(int argc, char **argv)
argv++;
argc--;
if (dump_opt['r']) {
error = zdb_copy_object(os, object, argv[1]);
} else if (!dump_opt['R']) {
if (!dump_opt['R']) {
flagbits['d'] = ZOR_FLAG_DIRECTORY;
flagbits['f'] = ZOR_FLAG_PLAIN_FILE;
flagbits['m'] = ZOR_FLAG_SPACE_MAP;
+1 -3
View File
@@ -1,10 +1,8 @@
include $(top_srcdir)/config/Rules.am
include $(top_srcdir)/config/Shellcheck.am
AM_CFLAGS += $(LIBUDEV_CFLAGS) $(LIBUUID_CFLAGS)
SUBDIRS = zed.d
SHELLCHECKDIRS = $(SUBDIRS)
sbin_PROGRAMS = zed
@@ -45,7 +43,7 @@ zed_LDADD = \
$(abs_top_builddir)/lib/libnvpair/libnvpair.la \
$(abs_top_builddir)/lib/libuutil/libuutil.la
zed_LDADD += -lrt $(LIBATOMIC_LIBS) $(LIBUDEV_LIBS) $(LIBUUID_LIBS)
zed_LDADD += -lrt $(LIBUDEV_LIBS) $(LIBUUID_LIBS)
zed_LDFLAGS = -pthread
EXTRA_DIST = agents/README.md
+6 -14
View File
@@ -13,7 +13,6 @@
/*
* Copyright (c) 2016, Intel Corporation.
* Copyright (c) 2018, loli10K <ezomori.nozomu@gmail.com>
* Copyright (c) 2021 Hewlett Packard Enterprise Development LP
*/
#include <libnvpair.h>
@@ -212,18 +211,12 @@ zfs_agent_post_event(const char *class, const char *subclass, nvlist_t *nvl)
* For multipath, spare and l2arc devices ZFS_EV_VDEV_GUID or
* ZFS_EV_POOL_GUID may be missing so find them.
*/
if (pool_guid == 0 || vdev_guid == 0) {
if ((nvlist_lookup_string(nvl, DEV_IDENTIFIER,
&search.gs_devid) == 0) &&
(zpool_iter(g_zfs_hdl, zfs_agent_iter_pool, &search)
== 1)) {
if (pool_guid == 0)
pool_guid = search.gs_pool_guid;
if (vdev_guid == 0)
vdev_guid = search.gs_vdev_guid;
devtype = search.gs_vdev_type;
}
}
(void) nvlist_lookup_string(nvl, DEV_IDENTIFIER,
&search.gs_devid);
(void) zpool_iter(g_zfs_hdl, zfs_agent_iter_pool, &search);
pool_guid = search.gs_pool_guid;
vdev_guid = search.gs_vdev_guid;
devtype = search.gs_vdev_type;
/*
* We want to avoid reporting "remove" events coming from
@@ -392,7 +385,6 @@ zfs_agent_init(libzfs_handle_t *zfs_hdl)
list_destroy(&agent_events);
zed_log_die("Failed to initialize agents");
}
pthread_setname_np(g_agents_tid, "agents");
}
void
+7 -44
View File
@@ -435,15 +435,7 @@ zfs_process_add(zpool_handle_t *zhp, nvlist_t *vdev, boolean_t labeled)
return;
}
/*
* Prefer sequential resilvering when supported (mirrors and dRAID),
* otherwise fallback to a traditional healing resilver.
*/
ret = zpool_vdev_attach(zhp, fullpath, path, nvroot, B_TRUE, B_TRUE);
if (ret != 0) {
ret = zpool_vdev_attach(zhp, fullpath, path, nvroot,
B_TRUE, B_FALSE);
}
ret = zpool_vdev_attach(zhp, fullpath, path, nvroot, B_TRUE, B_FALSE);
zed_log_msg(LOG_INFO, " zpool_vdev_replace: %s with %s (%s)",
fullpath, path, (ret == 0) ? "no errors" :
@@ -640,27 +632,6 @@ devid_iter(const char *devid, zfs_process_func_t func, boolean_t is_slice)
return (data.dd_found);
}
/*
* Given a device guid, find any vdevs with a matching guid.
*/
static boolean_t
guid_iter(uint64_t pool_guid, uint64_t vdev_guid, const char *devid,
zfs_process_func_t func, boolean_t is_slice)
{
dev_data_t data = { 0 };
data.dd_func = func;
data.dd_found = B_FALSE;
data.dd_pool_guid = pool_guid;
data.dd_vdev_guid = vdev_guid;
data.dd_islabeled = is_slice;
data.dd_new_devid = devid;
(void) zpool_iter(g_zfshdl, zfs_iter_pool, &data);
return (data.dd_found);
}
/*
* Handle a EC_DEV_ADD.ESC_DISK event.
*
@@ -684,18 +655,15 @@ static int
zfs_deliver_add(nvlist_t *nvl, boolean_t is_lofi)
{
char *devpath = NULL, *devid;
uint64_t pool_guid = 0, vdev_guid = 0;
boolean_t is_slice;
/*
* Expecting a devid string and an optional physical location and guid
* Expecting a devid string and an optional physical location
*/
if (nvlist_lookup_string(nvl, DEV_IDENTIFIER, &devid) != 0)
return (-1);
(void) nvlist_lookup_string(nvl, DEV_PHYS_PATH, &devpath);
(void) nvlist_lookup_uint64(nvl, ZFS_EV_POOL_GUID, &pool_guid);
(void) nvlist_lookup_uint64(nvl, ZFS_EV_VDEV_GUID, &vdev_guid);
is_slice = (nvlist_lookup_boolean(nvl, DEV_IS_PART) == 0);
@@ -706,16 +674,12 @@ zfs_deliver_add(nvlist_t *nvl, boolean_t is_lofi)
* Iterate over all vdevs looking for a match in the following order:
* 1. ZPOOL_CONFIG_DEVID (identifies the unique disk)
* 2. ZPOOL_CONFIG_PHYS_PATH (identifies disk physical location).
* 3. ZPOOL_CONFIG_GUID (identifies unique vdev).
*
* For disks, we only want to pay attention to vdevs marked as whole
* disks or are a multipath device.
*/
if (devid_iter(devid, zfs_process_add, is_slice))
return (0);
if (devpath != NULL && devphys_iter(devpath, devid, zfs_process_add,
is_slice))
return (0);
if (vdev_guid != 0)
(void) guid_iter(pool_guid, vdev_guid, devid, zfs_process_add,
is_slice);
if (!devid_iter(devid, zfs_process_add, is_slice) && devpath != NULL)
(void) devphys_iter(devpath, devid, zfs_process_add, is_slice);
return (0);
}
@@ -946,7 +910,6 @@ zfs_slm_init()
return (-1);
}
pthread_setname_np(g_zfs_tid, "enum-pools");
list_create(&g_device_list, sizeof (struct pendingdev),
offsetof(struct pendingdev, pd_node));
+3 -10
View File
@@ -219,18 +219,12 @@ replace_with_spare(fmd_hdl_t *hdl, zpool_handle_t *zhp, nvlist_t *vdev)
* replace it.
*/
for (s = 0; s < nspares; s++) {
boolean_t rebuild = B_FALSE;
char *spare_name, *type;
char *spare_name;
if (nvlist_lookup_string(spares[s], ZPOOL_CONFIG_PATH,
&spare_name) != 0)
continue;
/* prefer sequential resilvering for distributed spares */
if ((nvlist_lookup_string(spares[s], ZPOOL_CONFIG_TYPE,
&type) == 0) && strcmp(type, VDEV_TYPE_DRAID_SPARE) == 0)
rebuild = B_TRUE;
/* if set, add the "ashift" pool property to the spare nvlist */
if (source != ZPROP_SRC_DEFAULT)
(void) nvlist_add_uint64(spares[s],
@@ -243,7 +237,7 @@ replace_with_spare(fmd_hdl_t *hdl, zpool_handle_t *zhp, nvlist_t *vdev)
dev_name, basename(spare_name));
if (zpool_vdev_attach(zhp, dev_name, spare_name,
replacement, B_TRUE, rebuild) == 0) {
replacement, B_TRUE, B_FALSE) == 0) {
free(dev_name);
nvlist_free(replacement);
return (B_TRUE);
@@ -334,7 +328,7 @@ zfs_retire_recv(fmd_hdl_t *hdl, fmd_event_t *ep, nvlist_t *nvl,
*/
if (strcmp(class, "resource.fs.zfs.removed") == 0 ||
(strcmp(class, "resource.fs.zfs.statechange") == 0 &&
(state == VDEV_STATE_REMOVED || state == VDEV_STATE_FAULTED))) {
state == VDEV_STATE_REMOVED)) {
char *devtype;
char *devname;
@@ -505,7 +499,6 @@ zfs_retire_recv(fmd_hdl_t *hdl, fmd_event_t *ep, nvlist_t *nvl,
* Attempt to substitute a hot spare.
*/
(void) replace_with_spare(hdl, zhp, vdev);
zpool_close(zhp);
}
+23 -25
View File
@@ -3,7 +3,7 @@
*
* Developed at Lawrence Livermore National Laboratory (LLNL-CODE-403049).
* Copyright (C) 2013-2014 Lawrence Livermore National Security, LLC.
* Refer to the OpenZFS git commit log for authoritative copyright attribution.
* Refer to the ZoL git commit log for authoritative copyright attribution.
*
* The contents of this file are subject to the terms of the
* Common Development and Distribution License Version 1.0 (CDDL-1.0).
@@ -60,8 +60,8 @@ _setup_sig_handlers(void)
zed_log_die("Failed to initialize sigset");
sa.sa_flags = SA_RESTART;
sa.sa_handler = SIG_IGN;
if (sigaction(SIGPIPE, &sa, NULL) < 0)
zed_log_die("Failed to ignore SIGPIPE");
@@ -75,10 +75,6 @@ _setup_sig_handlers(void)
sa.sa_handler = _hup_handler;
if (sigaction(SIGHUP, &sa, NULL) < 0)
zed_log_die("Failed to register SIGHUP handler");
(void) sigaddset(&sa.sa_mask, SIGCHLD);
if (pthread_sigmask(SIG_BLOCK, &sa.sa_mask, NULL) < 0)
zed_log_die("Failed to block SIGCHLD");
}
/*
@@ -216,20 +212,22 @@ _finish_daemonize(void)
int
main(int argc, char *argv[])
{
struct zed_conf zcp;
struct zed_conf *zcp;
uint64_t saved_eid;
int64_t saved_etime[2];
zed_log_init(argv[0]);
zed_log_stderr_open(LOG_NOTICE);
zed_conf_init(&zcp);
zed_conf_parse_opts(&zcp, argc, argv);
if (zcp.do_verbose)
zcp = zed_conf_create();
zed_conf_parse_opts(zcp, argc, argv);
if (zcp->do_verbose)
zed_log_stderr_open(LOG_INFO);
if (geteuid() != 0)
zed_log_die("Must be run as root");
zed_conf_parse_file(zcp);
zed_file_close_from(STDERR_FILENO + 1);
(void) umask(0);
@@ -237,32 +235,32 @@ main(int argc, char *argv[])
if (chdir("/") < 0)
zed_log_die("Failed to change to root directory");
if (zed_conf_scan_dir(&zcp) < 0)
if (zed_conf_scan_dir(zcp) < 0)
exit(EXIT_FAILURE);
if (!zcp.do_foreground) {
if (!zcp->do_foreground) {
_start_daemonize();
zed_log_syslog_open(LOG_DAEMON);
}
_setup_sig_handlers();
if (zcp.do_memlock)
if (zcp->do_memlock)
_lock_memory();
if ((zed_conf_write_pid(&zcp) < 0) && (!zcp.do_force))
if ((zed_conf_write_pid(zcp) < 0) && (!zcp->do_force))
exit(EXIT_FAILURE);
if (!zcp.do_foreground)
if (!zcp->do_foreground)
_finish_daemonize();
zed_log_msg(LOG_NOTICE,
"ZFS Event Daemon %s-%s (PID %d)",
ZFS_META_VERSION, ZFS_META_RELEASE, (int)getpid());
if (zed_conf_open_state(&zcp) < 0)
if (zed_conf_open_state(zcp) < 0)
exit(EXIT_FAILURE);
if (zed_conf_read_state(&zcp, &saved_eid, saved_etime) < 0)
if (zed_conf_read_state(zcp, &saved_eid, saved_etime) < 0)
exit(EXIT_FAILURE);
idle:
@@ -271,24 +269,24 @@ idle:
* successful.
*/
do {
if (!zed_event_init(&zcp))
if (!zed_event_init(zcp))
break;
/* Wait for some time and try again. tunable? */
sleep(30);
} while (!_got_exit && zcp.do_idle);
} while (!_got_exit && zcp->do_idle);
if (_got_exit)
goto out;
zed_event_seek(&zcp, saved_eid, saved_etime);
zed_event_seek(zcp, saved_eid, saved_etime);
while (!_got_exit) {
int rv;
if (_got_hup) {
_got_hup = 0;
(void) zed_conf_scan_dir(&zcp);
(void) zed_conf_scan_dir(zcp);
}
rv = zed_event_service(&zcp);
rv = zed_event_service(zcp);
/* ENODEV: When kernel module is unloaded (osx) */
if (rv == ENODEV)
@@ -296,13 +294,13 @@ idle:
}
zed_log_msg(LOG_NOTICE, "Exiting");
zed_event_fini(&zcp);
zed_event_fini(zcp);
if (zcp.do_idle && !_got_exit)
if (zcp->do_idle && !_got_exit)
goto idle;
out:
zed_conf_destroy(&zcp);
zed_conf_destroy(zcp);
zed_log_fini();
exit(EXIT_SUCCESS);
}
-4
View File
@@ -1,6 +1,5 @@
include $(top_srcdir)/config/Rules.am
include $(top_srcdir)/config/Substfiles.am
include $(top_srcdir)/config/Shellcheck.am
EXTRA_DIST += README
@@ -52,6 +51,3 @@ install-data-hook:
ln -s "$(zedexecdir)/$${f}" "$(DESTDIR)$(zedconfdir)"; \
done
chmod 0600 "$(DESTDIR)$(zedconfdir)/zed.rc"
# False positive: 1>&"${ZED_FLOCK_FD}" looks suspiciously similar to a >&filename bash extension
CHECKBASHISMS_IGNORE = -e 'should be >word 2>&1' -e '&"$${ZED_FLOCK_FD}"'
+10 -6
View File
@@ -12,11 +12,15 @@
zed_exit_if_ignoring_this_event
zed_lock "${ZED_DEBUG_LOG}"
{
printenv | sort
echo
} 1>&"${ZED_FLOCK_FD}"
zed_unlock "${ZED_DEBUG_LOG}"
lockfile="$(basename -- "${ZED_DEBUG_LOG}").lock"
umask 077
zed_lock "${lockfile}"
exec >> "${ZED_DEBUG_LOG}"
printenv | sort
echo
exec >&-
zed_unlock "${lockfile}"
exit 0
-1
View File
@@ -42,7 +42,6 @@ fi
msg="${msg} delay=$((ZEVENT_ZIO_DELAY / 1000000))ms"
# list the bookmark data together
# shellcheck disable=SC2153
[ -n "${ZEVENT_ZIO_OBJSET}" ] && \
msg="${msg} bookmark=${ZEVENT_ZIO_OBJSET}:${ZEVENT_ZIO_OBJECT}:${ZEVENT_ZIO_LEVEL}:${ZEVENT_ZIO_BLKID}"
+1 -1
View File
@@ -25,7 +25,7 @@ zed_rate_limit "${rate_limit_tag}" || exit 3
umask 077
note_subject="ZFS ${ZEVENT_SUBCLASS} error for ${ZEVENT_POOL} on $(hostname)"
note_pathname="$(mktemp)"
note_pathname="${TMPDIR:="/tmp"}/$(basename -- "$0").${ZEVENT_EID}.$$"
{
echo "ZFS has detected a data error:"
echo
+1 -1
View File
@@ -31,7 +31,7 @@ umask 077
pool_str="${ZEVENT_POOL:+" for ${ZEVENT_POOL}"}"
host_str=" on $(hostname)"
note_subject="ZFS ${ZEVENT_SUBCLASS} event${pool_str}${host_str}"
note_pathname="$(mktemp)"
note_pathname="${TMPDIR:="/tmp"}/$(basename -- "$0").${ZEVENT_EID}.$$"
{
echo "ZFS has posted the following event:"
echo
@@ -3,8 +3,9 @@
# Track changes to enumerated pools for use in early-boot
set -ef
FSLIST="@sysconfdir@/zfs/zfs-list.cache/${ZEVENT_POOL}"
FSLIST_TMP="@runstatedir@/zfs-list.cache@${ZEVENT_POOL}"
FSLIST_DIR="@sysconfdir@/zfs/zfs-list.cache"
FSLIST_TMP="@runstatedir@/zfs-list.cache.new"
FSLIST="${FSLIST_DIR}/${ZEVENT_POOL}"
# If the pool specific cache file is not writeable, abort
[ -w "${FSLIST}" ] || exit 0
@@ -13,20 +14,20 @@ FSLIST_TMP="@runstatedir@/zfs-list.cache@${ZEVENT_POOL}"
. "${ZED_ZEDLET_DIR}/zed-functions.sh"
[ "$ZEVENT_SUBCLASS" != "history_event" ] && exit 0
zed_check_cmd "${ZFS}" sort diff
zed_check_cmd "${ZFS}" sort diff grep
# If we are acting on a snapshot, we have nothing to do
[ "${ZEVENT_HISTORY_DSNAME%@*}" = "${ZEVENT_HISTORY_DSNAME}" ] || exit 0
printf '%s' "${ZEVENT_HISTORY_DSNAME}" | grep '@' && exit 0
# We lock the output file to avoid simultaneous writes.
# We obtain a lock on zfs-list to avoid any simultaneous writes.
# If we run into trouble, log and drop the lock
abort_alter() {
zed_log_msg "Error updating zfs-list.cache for ${ZEVENT_POOL}!"
zed_unlock "${FSLIST}"
zed_log_msg "Error updating zfs-list.cache!"
zed_unlock zfs-list
}
finished() {
zed_unlock "${FSLIST}"
zed_unlock zfs-list
trap - EXIT
exit 0
}
@@ -36,7 +37,7 @@ case "${ZEVENT_HISTORY_INTERNAL_NAME}" in
;;
export)
zed_lock "${FSLIST}"
zed_lock zfs-list
trap abort_alter EXIT
echo > "${FSLIST}"
finished
@@ -62,7 +63,7 @@ case "${ZEVENT_HISTORY_INTERNAL_NAME}" in
;;
esac
zed_lock "${FSLIST}"
zed_lock zfs-list
trap abort_alter EXIT
PROPS="name,mountpoint,canmount,atime,relatime,devices,exec\
@@ -78,7 +79,7 @@ PROPS="name,mountpoint,canmount,atime,relatime,devices,exec\
sort "${FSLIST_TMP}" -o "${FSLIST_TMP}"
# Don't modify the file if it hasn't changed
diff -q "${FSLIST_TMP}" "${FSLIST}" || cat "${FSLIST_TMP}" > "${FSLIST}"
diff -q "${FSLIST_TMP}" "${FSLIST}" || mv "${FSLIST_TMP}" "${FSLIST}"
rm -f "${FSLIST_TMP}"
finished
+1 -1
View File
@@ -41,7 +41,7 @@ fi
umask 077
note_subject="ZFS ${ZEVENT_SUBCLASS} event for ${ZEVENT_POOL} on $(hostname)"
note_pathname="$(mktemp)"
note_pathname="${TMPDIR:="/tmp"}/$(basename -- "$0").${ZEVENT_EID}.$$"
{
echo "ZFS has finished a ${action}:"
echo
+68 -5
View File
@@ -29,7 +29,8 @@
[ -f "${ZED_ZEDLET_DIR}/zed.rc" ] && . "${ZED_ZEDLET_DIR}/zed.rc"
. "${ZED_ZEDLET_DIR}/zed-functions.sh"
if [ ! -d /sys/class/enclosure ] ; then
if [ ! -d /sys/class/enclosure ] && [ ! -d /sys/bus/pci/slots ] ; then
# No JBOD enclosure or NVMe slots
exit 1
fi
@@ -92,6 +93,29 @@ check_and_set_led()
done
}
# Fault LEDs for JBODs and NVMe drives are handled a little differently.
#
# On JBODs the fault LED is called 'fault' and on a path like this:
#
# /sys/class/enclosure/0:0:1:0/SLOT 10/fault
#
# On NVMe it's called 'attention' and on a path like this:
#
# /sys/bus/pci/slot/0/attention
#
# This function returns the full path to the fault LED file for a given
# enclosure/slot directory.
#
path_to_led()
{
dir=$1
if [ -f "$dir/fault" ] ; then
echo "$dir/fault"
elif [ -f "$dir/attention" ] ; then
echo "$dir/attention"
fi
}
state_to_val()
{
state="$1"
@@ -105,6 +129,38 @@ state_to_val()
esac
}
#
# Given a nvme name like 'nvme0n1', pass back its slot directory
# like "/sys/bus/pci/slots/0"
#
nvme_dev_to_slot()
{
dev="$1"
# Get the address "0000:01:00.0"
address=$(cat "/sys/class/block/$dev/device/address")
# For each /sys/bus/pci/slots subdir that is an actual number
# (rather than weird directories like "1-3/").
# shellcheck disable=SC2010
for i in $(ls /sys/bus/pci/slots/ | grep -E "^[0-9]+$") ; do
this_address=$(cat "/sys/bus/pci/slots/$i/address")
# The format of address is a little different between
# /sys/class/block/$dev/device/address and
# /sys/bus/pci/slots/
#
# address= "0000:01:00.0"
# this_address = "0000:01:00"
#
if echo "$address" | grep -Eq ^"$this_address" ; then
echo "/sys/bus/pci/slots/$i"
break
fi
done
}
# process_pool (pool)
#
# Iterate through a pool and set the vdevs' enclosure slot LEDs to
@@ -134,6 +190,11 @@ process_pool()
# Get dev name (like 'sda')
dev=$(basename "$(echo "$therest" | awk '{print $(NF-1)}')")
vdev_enc_sysfs_path=$(realpath "/sys/class/block/$dev/device/enclosure_device"*)
if [ ! -d "$vdev_enc_sysfs_path" ] ; then
# This is not a JBOD disk, but it could be a PCI NVMe drive
vdev_enc_sysfs_path=$(nvme_dev_to_slot "$dev")
fi
current_val=$(echo "$therest" | awk '{print $NF}')
if [ "$current_val" != "0" ] ; then
@@ -145,9 +206,10 @@ process_pool()
continue
fi
if [ ! -e "$vdev_enc_sysfs_path/fault" ] ; then
led_path=$(path_to_led "$vdev_enc_sysfs_path")
if [ ! -e "$led_path" ] ; then
rc=3
zed_log_msg "vdev $vdev '$file/fault' doesn't exist"
zed_log_msg "vdev $vdev '$led_path' doesn't exist"
continue
fi
@@ -158,7 +220,7 @@ process_pool()
continue
fi
if ! check_and_set_led "$vdev_enc_sysfs_path/fault" "$val"; then
if ! check_and_set_led "$led_path" "$val"; then
rc=3
fi
done
@@ -169,7 +231,8 @@ if [ -n "$ZEVENT_VDEV_ENC_SYSFS_PATH" ] && [ -n "$ZEVENT_VDEV_STATE_STR" ] ; the
# Got a statechange for an individual vdev
val=$(state_to_val "$ZEVENT_VDEV_STATE_STR")
vdev=$(basename "$ZEVENT_VDEV_PATH")
check_and_set_led "$ZEVENT_VDEV_ENC_SYSFS_PATH/fault" "$val"
ledpath=$(path_to_led "$ZEVENT_VDEV_ENC_SYSFS_PATH")
check_and_set_led "$ledpath" "$val"
else
# Process the entire pool
poolname=$(zed_guid_to_pool "$ZEVENT_POOL_GUID")
+1 -1
View File
@@ -37,7 +37,7 @@ fi
umask 077
note_subject="ZFS device fault for pool ${ZEVENT_POOL_GUID} on $(hostname)"
note_pathname="$(mktemp)"
note_pathname="${TMPDIR:="/tmp"}/$(basename -- "$0").${ZEVENT_EID}.$$"
{
if [ "${ZEVENT_VDEV_STATE_STR}" = "FAULTED" ] ; then
echo "The number of I/O errors associated with a ZFS device exceeded"
+1 -1
View File
@@ -19,7 +19,7 @@ zed_check_cmd "${ZPOOL}" || exit 9
umask 077
note_subject="ZFS ${ZEVENT_SUBCLASS} event for ${ZEVENT_POOL} on $(hostname)"
note_pathname="$(mktemp)"
note_pathname="${TMPDIR:="/tmp"}/$(basename -- "$0").${ZEVENT_EID}.$$"
{
echo "ZFS has finished a trim:"
echo
+13 -7
View File
@@ -126,8 +126,10 @@ zed_lock()
# Obtain a lock on the file bound to the given file descriptor.
#
eval "exec ${fd}>> '${lockfile}'"
if ! err="$(flock --exclusive "${fd}" 2>&1)"; then
eval "exec ${fd}> '${lockfile}'"
err="$(flock --exclusive "${fd}" 2>&1)"
# shellcheck disable=SC2181
if [ $? -ne 0 ]; then
zed_log_err "failed to lock \"${lockfile}\": ${err}"
fi
@@ -163,7 +165,9 @@ zed_unlock()
fi
# Release the lock and close the file descriptor.
if ! err="$(flock --unlock "${fd}" 2>&1)"; then
err="$(flock --unlock "${fd}" 2>&1)"
# shellcheck disable=SC2181
if [ $? -ne 0 ]; then
zed_log_err "failed to unlock \"${lockfile}\": ${err}"
fi
eval "exec ${fd}>&-"
@@ -263,7 +267,7 @@ zed_notify_email()
-e "s/@SUBJECT@/${subject}/g")"
# shellcheck disable=SC2086
eval ${ZED_EMAIL_PROG} ${ZED_EMAIL_OPTS} < "${pathname}" >/dev/null 2>&1
eval "${ZED_EMAIL_PROG}" ${ZED_EMAIL_OPTS} < "${pathname}" >/dev/null 2>&1
rv=$?
if [ "${rv}" -ne 0 ]; then
zed_log_err "$(basename "${ZED_EMAIL_PROG}") exit=${rv}"
@@ -363,7 +367,7 @@ zed_notify_pushbullet()
#
# Notification via Slack Webhook <https://api.slack.com/incoming-webhooks>.
# The Webhook URL (ZED_SLACK_WEBHOOK_URL) identifies this client to the
# Slack channel.
# Slack channel.
#
# Requires awk, curl, and sed executables to be installed in the standard PATH.
#
@@ -507,8 +511,10 @@ zed_guid_to_pool()
return
fi
guid="$(printf "%u" "$1")"
$ZPOOL get -H -ovalue,name guid | awk '$1 == '"$guid"' {print $2; exit}'
guid=$(printf "%llu" "$1")
if [ -n "$guid" ] ; then
$ZPOOL get -H -ovalue,name guid | awk '$1=='"$guid"' {print $2}'
fi
}
# zed_exit_if_ignoring_this_event
+2 -2
View File
@@ -89,8 +89,8 @@
##
# Turn on/off enclosure LEDs when drives get DEGRADED/FAULTED. This works for
# device mapper and multipath devices as well. Your enclosure must be
# supported by the Linux SES driver for this to work.
# device mapper and multipath devices as well. This works with JBOD enclosures
# and NVMe PCI drives (assuming they're supported by Linux in sysfs).
#
ZED_USE_ENCLOSURE_LEDS=1
+16 -1
View File
@@ -3,7 +3,7 @@
*
* Developed at Lawrence Livermore National Laboratory (LLNL-CODE-403049).
* Copyright (C) 2013-2014 Lawrence Livermore National Security, LLC.
* Refer to the OpenZFS git commit log for authoritative copyright attribution.
* Refer to the ZoL git commit log for authoritative copyright attribution.
*
* The contents of this file are subject to the terms of the
* Common Development and Distribution License Version 1.0 (CDDL-1.0).
@@ -15,6 +15,11 @@
#ifndef ZED_H
#define ZED_H
/*
* Absolute path for the default zed configuration file.
*/
#define ZED_CONF_FILE SYSCONFDIR "/zfs/zed.conf"
/*
* Absolute path for the default zed pid file.
*/
@@ -30,6 +35,16 @@
*/
#define ZED_ZEDLET_DIR SYSCONFDIR "/zfs/zed.d"
/*
* Reserved for future use.
*/
#define ZED_MAX_EVENTS 0
/*
* Reserved for future use.
*/
#define ZED_MIN_EVENTS 0
/*
* String prefix for ZED variables passed via environment variables.
*/
+124 -94
View File
@@ -3,7 +3,7 @@
*
* Developed at Lawrence Livermore National Laboratory (LLNL-CODE-403049).
* Copyright (C) 2013-2014 Lawrence Livermore National Security, LLC.
* Refer to the OpenZFS git commit log for authoritative copyright attribution.
* Refer to the ZoL git commit log for authoritative copyright attribution.
*
* The contents of this file are subject to the terms of the
* Common Development and Distribution License Version 1.0 (CDDL-1.0).
@@ -32,26 +32,43 @@
#include "zed_strings.h"
/*
* Initialise the configuration with default values.
* Return a new configuration with default values.
*/
void
zed_conf_init(struct zed_conf *zcp)
struct zed_conf *
zed_conf_create(void)
{
memset(zcp, 0, sizeof (*zcp));
struct zed_conf *zcp;
/* zcp->zfs_hdl opened in zed_event_init() */
/* zcp->zedlets created in zed_conf_scan_dir() */
zcp = calloc(1, sizeof (*zcp));
if (!zcp)
goto nomem;
zcp->pid_fd = -1; /* opened in zed_conf_write_pid() */
zcp->state_fd = -1; /* opened in zed_conf_open_state() */
zcp->zevent_fd = -1; /* opened in zed_event_init() */
zcp->syslog_facility = LOG_DAEMON;
zcp->min_events = ZED_MIN_EVENTS;
zcp->max_events = ZED_MAX_EVENTS;
zcp->pid_fd = -1;
zcp->zedlets = NULL; /* created via zed_conf_scan_dir() */
zcp->state_fd = -1; /* opened via zed_conf_open_state() */
zcp->zfs_hdl = NULL; /* opened via zed_event_init() */
zcp->zevent_fd = -1; /* opened via zed_event_init() */
zcp->max_jobs = 16;
if (!(zcp->conf_file = strdup(ZED_CONF_FILE)))
goto nomem;
if (!(zcp->pid_file = strdup(ZED_PID_FILE)) ||
!(zcp->zedlet_dir = strdup(ZED_ZEDLET_DIR)) ||
!(zcp->state_file = strdup(ZED_STATE_FILE)))
zed_log_die("Failed to create conf: %s", strerror(errno));
if (!(zcp->pid_file = strdup(ZED_PID_FILE)))
goto nomem;
if (!(zcp->zedlet_dir = strdup(ZED_ZEDLET_DIR)))
goto nomem;
if (!(zcp->state_file = strdup(ZED_STATE_FILE)))
goto nomem;
return (zcp);
nomem:
zed_log_die("Failed to create conf: %s", strerror(errno));
return (NULL);
}
/*
@@ -62,6 +79,9 @@ zed_conf_init(struct zed_conf *zcp)
void
zed_conf_destroy(struct zed_conf *zcp)
{
if (!zcp)
return;
if (zcp->state_fd >= 0) {
if (close(zcp->state_fd) < 0)
zed_log_msg(LOG_WARNING,
@@ -82,6 +102,10 @@ zed_conf_destroy(struct zed_conf *zcp)
zcp->pid_file, strerror(errno));
zcp->pid_fd = -1;
}
if (zcp->conf_file) {
free(zcp->conf_file);
zcp->conf_file = NULL;
}
if (zcp->pid_file) {
free(zcp->pid_file);
zcp->pid_file = NULL;
@@ -98,6 +122,7 @@ zed_conf_destroy(struct zed_conf *zcp)
zed_strings_destroy(zcp->zedlets);
zcp->zedlets = NULL;
}
free(zcp);
}
/*
@@ -107,52 +132,46 @@ zed_conf_destroy(struct zed_conf *zcp)
* otherwise, output to stderr and exit with a failure status.
*/
static void
_zed_conf_display_help(const char *prog, boolean_t got_err)
_zed_conf_display_help(const char *prog, int got_err)
{
struct opt { const char *o, *d, *v; };
FILE *fp = got_err ? stderr : stdout;
struct opt *oo;
struct opt iopts[] = {
{ .o = "-h", .d = "Display help" },
{ .o = "-L", .d = "Display license information" },
{ .o = "-V", .d = "Display version information" },
{},
};
struct opt nopts[] = {
{ .o = "-v", .d = "Be verbose" },
{ .o = "-f", .d = "Force daemon to run" },
{ .o = "-F", .d = "Run daemon in the foreground" },
{ .o = "-I",
.d = "Idle daemon until kernel module is (re)loaded" },
{ .o = "-M", .d = "Lock all pages in memory" },
{ .o = "-P", .d = "$PATH for ZED to use (only used by ZTS)" },
{ .o = "-Z", .d = "Zero state file" },
{},
};
struct opt vopts[] = {
{ .o = "-d DIR", .d = "Read enabled ZEDLETs from DIR.",
.v = ZED_ZEDLET_DIR },
{ .o = "-p FILE", .d = "Write daemon's PID to FILE.",
.v = ZED_PID_FILE },
{ .o = "-s FILE", .d = "Write daemon's state to FILE.",
.v = ZED_STATE_FILE },
{ .o = "-j JOBS", .d = "Start at most JOBS at once.",
.v = "16" },
{},
};
int w1 = 4; /* width of leading whitespace */
int w2 = 8; /* width of L-justified option field */
fprintf(fp, "Usage: %s [OPTION]...\n", (prog ? prog : "zed"));
fprintf(fp, "\n");
for (oo = iopts; oo->o; ++oo)
fprintf(fp, " %*s %s\n", -8, oo->o, oo->d);
fprintf(fp, "%*c%*s %s\n", w1, 0x20, -w2, "-h",
"Display help.");
fprintf(fp, "%*c%*s %s\n", w1, 0x20, -w2, "-L",
"Display license information.");
fprintf(fp, "%*c%*s %s\n", w1, 0x20, -w2, "-V",
"Display version information.");
fprintf(fp, "\n");
for (oo = nopts; oo->o; ++oo)
fprintf(fp, " %*s %s\n", -8, oo->o, oo->d);
fprintf(fp, "%*c%*s %s\n", w1, 0x20, -w2, "-v",
"Be verbose.");
fprintf(fp, "%*c%*s %s\n", w1, 0x20, -w2, "-f",
"Force daemon to run.");
fprintf(fp, "%*c%*s %s\n", w1, 0x20, -w2, "-F",
"Run daemon in the foreground.");
fprintf(fp, "%*c%*s %s\n", w1, 0x20, -w2, "-I",
"Idle daemon until kernel module is (re)loaded.");
fprintf(fp, "%*c%*s %s\n", w1, 0x20, -w2, "-M",
"Lock all pages in memory.");
fprintf(fp, "%*c%*s %s\n", w1, 0x20, -w2, "-P",
"$PATH for ZED to use (only used by ZTS).");
fprintf(fp, "%*c%*s %s\n", w1, 0x20, -w2, "-Z",
"Zero state file.");
fprintf(fp, "\n");
for (oo = vopts; oo->o; ++oo)
fprintf(fp, " %*s %s [%s]\n", -8, oo->o, oo->d, oo->v);
#if 0
fprintf(fp, "%*c%*s %s [%s]\n", w1, 0x20, -w2, "-c FILE",
"Read configuration from FILE.", ZED_CONF_FILE);
#endif
fprintf(fp, "%*c%*s %s [%s]\n", w1, 0x20, -w2, "-d DIR",
"Read enabled ZEDLETs from DIR.", ZED_ZEDLET_DIR);
fprintf(fp, "%*c%*s %s [%s]\n", w1, 0x20, -w2, "-p FILE",
"Write daemon's PID to FILE.", ZED_PID_FILE);
fprintf(fp, "%*c%*s %s [%s]\n", w1, 0x20, -w2, "-s FILE",
"Write daemon's state to FILE.", ZED_STATE_FILE);
fprintf(fp, "\n");
exit(got_err ? EXIT_FAILURE : EXIT_SUCCESS);
@@ -164,14 +183,20 @@ _zed_conf_display_help(const char *prog, boolean_t got_err)
static void
_zed_conf_display_license(void)
{
printf(
"The ZFS Event Daemon (ZED) is distributed under the terms of the\n"
" Common Development and Distribution License (CDDL-1.0)\n"
" <http://opensource.org/licenses/CDDL-1.0>.\n"
"\n"
const char **pp;
const char *text[] = {
"The ZFS Event Daemon (ZED) is distributed under the terms of the",
" Common Development and Distribution License (CDDL-1.0)",
" <http://opensource.org/licenses/CDDL-1.0>.",
"",
"Developed at Lawrence Livermore National Laboratory"
" (LLNL-CODE-403049).\n"
"\n");
" (LLNL-CODE-403049).",
"",
NULL
};
for (pp = text; *pp; pp++)
printf("%s\n", *pp);
exit(EXIT_SUCCESS);
}
@@ -206,19 +231,16 @@ _zed_conf_parse_path(char **resultp, const char *path)
if (path[0] == '/') {
*resultp = strdup(path);
} else if (!getcwd(buf, sizeof (buf))) {
zed_log_die("Failed to get current working dir: %s",
strerror(errno));
} else if (strlcat(buf, "/", sizeof (buf)) >= sizeof (buf)) {
zed_log_die("Failed to copy path: %s", strerror(ENAMETOOLONG));
} else if (strlcat(buf, path, sizeof (buf)) >= sizeof (buf)) {
zed_log_die("Failed to copy path: %s", strerror(ENAMETOOLONG));
} else {
if (!getcwd(buf, sizeof (buf)))
zed_log_die("Failed to get current working dir: %s",
strerror(errno));
if (strlcat(buf, "/", sizeof (buf)) >= sizeof (buf) ||
strlcat(buf, path, sizeof (buf)) >= sizeof (buf))
zed_log_die("Failed to copy path: %s",
strerror(ENAMETOOLONG));
*resultp = strdup(buf);
}
if (!*resultp)
zed_log_die("Failed to copy path: %s", strerror(ENOMEM));
}
@@ -229,9 +251,8 @@ _zed_conf_parse_path(char **resultp, const char *path)
void
zed_conf_parse_opts(struct zed_conf *zcp, int argc, char **argv)
{
const char * const opts = ":hLVd:p:P:s:vfFMZIj:";
const char * const opts = ":hLVc:d:p:P:s:vfFMZI";
int opt;
unsigned long raw;
if (!zcp || !argv || !argv[0])
zed_log_die("Failed to parse options: Internal error");
@@ -241,7 +262,7 @@ zed_conf_parse_opts(struct zed_conf *zcp, int argc, char **argv)
while ((opt = getopt(argc, argv, opts)) != -1) {
switch (opt) {
case 'h':
_zed_conf_display_help(argv[0], B_FALSE);
_zed_conf_display_help(argv[0], EXIT_SUCCESS);
break;
case 'L':
_zed_conf_display_license();
@@ -249,6 +270,9 @@ zed_conf_parse_opts(struct zed_conf *zcp, int argc, char **argv)
case 'V':
_zed_conf_display_version();
break;
case 'c':
_zed_conf_parse_path(&zcp->conf_file, optarg);
break;
case 'd':
_zed_conf_parse_path(&zcp->zedlet_dir, optarg);
break;
@@ -279,30 +303,31 @@ zed_conf_parse_opts(struct zed_conf *zcp, int argc, char **argv)
case 'Z':
zcp->do_zero = 1;
break;
case 'j':
errno = 0;
raw = strtoul(optarg, NULL, 0);
if (errno == ERANGE || raw > INT16_MAX) {
zed_log_die("%lu is too many jobs", raw);
} if (raw == 0) {
zed_log_die("0 jobs makes no sense");
} else {
zcp->max_jobs = raw;
}
break;
case '?':
default:
if (optopt == '?')
_zed_conf_display_help(argv[0], B_FALSE);
_zed_conf_display_help(argv[0], EXIT_SUCCESS);
fprintf(stderr, "%s: Invalid option '-%c'\n\n",
argv[0], optopt);
_zed_conf_display_help(argv[0], B_TRUE);
fprintf(stderr, "%s: %s '-%c'\n\n", argv[0],
"Invalid option", optopt);
_zed_conf_display_help(argv[0], EXIT_FAILURE);
break;
}
}
}
/*
* Parse the configuration file into the configuration [zcp].
*
* FIXME: Not yet implemented.
*/
void
zed_conf_parse_file(struct zed_conf *zcp)
{
if (!zcp)
zed_log_die("Failed to parse config: %s", strerror(EINVAL));
}
/*
* Scan the [zcp] zedlet_dir for files to exec based on the event class.
* Files must be executable by user, but not writable by group or other.
@@ -310,6 +335,8 @@ zed_conf_parse_opts(struct zed_conf *zcp, int argc, char **argv)
*
* Return 0 on success with an updated set of zedlets,
* or -1 on error with errno set.
*
* FIXME: Check if zedlet_dir and all parent dirs are secure.
*/
int
zed_conf_scan_dir(struct zed_conf *zcp)
@@ -425,6 +452,8 @@ zed_conf_scan_dir(struct zed_conf *zcp)
int
zed_conf_write_pid(struct zed_conf *zcp)
{
const mode_t dirmode = S_IRWXU | S_IRGRP | S_IXGRP | S_IROTH | S_IXOTH;
const mode_t filemode = S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH;
char buf[PATH_MAX];
int n;
char *p;
@@ -452,7 +481,7 @@ zed_conf_write_pid(struct zed_conf *zcp)
if (p)
*p = '\0';
if ((mkdirp(buf, 0755) < 0) && (errno != EEXIST)) {
if ((mkdirp(buf, dirmode) < 0) && (errno != EEXIST)) {
zed_log_msg(LOG_ERR, "Failed to create directory \"%s\": %s",
buf, strerror(errno));
goto err;
@@ -462,7 +491,7 @@ zed_conf_write_pid(struct zed_conf *zcp)
*/
mask = umask(0);
umask(mask | 022);
zcp->pid_fd = open(zcp->pid_file, O_RDWR | O_CREAT | O_CLOEXEC, 0644);
zcp->pid_fd = open(zcp->pid_file, (O_RDWR | O_CREAT), filemode);
umask(mask);
if (zcp->pid_fd < 0) {
zed_log_msg(LOG_ERR, "Failed to open PID file \"%s\": %s",
@@ -499,7 +528,7 @@ zed_conf_write_pid(struct zed_conf *zcp)
errno = ERANGE;
zed_log_msg(LOG_ERR, "Failed to write PID file \"%s\": %s",
zcp->pid_file, strerror(errno));
} else if (write(zcp->pid_fd, buf, n) != n) {
} else if (zed_file_write_n(zcp->pid_fd, buf, n) != n) {
zed_log_msg(LOG_ERR, "Failed to write PID file \"%s\": %s",
zcp->pid_file, strerror(errno));
} else if (fdatasync(zcp->pid_fd) < 0) {
@@ -527,6 +556,7 @@ int
zed_conf_open_state(struct zed_conf *zcp)
{
char dirbuf[PATH_MAX];
mode_t dirmode = S_IRWXU | S_IRGRP | S_IXGRP | S_IROTH | S_IXOTH;
int n;
char *p;
int rv;
@@ -548,7 +578,7 @@ zed_conf_open_state(struct zed_conf *zcp)
if (p)
*p = '\0';
if ((mkdirp(dirbuf, 0755) < 0) && (errno != EEXIST)) {
if ((mkdirp(dirbuf, dirmode) < 0) && (errno != EEXIST)) {
zed_log_msg(LOG_WARNING,
"Failed to create directory \"%s\": %s",
dirbuf, strerror(errno));
@@ -566,7 +596,7 @@ zed_conf_open_state(struct zed_conf *zcp)
(void) unlink(zcp->state_file);
zcp->state_fd = open(zcp->state_file,
O_RDWR | O_CREAT | O_CLOEXEC, 0644);
(O_RDWR | O_CREAT), (S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH));
if (zcp->state_fd < 0) {
zed_log_msg(LOG_WARNING, "Failed to open state file \"%s\": %s",
zcp->state_file, strerror(errno));
+22 -18
View File
@@ -3,7 +3,7 @@
*
* Developed at Lawrence Livermore National Laboratory (LLNL-CODE-403049).
* Copyright (C) 2013-2014 Lawrence Livermore National Security, LLC.
* Refer to the OpenZFS git commit log for authoritative copyright attribution.
* Refer to the ZoL git commit log for authoritative copyright attribution.
*
* The contents of this file are subject to the terms of the
* Common Development and Distribution License Version 1.0 (CDDL-1.0).
@@ -20,39 +20,43 @@
#include "zed_strings.h"
struct zed_conf {
unsigned do_force:1; /* true if force enabled */
unsigned do_foreground:1; /* true if run in foreground */
unsigned do_memlock:1; /* true if locking memory */
unsigned do_verbose:1; /* true if verbosity enabled */
unsigned do_zero:1; /* true if zeroing state */
unsigned do_idle:1; /* true if idle enabled */
int syslog_facility; /* syslog facility value */
int min_events; /* RESERVED FOR FUTURE USE */
int max_events; /* RESERVED FOR FUTURE USE */
char *conf_file; /* abs path to config file */
char *pid_file; /* abs path to pid file */
char *zedlet_dir; /* abs path to zedlet dir */
char *state_file; /* abs path to state file */
libzfs_handle_t *zfs_hdl; /* handle to libzfs */
zed_strings_t *zedlets; /* names of enabled zedlets */
char *path; /* custom $PATH for zedlets to use */
int pid_fd; /* fd to pid file for lock */
char *zedlet_dir; /* abs path to zedlet dir */
zed_strings_t *zedlets; /* names of enabled zedlets */
char *state_file; /* abs path to state file */
int state_fd; /* fd to state file */
libzfs_handle_t *zfs_hdl; /* handle to libzfs */
int zevent_fd; /* fd for access to zevents */
int16_t max_jobs; /* max zedlets to run at one time */
boolean_t do_force:1; /* true if force enabled */
boolean_t do_foreground:1; /* true if run in foreground */
boolean_t do_memlock:1; /* true if locking memory */
boolean_t do_verbose:1; /* true if verbosity enabled */
boolean_t do_zero:1; /* true if zeroing state */
boolean_t do_idle:1; /* true if idle enabled */
char *path; /* custom $PATH for zedlets to use */
};
void zed_conf_init(struct zed_conf *zcp);
struct zed_conf *zed_conf_create(void);
void zed_conf_destroy(struct zed_conf *zcp);
void zed_conf_parse_opts(struct zed_conf *zcp, int argc, char **argv);
void zed_conf_parse_file(struct zed_conf *zcp);
int zed_conf_scan_dir(struct zed_conf *zcp);
int zed_conf_write_pid(struct zed_conf *zcp);
int zed_conf_open_state(struct zed_conf *zcp);
int zed_conf_read_state(struct zed_conf *zcp, uint64_t *eidp, int64_t etime[]);
int zed_conf_write_state(struct zed_conf *zcp, uint64_t eid, int64_t etime[]);
#endif /* !ZED_CONF_H */
-3
View File
@@ -72,8 +72,6 @@ zed_udev_event(const char *class, const char *subclass, nvlist_t *nvl)
zed_log_msg(LOG_INFO, "\t%s: %s", DEV_PATH, strval);
if (nvlist_lookup_string(nvl, DEV_IDENTIFIER, &strval) == 0)
zed_log_msg(LOG_INFO, "\t%s: %s", DEV_IDENTIFIER, strval);
if (nvlist_lookup_boolean(nvl, DEV_IS_PART) == B_TRUE)
zed_log_msg(LOG_INFO, "\t%s: B_TRUE", DEV_IS_PART);
if (nvlist_lookup_string(nvl, DEV_PHYS_PATH, &strval) == 0)
zed_log_msg(LOG_INFO, "\t%s: %s", DEV_PHYS_PATH, strval);
if (nvlist_lookup_uint64(nvl, DEV_SIZE, &numval) == 0)
@@ -381,7 +379,6 @@ zed_disk_event_init()
return (-1);
}
pthread_setname_np(g_mon_tid, "udev monitor");
zed_log_msg(LOG_INFO, "zed_disk_event_init");
return (0);
+30 -52
View File
@@ -3,7 +3,7 @@
*
* Developed at Lawrence Livermore National Laboratory (LLNL-CODE-403049).
* Copyright (C) 2013-2014 Lawrence Livermore National Security, LLC.
* Refer to the OpenZFS git commit log for authoritative copyright attribution.
* Refer to the ZoL git commit log for authoritative copyright attribution.
*
* The contents of this file are subject to the terms of the
* Common Development and Distribution License Version 1.0 (CDDL-1.0).
@@ -15,7 +15,7 @@
#include <ctype.h>
#include <errno.h>
#include <fcntl.h>
#include <libzfs_core.h>
#include <libzfs.h> /* FIXME: Replace with libzfs_core. */
#include <paths.h>
#include <stdarg.h>
#include <stdio.h>
@@ -54,7 +54,7 @@ zed_event_init(struct zed_conf *zcp)
zed_log_die("Failed to initialize libzfs");
}
zcp->zevent_fd = open(ZFS_DEV, O_RDWR | O_CLOEXEC);
zcp->zevent_fd = open(ZFS_DEV, O_RDWR);
if (zcp->zevent_fd < 0) {
if (zcp->do_idle)
return (-1);
@@ -96,47 +96,6 @@ zed_event_fini(struct zed_conf *zcp)
libzfs_fini(zcp->zfs_hdl);
zcp->zfs_hdl = NULL;
}
zed_exec_fini();
}
static void
_bump_event_queue_length(void)
{
int zzlm = -1, wr;
char qlen_buf[12] = {0}; /* parameter is int => max "-2147483647\n" */
long int qlen;
zzlm = open("/sys/module/zfs/parameters/zfs_zevent_len_max", O_RDWR);
if (zzlm < 0)
goto done;
if (read(zzlm, qlen_buf, sizeof (qlen_buf)) < 0)
goto done;
qlen_buf[sizeof (qlen_buf) - 1] = '\0';
errno = 0;
qlen = strtol(qlen_buf, NULL, 10);
if (errno == ERANGE)
goto done;
if (qlen <= 0)
qlen = 512; /* default zfs_zevent_len_max value */
else
qlen *= 2;
if (qlen > INT_MAX)
qlen = INT_MAX;
wr = snprintf(qlen_buf, sizeof (qlen_buf), "%ld", qlen);
if (pwrite(zzlm, qlen_buf, wr, 0) < 0)
goto done;
zed_log_msg(LOG_WARNING, "Bumping queue length to %ld", qlen);
done:
if (zzlm > -1)
(void) close(zzlm);
}
/*
@@ -177,7 +136,10 @@ zed_event_seek(struct zed_conf *zcp, uint64_t saved_eid, int64_t saved_etime[])
if (n_dropped > 0) {
zed_log_msg(LOG_WARNING, "Missed %d events", n_dropped);
_bump_event_queue_length();
/*
* FIXME: Increase max size of event nvlist in
* /sys/module/zfs/parameters/zfs_zevent_len_max ?
*/
}
if (nvlist_lookup_uint64(nvl, "eid", &eid) != 0) {
zed_log_msg(LOG_WARNING, "Failed to lookup zevent eid");
@@ -249,7 +211,7 @@ _zed_event_value_is_hex(const char *name)
*
* All environment variables in [zsp] should be added through this function.
*/
static __attribute__((format(printf, 5, 6))) int
static int
_zed_event_add_var(uint64_t eid, zed_strings_t *zsp,
const char *prefix, const char *name, const char *fmt, ...)
{
@@ -624,6 +586,8 @@ _zed_event_add_string_array(uint64_t eid, zed_strings_t *zsp,
* Convert the nvpair [nvp] to a string which is added to the environment
* of the child process.
* Return 0 on success, -1 on error.
*
* FIXME: Refactor with cmd/zpool/zpool_main.c:zpool_do_events_nvprint()?
*/
static void
_zed_event_add_nvpair(uint64_t eid, zed_strings_t *zsp, nvpair_t *nvp)
@@ -722,11 +686,23 @@ _zed_event_add_nvpair(uint64_t eid, zed_strings_t *zsp, nvpair_t *nvp)
_zed_event_add_var(eid, zsp, prefix, name,
"%llu", (u_longlong_t)i64);
break;
case DATA_TYPE_NVLIST:
_zed_event_add_var(eid, zsp, prefix, name,
"%s", "_NOT_IMPLEMENTED_"); /* FIXME */
break;
case DATA_TYPE_STRING:
(void) nvpair_value_string(nvp, &str);
_zed_event_add_var(eid, zsp, prefix, name,
"%s", (str ? str : "<NULL>"));
break;
case DATA_TYPE_BOOLEAN_ARRAY:
_zed_event_add_var(eid, zsp, prefix, name,
"%s", "_NOT_IMPLEMENTED_"); /* FIXME */
break;
case DATA_TYPE_BYTE_ARRAY:
_zed_event_add_var(eid, zsp, prefix, name,
"%s", "_NOT_IMPLEMENTED_"); /* FIXME */
break;
case DATA_TYPE_INT8_ARRAY:
_zed_event_add_int8_array(eid, zsp, prefix, nvp);
break;
@@ -754,11 +730,9 @@ _zed_event_add_nvpair(uint64_t eid, zed_strings_t *zsp, nvpair_t *nvp)
case DATA_TYPE_STRING_ARRAY:
_zed_event_add_string_array(eid, zsp, prefix, nvp);
break;
case DATA_TYPE_NVLIST:
case DATA_TYPE_BOOLEAN_ARRAY:
case DATA_TYPE_BYTE_ARRAY:
case DATA_TYPE_NVLIST_ARRAY:
_zed_event_add_var(eid, zsp, prefix, name, "_NOT_IMPLEMENTED_");
_zed_event_add_var(eid, zsp, prefix, name,
"%s", "_NOT_IMPLEMENTED_"); /* FIXME */
break;
default:
errno = EINVAL;
@@ -938,7 +912,10 @@ zed_event_service(struct zed_conf *zcp)
if (n_dropped > 0) {
zed_log_msg(LOG_WARNING, "Missed %d events", n_dropped);
_bump_event_queue_length();
/*
* FIXME: Increase max size of event nvlist in
* /sys/module/zfs/parameters/zfs_zevent_len_max ?
*/
}
if (nvlist_lookup_uint64(nvl, "eid", &eid) != 0) {
zed_log_msg(LOG_WARNING, "Failed to lookup zevent eid");
@@ -976,7 +953,8 @@ zed_event_service(struct zed_conf *zcp)
_zed_event_add_time_strings(eid, zsp, etime);
zed_exec_process(eid, class, subclass, zcp, zsp);
zed_exec_process(eid, class, subclass,
zcp->zedlet_dir, zcp->zedlets, zsp, zcp->zevent_fd);
zed_conf_write_state(zcp, eid, etime);
+1 -1
View File
@@ -3,7 +3,7 @@
*
* Developed at Lawrence Livermore National Laboratory (LLNL-CODE-403049).
* Copyright (C) 2013-2014 Lawrence Livermore National Security, LLC.
* Refer to the OpenZFS git commit log for authoritative copyright attribution.
* Refer to the ZoL git commit log for authoritative copyright attribution.
*
* The contents of this file are subject to the terms of the
* Common Development and Distribution License Version 1.0 (CDDL-1.0).
+60 -195
View File
@@ -3,7 +3,7 @@
*
* Developed at Lawrence Livermore National Laboratory (LLNL-CODE-403049).
* Copyright (C) 2013-2014 Lawrence Livermore National Security, LLC.
* Refer to the OpenZFS git commit log for authoritative copyright attribution.
* Refer to the ZoL git commit log for authoritative copyright attribution.
*
* The contents of this file are subject to the terms of the
* Common Development and Distribution License Version 1.0 (CDDL-1.0).
@@ -18,53 +18,17 @@
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <stddef.h>
#include <sys/avl.h>
#include <sys/resource.h>
#include <sys/stat.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>
#include <pthread.h>
#include "zed_exec.h"
#include "zed_file.h"
#include "zed_log.h"
#include "zed_strings.h"
#define ZEVENT_FILENO 3
struct launched_process_node {
avl_node_t node;
pid_t pid;
uint64_t eid;
char *name;
};
static int
_launched_process_node_compare(const void *x1, const void *x2)
{
pid_t p1;
pid_t p2;
assert(x1 != NULL);
assert(x2 != NULL);
p1 = ((const struct launched_process_node *) x1)->pid;
p2 = ((const struct launched_process_node *) x2)->pid;
if (p1 < p2)
return (-1);
else if (p1 == p2)
return (0);
else
return (1);
}
static pthread_t _reap_children_tid = (pthread_t)-1;
static volatile boolean_t _reap_children_stop;
static avl_tree_t _launched_processes;
static pthread_mutex_t _launched_processes_lock = PTHREAD_MUTEX_INITIALIZER;
static int16_t _launched_processes_limit;
/*
* Create an environment string array for passing to execve() using the
* NAME=VALUE strings in container [zsp].
@@ -115,26 +79,20 @@ _zed_exec_create_env(zed_strings_t *zsp)
*/
static void
_zed_exec_fork_child(uint64_t eid, const char *dir, const char *prog,
char *env[], int zfd, boolean_t in_foreground)
char *env[], int zfd)
{
char path[PATH_MAX];
int n;
pid_t pid;
int fd;
struct launched_process_node *node;
sigset_t mask;
struct timespec launch_timeout =
{ .tv_sec = 0, .tv_nsec = 200 * 1000 * 1000, };
pid_t wpid;
int status;
assert(dir != NULL);
assert(prog != NULL);
assert(env != NULL);
assert(zfd >= 0);
while (__atomic_load_n(&_launched_processes_limit,
__ATOMIC_SEQ_CST) <= 0)
(void) nanosleep(&launch_timeout, NULL);
n = snprintf(path, sizeof (path), "%s/%s", dir, prog);
if ((n < 0) || (n >= sizeof (path))) {
zed_log_msg(LOG_WARNING,
@@ -142,179 +100,101 @@ _zed_exec_fork_child(uint64_t eid, const char *dir, const char *prog,
prog, eid, strerror(ENAMETOOLONG));
return;
}
(void) pthread_mutex_lock(&_launched_processes_lock);
pid = fork();
if (pid < 0) {
(void) pthread_mutex_unlock(&_launched_processes_lock);
zed_log_msg(LOG_WARNING,
"Failed to fork \"%s\" for eid=%llu: %s",
prog, eid, strerror(errno));
return;
} else if (pid == 0) {
(void) sigemptyset(&mask);
(void) sigprocmask(SIG_SETMASK, &mask, NULL);
(void) umask(022);
if (in_foreground && /* we're already devnulled if daemonised */
(fd = open("/dev/null", O_RDWR | O_CLOEXEC)) != -1) {
if ((fd = open("/dev/null", O_RDWR)) != -1) {
(void) dup2(fd, STDIN_FILENO);
(void) dup2(fd, STDOUT_FILENO);
(void) dup2(fd, STDERR_FILENO);
}
(void) dup2(zfd, ZEVENT_FILENO);
zed_file_close_from(ZEVENT_FILENO + 1);
execle(path, prog, NULL, env);
_exit(127);
}
/* parent process */
node = calloc(1, sizeof (*node));
if (node) {
node->pid = pid;
node->eid = eid;
node->name = strdup(prog);
avl_add(&_launched_processes, node);
}
(void) pthread_mutex_unlock(&_launched_processes_lock);
__atomic_sub_fetch(&_launched_processes_limit, 1, __ATOMIC_SEQ_CST);
zed_log_msg(LOG_INFO, "Invoking \"%s\" eid=%llu pid=%d",
prog, eid, pid);
}
static void
_nop(int sig)
{}
/* FIXME: Timeout rogue child processes with sigalarm? */
static void *
_reap_children(void *arg)
{
struct launched_process_node node, *pnode;
pid_t pid;
int status;
struct rusage usage;
struct sigaction sa = {};
/*
* Wait for child process using WNOHANG to limit
* the time spent waiting to 10 seconds (10,000ms).
*/
for (n = 0; n < 1000; n++) {
wpid = waitpid(pid, &status, WNOHANG);
if (wpid == (pid_t)-1) {
if (errno == EINTR)
continue;
zed_log_msg(LOG_WARNING,
"Failed to wait for \"%s\" eid=%llu pid=%d",
prog, eid, pid);
break;
} else if (wpid == 0) {
struct timespec t;
(void) sigfillset(&sa.sa_mask);
(void) sigdelset(&sa.sa_mask, SIGCHLD);
(void) pthread_sigmask(SIG_SETMASK, &sa.sa_mask, NULL);
(void) sigemptyset(&sa.sa_mask);
sa.sa_handler = _nop;
sa.sa_flags = SA_NOCLDSTOP;
(void) sigaction(SIGCHLD, &sa, NULL);
for (_reap_children_stop = B_FALSE; !_reap_children_stop; ) {
(void) pthread_mutex_lock(&_launched_processes_lock);
pid = wait4(0, &status, WNOHANG, &usage);
if (pid == 0 || pid == (pid_t)-1) {
(void) pthread_mutex_unlock(&_launched_processes_lock);
if (pid == 0 || errno == ECHILD)
pause();
else if (errno != EINTR)
zed_log_msg(LOG_WARNING,
"Failed to wait for children: %s",
strerror(errno));
} else {
memset(&node, 0, sizeof (node));
node.pid = pid;
pnode = avl_find(&_launched_processes, &node, NULL);
if (pnode) {
memcpy(&node, pnode, sizeof (node));
avl_remove(&_launched_processes, pnode);
free(pnode);
}
(void) pthread_mutex_unlock(&_launched_processes_lock);
__atomic_add_fetch(&_launched_processes_limit, 1,
__ATOMIC_SEQ_CST);
usage.ru_utime.tv_sec += usage.ru_stime.tv_sec;
usage.ru_utime.tv_usec += usage.ru_stime.tv_usec;
usage.ru_utime.tv_sec +=
usage.ru_utime.tv_usec / (1000 * 1000);
usage.ru_utime.tv_usec %= 1000 * 1000;
if (WIFEXITED(status)) {
zed_log_msg(LOG_INFO,
"Finished \"%s\" eid=%llu pid=%d "
"time=%llu.%06us exit=%d",
node.name, node.eid, pid,
(unsigned long long) usage.ru_utime.tv_sec,
(unsigned int) usage.ru_utime.tv_usec,
WEXITSTATUS(status));
} else if (WIFSIGNALED(status)) {
zed_log_msg(LOG_INFO,
"Finished \"%s\" eid=%llu pid=%d "
"time=%llu.%06us sig=%d/%s",
node.name, node.eid, pid,
(unsigned long long) usage.ru_utime.tv_sec,
(unsigned int) usage.ru_utime.tv_usec,
WTERMSIG(status),
strsignal(WTERMSIG(status)));
} else {
zed_log_msg(LOG_INFO,
"Finished \"%s\" eid=%llu pid=%d "
"time=%llu.%06us status=0x%X",
node.name, node.eid,
(unsigned long long) usage.ru_utime.tv_sec,
(unsigned int) usage.ru_utime.tv_usec,
(unsigned int) status);
}
free(node.name);
/* child still running */
t.tv_sec = 0;
t.tv_nsec = 10000000; /* 10ms */
(void) nanosleep(&t, NULL);
continue;
}
if (WIFEXITED(status)) {
zed_log_msg(LOG_INFO,
"Finished \"%s\" eid=%llu pid=%d exit=%d",
prog, eid, pid, WEXITSTATUS(status));
} else if (WIFSIGNALED(status)) {
zed_log_msg(LOG_INFO,
"Finished \"%s\" eid=%llu pid=%d sig=%d/%s",
prog, eid, pid, WTERMSIG(status),
strsignal(WTERMSIG(status)));
} else {
zed_log_msg(LOG_INFO,
"Finished \"%s\" eid=%llu pid=%d status=0x%X",
prog, eid, (unsigned int) status);
}
break;
}
return (NULL);
}
void
zed_exec_fini(void)
{
struct launched_process_node *node;
void *ck = NULL;
if (_reap_children_tid == (pthread_t)-1)
return;
_reap_children_stop = B_TRUE;
(void) pthread_kill(_reap_children_tid, SIGCHLD);
(void) pthread_join(_reap_children_tid, NULL);
while ((node = avl_destroy_nodes(&_launched_processes, &ck)) != NULL) {
free(node->name);
free(node);
/*
* kill child process after 10 seconds
*/
if (wpid == 0) {
zed_log_msg(LOG_WARNING, "Killing hung \"%s\" pid=%d",
prog, pid);
(void) kill(pid, SIGKILL);
(void) waitpid(pid, &status, 0);
}
avl_destroy(&_launched_processes);
(void) pthread_mutex_destroy(&_launched_processes_lock);
(void) pthread_mutex_init(&_launched_processes_lock, NULL);
_reap_children_tid = (pthread_t)-1;
}
/*
* Process the event [eid] by synchronously invoking all zedlets with a
* matching class prefix.
*
* Each executable in [zcp->zedlets] from the directory [zcp->zedlet_dir]
* is matched against the event's [class], [subclass], and the "all" class
* (which matches all events).
* Every zedlet with a matching class prefix is invoked.
* Each executable in [zedlets] from the directory [dir] is matched against
* the event's [class], [subclass], and the "all" class (which matches
* all events). Every zedlet with a matching class prefix is invoked.
* The NAME=VALUE strings in [envs] will be passed to the zedlet as
* environment variables.
*
* The file descriptor [zcp->zevent_fd] is the zevent_fd used to track the
* The file descriptor [zfd] is the zevent_fd used to track the
* current cursor location within the zevent nvlist.
*
* Return 0 on success, -1 on error.
*/
int
zed_exec_process(uint64_t eid, const char *class, const char *subclass,
struct zed_conf *zcp, zed_strings_t *envs)
const char *dir, zed_strings_t *zedlets, zed_strings_t *envs, int zfd)
{
const char *class_strings[4];
const char *allclass = "all";
@@ -323,22 +203,9 @@ zed_exec_process(uint64_t eid, const char *class, const char *subclass,
char **e;
int n;
if (!zcp->zedlet_dir || !zcp->zedlets || !envs || zcp->zevent_fd < 0)
if (!dir || !zedlets || !envs || zfd < 0)
return (-1);
if (_reap_children_tid == (pthread_t)-1) {
_launched_processes_limit = zcp->max_jobs;
if (pthread_create(&_reap_children_tid, NULL,
_reap_children, NULL) != 0)
return (-1);
pthread_setname_np(_reap_children_tid, "reap ZEDLETs");
avl_create(&_launched_processes, _launched_process_node_compare,
sizeof (struct launched_process_node),
offsetof(struct launched_process_node, node));
}
csp = class_strings;
if (class)
@@ -354,13 +221,11 @@ zed_exec_process(uint64_t eid, const char *class, const char *subclass,
e = _zed_exec_create_env(envs);
for (z = zed_strings_first(zcp->zedlets); z;
z = zed_strings_next(zcp->zedlets)) {
for (z = zed_strings_first(zedlets); z; z = zed_strings_next(zedlets)) {
for (csp = class_strings; *csp; csp++) {
n = strlen(*csp);
if ((strncmp(z, *csp, n) == 0) && !isalpha(z[n]))
_zed_exec_fork_child(eid, zcp->zedlet_dir,
z, e, zcp->zevent_fd, zcp->do_foreground);
_zed_exec_fork_child(eid, dir, z, e, zfd);
}
}
free(e);
+3 -5
View File
@@ -3,7 +3,7 @@
*
* Developed at Lawrence Livermore National Laboratory (LLNL-CODE-403049).
* Copyright (C) 2013-2014 Lawrence Livermore National Security, LLC.
* Refer to the OpenZFS git commit log for authoritative copyright attribution.
* Refer to the ZoL git commit log for authoritative copyright attribution.
*
* The contents of this file are subject to the terms of the
* Common Development and Distribution License Version 1.0 (CDDL-1.0).
@@ -17,11 +17,9 @@
#include <stdint.h>
#include "zed_strings.h"
#include "zed_conf.h"
void zed_exec_fini(void);
int zed_exec_process(uint64_t eid, const char *class, const char *subclass,
struct zed_conf *zcp, zed_strings_t *envs);
const char *dir, zed_strings_t *zedlets, zed_strings_t *envs,
int zevent_fd);
#endif /* !ZED_EXEC_H */
+85 -1
View File
@@ -3,7 +3,7 @@
*
* Developed at Lawrence Livermore National Laboratory (LLNL-CODE-403049).
* Copyright (C) 2013-2014 Lawrence Livermore National Security, LLC.
* Refer to the OpenZFS git commit log for authoritative copyright attribution.
* Refer to the ZoL git commit log for authoritative copyright attribution.
*
* The contents of this file are subject to the terms of the
* Common Development and Distribution License Version 1.0 (CDDL-1.0).
@@ -23,6 +23,62 @@
#include "zed_file.h"
#include "zed_log.h"
/*
* Read up to [n] bytes from [fd] into [buf].
* Return the number of bytes read, 0 on EOF, or -1 on error.
*/
ssize_t
zed_file_read_n(int fd, void *buf, size_t n)
{
unsigned char *p;
size_t n_left;
ssize_t n_read;
p = buf;
n_left = n;
while (n_left > 0) {
if ((n_read = read(fd, p, n_left)) < 0) {
if (errno == EINTR)
continue;
else
return (-1);
} else if (n_read == 0) {
break;
}
n_left -= n_read;
p += n_read;
}
return (n - n_left);
}
/*
* Write [n] bytes from [buf] out to [fd].
* Return the number of bytes written, or -1 on error.
*/
ssize_t
zed_file_write_n(int fd, void *buf, size_t n)
{
const unsigned char *p;
size_t n_left;
ssize_t n_written;
p = buf;
n_left = n;
while (n_left > 0) {
if ((n_written = write(fd, p, n_left)) < 0) {
if (errno == EINTR)
continue;
else
return (-1);
}
n_left -= n_written;
p += n_written;
}
return (n);
}
/*
* Set an exclusive advisory lock on the open file descriptor [fd].
* Return 0 on success, 1 if a conflicting lock is held by another process,
@@ -139,3 +195,31 @@ zed_file_close_from(int lowfd)
errno = errno_bak;
}
/*
* Set the CLOEXEC flag on file descriptor [fd] so it will be automatically
* closed upon successful execution of one of the exec functions.
* Return 0 on success, or -1 on error.
*
* FIXME: No longer needed?
*/
int
zed_file_close_on_exec(int fd)
{
int flags;
if (fd < 0) {
errno = EBADF;
return (-1);
}
flags = fcntl(fd, F_GETFD);
if (flags == -1)
return (-1);
flags |= FD_CLOEXEC;
if (fcntl(fd, F_SETFD, flags) == -1)
return (-1);
return (0);
}
+7 -1
View File
@@ -3,7 +3,7 @@
*
* Developed at Lawrence Livermore National Laboratory (LLNL-CODE-403049).
* Copyright (C) 2013-2014 Lawrence Livermore National Security, LLC.
* Refer to the OpenZFS git commit log for authoritative copyright attribution.
* Refer to the ZoL git commit log for authoritative copyright attribution.
*
* The contents of this file are subject to the terms of the
* Common Development and Distribution License Version 1.0 (CDDL-1.0).
@@ -18,6 +18,10 @@
#include <sys/types.h>
#include <unistd.h>
ssize_t zed_file_read_n(int fd, void *buf, size_t n);
ssize_t zed_file_write_n(int fd, void *buf, size_t n);
int zed_file_lock(int fd);
int zed_file_unlock(int fd);
@@ -26,4 +30,6 @@ pid_t zed_file_is_locked(int fd);
void zed_file_close_from(int fd);
int zed_file_close_on_exec(int fd);
#endif /* !ZED_FILE_H */
+1 -1
View File
@@ -3,7 +3,7 @@
*
* Developed at Lawrence Livermore National Laboratory (LLNL-CODE-403049).
* Copyright (C) 2013-2014 Lawrence Livermore National Security, LLC.
* Refer to the OpenZFS git commit log for authoritative copyright attribution.
* Refer to the ZoL git commit log for authoritative copyright attribution.
*
* The contents of this file are subject to the terms of the
* Common Development and Distribution License Version 1.0 (CDDL-1.0).
+1 -1
View File
@@ -3,7 +3,7 @@
*
* Developed at Lawrence Livermore National Laboratory (LLNL-CODE-403049).
* Copyright (C) 2013-2014 Lawrence Livermore National Security, LLC.
* Refer to the OpenZFS git commit log for authoritative copyright attribution.
* Refer to the ZoL git commit log for authoritative copyright attribution.
*
* The contents of this file are subject to the terms of the
* Common Development and Distribution License Version 1.0 (CDDL-1.0).
+1 -1
View File
@@ -3,7 +3,7 @@
*
* Developed at Lawrence Livermore National Laboratory (LLNL-CODE-403049).
* Copyright (C) 2013-2014 Lawrence Livermore National Security, LLC.
* Refer to the OpenZFS git commit log for authoritative copyright attribution.
* Refer to the ZoL git commit log for authoritative copyright attribution.
*
* The contents of this file are subject to the terms of the
* Common Development and Distribution License Version 1.0 (CDDL-1.0).
+1 -1
View File
@@ -3,7 +3,7 @@
*
* Developed at Lawrence Livermore National Laboratory (LLNL-CODE-403049).
* Copyright (C) 2013-2014 Lawrence Livermore National Security, LLC.
* Refer to the OpenZFS git commit log for authoritative copyright attribution.
* Refer to the ZoL git commit log for authoritative copyright attribution.
*
* The contents of this file are subject to the terms of the
* Common Development and Distribution License Version 1.0 (CDDL-1.0).
+19 -149
View File
@@ -53,6 +53,7 @@
#include <grp.h>
#include <pwd.h>
#include <signal.h>
#include <sys/debug.h>
#include <sys/list.h>
#include <sys/mkdev.h>
#include <sys/mntent.h>
@@ -70,6 +71,7 @@
#include <zfs_prop.h>
#include <zfs_deleg.h>
#include <libzutil.h>
#include <libuutil.h>
#ifdef HAVE_IDMAP
#include <aclutils.h>
#include <directory.h>
@@ -268,7 +270,7 @@ get_usage(zfs_help_t idx)
return (gettext("\tclone [-p] [-o property=value] ... "
"<snapshot> <filesystem|volume>\n"));
case HELP_CREATE:
return (gettext("\tcreate [-Pnpuv] [-o property=value] ... "
return (gettext("\tcreate [-Pnpv] [-o property=value] ... "
"<filesystem>\n"
"\tcreate [-Pnpsv] [-b blocksize] [-o property=value] ... "
"-V <size> <volume>\n"));
@@ -317,7 +319,7 @@ get_usage(zfs_help_t idx)
case HELP_SEND:
return (gettext("\tsend [-DnPpRvLecwhb] [-[i|I] snapshot] "
"<snapshot>\n"
"\tsend [-DnvPLecw] [-i snapshot|bookmark] "
"\tsend [-nvPLecw] [-i snapshot|bookmark] "
"<filesystem|volume|snapshot>\n"
"\tsend [-DnPpvLec] [-i bookmark|snapshot] "
"--redact <bookmark> <snapshot>\n"
@@ -916,107 +918,6 @@ usage:
return (-1);
}
/*
* Return a default volblocksize for the pool which always uses more than
* half of the data sectors. This primarily applies to dRAID which always
* writes full stripe widths.
*/
static uint64_t
default_volblocksize(zpool_handle_t *zhp, nvlist_t *props)
{
uint64_t volblocksize, asize = SPA_MINBLOCKSIZE;
nvlist_t *tree, **vdevs;
uint_t nvdevs;
nvlist_t *config = zpool_get_config(zhp, NULL);
if (nvlist_lookup_nvlist(config, ZPOOL_CONFIG_VDEV_TREE, &tree) != 0 ||
nvlist_lookup_nvlist_array(tree, ZPOOL_CONFIG_CHILDREN,
&vdevs, &nvdevs) != 0) {
return (ZVOL_DEFAULT_BLOCKSIZE);
}
for (int i = 0; i < nvdevs; i++) {
nvlist_t *nv = vdevs[i];
uint64_t ashift, ndata, nparity;
if (nvlist_lookup_uint64(nv, ZPOOL_CONFIG_ASHIFT, &ashift) != 0)
continue;
if (nvlist_lookup_uint64(nv, ZPOOL_CONFIG_DRAID_NDATA,
&ndata) == 0) {
/* dRAID minimum allocation width */
asize = MAX(asize, ndata * (1ULL << ashift));
} else if (nvlist_lookup_uint64(nv, ZPOOL_CONFIG_NPARITY,
&nparity) == 0) {
/* raidz minimum allocation width */
if (nparity == 1)
asize = MAX(asize, 2 * (1ULL << ashift));
else
asize = MAX(asize, 4 * (1ULL << ashift));
} else {
/* mirror or (non-redundant) leaf vdev */
asize = MAX(asize, 1ULL << ashift);
}
}
/*
* Calculate the target volblocksize such that more than half
* of the asize is used. The following table is for 4k sectors.
*
* n asize blksz used | n asize blksz used
* -------------------------+---------------------------------
* 1 4,096 8,192 100% | 9 36,864 32,768 88%
* 2 8,192 8,192 100% | 10 40,960 32,768 80%
* 3 12,288 8,192 66% | 11 45,056 32,768 72%
* 4 16,384 16,384 100% | 12 49,152 32,768 66%
* 5 20,480 16,384 80% | 13 53,248 32,768 61%
* 6 24,576 16,384 66% | 14 57,344 32,768 57%
* 7 28,672 16,384 57% | 15 61,440 32,768 53%
* 8 32,768 32,768 100% | 16 65,536 65,636 100%
*
* This is primarily a concern for dRAID which always allocates
* a full stripe width. For dRAID the default stripe width is
* n=8 in which case the volblocksize is set to 32k. Ignoring
* compression there are no unused sectors. This same reasoning
* applies to raidz[2,3] so target 4 sectors to minimize waste.
*/
uint64_t tgt_volblocksize = ZVOL_DEFAULT_BLOCKSIZE;
while (tgt_volblocksize * 2 <= asize)
tgt_volblocksize *= 2;
const char *prop = zfs_prop_to_name(ZFS_PROP_VOLBLOCKSIZE);
if (nvlist_lookup_uint64(props, prop, &volblocksize) == 0) {
/* Issue a warning when a non-optimal size is requested. */
if (volblocksize < ZVOL_DEFAULT_BLOCKSIZE) {
(void) fprintf(stderr, gettext("Warning: "
"volblocksize (%llu) is less than the default "
"minimum block size (%llu).\nTo reduce wasted "
"space a volblocksize of %llu is recommended.\n"),
(u_longlong_t)volblocksize,
(u_longlong_t)ZVOL_DEFAULT_BLOCKSIZE,
(u_longlong_t)tgt_volblocksize);
} else if (volblocksize < tgt_volblocksize) {
(void) fprintf(stderr, gettext("Warning: "
"volblocksize (%llu) is much less than the "
"minimum allocation\nunit (%llu), which wastes "
"at least %llu%% of space. To reduce wasted "
"space,\nuse a larger volblocksize (%llu is "
"recommended), fewer dRAID data disks\n"
"per group, or smaller sector size (ashift).\n"),
(u_longlong_t)volblocksize, (u_longlong_t)asize,
(u_longlong_t)((100 * (asize - volblocksize)) /
asize), (u_longlong_t)tgt_volblocksize);
}
} else {
volblocksize = tgt_volblocksize;
fnvlist_add_uint64(props, prop, volblocksize);
}
return (volblocksize);
}
/*
* zfs create [-Pnpv] [-o prop=value] ... fs
* zfs create [-Pnpsv] [-b blocksize] [-o prop=value] ... -V vol size
@@ -1036,8 +937,6 @@ default_volblocksize(zpool_handle_t *zhp, nvlist_t *props)
* check of arguments and properties, but does not check for permissions,
* available space, etc.
*
* The '-u' flag prevents the newly created file system from being mounted.
*
* The '-v' flag is for verbose output.
*
* The '-P' flag is used for parseable output. It implies '-v'.
@@ -1054,19 +953,17 @@ zfs_do_create(int argc, char **argv)
boolean_t bflag = B_FALSE;
boolean_t parents = B_FALSE;
boolean_t dryrun = B_FALSE;
boolean_t nomount = B_FALSE;
boolean_t verbose = B_FALSE;
boolean_t parseable = B_FALSE;
int ret = 1;
nvlist_t *props;
uint64_t intval;
char *strval;
if (nvlist_alloc(&props, NV_UNIQUE_NAME, 0) != 0)
nomem();
/* check options */
while ((c = getopt(argc, argv, ":PV:b:nso:puv")) != -1) {
while ((c = getopt(argc, argv, ":PV:b:nso:pv")) != -1) {
switch (c) {
case 'V':
type = ZFS_TYPE_VOLUME;
@@ -1113,9 +1010,6 @@ zfs_do_create(int argc, char **argv)
case 's':
noreserve = B_TRUE;
break;
case 'u':
nomount = B_TRUE;
break;
case 'v':
verbose = B_TRUE;
break;
@@ -1135,11 +1029,6 @@ zfs_do_create(int argc, char **argv)
"used when creating a volume\n"));
goto badusage;
}
if (nomount && type != ZFS_TYPE_FILESYSTEM) {
(void) fprintf(stderr, gettext("'-u' can only be "
"used when creating a filesystem\n"));
goto badusage;
}
argc -= optind;
argv += optind;
@@ -1155,7 +1044,7 @@ zfs_do_create(int argc, char **argv)
goto badusage;
}
if (dryrun || type == ZFS_TYPE_VOLUME) {
if (dryrun || (type == ZFS_TYPE_VOLUME && !noreserve)) {
char msg[ZFS_MAX_DATASET_NAME_LEN * 2];
char *p;
@@ -1177,24 +1066,18 @@ zfs_do_create(int argc, char **argv)
}
}
/*
* if volsize is not a multiple of volblocksize, round it up to the
* nearest multiple of the volblocksize
*/
if (type == ZFS_TYPE_VOLUME) {
const char *prop = zfs_prop_to_name(ZFS_PROP_VOLBLOCKSIZE);
uint64_t volblocksize = default_volblocksize(zpool_handle,
real_props);
uint64_t volblocksize;
if (volblocksize != ZVOL_DEFAULT_BLOCKSIZE &&
nvlist_lookup_string(props, prop, &strval) != 0) {
if (asprintf(&strval, "%llu",
(u_longlong_t)volblocksize) == -1)
nomem();
nvlist_add_string(props, prop, strval);
free(strval);
}
if (nvlist_lookup_uint64(props,
zfs_prop_to_name(ZFS_PROP_VOLBLOCKSIZE),
&volblocksize) != 0)
volblocksize = ZVOL_DEFAULT_BLOCKSIZE;
/*
* If volsize is not a multiple of volblocksize, round it
* up to the nearest multiple of the volblocksize.
*/
if (volsize % volblocksize) {
volsize = P2ROUNDUP_TYPED(volsize, volblocksize,
uint64_t);
@@ -1207,9 +1090,11 @@ zfs_do_create(int argc, char **argv)
}
}
if (type == ZFS_TYPE_VOLUME && !noreserve) {
uint64_t spa_version;
zfs_prop_t resv_prop;
char *strval;
spa_version = zpool_get_prop_int(zpool_handle,
ZPOOL_PROP_VERSION, NULL);
@@ -1300,11 +1185,6 @@ zfs_do_create(int argc, char **argv)
log_history = B_FALSE;
}
if (nomount) {
ret = 0;
goto error;
}
ret = zfs_mount_and_share(g_zfs, argv[0], ZFS_TYPE_DATASET);
error:
nvlist_free(props);
@@ -4402,7 +4282,6 @@ zfs_do_send(int argc, char **argv)
struct option long_options[] = {
{"replicate", no_argument, NULL, 'R'},
{"skip-missing", no_argument, NULL, 's'},
{"redact", required_argument, NULL, 'd'},
{"props", no_argument, NULL, 'p'},
{"parsable", no_argument, NULL, 'P'},
@@ -4421,7 +4300,7 @@ zfs_do_send(int argc, char **argv)
};
/* check options */
while ((c = getopt_long(argc, argv, ":i:I:RsDpvnPLeht:cwbd:S",
while ((c = getopt_long(argc, argv, ":i:I:RDpvnPLeht:cwbd:S",
long_options, NULL)) != -1) {
switch (c) {
case 'i':
@@ -4438,9 +4317,6 @@ zfs_do_send(int argc, char **argv)
case 'R':
flags.replicate = B_TRUE;
break;
case 's':
flags.skipmissing = B_TRUE;
break;
case 'd':
redactbook = optarg;
break;
@@ -4610,13 +4486,6 @@ zfs_do_send(int argc, char **argv)
return (err);
}
if (flags.skipmissing && !flags.replicate) {
(void) fprintf(stderr,
gettext("skip-missing flag can only be used in "
"conjunction with replicate\n"));
usage(B_FALSE);
}
/*
* For everything except -R and -I, use the new, cleaner code path.
*/
@@ -7475,6 +7344,7 @@ unshare_unmount(int op, int argc, char **argv)
if (zfs_prop_get_int(zhp, ZFS_PROP_CANMOUNT) ==
ZFS_CANMOUNT_NOAUTO)
continue;
break;
default:
break;
}
+4 -4
View File
@@ -35,7 +35,7 @@ libzfs_handle_t *g_zfs;
static void
usage(int err)
{
fprintf(stderr, "Usage: zfs_ids_to_path [-v] <pool> <objset id> "
fprintf(stderr, "Usage: [-v] zfs_ids_to_path <pool> <objset id> "
"<object id>\n");
exit(err);
}
@@ -63,11 +63,11 @@ main(int argc, char **argv)
uint64_t objset, object;
if (sscanf(argv[1], "%llu", (u_longlong_t *)&objset) != 1) {
(void) fprintf(stderr, "Invalid objset id: %s\n", argv[1]);
(void) fprintf(stderr, "Invalid objset id: %s\n", argv[2]);
usage(2);
}
if (sscanf(argv[2], "%llu", (u_longlong_t *)&object) != 1) {
(void) fprintf(stderr, "Invalid object id: %s\n", argv[2]);
(void) fprintf(stderr, "Invalid object id: %s\n", argv[3]);
usage(3);
}
if ((g_zfs = libzfs_init()) == NULL) {
@@ -76,7 +76,7 @@ main(int argc, char **argv)
}
zpool_handle_t *pool = zpool_open(g_zfs, argv[0]);
if (pool == NULL) {
fprintf(stderr, "Could not open pool %s\n", argv[0]);
fprintf(stderr, "Could not open pool %s\n", argv[1]);
libzfs_fini(g_zfs);
return (5);
}
+14 -4
View File
@@ -36,6 +36,8 @@
#include <time.h>
#include <unistd.h>
static void usage(void);
static void
usage(void)
{
@@ -58,11 +60,12 @@ int
main(int argc, char **argv)
{
/* default file path, can be optionally set by user */
const char *path = "/etc/hostid";
char path[PATH_MAX] = "/etc/hostid";
/* holds converted user input or lrand48() generated value */
unsigned long input_i = 0;
int opt;
int pathlen;
int force_fwrite = 0;
while ((opt = getopt_long(argc, argv, "fo:h?", 0, 0)) != -1) {
switch (opt) {
@@ -70,7 +73,14 @@ main(int argc, char **argv)
force_fwrite = 1;
break;
case 'o':
path = optarg;
pathlen = snprintf(path, sizeof (path), "%s", optarg);
if (pathlen >= sizeof (path)) {
fprintf(stderr, "%s\n", strerror(EOVERFLOW));
exit(EXIT_FAILURE);
} else if (pathlen < 1) {
fprintf(stderr, "%s\n", strerror(EINVAL));
exit(EXIT_FAILURE);
}
break;
case 'h':
case '?':
@@ -108,7 +118,7 @@ main(int argc, char **argv)
if (force_fwrite == 0 && stat(path, &fstat) == 0 &&
S_ISREG(fstat.st_mode)) {
fprintf(stderr, "%s: %s\n", path, strerror(EEXIST));
exit(EXIT_FAILURE);
exit(EXIT_FAILURE);
}
/*
@@ -127,7 +137,7 @@ main(int argc, char **argv)
}
/*
* we need just 4 bytes in native endianness
* we need just 4 bytes in native endianess
* not using sethostid() because it may be missing or just a stub
*/
uint32_t hostid = input_i;
+3 -52
View File
@@ -1,5 +1,4 @@
include $(top_srcdir)/config/Rules.am
include $(top_srcdir)/config/Shellcheck.am
AM_CFLAGS += $(LIBBLKID_CFLAGS) $(LIBUUID_CFLAGS)
@@ -26,7 +25,8 @@ zpool_LDADD = \
$(abs_top_builddir)/lib/libzfs/libzfs.la \
$(abs_top_builddir)/lib/libzfs_core/libzfs_core.la \
$(abs_top_builddir)/lib/libnvpair/libnvpair.la \
$(abs_top_builddir)/lib/libuutil/libuutil.la
$(abs_top_builddir)/lib/libuutil/libuutil.la \
$(abs_top_builddir)/lib/libzutil/libzutil.la
zpool_LDADD += $(LTLIBINTL)
@@ -40,7 +40,7 @@ include $(top_srcdir)/config/CppCheck.am
zpoolconfdir = $(sysconfdir)/zfs/zpool.d
zpoolexecdir = $(zfsexecdir)/zpool.d
EXTRA_DIST = zpool.d/README compatibility.d
EXTRA_DIST = zpool.d/README
dist_zpoolexec_SCRIPTS = \
zpool.d/dm-deps \
@@ -130,52 +130,6 @@ zpoolconfdefaults = \
test_progress \
test_ended
zpoolcompatdir = $(pkgdatadir)/compatibility.d
dist_zpoolcompat_DATA = \
compatibility.d/compat-2018 \
compatibility.d/compat-2019 \
compatibility.d/compat-2020 \
compatibility.d/compat-2021 \
compatibility.d/freebsd-11.0 \
compatibility.d/freebsd-11.2 \
compatibility.d/freebsd-11.3 \
compatibility.d/freenas-9.10.2 \
compatibility.d/grub2 \
compatibility.d/openzfsonosx-1.7.0 \
compatibility.d/openzfsonosx-1.8.1 \
compatibility.d/openzfsonosx-1.9.3 \
compatibility.d/openzfs-2.0-freebsd \
compatibility.d/openzfs-2.0-linux \
compatibility.d/openzfs-2.1-freebsd \
compatibility.d/openzfs-2.1-linux \
compatibility.d/zol-0.6.1 \
compatibility.d/zol-0.6.4 \
compatibility.d/zol-0.6.5 \
compatibility.d/zol-0.7 \
compatibility.d/zol-0.8
# canonical <- alias symbolic link pairs
# eg: "2018" is a link to "compat-2018"
zpoolcompatlinks = \
"compat-2018 2018" \
"compat-2019 2019" \
"compat-2020 2020" \
"compat-2021 2021" \
"freebsd-11.0 freebsd-11.1" \
"freebsd-11.0 freenas-11.0" \
"freebsd-11.2 freenas-11.2" \
"freebsd-11.3 freebsd-11.4" \
"freebsd-11.3 freebsd-12.0" \
"freebsd-11.3 freebsd-12.1" \
"freebsd-11.3 freebsd-12.2" \
"freebsd-11.3 freenas-11.3" \
"freenas-11.0 freenas-11.1" \
"openzfsonosx-1.9.3 openzfsonosx-1.9.4" \
"openzfs-2.0-freebsd truenas-12.0" \
"zol-0.7 ubuntu-18.04" \
"zol-0.8 ubuntu-20.04"
install-data-hook:
$(MKDIR_P) "$(DESTDIR)$(zpoolconfdir)"
for f in $(zpoolconfdefaults); do \
@@ -183,6 +137,3 @@ install-data-hook:
-L "$(DESTDIR)$(zpoolconfdir)/$${f}" || \
ln -s "$(zpoolexecdir)/$${f}" "$(DESTDIR)$(zpoolconfdir)"; \
done
for l in $(zpoolcompatlinks); do \
(cd "$(DESTDIR)$(zpoolcompatdir)"; ln -sf $${l} ); \
done
-12
View File
@@ -1,12 +0,0 @@
# Features supported by all Tier 1 platforms as of 2018
async_destroy
bookmarks
embedded_data
empty_bpobj
enabled_txg
extensible_dataset
filesystem_limits
hole_birth
large_blocks
lz4_compress
spacemap_histogram
-15
View File
@@ -1,15 +0,0 @@
# Features supported by all Tier 1 platforms as of 2019
async_destroy
bookmarks
embedded_data
empty_bpobj
enabled_txg
extensible_dataset
filesystem_limits
hole_birth
large_blocks
lz4_compress
multi_vdev_crash_dump
sha512
skein
spacemap_histogram
-15
View File
@@ -1,15 +0,0 @@
# Features supported by all Tier 1 platforms as of 2020
async_destroy
bookmarks
embedded_data
empty_bpobj
enabled_txg
extensible_dataset
filesystem_limits
hole_birth
large_blocks
lz4_compress
multi_vdev_crash_dump
sha512
skein
spacemap_histogram
-19
View File
@@ -1,19 +0,0 @@
# Features supported by all Tier 1 platforms as of 2021
async_destroy
bookmarks
device_removal
embedded_data
empty_bpobj
enabled_txg
extensible_dataset
filesystem_limits
hole_birth
large_blocks
lz4_compress
multi_vdev_crash_dump
obsolete_counts
sha512
skein
spacemap_histogram
spacemap_v2
zpool_checkpoint
-15
View File
@@ -1,15 +0,0 @@
# Features supported by FreeBSD 11.0
async_destroy
bookmarks
embedded_data
empty_bpobj
enabled_txg
extensible_dataset
filesystem_limits
hole_birth
large_blocks
lz4_compress
multi_vdev_crash_dump
sha512
skein
spacemap_histogram
-18
View File
@@ -1,18 +0,0 @@
# Features supported by FreeBSD 11.2
async_destroy
bookmarks
device_removal
embedded_data
empty_bpobj
enabled_txg
extensible_dataset
filesystem_limits
hole_birth
large_blocks
lz4_compress
multi_vdev_crash_dump
obsolete_counts
sha512
skein
spacemap_histogram
zpool_checkpoint
-19
View File
@@ -1,19 +0,0 @@
# Features supported by FreeBSD 11.3
async_destroy
bookmarks
device_removal
embedded_data
empty_bpobj
enabled_txg
extensible_dataset
filesystem_limits
hole_birth
large_blocks
lz4_compress
multi_vdev_crash_dump
obsolete_counts
sha512
skein
spacemap_histogram
spacemap_v2
zpool_checkpoint
-13
View File
@@ -1,13 +0,0 @@
# Features supported by FreeNAS 9.10.2
async_destroy
bookmarks
embedded_data
empty_bpobj
enabled_txg
extensible_dataset
filesystem_limits
hole_birth
large_blocks
lz4_compress
multi_vdev_crash_dump
spacemap_histogram
-12
View File
@@ -1,12 +0,0 @@
# Features which are supported by GRUB2
async_destroy
bookmarks
embedded_data
empty_bpobj
enabled_txg
extensible_dataset
filesystem_limits
hole_birth
large_blocks
lz4_compress
spacemap_histogram
@@ -1,33 +0,0 @@
# Features supported by OpenZFS 2.0 on FreeBSD
allocation_classes
async_destroy
bookmark_v2
bookmark_written
bookmarks
device_rebuild
device_removal
embedded_data
empty_bpobj
enabled_txg
encryption
extensible_dataset
filesystem_limits
hole_birth
large_blocks
large_dnode
livelist
log_spacemap
lz4_compress
multi_vdev_crash_dump
obsolete_counts
project_quota
redacted_datasets
redaction_bookmarks
resilver_defer
sha512
skein
spacemap_histogram
spacemap_v2
userobj_accounting
zpool_checkpoint
zstd_compress
@@ -1,34 +0,0 @@
# Features supported by OpenZFS 2.0 on Linux
allocation_classes
async_destroy
bookmark_v2
bookmark_written
bookmarks
device_rebuild
device_removal
edonr
embedded_data
empty_bpobj
enabled_txg
encryption
extensible_dataset
filesystem_limits
hole_birth
large_blocks
large_dnode
livelist
log_spacemap
lz4_compress
multi_vdev_crash_dump
obsolete_counts
project_quota
redacted_datasets
redaction_bookmarks
resilver_defer
sha512
skein
spacemap_histogram
spacemap_v2
userobj_accounting
zpool_checkpoint
zstd_compress
@@ -1,34 +0,0 @@
# Features supported by OpenZFS 2.1 on FreeBSD
allocation_classes
async_destroy
bookmark_v2
bookmark_written
bookmarks
device_rebuild
device_removal
draid
embedded_data
empty_bpobj
enabled_txg
encryption
extensible_dataset
filesystem_limits
hole_birth
large_blocks
large_dnode
livelist
log_spacemap
lz4_compress
multi_vdev_crash_dump
obsolete_counts
project_quota
redacted_datasets
redaction_bookmarks
resilver_defer
sha512
skein
spacemap_histogram
spacemap_v2
userobj_accounting
zpool_checkpoint
zstd_compress
@@ -1,35 +0,0 @@
# Features supported by OpenZFS 2.1 on Linux
allocation_classes
async_destroy
bookmark_v2
bookmark_written
bookmarks
device_rebuild
device_removal
draid
edonr
embedded_data
empty_bpobj
enabled_txg
encryption
extensible_dataset
filesystem_limits
hole_birth
large_blocks
large_dnode
livelist
log_spacemap
lz4_compress
multi_vdev_crash_dump
obsolete_counts
project_quota
redacted_datasets
redaction_bookmarks
resilver_defer
sha512
skein
spacemap_histogram
spacemap_v2
userobj_accounting
zpool_checkpoint
zstd_compress
@@ -1,16 +0,0 @@
# Features supported by OpenZFSonOSX 1.7.0
async_destroy
bookmarks
edonr
embedded_data
empty_bpobj
enabled_txg
extensible_dataset
filesystem_limits
hole_birth
large_blocks
lz4_compress
multi_vdev_crash_dump
sha512
skein
spacemap_histogram
@@ -1,21 +0,0 @@
# Features supported by OpenZFSonOSX 1.8.1
async_destroy
bookmarks
device_removal
edonr
embedded_data
empty_bpobj
enabled_txg
encryption
extensible_dataset
filesystem_limits
hole_birth
large_blocks
lz4_compress
multi_vdev_crash_dump
obsolete_counts
sha512
skein
spacemap_histogram
spacemap_v2
zpool_checkpoint
@@ -1,27 +0,0 @@
# Features supported by OpenZFSonOSX 1.9.3
allocation_classes
async_destroy
bookmark_v2
bookmarks
device_removal
edonr
embedded_data
empty_bpobj
enabled_txg
encryption
extensible_dataset
filesystem_limits
hole_birth
large_blocks
large_dnode
lz4_compress
multi_vdev_crash_dump
obsolete_counts
project_quota
resilver_defer
sha512
skein
spacemap_histogram
spacemap_v2
userobj_accounting
zpool_checkpoint
-4
View File
@@ -1,4 +0,0 @@
# Features supported by ZFSonLinux v0.6.1
async_destroy
empty_bpobj
lz4_compress
-10
View File
@@ -1,10 +0,0 @@
# Features supported by ZFSonLinux v0.6.4
async_destroy
bookmarks
embedded_data
empty_bpobj
enabled_txg
extensible_dataset
hole_birth
lz4_compress
spacemap_histogram
-12
View File
@@ -1,12 +0,0 @@
# Features supported by ZFSonLinux v0.6.5
async_destroy
bookmarks
embedded_data
empty_bpobj
enabled_txg
extensible_dataset
filesystem_limits
hole_birth
large_blocks
lz4_compress
spacemap_histogram
-18
View File
@@ -1,18 +0,0 @@
# Features supported by ZFSonLinux v0.7
async_destroy
bookmarks
edonr
embedded_data
empty_bpobj
enabled_txg
extensible_dataset
filesystem_limits
hole_birth
large_blocks
large_dnode
lz4_compress
multi_vdev_crash_dump
sha512
skein
spacemap_histogram
userobj_accounting
-27
View File
@@ -1,27 +0,0 @@
# Features supported by ZFSonLinux v0.8
allocation_classes
async_destroy
bookmark_v2
bookmarks
device_removal
edonr
embedded_data
empty_bpobj
enabled_txg
encryption
extensible_dataset
filesystem_limits
hole_birth
large_blocks
large_dnode
lz4_compress
multi_vdev_crash_dump
obsolete_counts
project_quota
resilver_defer
sha512
skein
spacemap_histogram
spacemap_v2
userobj_accounting
zpool_checkpoint
-15
View File
@@ -101,18 +101,3 @@ check_sector_size_database(char *path, int *sector_size)
{
return (0);
}
void
after_zpool_upgrade(zpool_handle_t *zhp)
{
char bootfs[ZPOOL_MAXPROPLEN];
if (zpool_get_prop(zhp, ZPOOL_PROP_BOOTFS, bootfs,
sizeof (bootfs), NULL, B_FALSE) == 0 &&
strcmp(bootfs, "-") != 0) {
(void) printf(gettext("Pool '%s' has the bootfs "
"property set, you might need to update\nthe boot "
"code. See gptzfsboot(8) and loader.efi(8) for "
"details.\n"), zpool_get_name(zhp));
}
}
+3 -5
View File
@@ -79,6 +79,9 @@
#include <scsi/scsi.h>
#include <scsi/sg.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/efi_partition.h>
#include <sys/stat.h>
#include <sys/vtoc.h>
@@ -405,8 +408,3 @@ check_device(const char *path, boolean_t force,
return (error);
}
void
after_zpool_upgrade(zpool_handle_t *zhp)
{
}
+1 -8
View File
@@ -4,7 +4,7 @@
#
if [ "$1" = "-h" ] ; then
echo "Show whether a vdev is a file, hdd, ssd, or iscsi."
echo "Show whether a vdev is a file, hdd, or ssd."
exit
fi
@@ -18,13 +18,6 @@ if [ -b "$VDEV_UPATH" ]; then
if [ "$val" = "1" ]; then
MEDIA="hdd"
fi
vpd_pg83="/sys/block/$device/device/vpd_pg83"
if [ -f "$vpd_pg83" ]; then
if grep -q --binary "iqn." "$vpd_pg83"; then
MEDIA="iscsi"
fi
fi
else
if [ -f "$VDEV_UPATH" ]; then
MEDIA="file"
+7 -1
View File
@@ -41,7 +41,13 @@ for i in $scripts ; do
val=$(ls "$VDEV_ENC_SYSFS_PATH/../device/scsi_generic" 2>/dev/null)
;;
fault_led)
val=$(cat "$VDEV_ENC_SYSFS_PATH/fault" 2>/dev/null)
# JBODs fault LED is called 'fault', NVMe fault LED is called
# 'attention'.
if [ -f "$VDEV_ENC_SYSFS_PATH/fault" ] ; then
val=$(cat "$VDEV_ENC_SYSFS_PATH/fault" 2>/dev/null)
elif [ -f "$VDEV_ENC_SYSFS_PATH/attention" ] ; then
val=$(cat "$VDEV_ENC_SYSFS_PATH/attention" 2>/dev/null)
fi
;;
locate_led)
val=$(cat "$VDEV_ENC_SYSFS_PATH/locate" 2>/dev/null)
+9 -6
View File
@@ -53,7 +53,7 @@ get_filename_from_dir()
num_files=$(find "$dir" -maxdepth 1 -type f | wc -l)
mod=$((pid % num_files))
i=0
find "$dir" -type f -printf '%f\n' | while read -r file ; do
find "$dir" -type f -printf "%f\n" | while read -r file ; do
if [ "$mod" = "$i" ] ; then
echo "$file"
break
@@ -62,14 +62,17 @@ get_filename_from_dir()
done
}
script="${0##*/}"
script=$(basename "$0")
if [ "$1" = "-h" ] ; then
echo "$helpstr" | grep "$script:" | tr -s '\t' | cut -f 2-
exit
fi
if [ -b "$VDEV_UPATH" ] && PATH="/usr/sbin:$PATH" command -v smartctl > /dev/null || [ -n "$samples" ] ; then
smartctl_path=$(command -v smartctl)
# shellcheck disable=SC2015
if [ -b "$VDEV_UPATH" ] && [ -x "$smartctl_path" ] || [ -n "$samples" ] ; then
if [ -n "$samples" ] ; then
# cat a smartctl output text file instead of running smartctl
# on a vdev (only used for developer testing).
@@ -77,7 +80,7 @@ if [ -b "$VDEV_UPATH" ] && PATH="/usr/sbin:$PATH" command -v smartctl > /dev/nul
echo "file=$file"
raw_out=$(cat "$samples/$file")
else
raw_out=$(sudo smartctl -a "$VDEV_UPATH")
raw_out=$(eval "sudo $smartctl_path -a $VDEV_UPATH")
fi
# What kind of drive are we? Look for the right line in smartctl:
@@ -228,11 +231,11 @@ esac
with_vals=$(echo "$out" | grep -E "$scripts")
if [ -n "$with_vals" ]; then
echo "$with_vals"
without_vals=$(echo "$scripts" | tr '|' '\n' |
without_vals=$(echo "$scripts" | tr "|" "\n" |
grep -v -E "$(echo "$with_vals" |
awk -F "=" '{print $1}')" | awk '{print $0"="}')
else
without_vals=$(echo "$scripts" | tr '|' '\n' | awk '{print $0"="}')
without_vals=$(echo "$scripts" | tr "|" "\n" | awk '{print $0"="}')
fi
if [ -n "$without_vals" ]; then
+8 -57
View File
@@ -264,51 +264,6 @@ for_each_pool(int argc, char **argv, boolean_t unavail,
return (ret);
}
static int
for_each_vdev_cb(zpool_handle_t *zhp, nvlist_t *nv, pool_vdev_iter_f func,
void *data)
{
nvlist_t **child;
uint_t c, children;
int ret = 0;
int i;
char *type;
const char *list[] = {
ZPOOL_CONFIG_SPARES,
ZPOOL_CONFIG_L2CACHE,
ZPOOL_CONFIG_CHILDREN
};
for (i = 0; i < ARRAY_SIZE(list); i++) {
if (nvlist_lookup_nvlist_array(nv, list[i], &child,
&children) == 0) {
for (c = 0; c < children; c++) {
uint64_t ishole = 0;
(void) nvlist_lookup_uint64(child[c],
ZPOOL_CONFIG_IS_HOLE, &ishole);
if (ishole)
continue;
ret |= for_each_vdev_cb(zhp, child[c], func,
data);
}
}
}
if (nvlist_lookup_string(nv, ZPOOL_CONFIG_TYPE, &type) != 0)
return (ret);
/* Don't run our function on root vdevs */
if (strcmp(type, VDEV_TYPE_ROOT) != 0) {
ret |= func(zhp, nv, data);
}
return (ret);
}
/*
* This is the equivalent of for_each_pool() for vdevs. It iterates thorough
* all vdevs in the pool, ignoring root vdevs and holes, calling func() on
@@ -327,7 +282,7 @@ for_each_vdev(zpool_handle_t *zhp, pool_vdev_iter_f func, void *data)
verify(nvlist_lookup_nvlist(config, ZPOOL_CONFIG_VDEV_TREE,
&nvroot) == 0);
}
return (for_each_vdev_cb(zhp, nvroot, func, data));
return (for_each_vdev_cb((void *) zhp, nvroot, func, data));
}
/*
@@ -494,25 +449,19 @@ vdev_run_cmd(vdev_cmd_data_t *data, char *cmd)
/* Setup our custom environment variables */
rc = asprintf(&env[1], "VDEV_PATH=%s",
data->path ? data->path : "");
if (rc == -1) {
env[1] = NULL;
if (rc == -1)
goto out;
}
rc = asprintf(&env[2], "VDEV_UPATH=%s",
data->upath ? data->upath : "");
if (rc == -1) {
env[2] = NULL;
if (rc == -1)
goto out;
}
rc = asprintf(&env[3], "VDEV_ENC_SYSFS_PATH=%s",
data->vdev_enc_sysfs_path ?
data->vdev_enc_sysfs_path : "");
if (rc == -1) {
env[3] = NULL;
if (rc == -1)
goto out;
}
/* Run the command */
rc = libzfs_run_process_get_stdout_nopath(cmd, argv, env, &lines,
@@ -531,7 +480,8 @@ out:
/* Start with i = 1 since env[0] was statically allocated */
for (i = 1; i < ARRAY_SIZE(env); i++)
free(env[i]);
if (env[i] != NULL)
free(env[i]);
}
/*
@@ -603,7 +553,7 @@ vdev_run_cmd_thread(void *cb_cmd_data)
/* For each vdev in the pool run a command */
static int
for_each_vdev_run_cb(zpool_handle_t *zhp, nvlist_t *nv, void *cb_vcdl)
for_each_vdev_run_cb(void *zhp_data, nvlist_t *nv, void *cb_vcdl)
{
vdev_cmd_data_list_t *vcdl = cb_vcdl;
vdev_cmd_data_t *data;
@@ -611,6 +561,7 @@ for_each_vdev_run_cb(zpool_handle_t *zhp, nvlist_t *nv, void *cb_vcdl)
char *vname = NULL;
char *vdev_enc_sysfs_path = NULL;
int i, match = 0;
zpool_handle_t *zhp = zhp_data;
if (nvlist_lookup_string(nv, ZPOOL_CONFIG_PATH, &path) != 0)
return (1);
+53 -333
View File
@@ -31,8 +31,6 @@
* Copyright (c) 2017 Open-E, Inc. All Rights Reserved.
* Copyright (c) 2017, Intel Corporation.
* Copyright (c) 2019, loli10K <ezomori.nozomu@gmail.com>
* Copyright (c) 2021, Colm Buckley <colm@tuatha.org>
* Copyright [2021] Hewlett Packard Enterprise Development LP
*/
#include <assert.h>
@@ -126,9 +124,6 @@ static int zpool_do_version(int, char **);
static int zpool_do_wait(int, char **);
static zpool_compat_status_t zpool_do_load_compat(
const char *, boolean_t *);
/*
* These libumem hooks provide a reasonable set of defaults for the allocator's
* debugging facilities.
@@ -533,7 +528,7 @@ usage(boolean_t requested)
(void) fprintf(fp, "YES disabled | enabled | active\n");
(void) fprintf(fp, gettext("\nThe feature@ properties must be "
"appended with a feature name.\nSee zpool-features(7).\n"));
"appended with a feature name.\nSee zpool-features(5).\n"));
}
/*
@@ -787,8 +782,6 @@ add_prop_list(const char *propname, char *propval, nvlist_t **props,
if (poolprop) {
const char *vname = zpool_prop_to_name(ZPOOL_PROP_VERSION);
const char *cname =
zpool_prop_to_name(ZPOOL_PROP_COMPATIBILITY);
if ((prop = zpool_name_to_prop(propname)) == ZPOOL_PROP_INVAL &&
!zpool_prop_feature(propname)) {
@@ -811,22 +804,6 @@ add_prop_list(const char *propname, char *propval, nvlist_t **props,
return (2);
}
/*
* if version is specified, only "legacy" compatibility
* may be requested
*/
if ((prop == ZPOOL_PROP_COMPATIBILITY &&
strcmp(propval, ZPOOL_COMPAT_LEGACY) != 0 &&
nvlist_exists(proplist, vname)) ||
(prop == ZPOOL_PROP_VERSION &&
nvlist_exists(proplist, cname) &&
strcmp(fnvlist_lookup_string(proplist, cname),
ZPOOL_COMPAT_LEGACY) != 0)) {
(void) fprintf(stderr, gettext("when 'version' is "
"specified, the 'compatibility' feature may only "
"be set to '" ZPOOL_COMPAT_LEGACY "'\n"));
return (2);
}
if (zpool_prop_feature(propname))
normnm = propname;
@@ -1069,7 +1046,7 @@ zpool_do_add(int argc, char **argv)
free(vname);
}
}
/* And finally the spares */
/* And finaly the spares */
if (nvlist_lookup_nvlist_array(poolnvroot, ZPOOL_CONFIG_SPARES,
&sparechild, &sparechildren) == 0 && sparechildren > 0) {
hadspare = B_TRUE;
@@ -1215,26 +1192,6 @@ zpool_do_remove(int argc, char **argv)
return (ret);
}
/*
* Return 1 if a vdev is active (being used in a pool)
* Return 0 if a vdev is inactive (offlined or faulted, or not in active pool)
*
* This is useful for checking if a disk in an active pool is offlined or
* faulted.
*/
static int
vdev_is_active(char *vdev_path)
{
int fd;
fd = open(vdev_path, O_EXCL);
if (fd < 0) {
return (1); /* cant open O_EXCL - disk is active */
}
close(fd);
return (0); /* disk is inactive in the pool */
}
/*
* zpool labelclear [-f] <vdev>
*
@@ -1344,23 +1301,9 @@ zpool_do_labelclear(int argc, char **argv)
case POOL_STATE_ACTIVE:
case POOL_STATE_SPARE:
case POOL_STATE_L2CACHE:
/*
* We allow the user to call 'zpool offline -f'
* on an offlined disk in an active pool. We can check if
* the disk is online by calling vdev_is_active().
*/
if (force && !vdev_is_active(vdev))
break;
(void) fprintf(stderr, gettext(
"%s is a member (%s) of pool \"%s\""),
"%s is a member (%s) of pool \"%s\"\n"),
vdev, zpool_pool_state_to_name(state), name);
if (force) {
(void) fprintf(stderr, gettext(
". Offline the disk first to clear its label."));
}
printf("\n");
ret = 1;
goto errout;
@@ -1431,15 +1374,13 @@ zpool_do_create(int argc, char **argv)
{
boolean_t force = B_FALSE;
boolean_t dryrun = B_FALSE;
boolean_t enable_pool_features = B_TRUE;
boolean_t enable_all_pool_feat = B_TRUE;
int c;
nvlist_t *nvroot = NULL;
char *poolname;
char *tname = NULL;
int ret = 1;
char *altroot = NULL;
char *compat = NULL;
char *mountpoint = NULL;
nvlist_t *fsprops = NULL;
nvlist_t *props = NULL;
@@ -1455,7 +1396,7 @@ zpool_do_create(int argc, char **argv)
dryrun = B_TRUE;
break;
case 'd':
enable_pool_features = B_FALSE;
enable_all_pool_feat = B_FALSE;
break;
case 'R':
altroot = optarg;
@@ -1493,14 +1434,11 @@ zpool_do_create(int argc, char **argv)
ver = strtoull(propval, &end, 10);
if (*end == '\0' &&
ver < SPA_VERSION_FEATURES) {
enable_pool_features = B_FALSE;
enable_all_pool_feat = B_FALSE;
}
}
if (zpool_name_to_prop(optarg) == ZPOOL_PROP_ALTROOT)
altroot = propval;
if (zpool_name_to_prop(optarg) ==
ZPOOL_PROP_COMPATIBILITY)
compat = propval;
break;
case 'O':
if ((propval = strchr(optarg, '=')) == NULL) {
@@ -1694,27 +1632,10 @@ zpool_do_create(int argc, char **argv)
ret = 0;
} else {
/*
* Load in feature set.
* Note: if compatibility property not given, we'll have
* NULL, which means 'all features'.
* Hand off to libzfs.
*/
boolean_t requested_features[SPA_FEATURES];
if (zpool_do_load_compat(compat, requested_features) !=
ZPOOL_COMPATIBILITY_OK)
goto errout;
/*
* props contains list of features to enable.
* For each feature:
* - remove it if feature@name=disabled
* - leave it there if feature@name=enabled
* - add it if:
* - enable_pool_features (ie: no '-d' or '-o version')
* - it's supported by the kernel module
* - it's in the requested feature set
* - warn if it's enabled but not in compat
*/
for (spa_feature_t i = 0; i < SPA_FEATURES; i++) {
spa_feature_t i;
for (i = 0; i < SPA_FEATURES; i++) {
char propname[MAXPATHLEN];
char *propval;
zfeature_info_t *feat = &spa_feature_table[i];
@@ -1722,22 +1643,17 @@ zpool_do_create(int argc, char **argv)
(void) snprintf(propname, sizeof (propname),
"feature@%s", feat->fi_uname);
/*
* Only features contained in props will be enabled:
* remove from the nvlist every ZFS_FEATURE_DISABLED
* value and add every missing ZFS_FEATURE_ENABLED if
* enable_all_pool_feat is set.
*/
if (!nvlist_lookup_string(props, propname, &propval)) {
if (strcmp(propval, ZFS_FEATURE_DISABLED) == 0)
(void) nvlist_remove_all(props,
propname);
if (strcmp(propval,
ZFS_FEATURE_ENABLED) == 0 &&
!requested_features[i])
(void) fprintf(stderr, gettext(
"Warning: feature \"%s\" enabled "
"but is not in specified "
"'compatibility' feature set.\n"),
feat->fi_uname);
} else if (
enable_pool_features &&
feat->fi_zfs_mod_supported &&
requested_features[i]) {
} else if (enable_all_pool_feat) {
ret = add_prop_list(propname,
ZFS_FEATURE_ENABLED, &props, B_TRUE);
if (ret != 0)
@@ -2093,7 +2009,7 @@ zpool_print_cmd(vdev_cmd_data_list_t *vcdl, const char *pool, char *path)
* Mark empty values with dashes to make output
* awk-able.
*/
if (val == NULL || is_blank_str(val))
if (is_blank_str(val))
val = "-";
printf("%*s", vcdl->uniq_cols_width[j], val);
@@ -2414,10 +2330,6 @@ print_status_config(zpool_handle_t *zhp, status_cbdata_t *cb, const char *name,
(void) printf(gettext("all children offline"));
break;
case VDEV_AUX_BAD_LABEL:
(void) printf(gettext("invalid label"));
break;
default:
(void) printf(gettext("corrupted data"));
break;
@@ -2463,7 +2375,7 @@ print_status_config(zpool_handle_t *zhp, status_cbdata_t *cb, const char *name,
}
}
/* Display vdev initialization and trim status for leaves. */
/* Display vdev initialization and trim status for leaves */
if (children == 0) {
print_status_initialize(vs, cb->cb_print_vdev_init);
print_status_trim(vs, cb->cb_print_vdev_trim);
@@ -2560,10 +2472,6 @@ print_import_config(status_cbdata_t *cb, const char *name, nvlist_t *nv,
(void) printf(gettext("all children offline"));
break;
case VDEV_AUX_BAD_LABEL:
(void) printf(gettext("invalid label"));
break;
default:
(void) printf(gettext("corrupted data"));
break;
@@ -2772,24 +2680,8 @@ show_import(nvlist_t *config, boolean_t report_error)
case ZPOOL_STATUS_FEAT_DISABLED:
printf_color(ANSI_BOLD, gettext("status: "));
printf_color(ANSI_YELLOW, gettext("Some supported "
"features are not enabled on the pool.\n\t"
"(Note that they may be intentionally disabled "
"if the\n\t'compatibility' property is set.)\n"));
break;
case ZPOOL_STATUS_COMPATIBILITY_ERR:
printf_color(ANSI_BOLD, gettext("status: "));
printf_color(ANSI_YELLOW, gettext("Error reading or parsing "
"the file(s) indicated by the 'compatibility'\n"
"property.\n"));
break;
case ZPOOL_STATUS_INCOMPATIBLE_FEAT:
printf_color(ANSI_BOLD, gettext("status: "));
printf_color(ANSI_YELLOW, gettext("One or more features "
"are enabled on the pool despite not being\n"
"requested by the 'compatibility' property.\n"));
printf_color(ANSI_YELLOW, gettext("Some supported features are "
"not enabled on the pool.\n"));
break;
case ZPOOL_STATUS_UNSUP_FEAT_READ:
@@ -2881,12 +2773,6 @@ show_import(nvlist_t *config, boolean_t report_error)
"imported using its name or numeric identifier, "
"though\n\tsome features will not be available "
"without an explicit 'zpool upgrade'.\n"));
} else if (reason == ZPOOL_STATUS_COMPATIBILITY_ERR) {
(void) printf(gettext(" action: The pool can be "
"imported using its name or numeric\n\tidentifier, "
"though the file(s) indicated by its "
"'compatibility'\n\tproperty cannot be parsed at "
"this time.\n"));
} else if (reason == ZPOOL_STATUS_HOSTID_MISMATCH) {
(void) printf(gettext(" action: The pool can be "
"imported using its name or numeric "
@@ -3798,10 +3684,9 @@ zpool_do_import(int argc, char **argv)
return (1);
}
err = import_pools(pools, props, mntopts, flags,
argc >= 1 ? argv[0] : NULL,
argc >= 2 ? argv[1] : NULL,
do_destroyed, pool_specified, do_all, &idata);
err = import_pools(pools, props, mntopts, flags, argv[0],
argc == 1 ? NULL : argv[1], do_destroyed, pool_specified,
do_all, &idata);
/*
* If we're using the cachefile and we failed to import, then
@@ -3821,10 +3706,9 @@ zpool_do_import(int argc, char **argv)
nvlist_free(pools);
pools = zpool_search_import(g_zfs, &idata, &libzfs_config_ops);
err = import_pools(pools, props, mntopts, flags,
argc >= 1 ? argv[0] : NULL,
argc >= 2 ? argv[1] : NULL,
do_destroyed, pool_specified, do_all, &idata);
err = import_pools(pools, props, mntopts, flags, argv[0],
argc == 1 ? NULL : argv[1], do_destroyed, pool_specified,
do_all, &idata);
}
error:
@@ -5165,11 +5049,12 @@ get_stat_flags(zpool_list_t *list)
* Return 1 if cb_data->cb_vdev_names[0] is this vdev's name, 0 otherwise.
*/
static int
is_vdev_cb(zpool_handle_t *zhp, nvlist_t *nv, void *cb_data)
is_vdev_cb(void *zhp_data, nvlist_t *nv, void *cb_data)
{
iostat_cbdata_t *cb = cb_data;
char *name = NULL;
int ret = 0;
zpool_handle_t *zhp = zhp_data;
name = zpool_vdev_name(g_zfs, zhp, nv, cb->cb_name_flags);
@@ -8120,9 +8005,7 @@ status_callback(zpool_handle_t *zhp, void *data)
if (cbp->cb_explain &&
(reason == ZPOOL_STATUS_OK ||
reason == ZPOOL_STATUS_VERSION_OLDER ||
reason == ZPOOL_STATUS_FEAT_DISABLED ||
reason == ZPOOL_STATUS_COMPATIBILITY_ERR ||
reason == ZPOOL_STATUS_INCOMPATIBLE_FEAT)) {
reason == ZPOOL_STATUS_FEAT_DISABLED)) {
if (!cbp->cb_allpools) {
(void) printf(gettext("pool '%s' is healthy\n"),
zpool_get_name(zhp));
@@ -8297,40 +8180,14 @@ status_callback(zpool_handle_t *zhp, void *data)
case ZPOOL_STATUS_FEAT_DISABLED:
printf_color(ANSI_BOLD, gettext("status: "));
printf_color(ANSI_YELLOW, gettext("Some supported and "
"requested features are not enabled on the pool.\n\t"
"The pool can still be used, but some features are "
"unavailable.\n"));
printf_color(ANSI_YELLOW, gettext("Some supported features are "
"not enabled on the pool. The pool can\n\tstill be used, "
"but some features are unavailable.\n"));
printf_color(ANSI_BOLD, gettext("action: "));
printf_color(ANSI_YELLOW, gettext("Enable all features using "
"'zpool upgrade'. Once this is done,\n\tthe pool may no "
"longer be accessible by software that does not support\n\t"
"the features. See zpool-features(7) for details.\n"));
break;
case ZPOOL_STATUS_COMPATIBILITY_ERR:
printf_color(ANSI_BOLD, gettext("status: "));
printf_color(ANSI_YELLOW, gettext("This pool has a "
"compatibility list specified, but it could not be\n\t"
"read/parsed at this time. The pool can still be used, "
"but this\n\tshould be investigated.\n"));
printf_color(ANSI_BOLD, gettext("action: "));
printf_color(ANSI_YELLOW, gettext("Check the value of the "
"'compatibility' property against the\n\t"
"appropriate file in " ZPOOL_SYSCONF_COMPAT_D " or "
ZPOOL_DATA_COMPAT_D ".\n"));
break;
case ZPOOL_STATUS_INCOMPATIBLE_FEAT:
printf_color(ANSI_BOLD, gettext("status: "));
printf_color(ANSI_YELLOW, gettext("One or more features "
"are enabled on the pool despite not being\n\t"
"requested by the 'compatibility' property.\n"));
printf_color(ANSI_BOLD, gettext("action: "));
printf_color(ANSI_YELLOW, gettext("Consider setting "
"'compatibility' to an appropriate value, or\n\t"
"adding needed features to the relevant file in\n\t"
ZPOOL_SYSCONF_COMPAT_D " or " ZPOOL_DATA_COMPAT_D ".\n"));
"the features. See zpool-features(5) for details.\n"));
break;
case ZPOOL_STATUS_UNSUP_FEAT_READ:
@@ -8792,11 +8649,6 @@ upgrade_version(zpool_handle_t *zhp, uint64_t version)
verify(nvlist_lookup_uint64(config, ZPOOL_CONFIG_VERSION,
&oldversion) == 0);
char compat[ZFS_MAXPROPLEN];
if (zpool_get_prop(zhp, ZPOOL_PROP_COMPATIBILITY, compat,
ZFS_MAXPROPLEN, NULL, B_FALSE) != 0)
compat[0] = '\0';
assert(SPA_VERSION_IS_SUPPORTED(oldversion));
assert(oldversion < version);
@@ -8811,13 +8663,6 @@ upgrade_version(zpool_handle_t *zhp, uint64_t version)
return (1);
}
if (strcmp(compat, ZPOOL_COMPAT_LEGACY) == 0) {
(void) fprintf(stderr, gettext("Upgrade not performed because "
"'compatibility' property set to '"
ZPOOL_COMPAT_LEGACY "'.\n"));
return (1);
}
ret = zpool_upgrade(zhp, version);
if (ret != 0)
return (ret);
@@ -8843,25 +8688,11 @@ upgrade_enable_all(zpool_handle_t *zhp, int *countp)
boolean_t firstff = B_TRUE;
nvlist_t *enabled = zpool_get_features(zhp);
char compat[ZFS_MAXPROPLEN];
if (zpool_get_prop(zhp, ZPOOL_PROP_COMPATIBILITY, compat,
ZFS_MAXPROPLEN, NULL, B_FALSE) != 0)
compat[0] = '\0';
boolean_t requested_features[SPA_FEATURES];
if (zpool_do_load_compat(compat, requested_features) !=
ZPOOL_COMPATIBILITY_OK)
return (-1);
count = 0;
for (i = 0; i < SPA_FEATURES; i++) {
const char *fname = spa_feature_table[i].fi_uname;
const char *fguid = spa_feature_table[i].fi_guid;
if (!spa_feature_table[i].fi_zfs_mod_supported)
continue;
if (!nvlist_exists(enabled, fguid) && requested_features[i]) {
if (!nvlist_exists(enabled, fguid)) {
char *propname;
verify(-1 != asprintf(&propname, "feature@%s", fname));
ret = zpool_set_prop(zhp, propname,
@@ -8894,7 +8725,7 @@ upgrade_cb(zpool_handle_t *zhp, void *arg)
upgrade_cbdata_t *cbp = arg;
nvlist_t *config;
uint64_t version;
boolean_t modified_pool = B_FALSE;
boolean_t printnl = B_FALSE;
int ret;
config = zpool_get_config(zhp, NULL);
@@ -8908,7 +8739,7 @@ upgrade_cb(zpool_handle_t *zhp, void *arg)
ret = upgrade_version(zhp, cbp->cb_version);
if (ret != 0)
return (ret);
modified_pool = B_TRUE;
printnl = B_TRUE;
/*
* If they did "zpool upgrade -a", then we could
@@ -8928,13 +8759,12 @@ upgrade_cb(zpool_handle_t *zhp, void *arg)
if (count > 0) {
cbp->cb_first = B_FALSE;
modified_pool = B_TRUE;
printnl = B_TRUE;
}
}
if (modified_pool) {
(void) printf("\n");
(void) after_zpool_upgrade(zhp);
if (printnl) {
(void) printf(gettext("\n"));
}
return (0);
@@ -8960,10 +8790,7 @@ upgrade_list_older_cb(zpool_handle_t *zhp, void *arg)
"be upgraded to use feature flags. After "
"being upgraded, these pools\nwill no "
"longer be accessible by software that does not "
"support feature\nflags.\n\n"
"Note that setting a pool's 'compatibility' "
"feature to '" ZPOOL_COMPAT_LEGACY "' will\n"
"inhibit upgrades.\n\n"));
"support feature\nflags.\n\n"));
(void) printf(gettext("VER POOL\n"));
(void) printf(gettext("--- ------------\n"));
cbp->cb_first = B_FALSE;
@@ -8995,10 +8822,6 @@ upgrade_list_disabled_cb(zpool_handle_t *zhp, void *arg)
for (i = 0; i < SPA_FEATURES; i++) {
const char *fguid = spa_feature_table[i].fi_guid;
const char *fname = spa_feature_table[i].fi_uname;
if (!spa_feature_table[i].fi_zfs_mod_supported)
continue;
if (!nvlist_exists(enabled, fguid)) {
if (cbp->cb_first) {
(void) printf(gettext("\nSome "
@@ -9008,12 +8831,8 @@ upgrade_list_disabled_cb(zpool_handle_t *zhp, void *arg)
"pool may become incompatible with "
"software\nthat does not support "
"the feature. See "
"zpool-features(7) for "
"details.\n\n"
"Note that the pool "
"'compatibility' feature can be "
"used to inhibit\nfeature "
"upgrades.\n\n"));
"zpool-features(5) for "
"details.\n\n"));
(void) printf(gettext("POOL "
"FEATURE\n"));
(void) printf(gettext("------"
@@ -9047,7 +8866,7 @@ upgrade_list_disabled_cb(zpool_handle_t *zhp, void *arg)
static int
upgrade_one(zpool_handle_t *zhp, void *data)
{
boolean_t modified_pool = B_FALSE;
boolean_t printnl = B_FALSE;
upgrade_cbdata_t *cbp = data;
uint64_t cur_version;
int ret;
@@ -9075,7 +8894,7 @@ upgrade_one(zpool_handle_t *zhp, void *data)
}
if (cur_version != cbp->cb_version) {
modified_pool = B_TRUE;
printnl = B_TRUE;
ret = upgrade_version(zhp, cbp->cb_version);
if (ret != 0)
return (ret);
@@ -9088,17 +8907,16 @@ upgrade_one(zpool_handle_t *zhp, void *data)
return (ret);
if (count != 0) {
modified_pool = B_TRUE;
printnl = B_TRUE;
} else if (cur_version == SPA_VERSION) {
(void) printf(gettext("Pool '%s' already has all "
"supported and requested features enabled.\n"),
"supported features enabled.\n"),
zpool_get_name(zhp));
}
}
if (modified_pool) {
(void) printf("\n");
(void) after_zpool_upgrade(zhp);
if (printnl) {
(void) printf(gettext("\n"));
}
return (0);
@@ -9193,8 +9011,6 @@ zpool_do_upgrade(int argc, char **argv)
"---------------\n");
for (i = 0; i < SPA_FEATURES; i++) {
zfeature_info_t *fi = &spa_feature_table[i];
if (!fi->fi_zfs_mod_supported)
continue;
const char *ro =
(fi->fi_flags & ZFEATURE_FLAG_READONLY_COMPAT) ?
" (read-only compatible)" : "";
@@ -9255,8 +9071,8 @@ zpool_do_upgrade(int argc, char **argv)
(void) printf(gettext("All pools are already "
"formatted using feature flags.\n\n"));
(void) printf(gettext("Every feature flags "
"pool already has all supported and "
"requested features enabled.\n"));
"pool already has all supported features "
"enabled.\n"));
} else {
(void) printf(gettext("All pools are already "
"formatted with version %llu or higher.\n"),
@@ -9282,7 +9098,7 @@ zpool_do_upgrade(int argc, char **argv)
if (cb.cb_first) {
(void) printf(gettext("Every feature flags pool has "
"all supported and requested features enabled.\n"));
"all supported features enabled.\n"));
} else {
(void) printf(gettext("\n"));
}
@@ -9311,7 +9127,7 @@ print_history_records(nvlist_t *nvhis, hist_cbdata_t *cb)
&records, &numrecords) == 0);
for (i = 0; i < numrecords; i++) {
nvlist_t *rec = records[i];
char tbuf[64] = "";
char tbuf[30] = "";
if (nvlist_exists(rec, ZPOOL_HIST_TIME)) {
time_t tsec;
@@ -9323,14 +9139,6 @@ print_history_records(nvlist_t *nvhis, hist_cbdata_t *cb)
(void) strftime(tbuf, sizeof (tbuf), "%F.%T", &t);
}
if (nvlist_exists(rec, ZPOOL_HIST_ELAPSED_NS)) {
uint64_t elapsed_ns = fnvlist_lookup_int64(records[i],
ZPOOL_HIST_ELAPSED_NS);
(void) snprintf(tbuf + strlen(tbuf),
sizeof (tbuf) - strlen(tbuf),
" (%lldms)", (long long)elapsed_ns / 1000 / 1000);
}
if (nvlist_exists(rec, ZPOOL_HIST_CMD)) {
(void) printf("%s %s", tbuf,
fnvlist_lookup_string(rec, ZPOOL_HIST_CMD));
@@ -10070,63 +9878,6 @@ set_callback(zpool_handle_t *zhp, void *data)
int error;
set_cbdata_t *cb = (set_cbdata_t *)data;
/* Check if we have out-of-bounds features */
if (strcmp(cb->cb_propname, ZPOOL_CONFIG_COMPATIBILITY) == 0) {
boolean_t features[SPA_FEATURES];
if (zpool_do_load_compat(cb->cb_value, features) !=
ZPOOL_COMPATIBILITY_OK)
return (-1);
nvlist_t *enabled = zpool_get_features(zhp);
spa_feature_t i;
for (i = 0; i < SPA_FEATURES; i++) {
const char *fguid = spa_feature_table[i].fi_guid;
if (nvlist_exists(enabled, fguid) && !features[i])
break;
}
if (i < SPA_FEATURES)
(void) fprintf(stderr, gettext("Warning: one or "
"more features already enabled on pool '%s'\n"
"are not present in this compatibility set.\n"),
zpool_get_name(zhp));
}
/* if we're setting a feature, check it's in compatibility set */
if (zpool_prop_feature(cb->cb_propname) &&
strcmp(cb->cb_value, ZFS_FEATURE_ENABLED) == 0) {
char *fname = strchr(cb->cb_propname, '@') + 1;
spa_feature_t f;
if (zfeature_lookup_name(fname, &f) == 0) {
char compat[ZFS_MAXPROPLEN];
if (zpool_get_prop(zhp, ZPOOL_PROP_COMPATIBILITY,
compat, ZFS_MAXPROPLEN, NULL, B_FALSE) != 0)
compat[0] = '\0';
boolean_t features[SPA_FEATURES];
if (zpool_do_load_compat(compat, features) !=
ZPOOL_COMPATIBILITY_OK) {
(void) fprintf(stderr, gettext("Error: "
"cannot enable feature '%s' on pool '%s'\n"
"because the pool's 'compatibility' "
"property cannot be parsed.\n"),
fname, zpool_get_name(zhp));
return (-1);
}
if (!features[f]) {
(void) fprintf(stderr, gettext("Error: "
"cannot enable feature '%s' on pool '%s'\n"
"as it is not specified in this pool's "
"current compatibility set.\n"
"Consider setting 'compatibility' to a "
"less restrictive set, or to 'off'.\n"),
fname, zpool_get_name(zhp));
return (-1);
}
}
}
error = zpool_set_prop(zhp, cb->cb_propname, cb->cb_value);
if (!error)
@@ -10257,8 +10008,7 @@ vdev_any_spare_replacing(nvlist_t *nv)
(void) nvlist_lookup_string(nv, ZPOOL_CONFIG_TYPE, &vdev_type);
if (strcmp(vdev_type, VDEV_TYPE_REPLACING) == 0 ||
strcmp(vdev_type, VDEV_TYPE_SPARE) == 0 ||
strcmp(vdev_type, VDEV_TYPE_DRAID_SPARE) == 0) {
strcmp(vdev_type, VDEV_TYPE_SPARE) == 0) {
return (B_TRUE);
}
@@ -10643,36 +10393,6 @@ zpool_do_version(int argc, char **argv)
return (0);
}
/*
* Do zpool_load_compat() and print error message on failure
*/
static zpool_compat_status_t
zpool_do_load_compat(const char *compat, boolean_t *list)
{
char report[1024];
zpool_compat_status_t ret;
ret = zpool_load_compat(compat, list, report, 1024);
switch (ret) {
case ZPOOL_COMPATIBILITY_OK:
break;
case ZPOOL_COMPATIBILITY_NOFILES:
case ZPOOL_COMPATIBILITY_BADFILE:
case ZPOOL_COMPATIBILITY_BADTOKEN:
(void) fprintf(stderr, "Error: %s\n", report);
break;
case ZPOOL_COMPATIBILITY_WARNTOKEN:
(void) fprintf(stderr, "Warning: %s\n", report);
ret = ZPOOL_COMPATIBILITY_OK;
break;
}
return (ret);
}
int
main(int argc, char **argv)
{
+1 -2
View File
@@ -27,6 +27,7 @@
#include <libnvpair.h>
#include <libzfs.h>
#include <libzutil.h>
#ifdef __cplusplus
extern "C" {
@@ -67,7 +68,6 @@ int for_each_pool(int, char **, boolean_t unavail, zprop_list_t **,
boolean_t, zpool_iter_f, void *);
/* Vdev list functions */
typedef int (*pool_vdev_iter_f)(zpool_handle_t *, nvlist_t *, void *);
int for_each_vdev(zpool_handle_t *zhp, pool_vdev_iter_f func, void *data);
typedef struct zpool_list zpool_list_t;
@@ -129,7 +129,6 @@ int check_device(const char *path, boolean_t force,
boolean_t check_sector_size_database(char *path, int *sector_size);
void vdev_error(const char *fmt, ...);
int check_file(const char *file, boolean_t force, boolean_t isspare);
void after_zpool_upgrade(zpool_handle_t *zhp);
#ifdef __cplusplus
}
+56 -345
View File
@@ -86,6 +86,9 @@
boolean_t error_seen;
boolean_t is_force;
/*PRINTFLIKE1*/
void
vdev_error(const char *fmt, ...)
@@ -219,9 +222,6 @@ is_spare(nvlist_t *config, const char *path)
uint_t i, nspares;
boolean_t inuse;
if (zpool_is_draid_spare(path))
return (B_TRUE);
if ((fd = open(path, O_RDONLY|O_DIRECT)) < 0)
return (B_FALSE);
@@ -267,10 +267,9 @@ is_spare(nvlist_t *config, const char *path)
* /dev/xxx Complete disk path
* /xxx Full path to file
* xxx Shorthand for <zfs_vdev_paths>/xxx
* draid* Virtual dRAID spare
*/
static nvlist_t *
make_leaf_vdev(nvlist_t *props, const char *arg, boolean_t is_primary)
make_leaf_vdev(nvlist_t *props, const char *arg, uint64_t is_log)
{
char path[MAXPATHLEN];
struct stat64 statbuf;
@@ -310,17 +309,6 @@ make_leaf_vdev(nvlist_t *props, const char *arg, boolean_t is_primary)
/* After whole disk check restore original passed path */
strlcpy(path, arg, sizeof (path));
} else if (zpool_is_draid_spare(arg)) {
if (!is_primary) {
(void) fprintf(stderr,
gettext("cannot open '%s': dRAID spares can only "
"be used to replace primary vdevs\n"), arg);
return (NULL);
}
wholedisk = B_TRUE;
strlcpy(path, arg, sizeof (path));
type = VDEV_TYPE_DRAID_SPARE;
} else {
err = is_shorthand_path(arg, path, sizeof (path),
&statbuf, &wholedisk);
@@ -349,19 +337,17 @@ make_leaf_vdev(nvlist_t *props, const char *arg, boolean_t is_primary)
}
}
if (type == NULL) {
/*
* Determine whether this is a device or a file.
*/
if (wholedisk || S_ISBLK(statbuf.st_mode)) {
type = VDEV_TYPE_DISK;
} else if (S_ISREG(statbuf.st_mode)) {
type = VDEV_TYPE_FILE;
} else {
fprintf(stderr, gettext("cannot use '%s': must "
"be a block device or regular file\n"), path);
return (NULL);
}
/*
* Determine whether this is a device or a file.
*/
if (wholedisk || S_ISBLK(statbuf.st_mode)) {
type = VDEV_TYPE_DISK;
} else if (S_ISREG(statbuf.st_mode)) {
type = VDEV_TYPE_FILE;
} else {
(void) fprintf(stderr, gettext("cannot use '%s': must be a "
"block device or regular file\n"), path);
return (NULL);
}
/*
@@ -372,7 +358,10 @@ make_leaf_vdev(nvlist_t *props, const char *arg, boolean_t is_primary)
verify(nvlist_alloc(&vdev, NV_UNIQUE_NAME, 0) == 0);
verify(nvlist_add_string(vdev, ZPOOL_CONFIG_PATH, path) == 0);
verify(nvlist_add_string(vdev, ZPOOL_CONFIG_TYPE, type) == 0);
verify(nvlist_add_uint64(vdev, ZPOOL_CONFIG_IS_LOG, is_log) == 0);
if (is_log)
verify(nvlist_add_string(vdev, ZPOOL_CONFIG_ALLOCATION_BIAS,
VDEV_ALLOC_BIAS_LOG) == 0);
if (strcmp(type, VDEV_TYPE_DISK) == 0)
verify(nvlist_add_uint64(vdev, ZPOOL_CONFIG_WHOLE_DISK,
(uint64_t)wholedisk) == 0);
@@ -443,16 +432,11 @@ typedef struct replication_level {
#define ZPOOL_FUZZ (16 * 1024 * 1024)
/*
* N.B. For the purposes of comparing replication levels dRAID can be
* considered functionally equivalent to raidz.
*/
static boolean_t
is_raidz_mirror(replication_level_t *a, replication_level_t *b,
replication_level_t **raidz, replication_level_t **mirror)
{
if ((strcmp(a->zprl_type, "raidz") == 0 ||
strcmp(a->zprl_type, "draid") == 0) &&
if (strcmp(a->zprl_type, "raidz") == 0 &&
strcmp(b->zprl_type, "mirror") == 0) {
*raidz = a;
*mirror = b;
@@ -461,22 +445,6 @@ is_raidz_mirror(replication_level_t *a, replication_level_t *b,
return (B_FALSE);
}
/*
* Comparison for determining if dRAID and raidz where passed in either order.
*/
static boolean_t
is_raidz_draid(replication_level_t *a, replication_level_t *b)
{
if ((strcmp(a->zprl_type, "raidz") == 0 ||
strcmp(a->zprl_type, "draid") == 0) &&
(strcmp(b->zprl_type, "raidz") == 0 ||
strcmp(b->zprl_type, "draid") == 0)) {
return (B_TRUE);
}
return (B_FALSE);
}
/*
* Given a list of toplevel vdevs, return the current replication level. If
* the config is inconsistent, then NULL is returned. If 'fatal' is set, then
@@ -543,8 +511,7 @@ get_replication(nvlist_t *nvroot, boolean_t fatal)
rep.zprl_type = type;
rep.zprl_children = 0;
if (strcmp(type, VDEV_TYPE_RAIDZ) == 0 ||
strcmp(type, VDEV_TYPE_DRAID) == 0) {
if (strcmp(type, VDEV_TYPE_RAIDZ) == 0) {
verify(nvlist_lookup_uint64(nv,
ZPOOL_CONFIG_NPARITY,
&rep.zprl_parity) == 0);
@@ -710,29 +677,6 @@ get_replication(nvlist_t *nvroot, boolean_t fatal)
else
return (NULL);
}
} else if (is_raidz_draid(&lastrep, &rep)) {
/*
* Accepted raidz and draid when they can
* handle the same number of disk failures.
*/
if (lastrep.zprl_parity != rep.zprl_parity) {
if (ret != NULL)
free(ret);
ret = NULL;
if (fatal)
vdev_error(gettext(
"mismatched replication "
"level: %s and %s vdevs "
"with different "
"redundancy, %llu vs. "
"%llu are present\n"),
lastrep.zprl_type,
rep.zprl_type,
lastrep.zprl_parity,
rep.zprl_parity);
else
return (NULL);
}
} else if (strcmp(lastrep.zprl_type, rep.zprl_type) !=
0) {
if (ret != NULL)
@@ -1159,87 +1103,31 @@ is_device_in_use(nvlist_t *config, nvlist_t *nv, boolean_t force,
return (anyinuse);
}
/*
* Returns the parity level extracted from a raidz or draid type.
* If the parity cannot be determined zero is returned.
*/
static int
get_parity(const char *type)
{
long parity = 0;
const char *p;
if (strncmp(type, VDEV_TYPE_RAIDZ, strlen(VDEV_TYPE_RAIDZ)) == 0) {
p = type + strlen(VDEV_TYPE_RAIDZ);
if (*p == '\0') {
/* when unspecified default to single parity */
return (1);
} else if (*p == '0') {
/* no zero prefixes allowed */
return (0);
} else {
/* 0-3, no suffixes allowed */
char *end;
errno = 0;
parity = strtol(p, &end, 10);
if (errno != 0 || *end != '\0' ||
parity < 1 || parity > VDEV_RAIDZ_MAXPARITY) {
return (0);
}
}
} else if (strncmp(type, VDEV_TYPE_DRAID,
strlen(VDEV_TYPE_DRAID)) == 0) {
p = type + strlen(VDEV_TYPE_DRAID);
if (*p == '\0' || *p == ':') {
/* when unspecified default to single parity */
return (1);
} else if (*p == '0') {
/* no zero prefixes allowed */
return (0);
} else {
/* 0-3, allowed suffixes: '\0' or ':' */
char *end;
errno = 0;
parity = strtol(p, &end, 10);
if (errno != 0 ||
parity < 1 || parity > VDEV_DRAID_MAXPARITY ||
(*end != '\0' && *end != ':')) {
return (0);
}
}
}
return ((int)parity);
}
/*
* Assign the minimum and maximum number of devices allowed for
* the specified type. On error NULL is returned, otherwise the
* type prefix is returned (raidz, mirror, etc).
*/
static const char *
is_grouping(const char *type, int *mindev, int *maxdev)
{
int nparity;
if (strncmp(type, "raidz", 5) == 0) {
const char *p = type + 5;
char *end;
long nparity;
if (*p == '\0') {
nparity = 1;
} else if (*p == '0') {
return (NULL); /* no zero prefixes allowed */
} else {
errno = 0;
nparity = strtol(p, &end, 10);
if (errno != 0 || nparity < 1 || nparity >= 255 ||
*end != '\0')
return (NULL);
}
if (strncmp(type, VDEV_TYPE_RAIDZ, strlen(VDEV_TYPE_RAIDZ)) == 0 ||
strncmp(type, VDEV_TYPE_DRAID, strlen(VDEV_TYPE_DRAID)) == 0) {
nparity = get_parity(type);
if (nparity == 0)
return (NULL);
if (mindev != NULL)
*mindev = nparity + 1;
if (maxdev != NULL)
*maxdev = 255;
if (strncmp(type, VDEV_TYPE_RAIDZ,
strlen(VDEV_TYPE_RAIDZ)) == 0) {
return (VDEV_TYPE_RAIDZ);
} else {
return (VDEV_TYPE_DRAID);
}
return (VDEV_TYPE_RAIDZ);
}
if (maxdev != NULL)
@@ -1279,163 +1167,6 @@ is_grouping(const char *type, int *mindev, int *maxdev)
return (NULL);
}
/*
* Extract the configuration parameters encoded in the dRAID type and
* use them to generate a dRAID configuration. The expected format is:
*
* draid[<parity>][:<data><d|D>][:<children><c|C>][:<spares><s|S>]
*
* The intent is to be able to generate a good configuration when no
* additional information is provided. The only mandatory component
* of the 'type' is the 'draid' prefix. If a value is not provided
* then reasonable defaults are used. The optional components may
* appear in any order but the d/s/c suffix is required.
*
* Valid inputs:
* - data: number of data devices per group (1-255)
* - parity: number of parity blocks per group (1-3)
* - spares: number of distributed spare (0-100)
* - children: total number of devices (1-255)
*
* Examples:
* - zpool create tank draid <devices...>
* - zpool create tank draid2:8d:51c:2s <devices...>
*/
static int
draid_config_by_type(nvlist_t *nv, const char *type, uint64_t children)
{
uint64_t nparity = 1;
uint64_t nspares = 0;
uint64_t ndata = UINT64_MAX;
uint64_t ngroups = 1;
long value;
if (strncmp(type, VDEV_TYPE_DRAID, strlen(VDEV_TYPE_DRAID)) != 0)
return (EINVAL);
nparity = (uint64_t)get_parity(type);
if (nparity == 0)
return (EINVAL);
char *p = (char *)type;
while ((p = strchr(p, ':')) != NULL) {
char *end;
p = p + 1;
errno = 0;
if (!isdigit(p[0])) {
(void) fprintf(stderr, gettext("invalid dRAID "
"syntax; expected [:<number><c|d|s>] not '%s'\n"),
type);
return (EINVAL);
}
/* Expected non-zero value with c/d/s suffix */
value = strtol(p, &end, 10);
char suffix = tolower(*end);
if (errno != 0 ||
(suffix != 'c' && suffix != 'd' && suffix != 's')) {
(void) fprintf(stderr, gettext("invalid dRAID "
"syntax; expected [:<number><c|d|s>] not '%s'\n"),
type);
return (EINVAL);
}
if (suffix == 'c') {
if ((uint64_t)value != children) {
fprintf(stderr,
gettext("invalid number of dRAID children; "
"%llu required but %llu provided\n"),
(u_longlong_t)value,
(u_longlong_t)children);
return (EINVAL);
}
} else if (suffix == 'd') {
ndata = (uint64_t)value;
} else if (suffix == 's') {
nspares = (uint64_t)value;
} else {
verify(0); /* Unreachable */
}
}
/*
* When a specific number of data disks is not provided limit a
* redundancy group to 8 data disks. This value was selected to
* provide a reasonable tradeoff between capacity and performance.
*/
if (ndata == UINT64_MAX) {
if (children > nspares + nparity) {
ndata = MIN(children - nspares - nparity, 8);
} else {
fprintf(stderr, gettext("request number of "
"distributed spares %llu and parity level %llu\n"
"leaves no disks available for data\n"),
(u_longlong_t)nspares, (u_longlong_t)nparity);
return (EINVAL);
}
}
/* Verify the maximum allowed group size is never exceeded. */
if (ndata == 0 || (ndata + nparity > children - nspares)) {
fprintf(stderr, gettext("requested number of dRAID data "
"disks per group %llu is too high,\nat most %llu disks "
"are available for data\n"), (u_longlong_t)ndata,
(u_longlong_t)(children - nspares - nparity));
return (EINVAL);
}
if (nparity == 0 || nparity > VDEV_DRAID_MAXPARITY) {
fprintf(stderr,
gettext("invalid dRAID parity level %llu; must be "
"between 1 and %d\n"), (u_longlong_t)nparity,
VDEV_DRAID_MAXPARITY);
return (EINVAL);
}
/*
* Verify the requested number of spares can be satisfied.
* An arbitrary limit of 100 distributed spares is applied.
*/
if (nspares > 100 || nspares > (children - (ndata + nparity))) {
fprintf(stderr,
gettext("invalid number of dRAID spares %llu; additional "
"disks would be required\n"), (u_longlong_t)nspares);
return (EINVAL);
}
/* Verify the requested number children is sufficient. */
if (children < (ndata + nparity + nspares)) {
fprintf(stderr, gettext("%llu disks were provided, but at "
"least %llu disks are required for this config\n"),
(u_longlong_t)children,
(u_longlong_t)(ndata + nparity + nspares));
}
if (children > VDEV_DRAID_MAX_CHILDREN) {
fprintf(stderr, gettext("%llu disks were provided, but "
"dRAID only supports up to %u disks"),
(u_longlong_t)children, VDEV_DRAID_MAX_CHILDREN);
}
/*
* Calculate the minimum number of groups required to fill a slice.
* This is the LCM of the stripe width (ndata + nparity) and the
* number of data drives (children - nspares).
*/
while (ngroups * (ndata + nparity) % (children - nspares) != 0)
ngroups++;
/* Store the basic dRAID configuration. */
fnvlist_add_uint64(nv, ZPOOL_CONFIG_NPARITY, nparity);
fnvlist_add_uint64(nv, ZPOOL_CONFIG_DRAID_NDATA, ndata);
fnvlist_add_uint64(nv, ZPOOL_CONFIG_DRAID_NSPARES, nspares);
fnvlist_add_uint64(nv, ZPOOL_CONFIG_DRAID_NGROUPS, ngroups);
return (0);
}
/*
* Construct a syntactically valid vdev specification,
* and ensure that all devices and files exist and can be opened.
@@ -1447,8 +1178,8 @@ construct_spec(nvlist_t *props, int argc, char **argv)
{
nvlist_t *nvroot, *nv, **top, **spares, **l2cache;
int t, toplevels, mindev, maxdev, nspares, nlogs, nl2cache;
const char *type, *fulltype;
boolean_t is_log, is_special, is_dedup, is_spare;
const char *type;
uint64_t is_log, is_special, is_dedup;
boolean_t seen_logs;
top = NULL;
@@ -1458,20 +1189,18 @@ construct_spec(nvlist_t *props, int argc, char **argv)
nspares = 0;
nlogs = 0;
nl2cache = 0;
is_log = is_special = is_dedup = is_spare = B_FALSE;
is_log = is_special = is_dedup = B_FALSE;
seen_logs = B_FALSE;
nvroot = NULL;
while (argc > 0) {
fulltype = argv[0];
nv = NULL;
/*
* If it's a mirror, raidz, or draid the subsequent arguments
* are its leaves -- until we encounter the next mirror,
* raidz or draid.
* If it's a mirror or raidz, the subsequent arguments are
* its leaves -- until we encounter the next mirror or raidz.
*/
if ((type = is_grouping(fulltype, &mindev, &maxdev)) != NULL) {
if ((type = is_grouping(argv[0], &mindev, &maxdev)) != NULL) {
nvlist_t **child = NULL;
int c, children = 0;
@@ -1483,7 +1212,6 @@ construct_spec(nvlist_t *props, int argc, char **argv)
"specified only once\n"));
goto spec_out;
}
is_spare = B_TRUE;
is_log = is_special = is_dedup = B_FALSE;
}
@@ -1497,7 +1225,8 @@ construct_spec(nvlist_t *props, int argc, char **argv)
}
seen_logs = B_TRUE;
is_log = B_TRUE;
is_special = is_dedup = is_spare = B_FALSE;
is_special = B_FALSE;
is_dedup = B_FALSE;
argc--;
argv++;
/*
@@ -1509,7 +1238,8 @@ construct_spec(nvlist_t *props, int argc, char **argv)
if (strcmp(type, VDEV_ALLOC_BIAS_SPECIAL) == 0) {
is_special = B_TRUE;
is_log = is_dedup = is_spare = B_FALSE;
is_log = B_FALSE;
is_dedup = B_FALSE;
argc--;
argv++;
continue;
@@ -1517,7 +1247,8 @@ construct_spec(nvlist_t *props, int argc, char **argv)
if (strcmp(type, VDEV_ALLOC_BIAS_DEDUP) == 0) {
is_dedup = B_TRUE;
is_log = is_special = is_spare = B_FALSE;
is_log = B_FALSE;
is_special = B_FALSE;
argc--;
argv++;
continue;
@@ -1531,8 +1262,7 @@ construct_spec(nvlist_t *props, int argc, char **argv)
"specified only once\n"));
goto spec_out;
}
is_log = is_special = B_FALSE;
is_dedup = is_spare = B_FALSE;
is_log = is_special = is_dedup = B_FALSE;
}
if (is_log || is_special || is_dedup) {
@@ -1550,15 +1280,13 @@ construct_spec(nvlist_t *props, int argc, char **argv)
for (c = 1; c < argc; c++) {
if (is_grouping(argv[c], NULL, NULL) != NULL)
break;
children++;
child = realloc(child,
children * sizeof (nvlist_t *));
if (child == NULL)
zpool_no_memory();
if ((nv = make_leaf_vdev(props, argv[c],
!(is_log || is_special || is_dedup ||
is_spare))) == NULL) {
B_FALSE)) == NULL) {
for (c = 0; c < children - 1; c++)
nvlist_free(child[c]);
free(child);
@@ -1607,11 +1335,10 @@ construct_spec(nvlist_t *props, int argc, char **argv)
type) == 0);
verify(nvlist_add_uint64(nv,
ZPOOL_CONFIG_IS_LOG, is_log) == 0);
if (is_log) {
if (is_log)
verify(nvlist_add_string(nv,
ZPOOL_CONFIG_ALLOCATION_BIAS,
VDEV_ALLOC_BIAS_LOG) == 0);
}
if (is_special) {
verify(nvlist_add_string(nv,
ZPOOL_CONFIG_ALLOCATION_BIAS,
@@ -1627,15 +1354,6 @@ construct_spec(nvlist_t *props, int argc, char **argv)
ZPOOL_CONFIG_NPARITY,
mindev - 1) == 0);
}
if (strcmp(type, VDEV_TYPE_DRAID) == 0) {
if (draid_config_by_type(nv,
fulltype, children) != 0) {
for (c = 0; c < children; c++)
nvlist_free(child[c]);
free(child);
goto spec_out;
}
}
verify(nvlist_add_nvlist_array(nv,
ZPOOL_CONFIG_CHILDREN, child,
children) == 0);
@@ -1649,19 +1367,12 @@ construct_spec(nvlist_t *props, int argc, char **argv)
* We have a device. Pass off to make_leaf_vdev() to
* construct the appropriate nvlist describing the vdev.
*/
if ((nv = make_leaf_vdev(props, argv[0], !(is_log ||
is_special || is_dedup || is_spare))) == NULL)
if ((nv = make_leaf_vdev(props, argv[0],
is_log)) == NULL)
goto spec_out;
verify(nvlist_add_uint64(nv,
ZPOOL_CONFIG_IS_LOG, is_log) == 0);
if (is_log) {
verify(nvlist_add_string(nv,
ZPOOL_CONFIG_ALLOCATION_BIAS,
VDEV_ALLOC_BIAS_LOG) == 0);
if (is_log)
nlogs++;
}
if (is_special) {
verify(nvlist_add_string(nv,
ZPOOL_CONFIG_ALLOCATION_BIAS,
-1
View File
@@ -1 +0,0 @@
/zpool_influxdb
-13
View File
@@ -1,13 +0,0 @@
include $(top_srcdir)/config/Rules.am
zfsexec_PROGRAMS = zpool_influxdb
zpool_influxdb_SOURCES = \
zpool_influxdb.c
zpool_influxdb_LDADD = \
$(top_builddir)/lib/libspl/libspl.la \
$(top_builddir)/lib/libnvpair/libnvpair.la \
$(top_builddir)/lib/libzfs/libzfs.la
include $(top_srcdir)/config/CppCheck.am
-294
View File
@@ -1,294 +0,0 @@
# Influxdb Metrics for ZFS Pools
The _zpool_influxdb_ program produces
[influxdb](https://github.com/influxdata/influxdb) line protocol
compatible metrics from zpools. In the UNIX tradition, _zpool_influxdb_
does one thing: read statistics from a pool and print them to
stdout. In many ways, this is a metrics-friendly output of
statistics normally observed via the `zpool` command.
## Usage
When run without arguments, _zpool_influxdb_ runs once, reading data
from all imported pools, and prints to stdout.
```shell
zpool_influxdb [options] [poolname]
```
If no poolname is specified, then all pools are sampled.
| option | short option | description |
|---|---|---|
| --execd | -e | For use with telegraf's `execd` plugin. When [enter] is pressed, the pools are sampled. To exit, use [ctrl+D] |
| --no-histogram | -n | Do not print histogram information |
| --signed-int | -i | Use signed integer data type (default=unsigned) |
| --sum-histogram-buckets | -s | Sum histogram bucket values |
| --tags key=value[,key=value...] | -t | Add tags to data points. No tag sanity checking is performed. |
| --help | -h | Print a short usage message |
#### Histogram Bucket Values
The histogram data collected by ZFS is stored as independent bucket values.
This works well out-of-the-box with an influxdb data source and grafana's
heatmap visualization. The influxdb query for a grafana heatmap
visualization looks like:
```
field(disk_read) last() non_negative_derivative(1s)
```
Another method for storing histogram data sums the values for lower-value
buckets. For example, a latency bucket tagged "le=10" includes the values
in the bucket "le=1".
This method is often used for prometheus histograms.
The `zpool_influxdb --sum-histogram-buckets` option presents the data from ZFS
as summed values.
## Measurements
The following measurements are collected:
| measurement | description | zpool equivalent |
|---|---|---|
| zpool_stats | general size and data | zpool list |
| zpool_scan_stats | scrub, rebuild, and resilver statistics (omitted if no scan has been requested) | zpool status |
| zpool_vdev_stats | per-vdev statistics | zpool iostat -q |
| zpool_io_size | per-vdev I/O size histogram | zpool iostat -r |
| zpool_latency | per-vdev I/O latency histogram | zpool iostat -w |
| zpool_vdev_queue | per-vdev instantaneous queue depth | zpool iostat -q |
### zpool_stats Description
zpool_stats contains top-level summary statistics for the pool.
Performance counters measure the I/Os to the pool's devices.
#### zpool_stats Tags
| label | description |
|---|---|
| name | pool name |
| path | for leaf vdevs, the pathname |
| state | pool state, as shown by _zpool status_ |
| vdev | vdev name (root = entire pool) |
#### zpool_stats Fields
| field | units | description |
|---|---|---|
| alloc | bytes | allocated space |
| free | bytes | unallocated space |
| size | bytes | total pool size |
| read_bytes | bytes | bytes read since pool import |
| read_errors | count | number of read errors |
| read_ops | count | number of read operations |
| write_bytes | bytes | bytes written since pool import |
| write_errors | count | number of write errors |
| write_ops | count | number of write operations |
### zpool_scan_stats Description
Once a pool has been scrubbed, resilvered, or rebuilt, the zpool_scan_stats
contain information about the status and performance of the operation.
Otherwise, the zpool_scan_stats do not exist in the kernel, and therefore
cannot be reported by this collector.
#### zpool_scan_stats Tags
| label | description |
|---|---|
| name | pool name |
| function | name of the scan function running or recently completed |
| state | scan state, as shown by _zpool status_ |
#### zpool_scan_stats Fields
| field | units | description |
|---|---|---|
| errors | count | number of errors encountered by scan |
| examined | bytes | total data examined during scan |
| to_examine | bytes | prediction of total bytes to be scanned |
| pass_examined | bytes | data examined during current scan pass |
| issued | bytes | size of I/Os issued to disks |
| pass_issued | bytes | size of I/Os issued to disks for current pass |
| processed | bytes | data reconstructed during scan |
| to_process | bytes | total bytes to be repaired |
| rate | bytes/sec | examination rate |
| start_ts | epoch timestamp | start timestamp for scan |
| pause_ts | epoch timestamp | timestamp for a scan pause request |
| end_ts | epoch timestamp | completion timestamp for scan |
| paused_t | seconds | elapsed time while paused |
| remaining_t | seconds | estimate of time remaining for scan |
### zpool_vdev_stats Description
The ZFS I/O (ZIO) scheduler uses five queues to schedule I/Os to each vdev.
These queues are further divided into active and pending states.
An I/O is pending prior to being issued to the vdev. An active
I/O has been issued to the vdev. The scheduler and its tunable
parameters are described at the
[ZFS documentation for ZIO Scheduler]
(https://openzfs.github.io/openzfs-docs/Performance%20and%20Tuning/ZIO%20Scheduler.html)
The ZIO scheduler reports the queue depths as gauges where the value
represents an instantaneous snapshot of the queue depth at
the sample time. Therefore, it is not unusual to see all zeroes
for an idle pool.
#### zpool_vdev_stats Tags
| label | description |
|---|---|
| name | pool name |
| vdev | vdev name (root = entire pool) |
#### zpool_vdev_stats Fields
| field | units | description |
|---|---|---|
| sync_r_active_queue | entries | synchronous read active queue depth |
| sync_w_active_queue | entries | synchronous write active queue depth |
| async_r_active_queue | entries | asynchronous read active queue depth |
| async_w_active_queue | entries | asynchronous write active queue depth |
| async_scrub_active_queue | entries | asynchronous scrub active queue depth |
| sync_r_pend_queue | entries | synchronous read pending queue depth |
| sync_w_pend_queue | entries | synchronous write pending queue depth |
| async_r_pend_queue | entries | asynchronous read pending queue depth |
| async_w_pend_queue | entries | asynchronous write pending queue depth |
| async_scrub_pend_queue | entries | asynchronous scrub pending queue depth |
### zpool_latency Histogram
ZFS tracks the latency of each I/O in the ZIO pipeline. This latency can
be useful for observing latency-related issues that are not easily observed
using the averaged latency statistics.
The histogram fields show cumulative values from lowest to highest.
The largest bucket is tagged "le=+Inf", representing the total count
of I/Os by type and vdev.
#### zpool_latency Histogram Tags
| label | description |
|---|---|
| le | bucket for histogram, latency is less than or equal to bucket value in seconds |
| name | pool name |
| path | for leaf vdevs, the device path name, otherwise omitted |
| vdev | vdev name (root = entire pool) |
#### zpool_latency Histogram Fields
| field | units | description |
|---|---|---|
| total_read | operations | read operations of all types |
| total_write | operations | write operations of all types |
| disk_read | operations | disk read operations |
| disk_write | operations | disk write operations |
| sync_read | operations | ZIO sync reads |
| sync_write | operations | ZIO sync writes |
| async_read | operations | ZIO async reads|
| async_write | operations | ZIO async writes |
| scrub | operations | ZIO scrub/scan reads |
| trim | operations | ZIO trim (aka unmap) writes |
### zpool_io_size Histogram
ZFS tracks I/O throughout the ZIO pipeline. The size of each I/O is used
to create a histogram of the size by I/O type and vdev. For example, a
4KiB write to mirrored pool will show a 4KiB write to the top-level vdev
(root) and a 4KiB write to each of the mirror leaf vdevs.
The ZIO pipeline can aggregate I/O operations. For example, a contiguous
series of writes can be aggregated into a single, larger I/O to the leaf
vdev. The independent I/O operations reflect the logical operations and
the aggregated I/O operations reflect the physical operations.
The histogram fields show cumulative values from lowest to highest.
The largest bucket is tagged "le=+Inf", representing the total count
of I/Os by type and vdev.
Note: trim I/Os can be larger than 16MiB, but the larger sizes are
accounted in the 16MiB bucket.
#### zpool_io_size Histogram Tags
| label | description |
|---|---|
| le | bucket for histogram, I/O size is less than or equal to bucket value in bytes |
| name | pool name |
| path | for leaf vdevs, the device path name, otherwise omitted |
| vdev | vdev name (root = entire pool) |
#### zpool_io_size Histogram Fields
| field | units | description |
|---|---|---|
| sync_read_ind | blocks | independent sync reads |
| sync_write_ind | blocks | independent sync writes |
| async_read_ind | blocks | independent async reads |
| async_write_ind | blocks | independent async writes |
| scrub_read_ind | blocks | independent scrub/scan reads |
| trim_write_ind | blocks | independent trim (aka unmap) writes |
| sync_read_agg | blocks | aggregated sync reads |
| sync_write_agg | blocks | aggregated sync writes |
| async_read_agg | blocks | aggregated async reads |
| async_write_agg | blocks | aggregated async writes |
| scrub_read_agg | blocks | aggregated scrub/scan reads |
| trim_write_agg | blocks | aggregated trim (aka unmap) writes |
#### About unsigned integers
Telegraf v1.6.2 and later support unsigned 64-bit integers which more
closely matches the uint64_t values used by ZFS. By default, zpool_influxdb
uses ZFS' uint64_t values and influxdb line protocol unsigned integer type.
If you are using old telegraf or influxdb where unsigned integers are not
available, use the `--signed-int` option.
## Using _zpool_influxdb_
The simplest method is to use the execd input agent in telegraf. For older
versions of telegraf which lack execd, the exec input agent can be used.
For convenience, one of the sample config files below can be placed in the
telegraf config-directory (often /etc/telegraf/telegraf.d). Telegraf can
be restarted to read the config-directory files.
### Example telegraf execd configuration
```toml
# # Read metrics from zpool_influxdb
[[inputs.execd]]
# ## default installation location for zpool_influxdb command
command = ["/usr/libexec/zfs/zpool_influxdb", "--execd"]
## Define how the process is signaled on each collection interval.
## Valid values are:
## "none" : Do not signal anything. (Recommended for service inputs)
## The process must output metrics by itself.
## "STDIN" : Send a newline on STDIN. (Recommended for gather inputs)
## "SIGHUP" : Send a HUP signal. Not available on Windows. (not recommended)
## "SIGUSR1" : Send a USR1 signal. Not available on Windows.
## "SIGUSR2" : Send a USR2 signal. Not available on Windows.
signal = "STDIN"
## Delay before the process is restarted after an unexpected termination
restart_delay = "10s"
## Data format to consume.
## Each data format has its own unique set of configuration options, read
## more about them here:
## https://github.com/influxdata/telegraf/blob/master/docs/DATA_FORMATS_INPUT.md
data_format = "influx"
```
### Example telegraf exec configuration
```toml
# # Read metrics from zpool_influxdb
[[inputs.exec]]
# ## default installation location for zpool_influxdb command
commands = ["/usr/libexec/zfs/zpool_influxdb"]
data_format = "influx"
```
## Caveat Emptor
* Like the _zpool_ command, _zpool_influxdb_ takes a reader
lock on spa_config for each imported pool. If this lock blocks,
then the command will also block indefinitely and might be
unkillable. This is not a normal condition, but can occur if
there are bugs in the kernel modules.
For this reason, care should be taken:
* avoid spawning many of these commands hoping that one might
finish
* avoid frequent updates or short sample time
intervals, because the locks can interfere with the performance
of other instances of _zpool_ or _zpool_influxdb_
## Other collectors
There are a few other collectors for zpool statistics roaming around
the Internet. Many attempt to screen-scrape `zpool` output in various
ways. The screen-scrape method works poorly for `zpool` output because
of its human-friendly nature. Also, they suffer from the same caveats
as this implementation. This implementation is optimized for directly
collecting the metrics and is much more efficient than the screen-scrapers.
## Feedback Encouraged
Pull requests and issues are greatly appreciated at
https://github.com/openzfs/zfs
-3
View File
@@ -1,3 +0,0 @@
### Dashboards for zpool_influxdb
This directory contains a collection of dashboards related to ZFS with data
collected from the zpool_influxdb collector.
File diff suppressed because it is too large Load Diff
-7
View File
@@ -1,7 +0,0 @@
This directory contains sample telegraf configurations for
adding `zpool_influxdb` as an input plugin. Depending on your
telegraf configuration, the installation can be as simple as
copying one of these to the `/etc/telegraf/telegraf.d` directory
and restarting `systemctl restart telegraf`
See the telegraf docs for more information on input plugins.

Some files were not shown because too many files have changed in this diff Show More