Sometimes, we'd like to know info about the metaslab groups
on special vdevs too. So let's make -MM do something useful.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes#12750
- Remove `SHELLCHECK_IGNORE` in favor of inline suppressions
and more general `SHELLCHECK_OPTS`.
- Exclude `SC2250` (turned on by `--enable=all`) globally
- Pass `--enable=all` to shellcheck for scripts in contrib/: it's
very important to catch errors early in areas that are not easily
testable.
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: szubersk <szuberskidamian@gmail.com>
Closes#12760
* etc/systemd/zfs-mount-generator: serialise
The wins for a relatively normal workload are rather slim:
real 0.02119s/0.00985s=2.15029x
user 0.02130s/0.00346s=6.15560x
sys 0.03858s/0.00643s=6.00062x
wall-total 0.014518s/0.005925s=2.45009x
wall-init 0.014518s/0.002457s=5.90684x
wall-real 0.014518s/0.003467s=4.18668x
But this is a big win on machines with a lot of datasets and expensive
forks.
For example, the gain on a VM on my work laptop with 900+ legacy-mount
Docker datasets, the original gains from the C rewrite were
only five-fold:
real 0.516s/0.102s=5.05882x
user 0.237s/0.143s=1.65734x
sys 0.287s/0.100s=2.87x
And this serial variant gains this back there as well:
real 0.102s/0.008s=12.75x
user 0.143s/0.007s=20.42857
sys 0.100s/0.001s=100x
wall-total 0.09717s/0.00319s=30.40255x
wall-init 0.00203s/0.00200s=1.015941x
wall-real 0.09513s/0.00118s=80.02043x
For a total of
real 0.516s/0.008s=64.5x
user 0.237s/0.007s=33.85714x
sys 0.287s/0.001s=287x
Suggested-by: Richard Laager <rlaager@wiktel.com>
* etc/systemd/zfs-mount-generator: pull in network for keylocation=https
Also simplify RequiresMountsFor= handling
Ref: #11956
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes#12138
Add properties, similar to pool properties, to each vdev.
This makes use of the existing per-vdev ZAP that was added as
part of device evacuation/removal.
A large number of read-only properties are exposed,
many of the members of struct vdev_t, that provide useful
statistics.
Adds support for read-only "removing" vdev property.
Adds the "allocating" property that defaults to "on" and
can be set to "off" to prevent future allocations from that
top-level vdev.
Supports user-defined vdev properties.
Includes support for properties.vdev in SYSFS.
Co-authored-by: Allan Jude <allan@klarasystems.com>
Co-authored-by: Mark Maybee <mark.maybee@delphix.com>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Closes#11711
Update zpool.8 to avoid parseltongue.
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Piotr P. Stefaniak <pstef@freebsd.org>
Closes#12763
Linux 5.16 moved these functions into this new header in commit
1b4fb8545f2b00f2844c4b7619d64d98440a477c. This change adds code to look
for the presence of this header, and include it so that the code using
xgetbv & xsetbv will compile again.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes#12800
Instead, linux/pagemap.h offers a number of folio-specific functions to
be called instead. In this case, module/os/linux/zfs/zfs_vnops_os.c
wants to call wait_on_page_bit(pp, PG_writeback). This gets replaced
with folio_wait_bit(folio_page(pp), PG_writeback). This change modifies
the code to conditionally compile that if configure identifies th
presence of the folio_wait_bit() function.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes#12800
- To avoid a use-after-free, zfsvfs->z_log needs to be loaded after the
teardown lock is acquired with ZFS_ENTER().
- Avoid leaking vnode locks in zfs_rename_relock() and zfs_rename_()
when the ZFS_ENTER() macros forces an early return.
Refactor the rename implementation so that ZFS_ENTER() can be used
safely. As a bonus, this lets us use the ZFS_VERIFY_ZP() macro instead
of open-coding its implementation.
Reported-by: Peter Holm <pho@FreeBSD.org>
Tested-by: Peter Holm <pho@FreeBSD.org>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Sponsored-by: The FreeBSD Foundation
Closes#12717
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes#12771
The code is integrated, builds fine, runs fine, there's not really
any reason not to.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes#12735
The inline function vn_flush_cached_data() in vnode.h
must not be compiled when building BASE.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Martin Matuska <mm@FreeBSD.org>
Closes#12743
Don't exit early in find_rootfs() when zpool.bootfs
is set to `zfs:AUTO`.
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: szubersk <szuberskidamian@gmail.com>
Closes#12658
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Pawel Jakub Dawidek <pawel@dawidek.net>
Closes#12748
Special allocation class or dedup vdevs may have roughly the same
performance as L2ARC vdevs. Introduce a new tunable to exclude those
buffers from being cacheable on L2ARC.
Reviewed-by: Don Brady <don.brady@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes#11761Closes#12285
The l2cache device could be added twice because vdev_inuse() does not
check spa_l2cache for added devices. Make l2cache vdevs inuse checking
logic more closer to spare vdevs.
Reviewed-by: George Amanakis <gamanakis@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Fedor Uporov <fuporov.vstack@gmail.com>
Closes#9153Closes#12689
In case if all label checksums will be invalid on any vdev, the pool
will become unimportable. The zhack with newly added cli options could
be used to restore label checksums and make pool importable again.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Fedor Uporov <fuporov.vstack@gmail.com>
Closes#2510Closes#12686
When ZFS is on root, /tmp is a ZFS. This causes zfs_list_004_neg to
fail since `zfs list` on /tmp passes when the test expects it not to.
The fix is to exclude paths that belong to ZFS.
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Palash Gandhi <pbg4930@rit.edu>
Closes#12744
In addition to flushing memory mapped regions when checking holes,
commit de198f2d95 modified the dirty dnode detection logic to check
the dn->dn_dirty_records instead of the dn->dn_dirty_link. Relying
on the dirty record has not be reliable, switch back to the previous
method.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #11900Closes#12745
This test case may fail on 5.13 and newer Linux kernels if the
/dev/zvol/ device is not created by udev.
Reviewed-by: Rich Ercolani <rincebrain@gmail.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #12301
Closes #12738
In case if all label checksums will be invalid on any vdev, the pool
will become unimportable. From other side zdb with -l option will not
provide any useful information why it happened. Add notifications
about corrupted label checksums.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Fedor Uporov <fuporov.vstack@gmail.com>
Closes#2509Closes#12685
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Dimitri John Ledkov <dimitri.ledkov@canonical.com>
Closes#12722Closes#12739
The ZED code currently can only turn on the fault LED for
a faulted disk in a JBOD enclosure. This extends support
for faulted NVMe disks as well.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes#12648Closes#12695
The only zdb utility require to read metaslab-related data during
read-only pool import because of spacemaps validation. Add global
variable which will allow zdb read spacemaps in case of readonly
import mode.
Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Fedor Uporov <fuporov.vstack@gmail.com>
Closes#9095Closes#12687
In order to reduce contention on the vq_lock, optional skip sectors
for Raidz writes can be placed into a single IO request. This is done by
padding out the linear ABD for a parity column to contain the skip
sector and by creating gang ABD to contain the data and skip sector for
data columns.
The vdev_raidz_map_alloc() function now contains specific functions for
both reads and write to allocate the ABD's that will be issued down to
the VDEV chldren.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-By: Mark Maybee <mark.maybee@delphix.com>
Signed-off-by: Brian Atkinson <batkinson@lanl.gov>
Closes#12333
The submit_bio() prototype has changed again. The version is 5.16
still only expects a single argument but the return type has changed
to void. Since we never used the returned value before update the
configure check to detect both single arg versions.
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Alexander Lobakin <alobakin@pm.me>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#12725
Commit https://github.com/torvalds/linux/commit/2e9bc346 moved
the elevator.h header under the block/ directory as part of some
refactoring. This turns out not to be a problem since there's
no longer anything we need from the header. This has been the
case for some time, this change removes the elevator.h include
and replaces it with a major.h include.
Reviewed-by: Alexander Lobakin <alobakin@pm.me>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#12725
It keeps failing, on changes which aren't related at all.
So until someone runs down why, I'd like it to stop being the
sole reason for CI failures.
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes#12733
When using lseek(2) to report data/holes memory mapped regions of
the file were ignored. This could result in incorrect results.
To handle this zfs_holey_common() was updated to asynchronously
writeback any dirty mmap(2) regions prior to reporting holes.
Additionally, while not strictly required, the dn_struct_rwlock is
now held over the dirty check to prevent the dnode structure from
changing. This ensures that a clean dnode can't be dirtied before
the data/hole is located. The range lock is now also taken to
ensure the call cannot race with zfs_write().
Furthermore, the code was refactored to provide a dnode_is_dirty()
helper function which checks the dnode for any dirty records to
determine its dirtiness.
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Rich Ercolani <rincebrain@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #11900Closes#12724
Note that Dropbear supports ed25519 keys since version 2020.79.
See https://github.com/mkj/dropbear/pull/91
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Michael Franzl <michael@franzl.name>
Closes#12715
It turns out that short-circuiting the EFAULT behavior on a short read
breaks things on FreeBSD. So until there's a nicer solution, let's
just revert the behavior for not-Linux.
Reference:
https://reviews.freebsd.org/R10:70f51f0e474ffe1fb74cb427423a2fba3637544d
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes#12698
On Linux, sometimes, when ZFS goes to unmount an automounted snap,
it fails a VERIFY check on debug builds, because taskq_cancel_id
returned ENOENT after not finding the taskq it was trying to cancel.
This presumably happens when it already died for some reason; in this
case, we don't really mind it already being dead, since we're just
going to dispatch a new task to unmount it right after.
So we just ignore it if we get back ENOENT trying to cancel here,
retry a couple times if we get back the only other possible condition
(EBUSY), and log to dbgmsg if we got anything but ENOENT or success.
(We also add some locking around taskqid, to avoid one or two cases
of two instances of trying to cancel something at once.)
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes#11632Closes#12670
"has unsupported feature: [number]" seems reasonable when we can't
know what the problem was, but with the send -D removal, we know
what it was, and can explicitly tell people "don't do that; try
this if you must".
So let's.
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes#12708
We move the spinlock unlock before the thread creation. This should be
safe because the thread creation code doesn't actually manipulate any
taskq data structures; that's done by the thread once it's created.
We also remove the assertion that the maxthreads is the current threads
plus one; that assertion could fail if multiple hotplug events come in
quick succession, and the first new taskq thread hasn't had a chance to
start processing yet.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
eviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes#12714
When a parent dataset has normalization set to any value other than
"none", and a file system is created with the property "utf8only=off",
implicitly also set "normalization=none" instead of overriding the
desire for a non-UTF8 enforcing file system.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mike Swanson <mikeonthecomputer@gmail.com>
Closes#11892Closes#12038
We have to hold the teardown lock while dereferencing zfsvfs->z_os and,
I believe, when committing to the ZIL.
Note that jumping to the "out" label, "error" is always non-zero.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes#12704
The objset object is reallocated during certain dataset operations, such
as rollbacks, so the objset pointer must be loaded after acquiring the
teardown lock.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes#12704
This change primarily seeks to make implicit documentation explicit, as
it is not outright stated that options should be comma-separated, nor is
there a reason given for it.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Daniel Ebdrup Jensen <debdrup@FreeBSD.org>
Closes#12579
The values of next properties: filesystem_limit, filesystem_count,
snapshot_limit, snapshot_count were returned to user as UINT64_MAX
integers in case if -p cli option is used, return 'none' value instead.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Fedor Uporov <fuporov.vstack@gmail.com>
Closes#9306Closes#12690
Currently, you get back "can only attach to mirrors and top-level disks"
unconditionally if zpool attach returns ENOTSUP, but that also happens
if, say, feature@device_rebuild=disabled and you tried attach -s.
So let's print an error for that case, lest people go down a rabbit hole
looking into what they did wrong.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes#11414Closes#12680
It turns out, userland is much more happy with aliased property
names than the kernel is.
So let's normalize those to the expected names before we pass
them off.
Added a test case hacked up from the other recv -o/-x test that fails
on unpatched git and passes here.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes#12607Closes#12609
There was a fallback case I overlooked in the initial patch, with
a similarly imperfect version extractor.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes#12045Closes#12673
Gentoo and Alpine always set the rc init scripts' shebang to
#!/sbin/openrc-run, whether or not openrc is installed.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Peter Levine <plevine457@gmail.com>
Closes#12683Closes#12692
One of our developers noticed a bug in vdev_id where we were incorrectly
sorting PHYs using alphabetical sorting (which usually works) instead
of natural sorting (-v). For example:
[port-0:0]# ls -d phy*
phy-0:10 phy-0:11 phy-0:8 phy-0:9
[port-0:0]# ls -vd phy*
phy-0:8 phy-0:9 phy-0:10 phy-0:11
This fixes the issue.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes#12699
If you've got multiple scrubs/resilvers going, it's rather helpful
to know which pool each scan line refers to.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes: #12674
The fnvlist versions of the functions are fatal if they fail,
saving each call from having to include checking the result.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Signed-off-by: Allan Jude <allan@klarasystems.com>
When cleaning up a test case standardize on using the convention:
datasetexists $ds && destroy_dataset $ds <flags>
By using 'destroy_dataset' instead of 'log_must zfs destroy' we ensure
that the destroy is retried in the event that a ZFS volume is busy.
This helps ensures ensure tests are fully cleaned up and prevents false
positive test failures on Linux.
Note that all of the tests which used 'zfs destroy' in cleanup have
been updated even if they don't use volumes. This was done to
clearly establish the expected convention.
Reviewed-by: Rich Ercolani <rincebrain@gmail.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#12663
cloud-init added a hook which triggers on every device add/rm
event, which results in holding open devices for a while after
they're created/destroyed.
So let's shove an exclusion rule for that into the GH workflows
until it gets fixed.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes#12644Closes#12669